The Technical Journal for the Electronic Design Automation Community
www.edatechforum.com
Volume 6
Issue 4
September 2009
Embedded | ESL/SystemC | Digital/Analog Implementation | Tested Component to System | Verified RTL to Gates | Design to Silicon
INSIDE: Junot Diaz brings art to science | Extending the power of UPF | OS horses for embedded courses | AMS design at the system level | Getting the most out of USB 2.0
COMMON PLATFORM TECHNOLOGY Industry availability of real innovation in materials science, process technology and manufacturing for differentiated customer solutions.
Chartered Semiconductor Manufacturing, IBM and Samsung provide you the access to innovation you need for industry-changing 32/28nm high-k metal-gate (HKMG) technology with manufacturing alignment, ecosystem design enablement, and flexibility of support through Common Platform technology. Collaborating with some of the world’s premier IDMs to develop leading-edge technology as part of a joint development alliance, Chartered, IBM and Samsung provide access to this technology as well as qualified IP and robust ecosystem offerings to help you get to market faster, with less risk and more choice in your manufacturing options. Visit www.commonplatform.com today to find out how you can get your access to innovation. To find out more, visit us at these upcoming EDA TF locations: September 1 - Shanghai, China September 3 - Santa Clara, CA, USA September 4 - Tokyo, Japan October 8 - Boston, MA, USA
www.commonplatform.com
EDA Tech Forum September 2009
contents

< COMMENTARY >
6 Start here: Stepping up. Engineers must not forget that project management means tough choices.
8 Analysis: Reading the runes. The latest consumer electronics forecasts mix the good with the bad.
10 Interview: Engineering creativity. Pulitzer prize-winner Junot Diaz helps MIT students find their own voices.
12 Low Power: Extending UPF for incremental growth. Now IEEE approved, the standard is adding new verification and abstraction capabilities. (Mentor Graphics)

< TECH FORUM >
16 Embedded: Linux? Nucleus?... Or both?
20 ESL/SystemC: Bringing a coherent system-level design flow to AMS (The MathWorks)
28 Verified RTL to gates: A unified, scalable SystemVerilog approach to chip and subsystem verification (LSI)
34 Digital/analog implementation: Implementing a unified computing architecture (Netronome Systems)
38 Design to silicon: System level DFM at 22nm
44 Tested component to system: Pushing USB 2.0 to the limit (Atmel and Micrium)
50 Tested component to system: Ensuring reliability through design separation (Altera)
EDA Tech Forum Journal is a quarterly publication for the Electronic Design Automation community including design engineers, engineering managers, industry executives and academia. The journal provides an ongoing medium in which to discuss, debate and communicate the electronic design automation industry’s most pressing issues, challenges, methodologies, problem-solving techniques and trends. EDA Tech Forum Journal is distributed to a dedicated circulation of 50,000 subscribers.
EDA Tech Forum is a trademark of Mentor Graphics Corporation, and is owned and published by Mentor Graphics. Rights in contributed works remain the copyright of the respective authors. Rights in the compilation are the copyright of Mentor Graphics Corporation. Publication of information about third party products and services does not constitute Mentor Graphics’ approval, opinion, warranty, or endorsement thereof. Authors’ opinions are their own and may not reflect the opinion of Mentor Graphics Corporation.
On September 3, 2009, Breker Verification Systems will present information on Model Based Scenario Generation. Adnan Hamid, founder of Breker, will lead a Lunch and Learn discussion at the Tech Forum, which is being held at the Santa Clara Convention Center in Santa Clara, CA. This discussion will explain how scenario models improve productivity by more than 2X, achieve coverage goals and reduce testcase redundancy. Highlights include how Model Based Scenarios:
- Achieve 10X reduction in testbench code
- Provide 2X productivity improvement
- Enable faster simulations
- Reduce test sequence redundancy
- Visualize pre-simulation scenarios
- Annotate test coverage results onto visual models
- Reuse verification IP, both vertically and horizontally

Sign up for this session at http://www.edatechforum.com

About Breker Verification Systems
Breker's product, Trek™, the Scenario Modeling tool for Advanced Testbench Automation, provides functional verification engineers with an automated solution for generating input stimulus, checking output results and measuring scenario coverage. Trek is a proven technology that demonstrates a 10X reduction in testbench development and a 3X improvement in simulation throughput, freeing up resources needed to meet today's aggressive design goals. Architected to run in your current verification environment, this technology also provides powerful graphical visualization and analysis of your design's verification space. For more information about this leading edge technology, visit us at www.brekersystems.com or call 512-415-1199.

team < EDITORIAL TEAM >
Editor-in-Chief Paul Dempsey +1 703 536 1609 pauld@rtcgroup.com
Managing Editor Marina Tringali +1 949 226 2020 marinat@rtcgroup.com
Copy Editor Rochelle Cohn
< CREATIVE TEAM > Creative Director Jason Van Dorn jasonv@rtcgroup.com
Art Director
Kirsten Wyatt kirstenw@rtcgroup.com
Graphic Designer Christopher Saucier chriss@rtcgroup.com
< EXECUTIVE MANAGEMENT TEAM > President
John Reardon johnr@rtcgroup.com
Vice President
Cindy Hickson cindyh@rtcgroup.com
Vice President of Finance Cindy Muir cindym@rtcgroup.com
Director of Corporate Marketing Aaron Foellmi aaronf@rtcgroup.com
< SALES TEAM > Advertising Manager Stacy Mannik +1 949 226 2024 stacym@rtcgroup.com
Advertising Manager Lauren Trudeau +1 949 226 2014 laurent@rtcgroup.com
Advertising Manager Shandi Ricciotti +1 949 573 7660 shandir@rtcgroup.com
intelligent, connected devices. Choose your architecture wisely. 15 billion connected devices by 2015.* How many will be yours? intel.com/embedded * Gantz, John. The Embedded Internet: Methodology and Findings, IDC, January 2009. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries. © 2009 Intel Corporation. All rights reserved.
< COMMENTARY > START HERE
start here
Stepping up

During the CEO Panel at this year's Design Automation Conference, the men leading the three largest EDA vendors stressed that their industry can do well in a slump because it both contributes to the ongoing battle against technological limits, and enables the delivery of ever greater efficiency. But another, parallel question raised by this year's DAC and by today's broader semiconductor environment is, "Just how much can EDA do for semiconductor vendors?"

The point here is not to highlight the limits of design software, or even, as has been previously suggested, ask whether the vendors should limit their horizons to give clients room to differentiate products. Rather, we need to concentrate on the broad and—more pointedly—blunt issue of the fabbed, fabless or sub-contracted design manager.

One of the great economic advances enabled by EDA recently has given ESL abstractions the capability to underpin more easily executable and financially quantifiable designs. Identify both your goals and your main problems early on and chances are you will get to market quicker and more profitably. Yet, even allowing that ESL has varied in its adoption rate across both geography and sector, the overall respin rate remains worryingly high.

IBM fellow Carl Anderson gave an excellent but alarming speech at DAC during a session previewing the 22nm node (see page 38). There is a plethora of tools and the tightest constraints in terms of time and money, but a common reason for major project delays remains late stage changes. When a project manager must decide whether to stay the course or make concessions, he all too often strays too far down the latter course, incurring risks that history tells us usually do not pay off.

For these circumstances, Anderson offers an interesting challenge: "innovation inside the box." Tools can only do so much—the responsibility comes down to the manager and the team wielding them. What is needed is a combination of craftsman, creator, professional and military general. Anderson had a slide showing both Albert Einstein and George S. Patton, the point being not so much the comparison as the characteristics each represents for different aspects of the industry.

The issue goes further. For example, we are on the cusp of a new age of systems-in-package (SiPs) technologies as an alternative to monolithic integration within systems-on-chip (SoCs). However, those who believe SiP is simply a poor man's SoC are deluding themselves. One design company describes the comparison more as "entering the world of relative pain." There are good reasons why TSMC has only just added SiP to its Reference Flow, and so far qualified only one EDA vendor, Cadence Design Systems, which itself remains cautious about the road ahead. And why? Because, as you've guessed, the SoC vs. SiP decision is a lot tougher than we have been led to believe, and success will depend on the commitment design managers are prepared to show throughout a project's life.

Tools can and will do an awful lot, but now more than ever, Spiderman's Uncle Ben was bang on the money: "With great power, comes great responsibility."

Paul Dempsey
Editor-in-Chief
< COMMENTARY > ANALYSIS
Reading the runes
We revisit some of the latest consumer electronics forecasts. Paul Dempsey reports.

When we first reviewed the consumer electronics market at the beginning of the year, there were still hopes that growth could remain statistically flat despite global economic woes. However, at the beginning of the summer, the Consumer Electronics Association revised its forecast down from January's -0.7% to -7.7%, implying total factory-gate sales of $165B. The CEA is also now predicting that 2010 will see a more modest rebound, in the 1-2% range.

This may appear to be at odds with second quarter financial results posted by companies such as Intel that firmly exceeded Wall Street's expectations. Research firm iSuppli also recently estimated that Q2 consumer electronics sales did rise sequentially on the beginning of the year, by 4.2% to $71.1B, although that number is 11.3% down year-on-year.

The good news—with due deference to Pollyanna—may therefore be that the research is now more a reflection of a slump that is behind the high technology sector, but which was more severe than anticipated. But even here, a devil may lurk in the details.

Speaking shortly before the release of his association's revised forecast, CEA economist Shawn DuBravac said that he expected the recession to bottom out during August. However, the Digital Downtown event that saw this comment was also marked by significant disagreement among analysts from the CEA and elsewhere as to whether the subsequent recovery curve will be V-shaped (i.e., relatively quick) or U-shaped (i.e., bumping along the floor for a while yet). The CEA's revised number suggests that it is taking a relatively conservative position between the two.

If there is a more solid cause for optimism, it may well lie in the relative strength of consumer electronics during this downturn. The forecast decline in factory sales is the first since the 2001 dot.bomb, and is substantially lower than those seen in the automotive market (40% down) and in housing (down by just over a third). An important factor here may be an attitudinal shift among consumers.

"We've been saying for quite some time that many types of consumer electronics have moved from being seen as luxuries to essential purchases," says Stephen Baker, vice president, industry analysis at the NPD Group, one of the leading retail research groups. "A lot of that is not about perception, but how we are becoming a digital society. For example, as we get closer to the fall, you see increased spending on laptops and other products that students need for the new academic year."

There was more evidence of this in June when Apple announced that the iPhone 3G S upgrade shipped one million units during its first three days on sale in the U.S. A high proportion of buyers attributed their 'had-to-have' purchase decisions to productivity rather than fashion. Another almost 'obligatory' purchase in the U.S. has been a new television, after the switch-off of the analog signal finally went ahead on June 12.

There were concerns that extra federal subsidy for the converter box program would slow sales, but according to the CEA this did not happen to any significant degree. Digital displays represented 15% of sales in the first half of the year, and unit shipments are expected to rise by 8%. Strong cost-down pressures here will mean that full-year revenues fall in line with the broader market by 6% to $24B. However, another factor here is that consumers are now moving onto secondary displays at smaller sizes for bedrooms and elsewhere in the home beyond the main lounge. This secondary market is holding up well despite the recession and also helping to establish LCD as the main HDTV technology.

In the longer term, though, analysts and leading technology companies are looking to the mobile Internet device segment to help restore industry growth into 2010 and through 2011. The smartphone is already a strong player and one of the few product segments still expected to grow in revenue terms by the CEA this year, albeit by a modest 3% to $14B. However, the netbook market is still surging ahead with shipments set to rise by 85% to 8.5M and revenues set for $3.4B. One thing that netbooks have going for them, of course, is the public perception that in many cases they are a cheaper alternative to a traditional laptop, and therefore the recession can make them a more attractive buy.

In that light, more and more alliances are looking to exploit this emerging space. July's Design Automation Conference saw ARM and the Common Platform foundry alliance add EDA vendor Synopsys to their existing drive to harness high-K metal gate technology at the 28nm and 32nm nodes to SoCs for mobile applications. At the same event, Mentor Graphics also rolled out the latest extensions to its Vista software tools that allow low-power architectural optimization at the system level through transaction-level modeling. When companies start talking about potential 80% savings on power budgets, again the mobile device market immediately springs to mind. Driving down board and components cost looks therefore like it will remain the prevailing theme this year, but innovation on the mobile front is also recognized as a necessity in the latest market data.
< COMMENTARY > INTERVIEW
Engineering creativity Pulitzer-winning author. MIT professor. Junot Diaz shows that those who can do, also teach. To the world at large, Junot Diaz is well on the way to becoming a literary superstar. His novel The Brief Wondrous Life of Oscar Wao has already earned critical superlatives and the 2008 Pulitzer Prize for Literature. And deservedly so. It is a fabulous, accessible book, expressed in language that—when you buy a copy (and, trust me, you will)—immediately exposes the poverty of my attempts to praise it here. However, Oscar is not the main reason for our interest. Like most authors—even the most successful ones—Diaz has ‘the day job’, and in this case it is as the Rudge and Nancy Allen Professor of Creative Writing at the Massachusetts Institute of Technology (MIT). Yes, he spends a large part of his time teaching scientists and engineers who are, in his words “only visiting the humanities.” What Diaz is trying to achieve goes beyond the ‘challenge’ traditionally associated with science students and the written word. There is a lot of talk today about turning engineers into bloggers, smoothing communication within multinational teams, and the profession’s need to explain itself to the outside world. Diaz aims to stimulate their creativity. While that might sound less practical, when I mentioned this interview to a senior Silicon Valley executive, his eyes lit up. In his view, the one thing that university courses often fail to address is students’ more imaginative capabilities. The industry does need team players and ascetically analytical thinkers, but given the obstacles it faces today, it desperately requires more newcomers who can make innovative leaps and deliver original ideas. Diaz, for his part, does not approach his engineering majors from the perspective that they are all that much different from those focused on the humanities. At heart, it is about potential and engagement, not stereotypes. “Yes, they’re brilliant and yes, they’ve sacrificed a lot of social time to their studies, and yes, they’re intense. But at
the arts level, they seem a lot like my other students,” he says. “I think there’s a tendency at a place like MIT to focus on what’s weird about the kids, but really what amazes me is how alike they are to their non-science peers. Some are moved deeply by the arts and wish to exercise their passion for it; many are not.” At the same time, he has noticed that science must, by its nature, place some emphasis on a very factual, very dry discourse and a hardcore empiricism. That is not a bad thing, rather the reflection of the demands of a different discipline. “But it is in a class like mine where students are taught that often the story is the digression or the thing not said—that real stories are not about Occam’s Razor but something far more beautiful, messy, and I would argue, elegant,” he says. There is another side to this dialogue— what the students bring to the classes as well. On one level, there is an inherently international side to MIT, given its reputation as a global center of excellence. “I get to read stories from all over the world. What a wonderful job to have, to be given all these limited wonderful glimpses of our planet,” says Diaz. “Having a student from Japan use her own background to provide constructive criticism about describing family pressures to a young person from Pakistan is something to cherish.” Multi-culturalism is something that strikes a chord with Diaz for personal reasons. He is himself an immigrant from the Dominican Republic, and his experiences as part of that diaspora, that country’s often bloody national history and his own youth in New Jersey all deeply inform Oscar Wao and his short stories. “Many of my students are immigrants, so I know some of the silences that they are wrestling with,” Diaz says. “It makes me sympathetic but also willing to challenge them to confront some of the stuff that we often would rather look away from: the agonies of assimilation, the virulence of
some of our internalized racial myths, and things like that." However, those students with a science background also inspire him. "In every class, I see at least one student who thinks the arts are ridiculous frippery, become, by the end of term, a fierce believer in the power of the arts to describe and explode our realities as human beings," says Diaz. "Why I love non-arts majors is that they are much more willing to question the received wisdom of, say, creative writing, and that has opened up entirely new lines of inquiry. My MIT students want you to give them two, maybe three different definitions for character. And that's cool." Diaz's own taste in authors puts something else in the mix here. The fantasies of Tolkien prove a frequent touchstone in his novel, although he seems even closer to a group of other English science-fiction writers who—arguably taking their cue from the genre's father H.G. Wells—explore nakedly apocalyptic themes, especially John Christopher. His prescience is indeed seeing him 'rediscovered' right now in the UK, for works such as The Death of Grass (a.k.a. No Blade of Grass), which foresees an ecological catastrophe. "Christopher had a stripped-down economic style and often his stories were dominated by anti-heroes—in other words, ordinary human beings," says Diaz. "He may not be well known, but that changes my opinion not a jot. Like [John] Wyndham, like [J.G.] Ballard, Christopher deploys the apocalyptic mode to critique both our civilization and our deeper hidden selves, but I don't think anyone comes close to his fearless, ferocious vision of human weakness and of the terrible world we inhabit." With opinions like that, MIT is obviously an excellent place to stir up the debate. "I often find myself defending fantasy to students who themselves have to defend science-fiction to some
of their other instructors,” says Diaz. “That’s why we’re in class, to explode these genres, to explore what makes up their power and why some people are immune to their charms.” So, it is a pleasure rather than a challenge for this author to bring arts and literature to those following what we are often led to believe is a very separate academic and intellectual path. In fact, his parting thought is that the combination may in some respects be an improvement, more wellrounded even. “All I can say is this one thing about engineering and science students: they are so much more accustomed to working as a team, and that’s a pleasant relief from the often unbearable solipsistic individualism of more humanitiesoriented student bodies,” says Diaz.
The Brief Wondrous Life of Oscar Wao and Junot Diaz’s earlier collection of short stories, Drown, are both published in the USA by Riverhead Books.
< COMMENTARY > LOW POWER
Extending UPF for incremental growth The latest revision of the Unified Power Format offers more flexible design and verification abstraction, explain Erich Marschner and Yatin Trivedi. Accellera’s Unified Power Format (UPF) is in production use today, delivering the low-power system-on-chip (SoC) designs that are so much in demand. Building upon that success, IEEE Std 1801-2009 [UPF] offers additional features that address the challenges of low-power design and verification. These include more abstract specifications for power supplies, power states, and other elements of the power management architecture, all of which provide additional flexibility for specification of low-power intent. Support for the incremental development of power architectures extends the usefulness of the standard into IP-based, hierarchical methodologies where base power architectures may have been established independent of the IP components used in the overall design. This article reviews some of the new features in IEEE Std 1801-2009 that can make UPF-based flows even more effective. The Accellera standard, fully incorporated in IEEE Std 1801-2009, introduced the concept of a power architecture specification. This includes power domain definitions (i.e., groups of design elements with the same primary power supply), power distribution and switching networks, and power management elements such as isolation and level shifting cells that mediate the interfaces between power domains. The specification of state retention across power down/up cycles was also included. These capabilities encompass both the verification and the implementation semantics, so the same UPF specification can be used in both contexts. IEEE Std 1801-2009 builds on this concept to offer additional capability, flexibility and usability.
Supply sets UPF introduced the capability of defining supply nets, which represent the power and ground rails in a design. In some cases, such as during the development and integration of soft-IP, it can be more useful to define nets that must be used in pairs to connect design elements to both power and ground. In addition, there may be other supplies
involved, such as is entailed in support for bias rails. IEEE Std 1801-2009 introduced the concept of a 'supply set' so that it can represent such a collection of related nets.
Syntax: create_supply_set <set_name> { -function { <func_name> [ <net_name> ] } }* [ -reference_gnd <supply_net_name> ] [ -update ]
Each of the supply nets in a supply set contributes some abstract function to it. The function names, specified via the '–function' option, can have either predefined values (e.g., 'power', 'ground', 'nwell', 'pwell', 'deepnwell', or 'deeppwell'), or user-defined names. The predefined values allow the user to specify standard power functions such as the primary power and ground nets or bias power nets. The user-defined names may be used as placeholders for later modification or as potential extensions for analysis tools. A supply set can be defined incrementally, as progressively more information becomes available. Initially, a supply set can be defined in terms of functions. Later, those functions can be mapped to specific supply nets. Separately, the supply set can be associated with a power domain; a power switch; or a retention, isolation, or level shifting strategy. For example, the initial definition of a supply set might be given as follows:
create_supply_set SS1 -function {PWR} -function {GND}
Later, the abstract functions 'PWR' or 'GND' can be mapped to specific supply nets, using the '–update' option:
create_supply_set SS1 -function {PWR vdd} -update
create_supply_set SS1 -function {GND vss} -update
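The same incremental style extends to bias supplies. As an illustrative sketch (the set and net names SS2, vdd_sw, vss and vnw are hypothetical, not taken from the standard), a set can first declare the functions it needs using the predefined names listed above, and bind them to actual nets later:

create_supply_set SS2 -function {power} -function {ground} -function {nwell}
create_supply_set SS2 -function {power vdd_sw} -function {ground vss} -function {nwell vnw} -update

The first command records only which functions the set must provide; the '–update' form binds them to real nets once those exist.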
The function names defined as part of a supply set also contribute to automating the connection of supply nets to pins of design elements. When a supply set is connected to a design element, each individual supply net in the set is automatically connected to a supply port based on the correspondence between supply net function and supply port attributes. For example, for a given supply set S, the supply net that is the 'power' net within S will be associated with any supply port that has the 'pg_type' attribute 'primary_power'.

Simstate | Combinational Logic | Sequential Logic | Corruption Semantics
NORMAL | Fully Functional | Fully Functional | None
CORRUPT_STATE_ON_ACTIVITY | Fully Functional | Non-Functional | Regs powered by the supply corrupted when any input to the reg is active
CORRUPT_STATE_ON_CHANGE | Fully Functional | Non-Functional | Regs powered by the supply corrupted when the value of the register is changed
CORRUPT_ON_ACTIVITY | Non-Functional | Non-Functional | Wires driven by logic and regs powered by the supply corrupted when any input to the logic is active
CORRUPT | Non-Functional | Non-Functional | Wires driven by logic and regs powered by the supply corrupted immediately on entering state
NOT_NORMAL | Deferred | Deferred | By default, same as CORRUPT. Tool may provide an override

FIGURE 1 Simstates and their simulation semantics (Source: Accellera)
Power States The Accellera UPF standard introduced several commands (‘add_port_state’, ‘create_pst’, ‘add_pst_state’) to define the power states of the system in terms of the power states of the power domains of which it is comprised. These in turn are defined in terms of the states and voltage levels of supply nets. In IEEE Std 1801-2009, these capabilities are expanded to include an additional approach to defining power states: the ‘add_power_state’ command. This new command applies to both supply sets and power domains.
Syntax:
add_power_state <object_name> -state <state_name>
  { [ -supply_expr <boolean_function> ]
    [ -logic_expr <boolean_function> ]
    [ -simstate <simstate> ]
    [ -legal | -illegal ] }
  [ -update ]
Power states can be defined abstractly at first, using '–logic_expr', the argument of which specifies the condition or conditions under which the object is in the given power state. The condition is given as a boolean expression that can refer to logic nets or supply nets. This is useful when power states are being defined before the power distribution network has been defined. Later, when the power distribution network is in place, the power state definition can be refined by specifying the state in terms of the states of supply nets
only, using '–supply_expr'. The supply expression definition is the golden specification of the power state. When both '–logic_expr' and '–supply_expr' are given, '–supply_expr' is treated as the primary definition, and '–logic_expr' is then used as an assertion, to check that the more specific supply expression is indeed true only when the logic expression is also true. This helps ensure consistency as the power architecture is elaborated and further refined during development.
Incremental refinement for power states can be quite powerful. In this case, it can be applied to the boolean function that defines the power state. For example, an initial power state specification might be given as follows:
add_power_state P1 -state FULL_ON -logic_expr PWR_ON
The end result is that object P1 will be in state 'FULL_ON' whenever logic net 'PWR_ON' evaluates to 'True'. A subsequent command can refine the logic expression to further restrict it, as in:
add_power_state P1 -state FULL_ON -logic_expr RDY -update
which would refine the logic expression for state 'FULL_ON' of P1 so that it is now '(PWR_ON && RDY)'. The same kind of refinement can be performed on the supply expression.
Incremental refinement applies to simstates as well. The simstate of a given power state can be defined initially as ‘NOT_NORMAL’, indicating that it is a non-operational state, without being more specific. A later UPF command can update the definition to specify that it is any one of the simstates other than ‘NORMAL’. By default, the ‘NOT_NORMAL’ simstate is treated as if it were the ‘CORRUPT’ simstate, a conservative interpretation. By refining a ‘NOT_NORMAL’ simstate, the corruption effects can be limited so that they only apply to activity on wires or state elements, or to changes of state elements.
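As a hedged sketch of that refinement flow (the supply set, state and signal names here are illustrative, not taken from the standard), an abstract non-operational state can be registered first and its simstate narrowed later with '–update':

add_power_state SS1 -state SLEEP -logic_expr SLEEP_REQ -simstate NOT_NORMAL
add_power_state SS1 -state SLEEP -simstate CORRUPT_STATE_ON_ACTIVITY -update

Until the second command is applied, simulation treats SLEEP as if it were CORRUPT; afterwards, only registers powered by the supply set are corrupted, and only when their inputs are active, as listed for that simstate in Figure 1.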
Simstates
The Accellera UPF standard introduced simulation semantics for UPF specifications. In IEEE Std 1801-2009, the simulation behavior semantics have been further developed under the concept of 'simstates'. For any given power state, the simstate specifies the level of operational capability supported by that power state, in terms of abstractions suitable for digital simulation. Several levels of simstate are defined. 'NORMAL' represents normal operational status, with sufficient power available to enable full and complete operational capabilities with characteristic timing. 'CORRUPT' represents non-operational status, in which power is either off or so low that normal operation is not supported at all, and both nets and registers are in an indeterminate state. In between these two extremes are three other simstates: 'CORRUPT_ON_ACTIVITY', 'CORRUPT_STATE_ON_ACTIVITY' and 'CORRUPT_STATE_ON_CHANGE'. These represent intermediate levels of ability to maintain state despite lower than normal power. Simstates are defined for power states of supply sets. As the power state of a supply set changes during simulation, the corresponding simstate is applied to the design elements, retention registers, isolation cells, or level shifters to which the supply set is connected. The simulation semantics of the various simstates are shown in Figure 1 (p. 13).

Summary
The Accellera UPF standard provided an excellent foundation for the development of low-power design and verification capabilities and is in production use with customers today. As a standard, it has enabled the interoperability of tools, which has in turn made possible a variety of low-power design and verification flows to meet different needs. Building upon this foundation, IEEE Std 1801-2009, which completely incorporates the Accellera UPF standard, defines new abstractions that can provide more flexible capabilities to support the incremental development of power architectures. These new capabilities can help UPF users address the low-power design and verification challenges that are becoming increasingly significant in today's product development cycles.
Erich Marschner and Yatin Trivedi are members of the IEEE 1801 Working Group.
320,000,000 MILES, 380,000 SIMULATIONS AND ZERO TEST FLIGHTS LATER.
THAT’S MODEL-BASED DESIGN.
After simulating the final descent of the Mars Rovers under thousands of atmospheric disturbances, the engineering team developed and verified a fully redundant retro firing system to ensure a safe touchdown. The result—two successful autonomous landings that went exactly as simulated. To learn more, go to mathworks.com/mbd
©2005 The MathWorks, Inc.
Accelerating the pace of engineering and science
< TECH FORUM > EMBEDDED
Linux? Nucleus? …Or both? Colin Walls, Mentor Graphics Colin Walls is a member of the marketing team of the Mentor Graphics Embedded Systems Division and has more than 25 years of experience in the electronics industry. He is a frequent presenter at conferences and seminars, and the author of numerous technical articles and two books on embedded software.
Introduction Recent years have seen a lot of publicity and enthusiasm around implementations of Linux on embedded systems. It seems that another mobile device manufacturer announces its support for the Linux general purpose OS (GPOS) every week. At first, many developers viewed Linux as an outright competitor to established and conventional real-time OSs (RTOSs). Over time, though, as the various options have been tried out in the real world, a new reality has dawned. Each OS has its own strengths and weaknesses. No one OS fits all. To illustrate this new reality, this article takes a closer look at the differences between a commercial RTOS (Nucleus) and a GPOS (Linux)—and considers how the two OSs might even work together.
Embedded vs. desktop The phrase ‘operating system’ causes most people to think in terms of the controlling software on a desktop computer (e.g., Windows), so we need to differentiate between the programming environments for a PC and those for an embedded system. Figure 1 highlights the four key differences. It is interesting to note that as new OSs are announced (such as the Chrome OS for netbooks), the differences between a desktop computer and an embedded system become even less obvious. Further, more complex embedded systems now have large amounts of memory and more powerful CPUs; some often include a sophisticated graphical user interface and require the dynamic loading of applications. While all these resources appear to facilitate the provision of ever more sophisticated applications, they are at odds with demands for lower power consumption and cost control. Obviously OS selection is not a simple matter. It is driven by a complex mix of requirements, which are intrinsic in the design of a sophisticated device. The ideal choice is always a combination of technical and commercial factors.
Technical factors for OS selection From a technical perspective, the selection criteria for an OS revolve around three areas: memory usage, performance, and facilities offered by the OS. Memory All embedded systems face some kind of memory limitation. Because memory is more affordable nowadays, this constraint may not be too much of a problem. However, there are some situations where keeping the memory usage to a minimum is desired; this is particularly true with handheld devices. There are two motivations for this: cost—every cent counts in most designs; and power consumption—battery life is critical, and more memory consumes more power. For memory-constrained applications, it is clearly desirable to minimize the OS memory footprint. The key to doing this is scalability. All modern RTOS products are scalable, which means that only the OS facilities (i.e., the application program interface (API) call service code) used by the application are included in the memory image. For certain OSs, this only applies to the kernel. Of course, it is highly desirable for scalability to apply to all the OS components such as networking, file system, the user interface and so on. Performance The performance of a device on an embedded application is a function of three factors: application code efficiency, CPU power and OS efficiency. Given that the application code performance is fixed, a more efficient OS may enable a lower power CPU to be used (which reduces power consumption and cost), or may allow the clock speed of the existing CPU to be reduced (which can drastically reduce power consumption). This is a critical metric if the last ounce of performance must be extracted from a given hardware design. Facilities Ultimately, an OS is a set of facilities required in order to implement an application. So, the availability of those facilities is an obvious factor in the ultimate selection. The API is important, as it facilitates easy porting of code and leverages available expertise. Most RTOSs have a proprietary API, but POSIX is generally available as an option. POSIX is standard with Linux.
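To make the porting point concrete, here is a small sketch of our own (not from the article): a task written to the portable POSIX threads API compiles unchanged on desktop Linux and on any RTOS that offers a POSIX option, whereas code written to a proprietary kernel API would need rework.

/* Illustrative only: a periodic task using portable POSIX calls. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static void *sampling_task(void *arg)
{
    (void)arg;                      /* unused */
    for (int i = 0; i < 5; ++i) {
        printf("sample %d\n", i);   /* placeholder for real device work */
        usleep(10 * 1000);          /* sleep 10 ms between samples */
    }
    return NULL;
}

int main(void)
{
    pthread_t tid;
    if (pthread_create(&tid, NULL, sampling_task, NULL) != 0)
        return 1;                   /* thread creation failed */
    pthread_join(tid, NULL);        /* wait for the task to finish */
    return 0;
}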
Most embedded OSs employ a thread model; they do not make use of a memory management unit (MMU). This facilitates maximum OS performance, as the task context switch can be fast, but it does not provide any inter-task protection. Higher-end OSs, like Linux, use the process model where the MMU is employed to completely insulate each task from all of the other tasks. This is achieved by providing a private memory space at the expense of context switch speed. An interesting compromise is for a thread-based OS to utilize a MMU to protect other tasks' memory, without re-mapping address spaces. This provides significant inter-task protection, without so much context switch time overhead. Of course, there is much more to an OS than the kernel and the breadth and quality of the middleware. The availability of specific application code, preconfigured for the chosen OS, may be very attractive. Even if the current target application does not need all of these capabilities, possible upgrades must be anticipated.

Desktop Computer | Embedded System
Runs different programs at different times depending upon the needs of the user. | Runs a single, dedicated application at all times.
Has large amounts of (RAM) memory and disk space; both can be readily and cheaply expanded if required. | Has sufficient memory, but no excess; adding more is difficult or impossible.
All PCs have an essentially identical hardware architecture and run identical system software. Software is written for speed. | Embedded systems are highly variable, with different CPUs, peripherals, operating systems, and design priorities.
Boot up time may be measured in minutes as the OS is loaded from disk and initialized. | Boot up time is almost instantaneous—measured in seconds.

FIGURE 1 Differences between a desktop computer and an embedded system (Source: Mentor Graphics)

Until recently, operating system (OS) specification for embedded systems has been seen largely as an 'either/or' exercise. Similarly, OSs that have their foundations in the embedded market and those that have grown out of desktop computers have been seen as competing rather than complementary technologies. Cost and performance criteria within specifications will often lead to one technology winning out over another. But as hardware moves increasingly to multicore architectures, it is also the case that different types of OS can be specified within a single overall end-product, each specifically handling those tasks it carries out most efficiently. This article compares one real-time OS (Nucleus from Mentor Graphics) of the type traditionally associated with embedded systems with a general-purpose OS (Linux) that is increasingly being used in that market to identify their various advantages, and also the emerging opportunities for their use alongside one another.

Commercial factors for OS selection
The primary consideration in any business decision is cost. Even though selecting an OS is apparently a technical endeavor, financial issues can strongly influence the choice. There are initial costs, which include the licensing of software or procurement of tools, and there are ongoing costs, such as runtime licenses or maintenance charges. All software has some kind of license, so the costs of legal scrutiny must be factored in, as lawyers with appropriate technical skills do not come cheap. There is also the question of ongoing support and who is going to provide it. Finally, for most state-of-the-art devices, time-to-market is extremely tight, so the extent to which the choice of OS can accelerate development may be measured not as a cost, but rather, as a cost savings.

FIGURE 2 High-level software architecture with the separation between the control and data planes in a multi-OS/AMP system (Source: Mentor Graphics). The general-purpose OS (Linux) runs the user apps, middleware and its kernel on the control plane; the real-time OS (Nucleus) runs the application, middleware and its kernel on the data plane; the two communicate over IPC.
Linux or Nucleus? One company's experience
BitRouter, a successful software company from San Diego, California, builds turnkey software solutions for set-top box and television applications. The company has implemented solutions using the Mentor Graphics Nucleus RTOS, uC/OS-II, VxWorks, OS20, WIN32, commercial Linux, as well as embedded Debian Linux for ARM. Some of BitRouter's main customers include Texas Instruments, Toshiba Semiconductors, NXP Semiconductors, ST Microelectronics, Motorola, RCA and NEC. BitRouter had the opportunity to implement similar digital-to-analog converter set-top boxes using Linux and Nucleus.
FIGURE 3 Partitioning of system resources in a multicore, multi-OS design (Source: Mentor Graphics). The general-purpose OS (Linux) and the RTOS (Nucleus) occupy external memory and run on separate execution units (Core 0 and Core 1), communicating through IPC; peripherals (UART, ETH, USB, TP, PCI, LCD, I2C, SPI, interrupt controller, timer) are partitioned into RTOS devices, GPOS devices and shared devices.

The Nucleus-based set-top box had a Flash and RAM footprint of roughly half that required by the similar Linux-based set-top box. The boot time required for video to play was three seconds with Nucleus compared to ten seconds with Linux. This is just one example of how an application dictates the most suitable OS. In this situation, a commercial RTOS was better suited because it was small, compact, and it was being built into a high-volume system where memory footprint and boot-up time were key issues. BitRouter is nevertheless still a big supporter of Linux and believes Linux will be a good fit for Java-based set-top boxes and TV sets where the total RAM footprint can exceed 64MB—and where constrained space is not such a critical issue.
The next frontier: multicore, multi-OS designs
The OS selection between Nucleus and Linux may not even be the right question. Perhaps it is better to ask how these two OSs can work together, maximizing their respective strengths to address the ever-challenging performance and power efficiency requirements of today's increasingly complex embedded applications. One solution is moving to multicore system development. And while multicore has been around for some time, what is new are the recent advancements in asymmetric multi-processing (AMP) systems. An AMP multicore system allows
load partitioning with multiple instantiation of the same or different operating systems running on the same cores. Figure 2 (p. 17) shows a basic, high-level software architecture for an AMP system. The design includes both a GPOS and an RTOS—each serving distinctly different purposes. The system is partitioned in such a way that the GPOS handles the control plane activities (e.g., drivers, middleware, GUI) and the RTOS handles data plane activities that are timesensitive, deterministic, and computationally more intensive. A key ingredient in the success of the multi-OS/AMP system is the Inter-Process Communication (IPC) method, which until recently, varied from one design to the next. IPC allows the OSs to communicate back and forth. Today, there are a number of open standards for IPC, which will further expedite multicore, multi-OS development. Figure 3 takes the multi-OS example one step further. It shows a few of the design decisions behind the integration of the GPOS and RTOS, and a real-world example can be envisioned in terms of what fabless chip vendor Marvell recently accomplished with its Sheeva line of embedded processors. The company is a specialist in storage, communications and consumer silicon, with a focus on low-power, high-performance devices. Sheeva, allows developers to use dual OSs to manage separate function requirements. For example, in one application for enterprise printing, Nucleus could be used for interoperational tasks where speed is of prime importance, while Linux could be used for networking and the user interface.
Conclusion Conventional embedded RTOSs and desktop-derived OSs each have a place in the embedded ecosystem. An RTOS makes less demand on resources and is a good choice if memory is limited, real-time response is essential, and power consumption must be minimized. Linux makes good sense when the system is less constrained and a full spectrum of middleware components can be leveraged. Finally, there are an increasing number of instances where multicore design can benefit from a multi-OS approach— on a single embedded application—maximizing the best of what Nucleus and Linux have to offer.
More information Colin Walls also blogs regularly at http://blogs.mentor.com/colinwalls Mentor Graphics Corporate Office 8005 SW Boeckman Rd Wilsonville OR 97070 USA T: +1 800 547 3000 W: www.mentor.com
A Powerful Platform for Amazing Performance Performance. To get it right, you need a foundry with an Open Innovation Platform™ and process technologies that provides the flexibility to expertly choreograph your success. To get it right, you need TSMC. Whether your designs are built on mainstream or highly advanced processes, TSMC ensures your products achieve maximum value and performance. Product Differentiation. Increased functionality and better system performance drive product value. So you need a foundry partner who keeps your products at their innovative best. TSMC’s robust platform provides the options you need to increase functionality, maximize system performance and ultimately differentiate your products. Faster Time-to-Market. Early market entry means more product revenue. TSMC’s DFM-driven design initiatives, libraries and IP programs, together with leading EDA suppliers and manufacturing data-driven PDKs, shorten your yield ramp. That gets you to market in a fraction of the time it takes your competition. Investment Optimization. Every design is an investment. Function integration and die size reduction help drive your margins. It’s simple, but not easy. We continuously improve our process technologies so you get your designs produced right the first time. Because that’s what it takes to choreograph a technical and business success. Find out how TSMC can drive your most important innovations with a powerful platform to create amazing performance. Visit www.tsmc.com
Copyright 2009 Taiwan Semiconductor Manufacturing Company Ltd. All rights reserved. Open Innovation Platform™ is a trademark of TSMC.
< TECH FORUM > ESL/SYSTEMC
Bringing a coherent systemlevel design flow to AMS Mike Woodward, The MathWorks Mike Woodward is industry manager for communications and semiconductors at The MathWorks. He has degrees in Physics, Microwave Physics and Microwave Semiconductor Physics, and was a leading player in a team that won the British Computer Society’s IT Award for Excellence in 2000. His work on audio signal processing has led to several patents.
There is a widespread belief that analog and mixed-signal (AMS) design cannot take advantage of abstractions and other ESL design techniques that have shortened design cycles and raised efficiency in the digital domain. This article will show that the reverse is true. Although the first generation of ESL tools tended to focus on linking hardware and software, there are ESL tools that enable AMS engineers to design at the system level and exploit the productivity advantages of ESL. These tools also improve the design and verification of the interface between the analog and digital worlds, where experience shows us that many bugs lurk. Changing any design flow does involve some risk, sometimes a considerable amount. However, the techniques discussed here are at the lower end of that scale, and demonstrate that even a few minor tweaks can have a dramatic effect.
The status quo First, let’s identify some existing problems that we can fix in a straightforward way. Figure 1 shows a typical AMS design flow that we will use as a vehicle. A project usually starts with a specification. An internal team—perhaps ‘a research group’ or ‘a systems and algorithms group’—creates models of the proposed system and starts running conceptual simulations. It then passes the resulting goals on to the digital and analog design groups for the implementation stage. The specification normally arrives as a paper (e.g., Acrobat, Word) document. At this point, the two teams go off and design their parts of the system. In principle, there should be steady back-and-forth communication between them as the design progresses from concept to implementation. After that we come to verification, then to a prototype as a chip or PCB that is typically fabricated by a third-party manufacturing partner.
FIGURE 1 A typical AMS/digital design flow (Source: The MathWorks). The specification feeds a digital design path (design, verify, implement) and an analog design path (verify, implement), which converge on verification and then a prototype (verify, measure, interpret).

What are the problems here? The left hand side of the diagram shows the digital designers working through a 'design, verify and implement' loop. They have access to some very good design abstractions that, over the last two decades, have enabled them to work at much higher levels, away from the circuit level. However, analog designers have not seen quite the same amount of advances. Typically, they still focus at the circuit level, so the loop there becomes 'implement and verify' only. Meanwhile, our flow assumes that there is efficient communication between the digital and analog teams, but we all know that is not often the case. This is another candidate for major improvement.

Let's also ask if we can do better than a paper specification. Could we make the design process more coherent as a project flows from concept to verification through implementation, and also execute the design in a way that really exploits the system knowledge generated during specification? This gives us three flow-based objectives that we will now address under the following headings:
• Behavioral abstraction for AMS;
• Linking tools and teams; and
• A coherent design process.
For two decades, the benefits of ESL and abstractions have been supposedly confined to engineers working on digital designs and to system architects. Analog and mixed-signal (AMS) design has largely remained a ‘circuit level’ activity. This article shows that tools exist that now also allow AMS engineers to exploit abstraction, and that can make all types of design flow (analog only, but also where AMS/RF and digital elements are combined) more efficient and more comprehensive.
These are all key features of the Simulink tool suite. We are now going to look at them in both the generic sense and by way of some actual user experiences.
Behavioral abstraction for AMS
Simulink enables you to take a more abstract, or behavioral, view of a system. Models are created by assembling components from pre-existing libraries, or by creating components if they do not already exist. This speeds up the design process compared with building each component from scratch every time. Let's take a more specific view of how that works.

Sigma delta modulator
Figure 2 is a model of a second-order sigma delta analog-to-digital converter (ADC). This example gives us the opportunity to show how analog and digital elements are connected together.

FIGURE 2 Second order sigma-delta ADC

How did we get that? Simulink has a set of libraries available to it, and you construct models through a drag-and-drop interface. So, progressing to Figure 3 (p. 22), this is the interface where we find that we can drop in an integrator, a source and a gain. Some of these components are analog, and some are digital—we can connect them together directly. Having connected them up, we will want some kind of output, so we put in an oscilloscope.

What if we want some behavior that is not the default behavior? Say we want a specific gain signal, for example. To do that, you simply double-click on the gain block, and that opens up a dialogue box where you can enter the data (Figure 4, p. 23). What happens if you want some behavior that is not in an existing library? Then you can create your own Simulink blocks using C or Matlab. As with many mixed-signal systems, this model has a feedback loop in it, something that can cause significant problems for some simulation environments. Simulink copes with feedback loops naturally, and in fact that capability was built-in right from the start.

Variable time step handling
The temporal behavior of a typical analog system is fairly constant and predictable over long periods, but can sometimes undergo dramatic change for short periods. This presents a significant challenge when you want to run simulations. A simulation can take very large time steps. That will save on computational power and time, but also means the simulation is likely to miss capturing the system's behavior when it changes radically over a short period of time. Alternatively, it can take very small time steps throughout. This captures rapidly changing behavior, but is also unnecessarily lengthy and computationally very expensive.
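Before looking at how Simulink resolves that trade-off, it is worth noting that the same model assembly can be scripted rather than drawn. The following is a minimal sketch of our own, not taken from the article: the model name, block choices and parameter values are hypothetical, and the final solver line simply anticipates the variable-step approach discussed next.

% Build a small mixed analog/digital chain programmatically (illustrative only).
mdl = 'ams_sketch';
new_system(mdl);
open_system(mdl);

% Drag-and-drop equivalents: a source, a gain, a continuous integrator, a scope.
add_block('simulink/Sources/Sine Wave',     [mdl '/Source']);
add_block('simulink/Math Operations/Gain',  [mdl '/Gain']);
add_block('simulink/Continuous/Integrator', [mdl '/Integrator']);
add_block('simulink/Sinks/Scope',           [mdl '/Scope']);

% Set a block parameter, as you would in its dialogue box.
set_param([mdl '/Gain'], 'Gain', '0.5');

% Wire the blocks together.
add_line(mdl, 'Source/1',     'Gain/1');
add_line(mdl, 'Gain/1',       'Integrator/1');
add_line(mdl, 'Integrator/1', 'Scope/1');

% Use a variable-step solver and run the simulation.
set_param(mdl, 'SolverType', 'Variable-step');
sim(mdl);

Because the script opens the model, the result can still be inspected and edited through the normal drag-and-drop interface.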
Simulink offers a third approach. Here, the simulation takes a variable time step, so that when the system is changing very rapidly, it sets a small time step, and when the system is hardly changing at all it sets larger ones. This provides the balance between accuracy and computational efficiency.

FIGURE 3 Libraries of analog and digital components

Our ADC system has different data rates, something we can see by turning on the rate display as shown in Figure 5 (p. 24). Different colors show different data rates, with variable time step branches in black (analog components), and the different fixed steps in different colors (the digital components). As you can see, the model consists of multiple different time steps. Note how blocks operating at different rates are directly connected together.

If various time steps are being used in the simulation, how can we control them? Figure 6 (p. 26) shows the Simulink configuration menu where the time steps are controlled. This is a simple interface where the default options are the best in almost all cases. If we have analog components in the model we can select a variable time step, or if the model is wholly digital we can select a fixed time step. If you need even greater control, you can change the solver, you can change the minimum tolerances, or you can change the maximum step sizes. All those advanced options are there. But if you just want to get something akin to a ballpark option, you can use the default options.

Executable specifications
If this kind of system-model is developed early on, then it can be used as an executable specification. Paper specifications are ambiguous, but an executing system is not. An executable specification in the form of a functioning system model enables geographically separated groups to reason about the behavior of the system and to understand how their component has to fit in to the overall system.

The bottom line
This design process enables us to get something up and running very quickly—we can find an accurate behavioral solution rapidly, much more quickly than is usually the case in AMS design. Lastly, before we start putting all this together, we must note that while efficiencies in modeling and temporal analysis are important, there may be points where the granularity of circuit-level simulation is required. That is still available. You do not give it up when you move to a more system oriented flow, as we will now go on to discuss.
In the real world
Semiconductor company IDT New Wave was looking to improve its mixed-signal simulations. Its previous method was based purely at the circuit level, and it used to take days to run. The feedback loops in the design slowed the simulation engine down greatly. In addition to the variable time step solver, Simulink has the capacity to deal with algebraic loops, so IDT was able to use the tools and concepts described above to shorten its design cycle and identify algorithmic flaws earlier in its flow.

Let's summarize the benefits of using this type of approach. Using traditional design models, you can easily become entangled in the details of the analog components, and because of the cost of changing these models, can only examine a few different architectures. By taking a more abstracted approach, you can quickly evaluate a wider range of architectures. This more comprehensive exploration will give you confidence that your final decision is the 'right' one. Its rapid design capability also substantially reduces the risk of serious errors being found later in the design process and helps avoid respins or late-stage ECOs.
Linking tools and teams
Effective communication between analog and digital engineers has a tremendous impact on your flow's efficiency once you move beyond behavioral evaluation toward actual implementation. Consider again Figure 1. Our simplified design flow shows no obstacles between the digital and analog groups, but in many cases there might as well be a brick
wall. One semiconductor company told us it was so hard to get the analog and digital teams to communicate during design that they waited until they had a test chip from the foundry before testing the analog-digital interface.

The problem is not that these teams inherently dislike one another or do not appreciate the need for constant and productive communication; rather it lies in the lack of a common language. A digital engineer will at some point need to check that his design works with the design of his analog counterpart. So, he will ask his colleague to supply some test source, and that colleague will then run an analog design tool to output the necessary data. The data will almost certainly not be in a format that the digital tools can read. So, it will need some translation—and maybe a graduate engineer at the company has written some Perl scripts to do that. But, if there are 10, 20 or 30 seconds of data, it will be an enormous file and it is going to take time to process. Finally, though, the digital engineer can read the data into his work and get some answers. Then, inevitably, the analog designer asks his digital colleague to return the favor and we go through the process again, but in the other direction.

There are several problems with this.
• It is slow. The time taken for simulation is compounded by the time taken translating and transferring data.
• It is cumbersome. Simply moving these enormous files around can be awkward. I have worked on a project where, even with a fast link between two different sites in different countries, translation and processing meant that it was still quicker to burn DVDs and send them via a courier than to transfer the data online.
• It is very static, and this is the biggest problem of all. We cannot simulate the dynamic analog-digital interaction. If something in the simulation causes the modulation scheme to change, this will affect the analog components, and this change in the analog behavior may in turn affect the digital parts of the system. This kind of interaction cannot be studied using a file-based simulation method.

Both analog and digital designers need a common platform that removes these obstacles. This is where Simulink shows its capabilities. Not only can it simulate mixed-signal systems, but it has links to tools from other vendors for implementation, and those other vendors have links from their products to Simulink. In the digital domain, there are co-simulation links from Simulink to Mentor Graphics' ModelSim, Synopsys' Discovery and Cadence Design Systems' Incisive. In the analog domain there are links to Cadence's PSpice and Spectre RF, Synopsys' Saber and others.

Source: The MathWorks
FIGURE 4 Adding behavior

Co-simulation can be defined as the use of bidirectional runtime links between Simulink (and, for that matter, Matlab) and other tools. You can run the two tools together and exchange data in real time as the simulation progresses. Basically, for every model time step you take, you exchange data. This means you can see the dynamic changes of behavior between the different models in the system. In essence, we thus have Simulink acting as the common design platform. From a behavioral description, you can now call down from a higher level of abstraction to a detailed implementation in a tool from another vendor—all at run time.

Let's take an example. A real RF receiver will introduce distortions into a signal that affect the behavior of a digital receiver. By making a call from Simulink to, say, ModelSim, a digital engineer can see straightaway how his digital receiver implementation copes with those distortions and decide if the result is within tolerance or if the design needs to be changed. Meanwhile, an analog engineer can call from the analog portions of his system in Simulink to see implementations
running on SpectreRF. He can thus see how his designs perform within the context of the digital simulation in Simulink. In both scenarios, Simulink is acting as a test harness, giving analog and digital designers confidence that the interplay between their work will actually meet the needs of the final system, and providing that information much more quickly and dynamically.
• This is faster. There is no need to swap files. We can use just one model and isolate the pieces that we want to test by calling directly down to other appropriate software.
• It is easier. There are no huge data files floating around. In fact, all that's 'floating around' is the common and agreed Simulink model.
• It's very dynamic. We can almost immediately see how changes in the digital system affect the analog system and vice versa because they all execute in the same model at the same time. Two vendors' simulation environments work together to enable you to study your environment much better.

As well as the links cited above, Simulink also has links to test equipment that enable hardware-in-the-loop testing, and bring to bear the power of Matlab for data analysis to interpret the test results. Simulink has links to test equipment from manufacturers such as Agilent, Anritsu, LeCroy, Rohde & Schwarz and others.

We have moved from a very high level of abstraction into the implementation world by using Simulink models as the test harness for analog and digital designs. Reusing system models in this way enables us to find errors much earlier in the flow.
In the real world
Realtek was developing an audio IC and had the same need to improve intra-team communication while streamlining the project's life cycle. Using Simulink as a common design platform, they were able to get the teams to share data from early in the design process and this made it far easier for them to work together. The teams could speak the same language and use the same test environment. Notably, the resulting design took a very high market share in its first year of release.
A coherent design process
We are now ready to bring all these elements together in a single flow and create the coherent design process we discussed earlier. Simulink allows you to model multiple-domain systems in the same model: continuous time, discrete time, discrete event, finite state machines, physical and circuit models. So, one simulation can include digital hardware, analog and RF hardware, embedded software and the environment, with each part interacting with the others as appropriate.

Using Simulink, you can quickly create an abstract behavioral model of a system. This enables you to very rapidly choose between different system architectures, so giving you greater confidence the design will work on a first pass. Moving on to implementation, analog parts of a model can be removed and replaced by co-simulation links to an analog design tool. The analog implementation team can thus continue to use its existing design tools, and also dynamically test the behavior of its analog subsystem against the dynamic behavior of the digital subsystem. Similarly, the digital engineers can replace the digital portions of the model with co-simulation links and hence test the digital implementation against the analog behavior of the system.

Source: The MathWorks
FIGURE 5 Exposing the clocks in the ADC
Source: The MathWorks
FIGURE 6 Controlling the time steps

This approach flushes out interface errors much earlier in the design process. The final step in this process is the reuse of the system model as a golden reference against which to compare the delivered hardware. This involves connecting Simulink to the test equipment and to the analysis and interpretation of the captured test results. In essence, we are reusing the same system model again and again as we move through the design stages.

Summary
By introducing abstraction into the design process we enable AMS designers to find a working architecture much more quickly than they could previously. We've linked different groups in the design process via a common design platform. And we've cut down on development effort by reusing the same model through the design process. In more specific terms for AMS designers, we have shown how they can gain practical advantages by taking a more abstract system-level view, and how this same process can be used to improve the communication between analog and digital engineers.
The MathWorks 3 Apple Hill Drive Natick MA 01760 USA T: +1 508 647 7000 W: www.mathworks.com
< TECH FORUM > VERIFIED RTL TO GATES
A unified, scalable SystemVerilog approach to chip and subsystem verification
Thomas Severin, Robert Poppenwimmer, LSI

Thomas Severin is a SoC product development engineer in LSI's Storage Peripheral Division. He has 10 years of experience in the ASIC industry and joined LSI in 2008. His special focus is the use of advanced verification methodologies for complex designs. Robert Poppenwimmer is a senior ASIC SoC design/verification engineer in LSI's Storage Peripheral Division. He joined LSI in 2000 and has a master's degree in electrical engineering from the Technical University Munich.
The ability to perform chip and submodule verification within a unified and scalable SystemVerilog (SV) environment minimizes the effort required for testbench development and frees up resources for work on the primary test cases specified in the verification plan. The right combination of SV verification libraries (as found in the Advanced Verification Methodology (AVM) and the Open Verification Methodology (OVM)), SV Assertions and integrated processor models covers all verification needs. It offers the wide range of configurations needed to ultimately achieve dedicated submodule verification at maximum simulation speed without simulation overhead.

This approach reduces the overall time needed for functional verification of a system-on-chip (SoC) and its corresponding submodules by exploiting several inherent advantages. These include scalability, automation, flexibility and reuse.
• The test environment is configurable at a high level, so the user can focus tightly on the part of a design that must be verified. Both top module and submodule verification tests can be executed.
• Automation includes test and data generation, self-checking tests and regression mechanisms.
• The user can employ multiple stimulus techniques—these can include direct stimulation for integration testing, randomized stimulation for higher levels of abstraction during block-level testing, and processor models.
• Finally, the environment can be established as an open framework. The transactor models and scoreboards include standardized transaction-level modeling (TLM) interfaces. This greatly reduces ramp-up time for the verification of new designs.
Traditional verification methodologies use separate environments for chip and submodule verification. By contrast, the newer, more integrated strategy described here provides improvements in efficiency that more than repay the extra effort required during the initial set-up and to manage the increased complexity of the resulting environment. Moreover, only one environment now needs to be developed to cover all top-level and submodule verification tasks, and the increased complexity can be resolved by a well-defined class, directory structure and documentation. The paper describes the SoC verification process for a specific chip to illustrate the approach. The design was a state-of-the-art, multi-million-gate storage controller ASIC including a variety of interface standards, mixed-language RTL and intellectual property (IP) blocks. The top-level module was captured in VHDL. It contained VHDL, Verilog and SystemVerilog RTL components, with multiple IP blocks as both hard and soft macros (Figure 1). The SoC could be partitioned into three main parts: an ARM subsystem with three CPUs, a host subsystem containing the host interfaces, and a subsystem with customer logic.
The article describes LSI's work on the use of a single SystemVerilog-based (SV) verification environment for both the chip and its submodules. The environment is based on SV's Advanced Verification Methodology (AVM) libraries, although alternatives are available. One particular reason for choosing AVM was that LSI wanted to leverage its transaction-level modeling capabilities as well as other "advantages." "A verification environment that offers reusability, scalability and automation allows our verification experts to focus on the functional verification goals of a complex SoC much more efficiently," says Thomas Kling, engineering manager for Custom SoC Products in LSI's Storage Peripheral Division.

The main part of the article describes the environment's development and application to a specific design: a multimillion-gate storage controller ASIC equipped with a variety of interface standards and intellectual property blocks, and expressed at the RTL in multiple languages. "In using the described approach we were able to increase our engineering efficiency and maintain a high level of flexibility for reaching our verification goals on time," says Kling.

The environment
The SV environment was built on top of an AVM library to take advantage of the methodology's TLM features. You can also use OVM or other libraries. Several transactors served the different external interfaces (e.g., SAS, SATA, Ethernet and various memory interfaces). We used SV's 'interface' construct so that tests could access the device-under-test's (DUT's) internal and external interfaces. All transactors outside the SV environment (e.g., memory models, ARM trace ports) were instantiated in the top-level testbench along with the DUT and the SV verification environment. The SV environment itself was instantiated inside a program block in the top-level testbench file. This gave us the option of using 'force' instructions in tests where considered necessary.

We also used a test factory class that generated a user-selectable environment object during runtime. To achieve this, the simulator call included a parameter that selected the type of environment class to be used as the test environment. This allowed us to construct different transactor configurations in each test. It also allowed us to run different tests without recompiling and restarting the simulator: when one test was finished, the corresponding environment object would be destroyed and the next test's environment object was constructed and started.

The environment consisted of a base class that included all objects common to each test as well as the three AHB drivers. All tests were derived classes from this base environment class, formed after individual selection of additional testbench modules (e.g., scoreboards, reference models, monitors and drivers for the DUT's internal interfaces).
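As a sketch of how such a runtime-selectable environment can be put together, consider the fragment below. It is not the LSI code: cl_env_base is the only name taken from the article, while cl_test_eth, cl_test_arm, the +TESTNAME plusarg and the use of a plain case statement in place of a full test factory are all illustrative assumptions.

// Illustrative sketch only: the plusarg, the derived test names and the
// simplified "factory" (a case statement) are assumptions, not LSI's code.
class cl_env_base;
  virtual task run();
    // common transactors, scoreboards and checks would be built and run here
  endtask
endclass

class cl_test_eth extends cl_env_base;
  virtual task run();
    $display("Ethernet interface / frame buffer test");
  endtask
endclass

class cl_test_arm extends cl_env_base;
  virtual task run();
    $display("stand-alone ARM subsystem test");
  endtask
endclass

program automatic p_test_ctrl;
  cl_env_base env;
  cl_test_eth t_eth;
  cl_test_arm t_arm;
  string      testname;

  initial begin
    // the simulator call passes +TESTNAME=<name> to select the environment class
    if (!$value$plusargs("TESTNAME=%s", testname)) testname = "eth";

    case (testname)
      "eth":   begin t_eth = new(); env = t_eth; end
      "arm":   begin t_arm = new(); env = t_arm; end
      default: env = new();
    endcase

    env.run();   // run the selected test...
    env = null;  // ...then drop the object so the next one can be built
  end
endprogram

Because the environment object is constructed at run time, adding or swapping a test does not require recompiling the testbench.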
Test strategy
This primarily took two approaches to verification. Many tests covered control functions that were verified by applying directed tests, often supported by assertions to get feedback on the functional coverage. When it came to data path testing, we used a more suitable directed random approach. These tests were appropriate for testing the memory paths, Ethernet packet transfers, and SAS and SATA frame handling.

So, for the directed random approach, we implemented memory-based, self-checking capabilities. Data accesses were randomly applied to both the path-to-be-tested and a virtual reference memory model. All the read data packets were then routed to a scoreboard for comparison. We made heavy use of assertions to make sure we covered all possible access modes on the buses and all functional capabilities of the relevant submodules (e.g., bridges). All test classes were completely self-checking and a detailed test report was provided after every simulation run.
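The memory-based, self-checking idea can be pictured with a short sketch. Everything here is an assumption made for illustration: the mem_if stub stands in for the real path-to-be-tested, and the class simply mirrors every random write into a reference model before comparing the read-back data, scoreboard-fashion.

// Sketch of the directed-random self-check: random accesses go to the path
// under test and are mirrored in a reference memory, then reads are compared.
interface mem_if;
  bit [31:0] mem [bit [31:0]];   // trivial behavioral stand-in for the real path
  task write(input bit [31:0] addr, input bit [31:0] data);
    mem[addr] = data;
  endtask
  task read(input bit [31:0] addr, output bit [31:0] data);
    data = mem.exists(addr) ? mem[addr] : '0;
  endtask
endinterface

class cl_mem_rand_check;
  bit [31:0] ref_mem [bit [31:0]];          // virtual reference memory model

  task run(virtual mem_if path, int unsigned n);
    bit [31:0] addr, wdata, rdata;
    repeat (n) begin
      addr  = $urandom();
      wdata = $urandom();
      path.write(addr, wdata);              // access on the path to be tested
      ref_mem[addr] = wdata;                // same access on the reference model
    end
    foreach (ref_mem[a]) begin              // scoreboard-style comparison
      path.read(a, rdata);
      if (rdata !== ref_mem[a])
        $error("mismatch at 0x%08h: got 0x%08h, expected 0x%08h",
               a, rdata, ref_mem[a]);
    end
  endtask
endclass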
Structure
Our testbench's top level was an SV module with different functional sections. In its first section, we defined the SV interfaces and necessary code to instantiate and connect the DUT. A section with conditional instantiations of simulation models followed. Depending on 'define' macros inside the starting script, we could attach several different models at the DUT's boundaries (e.g., Fibre Channel transceivers, SAS and SATA models, DDR & flash memory models, and ARM boundary scan trickboxes (BSTs) & embedded trace macrocell (ETM) models).

Source: LSI
FIGURE 1 Testbench structure.

The next section included different groups of proprietary connections and several blocks of SV 'bind' instructions. All these were separated by their functionality into 'include' files and again included, depending on the 'defines'. These blocks were used to connect empty wrapper ports inside the DUT to SV interface signals, and the 'bind' blocks brought additional assertion checker modules into the RTL design. The final section contained the definition of a program block (outside the testbench module) that oversaw the construction and control of the test environment and its instantiation.

As shown in Figure 2, the environment base class ('cl_env_base') had all the internal drivers instantiated and connected to local virtual interfaces. Only the AI-1B drivers were left to be inserted on demand in a derived testcase class. As most of the drivers handled accesses to memory-mapped regions, they were connected to 'Memory Slave' units that simulated memory arrays of customizable size.
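The following fragment illustrates the two mechanisms just described: 'define'-controlled conditional instantiation of boundary models and the use of 'bind' to pull assertion checkers into the RTL. The module, macro and signal names are invented for the example and are not taken from the LSI testbench.

// Illustrative fragment only: module, macro and signal names are assumptions.
module ahb_bridge (input logic hclk, input logic [1:0] htrans);
  // stand-in for a real RTL block inside the DUT
endmodule

module ahb_checker (input logic hclk, input logic [1:0] htrans);
  // assertion checker pulled into the RTL by the 'bind' below
  a_htrans_known: assert property (@(posedge hclk) !$isunknown(htrans));
endmodule

module tb_top;
  logic       clk   = 0;
  logic [1:0] trans = 2'b00;
  always #5 clk = ~clk;

  ahb_bridge u_bridge (.hclk(clk), .htrans(trans));   // (assumed) DUT fragment

  // conditional instantiation of a boundary model, selected by the run script
`ifdef USE_DDR_MODEL
  ddr_model u_ddr (.clk(clk));                        // hypothetical memory model
`endif

  // attach the checker to every ahb_bridge instance without touching the RTL;
  // the port names are resolved in the scope of the target instance
  bind ahb_bridge ahb_checker u_chk (.hclk(hclk), .htrans(htrans));
endmodule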
Source: LSI
FIGURE 2 Verification environment.

If we had used configurations in which real RTL was used instead of the empty wrappers, the affected drivers' interface connections would simply be left unconnected. But as all were used in most testcases, they were implemented in the base class. As some tests involved using AI-1B bus functional models (BFMs) while others used ARM design simulation models (DSMs), we decided to instantiate the AI-1B transactors inside the testcase-specific classes. These were derived from the base class and, therefore, inherited all the base class's transactors and connections. In each testcase class, we could define the specific AI-1B transactor to be used (or none where we used DSMs), as well as all the test-supporting infrastructural models (e.g., scoreboards, stimulus generators and reference models). The testcase class also contained the actual test inside the 'run()' task. The general control flow of all involved models was implemented here.

Through the SV interfaces and their connection to the environment, it was now very easy to build a largely customizable periphery around the DUT. Most test-specific transactors were defined inside the environment; only the static ones were directly instantiated at the top level. Even there we could customize the transactors using different 'define' parameters.
Given also the option to replace RTL parts of the DUT (or even whole subsystems) with empty wrappers connected by SV interfaces to dedicated transactors in the environment, we could now use one environment to test blocks or subsystems of the design as well as the whole chip. For example, we had some tests that verified only the Ethernet interface and the attached frame buffer memory, while other tests verified the different ARM subsystems on a stand-alone basis. Of course, we finally used the complete DUT design for chip-level verification. The AVM-based approach also allowed us to integrate large customer-designed parts that were provided late in the project schedule. We simply inserted empty wrappers, connected them to our transactors, and verified the interfaces to the customer logic. Later we replaced the wrappers with the real RTL, dynamically removed the transactors, and were able to reuse all the available tests.
Connectivity
In the top-level testbench we defined SV interfaces and assigned DUT ports to their signals (Figure 1). For the internal connections to the empty wrapper modules in the design, we connected the wrapper's ports to the corresponding SV interfaces.
Inside the environment base class, we had a virtual interface defined for each interface used in the top level. Both interfaces and virtual interfaces were connected at simulation start-up time to provide signal access to the environment. To make life a little easier, we defined an additional 'merger' interface that had all the other interfaces nested inside, so we only needed to route one interface through the environment hierarchy instead of a dozen. When a wrapper was later replaced by real RTL, the 'include' file that built the connections was not included, resulting in an unconnected interface. On the other side, we would not generate the corresponding driver anymore, thus maintaining a fully working environment.

For some tests, especially DSM-related ones executed on an ARM CPU model, it is worth having a transactor connected to an internal interface even when the RTL code is used. We had some transactors that established a communication channel (through accesses to dedicated memory areas) between the C program running on an ARM DSM model and the SV testbench routine. For this to work we had to leave the 'include' files integrated after replacing the wrappers with RTL, effectively connecting the SV interface signals to the real RTL module's ports.

Another helpful technique was to add an SV interface for debugging purposes to the merger. As signals inside an SV interface can be displayed in a waveform (unlike SV dynamic variables or objects), we could assign values to such 'debug interface' signals inside an SV transactor to watch
them in the waveforms. This took away a lot of pain during the SV transactor development and debugging process.
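A minimal sketch of the debug-interface technique looks like this; the interface, class and signal names are invented for illustration.

// Static interface signals can be traced in the wave window, unlike class
// variables, so a transactor can mirror its state into them for debugging.
interface dbg_if;
  logic [31:0] last_addr;
  logic [7:0]  xactor_state;
endinterface

class cl_sample_xactor;
  virtual dbg_if dbg;

  function new(virtual dbg_if dbg);
    this.dbg = dbg;
  endfunction

  task note(input bit [31:0] addr, input byte state);
    dbg.last_addr    = addr;    // now visible in the waveform viewer
    dbg.xactor_state = state;
  endtask
endclass

module tb_dbg_demo;
  dbg_if dbg0 ();                       // instantiated alongside the DUT
  cl_sample_xactor xact;
  initial begin
    xact = new(dbg0);
    xact.note(32'h0000_1000, 8'd1);     // trace transactor activity
  end
endmodule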
The right combination of SystemVerilog verification libraries, SV Assertions, and integrated processor models covers all verification needs.

Verification components
The most difficult task was the integration of all the required transactors, especially the provision of a simple and unified access method for the test writers. To illustrate: we had to use some drivers (e.g., SAS/SATA) that were available only in Verilog; our Al-TB driver was a reused and quite complex module written in pure SystemVerilog; and we needed to code several new transactor classes.

We developed new drivers for several different internal bus protocols as well as a basic Ethernet driver, memory transactors, enhanced scoreboards capable of comparing out-of-order transactions, reference models and testbench-supporting transactors. These transactors enabled
synchronization by event triggering and message passing between the SV environment and C routines running on the DUT's ARM subsystems.

As our goal was to take maximum advantage of the TLM features to simplify the interconnections between the transactors and unify their utilization, we put some effort into making as many components AVM-compliant as possible. This was also very important with regard to plans for our subsequent reuse strategy and later migration to the OVM library for future projects. Using the AVM library saved resources that were no longer taken up in handling the details of managing transaction movement inside an environment. The predefined TLM structure made it possible for a single engineer to plan, build and maintain the whole environment, including most of the transactors. The rest of the team could concentrate on RTL development and test writing.

Converting the Fibre Channel and SAS/SATA Verilog transactors to SystemVerilog and AVM was not feasible within this project's schedule, but these tasks will be undertaken for our next-generation environment. Porting our already available SV Al-TB driver to AVM compliance required some changes in its internal structure, but was accomplished in a reasonable time. The development of all the new transactors was accomplished ahead of schedule thanks to the easy-to-use structural TLM building blocks of the AVM library.
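The transaction-level style can be illustrated with plain SystemVerilog. Note that this is only a stand-in: the real environment used the AVM TLM ports and analysis connections, whereas the sketch below uses a parameterized mailbox, and all of the class and field names are invented.

// Plain-SV illustration of transaction-level plumbing between a driver and a
// scoreboard; a mailbox stands in for the AVM TLM channel used in practice.
class cl_eth_frame;
  rand bit [47:0] dst, src;
  rand byte       payload[];
  constraint c_len { payload.size() inside {[46:1500]}; }
endclass

class cl_eth_driver;
  mailbox #(cl_eth_frame) to_scoreboard;
  task run(int unsigned n_frames);
    cl_eth_frame f;
    repeat (n_frames) begin
      f = new();
      void'(f.randomize());
      // ...drive the frame onto the interface here...
      to_scoreboard.put(f);       // hand the transaction to the scoreboard
    end
  endtask
endclass

class cl_eth_scoreboard;
  mailbox #(cl_eth_frame) from_driver;
  task run(int unsigned n_frames);
    cl_eth_frame f;
    repeat (n_frames) begin
      from_driver.get(f);
      // ...compare against monitored or reference data here...
    end
  endtask
endclass

class cl_eth_env;
  cl_eth_driver     drv = new();
  cl_eth_scoreboard scb = new();
  function new();
    mailbox #(cl_eth_frame) m = new();
    drv.to_scoreboard = m;        // one channel connects the two components
    scb.from_driver   = m;
  endfunction
  task run(int unsigned n);
    fork
      drv.run(n);
      scb.run(n);
    join
  endtask
endclass

This is essentially the plumbing that the AVM TLM ports provide out of the box, which is why adopting the library removed that work from the team.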
LSI Orleansstrasse 4 81669 Munich Germany T: + 49 (0) 89 45833 0 W: www.lsi.com
< TECH FORUM > DIGITAL/ANALOG IMPLEMENTATION
Implementing a unified computing architecture
Kurt Parker, Netronome Systems

Kurt Parker is a field applications and product marketing engineer for Netronome Systems. He holds a Masters of Engineering and an MBA from Arizona State University.
Unified computing architectures (UCAs) bring together networking, computing, storage access and virtualization in systems that aim to streamline data center resources, scale service delivery, and reduce the number of devices within a system that require setup and management. They must deliver powerful packet processing (e.g., thousands of applied processing cycles per packet, and more than 30 million packets per second); high levels of integration (e.g., I/O virtualization, security and encryption); and ease of both implementation and use.

In adopting UCAs, system architects seek to avoid costly, lengthy and risky custom ASIC developments and instead favor merchant silicon providers that offer highly programmable network processors. Such devices give them the ability to develop and deliver innovative, differentiated products while conforming to industry standards—standards that are themselves often in flux. Meanwhile, performance, power and space budgets are fueling the popularity of multithreaded, multicore architectures for communications systems. Such a range of technologies is encapsulated within our company's Netronome NFP-3200 processor.

Our experience here suggests that comprehensive suites of intuitive and familiar software applications and tools are fundamental to the success of next-generation communications processing projects. The application development suites must be based on a familiar GUI and act as the easy-to-use gateway to a software development kit that contains the tools needed for all phases of a project (e.g., initial design, code creation, simulation, testing, debugging and optimization). A command line interface with flexible host OS requirements will speed interactive development tasks. You also need a powerful simulation environment that allows the software and hardware teams to develop the next-generation platform simultaneously and thereby take fullest advantage of the capabilities available in highly programmable UCAs. Other requirements for effective development that work within this model include:
• The ability to switch between high-level programming languages and assembly code at a very granular level. C compilers provide a familiar high-level programming language with isolation from hardware specifics for faster time-to-market and optimum code portability. Assembly code can also be used to fine-tune portions of an application to maximize performance, and should be embedded in the C code for optimal results.
• The appropriate use of legacy architectures. Architectural choices backed by technologies that can boast years of market leadership and success are inherently both safer and more stable. Such legacy technologies also provide access to and thus take advantage of available pools of talent and experience. Meanwhile, most customers will expect full backward-compatibility with existing architectures.
• A choice of development platforms. Access to multiple development platforms and an ability to debug applications on internally developed platforms will enable accurate simulations of real-world performance in hardware during the system design process.
• Access to advanced flow processing development tools. Cycle- and data-accurate architectural simulations are vital to the rapid prototyping and optimization of applications and parallel hardware/software development. Flexible traffic simulation and packet generation tools reduce testing and debugging time.
Applications enabled by unified computing architectures
Enterprises and service providers alike are using various network-based appliances and probes across a widening range of important activities. These include the test and measurement of applications and services, and deep packet inspection to provide billing, accounting and the enforcement of acceptable-use policies. These appliances must therefore offer high performance. They must be sufficiently programmable that they can adapt to the evolving networking landscape. And they must have extremely low latency to avoid inserting any delay into applications and services that measure or protect system activity. Simple configurations will not suffice. Evolving standards, government oversight and regulation, and technological innovation require not only UCAs but also powerful and flexible tools that give fast, easy access to those architectures.
Netronome offers a range of programmable Network Flow Processors, which deliver high-performance packet processing and are aimed at designers of communications equipment whose requirements extend beyond simple forwarding. Many network processors and multicore CPUs lack L4-L7 programmability or cannot scale to 10Gbit/s and beyond. Netronomeâ&#x20AC;&#x2122;s flow processors are powered by 40 programmable networking cores that deliver 2,000 instructions and 50 flow-operations-per-packet at 30 million packets-per-second, enabling 20Gbit/s of L2-L7 processing with line-rate security and I/O virtualization. This article describes the tool flow for the development of a high-end application using the processor. Source: Netronome
FIGURE 1 Comprehensive tools for all design phases (Netronome Programmer Studio: project workspace with libraries and documentation; integrated project management with centralized control of compiler, linker, assembler, debugger, simulation and testing; FlowC compiler and optional Network Flow Assembler; local simulation with local, remote or no foreign model, and hardware; Precision Flow Modeler for data- and cycle-accurate simulation; packet generation and traffic simulation; spanning the Initial Design, Code Creation, Software Simulation, Development Testing, Debugging and Performance Optimization phases)

Network-based threats, such as spam, spyware and viruses, identity theft, data theft and other forms of cyber crime have become commonplace. To combat these threats, a multi-billion-dollar industry of independent software vendors (ISVs) has emerged. These ISVs provide numerous categories of network and content security appliances such
as firewalls, intrusion detection systems, intrusion prevention systems, anti-virus scanners, unified threat management systems, network behavior analysis, network monitoring, network forensics, network analysis, network access control, spam/spyware, web filters, protocol acceleration, load balancing, compression and more. The ISVs desire tools and software libraries that deliver quick, easy access to the powerful UCAs built for deep packet inspection and the deployment of security
applications in increasingly virtualized environments.

Another area where communications equipment manufacturers will see an impact from UCAs is in intelligent network interface cards for virtual machine environments within multicore Intel Architecture system designs by way of virtualized on-chip networks. Today, in single-core systems, an external network provides functions (e.g., VLAN switching, packet classification and load balancing) to direct traffic to one or more systems. As these systems are now combined within a multicore virtualized system, the server's network I/O facility must provide the same functionality that would previously have been provided externally. The Netronome Network Flow Processing software development kit (SDK) and related application code enables system designers to take such an integrated approach by employing high-speed network flow processing to intelligently classify millions of simultaneous flows and direct traffic to the appropriate core and/or virtual machine.

While unified computing systems (UCS) are extending to and through 10Gbit/s data rates, their very existence obviates merely configurable architectures, which offer little or no ability to differentiate in services or performance. Purpose-built processors designed to handle the growing and changing needs of UCSs through their programming flexibility and high levels of integration are the only way to achieve maximum performance efficiency. The NFP-3200 has 40 multithreaded packet processing microengines running at 1.4GHz, and in the next section we will use it as an example of how such high performance can be exploited by an appropriate toolset to develop a UCS.
Implementation flow
The NFP SDK provides the tools needed to implement next-generation designs. These are the main steps you would take to develop a UCS.

Configuration
The Netronome Programmer Studio is a fully integrated development environment (IDE) that allows for the building and debugging of networking applications on a unified GUI. Its graphical development environment conforms to the standard look and feel of Microsoft Windows, allowing developers to customize the workspace to fit their personal flows and comfort. To enhance organization and multi-party access, ongoing development settings and files are managed through projects. Most projects are set up in a standard fashion that allows full assemble, compile and build control. There is also the ability to create a 'debug-only' project. This allows fast-track enablement to debug functionality on externally controlled projects.

The Project Workspace, a window within the Programmer Studio, provides tabs with important project and development related information including a tree listing of all project files; a listing of all the Microengine cores in the NFP-3200 that are loaded with microcode when debugging; a listing of documents included in the SDK; and a tree listing of all microcode blocks that are found when opening a project.
Source: Netronome
FIGURE 2 The Network Flow Linker interface

Development
Application development can be broken into six phases: Initial Design, Code Creation, Software Simulation, Development Testing, Debugging and Performance Optimization (Figure 1). The process of architecting a system often entails an iterative series of requirements analysis, estimation (e.g., size, speed, resource cost), proof-of-concept implementation and test. Many blocks of code already exist for standard functions (e.g., packet classification, forwarding and traffic management) and can help during architectural development. In particular, they give an indication of the code and resource footprint for typical functions. This allows developers to focus on innovations that drive value to their users and differentiate the end system. Proof-of-concept and test are mini-develop/debug phases that can be accelerated by using the SDK's code-building and simulation tools.

Powerful high-level language tools drive rapid code development. The Netronome Flow C Compiler (NFCC) provides the programming abstraction through the C language. It focuses on one given microengine within the NFP-3200, with threading and synchronization exposed at the language level. When a program is executed on a microengine, all its threads execute the same program. Therefore, each thread has a private copy of all the variables and data structures in memory. The compiler supports a combination of standard C, language extensions and intrinsic functions. The intrinsic functions provide for access to such NFP features as hash, content addressable memory (CAM) and cyclic redundancy check (CRC) capabilities. Developers can configure the NFCC through the GUI or through a command line to optimize their code for size, speed, or debugging at the function level or on a whole-program basis. The compiler also supports inline assembly language, both in blocks and individual lines.

The Netronome Flow Assembler (NFAS) will assemble microengine code developed for the IXP2800 legacy mode or for the NFP-3200's extended addressing and functionality (a.k.a. extended mode).
Source: Netronome
FIGURE 3 The Precision Flow Monitor simulation system
You need a powerful simulation environment that allows the software and hardware teams to develop your next-generation platform simultaneously.

Like the NFCC, the NFAS assembles on a per-microengine basis. Invoking the assembler results in a two-step process: preprocessing and assembly. The preprocessor is invoked automatically by the assembler to transform a program before it reaches the assembly process, including the processing of files and replacement of certain literals. At this stage, developers can also invoke any or all of the following facilities: declaration file inclusion, macro expansion, conditional compilation, line control, structured assembly and token replacement. The assembly process includes code conversion, optimization and register/memory allocation.

For single functions or entire applications, ready-to-build code can be partitioned and imaged using Netronome's Network Flow Linker (NFLD). The NFLD interface allows users to manage the complexity of a multi-threaded, multiprocessor architecture within an easy-to-use GUI (Figure 2). The user assigns list files output by the assembler and compiler into each of the microengines within the chip. Various memory reservation, fill options and build options are presented as well.

Debug and optimization
The Programmer Studio is backed by the Netronome Precision Flow Modeler (PFM), a cycle- and data-accurate simulation model
of the entire data-plane portion of the chip and its interfaces. Figure 3 shows the PFM in action. In the debug phase of the design, a customer can select and view the code and program counter position for any thread with code loaded in the build system. Using breakpoints is a standard tool for checking for code correctness, and the PFM allows them to be set not only on points in the code being run, but also on changes in internal registers and external memory locations.

In many communication applications, performance efficiency as it relates to power, cost and size is as important as performance. Competitive differentiation is often gained through the ability to apply increasing amounts of functionality to every packet in a high-speed data flow. In these cases, it is desirable to tune application code to maximize performance and functionality in the communications system. Because the PFM is a cycle-accurate simulation, developers can use it to see exactly how well their code is running as written for the NFP without actually loading it on the chip. In addition, Programmer Studio captures code coverage so the user can identify dead and highly executed code in an application. This allows performance improvements and iterative code development in parallel with hardware design and integration.

Of specific use to developers in the optimization phase is the Thread History Window (seen at the foot of Figure 3). Color coding of cycle-by-cycle activity on each microengine thread gives a quick visualization of when the microengine is executing code or might be stalled and in need of a software switch to the next context in line. Performance statistics, execution coverage, and an ability to craft simulated customized traffic patterns into the NFP-3200 help developers see hot spots in their code where additional focus would bring performance gains in the application.
Netronome 144 Emeryville Drive Suite 230 Cranberry Twp PA 16066 USA T: 1 724 778 3290 W: www.netronome.com
< TECH FORUM > DESIGN TO SILICON
System level DFM at 22nm
Special Digest, EDA Tech Forum

The 2009 Design Automation Conference was held in San Francisco, July 26-31, 2009. Further information on accessing archived proceedings and papers presented at the event is available at www.dac.com.
A recent session at the Design Automation Conference in San Francisco considered how to make the 22nm process node a reality despite an increasing number of obstacles. All the speakers were unanimous that part of the answer will come from using system-level design strategies to address manufacturability. Much has already been said and written about the need to bring design-for-manufacture (DFM) further up the design flow, although it would appear that necessity will prove as much the mother of abstraction as invention in this case, with 22nm creating a series of challenges that make the shift necessary. According to various data, 22nm manufacturing is expected in 2012 and leading manufacturers are already installing or preparing to install capacity (Figure 1).
The challenges of 22nm

Intel
Intel has always been at the forefront of the intersection between design and manufacturing, and remains one of the few semiconductor companies fully active in (and committed to) both areas. Shekhar Borkar, a company fellow and director of its Microprocessor Technology Labs, divided the challenges presented by the 22nm node into the technological and the economic, and also made some observations on the future of custom design. His overarching theme was that 'business as usual' simply is not an option.

From the technological point of view he cited a number of relatively familiar but now increasingly large problems. The main ones were:
• slowdowns in gate delay scaling;
• slowdowns in both supply and threshold voltages as subthreshold leakage becomes excessive;
• increased static variations due to random dopant fluctuations (RDFs) and the limitations of sub-wavelength lithography (stuck at 193nm, with 'next-generation'
extreme ultra-violet (EUV) lithography still to arrive in commercial form);
• increased design rule complexity and more restrictions due, again, largely to sub-wavelength lithography; and
• greater degradation and less reliability due to the high electric fields.

From the economic point of view, Borkar noted that:
• Intel expects a 22nm mask set to cost more than $1M, indicating that the manufacturing cost ramp is hardly slowing down (separately, the Globalfoundries fab being prepped for 22nm has been 'conservatively' estimated at a cost of $4.5B); and
• there is a growing 'tools gap' between the increases in efficiency that are being delivered and the ability of the EDA software to deal with the increases in complexity presented by a node that is likely to offer one billion transistors on a single piece of silicon.

Then, Borkar described how the traditional advantages of custom design would be reduced or obviated completely:
• The best operational frequency is no longer achieved largely by optimizing the resistance and capacitance metrics for interconnect, since transistor scaling, variability and power consumption have increased their influence so greatly.
• The restricted design rules (RDRs) imposed by manufacturing variability virtually eliminate the chances of going in to manually enhance how closely transistors and interconnects are packed together on a chip.
• There have been cases where attempts to take a local rather than a global view of optimization, in the hope that there may be some islands available, have actually worsened rather than improved a design, or at least added to the time-to-market to little effect.

Carnegie Mellon/PDF Solutions
In his presentation, Andrzej Strojwas, Keithley professor of Electrical and Computer Engineering at Carnegie Mellon University and chief technologist of PDF Solutions, echoed many of Borkar's points and added some observations of his own, more directly related to manufacturing metrics.
It is his belief —and that of his CM/PDF co-authors— that the recent, highly lauded innovations in high-k metal-gate stacks will reduce RDF-based variation only for the 32/28nm process generation. For 22/20nm, the industry will need to move to long-touted device architectures such as FinFETs and ultra-thin-body or fully depleted silicon on insulator (SOI), technologies that mitigate RDFs by reducing the dopant concentration in the channel. Furthermore, CM/PDF research suggests that systemic variations will reach prohibitive levels at 22nm if issues surrounding limitations in lithography resolution and the design enhancements offered through stress technologies are not addressed. In particular, the ongoing lack of EUV lithography is forcing the introduction of double patterning techniques (DPTs). In the context of the modeling, characterization and printability challenges such multi-exposure DPTs suggest, the technique will be extremely expensive.

The article provides an overview of one common theme in the papers presented at a special session of the 2009 Design Automation Conference, Dawn of the 22nm Design Era. As such, we would recommend that readers wishing to access still more detail on this topic (in particular, on device structures for 22nm and project management requirements) read the original contributions in full. The papers are:
8.1 "Design Perspectives on 22nm CMOS and Beyond", Shekhar Borkar, Intel.
8.2 "Creating an Affordable 22nm Node Using Design-Lithography Co-Optimization", Andrzej Strojwas, Tejas Jhaveri, Vyacheslav Rovner & Lawrence Pileggi, Carnegie Mellon University/PDF Solutions.
8.3 "Device/Circuit Interactions at 22nm Technology Node", Kaushik Roy, Jaydeep P. Kulkarni, Sumeet Kumar Gupta, Purdue University.
8.4 "Beyond Innovation: Dealing with the Risks and Complexity of Processor Design in 22nm", Carl Anderson, IBM.
Source: Intel

High Volume Manufacturing | 2006   | 2008 | 2010      | 2012 | 2014 | 2016 | 2018
Technology Node (nm)      | 65     | 45   | 32        | 22   | 16   | 11   | 8
Integration Capacity (BT) | 4      | 8    | 16        | 32   | 64   | 128  | 256
Delay = CV/I scaling      | ~0.7   | >0.7 | Delay scaling will slow down
Energy/Logic Op scaling   | >0.5   | >0.5 | Energy scaling will slow down
Variability               | Medium | High | Very High

FIGURE 1 Technology outlook
Purdue University
Kaushik Roy, professor in the Nanoelectronics Research Laboratory at Purdue University, joined Strojwas in putting forward a structural answer to the challenges presented by 22nm. In Purdue's case, one proposal was for multi-gate FETs (MUGFETs) to address the increase in short channel effects (SCEs) as shrinking transistor sizes bring the source close to the drain. MUGFETs will not be immune to SCEs—nor to threats from body thickness—but they do offer a broader range of design elements that can be tuned for the node. These include multi-fin and width quantization, use of the back-gate as an independent gate, gate underlapping and fin orientation (Figure 2, p.40). An important question, though, is where in the flow such tuning should take place.
IBM
Carl Anderson, an IBM Fellow who manages physical design and tools for the company's server division, addressed the challenges inherent in 22nm from a project management perspective. As complexity increases, he argued, so does the emphasis that needs to be placed on the culture and discipline through which companies manage risks and resources. Even today, respins and major delays could often be attributed to changes that were sought relatively late in the design cycle, he said. He also warned that "It will be very easy and tempting to propose chip designs and goals for 22nm that are unachievable within finite schedules and resources."
Source: Purdue University
Single Gate / Double Gate / Asymmetric gates
Process options: fin height, fin thickness, gate workfunction, channel doping, fin orientation, gate underlap, ...
Circuit options: standard CMOS, SRAM, logic; special circuit styles: Schmitt Trigger, dynamic logic, SRAM, skewed logic, ...
System requirements: lower power, robustness, high performance, area efficiency
Tech design options: Vdd-H-Vt, gate underlap (sym/asym), fin orientation

FIGURE 2 Technology-device-circuit-system codesign options for double gate MOSFETs from 22nm
Source: Intel
FIGURE 3 SoC-like design methodology with 'soft' macros (large core, small cores, network on chip, special function hardware, memory)

The system-level solution
Given the variation in topics across the four presentations, there was nevertheless broad agreement that some kind of high-level design flow strategy needs to be adopted to take full account of 22nm's sheer range.

Correct-by-construction
The notion of abstracting to the system level to achieve correct-by-construction design was cited explicitly by both Borkar and Strojwas. Borkar said the main objective had to be designs that are fully functional, on specification and manufacturable after the first pass. To do this, he said that the industry needs to shift to a system design methodology that is "SoC-like" [system-on-chip]. The main difference will lie in the fact that today such SoC strategies are based around application-specific blocks, whereas that required for 22nm will be more concerned with soft blocks (or macros) that represent such components of a design as cores, on-die networks and some 'special function' hardware (Figure 3). At the same time, Borkar said that the shift to place more of the overall system onus on software will continue.

These new blocks will be predesigned and well characterized, and as a result, the emphasis in differentiating your product and enhancing its performance will move to system-level optimization as opposed to designing logic blocks. Physical design will also be predominantly automated, and numerous aspects of a design might now be 'hidden' from the designer (e.g., clocking). There would be still more integration required to meet the needs of the test stage, Borkar said, with each functional block requiring either built-in self-test capability or a straightforward interface to an external tester.

Strojwas defined goals that addressed a traditional set of DFM objectives more specifically, but also placed great emphasis on pre-characterized circuit elements and templates. This may suggest a slightly greater degree of granularity than Borkar's vision, although to say as much to any major degree would be splitting hairs. Strategically, Strojwas says that DFM must become proactive
as opposed to reactive, notwithstanding its inherent need to operate based on a considerable volume of already generated and deeply researched manufacturing data (indeed, exactly the kind of analysis in which PDF Solutions specializes). This alone will effectively counter systemic variation, he said. In the paper that accompanies the DAC presentation, Strojwas and his co-authors write, "We propose a novel design methodology (Figure 4)…that will ensure a correct-by-construction integrated circuit design." They continue:

The key enabler of the methodology is the creation of a regular design fabric onto which one can efficiently map the selected logic templates using a limited number of printability friendly layout patterns. The co-optimization of circuit layout and process is achieved by selection of logic functions, circuit styles, layout patterns and lithography solutions that jointly result in the lowest cost per good die. The resulting set of layout patterns and physical templates are then fully characterized on silicon through the use of specially designed test structures.

Source: Carnegie Mellon University/PDF Solutions
Product Design Ojectives Circuit Design Style
Layout Design Style
Litho Choices
DPT, MEBM, IL
Templates
Silicon Characterization
Patterns
First-Pass Silicon Success
FIGURE 4 Extremely regular layout methodology Roy and Anderson are looking at this from less specifically methodological aspects, but where their papers intersect, they reach broadly similar conclusions. Roy and his co-authors conclude their review of various device structures by noting that system-level strategies are already developing that are independent of the underlying technology, and continue by stating: Technology scaling in 22nm [will] require closer interaction between process, device, circuit and architecture. Co-optimization methodologies across different levels [will] help to mitigate the challenges arising due to changes in transistor topology and increased process variations. Novel device/circuit/ system solutions integrated seamlessly with the EDA tools [will] help meet the desired yield and [will] help the semicon-
semiconductor industry to reap the benefits of scaling economics in sub-22nm nodes.
With his focus on integrating tools and design methodologies into project management—arguably one of the more neglected areas in terms of EDA implementations and executions—Anderson notes that design teams will need to make still more numerous and complex trade-offs between schedule, function, performance, cost and resources for 22nm. He continues:

Trade-offs between custom vs. synthesized macros, reused vs. new custom [intellectual property], more features vs. power, etc. will have to be made early in the design cycle. The challenge will be to optimize the entire chip and not just a single component.
However, given the threats posed by late-stage changes or “problems that are hidden or that only surface near the end of the design cycle,” and the fact that the ability to address them may be constrained, he also notes that a good engineer’s ability will continue to be defined by whether or not he can innovate “inside the box.”
Cultural change

Until very recently, DFM and ESL have been seen as two very different areas of endeavor within EDA and chip design more generally. There has also been a perceived geographic split, with North America considered stronger on manufacturability issues while Asia and Europe were considered more advanced in terms of abstracted design flows. The message from manufacturing specialists at 22nm, however, is that this distinction is becoming increasingly untenable. The minutiae of a circuit or a transistor still matter—indeed, the structure of circuits would seem fundamental to any progress along the process node path—but systems that are defined with an awareness of fabrication challenges are vital to future progress.
< TECH FORUM > TESTED COMPONENT TO SYSTEM
Pushing USB 2.0 to the limit
Jacko Wilbrink, Atmel & Matt Gordon, Micrium

Matt Gordon is technical marketing engineer at Micrium and has several years of experience as an embedded software engineer. He has a bachelor’s degree in computer engineering from the Georgia Institute of Technology. Jacko Wilbrink is product marketing director at Atmel. He has more than 20 years of experience in the semiconductor industry and fostered the development of the industry’s first ARM-based microcontroller, the SAM7. He has a degree in electronics engineering from the University of Twente, the Netherlands.
The Universal Serial Bus (USB) revolutionized the PC world and is rapidly gaining ground in the embedded controls market. The basis of its success is simplicity, reliability and ease-of-use.

In the PC world, USB has completely replaced the UART, PS/2 and IEEE-1284 parallel ports with a single interface type, greatly simplifying software drivers and reducing the real estate that must be dedicated to bulky connectors. The rapid drop in solid-state memory prices, combined with the increased speed of the USB 2.0 specification (480Mbps), created the opportunity to store several gigabytes of data on USB memory sticks very quickly. As a result, memory sticks have replaced floppy disks and CD drives as the primary vehicle for data storage and transfer.

One further key to USB’s success is interoperability, based on the well-defined standard (Figure 1) and guaranteed by the USB Consortium. Any USB-certified device from any vendor will work in a plug-and-play fashion with any other USB-certified device from any other vendor. Multiple devices can operate on the same bus without affecting each other. The end-user no longer needs to specify IRQs for every peripheral in the system; the USB standard does all the housekeeping.

Another major advantage of USB is that it relieves system designers of the burden of implementing one-off interfaces and drivers that are incompatible and often unreliable. For users of embedded controls systems in particular, USB obviates the need to maintain an inventory of different, bulky cables, as well as any concerns over their long-term availability, because of the drop-in replacement nature of USB peripherals.

All these advantages have fostered the adoption of USB in the embedded space. It has become so popular that virtually every vendor of 32-bit flash MCUs offers several
derivatives with USB full-speed device or On-The-Go (OTG) capabilities. Embedded microprocessors frequently include both high-speed device and host ports. Some even have an integrated USB hub that supports the connection of multiple USB devices, going some way beyond the initial line-up of keyboards, mice and storage card readers. The simplicity and ease-of-use of USB software and its high sustainable data rates are driving many designers to migrate designs to USB-enabled 32-bit MCUs, which are now price-competitive with 8- and 16-bit devices and offer higher internal bandwidth to handle and process hundreds of thousands of bits for each attached high-speed peripheral.

USB also offers the opportunity to replace wires between PCBs within a system (e.g., a connection between a host processor platform and a user interface panel). In most cases, the technology links PCBs that do not sit close together, and the USB cable is a robust, low-cost and EMI-tolerant alternative to parallel wires.

As USB has found its way into an increasing number of embedded devices, software developers have become wary of the additional complexity that this protocol can bring to an application. The developers of USB-enabled products must shoulder a hefty burden in order to grant end-users the convenience that has made this means of serial communication so popular. Whereas software drivers for SPI, RS-232 and other simple serial protocols typically involve little more than read and write routines, USB software drivers can span thousands of lines, incorporating routines that are difficult to develop and to debug. The software that sits on top of the drivers can be similarly complex, in part because this code must manage enumeration, the byzantine process by which USB hosts identify devices.

In order to avoid concerning themselves with enumeration and other confusing aspects of USB, many engineers turn to commercial off-the-shelf (COTS) USB software. For a developer using a reliable, well-tested software module, USB communication simply becomes a series of calls to easily understandable API functions. Thus, programmers who rely on such modules can afford their end-users the convenience of USB without significantly lengthening product development times. Using COTS USB software also offers the best guarantee that devices can interoperate, interconnect and/or communicate with one another as specified by the USB standard.
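The C sketch below shows roughly what that looks like for a device exposing a CDC (virtual COM port) function. The API names (usbd_init, usbd_cdc_add, usbd_start, usbd_cdc_write) are hypothetical stand-ins for whatever a given vendor stack provides; they are not the actual calls of Micriµm’s µC/USB-Device or any other specific product.

/* Hypothetical COTS USB device-stack API: the stack is assumed to hide
 * enumeration, descriptor handling and endpoint management behind a few
 * calls. These prototypes stand in for a vendor-supplied header. */
#include <stddef.h>
#include <stdint.h>

int  usbd_init(void);                              /* bring up controller and stack   */
int  usbd_cdc_add(void);                           /* register a CDC (ACM) function   */
int  usbd_start(void);                             /* connect; stack handles enumeration */
int  usbd_cdc_write(const void *buf, size_t len);  /* bulk-IN transfer                */

int main(void)
{
    static const uint8_t msg[] = "hello from the device\r\n";

    /* From the application's perspective, USB is reduced to a handful of
     * calls; the enumeration and transfer-handling code lives inside the
     * pre-tested module. */
    if (usbd_init() != 0 || usbd_cdc_add() != 0 || usbd_start() != 0) {
        return -1;
    }

    for (;;) {
        usbd_cdc_write(msg, sizeof msg - 1u);      /* send over the virtual COM port */
    }
}

The point is the shape of the code: initialization, class registration and transfers are a handful of calls, while the thousands of lines of protocol handling stay inside the module.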
USB offers many advantages for use in embedded systems, although software developers remain concerned about the additional complexity it can bring to an application. For example, software drivers for SPI, RS-232 and other traditional serial protocols typically involve little more than read and write routines, while USB software drivers can span thousands of lines, incorporating routines that are difficult to develop and to debug. The software that sits on top of the drivers can be equally complex. To avoid being forced to address enumeration and other confusing aspects of USB, many engineers turn to commercial off-the-shelf (COTS) USB software, which reduces USB communication to a series of calls to easily understandable API functions and offers the best guarantee that devices can interoperate, interconnect and/or communicate with one another as specified by the USB standard.

FIGURE 1 USB 2.0 host software stack: client driver software and system software sit above the universal bus driver (UBD), which talks to the companion (UHCI or OHCI) host controller driver and the enhanced host controller driver (EHCD); beneath these sit the companion host controller and the enhanced host controller (EHCI) hardware and the attached USB device. The scope of EHCI covers the enhanced host controller. Source: USB Consortium
Software solutions for USB implementations

For the sake of simplicity, ease-of-use and software portability, three hardware/software interface standards have been defined for the register-level interface and memory data structures of host controller hardware: the Universal Host Controller Interface (UHCI) and Open HCI (OHCI) for low- and full-speed controllers, and Enhanced HCI (EHCI) for high-speed USB 2.0 host controllers. The USB driver abstracts the details of the particular host controller driver for a particular operating system. On top of the driver, multiple client drivers serve specific classes of devices. Examples of device classes are the Human Interface Device (HID), Communication Device Class (CDC) and Mass Storage Class.

Developers whose products function as USB hosts are not the only engineers who can benefit from a quality USB software module; implementers of USB OTG and devices also have much to gain. Although the makers of USB devices are somewhat insulated from the aforementioned host controller differences, these developers still must ensure that high-speed hosts can actually recognize their devices. A home-grown USB device implementation capable of full-speed communication must normally be overhauled to support high-speed communication. Even if the underlying USB device controller is capable of high-speed communication, the upper-layer software might not support the additional enumeration steps that high-speed communication involves. The upper layers of a solid COTS implementation, however, are intended to be used with any type of host, full- or high-speed.

TABLE 1 Effective data rates for USB HS operating modes (Source: Atmel/Micrium)
Mode          Max bandwidth
Bulk          53.24MB/s
Interrupt     49.15MB/s
Isochronous   49.15MB/s
Control       15.87MB/s

Because hardware-related issues for both hosts and devices are minimized by USB software modules, overhead can be a major concern for developers considering these modules. Although most embedded microcontrollers cannot maintain high-speed USB’s 480Mbps maximum data rate, a low-overhead software module can ensure that rates well over the full-speed maximum of 12Mbps are viable. Because these modules rely heavily on DMA for transferring packets to and from memory, applications that incorporate
them are not forced to waste CPU cycles copying data. Of course, a low-overhead software module should use both memory and CPU clock cycles efficiently. The best COTS USB solutions are devoid of redundant code and superfluous data structures that would otherwise bring about bloated memory footprints. Given the magnitude of the USB protocol, the compact size of these modules is particularly impressive. For example, the code size of a normal configuration of Micriµm’s µC/USB-Host is just 35 kilobytes, while µC/USB-Device, Micriµm’s USB device stack, has a code size of only 15 kilobytes. These modules’ memory needs, as well as those of Micriµm’s OTG module, µC/USB-OTG, are summarized in Figure 2.

FIGURE 2 Memory footprint (code and data, in Kbytes) of the µC/USB Device, µC/USB Host and µC/USB OTG modules. Source: Micrium

The benefits that an expertly crafted USB module offers easily outweigh the small sacrifices necessary to accommodate such a module. Although developing a high-speed USB host or device without one of these modules is not impossible, it is hardly advisable. With a capable COTS solution on their side, astute engineers can accelerate the transition from full speed to high speed and can quickly move their USB-enabled products to market.
Hardware implications in sustaining high-speed USB bandwidth

Most USB-enabled MCUs are limited to 12Mbps full-speed USB 2.0. The problem is that the amount of data being collected, stored and ultimately offloaded to a storage device for remote processing in today’s embedded controls applications has increased exponentially. Full-speed USB does not compete effectively with 20Mbps SPI or a 100Mbps-plus parallel bus. Fortunately, flash MCUs and embedded MPUs are coming to market with on-chip 480Mbps high-speed USB 2.0.
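As a rough sense of scale, consider offloading a 1GB (1,024MB) data log to a memory stick. The back-of-the-envelope comparison below uses the raw 12Mbps full-speed signalling rate (real transfers achieve less once protocol overhead is counted) and the effective high-speed bulk rate reported in Table 1:

\[
t_{\mathrm{FS}} \approx \frac{1024\,\mathrm{MB}}{12\,\mathrm{Mbps}/8} = \frac{1024\,\mathrm{MB}}{1.5\,\mathrm{MB/s}} \approx 683\,\mathrm{s} \approx 11\ \mathrm{minutes},
\qquad
t_{\mathrm{HS,\,bulk}} \approx \frac{1024\,\mathrm{MB}}{53.24\,\mathrm{MB/s}} \approx 19\,\mathrm{s}.
\]

A difference of roughly 35x is what makes high-speed USB a credible replacement for fast SPI and parallel buses in data-logging applications.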
These chips are likely to speed up the adoption of USB for the majority of interconnects between PCBs as well as between the embedded system and its peripherals.

It is a relatively straightforward task to sustain a 480Mbps data rate in a PC or a 400MHz ARM9-based product running a Microsoft or Linux OS with a single memory space connected to a single high-speed bus. Achieving this on an ARM Cortex M3 flash MCU with a clock frequency of 96MHz is another story. To run at that speed, store the data in external or internal memory, process it and then resend it either over the USB link or another interface of equivalent speed (e.g., SDCard/SDIO or MMC) requires a highly parallel architecture in which DMAs stream data between memories and peripherals without CPU intervention, and in which the CPU has parallel access to its memory space to process the data.

Atmel solved this problem on the SAM3U Cortex M3 flash microcontroller with a high-speed USB interface by adapting the multi-layer bus architecture of its ARM9 MPUs to the Cortex M3 and dividing the memory into multiple blocks distributed across the architecture. Three types of DMA are connected to minimize the loading of any data transfer on the bus and memories, and to free the processor for data processing and system control tasks.
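As a rough illustration of the scatter-gather idea such DMA controllers rely on, the sketch below chains two buffers to a peripheral FIFO through linked descriptors so that the CPU issues one start call and is then free. The descriptor fields and the dma_start_chain() call are hypothetical, not Atmel’s actual DMAC programming interface.

/* Generic scatter-gather DMA descriptor chain (illustrative only; field and
 * function names are hypothetical, not Atmel's DMAC register definitions).
 * Each descriptor moves one buffer and points at the next, so the controller
 * streams an arbitrary set of buffers to a peripheral FIFO with no CPU copy. */
#include <stdint.h>
#include <stddef.h>

struct dma_desc {
    const void      *src;   /* source address (memory)             */
    volatile void   *dst;   /* destination (e.g., USB or MCI FIFO) */
    uint32_t         len;   /* bytes to move in this descriptor    */
    struct dma_desc *next;  /* next descriptor, or NULL to stop    */
};

/* Hypothetical driver entry point: hand the head of the chain to the DMA
 * controller and return immediately; completion is assumed to be signalled
 * by interrupt. */
void dma_start_chain(int channel, const struct dma_desc *head);

void send_two_buffers(volatile void *usb_fifo,
                      const uint8_t *hdr, size_t hdr_len,
                      const uint8_t *payload, size_t payload_len)
{
    static struct dma_desc d[2];

    d[0].src = hdr;     d[0].dst = usb_fifo; d[0].len = (uint32_t)hdr_len;     d[0].next = &d[1];
    d[1].src = payload; d[1].dst = usb_fifo; d[1].len = (uint32_t)payload_len; d[1].next = NULL;

    /* The CPU is free for protocol and application work while the chain drains. */
    dma_start_chain(0, &d[0]);
}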
Ideally, the central DMA features a built-in FIFO for increased tolerance to bus latency, programmable-length burst transfers that optimize the average number of clock cycles per transfer, and scatter, gather and linked-list operations. It can be programmed for memory-to-memory transfers or memory-to-peripheral transfers to, for example, a high-speed SPI or the SDIO/SD/MMC Media Card Interface (MCI). The high-speed DMA used for the USB High-Speed Device (HSD) interface has a dedicated layer in the bus matrix, maximizing parallel data transfers. The risk of waiting for bus availability has been removed, and the only critical access the programmer needs to manage is the access to the multiple memory blocks. Simultaneous accesses need to be avoided, otherwise a FIFO overrun or underrun can occur and data will be lost or the transfer will be stalled. The peripheral DMA should be tightly integrated into the peripheral programmer’s interface, which will simplify
peripheral driver development. It should have a reduced gate count, so that it can be implemented widely without adding significant cost, while still reducing processor overhead in data transfers. Gate count reduction can be achieved by removing local storage capabilities and reducing linked-list support to two memory areas.

Multiple data memory blocks should be distributed around the microcontroller. For example, two central SRAM blocks can allow the CPU to run from one while the DMAs load and store in parallel from the other. There should be several FIFOs built into the high-speed peripherals and DMA controller, including a 4KB DPRAM in the USB HSD interface. These memories reduce the impact of bus or memory latency on high-speed transfers. The programmer can allocate the 4KB DPRAM in the USB HSD to the different endpoints, except for the control endpoint, since its data rate is low. Up to three buffers can be allocated to a single endpoint to support micro-chain messages.

FIGURE 3 Block diagram of the SAM3 with multi-layer bus, DMAs and high-speed interfaces (HSMCI, EBI): Cortex-M3 processor, MPU and backup unit on a 5-layer AHB matrix with multiple SRAM and flash blocks, external bus interface, high-speed USB device with 4KB DPRAM, central DMA, and peripheral DMA controller and peripheral bridges serving low-speed peripherals (ADC, timer, PWM, UART, I2C) and high-speed peripherals (MMC/SDCard, SDIO, SPI, I2S). Source: Atmel

Table 1 provides benchmark data on the effective data rates for the different operating modes of the USB HS interface in Atmel’s Cortex M3-based SAM3U. The data is streamed in parallel with the processor doing the data packing or unpacking. The gaps between the effective data rates and the maximum 480Mbps (60MB/s) in the Bulk, Interrupt and Isochronous modes are due to protocol overhead, not to any architectural limits.

The gap between the data requirements of embedded systems and the hardware that moves and processes that data has been growing exponentially in recent years. Recent developments in both microcontrollers and software capable of supporting high-speed USB provide a much-needed solution. In the early stages of adoption, the majority of users are
unlikely to run at the maximum 480Mbps data rate. More likely, they will run at tens or hundreds of Mbps to escape the limitations of full-speed USB (12Mbps) or SPI (tens of Mbps). Over time, however, data requirements will continue to grow and thereby push demands on any system. Running at the maximum 480Mbps data rate on a Cortex M3-class flash MCU is feasible through careful design of the internal bus, memory and DMA architecture. Using COTS software takes the burden and risk away from the software developer, providing the best guarantee of USB compliance and interoperability in the minimum amount of time. The use of market-standard implementations of the USB host controller interfaces also increases the choice of OSs and RTOSs.

Micrium, 1290 Weston Road, Suite 306, Weston, FL 33326, USA
T: 1 954 217 2036  W: micrium.com

Atmel, 2325 Orchard Parkway, San Jose, CA 95131, USA
T: 1 408 441 0311  W: www.atmel.com
< TECH FORUM > TESTED COMPONENT TO SYSTEM
Ensuring reliability through design separation
Paul Quintana, Altera

Paul Quintana is a senior technical marketing manager in Altera’s military and aerospace business unit, focusing on secure communication and cryptography, DO-254 and software defined radio. He holds an MSEE and a BSEE from New Mexico State University.
FPGAs are a ubiquitous part of today’s processing landscape. Their use has extended from their long-established role as glue logic interfaces to the very heart of the advanced information-processing systems used by core Internet routers and high-performance computing systems. What remains common throughout this evolution is the drive to integrate more functionality in less space while decreasing power and cost.

High-reliability system design—as well as other system design areas such as information assurance, avionics and industrial safety systems—sets similar requirements for reduced system size, power and cost. Traditionally, high-reliability system designs have approached reliability through redundancy. The drawback with redundancy, however, is increased component count, logic size, system power and cost.

Altera has developed a strategy that addresses the conflicting needs for low power, small size and high functionality while maintaining the high reliability and information assurance these applications require. The design separation feature in its Quartus II design software and Cyclone III LS FPGAs gives designers an easy way of executing high-reliability redundant designs using single-chip FPGA-based architectures.
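The appeal of redundancy is easy to quantify with the standard parallel-reliability identity (a textbook relationship, not a figure from Altera): if two independent copies of a function each have reliability R(t) over a mission, the duplicated function fails only if both copies fail, so

\[
R_{\mathrm{pair}}(t) = 1 - \bigl(1 - R(t)\bigr)^{2}.
\]

Two copies that are each 99% reliable therefore give roughly 99.99% reliability, but only as long as the copies really do fail independently; preserving that independence on a single die is precisely what design separation is intended to do.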
Life-cycles and reliability

The concept of reliability engineering has been driven by the U.S. Department of Defense (DoD) since its first studies into the availability of Army and Navy materiel during World War II. For example, the mean time between failures (MTBF) of a bomber was found to be less than 20 hours, while the cost of repeatedly repairing the bomber would eventually reach more than ten times the original purchase price. Subsequently, total life-cycle cost has become a critical metric for system specification and design.

High-assurance cryptographic systems have historically followed a similar path. Failures in a cryptographic system affect the total life cycle in such fundamental terms as security for military systems and commerce for financial systems. Given this context, high-assurance cryptographic systems have similar design and analysis requirements to high-reliability systems.

In each case, the designer’s goal is to shrink the PCB size and reduce the number of components needed for a particular application. This has been the trend in the electronics industry for decades, most recently in system-on-chip (SoC) ASICs and today progressing to SoC FPGAs. Developing SoC ASICs consolidated external digital logic into a single device. This paradigm progressed successfully until the cost and schedules of ASIC development exceeded market money and time budgets. With ASIC costs having grown so much, system designers are increasingly turning to FPGAs, where performance and logic densities enable logic consolidation onto a reprogrammable chip.

However, while the growth in SoC designs has been steady for many years, the design complexity of FPGAs has until now prevented the integration of redundant designs: many system and security analysts deemed the analysis necessary to verify separate and independent datapaths too difficult. By working with certification authorities, Altera has simplified complex FPGA device analysis and ensured separate and independent datapaths. By providing users with FPGA tools and data flows that have this analysis in mind from the start, we enable designers to consolidate fail-safe logic designs into a single FPGA fabric. This allows them to meet development budgets and also the requirements of high-reliability and high-assurance applications.

FIGURE 1 Evolution of security criteria design and analysis, 1985 to 2007: U.S. “Orange Book” TCSEC (1985); French “Blue-White-Red Book”, German IT-Security Criteria, Netherlands Criteria, U.K. Systems Security Confidence Levels and U.K. “Green Books” (all 1989); European ITSEC (1991); U.S. Federal Criteria draft (1992); Canadian Criteria CTCPEC (1993); Common Criteria, ISO 15408, v1.0 (1996), v2.0 (1998), v2.1 (1999), v2.2 (2004) and v3.1 (2006). Source: Altera
Information-assurance applications

Information-assurance equipment must provide a high level of trust in the design and implementation of the cryptographic equipment. Guaranteeing that a complex system design is trustworthy requires robust design standards and system analysis, and several security-design standards and evaluation bodies exist. While explaining the design requirements and evaluation criteria used by each of these bodies exceeds the scope of this article, an overview of their evolution and complexity is shown in Figure 1.

IT systems have the greatest influence on information assurance. With an ever-increasing number of infrastructure-control systems, and with corporate and personal information accessible via the Internet, they are increasingly relied on to protect sensitive information and systems from hackers and criminals. To provide information assurance on the Internet, a user must not only inspect data for viruses, but also protect sensitive information by using security and encryption technologies such as IPsec and HTTPS. While the HTTPS cryptographic algorithm is typically implemented in software running on a computer platform, IPsec and virtual private network (VPN) encryption applications usually require higher performance and rely more heavily on hardware.

Network IT equipment must be evaluated at all appropriate levels to ensure trust in the overall system. This trust must be proven by hardware analysis of each IT component, ensuring that information-assurance levels meet the security requirements of either the Common Criteria or Federal Information Processing Standard (FIPS) 140-2 or 140-3. As shown in Table 1, this analysis is complex and can greatly extend the design cycle.
System designs have traditionally achieved reliability through redundancy, even though this inevitably increases component count, logic size, system power and cost. The article describes the design separation feature in Altera software that seeks to address these as well as today’s conflicting needs for low power, small size and high functionality while maintaining high reliability and information assurance.

Commercial cryptography

The financial industry today drives the development of commercial cryptography and cryptographic equipment. Its need for information assurance has become ever more pervasive, as its use of technology has grown from inter- and intra-bank electronic data interchange (EDI) transactions, to public automatic teller machines (ATMs), to high-performance cryptographic applications driving electronic commerce.

Like the military, commercial electronic commerce needs commonly accepted standards for the design and evaluation of cryptographic hardware. The financial industry’s need for cryptographic interoperability has been a key differentiator in this market. Commerce extends beyond national boundaries and therefore so must the cryptographic equipment it uses. A major complication in this landscape is the classification of cryptography as a regulated technology under the International Traffic in Arms Regulations (ITAR). High-performance electronic-commerce cryptographic equipment is developed mainly by large server manufacturers that can invest in the expertise and long design cycles necessary to create FIPS 140-2-certified modules.
FIGURE 2 Design separation for high reliability and information assurance; the portions of logic shown in blue and in yellow are in separate, secure partitions. Source: Altera
High-reliability applications

Industrial applications also take advantage of the design separation and independence available from FPGAs. For example, increasing numbers of embedded control units (ECUs) of growing complexity and functionality are used in automobiles. ECU designers must maintain reliability while reducing size and cost. The ability to separate redundant logic within a single FPGA allows them to reduce the number of system components while maintaining fault isolation.
FIGURE 3 High-assurance design flow using incremental compile. Design sources (Verilog HDL .v, VHDL .vhd, AHDL .tdf, block design .bdf, EDIF .edf and VQM .vqm netlists) are divided into design partitions; analysis and synthesis re-synthesizes changed partitions and preserves the others, producing one post-synthesis netlist per partition; partition merge creates a complete netlist using the appropriate source netlist for each partition (post-fit, post-synthesis or imported); the fitter places and routes changed partitions, preserving the others, to give a single post-fit netlist for the complete design; the assembler and timing analyzer then run, and the device is programmed or configured once requirements are satisfied, otherwise design and assignment modifications are made. Floorplan location assignments are used to create and modify the initial floorplan, assign security attributes, assign routing regions and signals, and assign I/O. Source: Altera
TABLE 1 FIPS 140-2 security requirements (Source: Altera)

1. Cryptographic module specification. All levels: specification of the cryptographic module, cryptographic boundary, approved algorithms and approved modes of operation; description of the cryptographic module, including all hardware, software and firmware components; statement of module security policy.
2. Cryptographic module ports and interfaces. Levels 1-2: required and optional interfaces; specification of all interfaces and of all input and output datapaths. Levels 3-4: data ports for unprotected critical security parameters logically separated from other data ports.
3. Roles, services and authentication. Level 1: logical separation of required and optional roles and services. Level 2: role-based or identity-based operator authentication. Levels 3-4: identity-based operator authentication.
4. Finite state model. All levels: specification of finite state model; required states and optional states; state transition diagram and specification of state transitions.
5. Physical security. Level 1: production-grade equipment. Level 2: locks or tamper evidence. Level 3: tamper detection and response for covers and doors. Level 4: tamper detection and response envelope; EFP and EFT.
6. Operational environment. Level 1: single operator; executable code; approved integrity technique. Level 2: referenced PPs evaluated at EAL2 with specified discretionary access control mechanisms and auditing. Level 3: referenced PPs plus trusted path evaluated at EAL3 plus security policy modeling. Level 4: referenced PPs plus trusted path evaluated at EAL4.
7. Cryptographic key management. All levels: key management mechanisms, including random number and key generation, key establishment, key distribution, key entry/output, key storage and key zeroization. Levels 1-2: secret and private keys established using manual methods may be entered or output in plaintext form. Levels 3-4: secret and private keys established using manual methods shall be entered or output encrypted or with split-knowledge procedures.
8. EMI/EMC. Levels 1-2: 47 CFR FCC Part 15, Subpart B, Class A (business use); applicable FCC requirements (for radio). Levels 3-4: 47 CFR FCC Part 15, Subpart B, Class B (home use).
9. Self-tests. All levels: power-up tests (cryptographic algorithm tests, software/firmware integrity tests, critical functions tests) and conditional tests. Level 3: statistical RNG tests callable on demand. Level 4: statistical RNG tests performed at power-up.
10. Design assurance. Level 1: configuration management (CM); secure installation and generation; design and policy correspondence; guidance documents. Level 2: CM system; secure distribution; functional specification. Level 3: high-level language implementation. Level 4: formal model; detailed explanations (informal proofs); preconditions and postconditions.
11. Mitigation of other attacks. All levels: specification of mitigation of attacks for which no testable requirements are currently available.
Design separation

Information-assurance and high-reliability applications currently require at least two chips to ensure that the logic remains separate and functions independently. This ensures that a fault detected in one device does not affect the remainder of the design. In cases where design separation is critical—such as financial applications, where data must be encrypted—data must not be able to leak from one portion of the design to another in the event of an inadvertent path being created by a fault. In cases where high
reliability is critical—such as industrial systems, where entire manufacturing lines may be shut down if one piece of equipment fails—redundant circuits continue to control the system in the event of a main circuit failing, ensuring little to no downtime.

The design separation feature in the Quartus II design software allows designers to maintain the separation of critical functions within a single FPGA. This separation is created using Altera’s LogicLock feature, which allows designers to allocate design partitions to a specific section of the device. When the design separation flow is enabled, as shown in Figure 2, each secure partition has an automatic fence (or ‘keep out’ region) associated with it. In this way, no other logic can be placed in the proximity, creating one level of increased fault tolerance. However, to ensure true separation, the routing also must be separated. Therefore, all routing is restricted to the LogicLock area of the design partition. This means that the fence region does not contain logic and does not allow routing to enter or exit the fence, ensuring the region’s physical isolation from any other function in the device. Routing interfaces can then be created using interface LogicLock regions, which can route signals into or out of separated regions by creating an isolated channel between two separated partitions. This is effectively the same as using two physical devices to ensure separation.

Altera has designed the Cyclone III LS fabric architecture to ensure that the separation results in increased fault tolerance with a minimal fence size, enabling designers to use over 80% of the device’s resources for their design. The design separation flow also enables specific banking rules that ensure the separation created in the fabric for critical design partitions extends to the I/Os. The Cyclone III LS packages are also designed to support such I/O separation.
Single-chip high-assurance design flow

This uses a standard incremental compile design flow (Figure 3) with five additional steps during floorplanning:
• Create design partition assignments for each secure region using incremental compilation and floorplanning. Each secure region must be associated with one partition only, which means the design hierarchy should be organized early in the design process.
• Plan and create an initial floorplan using LogicLock regions for each secure partition. Top-level planning early in the design phase helps prevent and mitigate routing and performance bottlenecks.
• Assign security attributes for each LogicLock region. Locked regions are used for those parts of a design that require design separation and independence.
• Assign routing regions and signals. To ensure each signal path is independent, a secure routing region must be created for every signal entering or leaving a design partition.
• Assign I/Os. Each secure region with fan-outs to I/O pins cannot share a bank with any other secure region, to ensure design separation and isolation.
The design separation feature is fully supported using the Mentor Graphics ModelSim verification environment, allowing designers to achieve high system reliability through logical redundancy. ModelSim allows designers to verify the functional equivalence of redundant logic on a single Cyclone III LS FPGA.
Conclusion Requirements for high-reliability and information-assurance systems have many similarities. Both systems require design separation and independence, as each system requires redundancy to ensure proper design operation in the event of hardware faults. Traditionally, the implementation of redundancy increases system size, weight, power and costs because this redundancy is implemented at the board level. To reduce these factors, low-power FPGA processes can be used with a high-assurance design flow to meet stringent NSA Fail Safe Design Assurance requirements. By ensuring design separation and independence, redundant logic can be transferred from the board level to a single FPGA as part of a SoC design approach. Combining low-power, high-logic density and design-separation features allows developers of high-reliability, high-assurance cryptographic and industrial systems to minimize design development and schedule risk by using reprogrammable logic, and to improve productivity by using a proven incremental-compile design flow.
Further information
Cyclone III FPGAs—Security: www.altera.com/products/devices/cyclone3/overview/security/cy3-security.html
Partitioning FPGA Designs for Redundancy and Information Security (webcast): www.altera.com/education/webcasts/all/wc-2009-partitioning-fpgaredundancy.html
AN 567: Quartus II Design Separation Flow: www.altera.com/literature/an/an567.pdf
Protecting the FPGA Design From Common Threats: www.altera.com/literature/wp/wp-01111-anti-tamper.pdf
Altera 101 Innovation Drive San Jose, CA 95134 USA T: 1 408 544 7000 W: www.altera.com