EDA Tech Forum Journal: June 2009

Page 1

The Technical Journal for the Electronic Design Automation Community

www.edatechforum.com

Volume 6

Issue 3

June 2009

Embedded
ESL/SystemC
Digital/Analog Implementation
Verified RTL to Gates
Design to Silicon
Tested Component to System

INSIDE:
DAC putting users on fast track
Refining the embedded OS soup
Making virtual prototypes real
Highlighting computational litho
Mastering antenna design basics


COMMON PLATFORM TECHNOLOGY Industry availability of real innovation in materials science, process technology and manufacturing for differentiated customer solutions.

Chartered Semiconductor Manufacturing, IBM and Samsung provide you the access to innovation you need for industry-changing 32/28nm high-k/metal gate (HKMG) technology with manufacturing alignment, ecosystem design enablement, and flexibility of support through Common Platform technology. Collaborating with some of the world’s premier IDMs to develop leading-edge technology as part of a joint development alliance, Chartered, IBM and Samsung provide access to this technology as well as qualified IP and robust ecosystem offerings to help you get to market faster, with less risk and more choice in your manufacturing options. Visit www.commonplatform.com today to find out how you can get your access to innovation. To Find Out More, Visit Us at These Upcoming EDA TF Locations: August 25 - Hsinchu, Taiwan August 27 - Seoul, Korea September 1 - Shanghai, China September 3 - Santa Clara, CA., USA September 4 - Tokyo, Japan October 8 - Boston, MA., USA

www.commonplatform.com


EDA Tech Forum June 2009

contents

< COMMENTARY >

6 Start Here
Why DAC and DATE still matter
Networking and interaction have a critical value during a downturn.

8 Conference
At the sharp end
DAC 2009 will see the launch of a track dedicated to real-world design.

12 USB Focus
Connecting to embedded design
We review some of the sector’s increasingly common USB implementations.

Analysis
Nice in Nice
We review the major trends from DATE 2009: multicore, aggregation, ESL and more.

Analysis
Kermit would blush
Just what is a ‘green’ design strategy and what should it contain?

Profile
Renaissance man
Saul Griffith isn’t just trying to save the planet - he also proves engineers can match charisma with nous.

< TECH FORUM >

18 Embedded
Embedded software virtualization comes of age
LynuxWorks

24 ESL/SystemC
Using TLM virtual system prototype for hardware and software validation
Mentor Graphics

28 ESL/SystemC
Bridging from ESL models to implementation via high-level hardware synthesis
CoFluent Design

34 Verified RTL to Gates
Parallel transistor-level full-chip circuit simulation
University of California, San Diego

40 Digital/Analog Implementation
Reducing system noise with hardware techniques
Texas Instruments

46 Design to Silicon
Computational scaling: implications for design
IBM Microelectronics

50 Tested Component to System
Antenna design considerations
LS Research

All content now online at www.edatechforum.com. Register and read.

EDA Tech Forum Volume 6, Issue 3 June 2009

EDA Tech Forum Journal is a quarterly publication for the Electronic Design Automation community including design engineers, engineering managers, industry executives and academia. The journal provides an ongoing medium in which to discuss, debate and communicate the electronic design automation industry’s most pressing issues, challenges, methodologies, problem-solving techniques and trends. EDA Tech Forum Journal is distributed to a dedicated circulation of 50,000 subscribers.

EDA Tech Forum is a trademark of Mentor Graphics Corporation, and is owned and published by Mentor Graphics. Rights in contributed works remain the copyright of the respective authors. Rights in the compilation are the copyright of Mentor Graphics Corporation. Publication of information about third party products and services does not constitute Mentor Graphics’ approval, opinion, warranty, or endorsement thereof. Authors’ opinions are their own and may not reflect the opinion of Mentor Graphics Corporation.

3


4

team < EDITORIAL TEAM > Editor-in-Chief Paul Dempsey +1 703 536 1609 pauld@rtcgroup.com

Managing Editor Marina Tringali +1 949 226 2020 marinat@rtcgroup.com

< CREATIVE TEAM > Creative Director Jason Van Dorn jasonv@rtcgroup.com

Art Director

Kirsten Wyatt kirstenw@rtcgroup.com

Graphic Designer Christopher Saucier chriss@rtcgroup.com

Copy Editor Rochelle Cohn

< SALES TEAM > Advertising Manager Stacy Mannik +1 949 226 2024 stacym@rtcgroup.com

Advertising Manager Lauren Trudeau +1 949 226 2014 laurent@rtcgroup.com

Advertising Manager Shandi Ricciotti +1 949 573 7660 shandir@rtcgroup.com

< EXECUTIVE MANAGEMENT TEAM > President

John Reardon johnr@rtcgroup.com

Vice President

Cindy Hickson cindyh@rtcgroup.com

Vice President of Finance Cindy Muir cindym@rtcgroup.com

Director of Corporate Marketing Aaron Foellmi aaronf@rtcgroup.com



CONVERGENCE

www.apache-da.com

Power. Noise. Reliability. From prototype to signoff, we provide a unified CPS platform to manage noise and co-optimize your power delivery network across the system. Mitigate design failure risk, reduce system cost, and accelerate time-to-market with Apache.

visit us at

Booth #722


6

< COMMENTARY > START HERE

start here

Why DAC and DATE still matter

Our preview of the forthcoming Design Automation Conference concentrates on the User Track that makes its debut there next month. Given that it shares many of the objectives behind this journal, that is hardly surprising. However, it is not the only aspect of DAC that merits investigation.

Also in the program, conference chair Dr. Andrew Kahng and Dr. Juan-Antonio Carballo of IBM are holding a workshop to look into developing a new generation of roadmap for EDA. Certainly, those that already exist are not perfect—although recasting the concept will not be easy either. EDA is, by its nature, an industry that responds to clients who hold their R&D cards very closely. Meanwhile, classical scaling is broken and much of what will comprise the worlds of architecture and process within 10 years is up for debate. Nobody wants to force themselves into making big bets based on limited visibility and certain volatility.

The answer of course is for any new roadmapping activity to be as inclusive as possible. Again, a further question arises as to whether or not that can be best achieved within the existing ITRS framework or if it requires one that is separate but complementary. However, where Kahng and Carballo may be right is in deciding that it is time to put these issues back up for debate.

In some respects, this attempt to broaden the scope of the roadmapping debate also reflects a trend that was seen in France earlier this spring at the Design Automation and Test in Europe (DATE) conference, and it feeds into a bigger issue as to why conferences still matter, even in as severe a downturn as this.

DATE has been something of a whipping boy of late, and it is probably fair to say that its days as a top-level commercial exhibition are over. However, it always has been a very strong technology conference, and that was still true in 2009. Indeed, full delegate attendance (i.e., those there for the papers, not the stands) barely fell compared with 2008, despite global economic woes.

Going back to DAC’s new User Track, DATE introduced invited industrial papers a few editions ago. It has also, as many at DAC admit, typically been a step ahead in how it addresses ESL. And this year, its debates, sessions and discussions were impressive in how they genuinely encompassed the views of both users and various forms of vendors. Solutions today are far more likely to come from consensus than from one company’s eureka moment. It is here that conferences like DATE and DAC (and ISQED, ICCAD and others) really come into their own, even though many may be tempted to slash their companies’ human presence.

Meanwhile, our report from DATE appears in the ‘Editor’s Cut’ extended edition of EDA Tech Forum, just posted at our website alongside coverage of the tricky area that is green design. Find out more at www.edatechforum.com.

Paul Dempsey
Editor-in-Chief


Digital Signal Controllers Analog Serial EEPROMs

The Microchip name and logo, the Microchip logo, MPLAB and PIC are registered trademarks of Microchip Technology Incorporated in the U.S.A. and other countries. © 2009, Microchip Technology Incorporated. All Rights Reserved.

Microcontrollers

Why Do More Design Engineers Use PIC® Microcontrollers?

8 Reasons why Microchip is the Worldwide Leader in 8-bit Microcontrollers: 1) Broad portfolio of more than 250 8-bit PIC® microcontrollers 2) Comprehensive technical documentation, free software and hands-on training 3) MPLAB® IDE is absolutely free, and supports ALL of Microchip’s 8, 16, and 32-bit microcontrollers 4) Low-cost development tools help speed up prototyping efforts 5) Easy migration with pin and code compatibility 6) Shortest lead times in the industry 7) The only supplier to bring USB, LCD, CAN, Ethernet, and capacitive touch sensing to the 8-bit market 8) World-class, 24/7 technical support

GET STARTED TODAY!

Visit www.microchip.com/easy for a 20% discount off of our most popular 8-bit development tools!

www.microchip.com/easy


8

< COMMENTARY > CONFERENCE

At the sharp end

DAC’s new User Track aims to add a more pragmatic flavor to chip design’s largest conference

The Design Automation Conference (DAC) returns to San Francisco’s Moscone Center, July 26th-31st, and it is hoped that its proximity to Silicon Valley will see attendances hold up well even in tough times. However, the organizers are looking to more than just geography to guarantee continued interest in chip design’s main annual gathering.

This 46th edition of DAC will also see the return of the CEO panel featuring Lip-Bu Tan of Cadence Design Systems, Wally Rhines of Mentor Graphics, and Aart de Geus of Synopsys. There will also be keynotes from technologists such as William Dally of graphics powerhouse Nvidia, and Fu-Chieh Hsu of leading foundry TSMC. And there will be 29 panels spread across the main conference and the Pavilion on the exhibition floor, including one that will have been voted into being by attendees (its topic was still to be determined as the magazine went to press). Add DAC’s typically strong technical program and you are already at the point where the conference appears to offer something for everyone.

Or, actually, does it? One feature being launched this year has been explicitly designed to meet a need that both users and vendors have wanted the conference to address in more detail: the User Track. The new feature has been put together by Leon Stok, director of EDA for the IBM Systems and Technology Group, and Soha Hassoun, associate professor in the Computer Science Department of Tufts University.

Stok explains the objective: “There is a strong desire among engineers to get their hands on more information about tools, flows and their use to solve the problems they face every day. But at DAC, the main parts of the technical program have tended to be more academic and focused on research. Meanwhile on the floor, the vendors are advertising their tools in a very specific way. What was missing was something in the middle. That is the void we are trying to fill.”

The idea is not entirely new. DAC’s equivalent across the Atlantic, Design Automation and Test in Europe, has also recently launched a track dedicated to industrial case studies and methodologies in everyday use. The initiative proved popular there, and, based on the response from potential contributors, the same looks likely to happen in the USA.

“We have had 117 submissions, and we expected only 50 to 70 in the first year, so it’s gone a lot further than expectations,” says Hassoun. From that, the track has ultimately been divided into 16 front-end presentations (overseen by Hassoun) and 26 back-end presentations (overseen by Stok), plus a packed poster session (see box opposite and on p. 10).

Of course, some may say that DAC is merely duplicating what the major vendors achieve via their user group meetings. These have also long been based around practical case studies and the so-called ‘war stories’ of how tool users overcame a particular design challenge. At a time when some EDA players have been pushing more investment into these events, you might even suggest that DAC’s User Track would be the source of some tension.

“That’s just not the case, though,” says Stok. “The vendors also want there to be a forum at DAC for the exchange of very practical design information in a technical setting that is decoupled from the marketing message.” Indeed, evidence of that comes in the form of the User Track’s sponsor, Cadence, the vendor with perhaps the greatest reputation prior to this DAC for wanting to ‘go it alone’.

Hassoun also says that DAC’s User Track can fill another vacuum that single-vendor shows cannot. “The vendor meetings are useful but also, by definition, more focused. And they don’t offer the chance to compare, whereas we have papers from designers working across various flows and with combinations of tools from different sources. It isn’t always the case that you will take everything from one supplier. And again, we have found a number of submissions where the vendor encouraged the user to write the paper up.”

There are also some general trends emerging. At the front-end, Hassoun sees part of the function as bridging the gap between the stronger adoption of ESL strategies seen in Europe and Japan and the more traditional design strategies that are now beginning to move in the same direction in North America. To that end, she is also involved in a workshop on the fast-developing ESL concept of virtual platforms that takes place on Wednesday at this year’s DAC (Room 301).

Continued on page 10


EDA Tech Forum June 2009

Design Automation Conference 2009 User Track Program

The User Track runs from Tuesday, July 28 until Thursday, July 30. All the main sessions take place in Room 132 at the Moscone Center. There is also a separate Poster Session on Wednesday, July 29 from 1.30pm-3.00pm in the Concourse Area. Details on this can be found on page 10. All details were correct as EDA Tech Forum went to press, but, as ever, may be subject to late change or cancellation. For the most accurate version of the agenda for this and all other DAC events, check the website, www.dac.com.

Tuesday, July 28 10.30am-12.00pm

Robust Design and Test

1.1 Electromagnetic Interference Reduction on an Automotive Microcontroller, STMicroelectronics
1.2s Power Integrity Sign-Off Flow Using CoolTime and PrimeTime-SI—Flow and Validation, Aptina Imaging
1.3s Improving Parametric Yield in DSM ASIC/SOC Design, Samsung
1.4 Low-Power Test Methodology, STMicroelectronics

2.00pm-4.00pm

Practical Physical Design

2.1 Automated Pseudo-Flat Design Methodology for Register Arrays, Intel
2.2 Qualcomm DSP Semi-Custom Design Flow: Leveraging Place and Route Tools in Custom Circuit Design, Qualcomm
2.3s Auto ECO Flow Development for Functional ECO Using Efficient Error Rectification Method Based on Conformal, Intel
2.4s Monte Carlo Techniques for Physical Synthesis Design Convergence Exploration, Intel
2.5s Tortoise: Chip Integration Solution, STMicroelectronics
2.6s ASIC Clock Distribution Design Challenges, Intel

4.30pm-6.30pm

Verification: A Front-End Perspective

3.1 Interactive 2-D Projection Cross Coverage Viewer for Coverage Hole Analysis, ClueLogic, Verifore
3.2 Verification of Power Management Protocols Through Abstract Functional Modeling, Intel, Ipflex
3.3 Design Flow for Embedded System Device Driver Development and Verification, Cadence Design Systems, Virtutech

Wednesday, July 29th 9.00am-11.00am

Timing Analysis in the Real World

4.1 Design of a Single-Event Effect Fault Tolerant Micro-Processor for Space Using Commercial EDA Tools, European Space Agency, Atmel
4.2 SSTA and Its Application to SPARC 64 Processor Design, Fujitsu
4.3s A Hierarchical Transistor and Gate-Level Statistical Timing Flow for Micro-Processor Designs, IBM
4.4s Unifying Transistor- and Gate-Level Timing Through the Use of Abstraction, IBM
4.5s The Automatic Generation of Merged-Mode Design Constraints, Texas Instruments, FishTail Design Automation
4.6s Modeling Clock Network Variation in Timing Verification, Sun Microsystems

1.30pm-3.00pm Poster Session and Ice Cream Social See page 10

3.00pm-4.00pm

Toward Front-End Design Productivity

6.1 Unified Chip/Package/Board Codesign Flow for Laminate, Leadframe, and Wafer-Level Packages in a Distributed Design Environment, Infineon Technologies, Cadence Design Systems, CISC Semiconductor Design+Consulting

6.2s Fast FPGA Resource Estimation, Xilinx
6.3s Assessing Design Feasibility Early with Atrenta’s 1TeamImplement SOC, STMicroelectronics, Atrenta

4.30pm-6.00pm

Front-End Development: Embedded Software and Design Exploration

7.1s Applying Use Cases to Microcontroller Code Development, Cypress Semiconductor
7.2s Mapping the AVS Video Decoder on a Heterogeneous Dual-Core SIMD Processor, University of Thessaly
7.3s An ‘Algorithm to Silicon’ ESL Design Methodology, STMicroelectronics
7.4s Necessary but Not Sufficient: Lessons and Experiences with High-Level Synthesis Tools, Texas Instruments
7.5 Switching Mechanism in Mixed TLM-2.0 LT/AT System, Intel

Thursday, July 30 9.00am-11.00am

Power Analysis and IP Reuse

8.1 Dynamic Power Analysis for Custom Macros Using ESP-CV, Qualcomm
8.2 Power Supply and Substrate Noise Analysis; Reference Tool Experience with Silicon Validation, Kobe University, ARTEC, STARC, Apache Design Solutions
8.3s Modeling and Design Challenges for Multicore Power Supply Noise Analysis, IBM
8.4s Dynamic Power Noise Analysis Method for Memory Designs, Samsung
8.5s Hard IP Reuse in Multiple Metal Systems SOCs, Texas Instruments, Freescale Semiconductor
8.6s Apache Redhawk di/dt Mitigation Method in Power Delivery Design and Analysis, Intel

2.00pm-4.00pm

Front-End Power Planning and Analysis

9.1 Power Library Pre-Characterization Automation, NEC, NEC HCL ST
9.2s Chip - Package - PC Board Codesign: Applying a Chip Power Model in System Power Integrity Analysis, Cisco Systems
9.3s New SOC Integration Strategies for Multi-Million Gate, Multi-Power/Voltage Domain Designs, Texas Instruments, Atrenta
9.4 PETRA: A Framework for System-Level Dynamic Thermal and Power Management Exploration, Intel
9.5 ALPES: Architectural-Level Power Planning and Estimation, STMicroelectronics, ST-NXP Wireless, ST-Ericsson

4.30pm-6.00pm

Advances in Analog and Mixed-Signal Design

10.1 A Schematic Symbol Library for Collaborative Analog Circuit Development Across Multiple Process Technologies, Stanford University, NetLogic Microsystems, Rambus
10.2 A Mixed-Signal/MEMS CMOS Codesign Flow with MEMS-IP Publishing/Integration, National Chiao Tung University
10.3s Substrate Noise Isolation Characterization in 90nm CMOS Technology, Magwel, NXP Semiconductors
10.4s An Integrated Physical-Electrical Design Verification Flow, Mentor Graphics, SySDSoft

9


10

< COMMENTARY > CONFERENCE

The papers below will be presented at a poster session and ice cream social that will supplement the main User Track sessions, from 1.30pm-3.00pm on Wednesday, July 29. The event will be held on the Concourse Level of the Moscone Center. The track’s organizers are also encouraging DAC delegates to attend the social to provide feedback on this new section of the conference program. However, if you are unable to attend this or cannot otherwise contact Leon Stok and Soha Hassoun during the conference, you can also email Soha at soha@cs.tufts.edu.

A Simple Design Rule Check for DP Decomposition, National Taiwan University
Algorithm for Analyzing Timing Hot-Spots, eInfochips
An On-Chip Variation Monitor Methodology Using Cell-Based P&R Flow, Faraday Technology
Application and Extraction of IC Package Electrical Models for Support of Multi-Domain Power and Signal Integrity Analysis, Freescale Semiconductor, Sigrity
Applications of Platform Explorer, Integrator and Verifier in SOC Designs, Samsung

User Track – Front End

Assertion Based Formal Verification in SOC Level, Wipro Technologies

3-D Visualization of Integrated Circuits in the Electric VLSI Design System, Sun Microsystems

Attacking Constraint Complexity in Verification IP Reuse, Cisco Systems, Synopsys

Automatic Generation, Execution and Performance Monitoring of a Family of Multiprocessors on Large Scale Emulator, ENSTA, EVE

Automated Assertion Checking in Static Timing with IBM ASICs, IBM

C-Based Hardware Design Using AutoPilot Synthesizing MPEG-4 Decoder onto Xilinx FPGA, University of California - Los Angeles, AutoESL Design Tech

Case Study of Diagnosing Compound Hold-Time Violations, Realtek Semiconductor, Mentor Graphics

C-Based High-Level Synthesis of a Signal Processing Unit Using Mentor Graphics Catapult C, University of Tübingen, Robert Bosch
Design and Verification Challenges of ODC-Based Clock Gating, PwrLite
Effective Debugging Chip-Multiprocessor Design in Acceleration and Emulation, Chinese Academy
Enabling IP Quality Closure at STMicroelectronics with VIP Lane, STMicroelectronics, Satin IP Technologies
Formal Verification Based Automated Approaches to System-On-Chip DFT Logic Verification, Texas Instruments
Interactive Code Optimization for Dynamically Reconfigurable Architecture, Toshiba
Power Gated Design Optimization and Analysis with Silicon Correlation Results, Intel

Design Profiling – Modeling the ASIC Design Process, IBM
Enhanced SDC Support for Relative Timing Designs, University of Southern California, University of Utah
Hold Time ECO for Hierarchical Design, Global Unichip, Dorado Design Automation
Improving the Automation of the System in Package (SIP) Design Environment via a Standard and Open Data Format, IBM
Interconnect Explorer: A High-Level Power Estimation Tool for On-Chip Interconnects, Université de Bretagne
Managing Information Silos: Reducing Project Risk through Multi-Metric Tracking, Achilles Test Systems
Net-List Level Test Logic Insertion: Flow Automation for MBIST & Scan, Broadcom
Physical Implementation of Retention Cell Based Design, Atoptech

SystemC: A Complete Digital System Modeling Language: A Case Study, Rambus

Sequential Clock Gating Optimization in GPU Designs with PowerPro CG, Advanced Micro Devices

Transforming Simulators into Implementations, University of Texas - Austin

Soft-Error-Rate Estimation in Sequential Circuits Utilizing a Scan ATPG Tool, Renesas Technology, Hitachi

Using Algorithmic Test Generation in a Constrained Random Test Environment, Ericsson

Solving FPGA Clock-Domain Crossing Problems: A Real-World Success Story, North Pole Engineering, Honeywell International, Mentor Graphics

Visualizing Debugging Using Transaction Explorer in SOC System Verification, Marvell Semiconductor

Static Timing Analysis of Single Track Circuits, University of Southern California, Sun Microsystems, Intel

User Track – Back End

Timing Closure in 65-Nanometer ASICs Using Statistical Static Timing Analysis Design Methodology, IBM

A Generic Clock Domain Crossing Verification Flow, Advanced Micro Devices

Using STA Information for Enhanced At-Speed ATPG, Freescale Semiconductor, Mentor Graphics

On the back-end side, Stok says that timing analysis and dealing with variability are big issues for users, and that parametric yield is also gaining increasing traction. “Also there is a big drive in enabling DFM to try to make appropriate timing and electrical information available to design teams so that they react as early as possible.” The goals are clear. Get people to show up. Get them talking about the User Track. And have them leave DAC with the sense that they have pulled in a lot of useful information for day-to-day use.

The last point Hassoun and Stok make is that this first run will not be perfect—they never are. By taking this step, DAC begins a process of refinement that will only work if attendees offer detailed feedback about both the good and the bad. “We’re looking at all the ways we can do that. Some may be formal, some not. There is an ice cream social and we hope to meet a lot of people there. But we’ll be around all week—and we need people to come up to us and tell us what they think,” says Stok.


320,000,000 MILES, 380,000 SIMULATIONS AND ZERO TEST FLIGHTS LATER.

THAT’S MODEL-BASED DESIGN.

After simulating the final descent of the Mars Rovers under thousands of atmospheric disturbances, the engineering team developed and verified a fully redundant retro firing system to ensure a safe touchdown. The result—two successful autonomous landings that went exactly as simulated. To learn more, go to mathworks.com/mbd

©2005 The MathWorks, Inc.

Accelerating the pace of engineering and science


12

< COMMENTARY > USB FOCUS

Connecting to embedded design

Yingbo Hu and Ralph Moore describe how to implement USB across a range of common functions.

Universal Serial Bus (USB) is a connectivity specification that provides ease-of-use, expandability and good performance for the end-user. It is one of the most successful interconnects in computer history. Originally released in 1995 for PCs, it is now expanding into use by embedded systems and is replacing older interfaces such as serial and parallel ports as the preferred communication link. This article has been written as a tutorial on some of the many ways in which USB can be employed in embedded systems.

USB is not a peer-to-peer protocol like Ethernet. One USB device (the ‘USB host’) acts as the master, while others (‘USB devices’ or ‘USB peripherals’) act as slaves. The host initiates all bus transfers. Up to 127 USB devices can be connected to one USB host via up to six layers of cascaded hubs. For embedded systems, it is very unusual to have more than one hub. In most cases, one USB device connects directly to one USB host with no hub.

FIGURE 1 PC to device via USB serial (Source: Micro Digital)

A USB host requires a USB host controller and USB host software. The latter is layered from the bottom up as follows: (1) USB host controller driver, (2) USB host stack, and (3) USB class driver. The first layer controls the USB host controller (i.e., it reads and writes registers in the controller and it transfers data). The second layer implements the USB protocol and thus controls connected USB devices. The third layer is device-aware and communicates with and controls the actual device (e.g., disk drive, HID human interface device, CDC communication device, etc.). One USB host stack can support multiple class drivers simultaneously. In an embedded system there is usually only one USB host controller.

A USB device requires a USB device controller and USB device software. The latter is layered from the bottom up as follows: (1) USB device controller driver, (2) USB device stack, and (3) USB function driver. The first layer controls the USB device controller (i.e., it reads and writes registers in the controller and it transfers data). The second layer implements the USB protocol and thus communicates with the USB host stack. The third layer communicates with the class driver in the host and provides the actual device control. It makes the embedded unit look like a USB disk drive, HID, serial device, or another defined type. One USB device stack can support more than one function driver simultaneously, through the composite device framework.

An attractive feature of USB is that it is plug-and-play, which means that a USB device will be automatically recognized shortly after being connected to a host. Also, cabling is simple: there is an A receptacle/plug pair for the host-end and a B receptacle/plug pair for the device-end. All hosts and devices adhere to this standard, except On The Go (OTG) devices, which are designed for but not yet widely used in embedded systems.

What follows are descriptions of six examples of how USB can be utilized in an embedded system. Where performance information is given, a “medium performance processor” is assumed to be a 50-80 MHz ARM7 or ColdFire.
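As a rough illustration of this layering on the device side, the C++ sketch below shows one way the three layers might present themselves to one another. The class and function names are invented for this article and do not correspond to any particular commercial USB stack.

```cpp
#include <cstddef>
#include <cstdint>

// Layer 1: the USB device controller driver. It reads and writes the
// controller's registers and moves data on and off the endpoints.
class UsbDeviceControllerDriver {
public:
    virtual ~UsbDeviceControllerDriver() = default;
    virtual void   writeEndpoint(uint8_t ep, const uint8_t* data, size_t len) = 0;
    virtual size_t readEndpoint(uint8_t ep, uint8_t* buf, size_t maxLen) = 0;
};

// Layer 3: a function driver that makes the unit look like a specific
// device type (serial port, disk drive, HID, ...) to the host's class driver.
class UsbFunctionDriver {
public:
    virtual ~UsbFunctionDriver() = default;
    virtual void onDataFromHost(const uint8_t* data, size_t len) = 0;
};

// Layer 2: the USB device stack. It implements the USB protocol
// (enumeration, control transfers) and routes endpoint data between the
// controller driver below it and the function driver(s) above it.
class UsbDeviceStack {
public:
    UsbDeviceStack(UsbDeviceControllerDriver& hw, UsbFunctionDriver& fn)
        : hw_(hw), fn_(fn) {}

    // Called from the controller interrupt: pull received data up to the
    // function driver.
    void onEndpointReceive(uint8_t ep) {
        uint8_t buf[64];
        size_t n = hw_.readEndpoint(ep, buf, sizeof(buf));
        if (n > 0) fn_.onDataFromHost(buf, n);
    }

    // Called by the function driver: push data down toward the host.
    void sendToHost(uint8_t ep, const uint8_t* data, size_t len) {
        hw_.writeEndpoint(ep, data, len);
    }

private:
    UsbDeviceControllerDriver& hw_;
    UsbFunctionDriver&         fn_;
};
```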

1. PC to device via USB serial

Most new PCs and laptops do not provide serial or parallel ports; they have been replaced with USB ports. Hence, connecting a PC to an embedded device via its RS-232 port is no longer possible. As part of their USB host stacks, popular PC OSs include Communication Class Drivers (CDCs). As shown in Figure 1, if


EDA Tech Forum June 2009

FIGURE 2 PC to device via USB disk (Source: Micro Digital)

FIGURE 3 Web server access via USB RNDIS (Source: Micro Digital)

the embedded device has a Serial/CDC function driver, then it will look like a serial device to the PC. When it is plugged in, it will be recognized by the PC OS as a serial device, and it will be automatically assigned a COM port number. Then, terminal emulators and other serial applications can communicate with the embedded device without any modification. This use of USB is particularly good for control and transferring serial data. Transfer rates of 800 KB/sec are feasible at full speed and 2,500 KB/sec at high speed for medium speed embedded processors.
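From the embedded application’s point of view, all of this machinery reduces to something that behaves like an ordinary serial port. The fragment below sketches that idea with an invented serial-style API (sio_open, sio_read and sio_write are placeholders, not the calls of any actual CDC function driver):

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical handle and calls standing in for a CDC function driver's
// serial-style API; a real stack would supply its own equivalents.
struct SerialPort;
SerialPort* sio_open(const char* name);
size_t      sio_read(SerialPort* port, uint8_t* buf, size_t maxLen);   // blocks for data
size_t      sio_write(SerialPort* port, const uint8_t* buf, size_t len);

// Echo task: whatever the terminal emulator on the PC sends over the
// virtual COM port comes straight back to it.
void cdc_echo_task() {
    SerialPort* port = sio_open("usb-cdc0");   // the USB serial channel
    uint8_t buf[64];                           // one full-speed bulk packet
    for (;;) {
        size_t n = sio_read(port, buf, sizeof(buf));
        if (n > 0) {
            sio_write(port, buf, n);
        }
    }
}
```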

2. PC to device via USB disk

Another way of connecting a PC or laptop to an embedded device is for the embedded device to emulate a USB disk drive. Popular PC operating systems have built-in USB mass storage

class drivers that interface their file systems to the USB host stack, as shown on the left of Figure 2. Adding a mass storage function driver to the embedded device enables it to look like a USB disk drive to the PC. The figure also shows how a resident flash memory can be accessed as a flash disk via the USB function driver connected to its flash driver. Any other type of storage media could be connected, instead, via its own driver. When the embedded device is plugged into a PC, it is recognized as a disk drive and automatically assigned a drive letter. Thereafter, files can be dragged and dropped to and from the embedded device as though it were a disk drive. In this example, a PC application could read and write the files on the flash disk. Note that the embedded application uses a local file system to access the flash disk itself. This file system must, of course, be OS-compatible. An important concept to Continued on next page

13


< COMMENTARY > USB FOCUS

FIGURE 4 USB multi-port serial device with UART and other connections (Source: Micro Digital)

FIGURE 5 USB composite devices (Source: Micro Digital)

understand is that within the PC, the PC’s file system is used and the embedded device merely looks like another disk drive to it. This use of USB would be particularly good for uploading acquired data files or downloading new versions of code files.
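The glue that makes this work on the embedded side is a small block-device interface: both the local file system and the mass storage function driver read and write the same logical 512-byte sectors. A minimal sketch of such an interface, with invented names, might look like this:

```cpp
#include <cstdint>

// Hypothetical block-device interface shared by the local file system and
// the USB mass storage function driver. The PC never touches the flash
// directly; it only issues reads and writes of logical sectors.
class BlockDevice {
public:
    static constexpr uint32_t kSectorSize = 512;

    virtual ~BlockDevice() = default;
    virtual bool     readSector(uint32_t lba, uint8_t* buf) = 0;
    virtual bool     writeSector(uint32_t lba, const uint8_t* buf) = 0;
    virtual uint32_t sectorCount() const = 0;
};

// A flash disk driver would implement BlockDevice on top of the raw flash
// I/O driver (handling erase blocks, wear levelling and so on). The mass
// storage function driver then maps the host's SCSI READ/WRITE commands
// onto readSector()/writeSector(), while the local file system uses the
// same interface for its own accesses.
```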

3. Web server access via USB RNDIS

RNDIS (Remote Network Driver Interface Specification) permits emulating Ethernet over USB. It is not part of the USB specification, but some popular PC OSs, such as Windows and Linux, support it. As shown in Figure 3 (p. 13), adding an RNDIS function driver to an embedded device allows for the interfacing of its USB device stack to its TCP/IP stack, which in turn, connects to its Web server. When the embedded device is plugged into a PC, its browser can connect to the Web server in the embedded device. Hence, it is possible to use a browser to access an embedded device’s Web server, even when there is no Ethernet connection or it is difficult to access. This can be convenient for field troubleshooting or configuration using a laptop. The same information accessed via the network to which the embedded device is connected can be accessed via USB.
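In practice, the RNDIS function driver simply registers itself with the TCP/IP stack as one more network interface. The sketch below shows the shape of that glue with invented names (NetInterface and tcpip_register_interface are illustrative, not APIs from any specific stack):

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical network-interface descriptor expected by the TCP/IP stack.
struct NetInterface {
    const char* name;
    uint8_t     mac[6];
    // Called by the TCP/IP stack to transmit an Ethernet frame; the RNDIS
    // function driver wraps it in an RNDIS packet message and hands it to
    // the USB device stack.
    bool (*transmit)(const uint8_t* frame, size_t len);
};

// Invented registration call: makes the interface visible to the stack so
// the embedded web server can be reached over USB exactly as over Ethernet.
bool tcpip_register_interface(NetInterface* nif);

// Provided by the RNDIS function driver (declaration only in this sketch).
bool rndis_transmit(const uint8_t* frame, size_t len);

void attach_rndis_interface() {
    static NetInterface usb_eth = {
        "usb-rndis0",
        {0x02, 0x00, 0x00, 0x12, 0x34, 0x56},  // locally administered MAC address
        rndis_transmit
    };
    tcpip_register_interface(&usb_eth);
}
```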

4. USB multi-port serial device with UART and other connections

In Example 1 we examined the implementation of one serial channel over a USB connection. However, it is actually possible to run multiple, independent serial channels over one USB connection. This is practical because of the higher speed of USB compared with other similar technologies. Figure 4 shows the block diagram.

The CDC ACM class driver in the PC may not be the native driver that comes with the PC OS. A special driver may need to be installed. This driver presents multiple virtual COM ports to the PC application and it multiplexes the corresponding serial channels over the USB connection. In the embedded device, the USB CDC function driver de-multiplexes the serial channels. Note that, in this example, one channel goes to an application task, which might return certain internal information, and the other two serial channels connect to actual UARTs. The application in the PC can communicate with physical devices (e.g., modem, barcode reader, printer, etc.) connected to the UARTs as though they were connected directly to serial ports on the PC. For example, with a medium performance processor and full speed USB, a total bandwidth of 200KB/sec is achievable. This would support fifteen 115.2Kbaud channels, with capacity left over.
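The arithmetic behind that estimate is worth spelling out: at 115.2Kbaud with roughly 10 bits per character (start, eight data, stop), each channel carries about 11.5KB/sec, so fifteen channels need about 173KB/sec of the 200KB/sec budget. A quick check:

```cpp
#include <cstdio>

int main() {
    const double baud_rate     = 115200.0;  // bits per second on each UART
    const double bits_per_char = 10.0;      // start + 8 data + stop
    const int    channels      = 15;
    const double usb_budget_kb = 200.0;     // KB/sec over full-speed USB (from the article)

    double per_channel_kb = baud_rate / bits_per_char / 1000.0;  // ~11.5 KB/sec
    double total_kb       = per_channel_kb * channels;           // ~172.8 KB/sec

    std::printf("per channel: %.1f KB/sec, %d channels: %.1f KB/sec, budget: %.0f KB/sec\n",
                per_channel_kb, channels, total_kb, usb_budget_kb);
    return 0;
}
```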

Continued on page 16




16

< COMMENTARY > USB FOCUS

5. USB composite devices

It is actually possible for one USB device to look like multiple USB devices to a USB host simultaneously. This is made possible by the USB Composite Device Framework, as shown in Figure 5 (p. 14). The USB host (a PC in this example) will recognize each USB device within the embedded device and load its corresponding class driver. The device looks like a USB disk and a serial port. Note that both function drivers are present. This example is a fairly common case that is supported by PC OSs. This particular one would support an application in the PC transferring files, and another application allowing an operator to control or configure the embedded device.

6. USB thumb drive support

FIGURE 6 USB thumb drive support (Source: Micro Digital)

Figure 6 shows how an embedded device can access a USB thumb drive (aka USB memory stick). A mass storage class driver fits between the USB host stack and the local file system in the embedded device. It creates the usual read/write logical address API expected of media drivers. Naturally the file system must be OS-compatible in order to exchange thumb drives with a PC. Thumb drives are commonly used to transfer data from embedded devices to PCs or to update firmware or configuration settings and tables in embedded devices.

Yingbo Hu is R&D embedded software engineer at Micro Digital, and Ralph Moore is the company’s president.



intelligent, connected devices. Choose your architecture wisely. 15 billion connected devices by 2015.* How many will be yours? intel.com/embedded * Gantz, John. The Embedded Internet: Methodology and Findings, IDC, January 2009. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries. 2009 Intel Corporation. All rights reserved.



18

< TECH FORUM > EMBEDDED

Embedded software virtualization comes of age

Robert Day, LynuxWorks

Robert Day is vice president, marketing at LynuxWorks and has more than 20 years of experience in the embedded industry. He is a graduate of The University of Brighton, England, where he earned a Bachelor of Science degree in computer science.

A major problem facing embedded software developers is how to easily migrate applications onto new hardware. The risks and difficulties of migration often force developers to stick with outdated software and hardware platforms, making it difficult and costly to add new competitive features to their embedded products. Undertaking development in high-level languages (e.g., C and C++) and using RTOS platforms with open standards interfaces (e.g., POSIX) can make the transition a little easier, but there are still major porting efforts required when the hardware changes. Much of the time, the embedded software engineer just wants to run his legacy system on the new hardware without having to change anything, and add new features alongside it to, say, offer a more modern user interface or perhaps to pick up new communication protocols to talk to the outside world. Interestingly enough, something similar was recently needed in the IT world, where multiple applications and versions of OSs needed to run on multiple hardware platforms. The prevailing solution in IT has been using software virtualization. It provides a common software environment in which to run different ‘guest’ OSs and their applications across many hardware environments. This virtualization—or ‘hypervisor’—technology is now available to embedded developers, offering similar benefits for the reuse and migration of legacy platforms. Another interesting benefit of using an embedded hypervisor when combined with a software separation kernel is that it allows a traditional embedded RTOS to be resident on the same physical hardware as a more traditional desktop OS like Windows or Linux without compromising the real-time performance and determinism of the system. The separation kernel is a modern version of an RTOS that includes the ability to safely and securely partition both time and memory for the different software applications running in the system. The applications run in their own virtual machines that are protected from each other

FIGURE 1 New and legacy applications on a multicore system (Source: LynuxWorks). Partition 1 (core 1) runs a GUI application on a guest OS over a full virtualization API; partition 2 (core 2) runs a new application on a new RTOS and partition 3 (core 2) runs a legacy application on a legacy RTOS, both over para-virtualization APIs; all sit on the separation kernel and hypervisor on multicore hardware.

by using the separation kernel and the memory management unit of the processor. Partitioned OSs have been used in safety-critical systems (e.g., avionics applications) for a while, but the separation kernel adds two new components, security and multi-processor support, that make them still more widely applicable. In a safety-critical system, a partitioned OS is used to guarantee that any fault condition is isolated to the application in question and cannot corrupt or contaminate other applications that are running on the same hardware. At the same time, there are guarantees that each application gets its required time slice regardless of what else is happening, based on predetermined priorities and time allocations. In a system requiring security functionality, the separation kernel must have the ability to stop malicious faults or attacks that have entered into the system through a breach in one of the applications. The separation kernel can also help with another interesting challenge, that of embedded software development on multicore devices. The traditional RTOS has been good at distributing functionality on a single device, as it can control and marshal all the resources that the single processor can control. In a multicore system there is a need to control


EDA Tech Forum June 2009

and synchronize resources and applications that are spread over multiple cores. Running multiple instantiations of a traditional RTOS can deal with the spread of different applications across different cores, but is typically somewhat limited in the communication between, and the synchronization across those applications. A well designed separation kernel can actually run one instantiation of itself in a multicore system. Using the partitioning and secure communication mechanisms described above, it can allow partitioning of the system across multiple cores and allow communication paths and synchronization mechanisms between applications regardless of which core they are running on. Separation kernels have a great deal of flexibility and yet offer a greater degree of security than traditional embedded RTOSs, and are likely to become very widely used over a broad range of different embedded applications on many different hardware platforms. This gets really interesting when the separation kernel is married to a software hypervisor. The embedded hypervisor is designed to be smaller and more efficient than its IT cousins, but still gives the benefit of being able to run different ‘guest’ OSs on top of an RTOS. The hypervisor and separation kernel combination gives a new dimension to embedded software development, offering the ability to run multiple guest OSs and applications securely separated by a real-time system. The combined use of multicore processors, a separation kernel and a hypervisor allows developers to partition their system with guest OSs spread over multiple cores, and retains the ability to control communications between them using a common separation kernel. Many new multicore processors also offer hardware virtualization to allow hypervisors to more efficiently run guest OSs. Although much of this technology has been built for the IT world, it offers some very compelling benefits for embedded users. By allowing the hardware to help with software virtualization, for example the VT-D and VT-X technology provided on Intel processors, it enables the hypervisor to leave many of its instruction and data handling operations to hardware. This boosts the performance of running guest OSs to near native performance,

New separation kernels and embedded hypervisors can help ease the pain of migrating legacy systems to new hardware platforms, including multicore processing systems. Bringing multiple OSs and applications on to the same hardware also opens up new possibilities for combining systems that offer real-time performance within a familiar GUI environment. The LynxSecure separation kernel and embedded hypervisor from LynuxWorks is one of the latest generations of this technology, and is used to illustrate these new capabilities. and does not compromise the real-time attributes of the guest applications even though they are running in a virtualized environment. Rather like a memory management unit, hardware virtualization also prevents the guest OSs from interfering with other parts of the system without having to rely totally on software management. This increases the security of the system without degrading the real-time performance and determinism. For many embedded systems, performance is a key factor, and this separates the new generation of embedded hypervisors from traditional IT-centric offerings. There are different types of software virtualization available to the embedded developer, offering the best control over a guest OS. The first is not really virtualization or a hypervisor, but rather a software emulation package. This is often provided under the guise of virtualization, but it has a very important element that needs to be clarified—that of performance. A software emulator essentially translates the machine instructions of one processor on top of the target hardware, and is available as an application that can run on top of a partitioned OS. This software emulator then can present a full emulated environment to a run a guest OS on. This can be very appealing as it appears to give the same functionality as a hypervisor, but there is a huge performance hit as the emulator is having to translate each instruction for both the guest OS and the applications running on it. A true hypervisor will assist in the running of guest OSs, but they will still be running on the processor for which they were designed, removing the need for emulation and hence increasing performance. Continued on next page

19


< TECH FORUM > EMBEDDED

Another approach that can be used with this technology is that of full virtualization. Here, the same hypervisor can offer a virtualization environment to the guest OS that is akin to running on the native hardware. This requires no changes to the guest OS, but because the hypervisor is adding more virtualization, there is a small performance hit in comparison to the para-virtualized approach. An advantage of full virtualization is that OSs where the source cannot be para-virtualized (e.g., Windows) can still be run in this embedded system. The LynxSecure separation kernel and embedded hypervisor from LynuxWorks offers both para- and full virtualization as well as the ability to run these different types of virtualization simultaneously in the same system. This allows the embedded developer to pick and choose the type of virtualization based on what guest OSs are required and the performance aspects of different parts of the system. This elegant solution has unlimited flexibility without sacrificing the performance required by embedded systems. Returning to the original issue of code migration, this new technology not only allows applications to be migrated to new hardware, but also for them to use the OSs that they were originally run on. The use of the separation kernel also

Virtualization technology is now available to embedded developers, offering major benefits for the reuse and migration of legacy platforms. Two types of hypervisor virtualization are available: paravirtualization and full virtualization. Para-virtualization is a term used for a guest OS that has been modified to run on top of a hypervisor. In this case, the virtualization environment on which the embedded applications run has been optimized for performance, both for the processor environment and for the hypervisor. This approach offers a near native execution performance for the embedded applications and OSs when combined with hardware virtualization extensions, and can be used to host true RTOSs in their own partitions.

FIGURE 2 The LynxSecure separation kernel and embedded hypervisor offers ultimate flexibility for new embedded designs (Source: LynuxWorks). Subjects hosted on the separation kernel and hypervisor include LynxOS-SE running POSIX and ARINC 653 applications, fully virtualized Windows and Linux guests running their native applications, and lightweight application run-time (ART) subjects, each with virtual CSP/BSP/drivers, linked by a CDS guard and inter-subject communication, with exceptions and interrupts, device I/O and memory management handled by the hypervisor layer.



22

< TECH FORUM > EMBEDDED

allows these legacy systems to be contained in their own virtual machine, and even allocated their own core in a multicore system. Figure 1 (p. 18) shows a legacy RTOS application running in its own partition, sharing a core with a new set of applications on a new RTOS, while the other core has the GUI element with a fully virtualized guest OS like Windows for a familiar man machine interface. This scenario is likely to become quite common in many embedded industries from industrial controls through to medical, financial, military and automotive systems. It brings together a real-time system (e.g., data collection from a networked sensor system) with a more traditional GUIbased OS such as Windows or Linux. This combination on a single piece of hardware has always been a challenge, as both systems have individually required control of underlying hardware, which could compromise both the real-time performance aspects as well as the security and integrity of both systems. The separation kernel and hypervisor allows both the RTOS and GUI OS to each run in their own partition, possibly each on their own core in a multicore system, both protected from one another, but allowing each to securely communicate with each other. Because the software environment is virtualized, any legacy applications will also run as before, removing the need

for a large amount of recoding. The RTOS will be able to run in real time because the underlying separation kernel still provides a deterministic environment; the GUI OS will believe it has full control of the processor, and will perform as if it has its own dedicated machine. This system will then be able to offer a familiar user interface to control and display the data, while the RTOS busily collects the information without compromise.
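To make the scenario of Figure 1 a little more concrete, the sketch below enumerates such a system as data: which guest runs in which partition, on which core, with what memory and time allocation, and which inter-partition channels exist. The structures and values are invented for illustration and do not represent the LynxSecure configuration format or any other product's.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative only: a made-up description of the Figure 1 system, not a
// real separation-kernel configuration format.
enum class GuestType { FullyVirtualized, ParaVirtualized };

struct Partition {
    const char* name;
    GuestType   type;
    int         core;        // physical core the partition is pinned to
    uint32_t    memory_mb;   // private memory, enforced through the MMU
    uint32_t    window_us;   // guaranteed time slice per major frame
};

constexpr Partition kPartitions[] = {
    {"gui-windows",  GuestType::FullyVirtualized, /*core*/ 1, /*MB*/ 512, /*us*/ 10000},
    {"new-rtos-app", GuestType::ParaVirtualized,  /*core*/ 2, /*MB*/  64, /*us*/  2000},
    {"legacy-rtos",  GuestType::ParaVirtualized,  /*core*/ 2, /*MB*/  32, /*us*/  2000},
};

// Secure inter-partition channels: the only paths by which the GUI
// partition can see data collected by the real-time partitions.
struct Channel {
    const char* from;
    const char* to;
    size_t      buffer_bytes;
};

constexpr Channel kChannels[] = {
    {"legacy-rtos",  "gui-windows", 4096},
    {"new-rtos-app", "gui-windows", 4096},
};
```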

LynuxWorks 855 Embedded Way San Jose CA 95138-1018 USA T: 1 408 979 3900 W: www.lynuxworks.com

Do you wake up at night thinking about your project? We’ve got just what you need to see clearly each morning.

The DV Notebook gives you real-time status in any web browser. Achilles software automatically extracts, gathers, and organizes key results from your design and verification files as they are generated.

www.achillestest.com


Embedded Prototyping. Simplified.

Traditional Prototyping Tools

Graphical System Design Tools

Get to market faster and reduce development costs with graphical system design, an approach that combines open, graphical software and off-the-shelf hardware to help you quickly iterate on designs and easily implement them on an NI embedded platform. The NI CompactRIO system offers an ideal embedded prototyping platform with a built-in microcontroller, RTOS, programmable FPGA, integrated signal conditioning, and modular I/O, as well as tight integration with intuitive NI LabVIEW software.

>>

Learn how to simplify embedded design at ni.com/embedded

©2009 National Instruments. All rights reserved. CompactRIO, LabVIEW, National Instruments, NI, and ni.com are trademarks of National Instruments. Other product and company names listed are trademarks or trade names of their respective companies. 2009-10794-305-101-D

888 279 9833


< TECH FORUM > ESL/SYSTEMC

Using a TLM virtual system prototype for hardware and software validation

Alon Wintergreen, Mentor Graphics

Alon Wintergreen has worked for Mentor Graphics as technical support engineer specializing in ESL and Summit Design tools since 2006. He holds a BS in electrical engineering from the Technion, the Israel Institute of Technology, Haifa.

FIGURE 1 A scalable TLM approach (Source: Mentor Graphics). The TLM reference platform separates function from interface; separates timing and power from function for incremental refinement; and aligns with hardware and software requirements for speed and fidelity.

In consumer electronics, missing a market window by even a few weeks can result in drastically limited sales. These cost- and schedule-sensitive applications are at the same time among the most challenging to create. Composed of complex hardware blocks, they typically include sophisticated digital circuitry coupled with large memories to provide advanced computational and multimedia capabilities. Frequently battery-powered, they have stringent power restrictions despite the demand for each generation to support ever more features and capabilities. With all the complexity associated with the hardware, the software is also crucial to the competitive success of these products. The application software is often the key differentiator, allowing the system company to reap substantial profit margins. Software is also increasingly important with regard to the power and performance behavior of the hardware platform. In traditional product development flows, the software team waits to validate their code on prototype hardware. While this approach worked well in the past, it fails under current technical and time-to-market pressures. According to industry research firm Venture Development Corporation, nearly 40% of project delays can be traced back to flaws in the system architecture design and specification. This problem exists because finding and fixing hardware/ software design errors at the late, physical prototype stage is both very difficult and very time-consuming. Moving hardware/software validation to an earlier point in the design flow enables both groups to quickly model their designs, assess the functionality and attributes of the entire system, and make changes that deliver huge returns in performance, power consumption and system size without endangering deadlines. The conclusion is clear: starting application software and firmware development against a high-level hardware model can save significant development time, and yield products that meet or exceed consumer expectations.


Conducting software validation earlier

A new system design methodology is emerging in response to this pressing need for hardware/software validation early in the design cycle. It is based on the creation of high-level hardware models that describe functionality in sufficient detail for the software team to use them as a development platform, even when hardware design is in its nascent stages. Thus, software developers can start application and firmware validation during the initial stages of the design cycle, where changes are easiest, have the most impact on the characteristics of the final design, and raise little risk of jeopardizing the product launch date. The methodology is based on a scalable transactional level modeling (TLM) concept that describes the hardware in SystemC. It provides benefits during both hardware and


EDA Tech Forum June 2009

The article describes how a methodology based around scalable transaction level modeling (TLM) techniques can be used to enable software design to begin far earlier in a design flow and thus allow companies to bring designs to market faster, particularly in time-sensitive sectors.

software development. Not only can the software team begin coding much earlier, but TLM hardware descriptions also provide much faster verification times—100x or more. On the hardware side, TLM allows for compact descriptions because the hardware system blocks are captured at a higher level and communicate by function calls, not by detailed signals, significantly reducing simulation time. The TLM model does not limit the design creativity of the hardware team. TLM allows the separation of functionality from implementation. Hence, instead of forcing engineers to commit to hardware specifics early in the design cycle, the model simply describes the functionality of the hardware, not the details of how the hardware achieves that functionality. It also enables incremental model fidelity for timing and power. In essence, the TLM model is independent of the hardware mechanics, allowing the hardware team to continually refine the design without having to

It is based on the creation of high-level hardware models that describe functionality in sufficient detail so that the software team can use them as a development platform, even if hardware design is in its earliest stages. Thus, software developers can even start application and firmware validation during the initial stages of the design, when changes are easiest, have the most impact on the characteristics of the final design, and there is little risk of missing a market deadline. constantly update the high-level virtual prototype. A scalable TLM approach separates function from interface, timing and power from function for incremental refinement, and aligns with hardware and software requirements for speed and fidelity (Figure 1). At the same time, software development can align with hardware development from the very earliest stages, allowing system interaction issues to be identified and resolved from the outset, dramatically minimizing their potential impact on the schedule. As a result, this methodology moves software/hardware integration into the electronic system level (ESL).
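By way of illustration only (this sketch is not taken from the article; the module name and register map are invented), a simple PV-style, loosely timed hardware block written against the standard Accellera/OSCI TLM-2.0 utilities might look like this:

#include <cstring>
#include <systemc>
#include <tlm>
#include <tlm_utils/simple_target_socket.h>

struct pv_reg_block : sc_core::sc_module {
    tlm_utils::simple_target_socket<pv_reg_block> socket;
    sc_dt::uint64 regs[4];   // behavior modeled as plain storage: function, not implementation

    SC_CTOR(pv_reg_block) : socket("socket") {
        socket.register_b_transport(this, &pv_reg_block::b_transport);
        for (int i = 0; i < 4; ++i) regs[i] = 0;
    }

    // One blocking function call replaces all signal-level bus activity.
    void b_transport(tlm::tlm_generic_payload &trans, sc_core::sc_time &delay) {
        const sc_dt::uint64 idx = trans.get_address() >> 3;   // 64-bit word index
        if (idx >= 4 || trans.get_data_length() != 8) {
            trans.set_response_status(tlm::TLM_ADDRESS_ERROR_RESPONSE);
            return;
        }
        if (trans.is_write()) std::memcpy(&regs[idx], trans.get_data_ptr(), 8);
        else                  std::memcpy(trans.get_data_ptr(), &regs[idx], 8);
        delay += sc_core::sc_time(10, sc_core::SC_NS);         // timing can be refined incrementally
        trans.set_response_status(tlm::TLM_OK_RESPONSE);
    }
};

An initiator, such as a processor model or the application software under development, reaches this block through a single b_transport call rather than through pin-level signals, which is what makes the simulation so much faster.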

FIGURE 2 Ideal ESL design modeling (Source: Mentor Graphics). A single scalable TLM model spans the range from untimed concept models through loosely, approximately and accurately timed virtual prototypes for architecture exploration, down to RTL implementation, trading simulation speed (MIPS) against accuracy (functional, register, transaction, protocol and bit accurate).


FIGURE 3 Scalable transaction level modeling (Source: Mentor Graphics). The figure relates abstraction levels to relative simulation speed: ANSI C++ system/algorithm models (100,000x, about 1 second), untimed TLM HW/SW prototypes (10,000x), timed TLM for architectural analysis (1,000x), bus-cycle-accurate models for HW verification (10x), and RTL implementation (1x, about 7 days).

Using the Programmer's View for software application validation
TLM has several levels of abstraction, all of which support virtual prototyping and hardware/software co-design. However, there are some trade-offs involved. The very highest level, known as the Programmer's View (PV) level, is a good point at which to begin software validation. At this stage, the SystemC hardware description does not include any timing information and therefore the simulation is extremely fast: at least 1,000 times faster than at the RTL level. The TLM model contains sufficient information to describe the hardware functionality to support software application development.

Interface declarations are included so the software can connect with the hardware side. Specifically, there are two kinds of interfaces. The first is a high-level method interface that the software engineer can call from a program; the method 'runs' the hardware design and returns the result value. The second is a bus-cycle-accurate interface based on memory-mapped registers on the hardware side, allowing the hardware and software sides to interact through read and write transactions along with interrupts. Such a hardware/software interface is achieved either by incorporating an instruction set simulator (ISS) or by using a host-mode technology that uses read/write implicit access. An implicit access captures all the accesses to hardware by identifying the memory space calls. It allows software to run on any host processor (rather than the target processor) and simplifies software programming because the software engineer does not need to instrument the code with any external API calls. Host-mode execution often offers much faster simulation with slightly less accuracy than using the traditional ISS.
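As a purely illustrative sketch (the interface and function names below are invented and are not part of any Mentor API), the two kinds of interface might be exposed to application code along these lines:

#include <cstdint>
#include <vector>

// Hypothetical interface; the article does not define an API.
struct pv_hw_if {
    // 1) High-level method interface: the call 'runs' the hardware model
    //    and returns the result value directly to the caller.
    virtual std::vector<uint8_t> run_block(const std::vector<uint8_t> &input) = 0;

    // 2) Bus-cycle-accurate interface: software sees memory-mapped registers
    //    and interacts through read and write transactions (and interrupts).
    virtual void     write_reg(uint32_t address, uint32_t value) = 0;
    virtual uint32_t read_reg(uint32_t address) = 0;

    virtual ~pv_hw_if() = default;
};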

The firmware development environment
Software teams have traditionally been forced to wait for a hardware prototype to develop the firmware because of the level of detail required for validation. However, this aspect of the hardware/software interaction can now be moved much earlier in the design cycle by using TLM models. At this point, the hardware team should nevertheless introduce detailed timing information because of its potential influence over the behavior of the firmware. The abstraction level is now bus-cycle-accurate, and here software engineers can decide whether they want to work on the target OS (in which case they will use ISS models accompanied by the software development tools) or on any host OS of their choice (in which case they will use bus-functional models and implicit-access functionality). This enables the firmware code to interact through bus-functional models with the hardware design. Working in the host OS environment of choice using the cycle-accurate model, any read/write operation will be mapped to the hardware and interact with an actual address in the hardware. An example of this type of implicit access is:



*addr1 = value1; // write access to mapped address - addr1
value2 = *addr2; // read access from mapped address - addr2

There are several specific debugging functionalities for firmware-related verification tasks. For instance, the design team can manage both the hardware and software environments in one IDE tool. They can also perform debugging operations such as assigning breakpoints on both sides, and carry out hardware/software transaction debugging. And they can view all the transactions (read/write/interrupts) and associated information passing between hardware and software, and break on any specific type of transaction or its parameters.

Selecting hardware verification methods
When it comes to hardware verification and debug, one of two approaches is usually taken. The first involves the use of ISS models and software development environments at the highest TLM level (fast ISS models) or at the cycle-accurate level as described earlier. The second approach adopts the emulation of software threads within the SystemC hardware design. As opposed to the previous methods, where software is linked through an ISS or host mode, here it is embedded within the hardware CPU model as additional SystemC threads that execute directly with the hardware in one simulation process. This second option is used specifically for system performance exploration since it offers very high simulation speed while being less accurate, with no support for an RTOS. In this approach, which is used mainly by system architects, it is also possible to use 'token-based' modeling, which allows high simulation performance.

In the first approach, the PV and the cycle-accurate models can also interact with SystemC verification solutions. They can be connected to existing ISS SystemC models, either at the PV level or to cycle-accurate ISS solutions at TLM's Verification View level. Software developers can work on the real target OS if the host mode is not accurate enough for them. If the ISS model (or models) and associated software development tools can be fully synchronized with the SystemC hardware description of the system, the target software development can also start earlier in the design cycle.

In the second approach, we define a sub-level of abstraction called the Architect's View. It includes some timing information and simulates faster than cycle-accurate models, but is not as accurate. This level is mainly used by system architects for performance analysis. Here, the methodology includes a set of configurable hardware models at that abstraction level (e.g., generic buses, generic processors, generic DMA, data generators). Using this methodology, the system architect can define hardware and software partitioning as well as target processors, bus architectures and memory hierarchies. Equally important, he or she can add timing and power metrics.

This level also supports token-based modeling, an abstract high-level modeling method that uses tokens (or pointers) to represent the data structure, resulting in exceptionally fast simulation, an important requirement for system performance analysis. In addition, performance analysis functionalities can be used with custom models, so that system architects can run software emulation as a testbench for the system performance analysis task. Think of this as a software emulation that runs as SystemC threads and is therefore part of the hardware simulation, but runs extremely fast. This capability can be used by the system architect at the highest level to find the best architecture. The tokens or pointers result in very accurate modeling for use in measuring system performance. The system engineer can manipulate the parameters of the different blocks and test various configurations and use cases until the required performance is reached.
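To illustrate the second approach, here is a minimal, invented sketch (not from the article, with assumed names and timing) of software emulated as a SystemC thread inside a CPU model, with token-based (pointer) data passing:

#include <systemc>

struct frame_token { int id; unsigned bytes; };      // a token stands in for the real payload

SC_MODULE(cpu_with_sw_emulation) {
    sc_core::sc_fifo<frame_token*> to_accelerator;    // abstract channel to a hardware block

    SC_CTOR(cpu_with_sw_emulation) : to_accelerator(4) {
        SC_THREAD(sw_emulation);                       // the 'software' runs as a SystemC thread
    }

    void sw_emulation() {
        for (int i = 0; i < 8; ++i) {
            frame_token *tok = new frame_token{i, 200u * 128u * 3u};
            to_accelerator.write(tok);                 // only the pointer moves, so simulation stays fast
            sc_core::wait(sc_core::sc_time(500, sc_core::SC_US));   // assumed frame period
        }
        // A consumer module (not shown) would read and delete the tokens.
    }
};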

Integrating development
In markets that are extremely sensitive to cost and schedule slips, hardware and software teams need to work together from the very outset to meet product launch windows. The emerging, scalable TLM methodology described above moves software and firmware validation to the earliest stages of the design cycle, benefiting both teams. Software designers can now validate their applications and firmware long before hardware prototypes are available. At the same time, the hardware team can concentrate on hardware refinement without having to continually update models for software validation. By aligning the software and hardware flows at the earliest point possible, this approach minimizes integration risks downstream in the design flow. The result is a significantly reduced threat of schedule slips even as the design team maximizes product differentiation. The use of scalable TLM models is a crucial step in bridging software and hardware design methodologies, bringing us closer to the ultimate goal: true concurrent design.

Mentor Graphics Corporate Office 8005 SW Boeckman Rd Wilsonville OR 97070 USA T: +1 800 547 3000 W: www.mentor.com




< TECH FORUM > ESL/SYSTEMC

Bridging from ESL models to implementation via high-level hardware synthesis
Jérôme Lemaitre, CoFluent Design
Jérôme Lemaitre is solution specialist at CoFluent Design. His fields of interest include architectural exploration and performance analysis, MPSoC/NoC design and rapid prototyping. He completed a postgraduate degree in 2002 at ESPEO, University of Orléans, France, majoring in embedded systems.

The article describes a methodology that bridges the gap between the SystemC transaction-level models (TLMs) used for architectural exploration and the SystemC cycle-accurate models of hardware that typically follow much later in a design flow, after many sensitive decisions have been made. The behavior of the cycle-accurate models can be verified in the complete system by comparing it with the reference TLMs. The reference model of the complete system then serves as a testbench for the verification and integration of the cycle-accurate models. This flow allows designers to evaluate alternative architectures early in a project with low modeling effort, and they can integrate cycle-accurate models later on to progressively replace the TLMs. Exact timing and performance properties obtained from simulating cycle-accurate models (e.g., power consumption, resource load) are used to back-annotate the reference models. This increases the level of confidence in decisions made when exploring the design space. The methodology is illustrated via a case study involving a JPEG system, using Mentor Graphics' Catapult Synthesis and CoFluent Design's CoFluent Studio tools to provide a complete ESL flow, from architectural exploration to hardware implementation.

Cycle-accurate models very precisely predict the behavior and performance of hardware and software components. Behavioral and performance transaction-level models (TLMs) enable hardware/software partitioning decisions to be made at the electronic system level (ESL) early in the development phase, long before cycle-accurate models become available. The problematic gap between these different types of models is known as the ESL implementation gap. Figure 1 shows a proven methodology that bridges the gap from SystemC TLMs for architectural exploration to SystemC cycle-accurate models of hardware. The behavior of the cycle-accurate models can be verified in the complete system by comparing it with the reference TLMs. The reference model of the complete system serves as a testbench for the verification and integration of the cycle-accurate models. This flow allows designers to evaluate alternative architectures early in a project with low modeling effort, and to integrate cycle-accurate models later on to progressively replace the TLMs. Exact timing and performance properties obtained from simulating cycle-accurate models (e.g., power consumption, resource load) are used to back-annotate the reference models. This increases the level of confidence in decisions made when exploring the design space.

JPEG system application modeling

A JPEG system will be used to demonstrate the design flow. It consists of a still camera and a JPEG encoder/decoder (Figure 2). The Camera includes a Controller and two image Sensors. The JPEG encoder/decoder consists of two subsystems (JPEG1 and JPEG2), each processing the images acquired by a sensor. The subsystems consist of three functions: MBConverter, MBEncoder and JPEGSave. The structures of the two subsystems are different for MBEncoder. The test case focuses on the transaction-level and cycle-accurate modeling of the MBEncoder1 and MBEncoder2 functions, the mapping of these functions onto a multiprocessor platform, and the resulting performance of multiple candidate architectures in terms of latency, utilization and power consumption. MBEncoder1 is modeled with one computational block (Pixelpipe), as shown on the left of Figure 3. This function contains the entire JPEG sequential C code, which constitutes the reference algorithm tested on a PC.

FIGURE 1 Combining ESL models and cycle-accurate models of HW tasks (Source: CoFluent Design). The flow spans mixed graphical and textual specification in CoFluent Studio, automatic SystemC generation, reference transaction-level timed behavior for architecture exploration and performance analysis, Catapult behavioral synthesis (loop unrolling, pipelining) producing cycle-accurate HW models, cycle-accurate SW/HW co-simulation, and back-annotation of real performance into the reference models.


MBEncoder2 has the same functionality as MBEncoder1. The difference is the granularity: MBEncoder2 is more detailed in order to optimize its implementation, as shown on the right of Figure 3. The separation enables the optimization of image processing by introducing parallelism into the application. Mapping these functions onto a hardware or software processor enables the exploration of their behavior and performance.

Functional TLMs of the encoders
In the TLMs of the encoders, the behavior of the Pixelpipe, DCT, Quantize and Huffmanize blocks is implemented by calling procedures that execute sequential C code provided by Mentor Graphics. This C code operates on Algorithmic C bit-accurate data types. These allow you to anticipate the replacement of the reference model with the cycle- and bit-accurate model obtained after the hardware undergoes high-level synthesis with Mentor's Catapult C software. These are the execution times measured in CoFluent Studio for the functions under consideration:

Computation block   Average execution
DCT                 25.40µs
Quantize            26.09µs
Huffmanize          113.60µs
Pixelpipe           152.06µs


These numbers are obtained by calibrating the execution of the application on a 1GHz PC. The measurements provide initial timing information that is precise enough to map the corresponding functions onto software processing units (e.g., CPU, DSP). To map these functions onto hardware processing units (e.g., ASIC, FPGA), more accurate numbers can be obtained from high-level hardware synthesis. Although the execution time of Pixelpipe is shorter than the sum of the execution times of DCT, Quantize and Huffmanize, the processing of a complete image is shorter with MBEncoder2 (439ms) than with MBEncoder1 (536ms). This is because the DCTMB, QuantizeMB and HuffmanizeMB functions are pipelined, whereas MBEncoder1 has to complete the processing of a macro-block before accepting a new one.
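The Algorithmic C bit-accurate data types mentioned above might be used along the lines of the fragment below. This is an invented illustration of the coding style only, not the actual JPEG reference code; the type widths and function name are assumptions:

#include <ac_int.h>

typedef ac_int<8,  false> pixel_t;   // 8-bit unsigned pixel
typedef ac_int<12, true>  coeff_t;   // 12-bit signed coefficient

// Quantize one coefficient; bit widths are fixed at compile time, so the same
// C++ source can be simulated at the transaction level and later synthesized.
coeff_t quantize_coeff(coeff_t c, ac_int<8, false> q) {
    return (q == 0) ? coeff_t(0) : coeff_t(c / q);
}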

FIGURE 2 JPEG system (Source: CoFluent Design)

FIGURE 3 Structure of MBEncoder1 (left) and MBEncoder2 (right) (Source: CoFluent Design). MBEncoder1 contains a single Pixelpipe block between MBFromConverter1 and MBBitStream1; MBEncoder2 splits the processing into pipelined DCTMB, QuantizeMB and HuffmanizeMB blocks connected by MBDCT and MBQuant channels between MBFromConverter2 and MBBitStream2.

FIGURE 4 Verification of the functional behavior in CoFluent Studio (Source: CoFluent Design)

Also, the processing speed of the pipeline in MBEncoder2 is limited by the HuffmanizeMB function, since it has the longest execution time in the pipeline. The operations are verified by visualizing images in CoFluent Studio and reviewing the timing properties, as shown in Figure 4. Simulating one second of data, with the parallel processing of two images of 200x128 pixels at the transaction level, requires only a few seconds of actual simulation time.

Platform modeling
The complete JPEG application is mapped onto the platform model shown in Figure 5. It consists of an ExternalPlatform, modeled as a hardware processing unit, and a JPEGPlatform. CoFluent Studio offers generic models of hardware elements. These processing, communication and storage units are characterized by high-level behavioral and performance properties that are parameterized to represent physical parts. Multiple and varied platform models can be described quickly, without investing in the expensive development or acquisition of models of specific IP components, such as memories, buses or processors. Simulation of software is not based on instruction-set simulators, as the C code used to describe the algorithms executes natively on the simulation host.

The FPGA has a speed-up defined as a generic parameter named FPGA_SpeedUp, which can vary from 1 to 250, with a default of 10. This parameter represents the hardware acceleration. The speed-up of the DSP is set to 2, meaning that the internal architecture of the DSP is twice as efficient as a general-purpose processor, due to specialized embedded instructions. The test case maps MBEncoder1 and MBEncoder2 onto the FPGA and DSP, with exploration of multiple mapping alternatives. The following assumptions were used: the Camera model is mapped onto the External platform, while the MBConverters and JPGSave are mapped onto the CPU with execution times short enough not to delay the DSP and FPGA. Average execution times can now be updated as follows:

Computation block   SW execution (Kcycles)   HW execution (Kcycles)
DCT                 25.40/2                   25.40/FPGA_SpeedUp
Quantize            26.09/2                   26.09/FPGA_SpeedUp
Huffmanize          113.60/2                  113.60/FPGA_SpeedUp
Pixelpipe           152.06/2                  152.06/FPGA_SpeedUp

FIGURE 5 Platform model (Source: CoFluent Design). Elements include an ExternalPlatform (with SerialLink and DualPortMem) and a JPEGPlatform containing a CPU, a DSP and an FPGA, connected via a LocalBus, a SharedBus and SharedMem.

TABLE 1 Initial performances of the DSP and FPGA in Configuration C (Source: CoFluent Design)



The power consumption for each computation block is described using a simplified law that utilizes the FPGA_SpeedUp parameter. A higher speed-up on the FPGA uses more gates, and therefore increases the power consumption. The power consumption equations are:

Computation block   Power consumption (mW)
DCT                 0.2*FPGA_SpeedUp^(3/2)
Quantize            0.15*FPGA_SpeedUp^(3/2)
Huffmanize          0.2*FPGA_SpeedUp^(3/2)
Pixelpipe           0.25*FPGA_SpeedUp^(3/2)

Mapping and architecture modeling

Architecture description
One image is imposed every 500ms, simultaneously, on MBEncoder1 and MBEncoder2. The three candidate configurations map the encoder functions onto the processing units as follows:

Function       Config. A   Config. B   Config. C
MBEncoder1     FPGA        DSP         FPGA
DCTMB          DSP         FPGA        DSP
QuantizeMB     DSP         FPGA        DSP
HuffmanizeMB   DSP         FPGA        FPGA

By studying the impact of FPGA_SpeedUp on the performance of the system in terms of latencies, resource utilization and power consumption, the best architecture and the minimum value required for the FPGA_SpeedUp generic parameter can be selected. CoFluent Studio's drag-and-drop mapping operation is used to allocate functions to processing units and route data through communication channels. The resulting architectural models are automatically generated in SystemC. Profiling data is automatically collected for each configuration at all hierarchical levels during simulation. The simulations are based on individual properties, described using constants or C variables. This information is displayed in tables, as shown in Table 1. The utilization of the FPGA increases to 200% because two functions (MBEncoder1 and HuffmanizeMB) can be executed in parallel on the FPGA.

Initial exploration results
Early exploration results are based on initial timing properties measured by simulating the reference model. Results in terms of utilization and power consumption on the DSP and FPGA, and processing latencies for the two JPEG encoders/decoders, are given for the default case (FPGA_SpeedUp = 10). Configuration C processes both images with the shortest latencies.

                      Config. A   Config. B   Config. C
Latencies (ms)
  Path with Encoder1  182         366         182
  Path with Encoder2  400         136         136
Utilization (%)
  DSP                 80.01       73.23       25.22
  FPGA                36.64       39.77       63.85
Power cons. (mW)
  DSP                 30.60       33.09       8.53
  FPGA                2.93        2.55        4.71


FIGURE 6 Dynamic power consumption (Source: CoFluent Design)

FIGURE 7 Effects of FPGA speed-up generic parameter on latencies and power consumption (Source: CoFluent Design)

FIGURE 8 Integrating the DCT cycle-accurate model in CoFluent Studio (Source: CoFluent Design). A wrapper converts the transaction-level channels (MBFromConverter2, MBDCT) to the pin-level interface of the cycle-accurate SystemC DCT (Request, Acknowledge, DataIn, DataOut, Enable, Address, Reset, Clock).





                      Config. A        Config. B        Config. C
Latencies (ms)
  Path with Encoder1  6 (7)            366 (366)        6 (7)
  Path with Encoder2  400 (400)        9 (5)            126 (126)
Utilization (%)
  DSP                 80.01 (80.01)    73.23 (73.23)    25.22 (25.22)
  FPGA                1.21 (1.46)      3.38 (1.59)      3.07 (2.55)
Power cons. (mW)
  DSP                 30.60 (30.60)    33.09 (33.09)    8.53 (8.53)
  FPGA                10.66 (14.43)    23.28 (12.04)    27.73 (23.05)

TABLE 2 Exploration after synthesis and back-annotation; bracketed values are from the reference model with FPGA_SpeedUp = 250 (Source: CoFluent Design)

Figure 6 shows that, on average, Configuration C consumes less power than the two other configurations. However, the power consumption on the FPGA is higher for Configuration C. In CoFluent Studio, it is possible to explore the impact of generic parameters at the system level for multiple architectures with a single simulation. The results for all configurations are collected and displayed in the same environment, allowing rapid comparison of architectures. Figure 7 shows the impact of FPGA_SpeedUp on latencies and power consumption. For Configuration C, MBEncoder2 becomes the bottleneck, since the system performance is limited by the DSP. The simulations show that FPGA_SpeedUp = 15 is the minimum and optimal value, and should be set as the objective for the hardware high-level synthesis tool.

FIGURE 9 Configuration C dynamic power consumption profiles, before and after back-annotation (Source: CoFluent Design)

FIGURE 10 Power consumption when mapping two encoders on the FPGA (Source: CoFluent Design)

Calibration of the reference model
In the previous section, the JPEG system was modeled at the transaction level, and system-level decisions were made based on the initial exploration results. In this section, cycle-accurate models obtained from Catapult C high-level hardware synthesis are integrated back into CoFluent Studio for further verification and refinement.

Functional cycle-accurate models
Using Catapult C, the sequential C code that is executed in the computation blocks is converted into cycle-accurate SystemC code. The resulting code is integrated back into CoFluent Studio to verify the behavior of the cycle-accurate models against the reference TLMs. Then, the timing and performance properties of the cycle-accurate models are extracted through simulation to calibrate the architecture exploration process for functions that are mapped onto hardware units.

In order to integrate the cycle-accurate models back into CoFluent Studio, SystemC wrappers are created (Figure 8). They convert incoming transaction-level data to cycle-accurate data that is processed by the detailed model, and vice versa. The wrappers handle interfaces and protocols specific to the detailed model, such as handshakes and memories. It took one day to wrap the detailed models and integrate them into CoFluent Studio. The verification task is simplified since the reference TLM is used as a testbench. The processing of macro-blocks and images can be displayed directly in CoFluent Studio. However, the simulation speed is slower: for this example, it is approximately 400 times slower than the transaction-level simulation. These are the exact properties of the cycle-accurate operations in terms of the (measured) number of clock cycles and the (assumed) power consumption:

Function       Average exec (clock cycles)   Average power cons. (mW)
DCTMB          161                            1000
QuantizeMB     72                             800
HuffmanizeMB   576                            1200
MBEncoder1     303                            1400
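As a rough, invented sketch of what such a wrapper can look like (simplified handshake, assumed data width, signal names loosely following Figure 8; this is not the project's actual code), the transaction-to-pin adaptation might be written along these lines:

#include <systemc>

SC_MODULE(dct_wrapper) {
    sc_core::sc_out<bool>                 request, reset;
    sc_core::sc_out<sc_dt::sc_int<16> >   data_in;
    sc_core::sc_in<bool>                  acknowledge, clock;
    sc_core::sc_in<sc_dt::sc_int<16> >    data_out;

    SC_CTOR(dct_wrapper) {}

    // Transaction-level entry point: called from an SC_THREAD on the TLM side,
    // it hides the pin-level handshake of the cycle-accurate DCT.
    sc_dt::sc_int<16> transform(sc_dt::sc_int<16> sample) {
        data_in.write(sample);
        request.write(true);
        do { sc_core::wait(clock.posedge_event()); } while (!acknowledge.read());
        request.write(false);
        return data_out.read();
    }
};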



The back-annotation of the timing information leads to more accurate performance results during the design-space exploration phase. As in the reference model, HuffmanizeMB is the slowest function in the pipeline in MBEncoder2. This is because the cycle-accurate model of the HuffmanizeMB function does not read/write elements of a macro-block continuously, whereas the three other models do.

Performance analysis after calibration
In order to explore the performance of the same architectures, the reference model is back-annotated with the exact properties of the detailed models. Since the timing properties are exact, the value of the FPGA_SpeedUp parameter is set to 1 for this new iteration in the architecture exploration phase. As shown in Table 2, the speed-up obtained after high-level synthesis is approximately 250. Values obtained from the reference model for FPGA_SpeedUp = 250 are indicated in brackets for comparison. The metrics of interest converge toward similar values from the reference model for all architectures, confirming design decisions made early, based on the reference transaction-level model. Configuration C leads to the shortest latencies. As predicted with the reference transaction-level model, the bottleneck for Configuration C is the DSP, since the real speed-up exceeds 15. Configuration C permits each encoder to process two images in parallel every 126ms. As shown in Figure 9, this leads to a peak power consumption of almost 1W on the FPGA. For comparison, the dynamic profiles returned by the reference model with FPGA_SpeedUp = 15 are also shown. The configuration with FPGA_SpeedUp = 15 can process the same number of images, with a lower peak power consumption (approximately 25mW).

Maximizing encoding performance
Latencies in the back-annotated model can be optimized further by mapping both MBEncoder1 and MBEncoder2 onto the FPGA, avoiding the DSP limitation. In this configuration, two images can be processed in parallel in 9ms: each encoder processes more than 100 images per second. The resulting average power consumption is higher than 1.6W on the FPGA. Figure 10 shows the case where the two encoders receive an image every 10ms. The exact durations of the detailed models are indicated in Figure 10, highlighting that the bottleneck in the pipeline is the HuffmanizeMB function. This function must be synthesized differently to reach the 3,030ns execution time of the MBEncoder1 function, leading to approximately 150 images per second.

Conclusion
Joining TLMs with the cycle-accurate models obtained after high-level hardware synthesis, using Mentor Graphics' Catapult C Synthesis together with architecture exploration in CoFluent Design's CoFluent Studio, provides a complete ESL flow, from architectural exploration to hardware implementation. With the 'implementation gap' closed, designers can benefit from the architectural exploration and profiling completed early in the design cycle. The case study compared the utilization ratio (resource load), processing latency and dynamic and average power consumption of the three configurations. Generic parameterized models of platform elements and the drag-and-drop mapping tool allow quick completion of initial architectures. Once the impact of a generic parameter that represents the hardware acceleration was analyzed, the minimum value required for that parameter to optimize both latencies and power consumption was found.

Reference C algorithms are converted to SystemC cycle-accurate models using the Catapult C synthesis tool. The resulting cycle-accurate models are integrated back into CoFluent Studio to refine the TLMs for those functions that map onto hardware processors. Wrapping SystemC around the cycle-accurate models enables the transaction-level models to interface with them. The behavior of the wrapped, detailed models was verified within CoFluent Studio against the behavior of the reference model, which served as a testbench. Back-annotated timing properties of the reference TLM are based on exact timing obtained by simulating the detailed models. The back-annotated model is used to explore the same architectures as with the reference model. Reaching the same conclusions confirms that decisions can be made early and with a high level of confidence based on the reference transaction-level model. This also confirms that external SystemC models, whether hand-written or produced by synthesis, can be easily integrated into CoFluent Studio, but cycle-accurate models should only be used for validation and calibration, and should be replaced by their transaction-level equivalents to maintain simulation efficiency.

Acknowledgements
The author would like to thank Thomas Bollaert from Mentor Graphics for providing the sequential C code of the JPEG application as well as the four detailed models generated using Catapult C.

CoFluent Design
24 rue Jean Duplessis
78150 Le Chesnay, France
T: +33 139 438 242
W: www.cofluentdesign.com




< TECH FORUM > VERIFIED RTL TO GATES

Parallel transistor-level full-chip circuit simulation
He Peng and Chung-Kuan Cheng, University of California, San Diego
This paper is based on one originally presented at Design Automation and Test in Europe 2009, in Nice, France.

The paper presents a fully parallel transistor-level full-chip circuit simulation tool with SPICE accuracy for general circuit designs. The proposed overlapping domain decomposition approach partitions the circuit into a linear subdomain and multiple nonlinear subdomains based on circuit nonlinearity and connectivity. A parallel iterative matrix solver is used to solve the linear domain, while the nonlinear subdomains are distributed across processors in topological order and solved by a direct solver. To achieve maximum parallelism, device model evaluation is also done in parallel. A parallel domain decomposition technique is used to iteratively solve the different partitions of the circuit and ensure convergence. The technique is several orders of magnitude faster than SPICE for sets of large-scale circuit designs on up to 64 processors.

1. Introduction
With the rapid scaling down of VLSI feature sizes, the huge size of today's designs is making transistor-level circuit simulation extremely time-consuming. The evolution of hardware technology has made quad-core systems available in mainstream computers, and higher core counts are planned. The shift from single-core to multicore processing is creating new challenges and opportunities for circuit simulation, and parallel computation is becoming a critical part of circuit simulators designed to handle large-scale VLSI circuits.

Several parallel circuit simulation techniques have been proposed. The Siemens circuit simulator TITAN [6] partitions the circuit by minimizing the total wire length, and parallel simulation is performed using a non-overlapping domain decomposition technique; however, the efficiency of the approach can deteriorate quickly as the size of the interface increases. Weighted graphs and graph decomposition heuristics are used in the Xyce parallel circuit simulator [7] to partition the circuit graph to achieve load balance and minimize communication costs. The waveform relaxation technique has also been proposed for parallel circuit simulation [8]; however, it is not widely used because it is only applicable to circuits with unidirectional signal flow.

Parallel domain decomposition methods have been developed for the simulation of linear circuits such as power and ground networks [2]. All these methods are mainly based on parallel matrix solving and device model evaluation; they are either incapable of handling circuits with nonlinear components, or their performance drops quickly for circuits with nonlinear components as the number of processor cores and the circuit size increase. WavePipe [3] complements previous methods by extending classical time-integration methods in SPICE. It takes fundamental advantage of multi-threading at the numerical discretization level and exploits coarse-grained application-level parallelism by simultaneously computing circuit solutions at multiple adjacent time points. MAPS [4] explores inter-algorithm parallelism by starting multiple simulation algorithms in parallel for a given task.

The method proposed here is a fully parallel circuit simulation for full-chip transient circuit analysis. It can simulate large industrial designs in parallel, achieving speedups of orders of magnitude over SPICE without jeopardizing accuracy. It provides orthogonal improvements over methods like WavePipe [3] and MAPS [4] and complements other parallel matrix solver and/or parallel device model evaluation-based approaches in a number of ways.
1. It can perform transistor-level full-chip circuit simulation for general circuit designs with interconnect, clock, power and ground, and analog components with capacitive and inductive couplings.
2. The circuit is partitioned into a linear subdomain and multiple nonlinear subdomains. Overlapping domain decomposition is used as an interface to ensure convergence. The linear subdomain is further partitioned by ParMETIS [9] and solved in parallel by the iterative solver GMRES with BoomerAMG as the preconditioner to achieve parallel scalability.
3. Convergence is further improved by solving the different nonlinear subdomains according to their circuit topological order.
4. To achieve maximum parallelism, device model evaluation is done in parallel.
5. SPICE accuracy is guaranteed by using the same convergence check and dynamic time stepping techniques as Berkeley SPICE3f5.

This paper is an extension of a method proposed in [5], with the following major improvements:
1. An improved parallel simulation flow with new parallel device model evaluation, numerical integration and linearization steps that increase the parallel scalability of the proposed approach.
2. A new and improved parallel implementation that runs on up to 64 processors.

The paper is organized as follows. Section 2 presents the parallel simulation approach in detail as well as the overall simulation flow. Experimental results are then described in Section 3. Finally, we offer our conclusions and proposals for future research.

FIGURE 1 Circuit partition example: (a) original circuit, (b) nonlinear subdomain 1, (c) nonlinear subdomain 2 (Source: UCSD). The original circuit contains power and ground networks and three signal/clock networks; each nonlinear subdomain retains the gates and the signal/clock networks connected to them.

2. Parallel domain decomposition simulation

A. Domain decomposition partition and parallel graph partition
The proposed approach reads the circuit description in SPICE format and partitions the circuit at linear and nonlinear boundaries at the gate level. The circuit is partitioned into a linear subdomain and multiple nonlinear subdomains. First, we partition the circuit into different nonlinear partitions (i.e., subdomains) based on circuit nonlinearity and connectivity. Figure 1(a) shows an example of two NAND gates surrounded by coupled power, ground and signal networks. Figures 1(b) and 1(c) show two different nonlinear subdomains. The nonlinear subdomains include nonlinear functional blocks as well as the input/output signal networks connected to these functional blocks. Power and ground networks are not included because we need to make the nonlinear subdomains small enough to be solved efficiently by a direct solver. Partitions that break feedback loops are avoided; since these feedback loops are usually not very large, it is feasible to keep each of them in a single nonlinear subdomain.

Once the circuit is partitioned into different nonlinear subdomains Ω1, ..., ΩK, we add the whole circuit as a linear subdomain Ω0. Asymmetry in the system matrix of Ω0 caused by nonlinear components in the circuit is removed as described in Section 2C in order to improve the matrix properties of Ω0. This allows us to use parallel iterative matrix solvers to solve the linear subdomain efficiently. We use the parallel graph partitioning tool ParMETIS [9] to further partition the linear subdomain Ω0 to achieve parallel load balance and make the proposed approach scale with the number of processors.

FIGURE 2 Overlapping domain decomposition partition of the system matrix A into A0 and overlapping sub-matrices A1, A2, ... (Source: UCSD)





FIGURE 3 Parallel transient simulation flow (Source: UCSD). The flow is: load netlist; domain decomposition partition; partition of the linear domain (ParMETIS); generation of the topological order; per-processor device evaluation, numerical integration and linearization (processors 1 to N); parallel solve of the linear domain; parallel distribution and direct solve of the nonlinear domains; Schwarz alternating procedure until convergence; Newton-Raphson iteration until convergence; then the next time step.

B. Gate decoupling and topological order simulation
Because of the nonlinear elements in the circuit, the system matrix for linear subdomain Ω0 is not symmetrical and hence it is unsuitable for fast parallel linear solvers. The main asymmetry in the matrix comes from the gate-source and gate-drain coupling in the device models. We move this coupling from the system matrix of the linear subdomain to the right-hand side of the linear system. For example, the following sub-matrix demonstrates gate-drain coupling in the BSIM3 model:

\begin{bmatrix} a_{gg} & a_{gd} \\ a_{dg} & a_{dd} \end{bmatrix} \begin{bmatrix} V_g \\ V_d \end{bmatrix} = \begin{bmatrix} I_g \\ I_d \end{bmatrix}

We move the off-diagonal terms in the matrix to the right-hand side:

\begin{bmatrix} a_{gg} & 0 \\ 0 & a_{dd} \end{bmatrix} \begin{bmatrix} V_g \\ V_d \end{bmatrix} = \begin{bmatrix} I_g - a_{gd} V_d^{t} \\ I_d - a_{dg} V_g^{t} \end{bmatrix}

where V_d^{t} and V_g^{t} are the solutions at the previous iteration. This process simplifies the linear subdomain and improves the matrix properties of the system matrix. With this simplification, we can use parallel solvers like GMRES to solve the matrix efficiently. The linear subdomain Ω0, which is the entire circuit with this simplification, is then partitioned by ParMETIS and solved in parallel using GMRES with BoomerAMG as a preconditioner. Nonlinear subdomains are evenly distributed across the processors and solved by the direct solver KLU [13]. To increase the convergence rate, we generate a topological order from the primary inputs and flip-flops in the circuit and solve the nonlinear subdomains according to this order. Feedback loops are reduced to a single node when the topological order is generated. Convergence is achieved by using parallel domain decomposition techniques, which are introduced in Section 2C.

C. Parallel domain decomposition techniques for circuit simulation
Domain decomposition methods refer to a collection of divide-and-conquer techniques that have been primarily developed for solving partial differential equations [10], [11]. The partition method introduced in Section 2A generates overlapping subdomains; for example, as shown in Figure 1, nonlinear subdomains 1 and 2 overlap because they share the same signal/clock network 2. Partitioning the circuit into a linear subdomain Ω0 and K overlapping nonlinear subdomains Ω1, ..., ΩK is equivalent to partitioning the system matrix A of the circuit into a matrix A0 and K overlapping sub-matrices A1, ..., AK, as shown in Figure 2, where Ai is the matrix representing subdomain Ωi.

The Schwarz Alternating Procedure [11] is used to iteratively solve the linear system Ax = b. The algorithm is described below. The proposed method first solves the linear subdomain Ω0. The linear subdomain is partitioned by ParMETIS and solved in parallel using GMRES with BoomerAMG as a preconditioner. Next, all nonlinear subdomains Ω1, ..., ΩK are distributed across the processors according to their topological order and solved by the direct solver KLU. Residue values at a subdomain boundary are updated as soon as the solution for a given subdomain is available. The Schwarz Alternating Procedure continues until convergence is reached.



Schwarz Alternating Procedure
1. Input: matrices A, A0, A1, ..., AK; right-hand side b
2. Output: solution x
3. Choose an initial guess x
4. Calculate the residue r = b - Ax
5. repeat
6.   for i = 0 to K do
7.     Solve Ai δi = ri
8.     Update the solution: xi = xi + δi
9.     Update the residue values on the boundary
10.  end for
11. until convergence
12. Output x

D. Parallel device loading
Device model evaluation, an essential part of a circuit simulator, can easily consume more than 30% of the total runtime in a serial simulation. For parallel circuit simulation, device model evaluation becomes even more sensitive, as it introduces significant communication overhead if it is not done in parallel and optimally. The proposed method performs the device loading, numerical integration and linearization steps in parallel. Once the circuit is partitioned, each processor calculates the device model, numerical integration and linearization for its own partition. This approach reduces the device model evaluation runtime and the communication overhead needed for the stamping of Jacobian matrix entries among the processors.

E. Parallel transient simulation flow
The overall parallel transient simulation flow is presented in Figure 3. The device loading, numerical integration and linearization steps are the same as in Berkeley SPICE except that they are done in parallel. As shown, the linear subdomain is partitioned into N partitions by ParMETIS, where N is the total number of available processors. Each processor loads its own part of the circuit. After the parallel device loading, numerical integration and Newton-Raphson linearization, the linear subdomain is solved in parallel. Nonlinear subdomains are then evenly distributed across the available processors according to their topological order and solved by a direct solver.

3. Experimental results
The proposed approach was implemented in ANSI C. Parallel algorithms are implemented with MPICH1. We adopted the same dynamic step size control, numerical integration and nonlinear iteration methods as are used in Berkeley SPICE3. The GMRES solver and BoomerAMG preconditioner in the PETSc [12] package were used as iterative solvers, and KLU [13] was used as a direct solver. Five industrial designs were tested on the FWGrid [14] infrastructure with up to 64 processors. Table 1 lists the transient simulation runtimes, where DD refers to the proposed method. Test cases ckt1 and ckt2 are two linear-circuit-dominant test cases. Figure 4 shows the waveform of one node of the ckt1 circuit; the waveform shows that our result matches the SPICE result. Test cases ckt3, ckt4 and ckt5 are cell designs with power and ground networks. We can see from Table 1 that the proposed method is orders of magnitude faster than SPICE. The results also show that the proposed method scales very well as we increase the number of available processors. However, the performance increase from 32 to 64 processors is not as great as the increase from 16 to 32 processors. This is due to cross-rack communication overhead on the FWGrid infrastructure: with 64 processors, we need to use at least two racks, as each rack on the FWGrid has only 32 processors.

FIGURE 4 Transient simulation waveform of ckt1 design (Source: UCSD). The waveforms of the proposed approach and SPICE overlay each other; the node voltage varies between roughly 0.998V and 1.0025V over the 0-1ns simulation interval.

TABLE 1 Transient simulation runtime (Source: UCSD)

Case   #Nodes   #Tx    #R     #C     #L     SPICE    DD (1 Proc.)   DD (4 Proc.)   DD (16 Proc.)   DD (32 Proc.)   DD (64 Proc.)
ckt1   620K     770    550K   370K   270K   4,257s   661s           245s           106s            58s             49s
ckt2   2.9M     3K     1.9M   1.2M   810K   N/A      18,065s        6,429s         2,545s          1,493s          1,179s
ckt3   290K     80K    405K   210K   0      20.5h    4,761s         1,729s         703s            439s            337s
ckt4   430K     145K   360K   180K   50K    49.4h    7,297s         2,731s         1,182s          782s            596s
ckt5   1M       6.5K   2.2M   1M     5K     N/A      5,714s         2,083s         855s            443s            318s




The ParMETIS partitioning of the linear subdomain is very important for parallel scalability: we have noticed that without ParMETIS it is very hard to achieve performance gains when more than 16 processors are used.

4. Conclusions and future research
A fully parallel circuit simulation tool for transistor-level simulation of large-scale deep-submicron VLSI circuits has been presented. An orders-of-magnitude speedup over Berkeley SPICE3 is observed for sets of large-scale real design circuits. Experimental results show an accurate waveform match with SPICE3. For future work, we would like to extend this method to supercomputers with hundreds of processors. Parallel load balancing and machine scheduling techniques will be developed to ensure our tool scales with growing numbers of processors and with circuit sizes of hundreds of millions of elements.

Acknowledgment
The authors would like to acknowledge the support of NSF CCF-0811794 and the California MICRO Program.

References
[1] L. Nagel, "SPICE2: A Computer Program to Simulate Semiconductor Circuits," Tech. Rep. ERL M520, Electronics Research Laboratory, UC Berkeley, 1975.
[2] K. Sun, Q. Zhou, K. Mohanram and D. C. Sorensen, "Parallel domain decomposition for simulation of large-scale power grids," in Proc. ICCAD, 2007, pp. 54-59.
[3] W. Dong, P. Li and X. Ye, "WavePipe: Parallel transient simulation of analog and digital circuits on multi-core shared-memory machines," in Proc. DAC, 2008, pp. 238-243.
[4] X. Ye, W. Dong, P. Li and S. Nassif, "MAPS: Multi-Algorithm Parallel Circuit Simulation," in Proc. ICCAD, 2008, pp. 73-78.
[5] H. Peng and C.-K. Cheng, "Parallel transistor level circuit simulation using domain decomposition methods," in Proc. ASPDAC, 2009, pp. 397-402.
[6] N. Frohlich, B. M. Riess, U. A. Wever and Q. Zheng, "A new approach for parallel simulation of VLSI circuits on a transistor level," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 45, no. 6, June 1998.
[7] S. Hutchinson, E. Keiter, R. J. Hoekstra, H. A. Watts, A. J. Waters, R. L. Schells and S. D. Wix, "The Xyce Parallel Electronic Simulator - an overview," IEEE International Symposium on Circuits and Systems, Sydney, Australia, May 2000.
[8] A. Ruehli and T. A. Johnson, "Circuit analysis computing by waveform relaxation," in Encyclopedia of Electrical and Electronics Engineering, vol. 3, Wiley, 1999.
[9] G. Karypis, K. Schloegel and V. Kumar, ParMETIS - Parallel Graph Partitioning and Sparse Matrix Ordering, University of Minnesota, http://glaros.dtc.umn.edu/gkhome/metis/parmetis/overview, 2003.
[10] B. Smith, Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations, Cambridge University Press, 2004.
[11] Y. Saad, Iterative Methods for Sparse Linear Systems, SIAM, 2003.
[12] S. Balay, K. Buschelman, V. Eijkhout, W. D. Gropp, D. Kaushik, M. G. Knepley, L. Curfman McInnes, B. F. Smith and H. Zhang, PETSc Users Manual, ANL-95/11, Argonne National Laboratory.
[13] T. A. Davis, Direct Methods for Sparse Linear Systems, SIAM, Philadelphia, Sept. 2006. Part of the SIAM Book Series on the Fundamentals of Algorithms.
[14] FWGrid home page: http://fwgrid.ucsd.edu/

Department of Computer Science and Engineering
University of California, San Diego
La Jolla, CA 92093-0404, USA




< TECH FORUM > DIGITAL/ANALOG IMPLEMENTATION

Reducing system noise with hardware techniques
Bonnie C. Baker, Texas Instruments
Bonnie Baker is a senior applications engineer for Texas Instruments and has been involved with analog and digital designs and systems for over 20 years. She has written hundreds of articles, design and application notes, and conference papers, and authored the book A Baker's Dozen: Real Analog Solutions for Digital Designers.

Transmission line reflection and ground bounce are two of the main issues that arise in any discussion of noise in digital circuitry. Generally, though, digital circuits operate with relatively large signal levels that have high noise margins, making them inherently immune to low-level noise pick-up. If a circuit performs analog or data acquisition activities, however, a small amount of external noise can cause significant interference. For instance, 10mV of noise in the analog ground between a 12-bit analog-to-digital converter (ADC) and the converter's driver amplifier can cause an error of eight least-significant bits (LSBs). In contrast, digital systems can tolerate hundreds of millivolts of this type of ground error before intermittent problems start to occur. Finding the origin of interfering noise in the analog domain, and then eliminating it, presents a formidable challenge. Of particular interest are 'slow' sensor systems, where designers are tempted to ignore problematic, high-frequency noise issues. This article looks at hardware noise-reduction strategies for signal conditioning paths with sensors. It explores noise topics such as conducted, device and radiated noise from an analog perspective.

Data acquisition circuit using a load-cell sensor
Figure 1 shows the example circuit used in this discussion.

FIGURE 1 A 12-bit ADC, combined with an instrumentation amplifier, converts a low-level signal from a bridge sensor (Source: TI). The circuit includes a 9V wall-wart supply and regulator, a REF2925 2.5V reference, an LCL816G load cell, a two-op-amp instrumentation amplifier (1/2 of an OPA2337), an ADS7829 12-bit ADC and a TMS320C6713 DSK.



Circuit noise problems can originate from a variety of sources. By carefully examining attributes of the offending noise you can identify it’s source, thereby making noise reduction solutions become more apparent. There are three subcategories of noise problems: device, conducted and radiated noise. If an active or passive device is the major noise contributor, you can substitute lower noise devices into the circuit. You can reduce conducted noise with by-pass capacitors, analog filters and/or rearrange positions of the devices on the board with respect to the power connectors and signal path.

Number of Occurrences

Source: TI 90 80 70 60 50 40 30 20 10 0

Code Width of Noise = 44 (Total Samples = 1024) 6.54 Noise-Free Bits

2960

2970

2980

2990

Output Code of 12-bit A/D Converter

FIGURE 2 Poor implementation of the 12-bit data acquisition system easily could have an output range of 44 different codes with a 1024 sample size Source: TI

You can minimize the contribution of radiated noise with a careful layout that avoids signal-coupling opportunities, inclusion of ground and power planes and system shielding techniques. This article discusses and illustrates these strategies with reference to a data acquisition circuit using a load-cell sensor. The instrumentation amplifier consisting of two op amps (A1 and A2) and five resistors creates a 153V/V gain. This gain matches the instrumentation amplifier block’s fullscale output swing to the ADC’s full-scale input range. The SAR ADC has an internal input sampling mechanism. With this function, each conversion produces a single digitized sample. The processor, for example the TMS320C6713B [4], acquires the data from the SAR converter, performs some calibration and translates the data into a usable format for tasks such as displays or actuator feedback signals. The transfer function, from the sensor to the output of the ADC is:

 212  DOUT = (( LCP − LC N )(GAIN ) + VREF1 )   VDD 

nV/ Hz (log)

1 / f Noise

where:

Broadband Noise Frequency (log)

FIGURE 3 Noise contributions from devices across the frequency spectrum emulate a 1/f characteristic at low frequencies and a flat response (broadband noise) at the higher frequencies Its analog portion consists primarily of the load-cell sensor, a dual operational amplifier (op amp) (OPA2337 [4]) configured as an instrumentation amplifier, and a 12-bit, 100kHz SAR ADC (ADS7829 [4]). The sensor (LCL-816G [4]) is a 1.2kW, 2mV/V load cell with a full-scale range of ±32oz. In this 5V system, the electrical full-scale output range of the load cell is ±10mV.

LCP = VDD ( R2 / ( R1 + R2 )), LC N = VDD ( R1 / ( R1 + R2 )), GAIN = (1 + R3 / R4 + 2 R3 / RG ) In this equation, LCP and LCN are the positive and negative sensor outputs, GAIN is the gain of the instrumentation amplifier circuit. VREF is a 2.5V reference, which level shifts the instrumentation amplifier output, VDD is the power supply voltage and sensor excitation voltage, and DOUT is a decimal, whole number representation of the 12-bit digital output code of the ADC. If the design implementation is poor, this circuit could be an excellent candidate for noise problems. The symptom of a poor implementation is an intolerable level of uncertainty over the digital output results from the ADC. It is easy to assume that this type of symptom indicates that the last device in the signal chain generates the noise problem. On the conContinued on next page




FIGURE 4 A second, revised design has lower-noise devices, by-pass capacitors, a second-order anti-aliasing filter and a ground plane: the LCL-816G load cell drives a two-op-amp instrumentation amplifier (half of an OPA2335), followed by a second-order low-pass filter (OPA340), the REF2925 2.5V reference and the ADS7829 ADC; the converter output feeds the TMS320C6713 DSK, and power comes from a 9V wall wart through a linear regulator (Source: TI)

On the contrary, the root cause of poor conversion results could stem from the other active devices, from passive components, from the PCB layout, or even from extraneous sources. For instance, if a designer did not take appropriate noise reduction measures, the 12-bit system in Figure 1 could output a large distribution of codes for a DC input signal, as shown in Figure 2. The data it shows is far from an optimum implementation: a peak-to-peak error of 44 codes reduces the 12-bit converter system to a noise-free resolution of about 6.5 bits (the short calculation after the list below reproduces this figure).

Noise problems can be separated into three subcategories:
1. Device noise. This originates in active or passive devices on the board.
2. Conducted noise. This appears in the PCB traces, and originates in devices on the board or as a result of e-fields or b-fields.
3. Radiated noise. This is transmitted into the system via e-fields or b-fields.
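As a quick check of the noise-free-bits figure quoted above, the code spread reported in Figure 2 can be converted to effective resolution in one line (a sketch; the 44-code width is the measured spread from Figure 2):

```python
import math

n_bits = 12
code_width_pp = 44          # peak-to-peak code spread observed in Figure 2

noise_free_bits = n_bits - math.log2(code_width_pp)
print(f"{noise_free_bits:.2f} noise-free bits")   # ~6.54, matching Figure 2
```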

Device noise
You can find device noise in both passive and active devices. The materials in passive devices can be films or composites; resistors, capacitors, inductors and transformers fall into this category. The material in active devices is silicon; active devices include bipolar transistors, field effect transistors, CMOS transistors and integrated circuits that use these transistors.

When you add device noise sources together, the equations differ from those used to add voltages, currents or numbers of bits. The fundamental difference is that noise signals are uncorrelated, so you combine voltage or current noise sources with an RSS formula, the square root of the sum of the squares. When adding several voltage noise sources, you would use the following formula:

$$V_{TOTAL} = \sqrt{V_1^2 + V_2^2 + V_3^2 + \dots + V_N^2}$$

This formula applies to noise contributions over a specific bandwidth (BW). If there is no bandwidth definition, the particular test frequency must be used. In this case, the noise units are V/√Hz. These units of measure describe the voltage noise density (also known as spot noise). Spot noise is measured at a specific frequency over a 1Hz bandwidth. Generally, the units of measure for voltage noise are nV/√Hz, µVrms or µVp-p.

Passive devices/Resistors
There are three basic classes of fixed resistors: wirewound, film type and composition. Regardless of construction, all resistors generate a noise voltage. This noise is primarily a result of thermal noise. Lower-quality resistors such as the composition type have additional noise in the lower frequency spectrum due to shot and contact noise. Thermal noise (also known as Johnson noise) is generated by the random thermal motion of electrons in the resistive material. This noise is independent of DC or AC current flow, and is constant across the entire frequency spectrum. The ideal thermal noise for resistors is:

$$V_N = \sqrt{4\,k\,T\,R\,(BW)}$$

In this equation, k is Boltzmann's constant (1.38e-23 J/K), T is the temperature in Kelvin, R is the resistance value in Ohms, and (BW) is the noise bandwidth of interest. Wirewound resistors are the quietest of the three and come closest to ideal noise levels. Composition resistors are the noisiest because of their contact noise, which is aggravated by current; otherwise, composition resistors have the same noise as wirewound types. You can reduce resistive noise by reducing the value of the resistors on your board.
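The RSS combination and the thermal-noise expression above translate directly into a few lines of Python. The 10kΩ resistance, 10kHz bandwidth and the two extra noise sources below are assumed values used purely for illustration, not figures from the design:

```python
import math

def rss(*noise_sources_vrms):
    """Combine uncorrelated noise sources: root of the sum of the squares."""
    return math.sqrt(sum(v * v for v in noise_sources_vrms))

def resistor_thermal_noise(r_ohms, bandwidth_hz, temp_k=298.0):
    """Ideal thermal (Johnson) noise of a resistor over a given bandwidth, in Vrms."""
    k_boltzmann = 1.38e-23          # J/K
    return math.sqrt(4 * k_boltzmann * temp_k * r_ohms * bandwidth_hz)

# Assumed example: a 10 kOhm resistor over a 10 kHz bandwidth
vn_resistor = resistor_thermal_noise(10e3, 10e3)
print(f"resistor noise  = {vn_resistor*1e6:.2f} uVrms")   # ~1.28 uVrms

# Combine it with two other uncorrelated sources of 1 uVrms and 2 uVrms
total = rss(vn_resistor, 1e-6, 2e-6)
print(f"total RSS noise = {total*1e6:.2f} uVrms")
```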

Active devices
This category of devices includes op amps, instrumentation amplifiers, voltage references and voltage regulators, among others. Two regions of voltage noise in the frequency domain are the 1/f and broadband regions. The 1/f noise is a low-frequency noise whose power density varies as the reciprocal of frequency (1/f). This noise is a consequence of trapped carriers in the semiconductor material, which are captured and released in a random manner; the time constant of this energy is concentrated within the lower frequencies. Figure 3 shows an example of 1/f noise. Broadband noise is associated with the DC current flow across p-n junctions. It is due to a random diffusion of carriers through the base of the transistor and a random generation and recombination of hole-electron pairs. You can reduce the noise that active devices generate by selecting low-noise devices at the start.

Conducted noise
Conducted noise is the noise present on PCB traces. These problems can often be corrected at the point of origin.

FIGURE 5 When the circuit in Figure 1 is implemented using noise reduction techniques, a true 12-bit system can be achieved: the output histogram collapses to a single code (code width of noise = 1 over 1,024 samples) (Source: TI)

Power supply filter strategies
Regardless of the power source, good circuit design implies that by-pass capacitors are used. While a regulator, DC/DC converter, linear or switching power supply can provide power to the board, in all cases by-pass capacitors are a required part of the design. By-pass capacitors belong in two locations on the PCB: one at the power supply (10µF to 100µF, or both), and one for every active device (digital and analog). The value of each by-pass capacitor depends on the device with which it is associated. Generally speaking, if the device's bandwidth is less than or equal to ~10MHz, a 0.1µF by-pass capacitor will reduce injected noise dramatically. If the bandwidth of the device is above ~50MHz, a 0.01µF by-pass capacitor is probably appropriate. Between these two frequencies, either or both can be used. In all cases, it is best to refer to the manufacturer's guidelines for specifics (a short sketch of this rule of thumb appears after the signal path discussion below). By-pass capacitor leads must be placed as close as possible to the device's power supply pin. If two by-pass capacitors are used for one device, the smaller of the two should be closest to the device pin. Finally, the lead length of the by-pass capacitor should be as short as possible in order to minimize lead inductance.

Signal path filtering strategies
A system such as that shown in Figure 1 requires an analog filter. The primary function of the low-pass analog filter is to remove the input signal's high-frequency components before they reach the ADC. If these high frequencies pass to the ADC, they will contaminate the conversion data by aliasing during the conversion process. To attenuate high-frequency noise, a two-pole anti-aliasing filter is added to the circuit.
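Here is a minimal sketch of the by-pass capacitor rule of thumb described above; the thresholds simply mirror the guideline in the text, and real designs should defer to the device manufacturer's recommendations:

```python
def suggested_bypass_cap(device_bandwidth_hz):
    """Return a suggested by-pass capacitor value (F) from the bandwidth rule of thumb."""
    if device_bandwidth_hz <= 10e6:
        return 0.1e-6            # 0.1 uF for bandwidths up to ~10 MHz
    if device_bandwidth_hz >= 50e6:
        return 0.01e-6           # 0.01 uF for bandwidths above ~50 MHz
    return (0.1e-6, 0.01e-6)     # in between: either value, or both in parallel

print(suggested_bypass_cap(1e6))    # 1e-07  -> 0.1 uF
print(suggested_bypass_cap(100e6))  # 1e-08  -> 0.01 uF
```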






Finding the origin of noise in the analog domain, and then eliminating it, presents a formidable challenge.

Layout strategies
Device placement is critical. In general, the circuit devices can be separated into two categories: high-speed (>40MHz) and low-speed. Then, they should be separated again into three sub-categories: pure digital, pure analog and mixed signal. The pure analog devices should be furthest away from the digital devices and the connector, to ensure that digital switching noise is not coupled into the analog signal path through the traces or the ground plane.

Emitted or radiated noise
A circuit's level of susceptibility to extraneous noise is directly related to how its signal traces are implemented across the board, to its ground plane and power plane strategy, and to subtleties such as the use of differential signal paths and shielding.

Signal traces
As a basic guideline, both analog and digital signal traces on PCBs should be as short as possible. Shorter traces minimize the circuit's susceptibility to onboard and extraneous signals. The amount of extraneous noise that can influence the PCB depends on the environment. Opportunities for onboard signal coupling, however, can be avoided with careful design. One set of terminals to be particularly cautious with is the input terminals of an amplifier. Radiated noise problems can arise because these terminals typically have high-impedance inputs. Signal coupling problems occur when a trace with a high-impedance termination is next to a trace carrying fast-changing voltages. In such situations, charge is capacitively coupled between the traces per the formula:

$$I = C\,\frac{dV}{dt} \qquad \text{(Formula 1)}$$

In Formula 1, current is in amperes, C is the capacitance between the traces, dV is the change in voltage and dt is the change in time.
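To get a feel for the magnitudes involved, the sketch below applies Formula 1 to an assumed 1pF of trace-to-trace capacitance and a 1V edge in 10ns; all three numbers are illustrative assumptions rather than values from the design:

```python
# Capacitively coupled current between adjacent traces: I = C * dV/dt
C_coupling = 1e-12     # 1 pF of trace-to-trace capacitance (assumed)
dV = 1.0               # 1 V logic swing on the aggressor trace (assumed)
dt = 10e-9             # 10 ns edge time (assumed)

i_coupled = C_coupling * dV / dt
print(f"coupled current = {i_coupled*1e6:.1f} uA")   # 100.0 uA
```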

Ground and power supply strategy
Board layout definition, ground plane implementation and power supply strategy are critical when designing low-noise solutions. The PCB used to gather the data in Figure 2 did not have a ground plane. Ground planes solve problems such as offset errors, gain errors and noise. The inclusion of a power plane in a 12-bit system is not as critical as the presence of the ground plane. Although a power plane can solve many problems, power noise can also be reduced by making the power traces two or three times wider than the minimum trace widths on the board.

Back to the drawing board
If we modify the circuit in Figure 1 with low-noise strategies in mind, we end up with the circuit in Figure 4. The PCB has an added ground plane, lower-value resistors, lower-noise amplifiers, a low-pass filter and by-pass capacitors. Figure 5 shows the resulting noise data in histogram form.

References
1. Morrison, Ralph, Noise and Other Interfering Signals, John Wiley & Sons, 1992.
2. Ott, Henry W., Noise Reduction Techniques in Electronic Systems, John Wiley & Sons, 1988.
3. Allen, Holberg, CMOS Analog Circuit Design, Holt, Rinehart and Winston, 1987.
4. These datasheets are available for download using the following URLs:
(a) "MicroSIZE, Single-supply CMOS Operational Amplifiers" (OPA337, OPA2337, OPA338, OPA2338), Texas Instruments, March 2005, www.ti.com/opa-ca.
(b) "100ppm/°C, 50µA in SOT23-3 CMOS Voltage Reference" (REF2912, REF2920, REF2925, REF2930, REF2933, REF2940), Texas Instruments, February 2008, www.ti.com/voltageref-ca.
(c) "10/8/12-bit High-speed 2.7V microPOWER Sampling Analog-to-Digital Converter" (ADS7826, ADS7827, ADS7829), Texas Instruments, June 2003, www.ti.com/ads7826-ca.
(d) "TMS320C6713B Floating-point Digital Signal Processor", Texas Instruments, November 2005, www.ti.com/tms320c6713b-ca.
(e) "Single-supply, Rail-to-rail Operational Amplifiers" (OPA340, OPA2340, OPA4340), Texas Instruments, November 2007, www.ti.com/opa340-ca.
(f) "0.5mV/°C max, Single-supply CMOS Operational Amplifiers, Zero-Drift Series" (OPA334, OPA335, OPA2334, OPA2335), Texas Instruments, www.ti.com/opa334-ca.

Texas Instruments
12500 TI Boulevard
Dallas, TX 75243, USA
T: 1 972 995 2011
W: www.ti.com


at the heart... of SoC Design
ARM IP — More Choice. More Advantages.
• Full range of microprocessors, fabric and physical IP, as well as software and tools
• Flexible Foundry Program offering direct or Web access to ARM IP
• Extensive support of industry-leading EDA solutions
• Broadest range of manufacturing choice at leading foundries
• Industry's largest Partner network
www.arm.com
The Architecture for the Digital World® © ARM Ltd. AD123 | 04/08


< TECH FORUM > DESIGN TO SILICON

Computational scaling: implications for design
Phil Strenski, Tim Farrell, IBM Microelectronics

Phil Strenski is Computational Technology Rules Lead in Design & Technology Integration at IBM’s Semiconductor Research and Development Center. He holds a doctorate in physics from Stanford University.


Tim Farrell is an IBM Distinguished Engineer in IBM’s Systems and Technology Group and is currently leading its initiatives in Computational Technology. He joined IBM in 1982 with dual degrees in Optical Engineering and Economics from the University of Rochester.

FIGURE 1 Optical scaling since 1980: λ/NA (in nanometers, log scale) plotted against year from 1980 to 2010, showing the historical trend of roughly 10% reduction per year and the outlook beyond it (Source: IBM)


In these early years of the 21st century, major obstacles to circuit design can be seen in terms of premature perturbations to design practices attributable to the later-than-desired realizations of advanced semiconductor technologies. The perturbations were inevitable, but they still underlined the absence of key elements from the technology roadmap. The most widely known example of this is the demise of traditional CMOS performance scaling experienced during the first half of the decade. The inability to control off current as device channel length was scaled for performance led to the architectural shift from single to multiple core processors. Although the laws of physics were unavoidable, the change in design practice took place earlier than anticipated because industry initially lacked a high-k gate dielectric material.

As we near the end of this decade, we face a similar perturbation in circuit design techniques as they relate to density scaling. Historically, density scaling has relied on lithographic scaling. However, delays to next-generation lithography (NGL) now present us with a discontinuity in the lithographic roadmap supporting that link. IBM recognized the need for innovation to address this problem some time ago and recently announced that it is pursuing a Computational Scaling (CS) strategy for semiconductor density scaling [1]. This strategy is an ecosystem that includes the following components, alongside necessary technology partnerships:

• a new resolution enhancement technique (RET) that uses source-mask optimization (SMO);
• virtual silicon processing with TCAD;
• predictive process modeling;
• design rule generation;
• design tooling;
• design enablement;
• pixelated illumination sources;
• variance control; and
• mask fabrication.

This article describes the lithographic discontinuity that created the need for this solution, the implications for design, and the design tooling needed for the CS strategy.

FIGURE 2 An expanding range of influence: the 193nm optical range of influence covers progressively more of the surrounding layout at the 90nm, 45nm and 22nm technology nodes (Source: IBM)



The article presents the context for the use of computational scaling (CS) to eke out more from existing lithography tools until next-generation techniques are finally introduced. It discusses the critical elements in the CS ecosystem developed by IBM and its partners to overcome roadblocks to optical scaling that demand the use of non-traditional techniques for the incoming 22nm process node. The differing roles of engineers in the design and process segments of a project flow are discussed, as are some of the tools that will make CS a reality.

Patterning technology
Density scaling is the progressive packing of increasing numbers of circuit patterns into a set area of silicon. For nearly three decades this has been accomplished by optical scaling: the introduction of lenses with either shorter exposure wavelengths (λ) and/or larger numerical apertures (NA). A metric for optical scaling is λ/NA, where smaller values equate to smaller feature sizes and higher circuit density. Operationally, this has been accomplished by the periodic purchase of a new exposure tool and the optimized selection of wafer processing set points and mask types. As shown in Figure 1, optical scaling historically enabled a 10% annual reduction in feature size and an 81% annual increase in density through 2007. However, due to economic and technical issues, traditional scaling will not resume until next-generation lithography (NGL) techniques such as extreme ultraviolet (EUV), nano-imprint and multi-column electron beam become available.

Although we have been able to realize a 10% annual improvement in optical scaling, this did not by itself support the two-year technology development cycle introduced in the late 1990s. As such, there has been a growing gap between desired optical scaling and realized optical scaling. One impact of this gap has been a decrease in the optical isolation of individual design constructs. The consequence, as shown in Figure 2, has been that individual constructs need to be viewed in the context of an expanding neighborhood. The industry has managed this gap by introducing 193nm lithography, off-axis illumination, immersion lithography and double patterning. However, as shown in Figure 3, attempts to extend traditional optical scaling to the 22nm/20nm process node for a traditional 2D circuit pattern produce unacceptable results.
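One way to visualize the growing gap is to project the historical ~10% annual reduction in λ/NA forward; the 2007 starting value in the sketch below is an assumed, illustrative number rather than a figure from the article:

```python
# Project the historical ~10% annual reduction in the optical scaling metric
# (lambda/NA); the 2007 starting value is an assumed, illustrative number.

start_year, start_metric_nm = 2007, 140.0   # assumed lambda/NA value in 2007
annual_reduction = 0.10                     # ~10% per year, per the article

for year in range(start_year, 2012):
    metric = start_metric_nm * (1 - annual_reduction) ** (year - start_year)
    print(f"{year}: lambda/NA ~ {metric:.0f} nm")
```

If the tooling cannot deliver this trajectory, the shortfall has to be made up elsewhere, which is exactly the role computational scaling is intended to play.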

FIGURE 3 Traditional optical scaling is not enough for 22nm. Problem visualization: at 45nm, single exposure with k1 = 0.45; at 32nm, single exposure with k1 = 0.35; at 22nm, double exposure (DDL) with k1 ~ 0.25. The design topology will not migrate to 22nm (Source: IBM)



FIGURE 4 Retargeting shapes for eventual wafer contours: the flow runs from the drawn design through retargeting to the target, then OPC and the mask, to the pattern on wafer; pattern extraction is taken from the retargeted shapes (Source: IBM)

FIGURE 5 Advanced modeling techniques are the way forward: percentage change in yield plotted against lithography variation for two layouts, A and B (Source: IBM)

The current industry direction to address the highlighted problem at 22nm is the use of double (or triple) patterning with highly restrictive design rules (e.g., single orientation, near-singular pitches, forbidden constructs) and design for manufacturing (DFM) tools that place responsibility for managing technological complexity on the shoulders of the designer. All of these approaches are driven by the increasing variance between designed 2D wafer patterns and resultant wafer patterns. Such a path drives a costly and complex departure from traditional IC design migration paths and increases the cost of wafer production for the fabricator.

Design implications
The first important point to observe is that because λ/NA has not been scaling consistently with incoming geometries, the radius of influence for lithographic concerns has been growing in terms of design features. This problem is illustrated in Figure 3. In the past, this radius

might cover at most a nearby edge pair, so width/space rules were generally a sufficient response. As this radius has grown, the complexity of rules has grown as well, resulting in width-dependent space rules and other multiple-edge constraints. At the same time, the typical curvature introduced on the wafer has become comparable to feature size, so that it is no longer reasonable to assume that wafer contours will essentially resemble the original drawn shapes, except for some minor corner rounding. It is necessary, therefore, to consider patterns of larger radius, and correspondingly less detail.

A second concern is that the various lithographic solutions available are not simply ordered. Any given approach to sub-wavelength lithography favors some classes of layouts at the expense of others. It is critical to work with design evaluation processes that will lead to the selection of the technique that best fits your design. For example, a strong dipole is good at printing parallel lines in one direction at certain pitches, but that comes at the cost of wires in the other direction. Pushing the minimum pitch may also introduce dead zones of forbidden pitch. Going to multiple exposures introduces further trade-offs. Does one use the second exposure to print alternating lines at a tighter pitch at the cost of the other direction, or print both directions with a more relaxed pitch, or enhance the printability of difficult 2D situations?

A helpful concept here is the idea of retargeting (Figure 4). This involves the adjustment of drawn shapes to serve as targets for eventual wafer contours. Of necessity, this is already happening to print certain features, such as isolated lines. But it can also be exploited to simplify the design representation. Given the flexibility to adjust shapes so that they satisfy manufacturability needs, a design can be represented on a coarser grid, capturing the topological intent without undue attention to fine details of edge movement, and without the need for identifying or following an inordinate number of rules when such small movements are allowed. Density needs can be assisted by the identification of prevalidated constructs, consisting of topologies not generally allowable, but manufacturable in certain well-defined contexts with certain specified retargeting (cf. SRAM cells, but in the context of logic cells). A close design-technology interaction is required to make sure such constructs are properly validated along with ordinary content, and defined for maximum utility to design. Updates to this methodology are likely, but much of the infrastructure is already present in the form of parameterized cells.

It is helpful when thinking about these concepts for design and design automation to consider the design community as falling into two camps. One is made up of those who use technology to produce full chip designs. The other comprises those who work with technology



to define its character. The first group is increasingly focused on productivity, away from detailed layout and toward automation, micro-architecture and the balance between power and performance. The second is aware of technology limitations and uses tools like lithographic simulation to evaluate the trade-offs between manufacturability and design issues like density, rule complexity and design methodology.

For the first community of chip designers, the overriding technological direction is fairly synergistic and transparent. More design will be prevalidated in cell libraries and other building blocks. Wiring-level rules will be evaluated for designability (i.e., friendliness to router technology), making automation more likely. And an early focus on high-value patterns (with detailed implementation left to technology) should reduce the risk of significant layout change late in design. Moving toward more regular design rules should also contribute to this simplification, removing the need for this community to worry about detailed edge movements.

There are some areas that could affect the first community, depending on how the design rules evolve. One clear example is the anisotropy in wiring levels. Lithographic solutions often strongly favor one direction, so wire pitch and width values will differ depending on the preferred direction. The value of patterns in expressing design constructs and layout issues suggests opportunities to impact productivity by exploiting such patterns in construction (routing, cell assembly) or analysis (extraction, rule checking) tasks. For example, technology could directly deliver router building blocks rather than delivering edge rules that the router configuration must experiment with to produce useful router rules. Cell assembly could use predefined blocks to achieve better density but still maintain a coarse topological grid. Extraction could use look-up for predefined content to improve accuracy and runtime, and access a retargeting process to improve accuracy.

The second community of designers, contributing to technology definition, is more obviously affected by the emerging discontinuity. More prevalidated content will be developed (similar, again, to SRAM cells today), with simulation validation and advanced lithographic techniques, as illustrated in Figure 5.

The modeling of all aspects of the manufacturing flow (in addition to lithography) will need to improve to allow trade-offs to be made intelligently with simulation. Design and technology will need to develop efficient means of communicating: technology simulation and uncertainty to design, and design evaluations and proposed content to be added to technology offerings. Some aspects of the handoff between design and technology may evolve as a result (e.g., delivering patterns for predefined content rather than rules, or delivering router configurations directly). The involvement of design will occur earlier in a technology node, to help make the difficult decisions among the unavoidable trade-offs. New mechanisms for expressing design intent (beyond just drawn shapes) can enable technology to further optimize the contours for added value (yield, density, power-performance).

Summary
Severely sub-wavelength lithography presents some unavoidable conflicts with traditional scaling assumptions. However, disciplined design-technology co-optimization provides opportunities to define effective value by carefully considering the necessary trade-offs early in the technology cycle. Design communities will be affected differently based on their interaction with technology. Chip designers will see fairly evolutionary changes, with more regular design rules, perhaps supplemented with patterns of predefined content in constrained situations. Designers working with technology will have an increased ability to influence its direction by understanding trade-offs and working to optimize design value. IBM is engaged in all aspects of this in delivering its computational scaling solution, and is working with its partners to deliver valuable manufacturable technologies in this deep sub-wavelength realm.

References
[1] “IBM Develops Computational Scaling Solution for Next Generation ‘22nm’ Semiconductors”, press release, 17 September 2008 (available online at http://www-03.ibm.com/press/us/en/pressrelease/25147.wss).

IBM Microelectronics Division
East Fishkill facility
Route 52, Hopewell Junction, NY 12533, USA
W: www.ibm.com



< TECH FORUM > TESTED COMPONENT TO SYSTEM

Antenna design considerations
Brian Petted, LS Research

Brian Petted is the chief technology officer of LS Research, a wireless product development company and EMC testing laboratory. He holds a BSEET degree from the Milwaukee School of Engineering (MSOE) and an MSEE from Marquette University.

Antenna requirements

Gain and communication range
With the advent of prolific wireless communications applications, system designers are in a position to consider the placement and performance of an antenna system. The first step in establishing antenna requirements is to determine the desired communication range and the terminal characteristics of the radio system (i.e., transmit power, minimum receiver sensitivity level). Given those parameters, one can ascertain the amount of gain or loss required to maintain the communication range by using the Friis transmission formula [1]:

$$\frac{P_r}{P_t} = \frac{c^2\,g_t\,g_r}{(4\pi r f)^2}$$

where:

Pr = received power [W]
Pt = transmitted power [W]
c = speed of light [m/s]
gt = transmit antenna gain [W/W]
gr = receive antenna gain [W/W]
f = cyclic frequency [Hz]
r = communication range [m]

This relation is only valid for free-space propagation, but it illustrates the important role of the antenna gain in maximizing the receive-to-transmit power ratio, or system link gain.

Antenna size and clearance
Antenna gain (or loss) must be part of a trade-off study between performance and the physical realization considerations of size, placement and clearance (distance from obstructions).

FIGURE 1 Gain pdf (left) and associated ccdf (right): gain coverage probability density functions and gain coverage probability, plotted as directive gain probability against directive gain (dBi) for the vertical and horizontal polarizations (Source: LSR)




An overview of antenna design considerations is presented. These considerations include system requirements, antenna selection, antenna placement, antenna element design/simulation and antenna measurements. A center-fed dipole antenna is presented as a design/simulation example. A measurement discussion includes reflection parameter measurements and directive gain measurements.

FIGURE 2 Antenna evolution from the half-wave dipole (left): quarter-wave (λ/4) monopole over a ground plane (center) and L-antenna (right) (Source: LSR)

FIGURE 4 Sleeve dipole design input into the CST Microwave Studio simulator (Source: LSR)

One basic antenna relationship, presented below, shows that antenna gain, g, and the antenna effective aperture (area), Ae, are directly proportional. This roughly indicates that antenna gain is proportional to the physical size of the antenna [2]:

$$A_e = \frac{g\,\lambda^2}{4\pi};\qquad \lambda = \text{wavelength [m]} = \frac{c}{f}$$

FIGURE 3 Inverted-F antenna evolution from the L-antenna, by feeding the antenna at a more favorable impedance point (left); extruding the elements to a planar form produces the planar inverted-F antenna (right) (Source: LSR)

Another basic antenna relationship gives the Fraunhofer or Rayleigh distance, d, at which the near/far-field transition zone exists. Ideally, there should be a free-space clearance zone around the antenna of at least d. The largest dimension of the antenna, D, and the operating wavelength determine this distance [3]:

$$d > \frac{2D^2}{\lambda};\qquad r \gg D,\; r \gg \lambda$$

For example, if the largest dimension of the antenna is half a wavelength, the minimum clearance zone is a half-wavelength:

$$d > \frac{2(\lambda/2)^2}{\lambda} = \frac{\lambda}{2}$$

This serves as a basic guideline; however, in many physical realizations this clearance zone is compromised and the effects must be determined through simulation or empirical measurement.
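To put numbers on the clearance guideline, the sketch below evaluates the Fraunhofer distance and effective aperture for an assumed 2.4GHz half-wave element; the frequency is an assumption chosen to match the 2.4GHz case study later in the article:

```python
import math

c = 3.0e8                    # speed of light (m/s)
freq = 2.4e9                 # assumed operating frequency (Hz)
wavelength = c / freq        # ~0.125 m

D = wavelength / 2           # largest antenna dimension: a half-wave element
d_fraunhofer = 2 * D**2 / wavelength
print(f"near/far-field transition: {d_fraunhofer*100:.1f} cm")   # = lambda/2, ~6.3 cm

# Effective aperture of an ideal isotropic radiator (g = 1) at this frequency
Ae = 1.0 * wavelength**2 / (4 * math.pi)
print(f"effective aperture: {Ae*1e4:.1f} cm^2")
```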

Antenna gain details
Antenna gain is defined as the ratio of radiated power intensity relative to the radiated power intensity of an isotropic (omni-directional) radiator. Power intensity is the amount of radiated power per unit solid angle, measured in steradians (sr) [4]. The sphere associated with the isotropic radiator has a steradian measure of 4π and serves as the normalization reference level for antenna gain.







FIGURE 5 Sleeve dipole reflection coefficient (left) and impedance prediction (right) (Source: LSR)

FIGURE 6 Sleeve Dipole 3D antenna pattern for θ-directed electric field component

$$g_t = \frac{U_{rad}\,[W/sr]}{U_{isotropic}\,[W/sr]} = \frac{U_{rad}}{P_t / 4\pi}$$

The antenna gain expression can be expanded further to reveal other factors that contribute to the overall antenna gain. The radiation intensity for the antenna is a function of the antenna efficiency, η, and the directivity, D. The antenna efficiency is the product of the reflection efficiency (or mismatch loss) and the losses due to the finite resistances and losses in the antenna element's conductor and dielectric structures. The mismatch loss can be ascertained through simulation or measurement of the antenna's input impedance or reflection coefficient, Γ. The directivity is a description of the gain variation as a function of the link-axis angle(s), or the angle(s) of arrival/departure as described by the standard spherical coordinate system.
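The decomposition of gain into mismatch efficiency, conductor/dielectric efficiency and directivity can be illustrated numerically; the reflection coefficient magnitude, loss efficiency and dipole directivity below are assumed example values, not measurements from the case study:

```python
import math

gamma_mag = 0.2        # assumed measured |reflection coefficient| at the feed
eta_loss = 0.9         # assumed conductor/dielectric efficiency
directivity_dbi = 2.15 # directivity of an ideal half-wave dipole (dBi)

mismatch_eff = 1 - gamma_mag ** 2            # reflection (mismatch) efficiency
gain_dbi = directivity_dbi + 10 * math.log10(eta_loss * mismatch_eff)
print(f"mismatch loss = {-10*math.log10(mismatch_eff):.2f} dB")
print(f"realized gain ~ {gain_dbi:.2f} dBi")
```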

Antenna gain patterns
Ideally, antenna patterns are displayed as 3D plots (an example accompanies the case study in Figure 6). The plot is often constructed from multiple cross sections known as conical cuts. A typical conical cut is formed by holding the elevation angle, θ, constant and measuring the pattern over a complete revolution of the azimuthal angle, φ. Secondly, a separate plot is generally made for each component of the electric field or polarization (Eφ, horizontal, or Eθ, vertical). Examples of conical cuts are presented in Figure 7, accompanying the case study. Since most antenna patterns are not omni-directional, the description of antenna gain is fairly complex. In order to serve a system analysis in terms of determining communication range or system gain, the minimum, maximum or average gain over the entire pattern of a particular cut is typically used as the singular antenna gain value in the Friis transmission formula. However, designers may want to determine the distribution of communication ranges and system gains, given the non-uniform nature of a directional antenna used in an omni-directional application. In those cases, probability density functions (pdfs) can be associated with antenna patterns, both conical cuts and 3D patterns [5]. Even though directional antenna patterns are deterministic, the fact that their application is omni-directional with a random link-axis angle makes the antenna gain a random variable with respect to communication range and system gain analyses. Figure 1 shows the pdf associated with both the omni-directional and non-uniform patterns on the left. On the right is the complementary cumulative density function (ccdf), which is



derived from the pdf and indicates the probability that the antenna can provide a minimum level of gain, given a random link axis angle. Note for the case of the omni-directional antenna, the pdf is an impulse since the gain is single-valued and has no real distribution. The omni-directional case presents an interesting step-function ccdf. It shows that the probability of having a directive gain at least as large as the abscissa is 1 for gains less than the fixed gain value.
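A sketch of how such a ccdf can be built from sampled pattern data follows; the gain samples are made-up placeholders standing in for a measured conical cut, and the equal weighting assumes a uniformly random link-axis angle:

```python
import numpy as np

# Assumed directive-gain samples (dBi) over one conical cut, uniformly spaced in angle
gain_dbi = np.array([2.0, 1.5, 0.0, -3.0, -8.0, -3.0, 0.0, 1.5, 2.0, 1.0])

# ccdf: probability that the gain seen at a random link-axis angle is at least x
thresholds = np.linspace(-20, 5, 6)
ccdf = [(gain_dbi >= t).mean() for t in thresholds]

for t, p in zip(thresholds, ccdf):
    print(f"P(gain >= {t:5.1f} dBi) = {p:.2f}")
```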

Antenna topologies
There are many possible topologies or structures for an antenna. One interesting set of structures is that which evolved from the basic half-wave dipole (Figure 2). Starting with the half-wave dipole, the lower element of the dipole can be realized by a reflected image of the upper element in a ground plane (using electric field boundary conditions and/or image theory). The monopole can then be folded over, albeit with some degradation in impedance match and gain. The degradation due to matching can be recovered by feeding the antenna at a different point along its resonant length (recall the impedance variation of a transmission line with a standing wave present). This results in the inverted-F antenna. The elements may be extruded from the wire form to a planar form to realize an increase in impedance and gain bandwidth, but with a small degradation in gain. These additional evolutions are presented in Figure 3.

Antenna design and simulation
The initial design of an antenna can arise from a set of dimensional formulas based on closed-form electromagnetic relations. In practice, however, these antennas require some empirical adjustment and/or tuning steps before you arrive at a final design. Secondly, the electromagnetic relations associated with most antennas are not of a closed form and therefore do not yield dimensional synthesis equations. Therefore, in order to design and validate an antenna prior to fabrication, it is worthwhile simulating the antenna using an electromagnetic field solver that can predict the behavior of radiating systems. One such solver, CST Microwave Studio [6], offers many options that can simulate open-boundary, radiating structures. Figure 4 shows the relative utility of the simulation tool. It is the input page for a 2.4GHz sleeve dipole antenna, containing the dimensional and material parameter inputs required to carry out the simulation. Upon completion of the electromagnetic simulation, the radiation pattern of the electric field is available as a 3D plot and as conical cuts. Further, the simulator predicts the input reflection coefficient and represents it as a scattering parameter (S11). The simulator provides all of the essential information about the antenna prior to its physical realization in order to pre-validate the design approach. The predicted reflection coefficient and driving point impedance are presented in Figure 5. The predicted 3D radiation pattern is presented in Figure 6, with associated conical cuts shown in Figure 7.

Antenna design validation and measurement
With the antenna synthesized and realized, the design must be validated through measurement. The first necessary measurement is to measure the reflection coefficient of the antenna input port or driving point. The reflection coefficient and associated driving point impedance are measured with a vector network analyzer (VNA).

Antenna design validation and measurement With the antenna synthesized and realized, the design must be validated through measurement. The first necessary measurement is to measure the reflection coefficient of the antenna input port or driving point. The reflection coefficient and associated driving point impedance is measured with a vector network analyzer Continued on next page


FIGURE 7 Associated conical cuts for the θ-directed electric field over angle φ at fixed angle θ=90 degrees (left) and over angle θ at fixed angle φ=0 degrees (right) (Source: LSR)





Care must be taken during this measurement to ensure that the antenna is radiating and not being disturbed by any surrounding objects. Ideally, this measurement is performed in an anechoic chamber. However, with sufficient separation between the antenna and any perturbing obstructions, it can typically be performed within a normal laboratory environment. In order to initially validate the antenna design, the reflection coefficient and associated driving point impedance must show that the antenna is reasonably matched to the system impedance (generally 50Ω). Once it has been established that the antenna is matched to the system impedance, the radiation pattern must be measured to complete the final steps of design validation. The measurements are performed in an anechoic chamber by exciting the antenna under test with a known transmit source power and measuring the received power, received voltage or electric field intensity at a fixed distance. The antenna is swept through a series of conical cuts, either to compare them to simulated results or to build a set of cuts to assemble into a 3D gain pattern. The absolute received signal is normalized either by the conducted power applied to the antenna or against a known reference such as a half-wave dipole. Both polarization cases are measured. With the set of pattern data at hand, the measurements can also be examined against the system requirements in terms of minimum, maximum and average gain, or against gain distribution requirements, if applicable.

Conclusion
Antennas provide the primary interface between the radio and the propagation environment. The antenna requires special consideration in terms of performance requirements, design constraints, design and realization. Specifying the antenna gain and relating that requirement to the system performance in terms of range and system link gain is the foundation for the design goals of the antenna. During the antenna topology/structure selection process, consider packaging constraints in terms of size, location and possible obstructions. Be prepared to trade performance against package conformance. Ideally, one should use a simulation tool to assess the performance of the antenna prior to realization, not only to gauge the fundamental performance of the antenna, but also to check the effects of antenna compaction, obstructions and other compromised parameters. The final physical realization and consequent measurement of input terminal reflection/impedance and antenna gain complete the

design process. Often, the measurement results require that the antenna structure be modified to empirically optimize its performance.

References
[1] Harald Friis, “A Note on a Simple Transmission Formula,” Proc. IRE, 34, 1946, pp. 254-256.
[2] Thiele and Stutzman, Antenna Theory and Design, Second Edition, John Wiley and Sons, 1998, p. 79.
[3] Thiele and Stutzman, Antenna Theory and Design, Second Edition, John Wiley and Sons, 1998, p. 30.
[4] Thiele and Stutzman, Antenna Theory and Design, Second Edition, John Wiley and Sons, 1998, pp. 39-43.
[5] B. Petted, “Antenna Gain Considerations in Communications System Range Analysis”, Seminar in Microwave Engineering, Marquette University, March 20, 2009.
[6] CST Microwave Studio, CST of America, 492 Old Connecticut Path, Suite 505, Framingham, MA 01701. W: www.cst.com.

LS Research
W66 N220 Commerce Court
Cedarburg, WI 53012, USA
T: 1 262 375 4400
W: www.lsr.com


STAY ON THE FRONT LINE OF EE DESIGN

Attend a 2009 EDA Tech Forum® Event Near You

Remaining 2009 Worldwide Locations
• June 16 – Munich, Germany
• August 25 – Hsin-Chu, Taiwan
• August 27 – Seoul, South Korea
• September 1 – Shanghai, China
• September 3 – Santa Clara, CA
• September 3 – Beijing, China
• September 4 – Tokyo, Japan
• September 8 – New Delhi, India
• September 10 – Bangalore, India
• October 1 – Denver, CO
• October 8 – Boston, MA

2009 Platinum Sponsors:

Register Now


Low power. Highest functionality in its class. First 65-nm low-cost FPGA.

VERY COOL Cool off your system with Altera® Cyclone® III FPGAs. The market’s first 65-nm low-cost FPGA features up to 120K logic elements—2X more than the closest competitor—while consuming as little as 170 mW static power. That’s an unprecedented combination of low power, high functionality, and low cost— just what you need for your next power-sensitive, high-volume product. Very cool indeed.

Copyright © 2008 Altera Corporation. All rights reserved.

www.altera.com

