The magazine of record for the embedded computing industry
August 2013
www.rtcmagazine.com
New COM Module Spec for ARM Processors
Optimize Size, Weight and Power (SWaP)
Get the Most out of High-Level Code
An RTC Group Publication
Software Drives the Design of SoCs

TABLE OF CONTENTS
VOLUME 22, ISSUE 8

Departments
Editorial: In the Face of Integration, Abstraction Empowers Complexity
Industry Insider: Latest Developments in the Embedded Marketplace
Small Form Factor Forum: Designing a Universal Carrier
Products & Technology: Newest Embedded Technology Used by Industry Leaders

EDITOR’S REPORT: Human Interfaces for the Internet of Things
M2M Meets Web Applications Spawning the Internet of Things
Tom Williams

TECHNOLOGY IN CONTEXT: ARM-Based Modules
SMARC Specification Delivers Standardized Module Building Blocks for Ultra-Low-Power Mobile Connected Applications
Norbert Hauser, Kontron

TECHNOLOGY CONNECTED: Advances in Compiler Technology
Compiler Optimization: Getting the Most out of High-Level Code
Shawn A. Prestridge, Mentor Graphics

TECHNOLOGY IN SYSTEMS: Optimizing Size, Weight and Power
Getting the Heat Out of Ever Smaller Systems
Mike Jones, Adlink Technology

TECHNOLOGY DEPLOYED: Development Tools for SoCs
Software-Driven SoC Development
Jim Ready, Cadence Design Systems
Establishing a New Virtual Prototyping Methodology for Hardware/Software Co-Design
Ric Vilbig and Colin Walls, Mentor Graphics

INDUSTRY WATCH: Strategies for Fabrics
Strategies for Fabrics: Some Choices Settled While Others Go On
Wayne McGee, Creative Electronic Systems

PRODUCTS & TECHNOLOGY
Quad-Core PXI Embedded Controller for Multitasking Test and Measurement
Two Channel 200 MS/s 14-Bit PCI Express Digitizer
PCI Express Two-Port RS-232 Serial Interface

Digital Subscriptions Available at http://rtcmagazine.com/home/subscribe.php
Publisher
PRESIDENT John Reardon, johnr@rtcgroup.com

Editorial
EDITOR-IN-CHIEF Tom Williams, tomw@rtcgroup.com
SENIOR EDITOR Clarence Peckham, clarencep@rtcgroup.com
CONTRIBUTING EDITORS Colin McCracken and Paul Rosenfeld
MANAGING EDITOR/ASSOCIATE PUBLISHER Sandra Sillion, sandras@rtcgroup.com
COPY EDITOR Rochelle Cohn

Art/Production
ART DIRECTOR Kirsten Wyatt, kirstenw@rtcgroup.com
GRAPHIC DESIGNER Michael Farina, michaelf@rtcgroup.com
LEAD WEB DEVELOPER Justin Herter, justinh@rtcgroup.com

Advertising/Web Advertising
WESTERN REGIONAL ADVERTISING MANAGER Stacy Mannik, stacym@rtcgroup.com (949) 226-2024
MIDWEST REGIONAL AND INTERNATIONAL ADVERTISING MANAGER Mark Dunaway, markd@rtcgroup.com (949) 226-2023
EASTERN REGIONAL ADVERTISING MANAGER Shandi Ricciotti, shandir@rtcgroup.com (949) 573-7660

Billing
Cindy Muir, cmuir@rtcgroup.com (949) 226-2021
To Contact RTC magazine:
HOME OFFICE
The RTC Group, 905 Calle Amanecer, Suite 250, San Clemente, CA 92673
Phone: (949) 226-2000 Fax: (949) 226-2050, www.rtcgroup.com

EDITORIAL OFFICE
Tom Williams, Editor-in-Chief
1669 Nelson Road, No. 2, Scotts Valley, CA 95066
Phone: (831) 335-1509
Published by The RTC Group Copyright 2013, The RTC Group. Printed in the United States. All rights reserved. All related graphics are trademarks of The RTC Group. All other brand and product names are the property of their holders.
EDITORIAL
Tom Williams Editor-in-Chief
In the Face of Integration, Abstraction Empowers Complexity
There is a common wisdom that goes something like, “with the scale of integration these days, so many things that were once implemented on a board module are now integrated on a single piece of silicon.” There is, of course, truth to that as far as it goes, but what are the practical implications? Yes, processors are certainly more highly integrated with multiple CPU cores, high-performance on-chip graphics, a variety of integrated peripherals and more. This has most certainly and almost universally reduced the component count for small form factor systems, making them smaller and reducing power consumption and costs.

The picture changes rapidly when we start talking about custom silicon. Whenever we contemplate a custom silicon design, even when it only involves putting together predefined blocks of silicon IP, that means at some point firing up a fab. And that means many dollars that can only be justified by sufficient volume. Back when Motorola was in the business of making the 68000 microcontroller line, they had a large selection of designs with a wide choice of mixes of different peripherals. That came about partly because the company would do a custom design to a customer’s order, but that design would also become part of the 68000 product line. A similar variety grew up around the 8-bit Intel 8051 microcontroller family, resulting in vast catalogs of variants.

Partially as an alternative to this, Cypress came up with a configurable Programmable System on Chip, or PSoC, and its PSoC 3 line is based around an 8-bit 8051 core. The family has since grown to include PSoCs based on 32-bit ARM cores as well, but has in common an area of programmable PLD-based logic that a developer can use to define a selection of peripheral devices. A tool called PSoC Creator offers a high-level graphical environment that lets the user define this peripheral set at a high level of abstraction.
More recently, we have seen the advent of what we have here called application service platforms (ASPs), which combine a single or dual ARM core with a set of commonly used hard-wired peripherals along with a full-blown FPGA fabric. These are offered by Xilinx (Zynq-7000), Microsemi (SmartFusion) and Altera (Cyclone V). More about these a bit later.

The cost barrier with a full-blown ASIC or SoC is that it involves IC design and most often the use of electronic design automation (EDA) tools, which remain the domain of specialists and are not generally used by embedded system developers, who mostly
use the silicon products developed with these tools. However, that may be changing somewhat. High levels of integration involve high levels of complexity, and the best or only way to deal with high complexity is with high levels of abstraction. We are starting to see the emergence of some tools that will let developers specify their designs at higher levels of functionality, most often defined by the desired function of the software that the tool will then automate at the lower hardware levels. See the articles in this issue from Mentor Graphics and Cadence for a closer look at this trend.

There is even a cooperative effort underway between Associated Compiler Experts and Synopsys to integrate compiler technology with processor design technology in such a way that would result in an application-specific instruction set processor (ASIP). This would allow the design of custom instructions that would in turn be able to generate a C compiler along with the hardware design. While it is not clear what the level of abstraction is for the processor and instruction set design, it nonetheless brings the software developer in to the early stages of hardware definition. This means that the application concepts drive the silicon design from the start.

Again, any of these breakthroughs will still involve a fab and the corresponding volume demands. They will, however, greatly limit the risk of a redesign after the first fabrication run. By allowing the software requirements to define the hardware design and then run on the virtual hardware, both can be better verified prior to actually throwing the switch at the fab.

The greatest promise for really custom designs at relatively low volumes and hence costs seems to us to lie with the above-mentioned ASPs. I have heard objections that they are still pretty pricey, but we all know that will go away if they come into ever wider use. What appears to be lacking is a common high-level development environment.
Now each of these companies has created a suite of tools that seem to work very well with their products. What is needed is a tool metaphor that can unite the worlds of programmer and FPGA developer at a high level of abstraction and that could be applied to any of these products the way the C language can be used with a vast number of CPUs. That would allow choices among a unified number of applications, and drive up the acceptance while driving down the costs of these truly advanced devices. I know that’s asking a lot, but you should see what I expected of Santa as a kid.
INDUSTRY INSIDER

GizmoSphere Welcomes Newest Partner: Embedded Linux Expert Timesys
GizmoSphere, the embedded development community and support center for x86-based open source designs, has gained its newest partner, Timesys Corporation, a specialist in embedded Linux since 1995. Timesys joins a roster of partners that includes AMD Corporation, Sage Electronic Engineering, Viosoft Corporation and Texas Multicore Technologies (TMT). Partners contribute time, talent and resources to propel GizmoSphere forward and serve a growing community of embedded developers. At the heart of GizmoSphere is Gizmo, an x86-based development board that supplies the power of a supercomputer and the I/O capabilities of a microcontroller. Gizmo is part of the Gizmo Explorer Kit, available through SemiconductorStore.com. With the kit, designers can supercharge their projects, gaining access to the full spectrum of functions provided by the AMD Embedded G-series APU resident on the Gizmo board. What’s more, Gizmo delivers 52.8 GFLOPS for a performance level that far surpasses what many other development boards offer. With the addition of Timesys to the team, GizmoSphere now offers expertise in the embedded Linux arena, which is attractive to design teams targeting Linux as their go-to-market platform. Timesys is committed to providing easy-to-use, embedded Linux offerings to support development. As the first organization to develop and maintain a real-time embedded Linux distribution, Timesys is active in many different industries including consumer electronics, medical, automotive and industrial control.
Microsoft Announces General Availability of Windows Embedded Compact 2013
Microsoft has announced the general availability of Windows Embedded Compact 2013. Optimized for building small-footprint industry devices, Windows Embedded Compact 2013 includes powerful new tools and capabilities—including new support for Visual Studio 2012—that extend the experience of Windows and help businesses capitalize on the Internet of Things. The release is the latest generation of one of the smallest and most flexible products in the Windows Embedded portfolio, designed to power devices that need real-time performance and silicon
flexibility, with support for x86 and ARM architectures. Windows Embedded Compact 2013 is suitable for powering some of the smallest industry devices, such as programmable logic controllers and human-machine interface panels used to monitor processes in manufacturing, RFID scanners in retail environments, and portable ultrasound machines and diagnostic lab equipment in healthcare settings. When these devices are connected via the cloud to backend systems, the resulting intelligent system generates data that can be harnessed and analyzed to provide actionable insight for the enterprise. That data is considered the new currency of business. Users will see some major
improvements to device functionality with Windows Embedded Compact 2013. “Performance was a particular focus in this release,” says Steven Bridgeland, senior product manager, Windows Embedded at Microsoft. “We have spent countless hours optimizing our code to greatly improve system and network performance, making applications feel snappier.” Windows Embedded Compact 2013 features include improvements to the core operating system, such as memory management and networking capabilities, and improved filesystem performance, enabling devices to always be available. Optimized startup features snapshot boot, which allows devices to boot within seconds to a known state, such as a specific UI with device drivers loaded. There is also built-in support for Wi-Fi, cellular and Bluetooth technologies, and a seamless connection to Windows Azure, for a robust, connected intelligent system.
Silicon Labs Acquires Energy Micro
Silicon Labs has announced that it has signed a definitive agreement to acquire Energy Micro. Based in Oslo, Norway, the late-stage privately held company is known for its power-efficient portfolio of 32-bit microcontrollers (MCUs), and is developing multi-protocol wireless RF solutions based on the ARM Cortex-M architecture. Energy Micro’s energy-friendly MCU and radio solutions are designed to enable a broad range of power-sensitive applications for the Internet of Things (IoT), smart energy, home automation, security and portable electronics markets. This strategic acquisition accelerates Silicon Labs’ growth opportunities and positions the company as an innovator in energy-friendly embedded solutions. The growth of the IoT market, coupled with continued deployment of smart grid and smart energy infrastructure, is driving strong demand for energy-efficient processing and wireless connectivity technology to enable connected devices in which low-power capabilities are increasingly important. Predictions are that the number of connected devices for the IoT will top 15 billion nodes by 2015 and reach 50 billion nodes by 2020. Energy Micro’s portfolio complements Silicon Labs’ 32-bit Precision32 MCU, Ember ZigBee and sub-GHz wireless products, and targets a growing embedded market. The acquisition greatly expands Silicon Labs’ MCU portfolio, adding nearly 250 ARM-based EFM32 Gecko MCU products ranging from extreme-low-power, small-footprint MCUs based on the ARM Cortex-M0+ core to higher-performance, energy-friendly MCUs powered by the Cortex-M4 core capable of DSP and floating-point operations. The acquisition is expected to enhance Silicon Labs’ radio portfolio with the addition of Energy Micro’s ultra-low-power EFR Draco radio products. These versatile wireless transceivers and system-on-chip (SoC) devices will support frequency bands ranging from sub-GHz to 2.4 GHz, and multiple standard and proprietary protocols including Bluetooth Low Energy (LE), 6LoWPAN, ZigBee, RF4CE, 802.15.4(g), KNX, ANT+ and additional protocols.
Industrial Automation Equipment Poised to Grow Stronger in Second Half
The market for industrial automation equipment (IAE) is set to grow 6.2 percent this year to approximately $170 billion, helped in part by the recovery of global manufacturing in the first quarter, according to a report entitled “The World Market for Industrial Automation Equipment” from IMS Research. The more optimistic outlook for 2013 contrasts with the anaemic market conditions of last year that had been aggravated by the interconnected nature of a globally linked space. Conditions all around have now improved, however. For one, leading indicators, including machinery orders and manufacturing activity, point to increasing demand for industrial products during the next six months. Moreover, progress has been observed in the markets of China, Europe and the United States in the first half of this year, fueling confidence that the IAE space is headed toward renewed vigor. In China, manufacturing indices indicate that a slow and steady improvement that began in September 2012 has continued through the first five months of 2013. China is also reporting that inventory replenishment has started in equipment markets where inventories fell to extremely low levels last year. Meanwhile in Europe, increases in German machine-tool orders also point toward bolstered demand that could redound to the IAE market. Signs of economic health are likewise springing up in the United States, where greater growth will help propel demand for industrial automation equipment and boost the IAE global market. Moreover, the global interdependence that hampered market expansion last year will prove to be a boon this time around, especially because growth of the United States in 2013 will not be a key requirement to foster overall expansion of the worldwide IAE space. The U.S. no longer needs to buoy the rest of the world, at least not this year.
New Atomic Clock Technology Could Redefine the Second
Tests on an alternative atomic timekeeper have revealed a more precise method than the cesium atomic clocks we currently use to count the seconds. The device, called an optical lattice clock, lost just one second every 300 million years, making it three times as accurate as current atomic clocks. Planetary motion is a notoriously imprecise way to measure time due to factors such as precession, or the wobble of the Earth’s axis, which can make some days shorter or longer than others. Since the 1960s, the cesium-based atomic clock has been used to define a second in the International System of Units (SI units). That definition reads “The second is the duration of 9,192,631,770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the cesium 133 atom.” In practice, a cesium clock is accurate to about one second every 100 million years, which is the figure the optical lattice clock improves on by a factor of three. Current atomic clocks, called cesium fountains, expose clouds of cesium atoms to microwaves to get them to oscillate. But the optical lattice atomic clock uses laser light to excite strontium atoms. Since laser light has a much higher frequency than microwaves, it can split time into shorter intervals. This could potentially lead to the redefinition of the second. Practical uses include the possibility that new GPS satellites could bring their precision down to a foot.
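To put the quoted drift figures on a common scale, the short Python sketch below converts "one second lost every N years" into a fractional timing error and works out how long light takes to travel one foot, which is the timing resolution a one-foot ranging precision implies. It is an illustration only: the function name, the use of a 365.25-day year and the foot-to-meter conversion are assumptions made here, not figures from the clock tests.

```python
# Assumption: a Julian year of 365.25 days is used for the conversion.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact by definition of the meter
FOOT_IN_METERS = 0.3048               # international foot


def fractional_drift(years_per_second_lost: float) -> float:
    """Fractional timing error of a clock that drifts 1 s over the given span."""
    return 1.0 / (years_per_second_lost * SECONDS_PER_YEAR)


# Figures quoted in the news item above.
cesium = fractional_drift(100e6)   # one second every 100 million years
lattice = fractional_drift(300e6)  # one second every 300 million years

# Time for a radio signal to cover one foot, in nanoseconds.
time_per_foot_ns = FOOT_IN_METERS / SPEED_OF_LIGHT_M_PER_S * 1e9

print(f"cesium fountain : {cesium:.2e}")
print(f"optical lattice : {lattice:.2e}")
print(f"improvement     : {cesium / lattice:.1f}x")
print(f"light-time/foot : {time_per_foot_ns:.3f} ns")
```

The two clocks come out at roughly 3 parts in 10^16 and 1 part in 10^16, and a foot corresponds to about one nanosecond of signal travel time, which gives a sense of why clock stability matters for satellite ranging.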
Green Hills Announces Trusted Mobile Device Partnership with Samsung
Green Hills Software has announced its enhanced partnership with Samsung Electronics
for the deployment of Green Hills Software’s Integrity Multivisor for Trusted Mobile Devices technology. As a member of the Samsung Enterprise Alliance Program (SEAP), Green Hills has ported its mobile virtualization technology to Samsung GALAXY devices, including the popular Samsung GALAXY Note II. The Integrity Multivisor technology adds a foundational layer of data protection and isolation below the mobile operating system and enables a secure, dual-persona BYOD solution that cannot be achieved with application-level sandbox mechanisms. Green Hills Software and Samsung have been working together to enable Samsung mobile devices with Green Hills Software’s separation kernel-based Type-1 hypervisor. The Integrity Multivisor solution for enterprise and government enables mobile device users and administrators to take advantage of off-the-shelf mobile phones and tablets for both personal and business use, with both parties having confidence in security and privacy. Shipping since 2003, the Integrity Multivisor technology is a virtualization solution built upon security-certified separation kernel technology that provides highly assured isolation between personas while also providing a native, open standards-based execution environment for security-critical tasks. The Integrity Multivisor solution consists of the certified Integrity real-time separation kernel technology coupled with facilities to execute one or more “guest” operating systems. The Integrity Multivisor solution also provides a native, open standards-based environment for critical applications, such as security functionality and fast-boot real-time apps. It offers several approaches to virtualization in ARM-powered devices, including paravirtualization and full virtualization with or without hardware hypervisor acceleration.
LDRA Expands Certification Management Market into Russia’s Safety-Critical Industries
LDRA has announced that it has chosen Softline to distribute the LDRA tool suite and LDRA’s Certification Services to Russia and the Commonwealth of Independent States (CIS). Softline boasts a 35% market share in Russia’s software market after 20 years of strong sales, implementation and education in various markets, including safety-critical ones. LDRA’s certification management and software test verification technologies offer new compliance efficiencies and process management to Russia’s safety-critical industries. Together, Softline and LDRA will target standards-based certification for aerospace and defense (KT-178B, DO-178B/C, DO-278), automotive (ISO 26262), rail and industrial safety (IEC 61508), nuclear (IEC 60880) and medical (IEC 62304). Softline plans to take the LDRA software test and verification expertise to its large customer base as well as generate new clients through extensive education and training outreach. With a full range of services for building, optimizing, supporting and developing software and IT solutions, Softline shares LDRA’s focus on end-to-end software technologies. The LDRA tool suite—which includes automated tools that cover the entire development lifecycle from requirements engineering through code compliance, static and dynamic analysis, target testing, and verification—will support the end-to-end software testing requirements of Softline’s clients.
SMALL FORM FACTOR FORUM
Colin McCracken
Designing a Universal Carrier
At last. Through an unprecedented spirit of cooperation, the industry has come up with the Holy Grail for computer-on-module (COM) users. A true “universal” carrier board that allows system OEM customers to easily qualify multiple COM suppliers. It supports all of the various modules, processors and form factors that OEMs want. It uses all of the popular (though incompatible) connectors and pin assignment “types.” There are more switches on it than a Manhattan telecom central office. The carrier board requires larger fabrication and pick and place equipment, because it’s as big as an ocean-going vessel. So it has earned the name “Aircraft Carrier.”

Okay, maybe including every module form factor is impractical. How about even just one? COM Express is popular for a wide range of processor performance within Intel’s and AMD’s mobile and ultra-mobile series roadmap. Perhaps a universal COM Express carrier is feasible. It could use one or both connectors, support all of the various module sizes and mounting holes, and connect the signals that are common to all of the popular pin-out types. Not bad. The board isn’t large, and it has more holes than Swiss cheese in the middle. It’s limited to x86 processors, and most (but not all) of the type 2 power and ground pins are hooked up, along with a few PCI Express lanes and USB ports. The LPC bus can’t be used in order for all modules to have a chance of booting on the first try without BIOS customization fees and minimum order quantities. To be safe, be sure to disable the ceiling smoke detectors when powering up the second module.

On a more serious note, COM Express carrier boards have indeed become much easier to design in the past few years, thanks to a rare free publication from PICMG called the Carrier Design Guide.
A painstaking effort by multiple suppliers who knew they had to cooperate for the survival of the standard resulted in this very thorough document that contains circuit examples for terminating most module interfaces. You might need to ask a board supplier for the Compact size (95 x 95 mm) mounting-hole positions relative to the carrier board connectors
and board outline, since the document predates R2.0 and R2.1 of the module specification. As far as selecting a pin assignment type, resist the temptation to select based solely on I/O required by that application. For example, you may want to connect to Compact Flash, LVDS or SDVO for display, PCI for serial ports, etc. If you want to maximize the “life” of the carrier board before re-spinning it, such as for medical and mil/aero applications, consider type 6 modules instead of type 2, even if it means adding circuitry to the carrier board. It seems counterintuitive at first, but nearly all new modules introduced starting in June and going forward with performance that scales up to Core i7 are only available with type 6 pin-out. Finally, I/O that is considered “legacy” by the desktop/laptop PC world must be attached to one of the module interfaces; select chips according to device driver availability. For OEMs who are focused on the low-power SoCs (< 10 watts), some form factors like Qseven have supported ARM and x86 processors for years. It’s important to review user manuals to confirm not only which features are populated, but what port numbers are provided on what pins. For deeply embedded headless designs or applications with modest graphics requirements running Linux or certain Windows variants, such modules are commonplace now and are based on free open standards, unlike their proprietary counterparts from 10 years ago. Although a true universal carrier board remains impractical, a “close enough” carrier board is fairly straightforward for system OEMs and third parties to design. OEMs who are brand new to COMs can celebrate their blissful ignorance of how messy it was to wallow through the mire of specification revisions and the shedding of parallel interfaces to arrive at the current form factors and pin-out types. 
A ground-up new design just takes enough patience to pore over the documentation of every single module you want to support, and the willingness to add circuitry to the carrier as needed after the lowest common denominator of module features is determined.
EDITOR’S REPORT
Human Interfaces for the Internet of Things

M2M Meets Web Applications Spawning the Internet of Things
The Internet of Things is growing rapidly. Managing the billions of devices is a challenge, and the result must satisfy users who have come to expect rich, graphical human interfaces via web browsers.
by Tom Williams, Editor-in-Chief
With all the talk, and with all the very real activity around the Internet of Things, we hear of numbers like 50 billion devices connected to the Internet. The possible applications are equally vast, targeting such things as efficient building control, industrial controls, military devices, medical instruments, smart consumer applications, transportation, environmental monitoring and more. A large portion of these 50 billion will be small and dedicated to a limited number of functions individually. Collectively, however, they will span huge applications such as those mentioned and generate vast amounts of data that are coming to be known as “Big Data.” That Big Data eventually ends up on servers and server farms in the Cloud where it can be analyzed, combined and used for applications we may not have yet imagined.

What we are really getting with the Internet of Things is the foundation of what is coming to be known as Intelligent Systems, where devices communicate with each other mostly autonomously and yet their functions serve human ends, so human operators and consumers must interact with these systems in some manner. Since that interaction takes place via the Internet, it is natural that they are accessed through browsers. And increasingly, Internet access for things and people takes place with tablets and smartphones with their touch screen browsers. So how does all this work with a universe of small M2M devices that also must offer human access? It should come as no surprise that humans require more resources to interact with devices and their applications than
machines do when they simply communicate with one another. In other words, as Wilfred Nilsen, CEO of Real Time Logic, points out, to have meaningful interaction with an application, you need more than simple access to static pages, which is what you get with a simple web server. A simple web server, such as the well-known Apache, is really just an HTTP protocol stack that can access static, pre-defined web pages. But web servers like Apache can be enhanced with plug-ins and components to add functionality.
Getting to Rich Human-Machine Interfaces
Now of course most web sites provide more than just static pages, so there is some underlying application that dynamically creates pages in response to some user input. For embedded devices, that underlying application would mostly be some sort of control program that can execute input commands and return data about the status, etc. And of course, that added functionality requires resources in the form of processor power and memory. Real Time Logic has concentrated its efforts on providing small, compact web servers for embedded devices with its Barracuda line. What it calls its Barracuda web server is actually more than just a simple HTTP protocol stack as described above, but it is aimed at requiring minimal resources. The Barracuda web server manages secure HTTP connections for Machine-to-Machine (M2M) communication and Human-to-Machine (H2M) interfaces. With C/C++ Server Pages (CSP) included, it delivers dynamic web applications, enabling live updating of secure data by authenticated connection (Figure 1). The goal here is to provide access and interaction with a minimum footprint. The Barracuda web server is about 200 Kbytes and requires around another 60 Kbytes. The software development kit (SDK) provides a number of host tools that compile and link CSP files. The tools function similarly to a compiler or cross-compiler, and convert the C Server Pages files to either C or C++ code and to data files. A linker combines all the data files into one file, which is then embedded in the application. The produced C/C++
Figure 1 The Barracuda Web Server is an industrial-strength, small embeddable web server engine that is optimized for compact, deeply embedded devices. (Diagram blocks: your C or C++ application with CSP pages; Barracuda Web Server with thread pool and HTTP engine; TCP/IP stack; RTOS.)
Figure 2 The full Barracuda Application Server is a C source library that lets designers access their own proprietary C functions from Lua code. It includes a rich selection of components and plug-ins. (Diagram blocks: customer's LSP application and C or C++ application with CSP pages; Lua Server Pages (LSP) and Lua virtual machine; Barracuda Web Server with thread pool and HTTP engine; other plug-ins such as WebDAV, EventHandler and Web Services; SharkSSL; the NetIo, ZipIo and DiskIo file system I/O interfaces; optional flash memory, SQLite database and file system; TCP/IP stack; kernel (OS/RTOS).)
code is compiled using your standard C/C++ (cross) compiler and the code is then linked with the application. There are, of course, other ways to connect with embedded and networked systems from an IT environment. For example, there are dedicated programs that communicate with embedded devices using TCP/IP over the Internet because the developers may want interactive, graphically rich user interfaces (e.g., dials, gauges, switches, displays) from a number of small remote devices. The trouble here is that such programs must be developed from scratch and cannot take advantage of many of the benefits of a web-based
approach by piggybacking on the existing infrastructure. Of course, that in turn requires more resources on the embedded device to enable a rich human interface (Figure 2). The Barracuda Application Server is an embeddable C source code library that builds on the web server to allow rich graphical applications and human-machine interfaces for interaction with embedded code. Since it is a C library, it can be compiled and linked to an embedded C application on the device. User interaction takes place via pages created with the Lua scripting language that interact with the C application via its API. These Lua server
pages (LSPs) also interact with other components and plug-ins such as database, I/O and others. According to Nilsen, the LSP pages are Lua code accessed via the HTTP engine, which parses a request and then accesses the corresponding page. For example, after receiving "turn engine on" it will parse that page (the Lua code) on the fly and execute it using the Lua virtual machine, which interprets the page and can send a command to the C application and return the results to the user. The interfaces between the scripting code and the C side are "Lua bindings" that enable the scripting language to call functions in the C code.
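The idea behind a binding can be illustrated in plain C: a table maps names that a script may use onto C functions in the application. The following is only a sketch of that concept. It does not use the Lua C API or any Barracuda API, and every name in it (`engine_on`, `engine_off`, `call_binding`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical application state owned by the C side. */
static int engine_running = 0;

/* Hypothetical C functions that a script would be allowed to call. */
static int engine_on(int arg)  { (void)arg; engine_running = 1; return engine_running; }
static int engine_off(int arg) { (void)arg; engine_running = 0; return engine_running; }

/* A binding pairs a script-visible name with a C function pointer. */
typedef int (*binding_fn)(int arg);

struct binding {
    const char *name;
    binding_fn  fn;
};

static const struct binding bindings[] = {
    { "engine_on",  engine_on  },
    { "engine_off", engine_off },
};

/* Look up a binding by name and call it; returns -1 if the name is unknown. */
int call_binding(const char *name, int arg)
{
    size_t i;
    for (i = 0; i < sizeof bindings / sizeof bindings[0]; i++) {
        if (strcmp(bindings[i].name, name) == 0) {
            return bindings[i].fn(arg);
        }
    }
    return -1;
}
```

A real scripting runtime adds argument marshalling and error handling on top of this lookup, but the division of labor is the same: the script names an operation and the C application carries it out.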
Figure 3 The RTL M2M approach requires minimum resources for a microcontroller to exchange data with other devices and with an application server. (Diagram: the device application uses SharkSSL over TCP/IP to connect to https://company.com/path/2/web-service/; the web service on the server then maintains a persistent connection.)
Using a scripting language like Lua at this level, Nilsen notes, greatly simplifies development. Of course, the underlying application that will often sit on top of an RTOS such as Green Hills Integrity or Wind River’s VxWorks will be written in C because of the need for detailed control of the underlying hardware. Once that is done, the use of a scripting language like Lua lets developers work at a higher level to make the application on the remote device easily available via a browser. And browsers themselves are relatively simple, so the device should be accessible from PCs, Macs and Android or iOS-based tablets and smartphones. That is what users increasingly expect and demand.
Rich Interaction with Tiny Devices
The options for developers, however, should not be a choice between a simple web interface on a resource-limited device or a rich interface on a larger, more powerful device. There are, after all, these millions of small devices that are collectively doing all this important stuff. We want rich interaction with them as well. For interaction with a small processor or microcontroller such as an Intel Atom or an ARM Cortex-M4, you certainly can't embed an application server, but you can rely on a combination of M2M communication among small devices and a small, dedicated server with the resources to run the Barracuda Application Server. The classic M2M design uses standard SOAP/XML web services. But a SOAP stack with its XML parser is often too big for a microcontroller. Even the HTTP engine required by the web server may be too big for a microcontroller's internal memory. A microcontroller can communicate with a specialized online web service by using secure communication managed with a TCP/IP stack and a secure socket layer (SSL) client stack, in this case Real Time Logic's SharkSSL. The added benefit of the approach used by RTL is that the need for an HTTP protocol stack is eliminated because the device connects to the specialized web service by sending an initial HTTP header that is then morphed into a persistent socket connection as soon as the connection is established with the server. Any data sent over the persistent connection is encrypted by the SSL stack (Figure 3). With this approach, small microcontroller-based devices can communicate with each other by simply exchanging data once the connection is established. They can also communicate with the application server running on a small, low-cost
but resource-richer platform. On the one side they don't even need to be on the Internet, but simply on a local Ethernet connected to a port on the server device. That server device then is connected to the Internet where its pages can be accessed via browsers from anywhere. Such a server can connect to potentially hundreds or thousands of small distributed devices. There are many single board computers on the market that can easily fulfill these requirements. How those devices with their embedded applications are managed is then entirely a matter of the application on the server, which is accessed from a normal browser. They can, for example, be configured or updated collectively, be selected from lists or by defined groups, or even individually since all will have their own IPv6 address. The approach of using an intermediate server is also cost-effective because only one platform needs to run the full application server. Still, all the options are available.
Real Time Logic, Monarch Beach, CA. (949) 388-1314. [www.realtimelogic.com]
Technology in Context: ARM-Based Modules
SMARC Specification Delivers Standardized Module Building Blocks for Ultra-Low-Power Mobile Connected Applications
A new COM-like modular specification offers a path to increasing designs of low-power and mobile systems primarily oriented around the ARM architecture.
by Norbert Hauser, Kontron
The requirements of today's new range of smart connected tablet and mobile tool applications go beyond technology and power specifications to also include rugged extended lifecycle product support. These smaller portable systems challenge embedded designers with space constraints and fully sealed fanless enclosures that must operate reliably over extended periods. Designers of ultra-low-power systems have found that existing standards and higher power consumption processor architectures were not an ideal fit for these types of applications. Therefore, they have been searching for focused standards and solutions that are specifically designed to support the unique combination of computing performance, small space, power and interface requirements. The ARM processor architecture fulfills these application demands with processors that are small in size and height, do not require a chipset, and offer long product life up to 15 years. In addition, ARM offers simplified passive cooling and thermal management to ensure higher system reliability that also provides an optimized platform for higher density systems. What has been
needed is a strong ecosystem of ARM-based hardware and software suppliers that can streamline the development of ARM and SoC subsystems in low profile designs. Answering the call for more focused subsystem resources, a new vendor-independent standards organization has been formed. The goal of the new Standardization Group for Embedded Technologies (SGET) is to help speed development of standardized hardware and software solutions for embedded computing. The first working group created under SGET has ratified a versatile, small and ultra-low-power Computer-on-Module standard that it has named SMARC for "Smart Mobility ARChitecture." Kontron played a leading role in the development of the specification, which had the working title ULP-COM. The SMARC specification brings standardized ARM/SoC-based miniature format building blocks as a welcome solution to fill a very significant gap in the embedded market.
Key SMARC Features
The SMARC specification is characterized by its extremely flat form factor, with dimensions as low profile as 1.5 mm from the top of the carrier board to the bottom of the module. It features an optimized pin-out for SoC processors that uses a 314-pin, right-angle connector with a 0.5 mm pitch and a height of just 4.3 millimeters (the MXM 3.0 connector). This robust, vibration-resistant connection method was defined to specifically meet smaller mobile/portable system design needs for very low-profile, high-performance, rugged and cost-effective modules. Matching various space-constrained needs, SMARC defines two module sizes: 82 mm x 50 mm and 82 mm x 80 mm. And solving the portable systems power limitations, the SMARC module power envelope is typically under 6W during active operation to deliver fanless and passive cooling. SGET initially defined the SMARC module standard for ARM SoCs as this processor architecture has become popular, and thus familiar, for tablet computer and smartphone designs. The specification, however, is designed to be flexible enough to accommodate alternative low-power SoCs and CPUs,
Figure 1 Designed to help ultra-low power SFF system OEMs drive down the cost of development, the Kontron SMARC-sAT3874 features an extremely low 3W TDP (Thermal Design Power) and extended operating temperature range of -40°C to +85°C, making this SMARC module ideal for space-constrained, fanless and harsh environment applications. The Kontron SMARC-sAT3874 offers a cost-effective and highly scalable COM building block that can be used by a broad range of SFF systems.

ULP-COM port overview
Pin Group | Pin Count | Description / Primary Function | Alternate Function
Parallel LCD | 28 | Primary Display: 24 bit parallel RGB data | –
LVDS LCD | 10 | Primary Display: Single channel 18 / 24 bit LVDS data | eDP
LCD Support | 4 | Panel and backlight enable, PWM, dual pixel clock | –
HDMI | 12 | Secondary Display: HDMI | –
CAM0 | 7 | Camera Input: CSI 2 lane | –
GBE | 13 | Gigabit Ethernet | –
PCIe | 27 | 3 PCIe x1 ports with supporting signals | –
USB | 11 | 3 ports, one is OTG (client or host); other 2 host only | –
SATA | 5 | 1 port (may be boot device) | –
SDIO | 9 | 1 port 4 bit | –
eMMC | 11 | 1 port 8 bit (may be boot device) | –
SPI | 9 | 2 ports (one of the two may be a boot device) | –
I2S | 16 | 4 ports | –
SPDIF | 2 | 1 port | –
I2C | 10 | 5 ports | –
Serial | 12 | 4 ports (two 2 wire and two 4 wire) | –
CAN | 4 | 2 ports | –
GPIO | 12 | General Purpose I/O | –
Boot Sel | 3 | Boot device select pins | –
WDT | 1 | Watch Dog Timer output | –
MISC | 15 | Power management pins | –
RSVD | 9 | Reserved pins - for future use | –
RSVD / DSI | 8 | Reserved pins - grouped for possible DSI interface | –
RSVD / SS | 5 | Reserved pins - grouped for possible USB SS interface | –
Power | 11 | 10 pins for Module input power; 1 for RTC | –
GND | 46 | Grounds - circa 15% of total pins | –
Pin Total | 314 | | –

Figure 2 The SGET SMARC standard offers new interface support particularly suitable for ARM architecture-based platforms, such as dedicated camera interfaces and video display LVDS outputs.
such as tablet-oriented x86 devices; other RISC CPUs may be used as well. To maintain low costs, low power and small physical size, the SMARC specification integrates the core CPU and support circuits, including DRAM, boot flash, power sequencing, CPU power supplies, Gigabit Ethernet and a single channel LVDS display transmitter together on the module. SMARC modules are designed to be used with application-specific carrier boards that implement other features such as audio CODECs, touch controllers and wireless devices. The advantage of a modular approach is that it delivers flexible building blocks that provide OEMs with scalable technology and upgradeability, resulting in faster time-to-market while still maintaining low costs, low power and small physical size (Figure 1).
How SMARC Differs from COM Express
Prior to the SGET SMARC specification, computer module specifications were primarily based on x86 technology and its associated chipsets. Standards defined for x86 were developed mostly with PC designs in mind that required support for chipset-compatible interfaces. A prime example is the COM Express standard, which is optimized for PC-based applications and hugely successful. The COM Express feature set aligns very well to PC chipsets and offers broad support for USB, multiple PCI Express (PCIe) lanes, PCI Express Graphics, the LPC bus and the PCI bus. Additionally, based on PC power requirements, the COM Express definition offers power pins supporting more than 100W. On the other hand, SMARC targets lower power, smaller form factor systems. The SMARC pin-out is optimized for the features common to ARM CPUs that may not be required for PCs. These features include parallel LCD display interfaces, serial and parallel camera configurations, multiple I2C, I2S and serial port options, and USB operation/support signals. COM Express and SMARC modules do share some features, including a limited number of PCIe, SATA and USB ports. However, how these features are combined is different due to the specialized features of interest that need to be utilized on a SMARC module.
ARM-Based Interface Support
Giving designers a standardized feature set specifically matched to ARM I/Os, the SMARC specification supports cost-effective Parallel TFT display bus and MIPI display interfaces. SMARC modules also offer additional display interface support with full featured implementation of HDMI ports, 24-bit parallel RGB LCD data and control signals, single channel LVDS LCD 18/24-bit, and panel support signals (I2C, Power Enables, PWM). Dual channel LVDS support is a needed feature to drive high-resolution panels, and SMARC accommodates a second LVDS channel to be implemented on the carrier board. Embedded DisplayPort is planned for future designs. An industry first for module standardization, SMARC offers camera interfaces, with consumer camera and phone memory cards supported by multiple Serial Peripheral Interface (SPI) links and SDIO (Secure Digital I/O) interfaces. SMARC also supports general purpose I/O with 12 GPIO signals, CAN error signaling, HD audio reset and PWM/Tachometer capabilities. Because real-time Ethernet and Fieldbus implementations typically use a number of LEDs to indicate the system status, the SMARC specification selects four GPIOs to support this LED functionality (Figure 2).
Resources to Streamline Development
An important aspect of SoC-based hardware is that it requires a different design approach to address a distinctive I/O mix. Besides the flexibility and scalability a module approach provides, designing with high-end ARM processors brings an element of complexity and design risk. That is because next-generation
ARM processors integrate high-speed memory and I/O buses, which need complex routing. More and more customers are now looking for development and validation resources to mitigate design challenges and allow them to just focus on their dedicated application I/O development. Opting for a module design approach delivers cost benefits especially for higher volume applications compared to a dedicated SBC design. For a supplier to be a valuable partner in helping OEMs wade through ARM-based embedded design complexities, it needs to have proven capabilities in leveraging existing standards and the ability to simplify hardware integration by offering standardized building blocks that enable leaner development schedules. SMARC modules offer building blocks in the development of ultra-low-power applications. Design resources in the form of a complete SoC solution are available that include the carrier board, firmware and drivers, and operating system. Offering multiple layers that make up a complete ARM solution provides time-to-market development benefits and added value for OEMs.
SMARC evaluation carrier boards eliminate some design complexities to allow application developers to get up and running quickly on the SMARC modular platform. Evaluation carrier boards let OEMs install the SMARC module best suited to their application needs, include support for a broad range of interface options, and offer dual power options for mobile and stationary applications (Figure 3). There is also a new SGET Design Guide for SMARC, which further helps to facilitate SMARC carrier designs, thus accelerating time-to-market and helping OEMs achieve an advantage in the competitive mobile application market. The Design Guide provides routing guidelines, reference schematics and useful development recommendations for SMARC carrier design, enabling OEMs to save significant time generating their own carrier board. There are many factors to consider when deciding on a Computer-on-Module/baseboard approach versus a full custom SBC design. This crucial decision depends upon many factors including an organization's business model and customer requirements.
Figure 3 Kontron's new ready-to-use SMARC Starterkit includes all the cables and necessary components, including a display and power supply. The SMARC Starterkit can be delivered with the pre-installed SMARC module of choice, operating system, a Board Support Package and cooling solution, which allows developers to immediately evaluate their application.
SMARC: A Specification for the Future
With scalable SMARC module building blocks, designers now have a standardized solution to help them achieve their performance/power ratio requirements. And, performance in SMARC modules is forecast to further increase with future technical progress. This also translates into a more cost-effective design approach that permits portable and fully enclosed systems to have a competitive price. Additionally, the lower power consumption of ARM-based solutions supports simplified cooling methods that significantly contribute to lower costs: a less-complicated mechanical design results in decreased assembly requirements, and a fanless design brings higher reliability. While these requirements may previously have had solutions, they would have required additional design compromises, time and resources. SMARC modules promise standard solutions for a diverse range of applications, from industrial automation to graphics and image-centric devices
that must operate at extremely low energy consumption and withstand severe environmental conditions. Other harsh condition small form factor systems, such as applications in the military, digital signage and medical markets, can also benefit from SMARC feature advantages. In addition, the SMARC specification offers support for Linux, Windows Embedded CE, VxWorks and a variety of real-time operating systems, so embedded OEMs are able to leverage a growing and strong ecosystem of development partners. There continues to be increased embedded system OEM market acceptance and favorable customer reviews of the connector definition, which encourages more Computer-on-Module suppliers to support this new standard, resulting in a comprehensive ecosystem for ultra-flat ARM/SoC-based Computer-on-Modules in miniature format. The specification is available free of charge on the SGET website.
Kontron, Poway, CA. (888) 294-4558. [www.kontron.com].
SGET, Munich, Germany. [www.sget.org].
Technology Connected: Advances in Compiler Technology
Compiler Optimization: Getting the Most out of High-Level Code
Given the size of today's projects, it is now imperative to write code in a high-level language, specifically C. But that makes code optimization all the more desirable. Here are some of the techniques and technologies of how a compiler optimizes your code for minimal footprint, highest performance, or in an optimized combination.
by Shawn A. Prestridge, Mentor Graphics
Microcontrollers used to be very simple. They had no pipelining, a limited set of registers and the only peripherals were I/O ports that the hardware designer tied to other pieces of hardware to make them work. As such, writing assembly language code was a relatively straightforward task. These days, architectures can have multistage pipelines, banked registers and many on-chip peripherals. Because of the rising complexity of the devices, C has become the language of choice to write software for microcontrollers. But can a compiler generate code as efficient as a human can with an assembler? Assuming the individual has unlimited time, the answer is no. However, in real-world conditions where you must meet schedules and achieve faster time-to-market, a compiler can generate code far more efficiently than any human can. Conceptually, the operation of a compiler is simple. It can take C source code and compile it into object code. The object will later be linked together by a linker into an executable, but most
Figure 1 The compiler performs several levels of processing to turn the C source code into object code. (Diagram stages: C Source, Parser, Intermediate Code, High-Level Optimizer, Code Generator, Target Code, Low-Level Optimizer, Assembler, Object Code. Target code example: LDR R0,x; LDR R1,[R0,#0]; SUB R1,#15; STR R1,[R0,#0].)
optimizations for the code are performed by the compiler. The compiler has several stages of processing that it performs in order to turn your source
code into object code (Figure 1). The first stage runs the source code through a parser, which parses the C statements into a binary tree. The result of this
Figure 2 High-level optimizations seek to use more efficient C constructs without altering the logic or control flow of the application.
Transformation | Before | After
Use cheaper operation | a=b*2 | a=b+b
Find common subexpressions | a=b+c*d; e=f+c*d | temp=c*d; a=b+temp; e=f+temp
Propagate constant values | a=17; b=56+2*a | a=17; b=90
Remove useless computations | a=b*c+k; a=k+7 | a=k+7
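The four transformations in Figure 2 can be written out as ordinary C to show that each rewritten form computes the same result as the original. This is a minimal sketch for illustration, not the output of any particular compiler; all function names are hypothetical.

```c
#include <assert.h>

/* Strength reduction: a multiply by 2 is replaced by a cheaper add. */
int twice_before(int b) { return b * 2; }
int twice_after(int b)  { return b + b; }

/* Common subexpression elimination: c*d is computed only once. */
int cse_before(int b, int f, int c, int d)
{
    int a = b + c * d;
    int e = f + c * d;
    return a ^ e;              /* combine both results so neither is dropped */
}
int cse_after(int b, int f, int c, int d)
{
    int temp = c * d;
    int a = b + temp;
    int e = f + temp;
    return a ^ e;
}

/* Constant propagation: with a known to be 17, b folds to 56 + 34 = 90. */
int constprop_before(void) { int a = 17; int b = 56 + 2 * a; return b; }
int constprop_after(void)  { return 90; }

/* Useless computation: the first value of a is overwritten before it is used. */
int useless_before(int b, int c, int k)
{
    int a = b * c + k;         /* never read */
    a = k + 7;
    return a;
}
int useless_after(int b, int c, int k)
{
    (void)b; (void)c;          /* inputs to the removed computation */
    return k + 7;
}
```

Each pair returns identical values for the same inputs, which is the property the optimizer has to preserve when it substitutes one form for the other.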
parsing is referred to as "intermediate code." The first stage of optimization is performed on this intermediate code by the high-level optimizer (HLO). The HLO analyzes the code and performs transformations based upon C language constructs, so no target-specific optimizations are performed by the HLO. Even though the IAR Embedded Workbench products support over 30 different architectures, a large portion of the code in the compiler is the same from one IAR Embedded Workbench to another because the parser and HLO are the same. After the HLO optimizes the intermediate code, the code generator translates the optimized intermediate code into target-specific code. This target code is then optimized by the low-level optimizer, which performs architecture-specific optimizations. The optimized target code is then transformed into object code by a compiler-internal assembler. Optimization takes place in three phases: analysis, transformation and
placement. The analysis portion of optimization tries to understand the intention of the source code that you wrote so that it can make intelligent decisions about how to transform your source code into more efficient C language constructions while preserving the original meaning of the code. These transformations are based on heuristics and generally lead to much tighter code. The compiler also performs register allocation, which is a key part of producing efficient code. Register allocation decides which variables should be located in registers rather than being in RAM. Having variables in a register allows you to quickly perform mathematical operations on them without having to read or write them from RAM. The problem is that the microcontroller only has a limited number of registers to hold these variables, so the code has to be analyzed carefully. The analysis is split into two parts: control flow and data flow. The control flow analysis is performed first and it is the basis for
the data flow analysis. Control flow analysis detects loops, optimizes jumps and finds "unreachable" code. The data flow analysis finds constant values, useless computations and "dead" code. The difference between unreachable and dead code is that unreachable code cannot be executed based on the code structure while dead code cannot be reached based on the value of variables. The second stage of optimization is transformation. There are two different levels of transformation, high-level (which is architecture-independent) and low-level (which takes advantage of the facilities provided to it by the architecture). In Figure 2, we see some of the high-level transformations that can occur in the code. The first transformation is called "strength reduction" and aims to use an operation with fewer instructions and/or MCU cycles. The other transformations in Figure 2 seek to eliminate code that is either redundant (common subexpression elimination) or unnecessary (constant folding
Figure 3 Loop transformations performed by the High-Level Optimizer are used to dramatically speed up loop-based code.
Move code out of loops
Before: for(i=0;i<10;i++) { b = k * c; p[i] = b; }
After: b = k * c; for(i=0;i<10;i++) { p[i] = b; }
Unroll loops
Before: /* copy 20 elements */ for (i=0;i<20;++i) { a[i]=b[i]; }
After: /* unrolled four times */ for (i=0;i<20;i+=4) { a[i]=b[i]; a[i+1]=b[i+1]; a[i+2]=b[i+2]; a[i+3]=b[i+3]; }
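The two loop transformations in Figure 3 can be checked the same way. The sketch below wraps each before and after form in a function and compares the results; the function names and the `loops_agree` helper are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <string.h>

/* Loop-invariant code motion: k * c does not change inside the loop. */
void fill_before(int *p, int k, int c)
{
    int i, b;
    for (i = 0; i < 10; i++) {
        b = k * c;             /* recomputed on every iteration */
        p[i] = b;
    }
}
void fill_after(int *p, int k, int c)
{
    int i;
    int b = k * c;             /* hoisted: computed once */
    for (i = 0; i < 10; i++) {
        p[i] = b;
    }
}

/* Loop unrolling: one test-and-branch per four copies instead of per copy. */
void copy_rolled(int *a, const int *b)
{
    int i;
    for (i = 0; i < 20; ++i) {
        a[i] = b[i];
    }
}
void copy_unrolled(int *a, const int *b)
{
    int i;
    for (i = 0; i < 20; i += 4) {   /* 20 is a multiple of 4, so no remainder loop */
        a[i]     = b[i];
        a[i + 1] = b[i + 1];
        a[i + 2] = b[i + 2];
        a[i + 3] = b[i + 3];
    }
}

/* Returns 1 when each transformed loop matches its original, else 0. */
int loops_agree(void)
{
    int p1[10], p2[10], src[20], a1[20], a2[20];
    int i;

    fill_before(p1, 3, 7);
    fill_after(p2, 3, 7);
    if (memcmp(p1, p2, sizeof p1) != 0) return 0;

    for (i = 0; i < 20; i++) src[i] = i * i;
    copy_rolled(a1, src);
    copy_unrolled(a2, src);
    if (memcmp(a1, a2, sizeof a1) != 0) return 0;

    return 1;
}
```

Unrolling by four works without extra code here only because the trip count is a multiple of four; for other counts a compiler would have to add a short remainder loop.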
and useless computations). Loop transformations are also performed by the high-level optimizer and can be found in Figure 3. The first transformation in this figure is referred to as “loop-invariant code motion” and, as the name implies, moves code whose result does not depend on the loop iteration outside of the loop. The second transformation is called “loop unrolling” and is used to amortize the overhead of the loop’s test-and-branch conditions at the expense of slightly larger code.

Lastly, the high-level optimizer decides whether or not to inline a function call based on the number of times the function is called and the size of the code contained within the function. Function calls are very costly, partially due to the branch instructions needed to jump to a function and return from it, but mostly because of the overhead that the microcontroller’s application binary interface (ABI) imposes on the compiler. The ABI requires that certain registers be preserved across function calls, so every function call must be preceded by a push of those registers to the stack to save the context and followed by a corresponding pop of the registers back off the stack to restore it. If the function’s code is inlined, this overhead is eliminated and the function runs faster (and is sometimes smaller!) than if the function is actually called. Inlining gives you the functionality of a macro, but makes the code type-safe.

The low-level optimizer (LLO) uses the instruction set of the underlying architecture to find ways to optimize the code. The LLO examines the target code to find places where the architecture can accomplish the goal with a small series of assembler instructions. Figure 4 illustrates two such constructs that can be reduced to just a few fast-executing instructions. The LLO also looks at register allocation to decide which variables should be located in registers. Although this allocation is normally not considered an optimization per se, it has a dramatic effect on how fast the resulting code can execute, since operations can be performed directly on the data in the register rather than having to first read the value from memory. The LLO also decides where to place the code and data using a technique that is referred to as “static clustering,” which collects the global and static variables into one place. This has two important benefits: it allows the compiler to use the same base pointer for many memory accesses and it eliminates alignment gaps between the memory elements.

There are limits to the optimization that can be performed. For example, common subexpression elimination can only be applied to parts of expressions not involving functions. The reason is that function calls may have side effects that cannot be determined at compile time, so the compiler must play it safe and preserve all function calls. If the function is inlined, however, the compiler can more effectively examine the code and do common subexpression elimination to avoid unnecessary computations, with the added benefit of avoiding needless function calls. The C language provides for the concept
C source: x = (x >> n) | (x << (32 - n));
Generated code: MOV R0,R0,ROR R2
C source: if ((x & 0x03) != 0) x >>= 2;
Generated code: TST R0,#+0x3 followed by MOVNE R0,R0,LSR #+2
Figure 4 The Low-Level Optimizer uses the microcontroller architecture’s instruction set to optimize target code into tightly-executing blocks of assembly.
of separate compilation units, which means that source code files in the project can be compiled individually. While this is indeed a very handy feature for writing source files that are separated into common groups, it has the unfortunate side effect that the compiler may not be aware of what is happening in other source files, which causes the compiler to generate extra code in order to be conservative in its assumptions. This is particularly true if you are calling small functions that are defined in other source files. The IAR Embedded Workbench has a unique feature that allows you to choose “Multifile compilation,” where the compiler treats several source files as one monolithic piece of code so that it has greater visibility into what the code is doing and can therefore make better decisions about how to optimize the code effectively.

IAR Embedded Workbench allows you to control these optimizations at several different levels to give you fine granularity in your code development. The project-level setting is a global setting that becomes the default for all files in the project. Several source files can be contained within a group, and that group can override the inherited optimization settings. Similarly, optimization can be overridden at the file level or even at the function level by the use of pragma directives. Additionally, optimization can have different goals for the compiler to achieve: size, speed or a balanced approach. As the names of the first two imply, the compiler will optimize purely for size or speed, respectively. When you use the balanced setting, the compiler tries to strike a healthy balance between size and speed, sometimes giving a little on one to achieve a little of the other. Moreover, IAR Embedded Workbench products also allow control over which transformations are applied to the code so that you can get exactly what you need.

Embedded compilers have evolved greatly over the last thirty years, especially in their optimization capabilities. Many years ago, developers had to be very careful to structure their C code in such a way that it could be easily optimized by the compiler. However, modern compilers employ many different techniques to produce very tight and efficient code so that you can focus on writing your source in a clear, logical and concise manner.
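To make the transformations discussed in this article concrete, the following is a minimal sketch in portable C. Each pair of functions shows a plainly written version next to a hand-transformed equivalent of the kind an optimizer might produce. The function names are illustrative assumptions and do not come from any particular compiler; the last pair illustrates why calls with side effects cannot be treated as common subexpressions.

```c
#include <assert.h>
#include <stdint.h>

/* Strength reduction: a multiply by 8 can be replaced by a cheaper shift. */
uint32_t times8_plain(uint32_t x)   { return x * 8u; }
uint32_t times8_reduced(uint32_t x) { return x << 3; }

/* Common subexpression elimination: (a + b) is evaluated once, not twice. */
int cse_plain(int a, int b)   { return (a + b) * (a + b); }
int cse_reduced(int a, int b) { int t = a + b; return t * t; }

/* Constant folding: 60 * 60 * 24 can be evaluated at compile time. */
int seconds_per_day_plain(void)  { return 60 * 60 * 24; }
int seconds_per_day_folded(void) { return 86400; }

/* Rotate-right idiom that a low-level optimizer may map to one instruction.
   Valid for 1 <= n <= 31; n == 0 would shift by 32, which is undefined in C. */
uint32_t rotate_right(uint32_t x, unsigned n)
{
    assert(n >= 1 && n <= 31);
    return (x >> n) | (x << (32u - n));
}

/* A function with a side effect: two calls cannot be merged into one. */
static int call_count = 0;
int next_sample(void) { return ++call_count; }

int sum_two_samples(void)
{
    /* Not a common subexpression: each call changes call_count. */
    int first = next_sample();
    int second = next_sample();
    return first + second;
}
```

Writing the plain versions and letting the compiler derive the reduced ones is the approach the article recommends; the reduced versions are shown only to make the equivalence visible.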
IAR Systems Uppsala, Sweden. +46 18 16 78 00. [www.iar.com].
Technology in Systems
Optimizing Size, Weight and Power

Getting the Heat Out of Ever Smaller Systems

As systems shrink in size and gain in computing power, all the techniques in the book are needed to handle the heat. Interestingly, there are design techniques that can increase efficiency and reliability, while reducing system size and increasing performance.

by Mike Jones, Adlink Technology

Demand for performance in computer systems continues to grow, and the embedded systems used in mobile or exposed applications are no exception. What is exceptional in these applications is the environment in which they operate. When temperatures are high, or weight and size are costly, cooling these computer systems becomes more difficult. In small form factor (SFF) designs, building systems that focus on external rather than internal standardization allows for more flexibility in achieving these thermal efficiencies. Better cooling can be achieved by using the latest, lower power processing components and by orienting them, as well as other heat-generating components, so they connect directly to the dissipating surface, and paradoxically, by reducing system size. These mechanical steps enable lowered thermal resistance via shorter paths and fewer thermal junctions.
Smaller, Faster . . . Hotter
Onboard processors in transportation systems for control and public safety, and roadside installations of monitoring equipment as well as unmanned vehicles for commercial, industrial, and military applications, all have increasing demands for wide, high-speed data, more and higher resolution sensors and greater processing power. The ability of these platforms to provide the cooling necessary for reliable high-performance operation is critical.

Unmanned military vehicles are a good example of the need to improve computer system efficiency. These vehicles can be much smaller, so the systems that comprise their capabilities need to be smaller, too. Systems need to provide connectivity as well as the wide, high-speed I/O required to support visible spectrum and infrared (IR) cameras, radar and other fast, high-definition sensors. At the heart of these systems is the processing power (CPUs, GPGPUs, FPGAs) required to process that data for object detection, classification and tracking. In applications like these, every bit of weight and volume that can be removed has the potential to improve the range, capabilities or cost of a deployed unit.
To this end, engineers must consider the function of every cubic centimeter of space and each gram of weight. All unnecessary space should be squeezed from the system to reduce its size, and any unnecessary mass should be eliminated. Because dissipation of the thermal energy these systems produce requires space, making the electronics portion of a system smaller leaves more room available for thermal dissipation. At the same time, there are standards that define the entire scope of system architectures. These standards provide uniformity and modularity at various levels of a system. They describe the size and shape of internal components, as well as the outer size and shape of the system itself. All of these requirements put pressure on the system designer to improve efficiency. Considering what improvements can be made to keep up with increasing performance demands in smaller platforms while keeping the benefits of physical modularity for serviceability, would seem to be a tall order, but we find that there are multiple opportunities to improve efficiency.
Figure 1 Elements of “SWaP2C2E2R.” In SWaP2C2E2R, electrical efficiency, thermal efficiency and size are the true basis for the equation, with all remaining properties derived from these three core elements. (Diagram labels: size, electrical efficiency, thermal efficiency, weight, cost, power, cooling, reliability, performance.)
System size and weight can be reduced while improving thermal efficiency. Standards from legacy systems—which offer no advantage in modern applications, yet still have a cost in size and weight—can be replaced by more efficient versions to better align with current requirements, allowing for more efficient system design.
Smaller Size Demands More Efficiency
The increasing pressure for lower COTS cost and the continuing desire to reduce size, weight, and power comes without a reduction in the importance of reliability or performance. Taken as a whole, the array of design criteria (SWaP 2C2) can be reduced to just a few key elements, with all others being derived from them. Size reduction is the number one
goal of SWaP. Happily, when you find a way to put the same functions in a smaller package, the weight, and to some degree the cost, will go down. The value of all parts of the design must be considered. Any component of the system that can be eliminated or made smaller without reducing performance should be eliminated or reduced. One part often overlooked is empty space. While it can have a purpose, that purpose should be well understood so that, where possible, empty space that doesn’t serve a purpose can be removed. Another consideration is the material that makes up the system: the structure of the enclosure, the thermal shunting, stiffeners, mounting plates, covers, etc. These parts contribute to the weight of the system. Improvements in electrical efficiency will either increase performance or reduce power requirements or both.
Electrical efficiency is chiefly driven by the evolution of processors. Decreasing junction sizes and lowering operating voltages provide continuous improvement in power consumption for a given level of performance (gigaflops per watt). As for thermal efficiency, ΔT between the critical heat-producing components and the system’s dissipation interface must be minimized. This can be done by using conduction materials with higher thermal conductivity, reducing or eliminating thermal junctions, and/or reducing the length of the conduction pathways. Improvements in thermal efficiency allow for the use of higher performance processors or improvements in reliability resulting from a reduction in component operating temperatures. So, because weight, cost, power requirements, cooling, reliability and
Figure 2 VPX systems use a fairly long thermal pathway to guide heat off of components and boards. (Diagram labels: CPU cores, GPU cores, other components (RAM, network, etc.), thermal shunting, wedgelocks, chassis, cold plate.)
performance all hinge on size, electrical efficiency and thermal efficiency as the root drivers of SWaP as a whole (or SWaP2C2E2R), when we achieve these goals, the other elements of SWaP will follow (Figure 1).

As we have said, many standards work together to provide uniformity to the infrastructure of a system. The popular 3U VPX (VITA 46) systems are a good example. The suite of specifications defining VPX covers every detail. This includes internal connections, connector types, and the form factor and clearance of PC boards and thermal shunting. These relatively compact solutions are currently providing high-performance computing in mobile platforms. Systems built on the 3U VPX form factor have evolved to keep up with the latest bus speeds of modern CPUs while providing good SWaP performance. VPX specifies the infrastructure in which the critical, high-speed electronics operate, including a robust bus scheme featuring rugged bus connectors. VPX also provides an infrastructure of bladed expansion cards. This modular bladed architecture has the ability to be easily expanded and reconfigured to accommodate changing requirements. With removable and reconfigurable cards, VPX systems are ideal for in-lab system design and prototyping.

This approach makes heavy use of internal thermal conduction. Boards use thermal shunts, which guide the cards into card cage slots and provide a thermal pathway from the components on the board to the enclosure body (Figure 2). These shunts are made from aluminum or copper alloys with high thermal conductivity. Choosing these materials is important, because in a 3U VPX system, the path from the components through the thermal shunts and out to the dissipating surfaces is fairly long. This is not ideal, but rather the price for modularity. It’s not ideal because heat is more efficiently removed by conduction when the heat source is placed as close as possible to the dissipating surface. This forces a smaller ΔT because a much greater portion of the heat flux is present at the dissipating surface. When the heat flux is directly coupled
Figure 3 VITA-75 can drastically reduce the length of heat shunting required over that of VPX. (Diagram labels: CPU cores, GPU cores, other components (RAM, network, etc.), thermal shunts, chassis, cold plate.)
to the dissipating surface, not only is the heat transfer more efficient, but there is also less radiation of heat back into the system. Heat re-radiating from large internal thermal shunts creates an oven for your components. So paradoxically, when internal conduction is reduced, ΔT is reduced because total thermal resistance is lowered, and internally radiated heat is minimized. Giving heat the bum’s rush in this way lowers component temperatures. Lower component temperatures translate directly into improved reliability and/or higher performance. Therefore, heat shunting, while required in a bladed design such as VPX, can be drastically reduced in a VITA-75 form factor SFF design (Figure 3). Contrary to the popular saying that newer processors demand better cooling, the latest processors actually reduce thermal requirements. This erroneous thinking comes from the fact that modern processors provide the ability to operate above thermal design power (TDP) when the thermal environment allows. Because of this, they are capable of much higher performance, which produces higher thermal loads.
This “allows” for designs capable of dissipating more heat to take advantage of peak performance, but the thermal loads of processors should be comparable for a given level of performance. In addition to the ability to operate above 100% TDP, modern processors continue to shrink in size. Intel’s 3rd Generation Core i7 CPUs use a 22nm process. That combined with lower voltage operation provides reduced thermal dissipation requirements for any performance level while operating at normal TDP. When compared properly, modern processors continue to improve on efficiency, and because of these improvements, for any given performance requirement, modern processors actually reduce the thermal requirements of a platform and increase reliability. So, even if your applications don’t require higher compute performance, moving to the latest CPUs and GPUs will keep your system cooler and more reliable. The thermal efficiency improvements of eliminating thermal shunts come from the lowered thermal resistance of shorter paths and fewer thermal junctions. In SFF designs, this
is accomplished by orienting heat-generating components so they connect directly to the dissipating surface. By externally conforming to the VITA-75 form factor, but eliminating the internal card cage and backplane, ADLINK’s HPERC eliminates the SWaP costs caused by over-constraining standardization internal to the system. Compact SFF solutions often use internal boards based on COM Express, QSeven, PC/104, or custom form factors. Standardizing on external form factors provides the correct solution to mobile platforms where replacement, upgrade and reconfiguration are best accomplished at the box level. This simple redesign directly improves the size, weight, reliability, performance and even cost of the system over a VPX solution.

ADLINK Technology San Jose, CA (408) 360-0200. [www.adlinktech.com].
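The article's argument about shorter paths and fewer thermal junctions can be sketched with the standard steady-state conduction model, in which resistances along a path add in series and the temperature rise is power times total resistance. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the conductivity, geometry, junction resistance and power figures used with it are illustrative round numbers, not measurements of any VPX or VITA-75 product.

```c
#include <assert.h>
#include <stddef.h>

/* Conduction resistance of a uniform slab, in K/W:
   R = length / (conductivity * cross-sectional area). */
double slab_resistance(double length_m, double k_w_per_mk, double area_m2)
{
    assert(k_w_per_mk > 0.0 && area_m2 > 0.0);
    return length_m / (k_w_per_mk * area_m2);
}

/* Elements in series (shunt segments and thermal junctions) simply add. */
double series_resistance(const double *r_k_per_w, size_t count)
{
    double total = 0.0;
    for (size_t i = 0; i < count; ++i) {
        total += r_k_per_w[i];
    }
    return total;
}

/* Temperature rise from dissipating surface to component: dT = P * R. */
double delta_t(double power_w, double r_total_k_per_w)
{
    return power_w * r_total_k_per_w;
}
```

As a usage example with assumed values: a 0.10 m shunt with k = 200 W/(m·K) and a 5 cm² cross section contributes 1.0 K/W; adding two junctions of 0.2 K/W each gives 1.4 K/W, so a 30 W load sees a 42 K rise. Shortening the path to 0.01 m and removing one junction gives 0.3 K/W and a 9 K rise for the same load.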
Technology Deployed
Development Tools for SoCs

Software-Driven SoC Development

If we can reduce the dependency of software developers on getting real silicon in order to do their development and verification, we can shrink the overall time-to-market equation for SoCs significantly.

by Jim Ready, Cadence Design Systems

It is well known in the semiconductor industry that software development costs for a new SoC outpace the hardware development costs, and at an ever increasing rate. This dramatic encroachment of software costs deeper into the semiconductor industry and the desire to reduce or at least control them drives an important inflection point for the capability of development tools and techniques used to build complex SoCs. The current SoC development methodology is often not sufficiently powerful to create functioning software-intensive SoCs for the smartphone, tablet and other markets, and at the same time meet aggressive time-to-market and cost objectives. The most significant problem is that software is often developed too late in the process (in some extreme cases starting post-silicon), even though software has a large impact on overall performance and functionality of the SoC. Figure 1 (bottom row) illustrates this SoC development flow.

In the traditional SoC flow, the software development effort is serialized after the hardware development, relying on availability of the silicon to begin verification and application software efforts. This approach has the obvious problem of lengthening time-to-market (TTM) for the SoC under development, and is hopefully more the exception than the rule today. Over the past ten years or so, much effort has been exerted to bring the software development activities into the pre-silicon domain, via technologies such as software virtual platforms, FPGA prototyping environments and hardware emulation capabilities. This software-enhanced SoC flow (middle row) is more or less the current practice for many semiconductor companies today.
However, even though efforts were made to move the software development earlier in the process, the amount of software that needs to be executed before silicon has skyrocketed, in many cases perhaps negating the forward progress already made. The reason is easy to see. The availability of powerful OSs, such as Apple’s iOS, as well as open source software, such as Linux and Android, enabled the development of consumer devices such as smartphones and tablets on a scale previously unimaginable. But wait, there’s more, as in Moore’s Law. With more and more transistors available, silicon designers are adding increasingly powerful multicore processors and hardware subsystems, such as graphics, video processing and wireless networking, which in turn need support in the software stack to fully utilize this new hardware capability. So it’s a perfect storm of powerful and widely used system software that must be adapted to and optimized for increasingly powerful SoC hardware, in less and less time to meet shrinking market windows.

The problem, however, is that all the operating systems require massive amounts of software. Hundreds of megabytes of executable images need to run in order to develop, test and optimize the SoC hardware-dependent software being developed by the SoC software team. Being able to execute only thousands of instructions per second (around 10 kHz) in some form of simulation and/or emulation doesn’t scale well when billions of instructions must be executed to boot Android and run and debug system software and apps on the SoC under development. While emulation provides a stop-gap with execution speeds in the MHz domain, booting full Linux, Android or iOS can still take hours. FPGA-based prototyping extends execution into the tens of MHz domain, but in contrast offers less debug insight into the hardware compared to simulation and emulation, and is available significantly later in the project when hardware descriptions have become mature enough.
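The scaling problem described above is simple arithmetic, and a short sketch makes the orders of magnitude visible. The code below assumes one instruction retired per cycle and a workload of 10 billion instructions; both are illustrative assumptions (the article says only "billions"), and the function names are hypothetical.

```c
#include <assert.h>

/* Seconds needed to execute a workload at a given execution rate,
   assuming one instruction per cycle (a deliberate simplification). */
double seconds_to_execute(double instructions, double rate_hz)
{
    assert(rate_hz > 0.0);
    return instructions / rate_hz;
}

/* Convenience conversion for reporting long runs. */
double seconds_to_hours(double seconds)
{
    return seconds / 3600.0;
}
```

Under these assumptions, 10 billion instructions take 1,000,000 seconds (about 11.6 days) at 10 kHz, 10,000 seconds (about 2.8 hours) at 1 MHz, and 1,000 seconds (about 17 minutes) at 10 MHz, which is consistent with the article's observation that emulation in the MHz range can still take hours to boot a full operating system.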
This requirement to execute large-scale software stacks puts extreme pressure on the SoC development flow technologies such as software virtual platforms, FPGA prototyping environments and hardware emulation to increase their capability to execute software while at the same time maintaining their fidelity with and visibility of the hardware
Figure 1 A comparison of SoC software development flows. (Diagram rows, top to bottom: Next Generation Software-Driven SoC Flow, powered by software-driven platforms (emulation + virtual platform + FPGA), with continuous SW development and bringup, continuous system validation, and HW development and verification; Software-Enhanced SoC Flow, enabled by virtual platform, FPGA prototype and emulation, with SW development on a model followed by SW development and bringup on the real HW design and silicon; Traditional SoC Software Flow, with SW development and bringup on silicon. Milestones: tapeout, silicon samples, product ships. Legend: SW, System, HW.)
under development. Constraints or limits on the ability of SoC software developers to execute and debug large amounts of software before silicon availability adversely affect their productivity, increase costs and delay time-to-market. Ideally, we’d like to get to the Software-Driven SoC flow shown at the top of Figure 1, where there is essentially continuous development of hardware and software utilizing the best available representation of the hardware at any point in the flow, with “meaningful” software execution speed, i.e., capable of running Android, Windows RT and other large-scale operating systems.

The software-driven SoC flow defines a new development methodology and uses new development tools and technology. With these, extensive system software can be brought up and verified early, and development teams can also use software early in the development cycle to verify hardware (software-driven verification). They can also measure and debug hardware/software interactions and develop, test and optimize the software deliverable to the SoC customer (e.g., Android ported and optimized to the SoC under development). That means the emphasis is on software execution being supported during all phases of SoC hardware development.
Software Productivity and Its Limits
Increasing software productivity has proven to be a challenging task, so when we aim to improve the SoC software developer’s productivity, we must know what we’re up against, and deliver meaningful improvements. The classic paper by Frederick Brooks, “No Silver Bullet - Essence and Accident in Software Engineering,” divides the difficulties in software development into two components: accidental and essential. Accidental difficulty is centered around transforming the conceptual representation of software into the reality of running (correctly) on a particular piece of hardware.
The single largest gain in software productivity has been made by the introduction of high-level languages, which automated this transformation process. What is left is the essential difficulty, and Brooks’s thesis is that no single tool or set of tools will ever deliver, against essential difficulties, the kind of productivity improvements gained from the introduction of high-level languages. However, Brooks was not altogether pessimistic. He recommends closely examining make vs. buy trade-offs as a powerful weapon in increasing productivity. Having developers not write software where they don’t have to is a great way to make them dramatically more productive. Another, more “mechanical” approach to reducing accidental difficulties is to dramatically reduce the “turnaround” time from finding a defect to repairing it and testing the fix. The obvious implication for SoC software development, pre-silicon, is having a software execution environment that is fast enough to have a turnaround time that is interactive or runs in minutes rather than hours or days.
Figure 2 Complex SoC hardware blocks and software stacks. (Block diagram of a system on chip (SoC) on a printed circuit board (PCB). Hardware blocks include the application processor (ARM CPU subsystem with A15 and A7 cores and L2 caches), 3D graphics, multimedia processor, DSP, modem, cache-coherent and SoC interconnect fabrics, memory interfaces (LPDDR2, LPDDR3, DDR3, NAND flash, eMMC 4.5, UFS, SD), high-speed wired interface peripherals (USB 3.0, PCIe Gen 2/3, Ethernet, HDMI, SATA, MIPI) and low-speed peripherals (I2C, SPI, UART, GPIO). Software stacks include applications, middleware, operating systems, drivers, firmware/HAL, RTOS, communications layers L1 to L3, DSP software and bare-metal software.)
So when we are looking to increase the productivity of SoC software developers, we are strongly in the camp of finding approaches that have large and easily measurable benefits. At the top of our list is a development flow that fixes the accidental difficulty of slow software execution by adding technology that allows “meaningful” software execution on the best available hardware representation as early as possible during the development cycle.
Software Development Is Strongly Hardware Dependent
After numerous discussions with software developers within semiconductor companies, we have seen a clear picture emerge. For the software developers, ironically, it is all about the hardware. All the software effort, from writing the lowest level driver to building the coolest multimedia Android app, is driven by and limited by the underlying SoC hardware and its availability to developers on the SoC software team. Keep in mind, however, that these software developers are targeting large-scale operating system environments, such as Android, Linux and Windows RT. These systems represent millions of lines of code and megabytes of executable images, so it is critically important that the SoC development environment be able to handle the performance requirements to execute such large and powerful operating systems, while at the same time
presenting the most accurate representation of the underlying hardware. Figure 2 illustrates a typical complex design with both the hardware and software components. The SoC hardware is composed of a mix of blocks, some unique to the SoC, some based on reuse of previous generation SoCs, some purchased IP and so on. Each SoC maker must decide where to put their effort. In some cases they perceive their value to be substantial across the whole SoC, whereas other makers might decide to concentrate on a particularly important hardware subsystem and use standard off-the-shelf blocks for the rest. Most of these hardware blocks will interface to the application processor in some way. This in turn means that the application processor software will need driver support and maybe more, including OS kernel and middleware changes, to effectively work with each hardware block of the SoC. Note also that some of the blocks themselves are processors (a DSP in the case of the current example), which of course run a sophisticated software stack as well. Figure 3 (right side) shows a more detailed view of the multiple software stacks typically developed for a modern SoC.

A potentially worrisome situation shown in Figure 3 is that a significant amount of the delivered value is from the software running on the SoC, and it has to be developed from scratch (the block labeled Proprietary OS). So now we’re in a bit of a bind. We’re now in the “no silver bullet” domain of limits to increasing software development productivity. The good news is that the software required for an SoC is often externally supplied, such as Android from Google. Thus the SoC maker’s software effort is dominated by porting and optimizing the software to exploit the unique capabilities of the SoC, rather than developing Android itself. So whatever rising software costs the SoC maker is seeing, they are not driven by the costs of developing Android, but rather by the cost of adapting Android to their own SoC. Short of taking transistors away from the hardware designers, it’s hard to see how to dramatically reduce the costs. However, we can dramatically reduce the time-to-market by enabling significant software development earlier in the flow.

In addition, as shown in Figure 3, SoC developers are not just developing software that ships with the SoC. It’s clear that a significant amount of test and verification software is developed for the SoC and used along with other more traditional hardware verification techniques. It is not uncommon for Linux to be ported to a new SoC for use as a “test” OS, since it provides a very rich set of tools and execution features that greatly simplify running verification software on a new SoC. There is no expectation that this Linux would ever be shipped to a customer; its use is purely internal. This is
Figure 3 SoC software stacks and software-driven SoC flow. (Diagram labels, SoC software stacks, color-keyed as production SW or test/verification SW: Android and Linux with graphics framework, libraries, HAL and drivers; test Linux with HAL, drivers, SoC basic tests, graphics tests and SoC stress tests; proprietary OS with graphics, kernel mode, resources, memory and power. Diagram labels, SoC software execution environments: RTL simulation, virtual platform, emulation, FPGA and silicon, annotated with HW/firmware bare-metal tests (registers, memory, device drivers, sub-system), firmware bring-up (boot code, firmware), SW development (boot code, device drivers, kernel, SoC drivers) and quality assurance tests.)
software-driven verification and it is an important use-case and subject to the same accuracy (if not more) and performance requirements as the SoC customer-deliverable software stack. Note that Linux, Android, Windows RT and other commercial operating systems are developed outside of the semiconductor industry. In these cases the software effort on the part of the SoC maker is largely on porting and optimizing the software to run well on the SoC under development. However, this effort is not limited to the lowest levels of the stack, e.g., device drivers, as many may think. Although hardware dependencies are often thought of as being taken care of by device drivers, in many cases, changes to the core OS or even middleware layers are often developed to exploit SoC hardware capabilities. So whether it is software that ships with the SoC, or software that is only used internally for testing and verification purposes, there is no way to shortcut the fact that the OS has to boot, load middleware and support applications running in order to develop
and test the OS, its underlying hardware-dependent software and the hardware itself.
Software-Driven Development Supports Hardware Development
From a software perspective, it is interesting to observe that the earlier you move in the SoC hardware development process, the more it resembles software development and is therefore subject to more of the usual problems and solutions in the software world. We can see that the more the SoC hardware effort resembles developing software, the more the productivity gains start to be limited just as they have been in pure software. As is true with software, the best way to make a hardware developer more efficient is to not develop hardware in the first place. SoC designers understand the importance of IP reuse, whether internally developed and/or externally acquired. So it is no surprise that SoC hardware developers have increasingly turned to reusing internal IP or purchasing IP to help decrease the amount of new “software,” e.g. hardware
for the SoC that is developed from scratch, and the use of high-level synthesis tools and other means to raise the level of abstraction. Figure 2 shows the numerous IP blocks that compose a modern SoC. Another lesson from the software domain is to raise the abstraction level used to implement the new SoC blocks. If, for a hardware developer, register transfer level (RTL) can be considered the same level of abstraction as assembly language for software developers, then moving to high-level synthesis with SystemC and tools such as Cadence’s C to Silicon provides that next higher layer of abstraction. Using processor-based software-intensive blocks is another approach. In this case, most of the hardware “implementation” of a block’s function remains in software, written in C or C++ and running on a specialized processor constructed and optimized to execute the block’s function at near RTL speeds. The Tensilica Xtensa core from Cadence and Synopsys’ ARC are good examples of this approach. With the Brooks model in mind, we can
also see why in the “classic” EDA world, the tools continue to increase the productivity of developers by keeping up with each change in process technology. In effect, the back-end development process (place and route, timing closure, etc.) for semiconductors is dominated by artificial difficulties, that is the physical changes in the semiconductor process as a result of moving to the next node of technology. As Brooks tells us, artificial difficulties are amenable to tooling
to get dramatic productivity gains, and the EDA business continuously exploits that fact by developing those tools year after year. So where the SoC development is dominated by physics, e.g., place and route, timing closure and other artificial difficulties, engineering efforts will continue to see productivity gains. And where hardware development is dominated by essential difficulties, IP reuse and purchased IP provide significant productivity gains.
In the end, as more and more hardware remains in software, if even only early in the development process, or always remains in software with the processor-based block approach, execution of software even on the pure hardware side becomes increasingly important during the development process. The process further strengthens the value proposition of a software-driven SoC flow, faster time-to-market and reduced costs.
The Overall Flow for Software-Driven SoC Development
The right side of Figure 3 shows a view of the software-driven flow and its relationship to the software deliverables for a typical SoC in a mobile device. Virtual platforms, emulation and FPGA prototyping (and the silicon itself) all provide software execution environments suitable for various stages of the SoC development flow. A critical underpinning of the software-driven SoC flow is the observation that a huge percentage of the millions of instructions that will be executed booting Android exercise a very small subset of the SoC hardware, mainly the processor and memory. The software-driven SoC flow exploits this situation by moving these elements to faster (although less accurate) attached simulation environments. For example, an ARM Fast Model is moved out of emulation along with its memory. This “hybrid” approach is carefully engineered to interact perfectly with the rest of the SoC RTL being run in emulation or in an FPGA prototyping system. No two SoC designs are alike, so there is always tuning of the configuration. And in some cases over time, different hardware subsystems may be moved out of RTL emulation and into a faster execution environment where emulation accuracy is less important than software execution speed. Ideally, the same tooling (debuggers, etc.) works across the different environments, allowing software developers and their hardware partners to select the most suitable spot in the flow to address the development task at hand.
Cadence Design Systems San Jose, CA. (408) 943-1234. [www.cadence.com].
technology deployed Software Considerations
Development Tools for SoCs
Establishing a New Virtual Prototyping Methodology for Hardware/Software Co-Design
Software and hardware engineers face many issues when working together as a project team, and a unified virtual prototyping methodology can provide design cohesion between both disciplines.
by Ric Vilbig and Colin Walls, Mentor Graphics
In the early days of embedded systems, most of the design effort was focused on hardware rather than software. However, as embedded systems grew in complexity, demanding more functionality and performance with expectations of drawing even less power (as seen in today’s smartphones), design teams began to include software developers alongside the hardware engineers. Today, the growth of the average embedded software team has outpaced its hardware counterpart by a factor of five. Further, these engineering teams now comprise embedded software specialists who are well-versed in working with the hardware team and those focused specifically on building an embedded system application. With the advancement in CPUs, GPUs and MCUs, consumer electronics in
just about every industry are loaded with innovative features and capabilities, and the consumer continues to demand more. The need for hardware and software engineering convergence is especially critical today. When designing an embedded system nowadays, it’s essential to use strategies that find the best compromises between performance and power consumption. Along those lines, two key strategies must be adopted:
• Designing for optimum power consumption requires a system-wide approach; both hardware and software designers need to be involved.
• To enable power consumption estimations early in the design cycle, and to avoid time-consuming redesigns late in the process, it’s important to recognize and adopt the right tools.
The more complex embedded systems are likely to be built on an operating system (OS). The code may be developed in-house, leveraged from open source projects, or purchased from a software IP provider. Whatever its origins, the purpose of an OS is to manage hardware resources and provide common services that reach from the kernel all the way up to the middleware stack. A real-time operating system (RTOS) provides a high level of predictability, also known as “deterministic” behavior. Typically, an RTOS has a small footprint and requires less memory to run. A non-real-time OS can be used when timing is not critical but a robust offering of software features and/or capabilities is needed. A common choice here is Linux or Android. Both platforms offer a familiar programming environment for developers and a large range of available middleware for software reuse. Android is particularly suited to applications where post-deployment software components, or apps, are a requirement. Linux and Android have a significantly larger memory footprint than a conventional RTOS. Every embedded design has a unique set of requirements for an operating system to support. In multicore architectures, quite often more than one OS may be needed to address both deterministic and non-deterministic requirements.
Power Consumption
In the past, power consumption was regarded as the sole province of the hardware designer. However, the influence of software on power consumption can no longer be ignored. The key to addressing the issue of power consumption from the software perspective is power visibility and analysis. This requires the right set of tools to enable hardware simulation and probing since power consumption relates indirectly to the software code, i.e., the software itself does not consume power, but the operation of the software impacts power consumption by way of the underlying hardware. In addition, code execution is radically affected by the processing power of the CPU. A powerful processor executes many millions of instructions per second,
but runs with a very high clock frequency, which directly affects power consumption. Multiple, low-power CPUs, running at lower frequencies, could offer a similar throughput while consuming less power. From a single, high-powered CPU to multiple core system architectures, every configuration has its own set of complexities that have direct and indirect effects on power consumption.
Hardware component selection includes choosing the processors, peripherals and memories that comprise the design. Choosing the right processor is critical since most devices today are optimized for specific applications. Specific processors are available with a wide range of memories with different performance/power characteristics. Some memories may account for up to half of the design gate count and consume large amounts of power. The more advanced system designs need to support numerous interfaces, connecting them to the outside world of video, audio, text messages and other control signals, all supporting a variety of formats. Peripherals support data handling, computation and communication through these interfaces; their large number may also impact power and performance. Finally, the hardware topology impacts important attributes such as latency and throughput of the data flowing through the design; this, too, can impact the level of power consumed.
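The trade-off between one fast CPU and several slower ones can be illustrated with the standard first-order relation for CMOS dynamic power, P ≈ α·C·V²·f. The Python sketch below is illustrative only: the capacitance, voltage and frequency values are assumed numbers, not figures from any particular processor, and it assumes the workload parallelizes perfectly across cores and that a lower clock permits a lower supply voltage.

```python
def dynamic_power(c_eff_farads, voltage, freq_hz, activity=1.0):
    """First-order CMOS dynamic power in watts: P = activity * C * V^2 * f."""
    return activity * c_eff_farads * voltage ** 2 * freq_hz


# Hypothetical values, chosen only to show the trend.
C_EFF = 1.0e-9  # effective switched capacitance per core, in farads (assumed)

# One fast core: 2.0 GHz at 1.2 V (assumed operating point).
single = dynamic_power(C_EFF, voltage=1.2, freq_hz=2.0e9)

# Four slower cores: 0.5 GHz each at 0.9 V (assumed operating point).
# Aggregate clock cycles per second match the single fast core.
quad = 4 * dynamic_power(C_EFF, voltage=0.9, freq_hz=0.5e9)

print(f"one core   @ 2.0 GHz, 1.2 V: {single:.2f} W")
print(f"four cores @ 0.5 GHz, 0.9 V: {quad:.2f} W")
```

The saving in this sketch comes entirely from the squared voltage term. Static leakage, memory traffic and the software effort needed to spread work across cores are not modeled, which is why early measurement on a prototype still matters.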
Virtual Prototyping for Early System Validation
According to a 2013 market study by UBM, over 50 percent of embedded system designs fall behind schedule. It is clear that engineering teams need to seek better methodologies to meet product development timelines. One way to meet a given timeline is to perform early validation of software on a virtual prototype of the hardware. Virtual prototyping techniques provide an alternative to traditional hardware prototyping by using abstracted functional models of the hardware. Virtual prototyping opens the way to concurrent development of hardware and software and continuous analysis and optimization of the design, while also providing better insight into complex hardware/software interactions. This allows design teams to readily change the hardware design topology and influence the register transfer level (RTL) design specification before RTL specs are finalized and implemented. At this stage, it is still relatively easy to add or remove compute resources, change interconnect topology, add a hardware accelerator block, or optimize the design for performance vs. power, to name a few.
An integrated approach to virtual prototyping is therefore recommended. This approach provides a methodology that offers several options in both virtual prototyping and hardware/software validation (Figure 1). Mentor’s Sourcery CodeBench, an integrated development environment (IDE), allows the software on a system-on-chip (SoC) to be written before the chip is fabricated. Mentor’s Vista electronic system level (ESL) product offers a scalable transaction level modeling (TLM) methodology for writing models in compliance with the SystemC TLM2.0 standard. A single, scalable TLM2.0 model handles all ESL abstraction levels and design tasks by separating communication, functionality and the architectural aspects of timing and power into distinct yet synchronized models. Scalable transaction level models allow timing and power details to be added, changed, or disabled as needed, while maintaining a single behavioral description throughout the design flow. This approach generates a virtual prototype executable package that can then be distributed to hundreds of software engineers, allowing them to optimize power on any Vista-produced virtual prototype.
Figure 1 An integrated approach allows software developers to stay in their native development environment to develop, debug and optimize their software on virtual prototypes and emulation platforms.
The Sourcery CodeBench Virtual Edition tool integrates Mentor’s embedded software IDE with the Vista Virtual Prototyping, facilitating software development on a virtual prototype (Figure 2). When the hardware design advances from simulation model to RTL emulation, CodeBench Virtual Edition integrates with the Veloce hardware emulator to facilitate hardware/software co-verification and software development on the emulation platform. This new technology and methodology approach enables software developers to remain in their native software development environment to develop, debug and optimize their software stack on virtual prototypes and emulation platforms before first silicon, and eventually on the hardware prototype and final design after first silicon.
Figure 2 Sourcery CodeBench Virtual Edition used with Vista Virtual Prototyping shows hardware analysis, such as power consumption and cache utilization, correlated to the software execution flow on the same timeline.
For even greater visibility, Mentor’s Sourcery Analyzer tool, when combined with the CodeBench IDE, allows designers to visualize system data quickly and monitor software/hardware interactions. The use of this capability allows designers to spot potential errors that are normally quite difficult to identify deep in the software. It also helps the embedded developer understand the performance characteristics of either an application or a complete system. Since Sourcery Analyzer collects data from several sources (Linux, Nucleus, or other RTOSs) and from the user-level application, it can run any combination of hardware representations. CodeBench Virtual Edition augments the software performance view with hardware details provided by the virtual platform to give insight into complex hardware/software intersections.
Final Thoughts on Virtual Prototyping
A broad portfolio of tools for both hardware and software development is now available that effectively interoperate as a virtual prototyping environment (Figure 3). This environment allows a system-wide view of development accompanied by a depth of experience to support hardware and software disciplines. In order for this approach to be truly effective, a unified software development environment must span the entire hardware/software spectrum, from virtual prototypes to emulation, first silicon, and up to reference design boards. Adopting this unified hardware/software development methodology is a very helpful approach for today’s advanced embedded system designs that require maximum performance while adhering to the needs of low power, since bringing software integration into early pre-silicon phases will optimize hardware and system quality. This process also drastically shortens product development schedules. This integrated hardware/software methodology allows design teams consisting of both hardware designers and software developers to achieve their collective goals of getting to market on time, on spec and on budget.
Figure 3 An example of an early prototyping to first silicon process that provides hardware insight in a manner that is easily understood among software developers.
Mentor Graphics Wilsonville, OR. (503) 685-7000. [www.mentor.com].
INDUSTRY WATCH
Strategies for Fabrics
Strategies for Fabrics: Some Choices Settled While Others Go On
With the conversion of bus-based systems to serial interconnects, there are several different fabrics that can be used to provide fast and efficient data transfer. With InfiniBand, Serial RapidIO, 10Gb Ethernet and PCIe 3.0, is there a best single solution or does it depend on the application?
by Wayne McGee, Creative Electronic Systems
A number of serial fabric implementations are commonly available to designers and OEMs. These cover both board-to-board and box-to-box applications. After a brief description and history of the most popular serial fabric choices, we will take a look at how best to apply them.
InfiniBand: The InfiniBand Trade Association, which was chartered in 1999, created the InfiniBand specification. Through a succession of refinements, the throughput has continued to improve over the years, and commonly available transfer rates range from 20 Gbit/s double data rate (DDR) to 56 Gbit/s fourteen data rate (FDR) on a x4 link. Host Channel Adapters (HCA) support both copper and optical fiber interconnects and can be used board-to-board or box-to-box. Software support is generally available for Linux and Windows Server 2012, including OpenFabrics Enterprise Distribution (OFED) for providing remote direct memory access (RDMA) support. Support for real-time operating systems such as VxWorks is limited to proprietary offerings.
Serial RapidIO: Motorola Semiconductor and Mercury Computing released version 1.0 of the RapidIO specification to the public in 1999. The following year the creation of the RapidIO Trade Association was announced. The current specification, 2.2, was released in 2011. Commonly available transfer rates range from 16 Gbit/s (four lanes signaling at 5 Gbaud, after 8b/10b encoding) to 20 Gbit/s (four lanes signaling at 6.25 Gbaud, after 8b/10b encoding). SRIO supports both copper and optical fiber interconnects. Its primary use is chip-to-chip interconnection and it can also be used for board-to-board or box-to-box interconnection. Software support for SRIO appears to be limited to various implementations of RapidIO Messaging Network (Rionet), a protocol to send Ethernet frames over SRIO; OpenFabrics Enterprise Distribution (OFED); and proprietary implementations of Multicore Communications API (MCAPI) from the Multicore Association.
10 and 40 Gigabit Ethernet: IEEE Standard 802.3ae first described 10 Gigabit Ethernet in 2003. Unlike previous Ethernet specs, 10Gig Ethernet requires full-duplex point-to-point connections using switches. 10Gig Ethernet bandwidth
is inefficient with packet sizes below 1024 bytes and reaches maximum bandwidth at around 4096-byte packets. This supports a bandwidth of 1.1 Gbyte/s. 10Gig Ethernet supports both copper and fiber interconnections. Due to the higher frequency, copper connections require high-grade cables for box-to-box interconnect or impedance controlled PCB design for backplane interconnection. There are multiple interconnect specifications defined for both copper and fiber implementations. TCP/IP is the default network protocol used with Ethernet and has nearly universal operating system support. However, the TCP/IP stack requires valuable CPU cycles to process, even if an off-load engine is available. TCP/IP over Ethernet has the highest latency of the networks discussed in this article. Using OFED, RDMA is available as are several other protocols such as RDMA over Converged Ethernet (RoCE) and Internet Wide Area RDMA Protocol (iWARP).
IEEE Standard 802.3ba was ratified in 2010 and documented the 40 and 100 Gigabit Ethernet variants. 40Gig Ethernet is achieved by using four 10GigE lanes in parallel. The cost per port for a 40Gig
Figure 1 Bandwidth comparisons of InfiniBand, SRIO, PCIe, 10Gb and 40Gb Ethernet. Bars: signalling rate (Gb/s) and actual data rate (Gb/s), on a 0 to 60 scale, for InfiniBand FDR x4, InfiniBand QDR x4, InfiniBand DDR x4, SRIO Gen 2 x4, 10GigE x1, 40GigE x1, PCIE 2.0 x4 and PCIE 3.0 x4.
Ethernet switch is roughly 10x the cost of a 10Gig Ethernet switch as of this writing.
PCIe 3.0: PCI Express was first released by PCI-SIG in 2003. The currently available 3.0 specification was released in 2010. The current spec supports lane widths of 1, 2, 4, 8 and 16 lanes. The x16 implementation will support a bidirectional bandwidth of 32 Gbytes/s. The primary use of PCI Express is chip-to-chip interconnection normally in a single host and several slave configuration, with a secondary use in board-to-board interconnection. The high frequency signals coupled with the 20-inch maximum copper line length limit the number of boards that can be connected in a system. While there are optical bus extenders available in several form factors, they are not widely deployed. Figure 1 compares the bandwidths of the different fabrics.
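The gap between signalling rate and actual data rate for each fabric comes largely from line coding. As a rough sketch, the Python below derives both numbers from per-lane signalling rates and coding schemes. The values are taken from the public specifications as generally cited, not read off the figure, and the `Link` class and its field names are hypothetical helpers introduced for illustration. Packet headers, flow control and other protocol overhead are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    name: str
    lane_gbaud: float   # per-lane signalling rate in Gbaud (GT/s for PCIe)
    lanes: int
    payload_bits: int   # data bits carried per encoded block
    coded_bits: int     # bits actually transmitted per encoded block

    @property
    def signaling_gbps(self) -> float:
        return self.lane_gbaud * self.lanes

    @property
    def data_gbps(self) -> float:
        # Line coding is the only overhead modeled here.
        return self.signaling_gbps * self.payload_bits / self.coded_bits


LINKS = [
    Link("InfiniBand DDR x4", 5.0, 4, 8, 10),        # 8b/10b
    Link("InfiniBand QDR x4", 10.0, 4, 8, 10),       # 8b/10b
    Link("InfiniBand FDR x4", 14.0625, 4, 64, 66),   # 64b/66b
    Link("SRIO Gen 2 x4", 6.25, 4, 8, 10),           # 8b/10b
    Link("10GigE x1", 10.3125, 1, 64, 66),           # 10GBASE-R, 64b/66b
    Link("40GigE (4 lanes)", 10.3125, 4, 64, 66),    # four 10GigE lanes
    Link("PCIe 2.0 x4", 5.0, 4, 8, 10),              # 8b/10b
    Link("PCIe 3.0 x4", 8.0, 4, 128, 130),           # 128b/130b
]
BY_NAME = {link.name: link for link in LINKS}

print(f"{'fabric':<18} {'signal Gb/s':>12} {'data Gb/s':>10}")
for link in LINKS:
    print(f"{link.name:<18} {link.signaling_gbps:12.2f} {link.data_gbps:10.2f}")

# PCIe 3.0 x16: convert the per-direction data rate to Gbytes/s.
pcie3_x16 = Link("PCIe 3.0 x16", 8.0, 16, 128, 130)
per_direction_gbytes = pcie3_x16.data_gbps / 8
print(f"PCIe 3.0 x16: {per_direction_gbytes:.2f} Gbyte/s per direction, "
      f"{2 * per_direction_gbytes:.2f} Gbyte/s bidirectional")
```

Under these assumptions the PCIe 3.0 x16 figure comes out slightly below the 32 Gbytes/s quoted above, which corresponds to the raw 8 GT/s rate before 128b/130b coding.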
Board to Board Use of Switched Fabrics
VPX and ATCA are the primary open system specifications that provide multiple standard protocols on the backplane. As such, each of the technologies can be applied to board-to-board communications in the VPX and ATCA data planes. InfiniBand usage in OpenVPX is defined in VITA 65, Section 5.2, which refers
back to VITA 46.8. Of possible concern to the reader is that VITA 46.8 is currently in draft form and is not final. In addition, VITA 46.8 defines only three data rates: Single Data Rate (SDR), Double Data Rate (DDR) and Quad Data Rate (QDR), which equate to per-lane signaling rates of 2.5 Gbit/s, 5 Gbit/s and 10 Gbit/s respectively. Several suppliers currently offer InfiniBand as an option. A quick survey of existing implementations shows DDR to be the available data rate. With two fat pipes available for InfiniBand in the data plane, that gives the user up to 40 Gbit/s of bandwidth.
SRIO usage in OpenVPX is defined in VITA 65, Section 5.4, which refers back to VITA 46.3 and is released. SRIO implementations are more likely to be found on PowerPC-based designs as it is natively available on many of the Freescale processors. Intel processors that natively support PCIe can be used with SRIO by adding a bridge chip, but this drives up the cost and complexity of the design.
10Gig Ethernet usage in OpenVPX is defined in VITA 65, Sections 5.1.4 and 5.1.5, which refer back to VITA 46.7 and is released. Two different electrical variants of 10Gig Ethernet are enumerated for the data plane, so designers should ensure all boards support the correct variant.
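The 40 Gbit/s figure for two DDR fat pipes can be checked with a few lines of arithmetic. The Python sketch below assumes that a fat pipe is a four-lane link and that SDR, DDR and QDR InfiniBand use 8b/10b line coding. The function name and its parameters are hypothetical, introduced only for this illustration.

```python
LANES_PER_FAT_PIPE = 4  # an OpenVPX fat pipe is a x4 link (assumed here)

# Per-lane signaling rates in Gbit/s, as listed in the text.
IB_LANE_GBPS = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0}


def data_plane_gbps(rate: str, fat_pipes: int = 2, after_encoding: bool = False) -> float:
    """Aggregate InfiniBand data-plane bandwidth across the given fat pipes."""
    raw = IB_LANE_GBPS[rate] * LANES_PER_FAT_PIPE * fat_pipes
    # 8b/10b coding carries 8 data bits in every 10 transmitted bits.
    return raw * 8 / 10 if after_encoding else raw


for rate in ("SDR", "DDR", "QDR"):
    print(f"{rate}: {data_plane_gbps(rate):5.1f} Gbit/s signaling, "
          f"{data_plane_gbps(rate, after_encoding=True):5.1f} Gbit/s after 8b/10b")
```

Read this way, the 40 Gbit/s quoted for DDR is a signaling rate. The payload rate after line coding is lower, before any protocol overhead is counted.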
PCIe usage in OpenVPX is defined in VITA 65, Section 5.3, which refers back to VITA 46.4 and is released. Of possible concern to the reader is that VITA 46.4 is written for PCIe versions 1 and 2. It does not address version 3.
A web survey of currently available single board computers from some of the leading VPX suppliers yielded data plane feature information based on 3U vs. 6U form factors and PowerPC vs. Intel Architecture CPUs. The results of the survey are shown in Figure 2. PowerPC-based SBCs utilized SRIO two to one over PCIe in the 6U form factor. The numbers reversed themselves for the 3U form factor. InfiniBand and Ethernet were not players in the PowerPC data plane market. In the Intel Architecture 3U form factor, 1Gig Ethernet (not 10Gig Ethernet) had a slight edge over PCIe. The PCIe implementations were either Gen1 or Gen2. The Intel 6U form factor offerings were more diverse. InfiniBand (DDR), SRIO and PCIe (Gen 2) were equally offered. However, 10Gig Ethernet was present at double the frequency of the other technologies.
Based on these findings, a system based primarily around 6U PowerPC is more likely to easily connect to other SBCs
Figure 2 VPX boards that support serial fabrics by fabric type. Data obtained from a web survey of key VPX board suppliers. Bars: number of boards (0 to 14) supporting InfiniBand, SRIO, PCIE and GigE, grouped by 3U PPC, 6U PPC, 3U IA and 6U IA.
via SRIO in a multiprocessor system. The choice is not nearly as clear-cut in the 3U format. SRIO would be the choice if bandwidth is the deciding factor, but 1Gig Ethernet is more widely available. If Intel Architecture is the primary CPU basis, the 6U format has the most flexibility in currently available products. InfiniBand implementations appear to be limited to DDR as of this writing. PCIe appears to be at revision 2.0. There may be additional connector and signal integrity issues to overcome before these backplane rates can go higher. There is some concern that optical backplanes may be required to achieve the highest data rates.
Another standard, ATCA, was created by PICMG in 2002 to address the needs of the telecom sector. The specification has been upgraded to accommodate advances in enabling technology over the years. PICMG 3.0 describes electrical and mechanical aspects of the system, but does not define the backplane fabrics. 10Gig Ethernet usage in ATCA is described in the PICMG 3.1 specification, which was updated to Rev. 2.0 in 2012. This update added enhanced 10Gig Ethernet and introduced
40Gig Ethernet to the backplane. InfiniBand was defined in PICMG 3.2. PCIe was defined in PICMG 3.4. Both were adopted in 2003. SRIO was defined in PICMG 3.5 in 2005. These specifications have not been updated since their adoption, nor are they widely deployed. It was reported at the Advanced/Micro TCA Summit in September of 2012 that over 95 percent of the installed systems were based on Ethernet technology. It would seem that the battle for the ATCA data plane is over.
System to System Connections Using Switched Fabrics
Many PCIe-based servers now come with one or more 10Gig Ethernet ports as standard features. Most come with a PCIe 3.0 expansion slot that will host a 10Gig Ethernet NIC or an InfiniBand HCA. If the system can tolerate the lower bandwidth, higher CPU utilization and higher latency, then 10Gig Ethernet is the lower cost selection. If the system can’t tolerate any one of those three criteria, then InfiniBand is the higher performance option at a higher cost. Also, InfiniBand has been the most popular
solution for several years in high-performance computing applications where direct application-to-application data transfer is required using several servers.
Bus-based systems such as VPX and ATCA are a different situation. Many of the VPX deployments are in rugged and electrically noisy environments. As such, while the VPX box has substantial high-speed resources within the box, it must rely on legacy or lower bandwidth communications with other systems. 1000BASE-T Ethernet is commonly used, and an emerging trend appears to be the use of 10GBASE-T Ethernet where high speed is required. In a lot of cases, VPX systems are replacing existing systems such as VME, and the existing system-to-system connections are used. On the other hand, ATCA box-to-box communications are primarily based on 10Gig Ethernet. A top-of-the-rack switch is frequently used to connect systems within the rack and provide a link to the data center. Connections between the racked systems and the switch are usually copper based, while the link to the data center is usually optical fiber due to the distance involved.
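The server-to-server selection rule described above reduces to a simple check. The Python sketch below encodes only that rule of thumb. The function and parameter names are hypothetical, and a real selection would also weigh cost, software support and the deployment environment.

```python
def pick_server_fabric(tolerates_lower_bandwidth: bool,
                       tolerates_higher_cpu_load: bool,
                       tolerates_higher_latency: bool) -> str:
    """Rule of thumb for PCIe-based servers, as described in the text."""
    if (tolerates_lower_bandwidth
            and tolerates_higher_cpu_load
            and tolerates_higher_latency):
        # All three compromises are acceptable: take the lower-cost option.
        return "10Gig Ethernet"
    # Any one unacceptable compromise points to the higher-performance,
    # higher-cost option.
    return "InfiniBand"


# Example: a latency-sensitive application that is otherwise relaxed.
print(pick_server_fabric(True, True, False))
```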
Numerous studies have been performed to document the effect of choice of programming techniques to accomplish data transfer over the physical media. In the application programming interface (API) provided in OFED, there are different verbs that define the transfer mechanisms between senders and receivers. In one study, “A Performance Study to Guide RDMA Programming Decisions” by Patrick MacArthur and Robert Russell, an analysis of InfiniBand RDMA vs. RoCE performance using various commands and options is given. The choice of verb in the API along with message size and notification options can have a significant effect on overall efficiency for transfer rate as well as CPU utilization. Concerns about the transfers include
the effects of latency—the time it takes for the message to complete the journey from sender to receiver, as well as jitter—the variability of the latency. Many multiprocessor applications require data coherency for the algorithms to function properly. RDMA over Ethernet can provide a far more efficient transport mechanism in terms of CPU utilization, latency and jitter, provided that the hardware and operating system are RDMA-enabled to support it. For ATCA, many telecom applications utilize the IP protocol instead of using the full TCP/IP stack for board-to-board interconnection. With the variability in availability of fabric technology between Intel and PowerPC architectures, there appears to be no single best solution. However, there do
seem to be some emerging trends. InfiniBand is ahead in PCIe form factor-based systems, with 10 and 40Gig Ethernet coming on strong. Ethernet is the dominant choice in the ATCA world. The fabric battle in the VPX world is still raging. Although SRIO appears to have a strong position, its product introduction dates are older and new product introductions appear to favor the other fabrics. PCIe and Ethernet are widely deployed in the 3U form factor, and many of the newer 6U designs are now including InfiniBand. Creative Electronic Systems, Geneva, Switzerland. +41 (0)22 794 74 30. [www.ces.ch].
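The article above defines latency as the time a message takes to travel from sender to receiver and jitter as the variability of that latency. As a minimal sketch of how the two figures could be summarized from a set of measurements, the Python below reports the mean latency together with two common jitter measures (standard deviation and peak-to-peak spread). The function name and the sample values are hypothetical and are not taken from the MacArthur and Russell study or from any OFED tool.

```python
import statistics


def summarize_latency(samples_us):
    """Summarize one-way message latencies given in microseconds.

    Jitter is reported two ways, since "variability" can be read either as
    a standard deviation or as the worst-case spread.
    """
    if len(samples_us) < 2:
        raise ValueError("need at least two samples to estimate jitter")
    return {
        "mean_us": statistics.mean(samples_us),
        "jitter_stdev_us": statistics.pstdev(samples_us),
        "jitter_peak_to_peak_us": max(samples_us) - min(samples_us),
    }


if __name__ == "__main__":
    # Hypothetical measurements for illustration only.
    samples = [10.0, 12.0, 11.0, 13.0, 9.0]
    for name, value in summarize_latency(samples).items():
        print(f"{name}: {value:.3f}")
```

Comparing these summaries across verbs, message sizes and notification options is one simple way to see the efficiency differences the article describes.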
USB Module & Data Acquisition Showcase
Featuring the latest in USB Module & Data Acquisition technologies

H264-ULL-cPCI - Dual-channel 3U CompactPCI HD H.264 Encoder
Dual channel encode at up to 1080p30
Single channel encode at up to 1080p60
Dual Analog HD inputs (YPbPr, VGA, RGB)
H.264/MPEG-4 AVC (Part 10) encoder
Ultra Low Latency encoder (below 40ms)
3U CompactPCI form factor
Drivers for Windows XP-Embedded, Linux, QNX
Comprehensive Video Recording & Streaming SDKs Available
Advanced Micro Peripherals
Phone: (212) 951-7205
E-mail: sales@amp-usa.com Web: www.amp-usa.com

USB Wi-Fi Modules 802.11b/g/n Compliant
USB 2.0 hot swappable interface
Compatible with USB1.1 and USB2.0 host controllers
Up to 300Mbps receive and 150Mbps transmit rate using 40MHz bandwidth
Up to 150Mbps receive and 75Mbps transmit rate using 20MHz bandwidth
1 x 2 MIMO technology for exceptional reception and throughput
2 U.FL TX/RX antenna ports
Wi-Fi security using WEP, WPA and WPA2
Compact size: 1.0” x 1.0” x 0.25” (Modules)
Windows 2K, XP, Vista, Win7 support
Linux 2.4/2.6 support
Radicom Research, Inc.
Phone: (408) 383-9006 Fax: (408) 383-9007
E-mail: sales@radi.com Web: www.radi.com

Pixus CompactPCI Backplanes & Enclosures
Pixus has a large offering of CompactPCI backplanes & enclosures with modular power options. Serial, PCIe Gen3, 2.16, H.110, & PXI versions are available.
3U & 6U heights with 32/64-bit options
33 MHz & 66 MHz versions
Serial, PCIe Gen3, 2.16, H.110, & PXI; customized versions are available
Modular design with wide range of power options
Superior signal performance
Mil/Aero, Telco, Transportation, Industrial & Energy
Pixus
Phone: (519) 885-5775
E-mail: info@pixustechnologies.com Web: www.pixustechnologies.com

DDR3 / DDR4 Protocol Analyzer Supports ECC SODIMM
Kibra 480 protocol analyzer - test and debug DDR3/DDR4
Easy setup - no calibration needed
Analyzes and triggers on JEDEC timing violations
Supports DDR3 ECC SODIMM as well as U-DIMM / R-DIMM
Allows faster DDR test and integration for real-time and embedded applications
Teledyne LeCroy
Phone: (408) 653-1262 Fax: (408) 727-6622
E-mail: PSGsales@lecroy.com Web: lcry.us/190W4G4

F22P: CompactPCI PlusIO SBC
MEN Micro’s versatile, high-performance F22P uses Intel’s 3rd generation Core i7 processor with processing speeds of up to 3.3 GHz. Delivers excellent graphics performance for computing environments requiring intense data throughput. Includes 16 GB of DDR3 DRAM memory with ECC functionality and 64 Mbits of boot Flash.
MEN Micro
Phone: (215) 542-9575 Fax: (215) 542-9577
E-mail: sales@menmicro.com Web: www.menmicro.com
RTC MAGAZINE AUGUST 2013
PRODUCTS & TECHNOLOGY
Quad-Core PXI Embedded Controller for Multitasking Test and Measurement
A quad-core PXI embedded controller features the high-performance Intel Core i7-2715QE 2.1 GHz processor, with up to 16 Gbytes of 1333 MHz DDR3 memory for seamless execution in multitasking environments and reduced test times. The PXI-3980 from Adlink Technology features dual BIOS backup to reduce maintenance costs, multiple interfaces for connecting and controlling a wide variety of standalone instruments, user-friendly access design for easy maintenance, and support for hybrid PXI-based testing system control. The PXI-3980’s structure delivers high availability in reliable testing systems. Its dual BIOS backup allows the secondary BIOS to boot the system and recover the main BIOS in the event of a main BIOS crash, reducing maintenance costs and effort. The easy-access design makes swapping out the battery, storage device and SODIMM modules easier than ever. In addition, solid metal case elements protect electrical components and enhance electromagnetic compatibility. In addition to optimum processing performance, the PXI-3980 provides multiple interface choices for connecting and controlling a wide variety of standalone instruments, including two display ports that support VGA + DVI, dual GbE ports, GPIB and trigger I/O for advanced PXI trigger functions. The PXI-3980 also includes four USB 2.0 ports and dual high-speed USB 3.0 ports for connection to storage, making it easy to access data from system controllers with limited built-in storage and to secure that data without storage damage issues. Adlink provides the option of either a pre-installed HDD or Intel 520 Series SSD for unequaled performance and reliability. The PXI-3980 supports Windows 7 32/64-bit and Windows XP 32-bit operating systems. The PXI-3980 controller, combined with the ADLINK 3U 19-slot PXI-2719A chassis, is a winning combination for a wide variety of testing and measurement applications, such as wireless and RF testing. ADLINK Technology, San Jose, CA. (408) 360-0200. [www.adlinktech.com].
Rugged CalTrans Approved Power Supply
A California Department of Transportation (CalTrans) Series of industrial grade power supplies from ETA-USA is designed to meet rugged CalTrans specifications. The 206L provides power for critical highway traffic signal controllers. ETA-USA's 206L meets CalTrans TEES-2009 specifications and has been tested and approved by CalTrans for their most demanding outside signaling cabinets. The California Department of Transportation specifications are used in many states as their standards for traffic signaling cabinets. The 206L is rated for 85 to 264 VAC input with an output of 24 VDC at 5A. The 206L is also power factor corrected. It is available with 6-10 week delivery at competitive prices. Volume discounts apply. ETA-USA, Morgan Hill, CA. (408) 779-2793. [www.eta-usa.com].
Cost-Efficient COM Express Type 6 Modules with Celeron CPUs
MSC Embedded offers MSC C6B-7S modules with three different Celeron processors and integrated Intel HD Graphics. The Intel Celeron CPU 1047UE with 1.4 GHz has two cores and 2 Mbyte cache. The single-core Intel Celeron 927UE (1.5 GHz) integrates 1 Mbyte cache. Both processors have a thermal design power (TDP) of 17W. The cost-efficient dual-core Intel Celeron processor 1020E is clocked at 2.2 GHz and has 2 Mbyte cache. Its TDP is 35W. The Celeron modules integrate the Intel HM76 PCH chipset. Two SO-DIMM sockets accept dual-channel DDR3 SDRAM modules with a maximum capacity of 16 Gbytes. In accordance with the COM Express Type 6 specification, the MSC C6B-7S modules offer four USB 3.0 and four USB 2.0 interfaces. Besides the seven PCI Express x1 channels, the modules are equipped with a PCI Express graphics (PEG) x16 interface, HD audio and a Gbit Ethernet interface. For the connection of high-resolution displays, the modules allow for direct access to the digital display interfaces DisplayPort and HDMI with a resolution of up to 2560 x 1600 pixels. Data can be stored via four SATA II channels at up to 300 Mbyte/s or on a NAND flash SSD that can be optionally populated. An onboard connector for a speed-controlled fan on a special cooling solution permits the implementation of quiet systems for noise-sensitive environments. The platform runs Windows 7, Windows Embedded Standard 7, Windows 8 and Linux. AMI UEFI firmware is implemented on the boards. The cost-efficient dual-core 1020E-based module is priced at $305 for higher quantities. MSC Embedded, San Bruno, CA. (650) 616-4068. [www.mscembedded.com].
PCI Express Packet Processor Card Features Cavium Octeon II
A high-performance network PCI Express packet processor card is designed for use in PCI Express-compliant servers and systems. Powered by the Cavium Networks Octeon II CN68XX multicore processor, the O2E-100 platform from JumpGen Systems off-loads network-centric tasks utilizing processor clock rates up to 1.2 GHz and up to 32 cnMIPS64 v2 processor cores and multiple accelerator engines. Multiple selections of CPU speed and core count allow for configurations to support a wide variety of thermal and power limitations. Targeting the needs of developers in deep packet inspection, network security and high-frequency trading, in addition to other packet inspection applications, the O2E-100 enables integrating network-centric processing into a standard server environment. By incorporating the Cavium CN68XX processor in a standard single-slot PCI Express card form factor, the board optimizes network efficiency while simplifying deployment. O2E-100 features include up to 32-way SMP Linux and Cavium Simple Executive and up to 64 Gbytes of quad-channel 72-bit wide DDR3 ECC memory. There are four SFP/SFP+ interfaces provided on the front panel, supporting 1GigE (including 10/100M) or 10GigE and PCI Express Gen2 x4 to base board. JumpGen’s team of experts is available for technical support and hardware design services. Customers can leverage JumpGen’s knowledge base to extend their product functionality in application-specific form factors. JumpGen Systems, Carlsbad, CA. (760) 931-7800. [www.jumpgen.com].
3U VPX Graphics Card with AMD Embedded Radeon for GPGPU Applications
A new 3U VPX graphics card with an AMD Embedded Radeon E6760 GPU is designed to meet the latest demands in avionics and military technology. With its 480 computing cores, the VX3327 from Kontron delivers a parallel data processing performance of up to 576 GFLOPS. Equipped with this processing power, the VX3327 is optimized for compute-intense general purpose graphics processing unit (GPGPU) applications deployed in avionics and military technology that require superior situational awareness. Size, weight, power and cooling (SWaP-C)-critical applications benefit from the board’s lightweight construction, minimal power consumption of just 35 watts and conduction cooling, which withstands extreme temperatures of -40° to +85°C. Additionally, the board supports real-time graphic data transmission to up to three independently controlled displays, so that users can extend their field of vision across three high-performance screens. The OpenVPX-compliant VX3327 has been designed for long-term available GPGPU applications in avionics and military technology. It supports the OpenCL programming language as well as AMD’s Accelerated Parallel Processing (APP) technology for supercomputing capabilities. The AMD APP technology allows OpenCL code to be compiled into external libraries to embed highly parallel algorithms into conventional code, such as C++. The board is fitted with an AMD Embedded Radeon E6760 GPU-based MXM 3.0 module, which is connected via 8 PCI Express Gen2 lanes. A link data rate per lane from 1.62 GHz to 5.4 GHz is supported, and 1 Gbyte of fast GDDR5 graphics memory with a 128-bit interface at 800 MHz is also featured. The Unified Video Decoder of the GPU supports H.264, VC-1, MPEG-2 and MPEG-4 video stream decoding and effectively unburdens the host-processor system. The Kontron VX3327 supports up to three independent displays through one VGA interface and two DisplayPorts, available as front or rear I/Os. Resolutions up to 2560 × 1600 @ 60 Hz, 24 bpp with a 2.7 GHz link or up to 2560 × 2048 @ 60 Hz, 30 bpp with a 5.4 GHz link are supported. With Kontron’s 3U VPX graphics card VX3327, users and product developers benefit from its efficient exploitation of system resources and its robust architecture, which is expressly designed for the environmental conditions in avionics and military applications. The VPX graphics card is available either in air-cooled (0° to +55°C) one-inch (5HP) or in conduction-cooled (-40° to +85°C) 0.8-inch (4HP) variants. Kontron, Poway, CA. (888) 294-4558. [www.kontron.com].
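The display modes and the GFLOPS figure quoted for the VX3327 can be sanity-checked with simple arithmetic. The sketch below is a rough estimate under stated assumptions, none of which come from Kontron: a four-lane DisplayPort main link, 8b/10b line coding (8 payload bits per 10 line bits), active pixels only with blanking ignored, a 600 MHz GPU engine clock, and two floating-point operations per core per cycle.

```python
DP_LANES = 4           # assumption: full four-lane DisplayPort main link
GPU_CLOCK_MHZ = 600    # assumption: engine clock, not stated in the text
FLOPS_PER_CYCLE = 2    # assumption: one multiply-add per core per cycle


def pixel_data_rate_bps(width, height, refresh_hz, bits_per_pixel):
    """Active-pixel data rate in bit/s (blanking intervals ignored)."""
    return width * height * refresh_hz * bits_per_pixel


def dp_payload_bps(lane_rate_bps, lanes=DP_LANES):
    """Usable link bandwidth in bit/s after 8b/10b coding overhead."""
    return lanes * lane_rate_bps * 8 // 10


def mode_fits(width, height, refresh_hz, bpp, lane_rate_bps):
    """True if the mode's pixel data fits within the link payload."""
    return pixel_data_rate_bps(width, height, refresh_hz, bpp) <= dp_payload_bps(lane_rate_bps)


def peak_gflops(cores, clock_mhz=GPU_CLOCK_MHZ, flops_per_cycle=FLOPS_PER_CYCLE):
    """Theoretical peak throughput in GFLOPS."""
    return cores * clock_mhz * flops_per_cycle / 1000


if __name__ == "__main__":
    print(mode_fits(2560, 1600, 60, 24, 2_700_000_000))  # mode quoted for the 2.7 GHz link
    print(mode_fits(2560, 2048, 60, 30, 5_400_000_000))  # mode quoted for the 5.4 GHz link
    print(peak_gflops(480))                              # 480 cores, from the text
```

Under these assumptions the larger 30 bpp mode does not fit on the 2.7 GHz link, which is consistent with the text pairing it with the 5.4 GHz rate.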
Simulator Provides Recording System Evaluation and Speeds Development
A new simulator for analog and digital recording systems made by Pentek includes a virtual recorder server application that simulates disk and I/O transactions for a complete and realistic recording environment. Using either the standard Pentek SystemFlow graphical user interface (GUI) or the SystemFlow application programming interface (API), the virtual server provides live, interactive operation for training or development. The user can easily switch between different recording systems in the Talon recording system product line. The common user interface (UI) and API provide a simple transition from one Talon recorder to another. The SystemFlow Simulator demonstrates actual Configuration, Record, Playback and Status screens of the standard GUI, each with intuitive controls and indicators. It also includes the SystemFlow Signal Viewer to display simulated live signals being digitized and recorded by a Pentek analog signal recorder. The SystemFlow Signal Viewer includes a virtual oscilloscope and spectrum analyzer for signal monitoring in both time and frequency domains. Developers have the option of using the SystemFlow API to develop their own UI to control the Talon recording system. The UI can be tested against the simulator prior to receipt of a Talon recorder, saving valuable development time. The simulator can be controlled locally or remotely via the socket-based client-server architecture. This allows system engineers to set up and test remote control of their recording system before it is received. The SystemFlow Simulator is provided in 32- and 64-bit Windows 7 Professional operating system versions. The SystemFlow Simulator is free to download for qualified Talon recording system developers and evaluators. Developers should contact Pentek to be qualified to receive the download. Pentek, Upper Saddle River, NJ. (201) 818-5900. [www.pentek.com].
Two Channel 200 MS/s 14-Bit PCI Express Digitizer
A PCI Express digitizer offers high sampling rates and onboard signal averaging for long-term, high-speed data recording. The PCIe-9852 from Adlink Technology features two simultaneously sampled 200 MS/s input channels with 14-bit resolution, 90 MHz bandwidth, and up to 1 Gbyte DDR3 onboard memory. Highly accurate measurement, up to 800 Mbyte/s data streaming and onboard signal-averaging technology combine to make the PCIe-9852 suitable for long-term, high-speed data recording applications such as distributed temperature sensing (DTS), radar signal testing and atmospheric science research. The PCIe-9852 provides a 14-bit high-resolution ADC and up to 200 MS/s sampling, delivering 83 dB SFDR, 62 dB SNR and -81 dB THD, leading the field in high dynamic performance. The PCIe-9852 provides a flexible set of input ranges (±0.2V, ±2V or ±10V), software-selectable 50Ω or 1 MΩ input impedance, a wide variety of triggering options and tight synchronization capability, all maximizing convenience of use. Based on Gen 2 PCI Express technology, the PCIe-9852 streams data on both channels at maximum data rate (200 MS/s), enabling continuous delivery to the host PC at rates up to 800 Mbyte/s, and a complementary 8 x 500 Gbyte drive RAID system (4 Tbyte) further extends capture sessions beyond one hour. The PCIe-9852 combines high-bandwidth, dual-channel, simultaneous data streaming with built-in memory up to 1 Gbyte for massive data storage. The PCIe-9852 is equipped with onboard Signal Averaging Technology, allowing detection of small repetitive signals in noisy environments with no CPU loading, suitable for applications requiring extraction of small signals from background noise as occurs in DTS. The Adlink PCIe-9852 supports multiple operating systems, including Windows 8, Windows 7, Windows XP and Linux, and is fully compatible with third-party software such as LabVIEW and MATLAB. Users can complete the programming through the Adlink DAQPilot software development kit. ADLINK Technology, San Jose, CA. (408) 360-0200. [www.adlinktech.com].
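The streaming figures quoted for the PCIe-9852 follow from straightforward arithmetic, sketched below in Python. The channel count, sample rate and RAID size are taken from the text; the two-bytes-per-sample figure is an assumption (each 14-bit sample stored in a 16-bit word), as is the use of decimal units (1 Gbyte = 10^9 bytes). The function names are illustrative only.

```python
CHANNELS = 2                    # from the text: two simultaneously sampled inputs
SAMPLE_RATE_HZ = 200_000_000    # from the text: 200 MS/s per channel
BYTES_PER_SAMPLE = 2            # assumption: 14-bit sample stored in a 16-bit word
RAID_BYTES = 8 * 500 * 10**9    # from the text: 8 x 500 Gbyte, decimal units assumed


def stream_rate_bytes_per_s(channels=CHANNELS, rate_hz=SAMPLE_RATE_HZ,
                            bytes_per_sample=BYTES_PER_SAMPLE):
    """Sustained data rate when every channel streams at full speed."""
    return channels * rate_hz * bytes_per_sample


def capture_seconds(storage_bytes, rate_bytes_per_s):
    """How long the given storage lasts at a constant streaming rate."""
    return storage_bytes / rate_bytes_per_s


if __name__ == "__main__":
    rate = stream_rate_bytes_per_s()
    seconds = capture_seconds(RAID_BYTES, rate)
    print(f"stream rate: {rate / 1e6:.0f} Mbyte/s")
    print(f"capture time: {seconds / 60:.1f} minutes")
```

Under these assumptions the 4 Tbyte array holds 5,000 seconds of data, a little over 83 minutes, which agrees with the "beyond one hour" claim.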
Customizable 3.5” Touchscreen Device with Accelerometer and Servo
A new expansion board features a 3D accelerometer, audio in/out jacks, two USB mini-B jacks and a console port. The Alto35 from Gumstix also adds an RC servo, LEDs in four different colors, two tactile switches and an included 3.5” resistive touch-screen display from InTouch Electronics to the expansion board. All Gumstix Overo computers-on-module are compatible with the Alto35 expansion board, providing access to the versatile software solutions supported by Gumstix, including Linaro Ubuntu with Robot Operating System and the Yocto Project for easy development. The standard Alto35 expansion board can be ordered from Gumstix, and cloning and customization to custom specifications takes just three weeks with Geppetto. Embedded developers can leverage the availability of the Alto35 as a prebuilt product for evaluation, with Geppetto-customized boards later offering greatly enhanced potential for tailored solutions and certain compatibility. Geppetto users may choose to keep customizations private, or share their version of the Alto35 with the Geppetto developer community. The Alto35 is available for $89. Gumstix, Redwood City, CA. [www.gumstix.com].
Fourth Generation Core i7-Based VPX, VME, cPCI, XMC and COM Express SBCs
Extreme Engineering Solutions has introduced its 3U VPX, 6U VPX, XMC, 3U CompactPCI, 6U CompactPCI, 6U VME and COM Express Single Board Computers (SBCs) based on the 4th generation Intel Core i7-4700EQ processor (formerly codenamed "Haswell"). X-ES’s lineup of conduction- and air-cooled products based on the 4th generation Intel Core i7 processor includes the XCalibur4500 6U cPCI, the XCalibur4530 6U VME and the XCalibur4540 6U VPX modules, which maximize memory capacity and I/O capabilities and add configurability with two PMC/XMC sites. X-ES provides the XPedite7570 3U VPX and the XPedite7530 3U cPCI modules, which are ideal for smaller aerospace and vehicle platforms that require maximum processing performance and I/O capabilities with the flexibility of PMC and XMC support. For applications with severe size, weight and power (SWaP) challenges, such as Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs), X-ES also offers the small XPedite7501 XMC and XPedite7550 Rugged COM Express modules. Significant maintenance and diagnostics advantages are achieved through the remote configuration and management feature, Intel Active Management Technology (AMT), within X-ES’s previous generation Intel Core i7 processor-based products. X-ES is continuing to offer those advantages to its customers by including Intel AMT 9.0 support for its 4th generation Intel Core i7 processor-based products. Additionally, improvements within the 4th generation Intel Core i7 processor include increased raw processing performance per watt, hardware-based memory encryption with Intel AES New Instructions (AES-NI), and increased floating-point and integer performance utilizing Intel Advanced Vector Extensions 2.0 (AVX 2.0). The graphics processing unit (GPU) in the 4th generation Intel Core i7 processor has also been enhanced, adding execution units and up to a 24% increase in raw performance. The 4th generation Intel Core i7 processor also supports OpenCL 1.2, enabling it to perform as a general-purpose graphics processing unit (GPGPU). Extreme Engineering Solutions, Middleton, WI. (608) 833-1155. [www.xes-inc.com].
ARINC-629 PMC Module Targets Obsolescence Market
An air-cooled range of ARINC-629 PMC modules targets the obsolescence market to keep Boeing 777 test and simulation support alive. The Sy629PMC-BPM from Sycos AES features a full transmitter and bus monitor, with RPP and Multiple XPP in embedded SRAM. The card employs a qualified terminal controller (DATAC) and system interface module (SIM) to guarantee transmission integrity at all times. The card supports a 32-bit 33/66 MHz PCI on PMC interface to allow users to gain access to ARINC-629 resources through VxWorks RTOS on a VME PMC carrier card or on a PC through a Windows 7 DLL. Data transmission/reception is possible through embedded switchable dual-paged SRAM or through external 16-channel Cycle Buffer (2Tx, 14Rx) SRAM memory. Rx messages can be directed to any of the 14 Rx Cyclic Data Buffers by RPP programming. An embedded FIFO resource provides chronological data logging with 0.5 µs time-stamp resolution. Embedded RPP SRAM memory enables filtering of desired messages, but a Receive Monitor can be enabled to capture all messages present on the connected bus. A Transmit Monitor can also be enabled to capture the terminal's own messages for transmit checking purposes. Connection to the ARINC-629 Stub Cable is through a 4-pin Lemo connector on the front panel. A further two ARINC-629 PMC modules address the fault insertion/detection (Sy629PMC-BPE) and Multiple-Terminal-Emulation (Sy629PMC-BPT) requirements for LRU test and flight simulation, giving a complete suite of hardware that can be installed on any VME, PXI, PCI or PCIe PMC carrier card. Sycos AES, Dorset, UK. + (44) 1747-812-486. [www.sycos.co.uk].
PCI Express Two-Port RS-232 Serial Interface
A new low-profile two-port RS-232 PCI Express serial interface adapter offers advanced UART architecture. For maximum compatibility with a wide range of serial peripherals, the 7202e from Sealevel Systems supports all RS-232 modem control signals, making it an attractive solution for interfacing instrumentation, barcode readers and other data acquisition/control devices. The 7202e’s high-performance 16C950 UART includes 128-byte FIFOs for error-free operation in high-speed serial applications. Additionally, the UART supports 9-bit framing and is fully software compatible with legacy 16550 UART applications. The board includes a 14.7456 MHz oscillator and provides a flexible clock prescaler to support the widest range of standard and non-standard baud rates up to 921.6 Kbits/s. The 7202e is PCI Express x1 compatible and will work in any PCI Express slot. All Sealevel PCI Express serial adapters include SeaCOM software for Windows and Linux operating systems. For easy installation and troubleshooting, customers also receive WinSSD, a full-featured application for testing and diagnostics including bit error rate testing (BERT), throughput monitoring, loopback tests and test pattern message transmissions. Both items ship with a 36-inch cable that terminates to two DB9M connectors. Standard operating temperature range is 0°C to +70°C (extended temperature versions operating from -40°C to +85°C are available). Like all Sealevel I/O products, the 7202e is backed by a lifetime warranty. Pricing is $229 for low-profile (Item# 7202e) and full-height (Item# 7202eS) PCI Express slots. Sealevel Systems, Liberty, SC. (864) 843-4343. [www.sealevel.com].
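The relationship between the 14.7456 MHz oscillator and the 921.6 Kbits/s top rate can be illustrated with the classic 16550-style baud-rate formula, baud = clock / (16 x divisor). The Python below is a sketch under that assumption only: it models a fixed 16x oversampling clock and an integer divisor, and it does not model the 16C950's additional prescaler or sample-clock options. The function name is hypothetical and is not part of SeaCOM.

```python
UART_CLOCK_HZ = 14_745_600   # from the text: 14.7456 MHz oscillator
OVERSAMPLE = 16              # assumption: classic 16550-style 16x sampling clock


def divisor_for(baud):
    """Return (divisor, actual_baud, error_percent) for a requested baud rate.

    The divisor is the nearest integer to clock / (16 * baud), never below 1.
    """
    divisor = max(1, round(UART_CLOCK_HZ / (OVERSAMPLE * baud)))
    actual = UART_CLOCK_HZ / (OVERSAMPLE * divisor)
    error_percent = (actual - baud) / baud * 100
    return divisor, actual, error_percent


if __name__ == "__main__":
    for baud in (9_600, 115_200, 921_600):
        divisor, actual, err = divisor_for(baud)
        print(f"{baud:>7} baud: divisor {divisor:>3}, actual {actual:.0f}, error {err:.2f}%")
```

With a divisor of 1 the model yields exactly 921,600 bit/s, and the standard rates checked here divide evenly. Non-standard rates generally do not, which is where a flexible prescaler helps.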
5 Megapixel Full HD MIPI Camera Support for i.MX6 Processor
A 5 Megapixel Full HD MIPI Camera Board is designed for the Freescale i.MX6 family of processors. The e-CAM50IMX6 camera board from e-con Systems is interfaced directly to the CSI-2 MIPI interface on the Freescale SABRE Lite board. The e-CAM50IMX6 includes the e-CAM57_MI5640_MOD, a 5 MP autofocus 2-lane MIPI CSI-2 camera module with 70 mm flexible PCB length. The e-CAM50IMX6 board comes with full schematics and Linux driver support with source code. e-con Systems will be announcing support for Android soon. The e-CAM50IMX6 can stream Full HD 1080p@30fps and also supports HD 720p@60fps. The i.MX6 processor has support for dual camera interfaces and has the ability to record video at 1080p and 720p resolutions in various industry standard video encoding formats. The e-CAM50IMX6 kit contains a camera board, a flex cable and an interface board. The interface board is plugged directly onto the MIPI CSI-2 camera header of the Freescale SABRE Lite board, and the camera board is connected through the flex cable supplied with the kit. Customers designing handheld devices, automotive infotainment, industrial and medical HMI, multimedia navigation and industrial automation, as well as those in markets such as portable computing, education, industrial, medical and home automation, will benefit from i.MX6’s high speed and performance. Customers interested in other image sensors/camera modules can approach e-con Systems for interfacing those cameras with i.MX6, and e-con Systems can develop a customized hardware/software solution for them. The e-CAM57_MI5640_MOD is based on Omnivision’s OV5640 image sensor and comes with a 70 mm flex cable that can be customized to any form or shape. This provides flexibility in enclosure design as the camera module can be placed as far as 100 mm from the processor interface, and solves the complex mechanical design requirements of a typical consumer device. e-con Systems, St. Louis, MO. (636) 898-8788. [e-consystems.com].
Secure 3U VPX Ethernet Switches and Routers up to 10 Gigabit
Two new 3U VPX switches are secure, nonblocking, high-performance, Gigabit and 10 Gigabit Ethernet networking solutions that provide versatile management and routing options. The XChange3013 and XChange3018 from Extreme Engineering Solutions have features ranging from managed or unmanaged Layer 2 switching to fully featured IP routers with Cisco IOS and Cisco Mobile Ready Net technology. The XChange3018 is a VPX Ethernet switch that supports 10 Gigabit Ethernet using the 10GBASE-T protocol, eliminating the need for 10 Gigabit optical transceivers, which are large, expensive and not ideal for use in extremely low temperatures or rugged environments. The XChange3013 and XChange3018 both deliver full wire speed across all of their ports and jumbo packets up to 13 Kbytes. They also support IPv6, Quality of Service (QoS), Energy Efficient Ethernet (EEE), and a comprehensive set of IETF RFCs and IEEE protocols. The XChange3013 and XChange3018 include an XMC site that is compatible with the XPedite5205, which implements Cisco IOS IP Routing capabilities and provides a complete Embedded Services Router (ESR) solution. The XChange3013 supports up to twenty Gigabit Ethernet ports, including fourteen 10/100/1000BASE-T ports and six 1000BASE-X SerDes ports. The XChange3018 supports up to four 10 Gigabit Ethernet ports, which are configurable as either 10GBASE-T or XAUI, as well as twelve Gigabit Ethernet ports divided into six 10/100/1000BASE-T ports and six 1000BASE-X SerDes ports. The XChange3013 and XChange3018 can be managed through SNMP, as well as with an industry-standard CLI via their serial ports, telnet, or SSH. When configured as a fully managed Layer 2 switch, they include support for features such as fast boot, flow control, MAC bridging (IEEE 802.1D), port mirroring, port authentication (IEEE 802.1X), VLANs (IEEE 802.1Q), GVRP, MVRP, port and protocol classification (IEEE 802.1v), GARP, MRP, GMRP, MMRP, LACP, RMON, STP, RSTP, MSTP, RPVST+, AgentX and IGMP. When configured as a Layer 3 router, they add support for multicast and unicast routing features such as DVMRP, IGMP, PIM-DM, PIM-SM, PIM-SSM, MLD, RIP, BGP, OSPFv2/OSPFv3 and VRRP. Extreme Engineering Solutions, Middleton, WI. (608) 833-1155. [www.xes-inc.com].
Freescale PowerQUICC III Processor-Based ARINC-429 PrPMC Module
An air-cooled PrPMC module based upon Freescale Semiconductor’s PowerQUICC III e500v2 core processor is integrated with 32 channels of ARINC-429 to provide a completely autonomous avionics interface to support flight simulator and avionics test applications. The Sy429PrPMC-RT32E from Sycos AES features the Freescale MPC8548E PowerPC e500v2 core processor running at 800MHz, supported by 2 Gbyte user installable SODIMM DDR2 SDRAM and 64 Mbyte of user flash for boot and program memory. The card supports 32-bit 33/66 MHz PCI on PMC interface. ARINC-429 resources are accessible by application programming on the PowerPC or from the host processor through the PCI interface, making the card user friendly to both Windows and real-time computing environments. ARINC-429 data gathered through the PowerPC application can be displayed simultaneously in a Windows application on the PC. Two Gigabit Ethernet ports and two RS-232 interfaces connect to the P14 for access via PIM for VME or Sycos proprietary PCI, PrPMC carrier card. ARINC I/O is through a front panel SCSI connector. The card has battery back-up Real Time Clock and external clock and sync inputs through P14 connector. The card is supported by embedded VxWorks v6.8 RTOS, which boots at power-up, and Windows 7 DLL for PC applications. Sycos AES, Dorset, UK. + (44) 1747-812-486. [www.sycos.co.uk].
Four-Channel 24-Bit USB-2405 Dynamic Signal Acquisition Module
A new four-channel USB 2.0 dynamic signal acquisition module has a built-in IEPE excitation current source that provides 2 mA on each AI channel. BNC connectors enable the USB-2405 from Adlink Technology to provide high accuracy and excellent dynamic performance for microphone and accelerometer measurement in vibration and acoustic applications. Featuring superior accuracy with low temperature drift, built-in anti-aliasing filters, support for flexible trigger modes, and USB bus power, the USB-2405 is an attractive portable solution for time-frequency analysis and research. The USB-2405 supports four analog input channels simultaneously sampling up to 128 kS/s, and delivers 100 dB dynamic range and -94 dB THD of dynamic performance. Built-in anti-aliasing filters enable filter cutoff frequency to be automatically adjusted to the sampling rate, suppressing out-of-band noise and avoiding measurement distortion. It also supports auto-calibration for ensured accuracy, and minimizes temperature drift in the field with maximum gain drift of 11 ppm/°C, an impressive 50% of the maximum available market performance. The USB-2405’s low DC measurement drift, along with temperature deviation, optimizes accuracy in spite of the environment. The USB-2405 provides lockable USB to enhance connectivity, and the included multifunctional stand fully supports desktop, rail, or wall mounting. Adlink’s easy-to-use U-Test software is included at no extra charge. With no programming required, the USB-2405 delivers fast, easy instrument setup and quality data acquisition. The USB-2405 supports Windows 8, Windows 7 and Windows XP operating systems, and is fully compatible with third-party software such as LabVIEW, MATLAB and Visual Studio.NET. ADLINK Technology, San Jose, CA. (408) 360-0200. [www.adlinktech.com].
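A few of the USB-2405 figures lend themselves to quick back-of-the-envelope checks. The Python sketch below estimates the raw streaming rate, the Nyquist frequency that an anti-aliasing filter has to respect, and the worst-case gain drift over a temperature swing. The channel count, sample rate, 24-bit resolution and 11 ppm/°C figure come from the text. The rest are assumptions for illustration: that 128 kS/s means 128,000 samples per second, that each sample travels as three packed bytes, and the 20°C swing.

```python
CHANNELS = 4               # from the text
SAMPLE_RATE_HZ = 128_000   # assumption: 128 kS/s read as 128,000 samples/s
BYTES_PER_SAMPLE = 3       # assumption: 24-bit samples packed into three bytes
GAIN_DRIFT_PPM_PER_C = 11  # from the text: maximum gain drift


def stream_bytes_per_s(channels=CHANNELS, rate_hz=SAMPLE_RATE_HZ,
                       bytes_per_sample=BYTES_PER_SAMPLE):
    """Raw data rate with all channels sampling simultaneously."""
    return channels * rate_hz * bytes_per_sample


def nyquist_hz(rate_hz=SAMPLE_RATE_HZ):
    """Highest signal frequency representable without aliasing."""
    return rate_hz / 2


def max_gain_drift_ppm(delta_t_c, drift_ppm_per_c=GAIN_DRIFT_PPM_PER_C):
    """Worst-case gain change, in ppm, over a temperature change in degrees C."""
    return delta_t_c * drift_ppm_per_c


if __name__ == "__main__":
    print(f"stream rate: {stream_bytes_per_s() / 1e6:.3f} Mbyte/s")
    print(f"Nyquist frequency: {nyquist_hz() / 1e3:.0f} kHz")
    print(f"gain drift over 20 C: {max_gain_drift_ppm(20)} ppm")
```

Under these assumptions the module produces roughly 1.5 Mbyte/s, a small fraction of the 480 Mbit/s signaling rate of a high-speed USB 2.0 link.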
[Figure: Mako Server block diagram. Labels: Lua Application 1, Lua Application 2, Lua Application 3; Mako Server Executable; Lua Server Pages (Lua Bindings); Mako Server Startup Code]
6U VPX Module with Freescale QorIQ T4240 or T4160
A 6U OpenVPX module features the Freescale QorIQ T4240 or T4160 communications processor. Freescale’s Power Architecture e6500-based T4240 and T4160 processors combine multiple 1.8 GHz dual-threaded cores, large caches and high-performance networking capabilities with the next-generation AltiVec single-instruction multiple-data (SIMD) engine to provide high-performance processing for both control and data plane tasks from a single system on a chip (SoC). Freescale’s significantly improved next-generation AltiVec engine delivers DSP-level floating-point performance with 172 GFLOPS of vector processing capability, more than seven times the performance of its previous generation, while maintaining compatibility with an already extensive inventory of AltiVec software libraries. The XCalibur1840 from Extreme Engineering Solutions can be built to support either the T4240 or T4160 processor. When configured for the T4240, it maximizes total processing performance with twelve dual-threaded physical cores or up to twenty-four virtual cores. The T4160 provides a lower-power solution with eight dual-threaded physical cores or up to sixteen virtual cores, ideal for platforms with more stringent size, weight and power (SWaP) restrictions. The XCalibur1840 provides a number of high-performance I/O interfaces, including sixteen lanes of Gen 2 PCI Express, four Gigabit Ethernet Interfaces, and front panel or backplane 10 Gigabit Ethernet ports. The board supports two XMC or PMC modules for additional I/O and processing flexibility, and it includes up to 24 Gbytes of DDR3 ECC SDRAM across three channels. The XCalibur1840 is available in either conduction-cooled or air-cooled versions and supports 0.8” pitch or 1.0” pitch 2-level maintenance (2LM) configurations. Extreme Engineering Solutions, Middleton, WI. (608) 833-1155. [www.xes-inc.com].
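The virtual-core counts and the 172 GFLOPS figure can be reproduced with a short calculation. The Python below is an illustrative sketch, not Freescale's own derivation. The core counts, two threads per core and the 1.8 GHz clock come from the text. The eight floating-point operations per core per cycle are an assumption (a 128-bit vector unit processing four single-precision values with a fused multiply-add counted as two operations), and the function names are hypothetical.

```python
THREADS_PER_CORE = 2   # from the text: dual-threaded cores
CLOCK_MHZ = 1_800      # from the text: 1.8 GHz
FLOPS_PER_CYCLE = 8    # assumption: 4 single-precision lanes x 2 (fused multiply-add)


def virtual_cores(physical_cores, threads_per_core=THREADS_PER_CORE):
    """Number of hardware threads presented as virtual cores."""
    return physical_cores * threads_per_core


def peak_vector_gflops(physical_cores, clock_mhz=CLOCK_MHZ,
                       flops_per_cycle=FLOPS_PER_CYCLE):
    """Theoretical peak single-precision vector throughput in GFLOPS."""
    return physical_cores * clock_mhz * flops_per_cycle / 1000


if __name__ == "__main__":
    for name, cores in (("T4240", 12), ("T4160", 8)):
        print(f"{name}: {virtual_cores(cores)} virtual cores, "
              f"{peak_vector_gflops(cores):.1f} GFLOPS peak (estimated)")
```

For twelve cores the estimate comes to 172.8 GFLOPS, in line with the 172 GFLOPS quoted for the T4240. The eight-core result is derived from the same assumption and is not a published figure.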
[Figure label, Mako Server architecture: SQLite (database).]
Web Server Delivers Rapid Development of Dynamic Web Applications
A compact web server is targeted at the rapid design of server-side web applications. Using the Lua scripting language, the Mako Server from Real Time Logic offers fast, efficient development of web applications, ranging from database-driven business applications to customized applications that manage microcontroller-based devices. Mako Server offers extremely fast dynamic content generation, outstripping other web technologies by as much as 60 percent.
Mako Server removes the complexity of deploying a web application. To develop web applications, developers must typically integrate and configure many components, such as Apache, PHP and an SQL database. In contrast, Mako Server bundles all of these components into one unit so that the application developer can immediately focus on application development for Windows, Mac OS and Linux, or for devices such as the Raspberry Pi. To ease deployment, developers can bundle their application into a single zip file so that users can download and run it just as they would a Windows-based application.
Mako Server’s comprehensive library, based on Real Time Logic’s flagship product Barracuda Application Server, integrates and streamlines all resources needed for a very fast web-server back end. In addition to the Lua Server Pages and virtual machine, the Barracuda Application Server library includes REST, AJAX, SOAP, JSON and XML services; an SMTP client; both client and server SSL/TLS security; client HTTPS; the WebDAV file exchange protocol; and a process management API. An SQL database is also integrated into Mako Server. All components have been optimized for robust, high-speed information exchange, even on underpowered computers.
The Lua scripting language is easy to learn, typically requiring only an afternoon for developers who have used the ASP or PHP server scripting technologies. Developers with C/C++ experience can extend the server with new functionality by loading C libraries at runtime.
Mako Server is available as a binary executable for Windows, Mac and Linux platforms and other POSIX-based systems. Hobbyists and educational institutions developing applications exclusively for non-commercial use can acquire Mako Server for free. Single-user business licenses are available for $85, with deeply discounted multiuser licenses available. Companies requiring source code or other development platforms are advised to purchase Real Time Logic’s flagship product, Barracuda Application Server.
Real Time Logic, Monarch Beach, CA. (949) 388-1314. [www.realtimelogic.com].
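The single-zip deployment model described for Mako Server can be illustrated with a short packaging sketch. The Python below only shows the general idea of collecting an application directory into one archive. It is not Real Time Logic's tooling: the function name bundle_app and any file names passed to it are hypothetical, and the layout Mako Server actually expects inside the archive is not covered here.

```python
import zipfile
from pathlib import Path


def bundle_app(app_dir, zip_path):
    """Pack every file under app_dir into a single zip archive.

    Returns the archive-relative names that were stored.
    Hypothetical helper for illustration; not part of Mako Server.
    """
    root = Path(app_dir)
    target = Path(zip_path).resolve()
    stored = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            # Skip directories, and skip the archive itself in case it
            # is being written inside the application directory.
            if not path.is_file() or path.resolve() == target:
                continue
            # Store paths relative to the application root so the
            # archive unpacks cleanly on any platform.
            arcname = path.relative_to(root).as_posix()
            zf.write(path, arcname)
            stored.append(arcname)
    return stored
```

Calling bundle_app("myapp", "myapp.zip") would produce one file that can be handed to users, which is the deployment convenience the brief describes. How the resulting archive is loaded by the server is left to the vendor's documentation.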
Atom D2550/N2600 Small Form Factor Fanless PC
A fanless, small form factor industrial computer offers the OEM a choice of Intel Atom D2550 or N2600 processors. Both processors combine high performance with low power consumption, at 10W and 3.5W, respectively. The D2550 is a 1.86 GHz, dual-core design; the N2600 delivers 1.6 GHz and is available in dual- and quad-core packages. The PL-80510 from WIN Enterprises provides a small-footprint system in a rugged aluminum chassis with an integrated heat-sink design.
The unit provides up to 2 Gbytes or 4 Gbytes of DDR3 SDRAM and supports 1x 2.5” HDD and 1x half-size mSATA SSD. System I/O includes HDMI and VGA dual-display capabilities and one RS-232/422/485 port. Expansion is enabled through 2x Mini-PCIe sockets: 1x Mini-PCIe with USB and 1x half-size Mini-PCIe with USB. The rich I/O supports a number of application areas, including digital signage, building automation, security/network video recorders, point-of-sale, kiosk, industrial control, cart-based medical and others.
WIN Enterprises, North Andover, MA. (978) 688-2000. [www.win-ent.com].
Advertiser Index
Get Connected with technology and companies providing solutions now
Get Connected is a new resource for further exploration into products, technologies and companies. Whether your goal is to research the latest datasheet from a company, speak directly with an Application Engineer, or jump to a company's technical page, the goal of Get Connected is to put you in touch with the right resource. Whichever level of service you require for whatever type of technology, Get Connected will help you connect with the companies and products you are searching for.
www.rtcmagazine.com/getconnected
Company ..... Page ..... Website
Advanced Micro Devices, Inc. ..... 52 ..... www.amd.com/embedded
ARM, Ltd. ..... 25 ..... www.arm.com
congatec, Inc. ..... 19 ..... www.congatec.us
Dolphin Interconnect Solutions ..... 51 ..... www.dolphinics.com
Embedded World 2014 Exhibition & Conference ..... 15 ..... www.embedded-world.de
Extreme Engineering Solutions, Inc. ..... 11 ..... www.xes-inc.com
iBase ..... 7 ..... www.ibase-usa.com
Intel Intelligent Solutions Finder ..... 21 ..... intel.intelligentsystemssource.com
Intelligent Systems Source ..... 35 ..... www.intelligentsystemssource.com
Lauterbach ..... 38 ..... www.lauterbach.com
MSC Embedded, Inc. ..... 4 ..... www.mscembedded.com
One Stop Systems, Inc. ..... 5 ..... www.onestopsystems.com
Phoenix International ..... 19 ..... www.phenxint.com
Real-Time & Embedded Computing Conference ..... 39 ..... www.rtecc.com
Schroff ..... 4 ..... www.schroff.us
Super Micro Computer, Inc. ..... 2 ..... www.supermicro.com
Themis Computer ..... 34 ..... www.themis.com
USC Module & Data Acquisition Showcase ..... 43
WinSystems, Inc. ..... 17 ..... www.winsystems.com
RTC (Issn#1092-1524) magazine is published monthly at 905 Calle Amanecer, Ste. 250, San Clemente, CA 92673. Periodical postage paid at San Clemente and at additional mailing offices. POSTMASTER: Send address changes to RTC, 905 Calle Amanecer, Ste. 250, San Clemente, CA 92673.