A-LOOP
AMP system: 2-core ARM Cortex A9/Linux OS and 4-core Leon3/Linux OS, OpenMP library and Hardware Profiling system
G. Valente, V. Muttillo, A. Bufalino, M. Santic, L. Pomante, M. Faccio, F. Federici
OVERVIEW
A-LOOP is a System on Module (SoM) prototype for aerospace applications, developed starting from the Zynq-7000, that focuses on the interactions between 2 modules ("isles of computational elements") and on monitoring without introducing overhead.
MOTIVATIONS
1) Embedded systems development is driven by basic functional specifications, enriched with a set of non-functional requirements (performance, power dissipation, etc.).
2) One of the techniques that can be exploited is to develop isles of computational elements (modules) with different characteristics, each one able to satisfy some non-functional specifications, in order to realize smart Systems on Module (SoM).
3) SoCs with FPGA can be viewed as platforms useful to prototype these kinds of architectures.
PROPOSED PLATFORM
The proposed platform represents a SoM with 2 modules that share a memory region on external memory:
-> ISLE #1: a dual-core ARM Cortex A9 with SMP Linux OS, able to interface with the external world. It provides data to Isle #2 and collects results from it. It is also able to monitor the performance of Isle #2, without introducing software overhead, by means of a hardware profiling system.
-> ISLE #2: a quad-core Leon3 with SMP Linux OS, able to execute parallel applications based on the OpenMP library. In particular, in this demo, it executes a MANET localization algorithm.
SYSTEM DESCRIPTION
THE CONCEPT
[Figure: the concept. Recoverable labels: ISLE Custom FFT, ISLE Localization algorithm, ISLE Accelerator, ARM, SHARED MEMORY, INPUT MANAGEMENT, MONITORING OPERATIONS, DYNAMIC ADAPTION OF THE SYSTEM, To host.]
PROPOSED ARCHITECTURE
[Figure: proposed architecture. Recoverable labels: ISLE #1 ARM, ISLE #2, LEON3 cores, SRAM, Memory Controller, AXI Controller, Ethernet MAC, PHY, AHB Bus, AHB/APB Bridge, APB Bus, UART, UART - USB, S1, S2, S3.]
LINUX
Isle #1 runs a customized SMP Linux distribution provided by Xilinx. Isle #2 runs a customized SMP Linux distribution provided by Gaisler.
LEON3
The LEON3 processor is designed for embedded applications, combining high performance with low complexity and low power consumption. The LEON3 processor is highly configurable.
[Figure: LEON3 block diagram. Recoverable labels: 7-Stage Integer Pipeline, 3-Port Register File, IEEE-754 FPU, Co-Processor, HW MUL/DIV, Local IRAM, Local DRAM, ITLB, DTLB, I-Cache, D-Cache, SRMMU, AHB I/F, AMBA AHB Master (32-bit), Trace Buffer, Debug port, Interrupt port.]
HW PROFILING SYSTEM
A distributed hardware profiling system has been developed for runtime analysis. It is composed of distributed AHB bus monitoring elements (sniffers) that monitor the AHB bus and are initialized by means of an AXI bus. A global monitor unit, represented by Isle #1, provides sniffer initialization and collects results.
[Figure: sniffer block diagram. Recoverable labels: AHB - Adapter, Decode Section, Event Monitor, Time Monitor, Counter, APB Interface.]
OPENMP
Libraries required to execute shared-memory parallel applications, developed with OpenMP C/C++, have been cross-compiled and added to the adopted Leon3 Linux distribution.
[Figure: OpenMP fork-join execution with threads ID:0 to ID:3. Non-parallel region: master thread only. Parallel region starts: #pragma omp parallel (fork). Parallel region: several threads execute simultaneously. Parallel region ends: program waits for all threads to terminate (join). Program reverts to single-threaded execution.]
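The fork-join sequence shown in the OpenMP figure labels can be illustrated with a minimal C sketch. It is not taken from the A-LOOP sources: the function name fork_join_demo and the constant MAX_THREADS (4, assuming one thread per Leon3 core) are hypothetical, and the serial stubs only exist so that the file still builds when OpenMP is not enabled.

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback so the sketch still builds without OpenMP support. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

#define MAX_THREADS 4 /* assumption: one thread per Leon3 core */

/*
 * Runs one fork-join cycle. Each thread that takes part marks seen[id] = 1.
 * Returns the size of the thread team used in the parallel region.
 */
int fork_join_demo(int seen[MAX_THREADS])
{
    int team = 0;

    /* Non-parallel region: master thread only. */
    for (int i = 0; i < MAX_THREADS; i++)
        seen[i] = 0;

    /* Parallel region starts: fork. */
    #pragma omp parallel num_threads(MAX_THREADS)
    {
        int id = omp_get_thread_num();
        assert(id >= 0 && id < MAX_THREADS);
        seen[id] = 1; /* each thread writes only its own slot */

        /* Exactly one thread records the team size. */
        #pragma omp single
        team = omp_get_num_threads();
    } /* Parallel region ends: implicit barrier, join. */

    /* Program reverts to single-threaded execution. */
    return team;
}
```

With GCC, the sketch would typically be built with the -fopenmp flag; the runtime may grant fewer than MAX_THREADS threads, which is why the team size is returned rather than assumed.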
SYSTEM BEHAVIOUR
The proposed profiling technique, used to monitor the computational behaviour of the A-LOOP platform, follows the approach of runtime bus sampling.
-> Event monitor: generates a strobe (ld_ac_event) on each access within a specified address range (delimited by sig_out_inf and sig_out_sup).
-> Time monitor: a counter started by a read operation (during_read) and stopped by a write operation (during_write), both on a specified address (0x808).
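The behaviour of the two monitors can be approximated by a small software model in C. This is only a sketch of the described logic, not the hardware design: the struct and function names are hypothetical, the signal names (ld_ac_event, sig_out_inf, sig_out_sup) and the address 0x808 come from the description above, and the inclusive range bounds and the one-sample-per-cycle counting are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One sampled bus transfer (assumption: one sample per bus cycle). */
typedef struct {
    uint32_t addr;
    bool     is_write;
} bus_sample_t;

/* Event monitor: counts accesses inside [sig_out_inf, sig_out_sup]. */
typedef struct {
    uint32_t sig_out_inf;  /* lower bound of the monitored range */
    uint32_t sig_out_sup;  /* upper bound (assumed inclusive)    */
    uint32_t event_count;  /* number of ld_ac_event strobes seen */
} event_monitor_t;

/* Time monitor: counts samples between a read and a write on one address. */
typedef struct {
    uint32_t trigger_addr; /* 0x808 in the poster */
    bool     running;
    uint32_t cycles;
} time_monitor_t;

/* Returns the value of the ld_ac_event strobe for this sample. */
bool event_monitor_step(event_monitor_t *m, bus_sample_t s)
{
    assert(m != 0);
    bool ld_ac_event = (s.addr >= m->sig_out_inf && s.addr <= m->sig_out_sup);
    if (ld_ac_event)
        m->event_count++;
    return ld_ac_event;
}

/* Advances the time monitor by one sample. */
void time_monitor_step(time_monitor_t *m, bus_sample_t s)
{
    assert(m != 0);
    if (m->running)
        m->cycles++;                 /* count every sample while active */
    if (s.addr == m->trigger_addr) {
        if (s.is_write) {
            m->running = false;      /* write on trigger address: stop  */
        } else if (!m->running) {
            m->running = true;       /* read on trigger address: start  */
            m->cycles = 0;
        }
    }
}
```

In this model the counter includes the sample that carries the stopping write but not the one that carries the starting read; the real sniffer may count differently.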
Main Contacts: giacomo.valente@graduate.univaq.it, vittoriano.muttillo@graduate.univaq.it, andrea.bufalino@student.univaq.it, marco.santic@univaq.it, luigi.pomante@univaq.it, marco.faccio@univaq.it, fabio.federici@univaq.it
UNIVERSITA’ degli STUDI dell’AQUILA - CENTER of EXCELLENCE DEWS (ITALY)
Performance evaluation of the platform by means of a Pi calculation algorithm, proposed in four different versions: serial computation, single program multiple data (SPMD) technique with false sharing, SPMD technique without false sharing, and OpenMP reduction clause.
[Chart legend: 1 Thread, 2 Threads, 3 Threads, 4 Threads]
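The four versions of the Pi benchmark could look like the following C sketch. The poster does not give the code, so several points are assumptions: Pi is computed by midpoint-rule integration of 4/(1+x^2) over [0,1], false sharing is removed by padding each partial sum to 64 bytes (PAD doubles), and all function names and constants are hypothetical.

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback so the sketch still builds without OpenMP support. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

#define MAX_THREADS 4 /* assumption: one thread per Leon3 core        */
#define PAD 8         /* assumption: 8 doubles = 64 bytes per slot    */

/* Integrand 4/(1+x^2) evaluated at the midpoint of interval i. */
static double term(long i, double step)
{
    double x = ((double)i + 0.5) * step;
    return 4.0 / (1.0 + x * x);
}

/* Version 1: serial computation. */
double pi_serial(long steps)
{
    assert(steps > 0);
    double step = 1.0 / (double)steps, sum = 0.0;
    for (long i = 0; i < steps; i++)
        sum += term(i, step);
    return sum * step;
}

/* Version 2: SPMD, adjacent partial sums share cache lines (false sharing). */
double pi_spmd_false_sharing(long steps)
{
    assert(steps > 0);
    double step = 1.0 / (double)steps;
    double sum[MAX_THREADS] = { 0.0 };
    #pragma omp parallel num_threads(MAX_THREADS)
    {
        int id = omp_get_thread_num(), n = omp_get_num_threads();
        for (long i = id; i < steps; i += n) /* cyclic distribution */
            sum[id] += term(i, step);
    }
    double total = 0.0;
    for (int t = 0; t < MAX_THREADS; t++)
        total += sum[t];
    return total * step;
}

/* Version 3: SPMD, each partial sum padded to its own 64-byte slot. */
double pi_spmd_padded(long steps)
{
    assert(steps > 0);
    double step = 1.0 / (double)steps;
    double sum[MAX_THREADS][PAD] = { { 0.0 } };
    #pragma omp parallel num_threads(MAX_THREADS)
    {
        int id = omp_get_thread_num(), n = omp_get_num_threads();
        for (long i = id; i < steps; i += n)
            sum[id][0] += term(i, step);
    }
    double total = 0.0;
    for (int t = 0; t < MAX_THREADS; t++)
        total += sum[t][0];
    return total * step;
}

/* Version 4: work-sharing loop with a reduction clause. */
double pi_reduction(long steps)
{
    assert(steps > 0);
    double step = 1.0 / (double)steps, sum = 0.0;
    #pragma omp parallel for reduction(+:sum) num_threads(MAX_THREADS)
    for (long i = 0; i < steps; i++)
        sum += term(i, step);
    return sum * step;
}
```

All four functions should return the same value up to floating-point rounding; only their execution time is expected to differ, which is what the evaluation compares across 1 to 4 threads.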
http://dews.univaq.it