4-LOOP LEON3
4-CORE LEON3 WITH LINUX OPERATING SYSTEM, OPENMP LIBRARY AND HARDWARE PROFILING SYSTEM
G. Valente, V. Muttillo, L. Pomante, M. Faccio, F. Federici, A. Moro
OVERVIEW
4-LOOP is a platform developed to offer the advantages of parallel execution while monitoring runtime system behaviour without software overhead.

MOTIVATIONS
1) The opportunity to build multi-processor systems exploiting soft-cores is increasing the range of applications that can be implemented on FPGAs.
2) In order to maximize performance, a parallel programming model should be used: the OpenMP API is a specification for a set of compiler directives, library routines, and environment variables that can be used to specify high-level parallelism.
3) Runtime analysis on SoCs is useful to optimize reconfigurable systems. However, software profiling systems impose software overhead on application execution.

PROPOSED PLATFORM
The proposed platform is composed of a working symmetric multi-processor (SMP) system based on four LEON3 cores, enhanced with a custom hardware profiling system that introduces no software overhead. An SMP Linux kernel targeting the proposed system, including the device drivers needed to collect data from the custom hardware profilers, has also been built. The system has been further customized to support the execution of OpenMP-based applications.
SYSTEM DESCRIPTION: THE PLATFORM
Components: LINUX + OPENMP, LEON3, HW PROFILING SYSTEM

LEON3
The LEON3 processor is designed for embedded applications, combining high performance with low complexity and low power consumption. It is highly configurable.
[Figure: LEON3 block diagram. Labels: 7-stage integer pipeline, 3-port register file, IEEE-754 FPU, co-processor, HW MUL/DIV, I-cache, D-cache, local IRAM, local DRAM, ITLB, DTLB, SRMMU, trace buffer, debug port, interrupt port, AHB I/F, AMBA AHB master (32-bit).]

HW PROFILING SYSTEM
A distributed hardware profiling system has been developed for runtime analysis. It is composed of distributed monitoring elements (sniffers) that observe the AHB bus and are initialized over the APB bus. A global monitor unit, represented by one LEON3 processor, initializes the sniffers and collects their results.
[Figure: profiling system. Labels: four LEON3 cores, sniffers S1, S2 and S3, AHB bus, APB bus.]
[Figure: sniffer block diagram. Labels: AHB adapter, decode section, event monitor, time monitor, counter, APB interface.]
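The initialization and collection flow of the profiling system can be sketched in C as follows. This is only an illustration under stated assumptions: the register layout, the field names and the enable bit are hypothetical (the poster does not give the sniffer register map), and the sniffers are modelled as plain structs so that the sketch also runs on a host machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block of one sniffer as seen over APB.
   The real 4-LOOP register map is not given in the source. */
typedef struct {
    volatile uint32_t addr_inf;  /* lower bound of the monitored range */
    volatile uint32_t addr_sup;  /* upper bound of the monitored range */
    volatile uint32_t ctrl;      /* bit 0: enable (assumed) */
    volatile uint32_t result;    /* value accumulated by the sniffer */
} sniffer_regs_t;

#define SNIFFER_CTRL_ENABLE 0x1u

/* Global monitor, step 1: program one sniffer and enable it. */
void sniffer_init(sniffer_regs_t *s, uint32_t inf, uint32_t sup)
{
    assert(inf <= sup);
    s->ctrl = 0;                 /* disable while reconfiguring */
    s->addr_inf = inf;
    s->addr_sup = sup;
    s->result = 0;
    s->ctrl = SNIFFER_CTRL_ENABLE;
}

/* Global monitor, step 2: stop every sniffer and copy its result out. */
void sniffers_collect(sniffer_regs_t *s, size_t n, uint32_t *out)
{
    for (size_t i = 0; i < n; i++) {
        s[i].ctrl = 0;
        out[i] = s[i].result;
    }
}
```

On real hardware the struct pointer would refer to the APB address of each sniffer; because the counting is done by the sniffers themselves, the profiled application runs no extra code.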
LINUX
A Linux distribution, customized to work with the multicore platform in SMP mode, has been developed using the Buildroot tool, starting from the LEON Linux kernel (provided by Gaisler Research).

[Figure: hardware architecture. Labels: four LEON3 cores, sniffers S1, S2 and S3, AMBA AHB, AHB controller, JTAG debug link, JTAG, Ethernet MAC, PHY, memory controller, SRAM, AHB/APB bridge, AMBA APB, UART, UART-USB.]

OPENMP
The libraries required to implement shared-memory parallel applications developed with OpenMP C/C++ have been cross-compiled and added to the adopted Linux distribution.

[Figure: OpenMP fork-join execution model]
1) Non-parallel region: master thread only (ID:0).
2) Parallel region starts (fork): #pragma omp parallel
3) Parallel region: several threads execute simultaneously (ID:0, ID:1, ID:2, ID:3).
4) Parallel region ends (join): the program waits for all threads to terminate.
5) The program reverts to single-threaded execution (ID:0).
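The fork-join sequence can be reproduced with a few lines of OpenMP C. The sketch below is illustrative, not the code used on the platform: the function name fork_join_demo and the MAX_THREADS constant (one thread per LEON3 core) are assumptions. It falls back to a single thread when compiled without OpenMP support.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define MAX_THREADS 4  /* one per LEON3 core; an assumption of this sketch */

/* Runs one parallel region and records which thread IDs took part.
   Returns the number of threads in the team. */
int fork_join_demo(int seen[MAX_THREADS])
{
    int team = 1;
    for (int i = 0; i < MAX_THREADS; i++)
        seen[i] = 0;

    /* Non-parallel region: master thread only (ID:0). */
    printf("before fork: master thread only\n");

    /* Fork: the team of threads is created here. */
    #pragma omp parallel num_threads(MAX_THREADS)
    {
        int id = 0;
#ifdef _OPENMP
        id = omp_get_thread_num();
        #pragma omp single
        team = omp_get_num_threads();
#endif
        assert(id >= 0 && id < MAX_THREADS);
        seen[id] = 1;  /* each thread writes only its own slot */
        printf("parallel region: thread ID:%d\n", id);
    }  /* Join: implicit barrier, all threads have terminated. */

    /* Back to single-threaded execution. */
    printf("after join: single thread again\n");
    return team;
}
```

With GCC the pragmas are enabled by the -fopenmp flag; the order of the lines printed inside the parallel region is not deterministic.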
SYSTEM BEHAVIOUR
The proposed profiling technique, used to monitor the computational behaviour of the 4-LOOP platform, follows the approach of runtime bus sampling.
1) Event monitor: generates a strobe (ld_ac_event) on each access within a specified address range (delimited by sig_out_inf and sig_out_sup).
2) Time monitor: a counter started by a read operation (during_read) and stopped by a write operation (during_write), both on a specified address (0x808).
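The behaviour of the two monitors can be mimicked by a small software model. This is a sketch of the logic, not the hardware description: the struct and function names are hypothetical, one bus transfer per cycle is assumed, and the choice that the starting read and the stopping write are not themselves counted is an assumption of this model.

```c
#include <assert.h>
#include <stdint.h>

/* One sampled bus transfer (field names are illustrative). */
typedef struct {
    uint32_t addr;
    int is_write;  /* 0 = read, 1 = write */
} bus_xfer_t;

typedef struct {
    uint32_t inf, sup;      /* range bounds (cf. sig_out_inf, sig_out_sup) */
    uint32_t trigger_addr;  /* address watched by the time monitor */
    uint32_t events;        /* strobes produced by the event monitor */
    uint32_t cycles;        /* cycles counted by the time monitor */
    int running;            /* 1 while the time counter is active */
} sniffer_model_t;

void sniffer_reset(sniffer_model_t *s, uint32_t inf, uint32_t sup,
                   uint32_t trigger_addr)
{
    assert(inf <= sup);
    s->inf = inf;
    s->sup = sup;
    s->trigger_addr = trigger_addr;
    s->events = 0;
    s->cycles = 0;
    s->running = 0;
}

/* Advance the model by one bus cycle carrying transfer x. */
void sniffer_step(sniffer_model_t *s, bus_xfer_t x)
{
    int at_trigger = (x.addr == s->trigger_addr);

    /* Event monitor: one strobe per access inside [inf, sup]. */
    if (x.addr >= s->inf && x.addr <= s->sup)
        s->events++;

    /* Time monitor: a write to the trigger address stops the counter... */
    if (at_trigger && x.is_write)
        s->running = 0;
    if (s->running)
        s->cycles++;
    /* ...and a read of the trigger address starts it. */
    if (at_trigger && !x.is_write)
        s->running = 1;
}
```

An application would bracket the code of interest with a read and a write to the trigger address (0x808 in the poster), so the elapsed time is measured by the hardware counter instead of by software timers.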
Main Contacts: giacomo.valente@graduate.univaq.it, luigi.pomante@univaq.it, marco.faccio@univaq.it, fabio.federici@univaq.it
UNIVERSITA’ degli STUDI dell’AQUILA - CENTER of EXCELLENCE DEWS (ITALY)
Graphic design by: Tania Valentina Ferro
Performance of the platform has been evaluated by means of a Pi calculation algorithm, implemented in four versions: serial computation, the single program multiple data (SPMD) technique with false sharing, the SPMD technique without false sharing, and the OpenMP reduction clause.
[Figure: performance results. Legend: 1 thread, 2 threads, 3 threads, 4 threads.]
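The four versions can be sketched in C as below. The poster does not state which Pi algorithm was used, so this sketch assumes the common midpoint-rule integration of 4/(1+x^2) over [0,1]; the function names, NTHREADS and the PAD value (chosen to cover cache lines of up to 64 bytes) are likewise assumptions.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define NTHREADS 4  /* one per core (assumed) */
#define PAD 8       /* doubles per slot: covers a 64-byte line (assumed) */

static double term(long i, double step)
{
    double x = (i + 0.5) * step;
    return 4.0 / (1.0 + x * x);
}

static void team_info(int *id, int *nt)
{
#ifdef _OPENMP
    *id = omp_get_thread_num();
    *nt = omp_get_num_threads();
#else
    *id = 0;
    *nt = 1;
#endif
}

/* Version 1: serial computation. */
double pi_serial(long n)
{
    assert(n > 0);
    double step = 1.0 / (double)n, sum = 0.0;
    for (long i = 0; i < n; i++)
        sum += term(i, step);
    return sum * step;
}

/* Version 2: SPMD; adjacent partial sums share cache lines (false sharing). */
double pi_spmd_false_sharing(long n)
{
    assert(n > 0);
    double step = 1.0 / (double)n, sum[NTHREADS] = {0.0}, pi = 0.0;
    #pragma omp parallel num_threads(NTHREADS)
    {
        int id, nt;
        team_info(&id, &nt);
        for (long i = id; i < n; i += nt)
            sum[id] += term(i, step);
    }
    for (int t = 0; t < NTHREADS; t++)
        pi += sum[t];
    return pi * step;
}

/* Version 3: SPMD; each partial sum is padded onto its own cache line. */
double pi_spmd_padded(long n)
{
    assert(n > 0);
    double step = 1.0 / (double)n, sum[NTHREADS][PAD] = {{0.0}}, pi = 0.0;
    #pragma omp parallel num_threads(NTHREADS)
    {
        int id, nt;
        team_info(&id, &nt);
        for (long i = id; i < n; i += nt)
            sum[id][0] += term(i, step);
    }
    for (int t = 0; t < NTHREADS; t++)
        pi += sum[t][0];
    return pi * step;
}

/* Version 4: the OpenMP reduction clause combines per-thread sums. */
double pi_reduction(long n)
{
    assert(n > 0);
    double step = 1.0 / (double)n, sum = 0.0;
    #pragma omp parallel for reduction(+:sum) num_threads(NTHREADS)
    for (long i = 0; i < n; i++)
        sum += term(i, step);
    return sum * step;
}
```

All four functions return the same approximation; they differ only in how the partial sums are stored and combined, which is what makes them useful for comparing execution time across 1 to 4 threads.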
http://dews.univaq.it