UltrasoundToGo
Nano-Tera annual meeting 2015
Stefanos Skalistis, RiSD, EPFL
Consortium
UltrasoundToGo partners:
EPFL:
Integrated Systems Laboratory (LSI)
Rigorous System Design Laboratory (RiSD)
Signal Processing Laboratory (LTS5)
ETHZ:
Integrated Systems Laboratory (IIS)
Computer Engineering and Networks Laboratory (TIK)
CHUV:
Service de radiodiagnostic et radiologie interventionnelle
Ultrasound Imaging Benefits
Ultrasound is a widespread diagnostic imaging technique
Compared to X-Ray/CT/MRI: non-invasive, harmless, cheaper, non-oppressive
Used across multiple medical disciplines
Can study static and dynamic body structures
…And Open Challenges
Image quality is generally poor and hard to interpret
Diagnosis depends heavily on the sonographer's ability to choose the right angles of view
→ 3D Ultrasound Imaging
Much improved image quality for some subjects, new diagnostic capabilities
However, bulky and expensive
Ultrasound Devices
Portable machines: limited to 2D imaging (GE Vivid, GE Vscan, Mobisante Mobius, …)
Hospital 3D devices: stationary, > 100 k$, hundreds of W (Siemens Acuson, Philips Epiq, Samsung WS80, …)
High-end research prototypes are also expensive:
SARUS (Jensen '13): 320 FPGAs
Cephasonics Griffin: 1 M$ and up
Objective and Challenges
Medical objective:
Portable in-the-field device comparable to static systems
Telemedicine: an untrained GP acquires the 3D image and a sonographer evaluates it remotely
Engineering challenges:
3D imaging has a very high computation cost
Exploit parallel architectures for energy efficiency and lower cost
Certification: provably-correct mapping of software onto hardware
Platform in Development
Imaging Algorithms: LTS5
Software Techniques: RiSD, TIK
Hardware Platform (RF preprocessing, beamforming, RF demodulation, baseband processing): LSI, IIS
Prototyping Environment
HW development follows extensive numerical studies
Matlab platform:
Prototyping of algorithms and imaging modes
Study of image quality on artificial numerical "phantoms"
Co-development with FPGA offload
Generation of input and output patterns for HW testing
Realized a powerful, multi-mode, configurable testing environment for 2D and 3D imaging
Ibrahim et al., MCPS 15
FPGA Mapping of Beamforming
Fully-digital 3D beamforming is extremely challenging:
Compute 238 MIP/s from 160 GB/s of input data, while requiring 2.38 TD/s
Inner core of the imaging algorithm: compute the delay values for echo summation
FPGA-friendly architecture, low cost and good accuracy
Demonstrated single-FPGA capability of the block for 3D US: 3.3 Tera delays/s, 200 MHz on Virtex 7
[Figure: delay-computation architecture. Labels: 18 kb BRAM, Control, adder array, 16 corrections, 8 corrections, AXI Bus, 20cm, 2cm]
Ibrahim et al., DATE 15
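As a rough consistency check on the throughput figures of the FPGA mapping slide, the delay rate can be related to the image-point rate. This is only a sketch under assumptions that the slide does not state: "MIP/s" is read as million image points per second, "TD/s" as tera delays per second, and each image point is assumed to need one delay value per receive channel of a hypothetical 100 x 100-element (10,000-channel) probe.

```python
# Assumed reading (not stated on the slide): one delay value is needed
# per image point and per receive channel.
POINTS_PER_SECOND = 238e6      # "238 MIP/s", read as million image points/s
ASSUMED_CHANNELS = 100 * 100   # hypothetical 100 x 100 matrix probe


def required_delay_rate(points_per_second, channels):
    """Delays per second needed to beamform every point on every channel."""
    return points_per_second * channels


def delays_per_cycle(delay_rate, clock_hz):
    """Delay values that must be produced in each clock cycle."""
    return delay_rate / clock_hz


if __name__ == "__main__":
    need = required_delay_rate(POINTS_PER_SECOND, ASSUMED_CHANNELS)
    print(f"required rate: {need / 1e12:.2f} Tera delays/s")
    per_cycle = delays_per_cycle(3.3e12, 200e6)
    print(f"3.3 Tera delays/s at 200 MHz: {per_cycle:.0f} delays per cycle")
```

Under these assumptions the required rate matches the 2.38 TD/s quoted on the slide, and the demonstrated 3.3 Tera delays/s block exceeds it.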
Study of a 28nm Beamformer ASIC
Is a single-chip implementation without external memory feasible?
Designed single-chip 3D beamformer in 28nm CMOS
[Figure: beamformer block diagram. Labels: Implementation Focus, One RX Channel, AFE, TDC, IQ, Parallel Bandpass Beamformer, GC compute, GC control, BFC Cluster, Image Post processing]
Scalable parallel architecture with optimized buffering
Processes data at the information rate by exploiting bandpass properties
Fully on-chip delay computation from the underlying geometry to avoid off-chip delay storage
Chip Analysis Results
Implemented 100-channel 2D/3D beamformer: area 1.68 mm², power 303 mW
Cost extrapolation to 10k channels: area 1.68 cm², power 30.3 W
Single-chip fully-digital 3D beamformer is feasible:
Scalable
Software-configurable
Increased integration density (portable)
[Figure: Point Spread Function (golden reference vs HW)]
Hager et al., BioCAS 2014
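The extrapolation from 100 to 10k channels on this slide is consistent with simple linear scaling in the channel count. The sketch below reproduces that arithmetic; linear scaling is an assumption inferred from the numbers, and the function name is illustrative.

```python
def extrapolate_linear(value, from_channels, to_channels):
    """Scale a per-design figure linearly with the number of channels."""
    return value * to_channels / from_channels


# Figures reported for the implemented 100-channel beamformer.
AREA_MM2_100CH = 1.68
POWER_MW_100CH = 303.0

area_cm2 = extrapolate_linear(AREA_MM2_100CH, 100, 10_000) / 100.0   # 100 mm^2 = 1 cm^2
power_w = extrapolate_linear(POWER_MW_100CH, 100, 10_000) / 1000.0   # 1000 mW = 1 W

if __name__ == "__main__":
    print(f"10k channels: area ~ {area_cm2:.2f} cm^2, power ~ {power_w:.1f} W")
```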
Novel Reconstruction Techniques
Compressive sensing (CS) is a successful framework for accurately reconstructing signals from limited data
Exploit a low-dimensional model of the ultrasound images in the reconstruction process
Proposed method (Carrillo et al., IUS 2015):
Plane wave emission
Receive and process echoes
Apply CS-based algorithm
Goal: apply CS-based techniques to increase the quality of ultrasound images while keeping a high frame rate
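To illustrate the general idea of recovering a sparse signal from few measurements, here is a minimal sketch of ISTA (iterative soft-thresholding), a generic CS solver. It is not the algorithm of Carrillo et al.; the measurement matrix `A`, the sparse vector `x_true` and the weight `lam` are made-up toy values.

```python
import math


def matvec(A, x):
    """Compute A @ x for a list-of-rows matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Compute A^T @ r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def soft_threshold(v, t):
    """Shrink every entry towards zero by t (proximal operator of the L1 norm)."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


def objective(A, y, x, lam):
    """0.5 * ||y - A x||^2 + lam * ||x||_1"""
    residual = [yi - axi for yi, axi in zip(y, matvec(A, x))]
    return 0.5 * sum(r * r for r in residual) + lam * sum(abs(xi) for xi in x)


def ista(A, y, lam, n_iter=200):
    # Step 1/||A||_F^2 is a safe step: the squared Frobenius norm bounds
    # the largest eigenvalue of A^T A from above.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * len(A[0])
    for _ in range(n_iter):
        grad = rmatvec(A, [axi - yi for axi, yi in zip(matvec(A, x), y)])
        x = soft_threshold([xi - step * gi for xi, gi in zip(x, grad)], step * lam)
    return x


# Toy problem: 3 measurements of a 5-entry signal with a single nonzero.
A = [[1, 0, 1, 0, 1],
     [0, 1, 1, 0, -1],
     [1, 1, 0, 1, 0]]
x_true = [0.0, 0.0, 2.0, 0.0, 0.0]
y = matvec(A, x_true)
lam = 0.1

if __name__ == "__main__":
    x_hat = ista(A, y, lam)
    print("estimate:", [round(v, 3) for v in x_hat])
    print("objective:", objective(A, y, x_hat, lam))
```

The iterative solve is where the extra reconstruction cost of CS-type methods comes from: each iteration applies the measurement operator and its adjoint once.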
Contrast Improvements
Carrillo et al., IUS 2015
CNR comparison between the CS-based method and state-of-the-art methods using 83 insonifications
With only 3 insonifications, CS yields better contrast
The price is a higher computational cost for reconstruction
Preliminary Test on Real Images
Real carotid acquired using an Ultrasonix Sonix MDP device
Probe: Sonix L14-5W, 128 transducers, emission at 12 MHz
Acquisition of one plane wave and comparison with a state-of-the-art plane wave imaging method (Bernard et al., 2014)
Deploying US Applications
Kalray MPPA-256 many-core architecture:
16 clusters of 16 cores each: 256 cores (400 MHz)
Inter-cluster communication: 2D-torus NoC
2 MB shared memory per cluster
Trusted Ultrasound platform:
Real-time scheduling respecting time constraints
Operating within power constraints
Optimal use of resources: right degree of application-level parallelism, balanced load and minimization of communication
A Hybrid Approach
Offline: guarantees based on worst-case execution time (WCET)
Online: run-time optimizations based on actual execution times (AET)
[Figure: design flow. Labels: Application Partitioning; Mapping, Scheduling, Buffer Allocation; Unified System Model; Architecture Model; Application Placement; Power Management; Real-Time Scheduling; SMT]
[Figure: task graph with Task A, Task B and Task C placed on Cluster 1 and Cluster 2; edge annotations ω(eAB), b(eAB), ω(eAC), b(eAC), b(eBA); task annotations IA, TA, IB, TB, FA]
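The offline/online split can be illustrated with a small sketch. The slides indicate that mapping and scheduling are solved with SMT; the code below swaps in a much simpler greedy heuristic (longest processing time first) purely for illustration, ignores task dependencies and communication, and uses hypothetical WCET and AET values in milliseconds.

```python
def map_tasks_lpt(wcet_ms, n_clusters):
    """Greedy mapping: place each task, longest first, on the least loaded cluster."""
    loads = [0.0] * n_clusters
    mapping = {}
    for task, cost in sorted(wcet_ms.items(), key=lambda kv: kv[1], reverse=True):
        target = loads.index(min(loads))
        mapping[task] = target
        loads[target] += cost
    return mapping, loads


def meets_deadline(loads, period_ms):
    """Offline guarantee: even with worst-case times, every cluster fits in the period."""
    return max(loads) <= period_ms


def slack_ms(wcet_ms, aet_ms):
    """Online view: time left over when a task finishes earlier than its WCET."""
    return {task: wcet_ms[task] - aet_ms[task] for task in wcet_ms}


# Hypothetical numbers, not taken from the slides.
WCET = {"A": 4.0, "B": 3.0, "C": 2.0}
AET = {"A": 2.5, "B": 3.0, "C": 1.0}

if __name__ == "__main__":
    mapping, loads = map_tasks_lpt(WCET, n_clusters=2)
    print("mapping:", mapping, "loads:", loads)
    print("fits a 6 ms frame period:", meets_deadline(loads, 6.0))
    print("run-time slack:", slack_ms(WCET, AET))
```

The slack observed at run time is the quantity that an online optimization, such as power management, could exploit without weakening the offline guarantee.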
Deterministic Memory Sharing
US application as a Kahn Process Network (KPN)
Transformation to a Deterministic Memory Sharing KPN
Performance gain through:
Concurrency-safe memory sharing between processes
In-place modifications
Multiple readers
Memory recycling
Tretter et al., ESTIMedia 2014
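For readers unfamiliar with KPNs, the following minimal sketch shows a three-process network connected by FIFO channels with blocking reads. It models only the classic FIFO-based KPN (tokens are copied through queues), not the Deterministic Memory Sharing transformation; the process names and the bounded channel capacity are illustrative choices.

```python
import queue
import threading

END = object()  # sentinel token marking the end of a stream


def source(out_ch, n):
    """Produce the tokens 0..n-1, then signal end of stream."""
    for i in range(n):
        out_ch.put(i)
    out_ch.put(END)


def scale(in_ch, out_ch, gain):
    """Read a token (blocking), multiply it, forward it."""
    while True:
        token = in_ch.get()
        if token is END:
            out_ch.put(END)
            return
        out_ch.put(token * gain)


def sink(in_ch, collected):
    """Collect every token until the end of the stream."""
    while True:
        token = in_ch.get()
        if token is END:
            return
        collected.append(token)


def run_network(n=5, gain=2, capacity=2):
    a = queue.Queue(maxsize=capacity)
    b = queue.Queue(maxsize=capacity)
    collected = []
    workers = [
        threading.Thread(target=source, args=(a, n)),
        threading.Thread(target=scale, args=(a, b, gain)),
        threading.Thread(target=sink, args=(b, collected)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return collected


if __name__ == "__main__":
    print(run_network())
```

Because each process only communicates through blocking FIFO reads, the collected output does not depend on how the threads are interleaved, which is the determinism property that the memory-sharing variant is designed to preserve.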
Performance Results
Platform: Intel Xeon Phi 5110P
60 cores @ 1053 MHz
Performs best with 120…240 threads
Ultrasound KPN: ~ 200 threads
3 implementations of data transfer: Classic, Windowed FIFOs, DMS
Tested different buffer sizes
Showing best results and performance/memory tradeoffs
Tretter et al., ESTIMedia 2014
[Figure: framerate and channel memory [MB] for Classic, Classic Tradeoff, DMS, DMS Tradeoff and Windowed]
Thanks for your attention!
Visit us at www.nano-tera.ch
Exploration Phase: Different Transmit Focus
96-element linear array (plane wave)
32-element phased array (converging wave)
32-element phased array (diverging wave)
Exploration Phase: Parameterize the Axial Resolution
Time complexity (slow to fast):
Full resolution (11688 points): ~ an hour
500 points: ~ 15 mins
50 points: ~ 13 mins
Exploration Phase: Zone Imaging
Zone sonography divides the ROI into zones
Each zone is insonified and reconstructed separately
The zones are then combined to form the whole ROI
[Figure: zone imaging insonifications (5 zones)]
[Figure: brightness compensation profile]
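The combine step of zone imaging can be sketched as follows. The sketch assumes that the zones split the ROI into contiguous ranges along a single axis (taken here to be depth samples) and that brightness compensation is a per-sample gain; the slides specify neither, and all names and values are illustrative.

```python
def zone_bounds(n_samples, n_zones):
    """Split sample indices [0, n_samples) into n_zones contiguous ranges."""
    edges = [(i * n_samples) // n_zones for i in range(n_zones + 1)]
    return list(zip(edges[:-1], edges[1:]))


def combine_zones(zone_lines, gain):
    """Stitch one image line from per-zone reconstructions.

    zone_lines[z] is the full-length line reconstructed from the
    insonification of zone z; only the samples inside zone z are kept,
    each multiplied by the compensation gain of that sample.
    """
    n_samples = len(gain)
    combined = [0.0] * n_samples
    for line, (start, stop) in zip(zone_lines, zone_bounds(n_samples, len(zone_lines))):
        for k in range(start, stop):
            combined[k] = line[k] * gain[k]
    return combined


if __name__ == "__main__":
    # Two zones over four samples, with a made-up compensation profile.
    lines = [[1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0]]
    profile = [1.0, 1.0, 0.5, 0.5]
    print(zone_bounds(10, 5))
    print(combine_zones(lines, profile))
```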
Exploration Phase: Zone Imaging
Single Converging focus without brightness compensation
Single Converging focus with brightness compensation
Zone imaging of five zones (converging focus) with brightness compensation
Zone imaging of ten zones (converging focus) with brightness compensation
Zone imaging of two zones (converging focus) with brightness compensation
Zone imaging of 50 zones (converging focus) with brightness compensation
Our Proposed Approach for Delay Calculation: Delay «Steering»
The delay calculation works as follows:
Pre-calculate a delay table for points R along the axis of azimuth = 0 and elevation = 0
Derive the delays for any point S from those of the R at equal distance from O, then "steer" the delay
[Figure: geometry with origin O, elements D1, D2, D3, D4, D13, D16 and points R and S]
Delay «Steering»
1st-order Taylor expansion
Critical calculation: 2 ADDs
[Figure: steering formula with the precalculated terms marked]
Delay «Steering» Accuracy
Expansion error bounded by the Lagrange bound
Excellent in the far field and close to the imaging axis
Inaccurate close to the probe, at broad angles
Luckily, the worst geometric offenders are discarded due to limited element directivity
[Figure: delay error maps labelled 2cm and 20cm; annotations: max delay error = 209 samples, max delay error = 98 samples]
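The steering idea can be sketched numerically. The exact formula on the slide is not recoverable from the text, so the code below is one plausible reading: the one-way receive distance from a point S to an element D is expanded to first order in the element coordinates around the origin O, which gives the on-axis reference delay for R minus two correction terms (hence two additions). The speed of sound `C`, the sampling frequency `FS`, the angle convention and the test geometry are all assumptions.

```python
import math

C = 1540.0   # assumed speed of sound in tissue, m/s
FS = 32e6    # hypothetical sampling frequency, Hz


def direction(theta, phi):
    """Unit vector from O towards S for azimuth theta and elevation phi (radians)."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(phi),
            math.cos(theta) * math.cos(phi))


def exact_delay(r, theta, phi, xd, yd):
    """Exact one-way delay from S (distance r from O) to the element at (xd, yd, 0)."""
    sx, sy, sz = direction(theta, phi)
    return math.sqrt((r * sx - xd) ** 2 + (r * sy - yd) ** 2 + (r * sz) ** 2) / C


def reference_delay(r, xd, yd):
    """Table entry: delay from the on-axis point R (same distance r) to the element."""
    return math.sqrt(r * r + xd * xd + yd * yd) / C


def steered_delay(r, theta, phi, xd, yd):
    """Reference delay plus two first-order steering corrections."""
    sx, sy, _ = direction(theta, phi)
    return reference_delay(r, xd, yd) - xd * sx / C - yd * sy / C


def error_in_samples(r, theta, phi, xd, yd):
    """Absolute approximation error, expressed in sampling periods."""
    return abs(steered_delay(r, theta, phi, xd, yd) - exact_delay(r, theta, phi, xd, yd)) * FS


if __name__ == "__main__":
    far = error_in_samples(0.20, math.radians(5), 0.0, 0.005, 0.003)
    near = error_in_samples(0.02, math.radians(35), 0.0, 0.010, 0.0)
    print(f"far field, near axis: {far:.4f} samples")
    print(f"close to probe, broad angle: {near:.2f} samples")
```

In this reading the two correction terms depend only on the element position and the steering angles, so they can be precalculated, and the approximation degrades close to the probe and at broad angles.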
Fully-Digital vs Partial-Analog
Analog pre-beamforming: [Figure labels: Analog Pre Beamforming, Parallel Beamformer]
120 ASICs in the transducer head with active cooling
Pictures: Whitepaper 2008: 4Z1c Real-Time Volume Imaging Transducer - Siemens Medical Solutions USA, Inc.
CNR computation
Phantom: 1 cm x 1 cm non-echogenic (black) occlusion placed at 30 mm depth in a medium with a high density of scatterers
[Figure: phantom layout with dimension labels 2 cm, 1 cm, 2 cm and the occlusion marked]
CNR computation: [Equation: CNR definition]
B is the occlusion, I is the amplitude of the pixels
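Since the CNR formula itself did not survive in the text, the sketch below uses one commonly used definition, 20·log10(|μ_in − μ_out| / sqrt(σ²_in + σ²_out)), computed from pixel amplitudes inside and outside the occlusion. This definition is an assumption and may differ from the one on the slide; the pixel values are made up.

```python
import math
from statistics import mean, pvariance


def cnr_db(inside, outside):
    """Contrast-to-noise ratio in dB between two sets of pixel amplitudes.

    inside:  amplitudes of pixels within the occlusion
    outside: amplitudes of pixels in the surrounding medium
    Raises ZeroDivisionError if both regions are perfectly uniform.
    """
    contrast = abs(mean(inside) - mean(outside))
    noise = math.sqrt(pvariance(inside) + pvariance(outside))
    return 20.0 * math.log10(contrast / noise)


if __name__ == "__main__":
    occlusion = [1, 1, 3, 3]       # dark region, made-up values
    background = [9, 11, 9, 11]    # bright speckle region, made-up values
    print(f"CNR = {cnr_db(occlusion, background):.2f} dB")
```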
Contrast improvement
Traditional beamforming method (DAS with 83 focused beams)
Compressed sensing based method with 3 plane waves
Beamforming technique    Number of ultrasound beams    Contrast (dB)
Traditional method       83                            10.17
CS based method          3                             10.65
Variation                - 97%                         + 5%