UltrasoundToGo
Nano-Tera annual meeting 2015
Stefanos Skalistis, RiSD, EPFL
Consortium
UltrasoundToGo partners: § EPFL:
§ Integrated Systems Laboratory (LSI) § Rigorous System Design Laboratory (RiSD) § Signal Processing Laboratory (LTS5)
§ ETHZ:
§ Integrated Systems Laboratory (IIS) § Computer Engineering and Networks Laboratory (TIK)
§ CHUV:
§ Department of Diagnostic and Interventional Radiology (Service de radiodiagnostic et radiologie interventionnelle)
Ultrasound Imaging Benefits
§ Ultrasound is a widespread diagnostic imaging technique
§ Compared to X-ray/CT/MRI: non-invasive, harmless, cheaper, non-oppressive
§ Used in multiple medical disciplines
§ Can study static and dynamic body structures
…And Open Challenges
§ Image quality is generally poor and hard to interpret
§ Diagnosis depends heavily on the sonographer's ability to choose the right viewing angles
→ 3D Ultrasound Imaging
§ Much improved image quality for some subjects, new diagnostic capabilities
§ However, bulky and expensive
Ultrasound Devices
§ Portable machines: limited to 2D imaging (GE Vivid, GE Vscan, Mobisante Mobius, …)
§ Hospital 3D devices: stationary, > $100k, hundreds of W (Siemens Acuson, Philips Epiq, Samsung WS80…)
§ High-end research prototypes are also expensive:
SARUS (Jensen '13): 320 FPGAs
Cephasonics Griffin: $1M and up
Objective and Challenges
Medical objective:
§ Portable in-the-field device comparable to static systems
§ Telemedicine: an untrained general practitioner (GP) acquires the 3D image and a sonographer evaluates it remotely
Engineering challenges:
§ 3D imaging has a very high computation cost
§ Exploit parallel architectures for energy efficiency and lower cost
§ Certification: provably correct mapping of software onto hardware
Platform in Development
§ Imaging algorithms (LTS5)
§ Software techniques (RiSD, TIK)
§ Hardware platform (LSI, IIS): RF pre-processing, beamforming, RF demodulation, baseband processing
Prototyping Environment
§ HW development follows extensive numerical studies
§ Matlab environment:
§ Prototyping of algorithms and imaging modes
§ Study of image quality on artificial numerical "phantoms"
§ Generation of input and output patterns for HW testing
§ Multi-mode, configurable toolchain for 2D/3D imaging
Ibrahim et al., MCPS 15
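To make the "input patterns for HW testing" step more concrete, the sketch below generates the RF trace that one channel would record from a single point scatterer of an artificial phantom, then quantizes it into a signed integer test vector. It is written in Python rather than Matlab, and the function names, sampling rate, centre frequency, pulse envelope and 12-bit width are all hypothetical choices, not the project's settings.

```python
import math

def point_echo(tau, fs=40e6, f0=3.5e6, n=256, envelope_sigma=0.3e-6):
    """RF samples of a Gaussian-modulated echo centred at delay tau (seconds).

    All default parameters are hypothetical illustration values.
    """
    return [math.exp(-((k / fs - tau) ** 2) / (2 * envelope_sigma ** 2))
            * math.cos(2 * math.pi * f0 * (k / fs - tau))
            for k in range(n)]

def quantize(samples, bits=12):
    """Map samples in [-1, 1] to signed integers, as an ADC test vector would."""
    full_scale = 2 ** (bits - 1) - 1
    return [max(-full_scale - 1, min(full_scale, round(s * full_scale)))
            for s in samples]

echo = point_echo(tau=2e-6)   # scatterer whose echo arrives 2 us after t = 0
vector = quantize(echo)       # 12-bit signed samples for a HW test bench
print(vector[80])             # 2047: envelope peak at sample 2e-6 * 40e6 = 80
```

The same vectors, pushed through a reference model, would give the matching output patterns against which the hardware is compared.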
FPGA Mapping of Beamforming
§ Inner core of the 3D imaging algorithm: compute 238 MP/s from 160 GB/s of input data, which requires 2.38 TD/s (tera delays per second)
§ FPGA-friendly architecture, low cost and good accuracy
§ Demonstrated single-FPGA capability of the block for 3D US: 3.3 tera delays/s at 200 MHz on Virtex-7
[Figure: delay-computation architecture with AXI bus, control logic, 18 kb BRAM and adder chains producing x_D and y_D, with 8 and 16 corrections; dimension labels 2 cm and 20 cm]
Ibrahim et al., DATE 15
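As a rough cross-check of the rates above, assume "MP/s" means focal points (pixels) per second and that every focal point needs one delay per receive channel: 2.38 TD/s then corresponds to 10,000 channels, for example a 100 × 100 matrix probe, which matches the 10k-channel extrapolation in the ASIC study. The sketch below computes the exact two-way time of flight for one focal point from the geometry, the kind of golden reference a delay-computation block could be checked against. The speed of sound, element pitch, transmit origin and function names are illustrative assumptions, not parameters of the cited design.

```python
import math

SPEED_OF_SOUND = 1540.0   # m/s, typical soft-tissue value (assumption)
PITCH = 0.3e-3            # m, hypothetical element pitch

def element_positions(n_x=100, n_y=100, pitch=PITCH):
    """Centres (x, y, 0) of a hypothetical n_x-by-n_y matrix array."""
    x0 = (n_x - 1) * pitch / 2
    y0 = (n_y - 1) * pitch / 2
    return [(i * pitch - x0, j * pitch - y0, 0.0)
            for i in range(n_x) for j in range(n_y)]

def two_way_delay(point, element, origin=(0.0, 0.0, 0.0), c=SPEED_OF_SOUND):
    """Time of flight origin -> focal point -> receiving element, in seconds."""
    return (math.dist(origin, point) + math.dist(point, element)) / c

def required_delay_rate(points_per_s, channels):
    """One delay per (focal point, receive channel) pair."""
    return points_per_s * channels

elements = element_positions()
focal_point = (0.0, 0.0, 0.05)            # 5 cm deep, on the array axis
delays = [two_way_delay(focal_point, e) for e in elements]
print(len(delays))                         # 10000: one delay per channel
print(required_delay_rate(238e6, len(elements)))  # 2380000000000.0, i.e. 2.38 TD/s
```

Computing every delay exactly with square roots is expensive in hardware, which is why dedicated delay-generation architectures are worth studying.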
Study of a 28nm Beamformer ASIC
§ Is a single-chip implementation without external memory feasible?
§ Designed single-chip 3D beamformer in 28nm CMOS
§ Scalable parallel architecture with optimized buffering
[Figure: implementation focus on one RX channel: per-channel analog front end (AFE) and TDC, IQ/bandpass stage, parallel IQ beamformer built from BFC clusters with GC compute and GC control, followed by image post-processing]
§ Processes data at the information rate by exploiting bandpass properties
§ Fully on-chip delay computation from the underlying geometry, avoiding off-chip delay storage
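Beamforming on IQ (baseband) data is one common way to work at a rate close to the information bandwidth rather than the RF rate. Each channel then needs two steps: a coarse delay applied by picking an IQ sample, and a phase rotation that restores the carrier phase removed by demodulation. The sketch below shows this textbook baseband delay-and-sum for one focal point; the carrier frequency, IQ sampling rate, nearest-sample selection and names are assumptions for illustration and are not taken from the 28nm design.

```python
import cmath
import math

F0 = 3.5e6     # Hz, hypothetical transducer centre frequency
FS_IQ = 5e6    # Hz, hypothetical IQ (baseband) sampling rate

def baseband_das(iq_channels, delays, f0=F0, fs=FS_IQ):
    """Delay-and-sum one focal point from per-channel complex IQ traces.

    iq_channels: one list of complex baseband samples per channel.
    delays: two-way delay in seconds for each channel.
    """
    acc = 0j
    for trace, tau in zip(iq_channels, delays):
        idx = round(tau * fs)                 # coarse delay: nearest IQ sample
        if 0 <= idx < len(trace):
            # Demodulation left a phase of exp(-j*2*pi*f0*tau) on the echo;
            # rotating it back makes all channels add coherently.
            acc += trace[idx] * cmath.exp(2j * math.pi * f0 * tau)
    return acc

# Toy input: three channels whose echoes sit exactly on IQ samples 3, 5 and 7.
taus = [k / FS_IQ for k in (3, 5, 7)]
channels = []
for k, tau in zip((3, 5, 7), taus):
    trace = [0j] * 10
    trace[k] = cmath.exp(-2j * math.pi * F0 * tau)   # phase left by demodulation
    channels.append(trace)
print(abs(baseband_das(channels, taus)))   # close to 3.0: coherent sum
```

Without the phase rotation the three unit echoes would add with different phases and the magnitude would generally fall below 3.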
Chip Analysis Results
§ Implemented 100-channel 2D/3D beamformer
§ Area: 1.68 mm²
§ Power: 303 mW
§ Cost extrapolation to 10k channels:
§ Area: 1.68 cm²
§ Power: 30.3 W
✓ A single-chip, fully digital 3D beamformer is feasible
§ Scalable
§ Software-configurable
§ Increased integration density (portable)
Point Spread Function (golden reference vs HW)
Hager et al., BioCAS 2014
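The 10k-channel figures follow from scaling the 100-channel results linearly by a factor of 100 (1.68 mm² × 100 = 168 mm² = 1.68 cm²; 303 mW × 100 = 30.3 W). The small helper below reproduces that arithmetic; linear scaling of area and power with the channel count is the assumption behind it, and the function name is hypothetical.

```python
def extrapolate(area_mm2, power_mw, channels, target_channels):
    """Scale area and power linearly from one channel count to another."""
    factor = target_channels / channels
    return area_mm2 * factor, power_mw * factor

area_mm2, power_mw = extrapolate(1.68, 303.0, channels=100, target_channels=10_000)
print(f"{area_mm2 / 100:.2f} cm^2, {power_mw / 1000:.1f} W")  # 1.68 cm^2, 30.3 W
```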
Novel Reconstruction Techniques
§ Compressive sensing (CS) is a successful framework for accurately reconstructing signals from limited data
§ Exploit a low-dimensional model of ultrasound images in the reconstruction process
§ Proposed method based on the following scheme:
Plane wave emission → Receive and process echoes → Apply CS-based algorithm
Carrillo et al., IUS 2015
§ Apply CS-based techniques to increase the quality of ultrasound images while keeping a high frame rate
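CS reconstruction typically solves a sparsity-regularized least-squares problem, minimizing ½‖y − A x‖² + λ‖x‖₁, where A models the acquisition and x is the signal in a domain where it is sparse. The sketch below uses ISTA (iterative shrinkage-thresholding), a generic solver for this problem; it is not the algorithm of Carrillo et al., and the toy matrix, λ and step size are arbitrary choices.

```python
import math

def matvec(a, x):
    """A x for a matrix stored as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def rmatvec(a, r):
    """A^T r."""
    return [sum(a[i][j] * r[i] for i in range(len(a))) for j in range(len(a[0]))]

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1: shrink each entry towards zero by t."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]

def ista(a, y, lam, step, iters=500):
    """Minimise 0.5*||y - A x||^2 + lam*||x||_1 (converges for step <= 1/||A||_2^2)."""
    x = [0.0] * len(a[0])
    for _ in range(iters):
        residual = [yi - axi for yi, axi in zip(y, matvec(a, x))]
        grad = rmatvec(a, residual)                      # A^T (y - A x)
        x = soft_threshold([xi + step * gi for xi, gi in zip(x, grad)], step * lam)
    return x

# Two measurements of a 3-sample signal that has a single non-zero entry.
A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [1.0, 1.0]                          # produced by x_true = [0, 0, 1]
x_hat = ista(A, y, lam=0.1, step=0.3)   # ||A||_2^2 = 3, so step < 1/3
print([round(v, 3) for v in x_hat])     # [0.0, 0.0, 0.95]
```

Fewer measurements than unknowns still recover the sparse support exactly; the ℓ1 penalty biases the amplitude to 1 − λ/2 = 0.95, a known trade-off of this formulation.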
Contrast Improvements
Carrillo et al., IUS 2015
§ Contrast-to-noise ratio (CNR) comparison between the CS-based method and state-of-the-art methods
§ With only 3 insonifications, CS yields better contrast than focused delay-and-sum (DAS) with 83 insonifications
§ The price is a higher reconstruction computational cost
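CNR is commonly computed from the mean and variance of the image inside a region of interest (for example an anechoic cyst) and in the surrounding background. The sketch below uses one common definition, CNR = |μ_in − μ_out| / √(σ_in² + σ_out²), sometimes reported in dB as 20·log10(CNR); whether the cited study uses exactly this form is an assumption, and the pixel values are made up.

```python
import math
import statistics

def cnr(inside, outside):
    """Contrast-to-noise ratio between two sets of pixel values."""
    mu_in, mu_out = statistics.fmean(inside), statistics.fmean(outside)
    var_in, var_out = statistics.pvariance(inside), statistics.pvariance(outside)
    return abs(mu_in - mu_out) / math.sqrt(var_in + var_out)

def cnr_db(inside, outside):
    """The same ratio on a 20*log10 scale."""
    return 20 * math.log10(cnr(inside, outside))

cyst = [1.0, 3.0]          # hypothetical pixel values inside a cyst
background = [9.0, 11.0]   # hypothetical pixel values in the surrounding tissue
print(round(cnr(cyst, background), 3))   # 5.657 = 8 / sqrt(2)
```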
Preliminary Test on Real Images
§ Real carotid data acquired with an Ultrasonix Sonix MDP device
§ Acquisition of a single plane wave and comparison with a state-of-the-art plane wave imaging method (Bernard et al., 2014)
[Figure panels: Bernard method | CS-based method]
Deploying US Applications
Kalray MPPA-256 many-core architecture:
§ 16 clusters of 16 cores each: 256 cores (400 MHz)
§ Inter-cluster communication: 2D-torus NoC
§ 2 MB shared memory per cluster
Trusted ultrasound platform:
§ Real-time scheduling respecting time constraints
§ Operating within power constraints
§ Optimal use of resources:
§ Right degree of application-level parallelism
§ Balanced load and minimization of communication
Image credit: Marcio Castro et al.
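Balancing load across the 16 compute clusters can be illustrated with the classic longest-processing-time-first (LPT) heuristic: sort tasks by decreasing cost and always give the next one to the least-loaded cluster. This is a generic sketch with hypothetical task costs and function names; it ignores communication over the NoC, which the actual mapping also aims to minimize.

```python
import heapq

def lpt_schedule(task_costs, n_clusters=16):
    """Longest-processing-time-first: return (makespan, task indices per cluster)."""
    heap = [(0.0, c) for c in range(n_clusters)]      # (current load, cluster id)
    assignment = [[] for _ in range(n_clusters)]
    order = sorted(range(len(task_costs)), key=lambda i: task_costs[i], reverse=True)
    for i in order:
        load, c = heapq.heappop(heap)                  # least-loaded cluster
        assignment[c].append(i)
        heapq.heappush(heap, (load + task_costs[i], c))
    makespan = max(load for load, _ in heap)
    return makespan, assignment

# 32 equal-cost tasks on the 16 clusters: two tasks per cluster.
makespan, assignment = lpt_schedule([1.0] * 32)
print(makespan)   # 2.0
```

LPT is fast but not optimal: for costs [5, 4, 3, 3, 3] on two clusters it yields a makespan of 10, while 9 is achievable.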
An Offline/Online Approach
Offline: guarantees based on worst-case execution times
Online: optimizations based on actual execution times
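The split can be illustrated on one core running a fixed task sequence per frame. Offline, the schedule is accepted only if the sum of worst-case execution times (WCETs) fits the frame deadline. Online, tasks usually finish early, and the accumulated slack can be spent on optional work without endangering that guarantee. The task times, the "optional job" policy and the function names below are hypothetical.

```python
def offline_feasible(wcets, deadline):
    """Offline guarantee: a fixed task sequence meets the deadline if sum(WCET) fits."""
    return sum(wcets) <= deadline

def online_run(wcets, actual, optional_wcet):
    """Online: admit optional jobs only from slack gained by early completions.

    Slack is the time saved so far relative to the WCET budget. Reserving an
    optional job's full WCET from that slack keeps the total within sum(wcets),
    so the offline guarantee is preserved.
    """
    slack = 0.0
    admitted = 0
    for w, a in zip(wcets, actual):
        slack += w - a
        if slack >= optional_wcet:
            slack -= optional_wcet
            admitted += 1
    return admitted, slack

wcets = [4.0, 4.0, 4.0]        # hypothetical per-frame task WCETs (ms)
actual = [2.0, 3.0, 4.0]       # hypothetical measured execution times (ms)
print(offline_feasible(wcets, deadline=12.0))          # True
print(online_run(wcets, actual, optional_wcet=2.0))    # (1, 1.0)
```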
Deterministic Memory Sharing
§ US application as a Kahn Process Network (KPN)
§ Transformation to a Deterministic Memory Sharing (DMS) KPN
§ Performance gain through:
§ Concurrency-safe memory sharing between processes
§ In-place modifications
§ Multiple readers
§ Memory recycling
Tretter et al., ESTIMedia 2014
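The memory-sharing idea can be illustrated with a pool of buffers whose references, rather than copies of the data, travel through the channels: a writer fills a buffer in place, several readers consume the same buffer, and a reference count returns it to the pool once the last reader is done. The Python sketch below uses hypothetical class and method names and only shows this bookkeeping; it is not the DMS transformation described in the cited paper.

```python
import threading

class BufferPool:
    """Fixed set of reusable buffers shared by writer and reader processes."""

    def __init__(self, n_buffers, size):
        self._free = [bytearray(size) for _ in range(n_buffers)]
        self._refs = {}                 # id(buffer) -> readers still to release
        self._lock = threading.Lock()   # makes acquire/release concurrency-safe

    def acquire(self, n_readers):
        """Writer side: take a free buffer that n_readers will consume."""
        with self._lock:
            buf = self._free.pop()      # IndexError if the pool is exhausted
            self._refs[id(buf)] = n_readers
            return buf

    def release(self, buf):
        """Reader side: drop one reference; recycle the buffer after the last one."""
        with self._lock:
            self._refs[id(buf)] -= 1
            if self._refs[id(buf)] == 0:
                del self._refs[id(buf)]
                self._free.append(buf)

    def free_count(self):
        with self._lock:
            return len(self._free)

pool = BufferPool(n_buffers=2, size=4)
token = pool.acquire(n_readers=2)
token[:] = b"\x01\x02\x03\x04"      # writer fills the buffer in place
total = sum(token) + sum(token)     # two readers use the same memory, no copy
pool.release(token)                 # first reader done
print(pool.free_count())            # 1: second reader still holds the buffer
pool.release(token)                 # last reader done: buffer is recycled
print(pool.free_count())            # 2
```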
Performance Results
§ Platform: Intel Xeon Phi 5110P
§ 60 cores @ 1053 MHz
§ Performs best with 120…240 threads
§ Ultrasound KPN:
§ ~200 threads
§ 3 implementations of data transfer: Classic, Windowed FIFOs, DMS
§ Tested different buffer sizes
§ Showing best results and performance/memory tradeoffs
Tretter et al., ESTIMedia 2014
[Figure: channel memory [MB] (0 to 200) and frame rate [s⁻¹] (log scale, 10⁰ to 10³) for Classic, DMS, Classic Tradeoff, DMS Tradeoff and Windowed]
Conclusions
§ Project goal: a trusted programmable hardware platform that can be connected to a 3D ultrasound probe and to a screen
§ With low-cost, low-power components
§ Practical implementation steps achieved:
§ Matlab prototyping environment
§ Development of blocks on Xilinx FPGA and study of a 28nm ASIC
§ Development of innovative image reconstruction algorithms
§ Conception of optimized and safe methods for software mapping onto Intel Xeon or Kalray multicores
Thanks for your attention!
§ Visit us at www.nano-tera.ch