
UltrasoundToGo
Nano-Tera annual meeting 2015
Stefanos Skalistis, RiSD, EPFL



Consortium

UltrasoundToGo partners:

• EPFL:
  • Integrated Systems Laboratory (LSI)
  • Rigorous System Design Laboratory (RiSD)
  • Signal Processing Laboratory (LTS5)
• ETHZ:
  • Integrated Systems Laboratory (IIS)
  • Computer Engineering and Networks Laboratory (TIK)
• CHUV:
  • Service de radiodiagnostic et radiologie interventionnelle


Ultrasound Imaging Benefits

• Ultrasound is a widespread diagnostic imaging technique
• Compared to X-ray/CT/MRI: non-invasive, harmless, cheaper, non-oppressive
• Multiple medical disciplines
• Can study static and dynamic body structures


…And Open Challenges

• Image quality is generally poor and hard to interpret
• Diagnosis depends heavily on the sonographer's ability to choose the right angles of view

→ 3D Ultrasound Imaging

• Much improved image quality for some subjects, new diagnostic capabilities
• However, bulky and expensive


Ultrasound Devices

• Portable machines: limited to 2D imaging
  GE Vivid, GE Vscan, Mobisante Mobius, …
• Hospital 3D devices: stationary, > $100k, hundreds of watts
  Siemens Acuson, Philips Epiq, Samsung WS80, …
• High-end research prototypes are also expensive
  SARUS (Jensen '13): 320 FPGAs
  Cephasonics Griffin: $1M and up


Objective and Challenges

Medical objective:
• Portable in-the-field device comparable to stationary systems
• Telemedicine: an untrained GP acquires the 3D image and a sonographer evaluates it remotely

Engineering challenges:
• 3D imaging has a very high computation cost
• Exploit parallel architectures for energy efficiency and lower cost
• Certification: provably correct mapping of software onto hardware


Platform in Development



[Diagram: platform layers and responsible labs]

• Imaging Algorithms: LTS5
• Software Techniques: RiSD, TIK
• Hardware Platform: LSI, IIS
  Blocks shown: RF pre-processing, beamforming, RF demod, baseband processing


Prototyping Environment

• HW development follows extensive numerical studies
• Matlab environment:
  • Prototyping of algorithms and imaging modes
  • Study of image quality on artificial numerical "phantoms"
  • Generation of input and output patterns for HW testing
• Multi-mode, configurable toolchain for 2D/3D imaging

Ibrahim et al., MCPS 15


FPGA Mapping of Beamforming

• Inner core of the 3D imaging algorithm: compute 238 MP/s from 160 GB/s of input data, while requiring 2.38 TD/s (tera delays per second)
• FPGA-friendly architecture, low cost and good accuracy
• Demonstrated single-FPGA capability of the block for 3D US: 3.3 tera delays/s at 200 MHz on Virtex 7

[Figure: delay-computation block diagram; labels: AXI bus, control, 18 kb BRAM, adder array, 8 corrections, 16 corrections, x_D, y_D, 20 cm, 2 cm]

Ibrahim et al., DATE 15
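The delay workload named on this slide can be illustrated with a minimal sketch. It assumes a point-like emission at the origin, a speed of sound of 1540 m/s, and a hypothetical 32 MHz sampling rate; none of these values, nor the function names, come from the slides, and this is not the FPGA architecture of Ibrahim et al. The last line only restates the slide's throughput figure as delays per clock cycle.

```python
import numpy as np

SPEED_OF_SOUND = 1540.0   # m/s, typical soft-tissue value (assumption)
SAMPLING_FREQ = 32e6      # Hz, hypothetical sampling rate (assumption)


def two_way_delays(focal_point, elements, origin=(0.0, 0.0, 0.0)):
    """Delay, in samples, from emission at `origin` to `focal_point`
    and back to each receive element (one row per element)."""
    focal_point = np.asarray(focal_point, dtype=float)
    elements = np.asarray(elements, dtype=float)
    tx = np.linalg.norm(focal_point - np.asarray(origin, dtype=float))
    rx = np.linalg.norm(elements - focal_point, axis=1)
    return (tx + rx) / SPEED_OF_SOUND * SAMPLING_FREQ


def delay_and_sum(rf, delays):
    """Sum one sample per channel, taken at the rounded delay index.
    `rf` has shape (channels, samples)."""
    idx = np.clip(np.rint(delays).astype(int), 0, rf.shape[1] - 1)
    return rf[np.arange(rf.shape[0]), idx].sum()


# Slide figures: 3.3 tera delays/s at a 200 MHz clock.
delays_per_cycle = 3.3e12 / 200e6
```

Every focal point needs one delay per receive element, which is why the delay rate (TD/s) is several orders of magnitude above the voxel rate (MP/s).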


Study of a 28nm Beamformer ASIC

• Is a single-chip implementation without external memory feasible?
• Designed a single-chip 3D beamformer in 28nm CMOS
• Scalable parallel architecture with optimized buffering

[Figure: chip block diagram; labels: Implementation Focus, One RX Channel, AFE, TDC, IQ, Parallel Bandpass Beamformer, GC compute, GC control, BFC Cluster, Image Post-processing]

• Processes data at the information rate by exploiting bandpass properties
• Fully on-chip delay computation from the underlying geometry to avoid off-chip delay storage


Chip Analysis Results

• Implemented 100-channel 2D/3D beamformer:
  • Area: 1.68 mm²
  • Power: 303 mW
• Cost extrapolation to 10k channels:
  • Area: 1.68 cm²
  • Power: 30.3 W

✓ Single-chip fully-digital 3D beamformer is feasible
  • Scalable
  • Software-configurable
  • Increased integration density (portable)

[Figure: Point Spread Function (golden reference vs. HW)]

Hager et al., BioCAS 2014
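The extrapolation above is consistent with scaling area and power linearly with the channel count. The sketch below reproduces that arithmetic; the linear-scaling model and the function name are assumptions made for illustration, since the slide does not state how the extrapolation was done.

```python
def extrapolate(area_mm2, power_mw, channels, target_channels):
    """Scale area and power linearly with channel count (assumption).
    Returns (area in cm^2, power in W)."""
    factor = target_channels / channels
    area_cm2 = area_mm2 * factor / 100.0   # 100 mm^2 per cm^2
    power_w = power_mw * factor / 1000.0   # 1000 mW per W
    return area_cm2, power_w


area_cm2, power_w = extrapolate(1.68, 303.0, 100, 10_000)
print(f"{area_cm2:.2f} cm^2, {power_w:.1f} W")  # prints "1.68 cm^2, 30.3 W"
```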


Novel Reconstruction Techniques

• Compressive sensing (CS) is a successful framework for accurately reconstructing signals from limited data
• Exploit a low-dimensional model of the ultrasound images in the reconstruction process
• Proposed method based on the following scheme:
  Plane wave emission → Receive and process echoes → Apply CS-based algorithm
• Apply CS-based techniques to increase the quality of ultrasound images while keeping a high frame rate

Carrillo et al., IUS 2015
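To make the idea of reconstructing a signal from limited data concrete, here is a generic sparse-recovery sketch using iterative soft-thresholding (ISTA). It is not the method of Carrillo et al.: the slides do not specify the algorithm or the image model, and the random measurement matrix, the sparsity level, and the parameter `lam` are assumptions chosen only for illustration.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def objective(A, y, x, lam):
    """0.5 * ||A x - y||^2 + lam * ||x||_1"""
    r = A @ x - y
    return 0.5 * float(r @ r) + lam * float(np.abs(x).sum())


def ista(A, y, lam=0.05, n_iter=200):
    """Minimize the objective above by iterative soft-thresholding."""
    # Step size 1/L, with L the squared largest singular value of A.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        x = soft_threshold(x - step * grad, step * lam)
    return x


# 40 measurements of a length-100 signal with 3 non-zero entries.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100)) / np.sqrt(40)
x_true = np.zeros(100)
x_true[[5, 37, 80]] = [1.0, -2.0, 1.5]
y = A @ x_true
x_hat = ista(A, y)
```

The iterative loop is where the higher reconstruction cost of CS-based methods comes from, compared with a single delay-and-sum pass.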


Contrast Improvements

• Contrast-to-noise ratio (CNR) comparison between the CS-based method and state-of-the-art methods
• With only 3 insonifications, CS yields better contrast than focused delay-and-sum (DAS) with 83 insonifications
• The price is a higher reconstruction computational cost

Carrillo et al., IUS 2015
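The slide does not state which CNR definition was used. As a hedged illustration, the sketch below implements one common form, the absolute difference of the mean intensities inside and outside a region divided by the square root of the summed variances. The function name and arguments are hypothetical.

```python
import numpy as np


def cnr(region_in, region_out):
    """One common contrast-to-noise ratio definition (assumption):
    |mean_in - mean_out| / sqrt(var_in + var_out)."""
    mu_in, mu_out = np.mean(region_in), np.mean(region_out)
    var_in, var_out = np.var(region_in), np.var(region_out)
    return abs(mu_in - mu_out) / np.sqrt(var_in + var_out)
```

Other definitions exist, for example ones expressed in decibels, so values are comparable only when computed with the same formula and the same regions.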


Preliminary Test on Real Images

• Real carotid image acquired using an Ultrasonix Sonix MDP device
• Acquisition of one plane wave and comparison with a state-of-the-art plane wave imaging method (Bernard et al., 2014)

[Figure: carotid images, Bernard method vs. CS-based method]


Deploying US Applications

Kalray MPPA-256 many-core architecture:
• 16 clusters of 16 cores each: 256 cores (400 MHz)
• Inter-cluster communication: 2D-torus NoC
• 2 MB shared memory per cluster

Trusted ultrasound platform:
• Real-time scheduling respecting time constraints
• Operating within power constraints
• Optimal use of resources:
  • Right degree of application-level parallelism
  • Balanced load and minimization of communication

Image credit: Marcio Castro et al.


An Offline/Online Approach

Offline: guarantees based on worst-case execution time
Online: optimizations based on actual execution times




Deterministic Memory Sharing

• US application as a Kahn Process Network (KPN)
• Transformation to a Deterministic Memory Sharing (DMS) KPN
• Performance gain through:
  • Concurrency-safe memory sharing between processes
  • In-place modifications
  • Multiple readers
  • Memory recycling

Tretter et al., ESTIMedia 2014
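For readers unfamiliar with KPNs, the following minimal sketch shows the classic model only: processes run concurrently and communicate solely through FIFO channels with blocking reads, which makes the output deterministic regardless of thread scheduling. It does not implement the DMS transformation, and the two processing stages, the `END` sentinel, and all names are hypothetical placeholders.

```python
import queue
import threading

END = object()  # sentinel marking the end of a stream (illustrative convention)


def source(out_ch, frames):
    """Process that writes every input frame to its output channel."""
    for frame in frames:
        out_ch.put(frame)
    out_ch.put(END)


def stage(in_ch, out_ch, fn):
    """Process that applies `fn` to each token read from `in_ch`."""
    while True:
        item = in_ch.get()  # blocking read: the defining KPN rule
        if item is END:
            out_ch.put(END)
            return
        out_ch.put(fn(item))


def run_network(frames):
    """Run a three-process pipeline and collect its output in order."""
    a, b, c = queue.Queue(), queue.Queue(), queue.Queue()  # unbounded FIFOs
    threads = [
        threading.Thread(target=source, args=(a, frames)),
        threading.Thread(target=stage, args=(a, b, lambda x: x * 2)),
        threading.Thread(target=stage, args=(b, c, lambda x: x + 1)),
    ]
    for t in threads:
        t.start()
    results = []
    while True:
        item = c.get()
        if item is END:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

In this classic form every token is copied into a channel. The gains listed above (in-place modification, multiple readers, memory recycling) come from avoiding such copies while keeping the same deterministic behavior.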


Performance Results

• Platform: Intel Xeon Phi 5110P
  • 60 cores @ 1053 MHz
  • Performs best with 120…240 threads
• Ultrasound KPN:
  • ~200 threads
  • 3 implementations of data transfer: Classic, Windowed FIFOs, DMS
• Tested different buffer sizes
• Showing best results and performance/memory tradeoffs

[Figure: frame rate [s⁻¹] (logarithmic axis) and channel memory [MB] for Classic, DMS, Classic Tradeoff, DMS Tradeoff, and Windowed]

Tretter et al., ESTIMedia 2014


Conclusions

• Project goal: a trusted, programmable hardware platform that can be connected to a 3D ultrasound probe and to a screen
• With low-cost, low-power components
• Practical implementation steps achieved:
  • Matlab prototyping environment
  • Development of blocks on Xilinx FPGA and study of a 28nm ASIC
  • Development of innovative image reconstruction algorithms
  • Conception of optimized and safe methods for mapping software onto Intel Xeon or Kalray multicores


Thanks for your attention!

Visit us at www.nano-tera.ch

