UltrasoundToGo
Nano-Tera annual meeting 2015
Stefanos Skalistis, RiSD, EPFL


Consortium

UltrasoundToGo partners:

• EPFL:
  - Integrated Systems Laboratory (LSI)
  - Rigorous System Design Laboratory (RiSD)
  - Signal Processing Laboratory (LTS5)
• ETHZ:
  - Integrated Systems Laboratory (IIS)
  - Computer Engineering and Networks Laboratory (TIK)
• CHUV:
  - Service de radiodiagnostic et radiologie interventionnelle (Department of Diagnostic and Interventional Radiology)


Ultrasound Imaging Benefits

• Ultrasound is a widespread diagnostic imaging technique
• Compared to X-ray/CT/MRI: non-invasive, harmless, cheaper, non-oppressive
• Used across multiple medical disciplines
• Can study both static and dynamic body structures


…And Open Challenges

• Image quality is generally poor and hard to interpret
• Diagnosis depends heavily on the sonographer's ability to choose the right angles of view
→ 3D Ultrasound Imaging
• Much improved image quality for some subjects, new diagnostic capabilities
• However, bulky and expensive


Ultrasound Devices

• Portable machines: limited to 2D imaging
  GE Vivid, GE Vscan, Mobisante Mobius, …
• Hospital 3D devices: stationary, > 100k$, hundreds of W
  Siemens Acuson, Philips Epiq, Samsung WS80, …
• High-end research prototypes: also expensive
  SARUS (Jensen '13): 320 FPGAs
  Cephasonics Griffin: 1 M$ and up


Objective and Challenges

Medical objective:

• Portable in-the-field device comparable to static systems
• Telemedicine: an untrained GP acquires the 3D image and a sonographer evaluates it remotely

Engineering challenges:

• 3D imaging has a very high computation cost
• Exploit parallel architectures for energy efficiency and lower cost
• Certification: provably correct mapping of software onto hardware


Platform in Development

• Imaging Algorithms: LTS5
• Software Techniques: RiSD, TIK
• Hardware Platform: LSI, IIS
  RF preprocessing, beamforming, RF demodulation, baseband processing


Prototyping Environment

• HW development follows extensive numerical studies
• Matlab platform:
  - Prototyping of algorithms and imaging modes
  - Study of image quality on artificial numerical "phantoms"
  - Co-development with FPGA offload
  - Generation of input and output patterns for HW testing
• Realized a powerful, multi-mode, configurable testing environment for 2D and 3D imaging

Ibrahim et al., MCPS 15


FPGA Mapping of Beamforming

• Fully-digital 3D beamforming is extremely challenging:
  - Compute 238 MIP/s from 160 GB/s of input data, which requires 2.38 TD/s (tera delays per second)
• Inner core of the imaging algorithm: compute the delay values for echo summation
• FPGA-friendly architecture, low cost and good accuracy
• Demonstrated single-FPGA capability of the block for 3D US: 3.3 tera delays/s at 200 MHz on a Virtex 7

[Figure: delay computation block with an 18 kb BRAM, control logic, adder stages applying 16 corrections and 8 corrections, and an AXI bus; depth labels 2 cm and 20 cm]

Ibrahim et al., DATE 15
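
The delay values at the core of the beamformer are round-trip times of flight from the emission origin to a focal point and back to each receive element. The following is a minimal sketch of exact delay computation and echo summation, not the FPGA architecture from the slide. The speed of sound, the sampling rate and all function names are assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 1540.0  # m/s, typical soft-tissue value (assumption)
SAMPLING_RATE = 32e6     # Hz, hypothetical ADC sampling rate


def delay_samples(point, element, origin=(0.0, 0.0, 0.0)):
    """Two-way delay in samples: origin -> focal point -> receive element."""
    transmit_path = math.dist(origin, point)
    receive_path = math.dist(point, element)
    return (transmit_path + receive_path) / SPEED_OF_SOUND * SAMPLING_RATE


def delay_and_sum(channels, elements, point):
    """Sum one echo sample per channel, picked at that element's delay."""
    total = 0.0
    for samples, element in zip(channels, elements):
        index = round(delay_samples(point, element))
        if 0 <= index < len(samples):  # ignore delays outside the recording
            total += samples[index]
    return total
```

One delay is needed per focal point and per element, which is why the delay rate grows with the product of the image-point rate and the channel count.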


Study of a 28nm Beamformer ASIC

• Is a single-chip implementation without external memory feasible?
• Designed a single-chip 3D beamformer in 28 nm CMOS

[Figure: implementation focus. RX channels (AFE, TDC, IQ), parallel bandpass beamformer, BFC cluster with GC compute and GC control, image post-processing]

• Scalable parallel architecture with optimized buffering
• Processes data at the information rate by exploiting bandpass properties
• Fully on-chip delay computation from the underlying geometry, avoiding off-chip delay storage


Chip Analysis Results

• Implemented a 100-channel 2D/3D beamformer
  - Area: 1.68 mm²
  - Power: 303 mW
• Cost extrapolation to 10k channels:
  - Area: 1.68 cm²
  - Power: 30.3 W
• A single-chip, fully-digital 3D beamformer is feasible:
  - Scalable
  - Software-configurable
  - Increased integration density (portable)

[Figure: point spread function, golden reference vs HW]

Hager et al., BioCAS 2014
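
The 10k-channel figures match what you get by scaling the 100-channel results linearly with channel count. A quick check of that arithmetic follows. Linear scaling is an assumption of this sketch, and the function name is illustrative.

```python
def extrapolate(area_mm2, power_mw, channels, target_channels):
    """Scale area and power linearly with channel count; return (cm^2, W)."""
    scale = target_channels / channels
    area_cm2 = area_mm2 * scale / 100.0   # 100 mm^2 per cm^2
    power_w = power_mw * scale / 1000.0   # 1000 mW per W
    return area_cm2, power_w


area_cm2, power_w = extrapolate(1.68, 303.0, channels=100, target_channels=10_000)
print(f"{area_cm2:.2f} cm^2, {power_w:.1f} W")  # 1.68 cm^2, 30.3 W
```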


Novel Reconstruction Techniques

• Compressive sensing (CS) is a successful framework for accurately reconstructing signals from limited data
• Exploit a low-dimensional model of the ultrasound images in the reconstruction process
• Proposed method based on the following scheme (Carrillo et al., IUS 2015):
  1. Plane wave emission
  2. Receive and process echoes
  3. Apply CS-based algorithm
• Apply CS-based techniques to increase the quality of ultrasound images while keeping a high frame rate
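
To give a feel for what a CS-based reconstruction step involves, here is a generic sparse-recovery sketch using iterative soft thresholding (ISTA). It is not the method of Carrillo et al. The measurement matrix, the regularisation weight `lam` and all function names are assumptions chosen for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink a value toward zero (proximal operator of the L1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def matvec(matrix, vector):
    """Multiply a matrix (sequence of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def objective(matrix, y, x, lam):
    """0.5 * ||Ax - y||^2 + lam * ||x||_1."""
    residual = [r - t for r, t in zip(matvec(matrix, x), y)]
    return 0.5 * sum(r * r for r in residual) + lam * sum(abs(v) for v in x)


def ista(matrix, y, lam, iterations=200):
    """Minimise the objective above by iterative soft thresholding."""
    # The squared Frobenius norm upper-bounds the gradient's Lipschitz
    # constant, so a step of 1 / lipschitz never increases the objective.
    lipschitz = sum(a * a for row in matrix for a in row)
    transposed = list(zip(*matrix))
    x = [0.0] * len(matrix[0])
    for _ in range(iterations):
        residual = [r - t for r, t in zip(matvec(matrix, x), y)]
        gradient = matvec(transposed, residual)
        x = [soft_threshold(xi - g / lipschitz, lam / lipschitz)
             for xi, g in zip(x, gradient)]
    return x


# Two measurements of a three-element signal (fewer measurements than unknowns).
A = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
y = [1.0, 1.0]
x_hat = ista(A, y, lam=0.1)
```

The iterative loop illustrates why such reconstructions cost more computation than a single delay-and-sum pass.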


Contrast Improvements

Carrillo et al., IUS 2015

• CNR comparison between the CS-based method and state-of-the-art methods using 83 insonifications
• With only 3 insonifications, CS yields better contrast
• The price is a higher computational cost for reconstruction


Preliminary Test on Real Images

• Real carotid acquired using an Ultrasonix Sonix MDP device
• Probe: Sonix L14-5W
  - 128 transducers
  - Emission at 12 MHz
• Acquisition of one plane wave and comparison with a state-of-the-art plane wave imaging method (Bernard et al., 2014)


Deploying US Applications

Kalray MPPA-256 many-core architecture:
• 16 clusters of 16 cores each: 256 cores (400 MHz)
• Inter-cluster communication: 2D-torus NoC
• 2 MB shared memory per cluster

Trusted Ultrasound platform:
• Real-time scheduling respecting time constraints
• Operating within power constraints
• Optimal use of resources:
  - Right degree of application-level parallelism
  - Balanced load and minimization of communication


A Hybrid Approach

• Offline: guarantees based on worst-case execution time (WCET)
• Online: run-time optimizations based on actual execution times (AET)

[Figure: design flow with blocks Application Partitioning; Mapping, Scheduling, Buffer Allocation; Unified System Model; Architecture Model; SMT; Application Placement; Power Management; Real-Time Scheduling. Example task graph: Tasks A, B and C on Clusters 1 and 2, edges annotated with ω(e) and b(e)]


Deterministic Memory Sharing

• US application as a Kahn Process Network (KPN)
• Transformation to a Deterministic Memory Sharing (DMS) KPN
• Performance gain through:
  - Concurrency-safe memory sharing between processes
  - In-place modifications
  - Multiple readers
  - Memory recycling

Tretter et al., ESTIMedia 2014
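
For readers unfamiliar with KPNs, the sketch below shows the basic model: processes that communicate only through FIFO channels with blocking reads, which makes the output independent of thread scheduling. It uses the classic copying style of channel, not the DMS transformation. The process names, the `END` sentinel and the two-stage pipeline are assumptions made for illustration.

```python
import queue
import threading

END = object()  # sentinel marking the end of a stream (illustrative convention)


def producer(out_channel, frames):
    """Source process: writes every frame to its output channel."""
    for frame in frames:
        out_channel.put(frame)
    out_channel.put(END)


def scale(in_channel, out_channel, gain):
    """Processing stage: blocking read, compute, write a new frame."""
    while True:
        frame = in_channel.get()  # blocks until a token is available
        if frame is END:
            out_channel.put(END)
            return
        out_channel.put([gain * value for value in frame])


def run_network(frames, gain):
    """Run producer -> scale and collect the output stream in order."""
    first, second = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=producer, args=(first, frames)),
        threading.Thread(target=scale, args=(first, second, gain)),
    ]
    for thread in threads:
        thread.start()
    results = []
    while True:
        frame = second.get()
        if frame is END:
            break
        results.append(frame)
    for thread in threads:
        thread.join()
    return results
```

Each stage here allocates a fresh output frame. Avoiding such copies through in-place modification and memory recycling is the kind of gain the slide attributes to DMS.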


Performance Results

• Platform: Intel Xeon Phi 5110P
  - 60 cores @ 1053 MHz
  - Performs best with 120…240 threads
• Ultrasound KPN:
  - ~ 200 threads
  - 3 implementations of data transfer: Classic, Windowed FIFOs, DMS
• Tested different buffer sizes
• Showing best results and performance/memory tradeoffs

[Figure: frame rate (axis 0 to 200) and channel memory in MB (logarithmic axis, 10^0 to 10^3) for Classic, Classic Tradeoff, DMS, DMS Tradeoff and Windowed]

Tretter et al., ESTIMedia 2014


Thanks for your attention!

Visit us at www.nano-tera.ch


Exploration Phase: Different Transmit Focus

96-element linear array (plane wave)

32-element phased array (converging wave)

32-element phased array (diverging wave)


Exploration Phase: Parameterize the Axial Resolution

Time complexity (slow to fast):

Axial resolution                  Reconstruction time
Full resolution (11688 points)    ~ an hour
500 points                        ~ 15 mins
50 points                         ~ 13 mins


Exploration Phase: Zone Imaging

• Zone sonography divides the ROI into zones
• Each zone is insonified and reconstructed separately
• The zones are then combined to form the whole ROI

[Figures: zone imaging insonifications (5 zones); brightness compensation profile]
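
The combination step can be sketched as stitching depth ranges and applying a per-depth gain. This is a minimal illustration, assuming that zones are contiguous depth ranges and that brightness compensation is a multiplicative gain per image row. Neither detail is stated on the slide, and the function name is hypothetical.

```python
def combine_zones(zone_images, zone_bounds, gain_profile):
    """Stitch per-zone reconstructions into one image, row by row.

    zone_images:  one full-size image (list of rows) per zone
    zone_bounds:  (start_row, stop_row) depth range owned by each zone
    gain_profile: brightness compensation gain for each row
    """
    combined = []
    for row in range(len(gain_profile)):
        for image, (start, stop) in zip(zone_images, zone_bounds):
            if start <= row < stop:
                # Take this row from the zone that was focused at this depth.
                combined.append([gain_profile[row] * v for v in image[row]])
                break
        else:
            raise ValueError(f"row {row} is not covered by any zone")
    return combined
```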


Exploration Phase: Zone Imaging

Single Converging focus without brightness compensation

Single Converging focus with brightness compensation

Zone imaging of five zones (converging focus) with brightness compensation

Zone imaging of ten zones (converging focus) with brightness compensation

Zone imaging of two zones (converging focus) with brightness compensation

Zone imaging of 50 zones (converging focus) with brightness compensation


Our Proposed Approach for Delay Calculation: Delay «Steering»

• The delay calculation is based on:
  - Pre-calculating a delay table for points R along the axis of azimuth = 0 and elevation = 0
  - Deriving the delay for any point S from the R at equal distance from O, then "steering" the delay

[Figure: probe elements D1, D2, D3, D4, …, D13, …, D16 around origin O, with on-axis point R and steered point S]


Delay «Steering»

• 1st order Taylor expansion
• Critical calculation: 2 ADDs

[Equation: 1st order Taylor expansion of the delay, with the precalculated term marked]
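
The idea can be illustrated with a far-field first-order expansion of the receive path. This is a sketch under assumptions, not necessarily the exact expansion used in the design. It assumes elements lie in the z = 0 plane, S is given as a distance r and a unit direction vector, and the reference path for the on-axis point R would come from the precalculated table. All names are illustrative.

```python
import math


def exact_receive_path(r, direction, element):
    """Exact distance from S = r * direction to an element in the z = 0 plane."""
    s = [r * u for u in direction]
    return math.dist(s, (element[0], element[1], 0.0))


def steered_receive_path(r, direction, element):
    """On-axis reference path plus two steering corrections."""
    # Path from R = (0, 0, r) to the element; table lookup in hardware.
    reference = math.hypot(r, element[0], element[1])
    # Each correction depends only on one element coordinate and one
    # direction cosine, so it can be tabulated ahead of time.
    correction_x = -element[0] * direction[0]
    correction_y = -element[1] * direction[1]
    return reference + correction_x + correction_y  # two additions at run time


# Far field, small steering angle: the approximation is very close.
far = (math.sin(0.1), 0.0, math.cos(0.1))
far_error = abs(exact_receive_path(0.2, far, (0.005, 0.003))
                - steered_receive_path(0.2, far, (0.005, 0.003)))

# Near field, broad angle: the approximation degrades.
near = (math.sin(0.6), 0.0, math.cos(0.6))
near_error = abs(exact_receive_path(0.02, near, (0.01, 0.0))
                 - steered_receive_path(0.02, near, (0.01, 0.0)))
```

Under these assumptions, `near_error` comes out larger than `far_error`: the approximation is best in the far field near the imaging axis and worst close to the probe at broad angles.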


Delay «Steering» Accuracy

• Expansion error bounded by the Lagrange bound
• Excellent in the far field and close to the imaging axis
• Inaccurate close to the probe, at broad angles
• Luckily, the worst geometric offenders are discarded due to limited element directivity

[Figure: delay error over depths from 2 cm to 20 cm; panel labels: max delay error = 209 samples, max delay error = 98 samples]


Fully-Digital vs Partial-Analog

• Analog pre-beamforming:
  [Figure: analog pre-beamforming stage followed by a parallel beamformer]
• 120 ASICs in the transducer head with active cooling

Pictures: Whitepaper 2008: 4Z1c Real-Time Volume Imaging Transducer - Siemens Medical Solutions USA, Inc.


CNR Computation

• Phantom:
  - 1 cm × 1 cm non-echogenic (black) occlusion placed at 30 mm depth in a medium with a high density of scatterers

[Figure: phantom geometry with dimension labels 2 cm, 1 cm, 2 cm and the occlusion marked]

• CNR computation:
  - B is the occlusion, I is the amplitude of the pixels
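
The slide's own CNR formula did not survive extraction. As an illustration only, the sketch below uses one commonly used definition: the absolute difference between the mean amplitudes inside the occlusion and in the surrounding background, divided by the square root of the summed variances, expressed in dB. This definition and the function name are assumptions and may differ from the one used in the study.

```python
import math
import statistics


def cnr_db(occlusion_pixels, background_pixels):
    """Contrast-to-noise ratio in dB between occlusion and background amplitudes."""
    mean_occlusion = statistics.fmean(occlusion_pixels)
    mean_background = statistics.fmean(background_pixels)
    var_occlusion = statistics.pvariance(occlusion_pixels)
    var_background = statistics.pvariance(background_pixels)
    contrast = abs(mean_occlusion - mean_background)
    noise = math.sqrt(var_occlusion + var_background)
    return 20.0 * math.log10(contrast / noise)
```

Here `occlusion_pixels` would hold the amplitudes I of the pixels inside B, and `background_pixels` the amplitudes of a region outside it.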


Contrast Improvement

[Figures: traditional beamforming method (DAS with 83 focused beams); compressed sensing based method with 3 plane waves]

Beamforming technique    Number of ultrasound beams    Contrast (dB)
Traditional method       83                            10.17
CS-based method          3                             10.65
Variation                -97%                          +5%

