IRISE Report: Improving Bridge Assessment


11. Appendix III: Methodology for subsurface defects detection

11.1 Xception

Convolutional neural networks (CNNs) have been widely used for vision-based tasks. Different networks may share similar sets of feature extraction layers, which are referred to as the backbone. Frequently used backbones include AlexNet [80] and VGG16/19 [81]. In this study, Xception [82] is selected as the backbone. Xception is a building block for deep networks developed by Google that focuses on the efficiency of convolutional neural networks by using depth-wise separable convolutions. In a depth-wise separable convolution, a convolution kernel is applied to each channel independently to extract spatial information and features. The operation consists of two steps: point-wise convolution and depth-wise convolution. As shown in Figure 10, 1x1 convolutions are first applied to the input to reduce its dimension, and then n x n convolutions are applied to each resulting channel separately to perform the depth-wise convolution. The extracted features are stacked and passed to the next layer.
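The two steps described above can be illustrated with a minimal pure-Python sketch. It is not the Xception implementation: the function names, the nested-list tensor layout `[channel][row][column]`, the absence of padding, and the stride of 1 are all assumptions made for illustration. As is usual in deep learning, "convolution" here means cross-correlation (the kernel is not flipped).

```python
def pointwise_conv(x, weights):
    """1x1 convolution: mix the input channels at every pixel.

    x: [C_in][H][W], weights: [C_out][C_in]  ->  [C_out][H][W]
    """
    height, width = len(x[0]), len(x[0][0])
    return [
        [[sum(wc[c] * x[c][i][j] for c in range(len(x))) for j in range(width)]
         for i in range(height)]
        for wc in weights
    ]


def depthwise_conv(x, kernels):
    """n x n convolution applied to each channel independently.

    x: [C][H][W], kernels: [C][n][n]  ->  [C][H-n+1][W-n+1]
    (no padding, stride 1: assumptions of this sketch)
    """
    out = []
    for channel, kernel in zip(x, kernels):
        n = len(kernel)
        rows = len(channel) - n + 1
        cols = len(channel[0]) - n + 1
        out.append([
            [sum(kernel[a][b] * channel[i + a][j + b]
                 for a in range(n) for b in range(n))
             for j in range(cols)]
            for i in range(rows)
        ])
    return out


def separable_conv(x, pointwise_weights, depthwise_kernels):
    """Point-wise step first, then depth-wise step, as in the text above."""
    reduced = pointwise_conv(x, pointwise_weights)
    return depthwise_conv(reduced, depthwise_kernels)


# Toy input: 2 channels of size 3x3, reduced to 1 channel by the 1x1 step.
x = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
]
pointwise_weights = [[1, 1]]            # one output channel: sum of both inputs
depthwise_kernels = [[[1, 1], [1, 1]]]  # one 2x2 kernel for that channel
result = separable_conv(x, pointwise_weights, depthwise_kernels)
```

In this toy case the 1x1 step reduces two channels to one before the spatial kernel is applied. The efficiency gain comes from splitting channel mixing and spatial filtering into two cheaper operations instead of one full convolution over all channels at once.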

Figure 10. Depth-wise separable convolution

11.2 DeepLabV3+

DeepLabV3+ is a semantic segmentation model developed by Google [83]. It uses an encoder-decoder architecture with atrous spatial pyramid pooling (ASPP), a spatial pyramid pooling scheme based on atrous convolution that encodes multi-scale contextual information. The top part of Figure 11 shows the atrous convolution process, which can be expressed as equation (13):

y[j] = \sum_{n} x[j + r \cdot n] \, w[n]    (13)

where j is the output location, n indexes the positions of the filter, w is the filter weight, and r is the atrous rate, which corresponds to the stride used to sample the input.
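Equation (13) can be checked with a minimal one-dimensional sketch in pure Python. The function name `atrous_conv1d` and the choice to compute only positions where the filter fits entirely inside the input (no padding) are assumptions of this example, not part of the report.

```python
def atrous_conv1d(x, w, rate):
    """Compute y[j] = sum_n x[j + rate * n] * w[n] for every valid j."""
    span = rate * (len(w) - 1)  # distance between first and last sampled input
    return [
        sum(x[j + rate * n] * w[n] for n in range(len(w)))
        for j in range(len(x) - span)
    ]


x = [1, 2, 3, 4, 5, 6, 7]
w = [1, 0, -1]

dense = atrous_conv1d(x, w, rate=1)    # standard convolution, r = 1
dilated = atrous_conv1d(x, w, rate=2)  # same 3 weights, wider field of view
```

With r = 1 the filter covers 3 consecutive inputs. With r = 2 the same three weights cover 5 inputs, which is how atrous convolution enlarges the field of view without adding parameters.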

