Dense Deconvolutional Network for Skin Lesion Segmentation


Abstract: Automatic delineation of skin lesion contours from dermoscopy images is a basic step in the diagnosis and treatment of skin lesions. However, it is a challenging task because skin lesions vary widely in appearance and size. To deal with these challenges, we propose a new dense deconvolutional network (DDN) for skin lesion segmentation based on residual learning. Specifically, the proposed network consists of dense deconvolutional layers (DDLs), chained residual pooling (CRP), and hierarchical supervision (HS). First, unlike traditional deconvolutional layers, DDLs are adopted to keep the dimensions of the input and output images unchanged. The DDN is trained in an end-to-end manner without the need for prior knowledge or complicated post-processing procedures. Second, the CRP aims to capture rich contextual background information and to fuse multi-level features. By combining local and global contextual information via multi-level feature fusion, a high-resolution prediction output is obtained. Third, HS is added to serve as an auxiliary loss and to refine the prediction mask. Extensive experiments on the public ISBI 2016 and 2017 skin lesion challenge datasets demonstrate that the proposed method produces superior segmentation results compared with state-of-the-art methods.


Existing system: Semantic segmentation is commonly used to assign a label to every pixel in the region of interest and to delineate the contour of skin lesions. Typically, it is regarded as a dense pixel classification task. However, semantic segmentation poses two competing challenges: classification and localization. To improve classification performance, the model needs to be insensitive to position information. Segmentation, on the contrary, needs to be extremely sensitive to position information. Since traditional classification methods mainly use hand-crafted features [30], they are unable to learn feature representations stably and fully. To resolve this contradiction in the image segmentation task, deep learning methods using CNNs have been proposed and have achieved remarkable performance. For instance, the classical deep neural networks (e.g., AlexNet, VGGNet, GoogLeNet, and ResNet) have been widely applied with impressive performance. However, there are two main problems in melanoma segmentation using deep CNNs. The first problem lies in the fully connected layers, which require a fixed image size. The second problem is the pooling operation and the stride in the convolution operation, which downscale the feature maps and lose detailed position information.

Proposed system: Our proposed DDN framework has the ability to learn rich hierarchical features by leveraging local and global contextual information. Extensive experiments on the public skin lesion challenge datasets demonstrate the effectiveness of our proposed method for skin lesion segmentation. The rest of this paper is organized as follows. The related work is presented in Section II. Section III introduces the proposed network architecture in detail. The experiments and comparison results are illustrated in Section IV. Our discussions are given in Section V. Finally, our conclusions are presented in Section VI.
Advantages: Our proposed DDN framework has the ability to learn rich hierarchical features by leveraging local and global contextual information. Extensive experiments on the public skin lesion challenge datasets demonstrate the effectiveness of our proposed method for skin lesion segmentation. The proposed DDL addresses the ‘checkerboard’ issue and ambiguous boundary shapes: by establishing a direct relationship among adjacent pixel values of a feature map, the DDL can recover detailed information.

Disadvantages: To resolve the contradiction in the image segmentation task, deep learning methods using CNNs have been proposed and have achieved remarkable performance. For instance, the classical deep neural networks (e.g., AlexNet, VGGNet, GoogLeNet, and ResNet) have been widely applied with impressive performance. However, there are two main problems in melanoma segmentation using deep CNNs. The first problem lies in the fully connected layers, which require a fixed image size. The second problem is the pooling operation and the stride in the convolution operation, which downscale the feature maps and lose detailed position information.

Modules:

Convolutional neural network: To address these challenges, deep learning models (e.g., convolutional neural networks (CNNs)) have been widely applied due to their impressive performance in semantic segmentation. However, serial combinations of strided convolution and pooling reduce the resolution of the final output. Hence, these methods may not be able to provide end-to-end training that keeps the dimensions of the input and the output the same. To achieve end-to-end training for segmentation, several studies developed up-sampling operations, dilated convolution, and post-processing procedures to resize the learnt feature maps to the dimensions of the output (i.e., the label map). The main purpose of up-sampling is to enlarge the resolution of the feature maps by expanding their width and height until they match the dimensions of the original input images. Interpolation and deconvolution (a.k.a. transposed convolution) are commonly used for the up-sampling operation.
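The resolution arithmetic behind this paragraph can be illustrated with a minimal sketch in plain Python. It uses the standard output-size formulas for convolution, pooling, and transposed convolution along one axis; the layer settings (a 7x7 stride-2 convolution, a 3x3 stride-2 pooling, a 4x4 stride-2 transposed convolution) and the function names are hypothetical choices for illustration and are not taken from the paper. The last helper counts how many kernel taps land on each output position of an unpadded 1-D transposed convolution, which is one common explanation of the ‘checkerboard’ issue; it does not reproduce how the DDL itself avoids it.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def transposed_conv_out(size, kernel, stride=1, padding=0):
    """Output length of a transposed convolution (no output padding, no dilation)."""
    return (size - 1) * stride - 2 * padding + kernel


def overlap_counts(size, kernel, stride):
    """Number of kernel taps that land on each output position of an
    unpadded 1-D transposed convolution."""
    counts = [0] * transposed_conv_out(size, kernel, stride)
    for i in range(size):          # each input position ...
        for t in range(kernel):    # ... spreads over `kernel` output positions
            counts[i * stride + t] += 1
    return counts


# Hypothetical encoder stem (assumed settings, not from the paper).
side = 224
after_conv = conv_out(side, kernel=7, stride=2, padding=3)         # 112
after_pool = conv_out(after_conv, kernel=3, stride=2, padding=1)   # 56
restored = transposed_conv_out(after_pool, kernel=4, stride=2, padding=1)  # 112
print(side, after_conv, after_pool, restored)

# Kernel size not divisible by the stride: uneven overlap between neighbours.
print(overlap_counts(4, kernel=3, stride=2))  # [1, 1, 2, 1, 2, 1, 2, 1, 1]
# Kernel size divisible by the stride: even overlap away from the borders.
print(overlap_counts(4, kernel=4, stride=2))  # [1, 1, 2, 2, 2, 2, 2, 2, 1, 1]
```

Each stride-2 layer halves the side length, and one stride-2 transposed convolution doubles it again, so as many up-sampling steps as down-sampling steps are needed to return to the input size.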
Fully convolutional networks (FCNs) have extended traditional CNNs and become one of the most representative models, using bilinear interpolation to restore the size of the original input image. However, when bilinear interpolation is used to expand image resolutions, it is unable to automatically learn image features by merging multiple input maps into a single output.

Dense deconvolutional network: An end-to-end framework called the dense deconvolutional network (DDN) is proposed in this paper for skin lesion segmentation. The DDN follows an encoding-decoding pipeline without prior knowledge of the input data or complicated post-processing procedures. In the encoding phase, we adopt ResNet to extract semantic information (i.e., either the skin lesion or the background). The decoding phase consists of three parts, i.e., a residual block for weight adjustment, a chained residual pooling (CRP) layer, and a dense deconvolutional layer (DDL). In addition, deep hierarchical supervision (HS) is added to further improve the performance. Overall, our main contribution is threefold: 1) Devise a CRP to expand the receptive field, capture global information, and enhance the robustness of the model; 2) Propose a DDL to solve the ‘checkerboard’ issue and ambiguous boundary shapes. By establishing a direct relationship among adjacent pixel values of a feature map, the DDL can recover detailed information.

CRP: The output feature maps of the residual convolution unit are fed into the CRP. The detailed architecture of the CRP is shown in Fig. 3 (b). There are three CRP blocks in total, and every CRP block consists of a convolution layer and a max-pooling layer. The input of each CRP block is the output of the previous CRP block. The convolution operation is employed to generate weighting parameters in the training process. Pooling is used to change the size of the feature maps and to boost performance by leveraging global contextual information. Here, we choose a kernel size of five instead of three for the pooling operation, since a larger kernel expands the receptive field and captures more global information, which makes the model more robust to translations.
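The chain of CRP blocks can be sketched in plain Python on a single-channel feature map. This is a minimal sketch under stated assumptions, not the authors' implementation: the convolution is replaced by a scalar weight (equivalent to a 1x1 convolution on one channel without bias), the block order is convolution followed by pooling, the pooling uses stride 1 with border clipping so the map size is preserved, and each block's pooled output is added to a running sum as in the skip connection described in the next paragraph. The names `max_pool_same` and `chained_residual_pooling` and the weight values are hypothetical.

```python
def relu(fmap):
    """Element-wise ReLU on a 2-D list."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool_same(fmap, k=5):
    """k x k max pooling with stride 1; the window is clipped at the borders,
    so the output has the same height and width as the input."""
    h, w = len(fmap), len(fmap[0])
    r = k // 2
    return [[max(fmap[i][j]
                 for i in range(max(0, y - r), min(h, y + r + 1))
                 for j in range(max(0, x - r), min(w, x + r + 1)))
             for x in range(w)]
            for y in range(h)]


def chained_residual_pooling(fmap, weights=(0.5, 0.5, 0.5), k=5):
    """Three chained blocks (one per weight): scale, pool, add to the sum."""
    out = relu(fmap)
    path = out
    for wgt in weights:
        # Stand-in for the learned convolution (assumption: scalar 1x1 kernel).
        path = [[wgt * v for v in row] for row in path]
        # 5x5 max pooling widens the region each pixel can see.
        path = max_pool_same(path, k)
        # Skip connection: sum the block output into the running result.
        out = [[a + b for a, b in zip(row_o, row_p)]
               for row_o, row_p in zip(out, path)]
    return out


# A 7x7 map with a single active pixel at the centre.
fmap = [[0.0] * 7 for _ in range(7)]
fmap[3][3] = 1.0
result = chained_residual_pooling(fmap)
print(result[3][3], result[1][1], result[0][0])  # 1.875 0.875 0.375
```

In this toy input the corner pixel receives nothing from the first block but a non-zero contribution from the second and third, which shows how chaining 5x5 pooling layers lets each later block draw on a wider neighbourhood.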
Furthermore, the CRP block adds the output feature map generated by the pooling layer to the input feature map through a skip connection. During backpropagation, the skip connection allows the gradient to be transferred to shallower layers more effectively. The non-linear ReLU operation is applied to increase pooling effectiveness as well.

Experiments on DDL:


To demonstrate the effectiveness of DDL in handling a variety of challenges in skin lesion segmentation, our results are compared with those of a conventional method that uses the general deconvolutional operation in the decoding module. The main difference between the methods with and without DDL is the connection pattern of the deconvolution operation in the decoding module. The method without DDL uses the same encoding architecture with the same multi-level and multi-scale feature maps. DDL adopts dense connections in the up-sampling operation. By using DDL, each layer of the network in our proposed method can combine the feature maps of all the previous layers. The intermediate feature maps are generated from the input feature maps obtained from the pre-trained ResNet model. The remaining ones are generated from the previous feature maps. Finally, all the feature maps are fused for the final feature representation. The comparative segmentation results are shown in Fig. 6. Compared with the results without DDL, the results with DDL are improved on every reported metric: DC, JA, AC, SE, and SP.
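The five metrics named above can be computed from a predicted mask and a ground-truth mask as in the following minimal sketch. It assumes the abbreviations stand for Dice coefficient (DC), Jaccard index (JA), accuracy (AC), sensitivity (SE), and specificity (SP), as is usual for the ISBI skin lesion challenges; the text itself does not expand them. The function name `segmentation_metrics` and the toy masks are hypothetical, and no numbers from the paper are reproduced.

```python
def segmentation_metrics(pred, truth):
    """Pixel-wise metrics for two binary masks given as 2-D lists of 0/1."""
    tp = fp = fn = tn = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            if p and t:
                tp += 1      # lesion pixel correctly predicted
            elif p and not t:
                fp += 1      # background predicted as lesion
            elif t:
                fn += 1      # lesion pixel missed
            else:
                tn += 1      # background correctly predicted

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "DC": ratio(2 * tp, 2 * tp + fp + fn),
        "JA": ratio(tp, tp + fp + fn),
        "AC": ratio(tp + tn, tp + tn + fp + fn),
        "SE": ratio(tp, tp + fn),
        "SP": ratio(tn, tn + fp),
    }


# Toy 2x3 masks: 2 true positives, 1 false positive, 1 false negative, 2 true negatives.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
print(segmentation_metrics(pred, truth))
```

For the toy masks the Jaccard index is 0.5 and the other four metrics are each 2/3. DC and JA measure overlap of the lesion region only, while AC and SP also count background pixels, so they can stay high even when a small lesion is segmented poorly.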

