Image Segmentation for Object Detection using Mask R-CNN in Colab

Page 1

GRD Journals- Global Research and Development Journal for Engineering | Volume 5 | Issue 4 | March 2020 ISSN- 2455-5703

Image Segmentation for Object Detection using Mask R-CNN in Colab Mr. V. Neethidevan Assistant Professor Department of MCA Mepco Schlenk Engineering College(Autonomous) Sivakasi

Dr. G. Chandrasekaran Director Department of MCA Mepco Schlenk Engineering College(Autonomous) Sivakasi

Abstract Image segmentation is a critical process in computer vision. It involves dividing a visual input into segments to simplify image analysis. Segments represent objects or parts of objects, and comprise sets of pixels, or “super-pixels. Recently, due to the success of deep learning models in a wide range of vision applications, there has been a substantial amount of works aimed at developing image segmentation approaches using deep learning models. Image segmentation with CNN involves feeding segments of an image as input to a convolutional neural network, which labels the pixels. Fully convolutional network, FCNs use convolutional layers to process varying input sizes and can work faster. The final output layer has a large receptive field and corresponds to the height and width of the image, while the number of channels corresponds to the number of classes. An architecture based on deep encoders and decoders, also known as semantic pixel-wise segmentation. It involves encoding the input image into low dimensions and then recovering it with orientation invariance capabilities in the decoder. Most notably is the R-CNN, or Region-Based Convolutional Neural Networks, and the most recent technique called Mask R-CNN that is capable of achieving state-of-the-art results on a range of object detection tasks. Keywords- Machine Learning, Deep Learning, Image Segmentation, Video Analytics

I. INTRODUCTION A. What is Image Segmentation? Image segmentation is a critical process in computer vision. It involves dividing a visual input into segments to simplify image analysis. Segments represent objects or parts of objects, and comprise sets of pixels, or “super-pixels”. Image segmentation sorts pixels into larger components, eliminating the need to consider individual pixels as units of observation. There are three levels of image analysis– Classification- categorizing the entire image into a class such as “people”, “animals”, “outdoors” – Object Detection- detecting objects within an image and drawing a rectangle around them, for example, a person or a sheep. – Segmentation- identifying parts of the image and understanding what object they belong to. Segmentation lays the basis for performing object detection and classification. B. Old-School Image Segmentation Methods There are additional image segmentation techniques that were commonly used in the past but are less efficient than their deep learning counterparts because they use rigid algorithms and require human intervention and expertise. These include– Thresholding- divides an image into a foreground and background. A specified threshold value separates pixels into one of two levels to isolate objects. Thresholding converts grayscale images into binary images or distinguishes the lighter and darker pixels of a color image. – K-means clustering- An algorithm identifies groups in the data, with the variable K representing the number of groups. The algorithm assigns each data point (or pixel) to one of the groups based on feature similarity. Rather than analyzing predefined groups, clustering works iteratively to organically form groups. – Histogram-based image segmentation- uses a histogram to group pixels based on “gray levels”. Simple images consist of an object and a background. The background is usually one gray level and is the larger entity. Thus, a large peak represents the background gray level in the histogram. A smaller peak represents the object, which is another gray level. – Edge detection- identifies sharp changes or discontinuities in brightness. Edge detection usually involves arranging points of discontinuity into curved line segments, or edges. For example, the border between a block of red and a block of blue.

II. LITERATURE SURVEY [1]. In the last few years researches across the globe applied deep learning concept in computer vision applications. The authors addressed the problems in using two Neural network architectures LeNet and Network in Network (NiN). Performance of the

All rights reserved by www.grdjournals.com

15


Turn static files into dynamic content formats.

Create a flipbook
Issuu converts static files into: digital portfolios, online yearbooks, online catalogs, digital photo albums and more. Sign up and create your flipbook.