3 Methodology
3.1 YOLO object detection
YOLO (You Only Look Once) is a real-time object detection system (Figure 5). YOLO applies a single neural network to the whole image and predicts rectangular bounding boxes, each with an objectness score obtained by logistic regression. For each bounding box, the network also predicts which of the classes it was trained on the box contains. The centre coordinates of a box are predicted using a sigmoid function, and the width and height are predicted as offsets from cluster centroids, which serve as prior box dimensions. If the score of a bounding box is below the threshold, the bounding box is not displayed [7].
Figure 5. Bounding box with location and dimension [7].
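The box prediction described above can be illustrated with a minimal sketch. It assumes the usual YOLOv3 parameterization, in which the raw outputs (tx, ty, tw, th) are decoded relative to the grid-cell offset (cx, cy) and the prior dimensions (pw, ph); the function names, the dictionary key `to` for the raw objectness output, and the default threshold of 0.5 are illustrative assumptions, not taken from the implementation used in this work.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Decode raw outputs into a box centre and size, in grid-cell units.

    (cx, cy) is the top-left offset of the grid cell and (pw, ph) are the
    prior width and height (a cluster centroid). All names are hypothetical.
    """
    bx = sigmoid(tx) + cx   # centre x stays inside the predicting cell
    by = sigmoid(ty) + cy   # centre y stays inside the predicting cell
    bw = pw * math.exp(tw)  # width scales the prior width
    bh = ph * math.exp(th)  # height scales the prior height
    return bx, by, bw, bh


def keep_boxes(boxes, threshold=0.5):
    """Keep only boxes whose objectness score reaches the threshold.

    Each box is a dict with a raw objectness output under the key "to"
    (an assumed layout for this sketch).
    """
    return [b for b in boxes if sigmoid(b["to"]) >= threshold]
```

For example, `decode_box(0, 0, 0, 0, 3, 4, 2.0, 1.5)` places the centre in the middle of grid cell (3, 4) and leaves the prior dimensions unchanged.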
Feature extraction is performed by the Darknet-53 neural network, which has 53 convolutional layers. The network uses successive 3x3 and 1x1 convolutional layers with shortcut connections (Figure 6) [7].
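As a rough check on the layer count, the following sketch tallies the weight layers of Darknet-53. It assumes the stage layout commonly reported for the network: one initial convolution, then five stages that each start with a downsampling convolution followed by 1, 2, 8, 8 and 4 residual blocks, with each block holding one 1x1 and one 3x3 convolution. It also assumes that the final connected layer is counted as the 53rd layer. The helper `residual_block` is a hypothetical, framework-free illustration of a shortcut connection, not the network's actual code.

```python
# Residual-block repeats per stage (assumed from the published layout).
STAGE_REPEATS = [1, 2, 8, 8, 4]


def count_weight_layers(stage_repeats):
    """Count convolutional layers plus the final connected layer."""
    layers = 1  # initial 3x3 convolution
    for repeats in stage_repeats:
        layers += 1            # 3x3 stride-2 downsampling convolution
        layers += 2 * repeats  # each residual block: 1x1 conv, then 3x3 conv
    return layers + 1          # final connected (classification) layer


def residual_block(x, transform):
    """Shortcut connection: add the block input to the transformed output."""
    fx = transform(x)
    return [a + b for a, b in zip(x, fx)]


total = count_weight_layers(STAGE_REPEATS)
```

Under these assumptions `total` equals 53, which matches the name of the network. The shortcut in `residual_block` passes the input through unchanged and adds it to the block output, which is the pattern shown in Figure 6.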