Algorithmic Design: Schnell ans Ziel


Chair of Architectural Informatics, Department of Architecture, Technical University of Munich


Prof. Dr.-Ing. Frank Petzold

Algorithmic Design

Ivan Bratoev, Frank Petzold

Liubov Kniazeva (03762097)
Vanessa Magloire (03764306)

Table of Contents

Topic
Storyboard
Concept Development, Phase 1
General Approach
Artificial Intelligence Research
Concept Development, Phase 2
Prototype
Comparison
Outlook
References
Contact

Topic

Since cyclists, skaters, and pedestrians have different requirements for the surface, this project uses AI to analyze different surface conditions of pedestrian paths and their influence on comfort and drivability in relation to the time needed to cover the distance. To achieve this, a smoothness index is defined, which assigns a value to each surface in a picture to describe how comfortable a path is.

The main goal is to create a tool that maps this information, and gives the different users the possibility to choose the most comfortable route, and make their individual experience of the city more enjoyable.

This project consists of: Storyboard, Concept, Concept Development, General Approach, Artificial Intelligence Research, Prototype (Data Structure RTK, Neural Network Architecture, Model Training, Gathering new Data, Predictions, Smoothness Index, Coordinates, Mapping, Final Route), Comparison, Outlook
Isn't there a way to find a smoother route?!
Why don't you just take the bike?
Because I really like to use the skateboard!

Storyboard

In big cities like Munich, the surfaces on one path can differ greatly.

Distance: 2.5km Smooth: 25% Time: 15min

Distance: 3.0km Smooth: 55% Time: 12min

With AI it is possible to map the smoothness index of different surfaces and paths, which can be criteria for choosing one route over another.

Distance: 3.5km Smooth: 80% Time: 10min

Oh no, I am really late! What do I do, isn't there a smoother path where it is easier to ride on?

I remember hearing about a new app that shows me the roughness of different routes!

The path consists of different surfaces and pavement qualities. Depending on the means of transport, the requirements for the quality of the surface may differ.

How surprising, even though the green path is 1 km longer I am still faster! And I can enjoy this route much more than the shorter paths...

Depending on the means of transport, AI estimates how comfortable a path is and how much time can be saved when the path is very smooth.

A skateboarder may choose a different path than a pedestrian or cyclist.

With AI we can make better choices.

Concept Development

To develop our tool, the first step would be to create our own dataset, analyze the images and translate the results into a map. A possible further step would be to differentiate the street from the bike lane and the sidewalks, since the bike path provides a safer way for faster movement. The datasets would consist of labeled images of very rough surfaces, such as cobblestone, and very smooth surfaces, such as asphalt.

Prepare Data: Create own data; extract images from Google Street View
Combine Info: Consider bike paths as the priority way over sidewalks
Mapping: Translate the resulting information onto a map

Considering the analyzed information, roads with different smoothness indexes could be found. The index has a direct influence on the estimated required time of travel, which helps the user to decide on the most suitable path based on the means of transport.

To make our tool accessible, the next step would be to create an interface for the application or a plug-in for an existing application like Google Maps. Finally, we would need to test the application and ensure its quality by gathering feedback from the users.

Finding Routes: Find different routes and estimate their quality to make individual choices
Create an Interface: Create an interface for the phone-based application
Evaluation: Test the application and improve it

General Approach

The most challenging part of our project is surface quality classification. After researching the developments in this field, we found a set of papers that worked on surface quality classification [1] and road segmentation [2][3]. The main technology used is the Convolutional Neural Network (CNN).

It is a class of deep neural networks that extracts features from images given as input to perform specific tasks such as image classification, face recognition, and semantic image segmentation. A CNN has one or more convolution layers (Fig. 1) for simple feature extraction, which execute the convolution operation (i.e. multiplication of a set of weights with the input) while retaining the critical features (spatial and temporal information) without human supervision. [4]

Figure 1. Convolutional operation [4]

The three primary layers that define the structure of a Convolutional Neural Network are:

Convolutional Layer performs feature extraction by sliding the filter over the input image (Fig. 2). For every filter position, the output is the sum of the element-wise products of the filter weights and the image patch underneath;

Pooling Layer reduces the number of trainable parameters by decreasing the spatial size of the image, thereby reducing the computational cost;

Fully Connected Layer determines the output, converting output from the pooling layer into a one-dimensional vector. [4]

Figure 2. Convolutional operation [4]
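
The three layer types described above can be illustrated with a minimal pure-Python sketch. It is not the network used in this project: the 5x5 image, the 2x2 filter and the function names are made up for illustration, and real frameworks implement the same operations far more efficiently.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            # Sum of element-wise products of the filter and the image patch.
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling: each side shrinks by the factor `size`."""
    out = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            row.append(max(feature_map[i + di][j + dj]
                           for di in range(size) for dj in range(size)))
        out.append(row)
    return out


# Made-up 5x5 image with a vertical edge and a 2x2 edge-detecting filter.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1],
          [-1, 1]]

features = conv2d(image, kernel)            # 4x4 feature map
pooled = max_pool2d(features)               # 2x2 map after pooling
flat = [v for row in pooled for v in row]   # 1D vector for a fully connected layer
```

The feature map responds only at the column where the pixel values change, and pooling keeps that response while quartering the number of values.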


AI Research

For the implementation of our idea we found several datasets that could be used for training the Neural Network. All of them contain images of roads.

The KITTI dataset [5] and the CaRINA dataset [6] both rely on high-quality images captured with higher-cost cameras and on active remote sensing.

Compared to the other two, the RTK dataset [7] contains a variety of surface types, including damaged and even unpaved roads, together with a classification of the surfaces (such as asphalt, paved and unpaved) that we could use for surface classification and quality estimation.

Initially the project considered classification of the pedestrian path, where it made sense to integrate the KITTI and CaRINA datasets. After further research, semantic segmentation was found to be more effective. That is why the RTK dataset was used, and the other two datasets became redundant, as they are mainly used for classification and simple segmentation.

The RTK dataset (Fig. 3) contains images captured with a low-cost camera, showing different types of surfaces and qualities (asphalt, paved, unpaved roads) and classes (potholes, water puddles, cracks, ...) [7]

Figure 3. RTK Dataset

The CaRINA dataset (Fig. 4) is a road detection benchmark consisting of GPS-annotated RADAR, LIDAR and camera information for several data sequences. It also proposes a novel evaluation metric based on the intersection of polygons. [6]

KITTI (Fig.5) is a road and lane estimation benchmark that consists of 289 training and 290 test images. It contains three different categories of road scenes: urban unmarked, urban marked and urban multiple marked lanes. [5]

Figure 4. CaRINA

Figure 5. KITTI

Concept Development

After researching different papers, the RTK paper proved to be the most applicable and consistent with the concept idea. To further sharpen the steps of the overall concept, the initial idea was developed further, and possible tools and libraries that could be used for each step were defined.

The concept is divided into two main steps:

1. In the first step, image segmentation is performed to identify the different types of surfaces.

2. In the second step, the quality of the surface is defined.

To estimate the performance of the NN, we compared its results with the personal experience and perception of a person.

Step 1:
Input Images: select an appropriate dataset or combine a few of them (RTK, KITTI, CaRINA, etc.)
Data Augmentation: horizontal rotations and perspective distortion (fastai library)
Convolutional layers: semantic segmentation of the road / sidewalks (dynamic U-Net)

Step 2:
Data Augmentation: illumination adaptation, i.e. adjust gamma and increase brightness (Python functions)
Convolutional layers: define the quality of the surface (asphalt, paved), with ReLU as an activation function
Performance Interpretation: estimate the accuracy of the NN and improve it through a sensing experiment and a comparison with the perception of a person (seaborn library)
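
The illumination adaptation mentioned for the second step (adjusting gamma and increasing brightness) could be sketched as follows. This is only an illustration on a tiny grayscale image stored as nested lists. The function names and the convention that a gamma below 1 brightens the image are assumptions, not the functions used in the project.

```python
def adjust_gamma(pixels, gamma):
    """Gamma-correct 8-bit values; with this convention gamma < 1 brightens."""
    return [[round(255 * (v / 255) ** gamma) for v in row] for row in pixels]


def increase_brightness(pixels, offset):
    """Add a constant offset and clip the result to the 8-bit range 0..255."""
    return [[min(255, max(0, v + offset)) for v in row] for row in pixels]


# Made-up 1x3 grayscale image: black, dark grey, white.
dark_image = [[0, 64, 255]]

brighter_midtones = adjust_gamma(dark_image, 0.5)          # lifts dark tones
shifted = increase_brightness([[0, 64, 250]], 20)          # uniform shift
```

Gamma correction lifts dark regions such as shadows while leaving black and white unchanged, whereas the brightness offset shifts all values equally and clips at 255.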

Prototype

Data Structure RTK

For this project, the RTK (Road Traversing Knowledge) Dataset was used.

The Dataset (Fig. 6) consists of a compilation of folders with:

original frames (Fig. 6a) contains the raw images in a low resolution;

labels (Fig. 6b) contains the masks that will be used to train and validate. These images have 8-bit pixels after a colormap removal process and appear black because the difference in grayscale is not visible to the naked eye;

colorLabels (Fig. 6c) contains the original colored masks, which will be used later for the visual comparison;

codes.txt (Fig. 6d) contains a list with all the class names;

valid.txt (Fig. 6e) contains a list of image names randomly selected for validation. [7]

Figure 6. Sample images from RTK: a) original frames, b) labels, c) colorLabels, d) codes.txt, e) valid.txt

d) codes.txt: background, roadAsphalt, roadPaved, roadUnpaved, roadMarking, speedBump, catsEye, stormDrain, manholeCover, patchs, waterPuddle, pothole, cracks

e) valid.txt: 000000001.png, 000000004.png, 000000010.png, 000000012.png, 000000019.png, 000000025.png, 000000029.png, 000000038.png, 000000045.png, 000000050.png, 000000052.png, 000000060.png, 000000072.png, ...
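
How the two text files of the dataset can be used is sketched below. The class names follow codes.txt as shown in Figure 6. The list of frame names is a made-up stand-in for a folder listing, and the file contents are given as strings so that the sketch runs without the dataset.

```python
def load_lines(text):
    """Return the non-empty, stripped lines of a text file's content."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# Content of codes.txt: one class name per line, the line number is the class id.
codes_txt = """background
roadAsphalt
roadPaved
roadUnpaved
roadMarking
speedBump
catsEye
stormDrain
manholeCover
patchs
waterPuddle
pothole
cracks
"""

# First entries of valid.txt (shortened for this sketch).
valid_txt = "000000001.png\n000000004.png\n"

# Hypothetical listing of the folder with the original frames.
all_frames = [f"{i:09d}.png" for i in range(6)]

codes = load_lines(codes_txt)
name_to_id = {name: class_id for class_id, name in enumerate(codes)}

valid_names = set(load_lines(valid_txt))
train_frames = [f for f in all_frames if f not in valid_names]
valid_frames = [f for f in all_frames if f in valid_names]
```

The pixel values in the label masks are these class ids, which is why the masks look almost black: the ids only range from 0 to 12.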

Neural Network Architecture

The architecture of the CNN (Fig. 7) is based on the study "Road surface detection and differentiation considering surface damages" [2] and is implemented as a dynamic U-Net with ResNet34 as the encoder part for feature extraction. ResNet34 is a residual CNN model with skip connections that help to preserve important features of the early layers. [2] As input, images with a resolution of 288x352 are provided (RTK dataset).

ResNet34 blocks consist of the following layers:

Conv2d applies a 2D convolution over an input signal composed of several input planes.

BatchNorm2d performs the normalization for each training mini-batch.

MaxPool2d applies a 2D max pooling over an input signal composed of several input planes.

ReLU applies the rectified linear unit function element-wise. [8]

In the decoder part, the following layers are used:

PixelShuffle avoids checkerboard artifacts when upsampling images;

MergeLayer merges a shortcut with the result of the module by adding or concatenating them. [9]

The whole U-Net is generated automatically by the fastai library using a pre-trained ResNet34 as the base model.

Figure 7. U-Net architecture with ResNet as encoder. Based on [9],[10],[11]
Input image: 288x352. Downsampling path (blocks of ResNet34): resolutions 144x176, 72x88, 36x44, 18x22, 9x11; features 64, 128, 256, 512, 1024. Upsampling path: up-sampling (Pixel Shuffle) back to the 288x352 output image. Legend: ResNet34 blocks, convolutional blocks, max pool 2x2, concatenate, skip connections. ResNet blocks contain shortcut connections skipping one or more layers.
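
The resolutions listed in Figure 7 follow from halving the input size at every encoder stage, and the pixel shuffle operation reverses such a halving in the decoder. The following pure-Python sketch illustrates both ideas under the assumption of five halving stages. It does not reproduce the fastai model, and the function names are hypothetical.

```python
def encoder_resolutions(height, width, stages=5):
    """Halve the resolution once per encoder stage."""
    sizes = []
    for _ in range(stages):
        height, width = height // 2, width // 2
        sizes.append((height, width))
    return sizes


def pixel_shuffle(channels, r):
    """Rearrange C*r*r maps of size HxW into C maps of size (H*r)x(W*r)."""
    c_out = len(channels) // (r * r)
    h, w = len(channels[0]), len(channels[0][0])
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = channels[c * r * r + i * r + j]
                for y in range(h):
                    for x in range(w):
                        # Each source channel fills one position of every r x r block.
                        out[c][y * r + i][x * r + j] = src[y][x]
    return out


resolutions = encoder_resolutions(288, 352)

# Four 1x1 feature maps become a single 2x2 map (upscale factor 2).
upsampled = pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], 2)
```

Because pixel shuffle only rearranges values that the preceding convolution has already computed, it avoids the uneven overlap that causes checkerboard artifacts in transposed convolutions.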

Model Training

To start training the prepared Neural Network (NN) some important parameters need to be defined:

batch size defines how many images are sent to the NN at a time during training. It depends on the available GPU RAM. In our project we used the Google Colaboratory platform, where the batch size is 4 or 8 images when the available memory is 8 or 16 GB, respectively;

weight decay is a small regularization factor that penalizes large weights and helps to prevent overfitting. The default value of weight decay in fastai is 0.01 [12], which is applied in this project;

learning rate is a tuning parameter of the optimization algorithm that determines the step size at each iteration while moving toward the minimum of the loss function. The lower the value, the slower we travel along the downward slope. As Figure 8 shows, it is important to select an appropriate learning rate to get better NN performance in a shorter time. A dedicated function in the fastai library was used to find the most appropriate value for our NN automatically.

By recording the loss at each iteration while the learning rate is increased, and plotting it against the learning rate on a logarithmic scale (Fig. 9), point A, located slightly to the left of the minimum of the displayed curve, was selected as an appropriate learning rate value. [13]

Figure 8. Effect of various learning rates on convergence [13]

Figure 9. Learning rate/loss graph with the selected point A
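
The effect of the learning rate can be reproduced on a toy problem. The sketch below minimizes f(x) = x² with plain gradient descent for three made-up learning rates and then picks a learning rate to the left of the minimum of a recorded loss curve. Dividing by a factor of 10 is an assumed heuristic for "slightly to the left", not the exact rule applied by the fastai function.

```python
def gradient_descent(lr, steps=20, x0=1.0):
    """Minimise f(x) = x**2 (gradient 2x) and return the final loss."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x
    return x * x


def pick_learning_rate(lrs, losses, factor=10.0):
    """Take the learning rate with the lowest loss and step back by `factor`."""
    best = min(range(len(losses)), key=losses.__getitem__)
    return lrs[best] / factor


# Too small, suitable and too large learning rate (illustrative values).
final_loss = {lr: gradient_descent(lr) for lr in (0.001, 0.1, 1.1)}

# Made-up recorded curve: loss falls, reaches a minimum, then explodes.
tested_lrs = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
recorded_losses = [2.0, 1.5, 0.9, 0.6, 3.0]
chosen_lr = pick_learning_rate(tested_lrs, recorded_losses)
```

With the small rate the loss barely moves within 20 steps, with the suitable rate it drops close to zero, and with the large rate every step overshoots the minimum and the loss grows.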

According to T. Rateke and A. von Wangenheim [2], the training process is divided into two steps (Fig. 10) using a transfer learning technique. In the first step, the NN is trained without class weights. The result is then used as the basis for a second CNN model in which a weight is defined for each class. The weights are based on the ratio of each class's pixels to the total number of pixels in the ground truth (Tab. 1).

In the confusion matrix (Fig. 11) we can see that after the first training step the NN is good at recognizing large surface classes such as asphalt, paved and unpaved, but is less precise for small classes. After the second step (Fig. 12) the overall performance is much better.

Class GT ratio Weight used

Background 65.86% 1.0

Asphalt 12.90% 5.0

Paved 10.50% 6.0

Unpaved 9.22% 7.0

Marking 0.78% 75.0

Speed-Bump 0.06% 1000.0

Cats-Eye 0.02% 3100.0

Storm-Drain 0.02% 3300.0

Patches 0.22% 270.0

Water-Puddle 0.03% 2200.0

Pothole 0.06% 1000.0

Cracks 0.33% 180.0

Table 1. Ratio of each class’s pixels and the weights used in the loss function [2]
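
The role of the class weights can be illustrated with a per-pixel weighted negative log-likelihood, using the values from Table 1. This is a simplified sketch, not the loss implementation of [2] or fastai. The inverse-frequency comparison at the end is only an observation about the order of magnitude of the weights, not a statement about how they were derived.

```python
import math

# GT ratio in percent and weight per class, copied from Table 1.
gt_ratio = {"Background": 65.86, "Asphalt": 12.90, "Paved": 10.50,
            "Unpaved": 9.22, "Marking": 0.78, "Speed-Bump": 0.06,
            "Cats-Eye": 0.02, "Storm-Drain": 0.02, "Patches": 0.22,
            "Water-Puddle": 0.03, "Pothole": 0.06, "Cracks": 0.33}
weights = {"Background": 1.0, "Asphalt": 5.0, "Paved": 6.0,
           "Unpaved": 7.0, "Marking": 75.0, "Speed-Bump": 1000.0,
           "Cats-Eye": 3100.0, "Storm-Drain": 3300.0, "Patches": 270.0,
           "Water-Puddle": 2200.0, "Pothole": 1000.0, "Cracks": 180.0}


def weighted_nll(probs, target, class_weights):
    """Loss of one pixel: -w[target] * log(p[target])."""
    return -class_weights[target] * math.log(probs[target])


# The same uncertain prediction costs far more on a rare class.
probs = {"Background": 0.5, "Pothole": 0.5}
loss_background = weighted_nll(probs, "Background", weights)
loss_pothole = weighted_nll(probs, "Pothole", weights)

# Background ratio divided by class ratio, for comparison with Table 1.
inverse_frequency = {c: gt_ratio["Background"] / r for c, r in gt_ratio.items()}
```

A misclassified pothole pixel is penalized 1000 times more than a background pixel, which pushes the second training step to pay attention to the small classes.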


Figure 10. Training process, divided into 2 steps [2]

Figure 11. Training results. Step 1: no weights

Figure 12. Training results. Step 2: with weights


Gathering new data

To test the Neural Network on our use case, new data was gathered on a pedestrian path close to the city center of Munich. The initial idea was to take a video, but a video carries no metadata for each frame, which is why images were taken instead. Dataset 1 for the path (Fig. 13) consists of approximately 100 images that predominantly show paved surfaces; examples are shown in Figure 14. The images were taken from a bike in good weather and lighting conditions, except for shadows in several sequences. In general, the GPS coordinates of the images are accurate, even if some sequences are missing, which might correlate with the velocity and the number of images taken.

Figure 13. First path in the City of Munich, ca. 0.9 km long

Figure 14. Dataset 1 for the first path: a) IMG_7418 Asphalt road, b) IMG_7420 Cobblestone road, c) IMG_7446 Paved road, d) IMG_7496 Paved road

Predictions

After creating dataset 1 we proceeded with feeding it into the NN.

The first attempt to run the code with dataset 1 (Fig. 14) was not as smooth as expected. Some errors were caused by the format of the images. Converting the HEIC format to JPEG resulted in strange predictions from the NN. This might be because JPEG and PNG use different compression processes: JPEG compression is lossy, so the images lose information and have a lower quality than PNG images. To avoid these errors, it is crucial to pay attention to the image format.

Luckily even after the conversion process, the metadata remains, which will be needed for the next step to map the data.

To show how the NN predicts different surface conditions on our path we chose three images and three masks as an example (Fig.15).

Figure 15a shows an asphalted section of the road, Figure 15b a cobblestone road and Figure 15c a tile-paved road.

Comparing the masks (Fig. 15d-f) with the images (Fig. 15a-c), we can say that the prediction is in general accurate, meaning that the colors of the legend match the corresponding surfaces. The only issue is the prediction of cobblestone, which the NN classified as paved with many cracks, as seen in Figure 15e. In the next step this results in an incorrect smoothness index.

Figure 15. Dataset 1. Images: a) IMG_7418 Asphalt road, b) IMG_7420 Cobblestone road, c) IMG_7446 Tiled road. Predicted masks: d) IMG_7418 Asphalt road, e) IMG_7420 Cobblestone road, f) IMG_7446 Tiled road. Legend: Asphalt, Paved, Unpaved, Marking, Speed Bump, Cats Eye, Storm Drain, Patch, Water Puddle, Pothole, Cracks

Smoothness Index

Depending on the means of transport the surface condition has a relevant influence on the comfort and drivability of the route. For this reason, we have created a formula to calculate this smoothness index for each image. The idea is to include the calculation of travel time to make more precise estimations.

The calculation of the smoothness index uses two lists: list 1 (classes 0-3) and list 2 (classes 4-12), see Table 2. A coefficient is set for each class. In list 2, only classes 5, 7, 9, 10, 11 and 12 are counted, as they noticeably reduce drivability and comfort, unlike classes 4, 6 and 8. Next, the coefficient of each class is multiplied by the share of the image's pixels that belongs to that class. Finally, the sum of list 2 is subtracted from the sum of list 1 to get the final value.

list 1 = 62.51%

list 2 = 0.01%

smoothness index = list 1 - list 2 = 62.50%

Class  No.  Coeff.  Surface share  Coeff. x share

List 1:
background  0  0  -  -
roadAsphalt  1  100  0.32  32.25
roadPaved  2  50  0.61  30.26
roadUnpaved  3  5  0.00  0.00

List 2:
roadMarking  4  0  0.04  0.00
speedBump  5  1  0.00  0.00
catsEye  6  0  0.02  0.00
stormDrain  7  1  0.00  0.00
manholeCover  8  0  0.00  0.00
patchs  9  1  0.00  0.00
waterPuddle  10  1  0.00  0.00
pothole  11  1  0.00  0.00
cracks  12  1  0.01  0.01

Table 2. Smoothness index calculation for IMG_7446

Figure 16. IMG_7446 asphalt road. Legend: Asphalt, Paved, Unpaved, Marking, Speed Bump, Cats Eye, Storm Drain, Patch, Water Puddle, Pothole, Cracks
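
The calculation for IMG_7446 can be retraced with a short sketch. The coefficients are those of Table 2. The unrounded surface shares of asphalt and paved (0.3225 and 0.6052) are back-calculated from the products in the table and are therefore an assumption, and the helper names are hypothetical.

```python
from collections import Counter

CLASS_NAMES = ["background", "roadAsphalt", "roadPaved", "roadUnpaved",
               "roadMarking", "speedBump", "catsEye", "stormDrain",
               "manholeCover", "patchs", "waterPuddle", "pothole", "cracks"]

# Coefficients per class: list 1 adds comfort, list 2 is subtracted.
COEFF_LIST_1 = {"background": 0, "roadAsphalt": 100,
                "roadPaved": 50, "roadUnpaved": 5}
COEFF_LIST_2 = {"roadMarking": 0, "speedBump": 1, "catsEye": 0,
                "stormDrain": 1, "manholeCover": 0, "patchs": 1,
                "waterPuddle": 1, "pothole": 1, "cracks": 1}


def class_shares(mask):
    """Share of pixels per class name in a 2D mask of class ids."""
    counts = Counter(v for row in mask for v in row)
    total = sum(counts.values())
    return {CLASS_NAMES[i]: n / total for i, n in counts.items()}


def smoothness_index(shares):
    """Sum of list 1 minus sum of list 2, each weighted by its coefficient."""
    list_1 = sum(c * shares.get(name, 0.0) for name, c in COEFF_LIST_1.items())
    list_2 = sum(c * shares.get(name, 0.0) for name, c in COEFF_LIST_2.items())
    return list_1 - list_2


# Shares for IMG_7446 (asphalt and paved back-calculated, see lead-in).
shares_img_7446 = {"roadAsphalt": 0.3225, "roadPaved": 0.6052,
                   "roadMarking": 0.04, "catsEye": 0.02, "cracks": 0.01}
index_img_7446 = smoothness_index(shares_img_7446)

# Tiny made-up mask to show how shares are obtained from a prediction.
toy_shares = class_shares([[1, 1], [2, 12]])
```

Rounded to two decimals, the sketch reproduces the 62.50% reported for IMG_7446.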

Coordinates

To get the coordinates needed for mapping, we used the metadata of the images (Fig. 17, 18). An important step was to convert the coordinates from the degrees, minutes, seconds (DMS) format to decimal degrees. Since the visualization library seaborn that we used for the mapping requires coordinates in another coordinate system (EPSG:3857), a second conversion was needed (Table 3).

Figure 17. IMG_7946

Figure 18. GPS Meta-data of the image (excerpt)

Axis  Ref  DMS (deg, min, sec)  Decimal Degrees (EPSG:4326)  Map Coordinates (EPSG:3857)
lat  N  (48.0, 8.0, 46.01)  48.14611388888889  y = 6131197.49467983
lon  E  (11.0, 34.0, 32.48)  11.57568888888889  x = 1288599.792692466

Table 3. Coordinates conversion example (step 1: DMS to decimal degrees, step 2: decimal degrees to map coordinates)
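
Both conversions can be written with the standard library alone. The sketch below uses the spherical Web Mercator formulas that define EPSG:3857 and the example values from Table 3. The function names are hypothetical, and the project itself may have used a library for this step.

```python
import math

EARTH_RADIUS = 6378137.0  # WGS 84 semi-major axis in metres, used by EPSG:3857


def dms_to_decimal(dms, ref):
    """Convert (degrees, minutes, seconds) plus N/S/E/W reference to decimal degrees."""
    degrees, minutes, seconds = dms
    value = degrees + minutes / 60 + seconds / 3600
    return -value if ref in ("S", "W") else value


def to_web_mercator(lat, lon):
    """Project decimal degrees (EPSG:4326) to metres (EPSG:3857)."""
    x = EARTH_RADIUS * math.radians(lon)
    y = EARTH_RADIUS * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


lat = dms_to_decimal((48.0, 8.0, 46.01), "N")
lon = dms_to_decimal((11.0, 34.0, 32.48), "E")
x, y = to_web_mercator(lat, lon)
```

Note that x is derived from the longitude and y from the latitude, so the order of the values changes between the two coordinate systems.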

Mapping

In the process of image analysis, a JSON file with the data necessary for mapping was created. It contains the image names, the coordinates, and the smoothness index. This made it easier for us to work on the visualization without running the whole script again, and it also allows the data to be exported to an application like Kepler.gl [14] for creating interactive maps.

Another important aspect of the visualization (Fig. 19) was to fix the color range so that the same values are shown in the same colors on different routes.
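
A minimal sketch of such a JSON file and of a fixed color range is given below. The field names, the image names and all values are made up for illustration, and the assumed fixed range of 0 to 100 is a choice of this sketch rather than the setting used for Figure 19.

```python
import json

# Hypothetical records: one entry per analysed image.
records = [
    {"image": "IMG_0001.jpg", "lat": 48.1461, "lon": 11.5757, "smoothness": 62.5},
    {"image": "IMG_0002.jpg", "lat": 48.1463, "lon": 11.5760, "smoothness": 35.0},
]

# Serialise once after the analysis, reload later for the visualisation.
text = json.dumps(records, indent=2)
loaded = json.loads(text)


def normalise(value, vmin=0.0, vmax=100.0):
    """Map a value to 0..1 using a fixed range shared by all routes."""
    clipped = min(max(value, vmin), vmax)
    return (clipped - vmin) / (vmax - vmin)


colour_values = [normalise(r["smoothness"]) for r in loaded]
```

Because the range does not depend on the data of a single route, a value of 62.5 is drawn in the same color on every map.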

Figure 19. First Route Smoothness


Final Route

To improve on dataset 1, we chose a second path (Fig. 20) that shows a greater diversity of road surfaces and allows a denser image sequence. The images on the second path were taken on foot, which had a positive impact on the accuracy of the GPS coordinates.

For the second path, dataset 2 with 300 images and dataset 3 with 500 images were created. Compared to dataset 1, the weather and lighting conditions in dataset 2 were not as good. Most of the sequences had sharp shadows, resulting in low accuracy and false predictions. For dataset 3 the images were retaken in better lighting conditions. This improved the prediction, as seen in Figure 21.

Figure 20. Second path in the City of Munich, ca. 2.5km long

Figure 21. Dataset prediction comparison: a-1) to d-1) Dataset 2, a-2) to d-2) Dataset 3

Comparison

To compare the subjective view of a person with the prediction of the NN, we conducted a self-experiment with a skateboard on the final path and marked the experience on a map (Fig. 23b).

We were interested in how much a skateboarder would agree with the prediction of the NN (Fig. 23a) and realized that we would need to change the smoothness coefficients depending on the means of transport. Even if cyclists find it uncomfortable, they can still ride over cobblestones, whereas skateboarders consider it an obstacle where they have to step off and walk.

By comparing the individual experience with the NN prediction we discovered that the predictions in some sequences are incorrect. In Figure 22 we can see that the NN could not recognize the gravel surface and instead classified it as paved with a little bit of asphalt. In Figure 23b the two red parts indicate unpaved surfaces, but the NN classified them as good.

Figure 22. IMG_8071: a) original frame, b) predicted segmentation

a) NN prediction map with the smoothness index

b) Mapped perception of the surface quality from the view of a skater

Figure 23. Prediction Maps (points A and B are marked on both maps; color scale of the smoothness index: 0.1, 0.4, 0.7, 1.0)

Outlook

Since the accuracy of the NN is not yet good enough, a first step could be to add a new class for cobblestone, since this pavement has a very different smoothness index. New types of unpaved surfaces could also be added to the training dataset so that they are recognized better.

The next step would be to not only classify the surface type but also estimate its quality (this step was already proposed in the concept development, but not implemented). For this, the NN described by T. Rateke [1] could be used. Based on it, one more coefficient could be included in the smoothness index calculation.

Since the accuracy of the NN is better for the nearest part of the image, it could be useful to define a region of interest in the image (Fig. 24) and take only the lower part into consideration. The images would then have to be collected more densely along the route so that the whole surface on the way is still analyzed.

Figure 24. IMG_7419 Asphalt road
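
Restricting the analysis to the lower part of an image could be as simple as the following sketch. The function name and the fraction of the image that is kept are assumptions, since no concrete region of interest has been defined yet.

```python
def lower_region(mask, keep_fraction=0.5):
    """Keep only the bottom `keep_fraction` of the rows of a 2D mask."""
    first_row = int(len(mask) * (1 - keep_fraction))
    return mask[first_row:]


# Made-up 4x3 mask of class ids: background (0) on top, asphalt (1) and paved (2) below.
mask = [
    [0, 0, 0],
    [0, 0, 0],
    [1, 1, 2],
    [1, 1, 1],
]
region_of_interest = lower_region(mask)
```

The smoothness index would then be calculated on the returned rows only, so that distant and less reliably predicted areas do not influence the value.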

In this project only a short path was analyzed and the datasets were created manually. To create an extensive smoothness map, it would be possible to extract images from Google Street View automatically. However, the data may be out of date, and the images will mainly show roads, although the sidewalks are of greater importance.

As a further step, the process of finding alternative routes could be implemented. For each path, the smoothness index could be calculated as a single parameter describing the surface quality of the whole path. To provide the user with a better overview, an approximate time for each route could be estimated from the average speed of the chosen means of transportation and the smoothness index of each segment of the route.
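
One possible way to turn these ideas into numbers is sketched below. The length-weighted average and, in particular, the speed model (the speed drops linearly to a fraction `min_factor` of the average speed on the roughest surface) are assumptions made purely for illustration. No such model has been defined or validated in this project, and the smoothness index is assumed to be scaled to the range 0 to 1.

```python
def route_smoothness(segments):
    """Length-weighted mean smoothness of a route.

    `segments` is a list of (length_km, smoothness) pairs, smoothness in 0..1.
    """
    total_length = sum(length for length, _ in segments)
    return sum(length * index for length, index in segments) / total_length


def route_time_minutes(segments, base_speed_kmh, min_factor=0.3):
    """Estimated travel time with a hypothetical linear speed model."""
    hours = 0.0
    for length_km, index in segments:
        # Full speed on a perfectly smooth segment, min_factor * speed on the roughest.
        speed = base_speed_kmh * (min_factor + (1 - min_factor) * index)
        hours += length_km / speed
    return 60 * hours


# Made-up route: 1 km perfectly smooth, 1 km very rough, average speed 20 km/h.
route = [(1.0, 1.0), (1.0, 0.0)]
overall_index = route_smoothness(route)
minutes = route_time_minutes(route, base_speed_kmh=20)
```

In this example the rough kilometre takes 10 minutes and the smooth one only 3, which shows how a longer but smoother route can still be the faster choice.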

Since we want to provide different options for different means of transportation (skateboard, bike, etc.), specific smoothness coefficients could be defined for each surface type and each means of transport. Individual smoothness maps could then be created for each means of transport.

To find the fastest and most comfortable route, further aspects such as already existing bike paths (provided in [15]) and the amount of space (narrow or wide) could be considered.

Finally, creating a good interface for web and phone usage can make the analysis results accessible and usable.


References

[1] T. Rateke, K. A. Justen and A. von Wangenheim. Road Surface Classification with Images Captured From Low-cost Cameras - Road Traversing Knowledge (RTK) Dataset, (2019), Revista de Informática Teórica e Aplicada (RITA). Access: www.researchgate.net

[2] T. Rateke, A. von Wangenheim. Road surface detection and differentiation considering surface damages, (2020). Access: link.springer.com

[3] JunHyeok, Road-Segmentation, (2021). Access: github.com

[4] Analytics Vidhya Web-Site, Convolutional Neural Network (CNN), (2022). Access: www.analyticsvidhya.com

[5] A. Geiger, P. Lenz, C. Stiller, R. Urtasun. Vision meets robotics: The KITTI dataset. International Journal of Robotics Research (IJRR), (2013). Access: journals.sagepub.com

[6] P. Y. Shinzato, T. C. dos Santos, L. A. Rosero, D. A. Ridel, C. M. Massera, F. Alencar, M. P. Batista, A. Y. Hata, F. S. Osório and D. F. Wolf. CaRINA dataset: An emerging-country urban scenario benchmark for road detection systems (2016). Access: www.researchgate.net


[7] T. Rateke. Road Traversing Knowledge (RTK) Dataset, (2016). Access: lapix.ufsc.br

[8] PyTorch Documentation, (2022). Access: pytorch.org

[9] Fastai Library Documentation, (2022). Access: docs.fast.ai

[10] N. Singla. UNet with ResBlock for Semantic Segmentation, (2019). Access: medium.com

[11] Mapping Landslides on EO Data: Performance of Deep Learning Models vs. Traditional Machine Learning Models, (2020). Access: www.researchgate.net

[12] D. Vasani. This thing called Weight Decay, (2019). Access: towardsdatascience.com

[13] H. Zulkifli. Understanding Learning Rates and How It Improves Performance in Deep Learning, (2018). Access: towardsdatascience.com

[14] Munich Bike City Map. Access: geoportal.muenchen.de

Contact

Liubov Kniazeva, 03762097, 2. Semester Master
Vanessa Magloire, 03764306, Exchange Semester Master
