DEEP FLOAT 1.0 User Guide
An Easy Deep Learning Tool in Your Internet Browser
Corentin Macqueron, Sébastien Fragnaud
May 2017
1. What’s Deep Float?

Deep Float [1] is a simple and user-friendly web application that brings you the power of state-of-the-art Deep Learning tools and techniques, allowing you to perform non-linear and multivariable regression modeling within your internet browser: copy-and-paste your data and you are just a few clicks away from a highly trained and predictive artificial neural network. Deep Float, named after Deep Learning and the fact that it manipulates numbers often referred to as floats in programming languages, runs Python code from Keras [2], TensorFlow [3] and Scikit-Learn [4] on a remote server using the Heroku [5] cloud application platform. Deep Float uses the multilayer perceptron feedforward neural network approach to make predictions in physics, biology, climatology, economics or any other system where complex causes and effects may hide under the ‘curse of dimensionality’.

Please be aware that in its 1.0 version, Deep Float is still a prototype. You may encounter some bugs.

2. Why Deep Float?

Deep Float was developed following the observation that, despite the numerous Machine Learning tools available, to our knowledge no existing tool can perform Deep Learning with a simple copy-and-paste and a few clicks, and, in our opinion, being able to perform such a task was a legitimate need. Deep Float is hence not designed for specialists. It is designed for the non-specialist who is considering Deep Learning as an option for their problem, but who does not have the mathematical, computer or programming skills¹ to use the existing tools and who does not have the time or the motivation to go down that road.

3. What’s Deep Learning?

Deep Learning is a sub-branch of Machine Learning, an artificial intelligence technique dedicated to learning complex behaviours in order to make predictions [6][7]. It can be applied to virtually any topic, such as economics, physics, social sciences, epidemiology, climatology, and so on. It aims at ‘understanding’ a phenomenon in order to be able to ‘reproduce’ it accurately.
¹ Or just the money, even if many Machine Learning tools are free.
For instance, you may want to be able to predict tomorrow’s weather. To do so, you can rely on ‘traditional’ techniques used in meteorology, or you can use Deep Learning. This would mean passing to an artificial neural network the recorded weather as a function of multiple parameters such as temperature, pressure or humidity (the same parameters used in ‘traditional’ weather science). The network would then learn how the weather is linked to these parameters and, once trained, the network would be able to make predictions (weather forecasts) in a very different way than the ‘traditional’ techniques do: the network would learn the connections between inputs and outputs solely from observable evidence, without any consideration for the underlying physics, while the ‘traditional’ techniques would do precisely (and only) that (they would, for instance, run massive calculations based on fluid dynamics equations). An artificial neural network can theoretically learn any relationship between inputs and outputs, whatever the difficulties (a tremendous number of variables and/or insanely complex relationships between the variables). You ‘just’ need to have enough training examples for the network, both qualitatively and quantitatively.

4. How does Deep Learning work?

Given the astonishing capabilities of Deep Learning [6][7], one may think of it as something very complicated, but an artificial neural network is actually quite simple: it is merely a series of matrix products with a nonlinear transformation in between each product. The matrices are called the ‘layers’ of the network and the coefficients of the matrices are called the ‘neurons’ of the layers. Each matrix product creates an output which is a combination of its inputs as a function of the matrix coefficients. Each output is passed on to the next matrix product as new inputs, creating more and more ‘richness’ in the input combinations as the number of matrix products increases. This process can reproduce virtually any relationship between the variables, provided there are enough coefficients and matrices and that the relationships are linear (because the matrix product is a linear operation). The nonlinear transformation of the output between each matrix product is called the ‘activation function’ and shatters the fundamental linear nature and limitation of the matrix product. The activation function makes the artificial neural network capable of reproducing virtually anything, whatever the number of variables and the difficulty of the nonlinearities. Deep Float uses the ReLU (Rectified Linear Unit) activation function. Despite its name, it can produce nonlinearities.
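To make the ‘series of matrix products with an activation in between’ picture concrete, here is a minimal NumPy sketch of a forward pass through a small network. The sizes, random weights and function names are purely illustrative assumptions and are not Deep Float’s internal code:

```python
import numpy as np

def relu(z):
    # ReLU activation: negative values are set to zero, positive values pass through
    return np.maximum(z, 0.0)

np.random.seed(0)
# Two hidden layers of 25 'neurons' (matrix coefficients) and one output
W1, b1 = np.random.randn(1, 25), np.zeros(25)    # 1 input  -> 25 neurons
W2, b2 = np.random.randn(25, 25), np.zeros(25)   # 25       -> 25 neurons
W3, b3 = np.random.randn(25, 1), np.zeros(1)     # 25       -> 1 output

def forward(x):
    # Each step is a matrix product followed by the activation;
    # the last step is left linear so the output is unconstrained
    h1 = relu(x @ W1 + b1)
    h2 = relu(h1 @ W2 + b2)
    return h2 @ W3 + b3

x = np.array([[0.6]])      # one input case (already scaled between 0 and 1)
print(forward(x))          # an (untrained) prediction
```

Training, described next, is the process of adjusting the coefficients W1, W2, W3 (and the biases) so that this forward pass reproduces the desired outputs.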
Of course, the coefficients of each matrix have to be determined, and this is where the mathematics comes in. Put simply, all the coefficients are initialized and the network is run: the matrix products with the nonlinear transformations are calculated. The output is compared to the desired results of the training examples. The error between the prediction and the desired result is ‘back-propagated’ into the network in order to improve it by adjusting the values of the coefficients. This operation is a complex and computationally expensive task and depends on the selected algorithm. We will not go deeper into the mathematical details, which can be found in references such as [8] (Deep Float uses the ADAM algorithm [9]). One cycle of this prediction-correction scheme is called a ‘learning cycle’. A network is ‘trained’ through multiple learning cycles in order to minimize the error between the prediction and the desired result. After training completion, the network is ready to make new predictions based on new inputs. These new predictions can be obtained almost instantaneously or, at the very least, much quicker than with ‘traditional’ techniques. The training, on the other hand, can be long.

5. Getting Started

As an artificial neural network links inputs and outputs, you need to feed it with data. Deep Float needs numbers. Not pictures. Not PDFs. Not calendars or cucumbers. Just numbers. You may work on something else by converting it to numbers. To give your data to Deep Float, just copy-and-paste it. The data must be given in the TSV format (Tab-Separated Values). A program like Microsoft Excel produces TSV data, so a simple copy-and-paste from your spreadsheet will be fine. The decimal separator must be ‘.’ and not ‘,’.

Let’s say that you want to teach Deep Float the following nonlinear function:

f(x) = x²
This is obviously a dummy example, but it covers everything you need to know to go on with your own data: instead of trying to link f to x, you could, for instance, try to predict tomorrow’s temperature from today’s temperature and humidity (this would probably not work, but you could try). Open your Excel file and create one column for x over an arbitrary domain, from 0 to 9 for instance:

x
0
1
2
3
4
5
6
7
8
9
This table is your ‘input data’. It has 10 lines, which is your number of ‘cases’, and 1 column, which is your number of input variables. You could have many more lines and columns. For instance, if you had 227 cases and 7 input variables you would have 227 lines and 7 columns. Now proceed to create the f function:

f
0
1
4
9
16
25
36
49
64
81
This table is your ‘output data’. It obviously has 10 lines as well. This is mandatory: your output data must have the same number of lines as your input data. You could have many more lines and columns. For instance, if you had 132 cases and 3 output variables you would have 132 lines and 3 columns.
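If you prefer to build these two tables with code rather than in a spreadsheet, here is a small, purely illustrative Python sketch (assuming NumPy and pandas are installed) that prints the same columns as tab-separated values, ready to be copy-and-pasted:

```python
import numpy as np
import pandas as pd

x = np.arange(10)     # 10 cases, 1 input variable
f = x ** 2            # 10 cases, 1 output variable (f(x) = x^2)

# Tab-separated values, numbers only (no 'x' or 'f' header), as Deep Float expects
print(pd.DataFrame(x).to_csv(sep="\t", index=False, header=False))  # input data
print(pd.DataFrame(f).to_csv(sep="\t", index=False, header=False))  # output data
```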
Something you do not see here is that Deep Float internally scales your data between 0 and 1 in order to improve numerical robustness.

Now open Deep Float and go to ‘create a model’. In the Data section, copy-and-paste the first table into the ‘input data’ field and the second table into the ‘output data’ field (Figure 1). Just copy the numbers. Leave the labels (‘x’, ‘f’) out.
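As an aside on the internal 0–1 scaling mentioned above, here is a rough sketch of what min–max scaling does, using scikit-learn’s MinMaxScaler as an assumed stand-in; Deep Float handles this for you, and its actual implementation may differ:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

f = (np.arange(10) ** 2).reshape(-1, 1)        # the output column of the example

scaler = MinMaxScaler()                        # maps the minimum to 0 and the maximum to 1
f_scaled = scaler.fit_transform(f)             # values now lie between 0 and 1
f_back = scaler.inverse_transform(f_scaled)    # predictions are mapped back to the original scale
print(f_scaled.ravel())
```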
Figure 1 – Inputs and outputs
Deep Float should tell you that everything is fine by recognizing that you have 10 cases, 1 input and 1 output. The next section is Parameters, where you set up your neural network (Figure 2).
Figure 2 – Parameters setup
You specify the number of layers, the number of neurons per layer, the number of learning cycles and the percentage of data kept to validate the model. If you decide to keep 20% for instance, Deep Float will randomly remove 20% of the data (which would be 2 cases here). These data will not be ‘seen’ by the network, so it will not be trained on them. This is very important because a neural network can sometimes learn its lesson so well that it only knows it ‘by heart’ and becomes incapable of making new relevant predictions. It would be like learning a Spanish dictionary by heart and pretending that you have learnt Spanish, which would obviously be silly: it is better to have less vocabulary but to be able to make real, meaningful sentences in a conversation with a real Spanish speaker. When a neural network does that silly thing, it is called ‘overfitting’: you trained your network too much. Randomly removing some of the data so that the network does not ‘see’ it during the learning phase, and then asking the network to make predictions about these data (for which only you know the ‘answer’), is a way to check the true capability of your network to make relevant predictions instead of ‘silly’ ones.

When your network is set up you can begin the learning process by hitting the RUN button. You will see in (almost) real time the error made by your network (the Mean Squared Error, to be precise). It should decrease and stabilize at a low value. What ‘low’ and ‘stabilize’ can possibly mean is of course open to interpretation, like the Bible or the Qur’an. Sometimes the error does not decrease and just oscillates. This can happen for numerical reasons and in this case it may be wise to just reload the page.

When the learning cycles are over, you will see a graph that represents what the network has learnt versus what it has been taught, and what the network predicts versus the correct answers on the unseen data you removed at the beginning (the ‘blind test’). For obvious reasons², you expect all the dots to be aligned with the ‘identity line’ (y = x). If the dots from the learning data are aligned with the identity line but the dots from the ‘blind test’ are not, it could mean that your network is overfitting.

² If this is not obvious to you, you may have nothing to do here. All jokes aside: if the network is supposed to find y = 36 when x = 6 and it finds 35.89, this is a very good prediction, and a plot of 35.89 versus 36 will be very close to the identity line. A very bad prediction, like 9.2 instead of 36, will be very far from the identity line. This graph hence gives a very good idea of the network quality.
In this case, you may have to go back to a simpler and lighter network structure (fewer layers, fewer neurons per layer). You need your network to learn the meaning of your data, not the irrelevant details. For instance, if you have some kind of pressure measurements, you want your network to learn the main physical tendencies of your pressure signal, not the small chaotic oscillations that are probably just noisy garbage. Your network does not know what is relevant and what is not; only you do.

Blind data not aligned with the identity line can certainly mean overfitting, but this is not always the case. Suppose you do not have much data and the relationship to learn is complex. In this case your network may simply still be fragile in its learning and you may just be asking too much of it with too little data. Or you could have some abnormal cases that do not fit well within the rest of the data. Or, without being abnormal, these cases may be ‘special’ (located in an area of steep gradient not well learnt by the network).

If you are not satisfied with the results, you can modify the setup of your network and hit RUN AGAIN until you are. The same inputs and setup can lead to different results due to the randomness of the matrix coefficient initialization, the numerical procedures and the choice of the blind test data. There are no crystal-clear rules to follow for the numbers of layers and neurons. Teaching a network is a hit-or-miss, die-and-retry process. Good practice is to start small with few layers and few neurons, run, assess, and progressively go wider (by adding neurons per layer) and deeper (by adding layers). The number of learning cycles has to be set so that the error stabilizes. If the results are not satisfying and the error is not stabilized at the end of the learning cycles, do not change the network; just add cycles. Sometimes a wider network will be better than a deeper one; anything is possible: this is the fuzzy art and beauty of Deep Learning.

You can see the results on the example with 4 layers, 25 neurons per layer, 1000 learning cycles and 20% of the data kept for validation in Figure 3. The error (‘loss’) diminishes and stabilizes. ‘Learnt’ data (red dots) are well aligned with the identity line, whereas the blind data (‘check’, purple dots) are not so well aligned with it. It is now your job to conclude.
Figure 3 – Error (‘loss’) and results on learning and validation data
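For the curious, here is a rough, hypothetical Keras sketch of a configuration like the one in Figure 3 (4 hidden layers of 25 ReLU neurons, ADAM, Mean Squared Error, 1000 learning cycles, 20% of the data held out). It is not Deep Float’s actual code, and note that Keras’s validation_split takes the last fraction of the data rather than a random one:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Toy data scaled to [0, 1], as Deep Float does internally
x = np.arange(10, dtype=float).reshape(-1, 1) / 9.0
y = (np.arange(10, dtype=float) ** 2).reshape(-1, 1) / 81.0

model = Sequential()
model.add(Dense(25, activation="relu", input_shape=(1,)))  # hidden layer 1
for _ in range(3):
    model.add(Dense(25, activation="relu"))                # hidden layers 2 to 4
model.add(Dense(1))                                        # linear output layer

model.compile(optimizer="adam", loss="mse")                # ADAM + Mean Squared Error
history = model.fit(x, y, epochs=1000, validation_split=0.2, verbose=0)

print("final loss:", history.history["loss"][-1])
print("final validation loss:", history.history["val_loss"][-1])
```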
When you are satisfied with the results, you can download the network in the *.df format for later use and/or proceed by hitting GO TO PREDICTION. Just type (or copy-and-paste) your inputs and Deep Float will give you its predictions. You can test it again on known data to check for accuracy, and make predictions for unknown cases. You can make predictions on several new cases at once, as shown in Figure 4, where 7 predictions are made at the same time. The network is not very accurate for ‘low’ values of x but is not so bad for ‘high’ values (remember that the network is trained to minimize the error, not the relative error).
Figure 4 – Predictions
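Continuing the hypothetical Keras sketch from the previous section, new predictions are a single call once the network is trained; the only subtlety is undoing the 0–1 scaling applied to the training data:

```python
import numpy as np

# New cases, scaled exactly like the training inputs (divide by 9, the training maximum)
x_new = np.array([[2.0], [5.0], [7.5], [10.0]]) / 9.0

y_scaled = model.predict(x_new)   # outputs come back in the 0-1 range
y_new = y_scaled * 81.0           # map back to the original scale of f
print(y_new)
```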
If you want to go outside the boundaries that your network has been trained on (here, below x = 0 or above x = 9), feel free to do so, but keep in mind that the results may be good, so-so or completely wrong. Here, as you can see in Figure 4, the network is still fine for x = 10.
An additional test might show you that it has completely lost track for x = 100. The bottom line is that you can never be 100% confident in a neural network result; you just assess your network and accept the risk. Also bear in mind that your network may give you an answer even with impossible inputs, such as a negative diameter. The result will be silly, and this time only because you are the silly one. In the end, it is your responsibility to gather relevant and sufficient data and to teach, assess and use your artificial neural network in a smart manner.

6. Examples

Here are some examples taken from real life in order to show what can be done with Deep Float in several domains: https://deep-float.herokuapp.com/examples. The proposed parameters can give a rough idea but should not be considered optimal.

a. BOSTON

The Boston dataset is about socio-economics and is taken from the UCI Machine Learning Repository [10]. The task is to correlate the real-estate value in Boston with 13 parameters such as per capita crime rate, pupil-teacher ratio, proportion of black people (sic), etc. Using Deep Float, you can obtain good results with 1 layer of 40 neurons, 1000 learning cycles and 20% of the data kept for validation (Figure 5).
Figure 5 – BOSTON results
b. PIMA

The PIMA dataset is about health science and is taken from the UCI Machine Learning Repository [10]. The task is to correlate the diabetes diagnosis of Pima Indians with 9 parameters such as plasma glucose concentration, age, diastolic blood pressure, etc. Using Deep Float, you can obtain good results with 3 layers, 32 neurons per layer, 300 learning cycles and 20% of the data kept for validation. The Deep Float outputs have to be cut off at a threshold of 0.5 before being compared to the 0–1 diagnosis values. You can obtain ~80% accuracy on the diabetes test.

c. USS

The USS dataset is about physics and is built with the Richardson and Zaki formulae (1954). The task is to correlate the terminal falling velocity of a particle swarm in a quiescent fluid with 5 parameters such as particle diameter and density, fluid viscosity, etc. Using Deep Float, you can obtain good results with 4 layers, 48 neurons per layer, 2000 learning cycles and 10% of the data kept for validation (Figure 6).
Figure 6 – USS results
d. MAK

The MAK dataset is about physics and is built with computational fluid dynamics simulations validated against the Tsz-Chunk Mak experiments (1992). The task is to correlate the quality (in terms of RSD: Relative Standard Deviation) of a solid-liquid suspension with 6 parameters such as particle diameter, fluid viscosity, stirrer diameter and velocity, etc. Using Deep Float, you can obtain good results with 3 layers, 36 neurons per layer, 1000 learning cycles and 20% of the data kept for validation (Figure 7).
Figure 7 – MAK results
e. DECO

The DECO dataset is about physiology and is built with the French MN90 scuba diving tables. The task is to correlate the Total Ascent Time of a scuba diver with 2 parameters (depth and time spent underwater) according to decompression theory. Using Deep Float, you can obtain good results with 1 layer of 16 neurons, 2000 learning cycles and 20% of the data kept for validation (Figure 8).
Figure 8 – DECO results
7. About Us

Corentin Macqueron is an engineer in computational fluid dynamics. He had the idea of Deep Float and copy-and-pasted the scientific core algorithms that power it, trying to prove once and for all that he is not a fraud. He can be contacted at corentin.macqueron@gmail.com.

Sébastien Fragnaud is a software engineer specialized in data visualization. He developed the Deep Float GUI as an attempt to look like he gets computer stuff. He can (but doesn’t want to) be contacted at dolor_sit_amet@protonmail.ch.

Both are French, born in 1985 and sauna addicts.

8. References

[1] https://deep-float.herokuapp.com/
[2] https://keras.io
[3] https://www.tensorflow.org
[4] https://scikit-learn.org
[5] https://www.heroku.com
[6] J. Howard, The Wonderful and Terrifying Implications of Computers That Can Learn, TED, 2014, https://www.ted.com/talks/jeremy_howard_the_wonderful_and_terrifying_implications_of_computers_that_can_learn?language=en
[7] L. Fei-Fei, How We’re Teaching Computers to Understand Pictures, TED, 2015, https://www.ted.com/talks/fei_fei_li_how_we_re_teaching_computers_to_understand_pictures?language=en
[8] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016, http://www.deeplearningbook.org/
[9] D. P. Kingma, J. Lei Ba, ADAM: A Method for Stochastic Optimization, ICLR, 2015, https://arxiv.org/abs/1412.6980
[10] UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/