Random Walk Survey by Melih Sözdinler

Survey: Random Walk

Melih Sözdinler
Computer Engineering, Bogaziçi University
Bebek, Istanbul 34342, Turkey

Abstract. The random walk is a concept used in many areas of computer science, and it still has much to contribute in the future. In this survey paper, we discuss the concept of the random walk, its application areas, and its possible future contributions. We also give some experimental results to show the nature of random walks.

1 Introduction

In computer science, the most basic form of the random walk is one-dimensional (1D). At each step, one flips a coin or draws a random number R. With a coin, the walker does one thing on heads, such as moving right, and another on tails, such as moving left. Equivalently, with a constant threshold C, the walker moves right if R ≥ C and left otherwise. This is the random walk in 1D. In higher dimensions the number of possible moves grows: on a d-dimensional square lattice, the walker chooses among 2d neighbouring points. In the literature, random walks are used in many areas, from bioinformatics to complex systems. The concept looks simple, but its contributions are numerous. The motivation behind the random walk is to produce stochastic movements, that is, step-by-step random hops. This makes it well suited to modelling decision problems, and because the steps are random, no deterministic function can describe the path of a random walk. Models of random walks are also well studied. The basic version is built on the probability distribution of a fair coin flip (heads or tails) in 1D, but a walk may also take place in n dimensions, with steps drawn, for example, from a Gaussian distribution.

In 1905, Pearson introduced the term "random walk". He was interested in describing the spatial and temporal evolution of mosquito populations invading cleared jungle regions, but his first findings implied that the problem was too complex to solve deterministically. Rayleigh had worked on the problem before Pearson and formulated the 1D random walk as approximately Gaussian, with spread σ = √n after n steps. Bachelier gave the probability distribution of walkers after n steps (discrete case) or after time t (continuous case) as

$$P_{n+1}(x) = \int_{-\infty}^{+\infty} P_1(x - x')\,P_n(x')\,dx'$$

or

$$P(x, t + \gamma) = \int_{-\infty}^{+\infty} P(x - x', \gamma)\,P(x', t)\,dx'.$$

Einstein (1905) gave a physical description of Brownian motion, which was later verified experimentally by Perrin (1909). Together, this work provided evidence for the atomic nature of matter, and Perrin received the 1926 Nobel Prize in Physics for it. Einstein's assumptions (similar to Bachelier's) were:

1. There are many independent random walkers.
2. Each walker takes steps that, after some very small time interval γ, can be considered independent in time.
3. A continuum approach is used rather than a discrete one.

From this work, the modern idea of the random walk emerged. We now give some of its application areas. Random walks arise in the motion of particles under collision (such as Brownian motion), in gambling problems (the fortune of a possibly unlucky gambler), and in mathematical models in finance (such as the pricing of options). In economics, they are used to model share prices and other factors. In physics, random walks serve as simplified models of Brownian motion and of the random movement of molecules in liquids and gases; see, for example, diffusion-limited aggregation. Random walks and some self-interacting walks also play a role in quantum field theory.
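The threshold rule described at the start of the introduction, and a discrete analogue of Bachelier's recursion for unit steps, can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the code used in our experiments: the function names `step`, `walk`, `distribution`, and `stddev` are hypothetical, and we assume unit steps with P_1(+1) = 1 − C and P_1(−1) = C, so that the integral becomes a sum over neighbouring positions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// One step of the 1D walk: draw R uniformly from [0, 1) and move right (+1)
// if R >= C, otherwise left (-1). With C = 0.5 this is a fair coin flip.
int step(std::mt19937& rng, double C = 0.5) {
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    return uniform(rng) >= C ? +1 : -1;
}

// Position of a walker that starts at 0 and takes n steps.
long walk(std::mt19937& rng, int n, double C = 0.5) {
    long x = 0;
    for (int i = 0; i < n; ++i) x += step(rng, C);
    return x;
}

// Discrete form of Bachelier's recursion, P_{n+1}(x) = sum over x' of
// P_1(x - x') P_n(x'), for unit steps. Entry i holds P_n(x) for x = i - n.
std::vector<double> distribution(int n, double C = 0.5) {
    assert(n >= 0 && C >= 0.0 && C <= 1.0);
    std::vector<double> P(2 * static_cast<std::size_t>(n) + 1, 0.0);
    P[n] = 1.0;  // P_0: the walker starts at x = 0 with certainty
    for (int k = 0; k < n; ++k) {
        std::vector<double> next(P.size(), 0.0);
        // After k < n steps the mass lies on indices n-k .. n+k, never on the
        // end entries, so i can safely run over 1 .. size - 2.
        for (std::size_t i = 1; i + 1 < P.size(); ++i) {
            next[i + 1] += (1.0 - C) * P[i];  // step right with probability 1 - C
            next[i - 1] += C * P[i];          // step left with probability C
        }
        P.swap(next);
    }
    return P;
}

// Standard deviation of the position under P (index i means x = i - center).
double stddev(const std::vector<double>& P) {
    const double center = (static_cast<double>(P.size()) - 1.0) / 2.0;
    double mean = 0.0, second = 0.0;
    for (std::size_t i = 0; i < P.size(); ++i) {
        const double x = static_cast<double>(i) - center;
        mean += x * P[i];
        second += x * x * P[i];
    }
    return std::sqrt(second - mean * mean);
}
```

With C = 0.5, `stddev(distribution(100))` evaluates to 10 (up to rounding), matching Rayleigh's spread σ = √n for n = 100.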

Melih Sözdinler 2009800075

Fig. 1. Route of each agent in 2D Plot for Zero Start Agent

Furthermore, in complex systems, random walks are used to model agent-based systems, for example web crawlers that estimate statistics such as page rankings or the size of the web. In bioinformatics, random walks are used in research on problems such as biclustering and protein-protein interactions. In wireless networking, they are used to model node movement, and agent-based tasks such as event handling and reporting can be built on the random walk concept. Random walks are also used to model gambling. In this paper, we give experimental results on the behaviour of the random walk in one dimension, and we cover two specific applications: random walks in modelling gambling and in biclustering.

2 Experiments

We designed a set of experiments to understand the random walk in 1D. Every experiment uses a single agent whose walk begins at the initial point 0, and in most cases we run the simulation for 100 moves of the agent. The implementation uses the LEDA C++ graph library [1]: we represent the random walk as a graph and extend the graph with new nodes whenever the walk requires it. We address two questions: "What happens during the first 100 moves?" and "Which node is visited most often during the random walk?". This section is divided into three parts, according to where each walk of 100 moves starts. In the first part, the agent always starts from the initial point p = 0. In the second part, each walk of 100 moves continues from the point where the previous walk ended. Finally, in the third part, the agent chooses a random node of the random walk graph built so far and starts from that node. In all three setups, a run consists of 100 executions of 100 moves each.
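The three setups differ only in where each execution of 100 moves starts. The sketch below is a hypothetical C++ harness, not our LEDA implementation: instead of a LEDA graph, it stores the line as a `std::map` from node position to visit count, so a node is created the first time the agent moves onto it, which mirrors extending the graph on demand. The names `StartPolicy`, `run`, and `mostVisited` are illustrative, and counting one visit each time the agent moves onto a node is our assumption.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <random>

// Where each execution of `moves` steps starts (hypothetical names).
enum class StartPolicy {
    Zero,          // 2.1: always start at p = 0
    Continue,      // 2.2: start where the previous execution ended
    RandomRestart  // 2.3: start at a random node created so far
};

// Runs `executions` walks of `moves` unit steps each and returns, for every
// node the agent moved onto, how many times it was visited.
std::map<long, long> run(StartPolicy policy, int executions, int moves, unsigned seed) {
    assert(executions > 0 && moves > 0);
    std::mt19937 rng(seed);
    std::bernoulli_distribution coin(0.5);  // true means "move right"
    std::map<long, long> visits;            // node position -> visit count
    long x = 0;
    for (int e = 0; e < executions; ++e) {
        if (e == 0 || policy == StartPolicy::Zero) {
            x = 0;
        } else if (policy == StartPolicy::RandomRestart) {
            // Choose uniformly among the nodes created so far (non-empty here).
            std::uniform_int_distribution<long> pick(0, static_cast<long>(visits.size()) - 1);
            auto it = visits.begin();
            std::advance(it, pick(rng));
            x = it->first;
        }
        // StartPolicy::Continue keeps x from the previous execution.
        for (int m = 0; m < moves; ++m) {
            x += coin(rng) ? 1 : -1;
            ++visits[x];  // operator[] creates the node on its first visit
        }
    }
    return visits;
}

// Position of the most visited node (the leftmost one if there is a tie).
long mostVisited(const std::map<long, long>& visits) {
    assert(!visits.empty());
    auto best = visits.begin();
    for (auto it = visits.begin(); it != visits.end(); ++it)
        if (it->second > best->second) best = it;
    return best->first;
}
```

For example, `mostVisited(run(StartPolicy::Zero, 100, 100, seed))` answers the second question for one zero-start run; the map keeps nodes sorted, so the visited range is simply its first and last key.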

2.1 Zero Start Agent

First of all, Figures 1 and 2 show how the agent makes its moves. The plots show the current position on the line (vertical axis) against the time step (horizontal axis). The irregular trajectories show that no exact function gives the position of the agent at time t, so we can only describe its statistical behaviour. In Figure 3, we see a roughly normal-shaped peak around 0, since every run starts at point 0 and the agent then gradually drifts towards negative positions (p−(x) < 0) or non-negative positions (p+(x) ≥ 0). Interestingly, although the agent starts each route at point 0, the distribution is not centred at 0: its highest peak is at point 7.


Fig. 2. Route of each agent in 3D Plot for Zero Start Agent

2.2 Continuous Agent

We also tested the behaviour of the Continuous Agent. Interestingly, possibly because of the length of our runs, the total number of visits per node fluctuates strongly. We made 5 different runs and plotted them in Figure 4. There are two peaks, at p1 ≈ −100 and p2 ≈ −15. These two peaks might merge around pm ≈ −60 with more data; the separation may simply be due to the small number of runs, since we have only 5 continuous random walks. More interestingly, on average the agent spends more time at negative nodes. This holds for all 5 simulations; the random number generator and its distribution may have a tendency to keep the agent at negative positions, although even an unbiased walk of this length often spends long stretches on one side of its starting point.

2.3 Random Restart Agent

Finally, we tested an agent with random restarts. Let S = {p_i, p_{i+1}, p_{i+2}, ..., p_j} be the set of points (nodes) visited by the agent. We randomly choose a point p_k with i ≤ k ≤ j, and the agent continues its 1D walk from p_k. The resulting plot is given in Figure 5. This time we found an approximately normal distribution with µ ≈ 0. Because of the restarts, each run covers only a limited interval of the line. Since the first execution begins at 0 and every later execution continues from a randomly chosen, already visited point p_k, these intervals stay centred around 0. In Table 2.3, we give the intervals of five different runs, each with 100 executions. The intervals are close to each other, because random restart acts as a bounding force on the agent: the agent can hardly leave its interval, and whenever it moves towards the bounds, the next restart is likely to bring it back to a point inside the interval.

Table 2.3. Random Walk Intervals for Random Restart Agent

Run   Visited interval (min ... start ... max)
1     -82 ... 0 ... 71
2     -67 ... 0 ... 71
3     -57 ... 0 ... 73
4     -66 ... 0 ... 66
5     -64 ... 0 ... 69
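The restart rule and the intervals of Table 2.3 can be sketched as follows. This is a hypothetical, self-contained C++ illustration rather than our LEDA code. We assume that S holds the distinct nodes visited so far, including the start node 0, that p_k is drawn uniformly from S, and that the interval of a run is its leftmost and rightmost visited node; the indexing p_i, ..., p_j could also be read as the sequence of visits, which would favour frequently visited nodes. The names `RestartInterval` and `restartInterval` are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <set>
#include <vector>

struct RestartInterval {
    long left;   // leftmost node visited in the run
    long right;  // rightmost node visited in the run
};

// One run of the Random Restart Agent: `executions` walks of `moves` unit steps.
// The first walk starts at 0; each later walk starts at a node p_k drawn
// uniformly from S, the set of distinct nodes visited so far (an assumption).
RestartInterval restartInterval(int executions, int moves, unsigned seed) {
    assert(executions > 0 && moves > 0);
    std::mt19937 rng(seed);
    std::bernoulli_distribution coin(0.5);  // true means "move right"
    std::set<long> visited{0};              // S as a sorted set (start node included)
    std::vector<long> S{0};                 // the same nodes, indexable for sampling
    long x = 0;
    for (int e = 0; e < executions; ++e) {
        if (e > 0) {
            std::uniform_int_distribution<std::size_t> pick(0, S.size() - 1);
            x = S[pick(rng)];  // restart at a randomly chosen visited node p_k
        }
        for (int m = 0; m < moves; ++m) {
            x += coin(rng) ? 1 : -1;
            if (visited.insert(x).second) S.push_back(x);  // record new nodes only
        }
    }
    return {*visited.begin(), *visited.rbegin()};
}
```

Because every walk starts on an already visited node and moves in unit steps, the visited nodes always form one contiguous block of the line, so under this assumption drawing p_k uniformly from S is the same as drawing it uniformly from [left, right].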


Fig. 3. Number of visits to each point i, summed over all Zero Start Agent runs

3 Random Walk Applications

Random walks are used in a variety of settings. We do not cover all of them; instead, we concentrate on two: random walks in biclustering and random walks in gambling.

Biclustering was introduced by [5], and the first heuristic method was given by [3]. The problem is to find the largest submatrices that do not violate given constraints. The constraints define the bicluster type: a submatrix may be required to have constant values, constant rows, constant columns, or coherent values. The problem has been proven to be NP-hard. Interestingly, [2] solves the biclustering problem using a random walk. The method is simple: the agent performs a random walk on the 2D matrix and tries to extend the locally found submatrices by examining their neighbours, computing a similarity score at each extension.

In gambling, random walks appear in the "Gambler's Ruin" problem [6]. Suppose you start with a dollars and repeatedly bet one dollar on a coin flip until you either reach 0 dollars (you go broke) or reach c dollars (you get rich), and then you stop. Suppose each bet is won or lost with probability 1/2. What is the probability q of reaching c before 0? Since we break even on average, the amounts of money we hold form a martingale, i.e. a random sequence whose expected value stays the same. It follows that the expected amount at the end equals the amount we started with: q · c + (1 − q) · 0 = a, so q = a/c. Hence, the higher the target c relative to the starting amount a, the more likely the gambler is to go broke before reaching it. The casino has the advantage that it can pay out as much as the gambler wins, so even if the gambler is ahead by c − a > 0 at time t, nothing guarantees that the money will still be there at time t + k. This is closely related to the results we obtained in the experimental section. Conversely, once the gambler's fortune reaches 0 at time t, the casino has ruined the gambler, and all of the money belongs to the casino.
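The martingale result q = a/c can be checked numerically. The C++ sketch below, with hypothetical function names, compares the closed form with a Monte Carlo estimate for a fair coin and one-dollar bets; it is an illustration of the argument above, not part of our experimental code.

```cpp
#include <cassert>
#include <random>

// Closed form for a fair coin: the martingale argument gives q = a / c.
double winProbabilityExact(long a, long c) {
    assert(c > 0 && 0 <= a && a <= c);
    return static_cast<double>(a) / static_cast<double>(c);
}

// Monte Carlo estimate of q: each game starts with a dollars and bets one
// dollar per fair coin flip until the fortune reaches 0 (ruin) or c (target).
double winProbabilitySimulated(long a, long c, int games, unsigned seed) {
    assert(c > 0 && 0 <= a && a <= c && games > 0);
    std::mt19937 rng(seed);
    std::bernoulli_distribution win(0.5);  // each bet is won with probability 1/2
    int reachedTarget = 0;
    for (int g = 0; g < games; ++g) {
        long fortune = a;
        while (fortune > 0 && fortune < c) fortune += win(rng) ? 1 : -1;
        if (fortune == c) ++reachedTarget;
    }
    return static_cast<double>(reachedTarget) / games;
}
```

For a = 3 and c = 10 the exact value is 0.3, and the simulated estimate approaches it as the number of games grows. With a = 0 the loop never runs and the estimate is 0, which is the case where the casino has already ruined the gambler.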

4 Conclusion

In this paper, we presented several aspects of the random walk phenomenon, together with experimental results and two specific, basic applications. We believe that biology [4] and complex systems still offer many attractive problems that could be simulated using random walks. Because of its nature and its easy adaptability to many systems, we expect the random walk to remain an attractive topic in the future.


Fig. 4. Route of each agent in 2D Plot for Continuous Agent

Fig. 5. Route of each agent in 2D Plot for Random Restart Agent

References

1. LEDA C++ algorithm library. http://www.algorithmic-solutions.com/.
2. F. Angiulli, E. Cesario, and C. Pizzuti. Random walk biclustering for microarray data. Information Sciences, 178(6):1479–1497, 2008.
3. Y. Cheng and G. M. Church. Biclustering of expression data. In R. Altman, T. L. Bailey, P. Bourne, M. Gribskov, T. Lengauer, and I. N. Shindyalov, editors, Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology (ISMB-00), pages 93–103, Menlo Park, CA, Aug. 16–23, 2000. AAAI Press.
4. E. A. Codling, M. J. Plank, and S. Benhamou. Random walk models in biology. Journal of The Royal Society Interface, 5(25):813–834, 2008.
5. J. A. Hartigan. Direct clustering of a data matrix. Journal of the American Statistical Association, 67(337):123–129, 1972.
6. R. W. Shonkwiler and F. Mendivil. Random walks. Pages 165–212, 2009.
