A Non-Stationary Online Learning Approach to Mobility Management
Abstract: Efficient mobility management is an important problem in modern wireless networks with heterogeneous cell sizes and increased node densities. We show that optimization-based mobility protocols cannot achieve long-term optimal performance, particularly for ultra-dense networks in a time-varying environment. To address the complex system dynamics, especially the possible change of statistics due to user movement and environment changes, we propose piece-wise stationary online learning algorithms that learn the time-varying throughput distribution and solve the frequent handover problem. The proposed MMBD/MMBSW algorithms are proved to achieve sublinear regret over a finite time horizon and a nontrivial, rigorous linear regret bound over an infinite time horizon. We also study the robustness of the MMBD/MMBSW algorithms under delayed or missing feedback. Simulations show that the proposed algorithms can outperform 3GPP protocols with optimal thresholds. More importantly, they are more robust to the system dynamics that are commonly present in practical ultra-dense wireless networks.

Existing system:
Furthermore, ultra-dense deployment makes the problem even harder, as user equipment (UEs) in ultra-dense networks (UDNs) can have many possible serving cells, and mobile UEs may trigger very frequent handovers even without much physical movement. Simply applying existing macro-cell solutions leads to poor SBS mobility performance. To address these challenges, research on SBS mobility management has attracted a lot of attention recently; this research has mainly utilized optimization theory. The formulated problem is generally non-convex, and optimal or suboptimal solutions have been proposed. In, a utility maximization framework is formulated for optimal user association which accounts for both the user's RF conditions and the load situation at the BS.

Proposed system:
There are a few works applying MAB algorithms to address this challenge. The authors of utilize context-aware bandits to learn the optimal cell range expansion parameter. The results of have established the equivalence between 3GPP handover protocols and bandit learning algorithms and proved their sub-optimal performance. The authors of improve the applicability of MAB-based mobility solutions by relaxing the stochastic assumptions on the performance associated with each SBS. A similar approach has been proposed in for the design of user association policies in enterprise small cell networks. Recently, the authors of have applied multi-user deep reinforcement learning to study the mobility management problem, in order to lower the handover rate while ensuring good system throughput.

Advantages:
The proposed MMBD/MMBSW algorithms come with a sublinear regret guarantee over a finite time horizon, remain robust to delayed or missing feedback and to the system dynamics common in practical ultra-dense networks, and outperform 3GPP protocols even when the latter use optimal thresholds.

Disadvantages:
Despite the suitability of online learning models for mobility management, one limitation of the aforementioned solutions is that they are mostly designed for stationary user association problems; they do not directly handle the dynamics caused by user movement or environment changes. This is mainly because classical online learning models such as the multi-armed bandit (MAB) or the Markov Decision Process (MDP) assume that there exists a fixed reward distribution, and the player's goal is essentially to learn this distribution or the best action for each state.

Modules:

Markov decision process:
Classical online learning models such as the MAB or MDP assume a fixed reward distribution that the player gradually learns. In practice, however, a more common scenario is that, because of user movement or environment changes, the stochastic reward (e.g., SINR) of each candidate BS will also have a time-varying distribution. This challenges learning-based mobility solutions to not only gradually learn the behavior of each candidate BS, but also track their variations over time.

Piece-wise stationary model:
In this work, we address the UDN mobility management problem with the objective of maximizing the long-term throughput, using a time-varying stochastic bandit setting to model the performance of each candidate SBS. More specifically, we adopt a piece-wise stationary model to capture the time-varying throughput of the SBS selected to serve the user. This simplified model can be a good approximation for deployments where the RF conditions vary slowly but abrupt changes can happen due to blockage or shadowing (e.g., millimeter wave indoor base stations [15]).
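The paper's exact MMBD/MMBSW pseudocode is not reproduced here, but the sketch below illustrates the sliding-window idea, assuming MMBSW follows the standard sliding-window UCB template: only the most recent observations enter the empirical throughput estimate, so the policy can track a piece-wise stationary SBS. The class name `SlidingWindowUCB`, the window length `tau`, and the exploration constant `c` are illustrative assumptions rather than the paper's notation; MMBD would presumably replace the hard window with a geometric discount on past samples.

```python
import math
import random
from collections import deque

class SlidingWindowUCB:
    """Sliding-window UCB index policy for serving-SBS selection under a
    piece-wise stationary throughput model. `tau` and `c` are illustrative
    tuning knobs, not the paper's exact MMBSW parameterization."""

    def __init__(self, num_sbs, tau=200, c=1.0):
        self.num_sbs = num_sbs
        self.tau = tau                       # only the last tau samples matter
        self.c = c                           # exploration weight
        self.window = deque(maxlen=tau)      # recent (sbs, throughput) pairs

    def select_sbs(self, t):
        counts = [0] * self.num_sbs
        sums = [0.0] * self.num_sbs
        for sbs, reward in self.window:      # statistics over the window only
            counts[sbs] += 1
            sums[sbs] += reward
        for sbs in range(self.num_sbs):      # try any SBS unseen in the window
            if counts[sbs] == 0:
                return sbs
        def index(sbs):
            mean = sums[sbs] / counts[sbs]
            bonus = self.c * math.sqrt(math.log(min(t, self.tau)) / counts[sbs])
            return mean + bonus
        return max(range(self.num_sbs), key=index)

    def observe(self, sbs, throughput):
        self.window.append((sbs, throughput))


# Toy run: SBS 0 is best until a breakpoint at t = 500, then SBS 1 is best.
policy = SlidingWindowUCB(num_sbs=3, tau=150)
means = [0.8, 0.5, 0.3]
for t in range(1, 1001):
    if t == 500:
        means = [0.4, 0.9, 0.3]              # abrupt change (e.g., blockage)
    sbs = policy.select_sbs(t)
    policy.observe(sbs, random.gauss(means[sbs], 0.1))
```

Because the window forgets the pre-change samples shortly after the breakpoint, the policy migrates to the new best SBS instead of exploiting stale estimates, which is exactly the tracking capability the piece-wise stationary model calls for.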
To the best of the authors' knowledge, there is no prior work applying time-varying bandit models to mobility management, which is a fundamental deviation from the previous stochastic and non-stochastic mobility solutions. Table I presents a comparison of this work to the existing literature.

Ultra-dense network:
As discussed above, ultra-dense deployment makes the problem even harder: UEs in UDNs can have many possible serving cells, mobile UEs may trigger very frequent handovers even without much physical movement, and simply applying existing macro-cell solutions leads to poor SBS mobility performance. The optimization-based literature goes beyond the utility maximization framework mentioned above: the authors of adopt the same philosophy and extend the utility maximization framework to a visible light communication system, while for energy-efficient user association, the authors of aim at maximizing the ratio between the total data rate of all users and the total energy consumption in downlink heterogeneous networks.

Frequent handover:
These existing solutions have proved effective for less-densified heterogeneous networks, but they may perform poorly when the network density becomes high. Examples include the so-called frequent handover (FHO), ping-pong (PP), and other handover failure (HOF) problems, which commonly occur when the UE is surrounded by many candidate BSs. In this scenario, a UE may select its serving SBS based on some optimization criterion, e.g., the strongest biased signal strength as adopted in 3GPP, or another system metric. However, system dynamics such as the movement of the UE or its surrounding objects can quickly render the solution sub-optimal, triggering the user handover procedure in a frequent manner, as the sketch below illustrates. Motivated by these issues, a different approach which models SBS mobility management as an online learning problem has been developed in recent years.
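To make the ping-pong effect concrete, the toy sketch below mimics a 3GPP-style strongest-biased-signal rule with a hysteresis margin. The function and parameter names (`a3_handover`, `bias_db`, `hysteresis_db`) and their default values are illustrative assumptions, and the real 3GPP procedure additionally uses a time-to-trigger timer that is omitted here for brevity.

```python
def a3_handover(serving, rsrp_dbm, bias_db, hysteresis_db=2.0):
    """Illustrative 3GPP-style (A3-event) rule: switch to the cell with the
    strongest biased signal only if it beats the serving cell's biased
    signal by a hysteresis margin. Values are not 3GPP-mandated."""
    biased = {cell: rsrp_dbm[cell] + bias_db.get(cell, 0.0) for cell in rsrp_dbm}
    best = max(biased, key=biased.get)
    if best != serving and biased[best] > biased[serving] + hysteresis_db:
        return best                      # handover triggered
    return serving                       # stay on the current SBS

# If RSRP fluctuates around the hysteresis margin (mobile UE, moving
# blockers), the rule oscillates between two cells: the ping-pong /
# frequent-handover effect described above. This prints a, b, a, b.
serving = "sbs_a"
for rsrp_a in (-80.0, -85.0, -79.0, -86.0):
    serving = a3_handover(serving, {"sbs_a": rsrp_a, "sbs_b": -82.0}, bias_db={})
    print(serving)
```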