Galley Proof
7/02/2007; 14:38
File: ifs363.tex; BOKCTP/ljl p. 1
Journal of Intelligent & Fuzzy Systems 18 (2007) 1–8 IOS Press
Anomaly detection in mobile communication networks using the self-organizing map

Rewbenio A. Frota∗, Guilherme A. Barreto and João C.M. Mota

Department of Teleinformatics Engineering, Federal University of Ceará (UFC), CP 6005, CEP 60455-760, Fortaleza, Ceará, Brazil
Abstract. Anomaly detection is a pattern recognition task whose goal is to report the occurrence of abnormal or unknown behavior in a given system being monitored. In this paper we propose a general procedure for the computation of decision thresholds for anomaly detection in mobile communication networks. The proposed method is based on Kohonen’s Self-Organizing Map (SOM) and the computation of nonparametric (i.e. percentile-based) confidence intervals. Through simulations we compare the performance of the proposed and standard SOM-based anomaly detection methods with respect to the false positive rates produced.
∗Corresponding author. E-mail: rewbenio@deti.ufc.br.

1. Introduction

The multi-service character of today's mobile radio technology brings totally new requirements into the network optimization process and radio resource management algorithms, differing significantly from traditional speech-dominated technologies [13]. One of these new aspects is related to quality of service (QoS) requirements. Because of them, the operation and maintenance of such cellular networks is becoming more and more challenging, since mobile cells interact and interfere more, have hundreds of adjustable parameters, and monitor and record several hundreds of different variables in each cell, thus producing a huge amount of data. Considering scenarios with thousands of cells, it is clear that for optimum handling of the radio access network (RAN), effective data mining methods for performance analysis based on Key Performance Indicators (KPIs) are required. KPIs are a set of essential measurements which summarize the behavior of the cellular network of interest, and can be used for system acceptance, benchmarking and system specification. A good choice of KPIs to monitor and analyze collected data is crucial to understand the reasons for the various operational states of the cellular network, noticing abnormal behaviors, analyzing them and providing possible solutions.

Data mining is an expanding area of research in artificial intelligence and information management whose objective is to extract relevant information from large databases [4]. Typical data mining and analysis tasks include classification, regression, and clustering of data, aiming at determining parameter/data dependencies and finding various anomalies in the data. In this paper, we are interested in the clustering capabilities of the Self-Organizing Map (SOM) [8] applied to the detection of anomalous states of a CDMA2000 cellular system. The SOM is an important unsupervised competitive learning algorithm, able to extract statistical regularities from the input data vectors and encode them in the weights without supervision [14]. Such a learning machine will then be used to build a compact internal representation of the cellular network, in the sense that the data vectors representing its behavior are projected onto a reduced number of prototype vectors (each representing a given cluster of data), which can be further analyzed in search of hidden data structures.

In addition to data clustering, the SOM is also widely used for visualization of cluster structures [2]. This visualization ability is particularly suitable for network optimization purposes, as discussed in a number of re-
1064-1246/07/$17.00 2007 – IOS Press and the authors. All rights reserved
cent studies [9,10,15]. In addition, Laiho et al. [9] also applied the SOM to monitoring objects in the cellular network, such as base stations and radio network controllers, looking for anomalous or abnormal behaviors. They argue that this approach makes it much easier to monitor a large number of cells in a network, since only abnormal observations have to be examined. By the same token, we propose a general procedure to detect abnormal behavior of a CDMA2000 cellular network using the SOM as a data modelling tool and nonparametric (i.e. percentile-based) confidence intervals to define reliable decision thresholds. Another contribution of this paper is the performance comparison of four different methods for computing decision thresholds for anomaly detection purposes, using the SOM as the common clustering algorithm.

The remainder of the paper is organized as follows. In Section 2, we describe the fundamentals of the SOM algorithm. In Section 3, we review standard methods for computing single decision thresholds for anomaly detection tests. In Section 4, we introduce a percentile-based technique for computing interval-based decision thresholds for anomaly detection tests. Computer simulations are presented in Section 5 and the paper is concluded in Section 6.
2. The self-organizing map

The Self-Organizing Map (SOM) is one of the most popular neural network architectures, usually designed to build an ordered representation of spatial proximity among vectors of an unlabelled data set. Neurons in the SOM are put together in an output layer, A, in one-, two- or even three-dimensional arrays. Each neuron i ∈ A is associated with a weight vector w_i ∈ R^n with the same dimension as the input vector x ∈ R^n. Network weights are trained according to a competitive-cooperative scheme in which the weight vectors of a winning neuron and its neighbors in the output array are updated after the presentation of an input vector.

Roughly speaking, the functioning of this type of learning algorithm is based on the concept of the winning neuron, defined as the neuron whose weight vector is closest to the current input vector. During the learning phase, the weight vectors of the winning neurons are modified iteratively in order to extract average features from the set of input patterns.

The SOM has been widely applied to pattern recognition tasks, such as clustering and vector quantization. In these applications, the weight vectors are called prototypes or centroids of a given class or category, since through learning they become the most representative elements of a given group of input vectors. Using the Euclidean distance, the simplest strategy to find the winning neuron, i*(t), is given by:

i*(t) = arg min_{∀i} ‖x(t) − w_i(t)‖,   (1)

where t denotes the current iteration of the algorithm. Accordingly, the weight vectors of the neurons are modified by the following recursive equation:

w_i(t+1) = w_i(t) + η(t) h(i*, i; t) [x(t) − w_i(t)],   (2)

where h(i*, i; t) is a Gaussian function which controls the degree of change imposed on the weight vectors of those neurons in the neighborhood of the winning neuron:

h(i*, i; t) = exp( −‖r_i(t) − r_{i*}(t)‖² / σ²(t) ),   (3)

where σ(t) defines the radius of the neighborhood function at iteration t, and r_i(t) and r_{i*}(t) are the coordinates of neurons i and i* in the output array, respectively. The learning rate, 0 < η(t) < 1, should decay with time to guarantee convergence of the weight vectors to stable states. In this paper, we use η(t) = η_0 (η_T/η_0)^{t/T}, where η_0 is the initial value of η, and η_T is its final value after T training iterations. The variable σ(t) should decay in time in a similar fashion.

During training, due to the joint use of a shrinking-width neighborhood function and a decreasing learning rate, the SOM implements a sort of soft competitive learning, thus decreasing the dependency on weight initialization and the occurrence of dead units.(1) Soft competitive learning is closely related to fuzzy clustering [1]. The use of a neighborhood function also imposes an order on the weight vectors, so that, at the end of the training phase, input vectors that are close in the input space are mapped onto the same winning neuron or onto winning neurons that are close in the output array. This is the so-called topology-preserving property of the SOM, which has been particularly useful for data visualization purposes [2].

Once the SOM has converged, regions in the input space X from which data vectors are drawn with a high probability of occurrence are mapped onto a larger set of neurons in the output space A, and therefore

1 Neurons that never win a competition.
with better resolution than regions in X from which vectors are drawn with a low probability of occurrence. This density matching property is very important for anomaly detection purposes. For example, once the SOM has been trained with unlabelled vectors that one believes to consist only of data representing the normal state of the system being monitored, we can use the quantization error, e_q(x, w_{i*}; t), defined by:

e_q(x, w_{i*}; t) = ‖x(t) − w_{i*}(t)‖ = sqrt( Σ_{j=1}^{n} (x_j(t) − w_{i*,j}(t))² ),   (4)

as a measure of the degree of proximity of x(t) to the distribution of "normal" data vectors as encoded in the weight vectors of the SOM. From the distribution of quantization errors e_q(x, w_{i*}; t), computed for all training vectors after the SOM has been trained, one can compute decision thresholds for the anomaly detection tests. Several procedures to compute such thresholds have been developed in recent years, most of them based on well-established statistical techniques [6]. In the following sections we describe some of these techniques in the context of SOM-based anomaly detection.
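As a concrete illustration, Eqs. (1)–(4) can be sketched in a few lines of NumPy. The function names, grid size and decay schedules below are illustrative choices for the sketch, not parameters taken from the experiments reported later in this paper:

```python
import numpy as np

def train_som(X, grid=(5, 5), T=1000, eta0=0.5, etaT=0.01,
              sigma0=2.0, sigmaT=0.1, seed=0):
    """Train a 2-D SOM on data X (N x n) with the rules of Eqs. (1)-(3)."""
    rng = np.random.default_rng(seed)
    # Output-array coordinates r_i of each neuron on the grid
    coords = np.array([(a, b) for a in range(grid[0])
                       for b in range(grid[1])], dtype=float)
    # Random initial weights drawn within the range of the data
    W = rng.uniform(X.min(0), X.max(0), size=(len(coords), X.shape[1]))
    for t in range(T):
        x = X[rng.integers(len(X))]                    # random input vector x(t)
        eta = eta0 * (etaT / eta0) ** (t / T)          # decaying learning rate
        sigma = sigma0 * (sigmaT / sigma0) ** (t / T)  # decaying radius
        i_star = np.argmin(np.linalg.norm(x - W, axis=1))   # Eq. (1): winner
        h = np.exp(-np.sum((coords - coords[i_star]) ** 2, axis=1)
                   / sigma ** 2)                       # Eq. (3): neighborhood
        W += eta * h[:, None] * (x - W)                # Eq. (2): weight update
    return W

def quantization_error(x, W):
    """Eq. (4): distance from x to the weight vector of its winning neuron."""
    return np.min(np.linalg.norm(x - W, axis=1))
```

After training, `quantization_error` applied to each training vector yields the error distribution from which the decision thresholds of Sections 3 and 4 are computed.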
3. Computing single decision thresholds

Detection tests based on a single decision threshold are the most common in anomaly detection applications, either using neural networks or statistics-based methods [7,9,11]. This kind of test evaluates whether a new input vector x_new is anomalous by performing the following hypothesis test:

IF e_q(x_new, w_{i*}) < τ+ THEN x_new is NORMAL, ELSE x_new is ABNORMAL   (5)

where τ+ is an upper distance-based threshold computed from the distribution of quantization errors of the training vectors. For example, in [7], the SOM is trained with data representing the normal activity of users within a computer network. The threshold τ+ is determined by computing the statistical p-value associated with the distribution of training quantization errors. This anomaly detection procedure is implemented as follows:

Step 1: After training is finished, the quantization errors for the training set vectors are computed (e_q^1, e_q^2, ..., e_q^N), defining the set {e_q^μ}_{μ=1}^N.
Step 2: The quantization error e_q(x_new, w_{i*}) of a new input vector is computed.
Step 3: Define the null hypothesis H_0 as: "the new input vector x_new is normal". Set τ+ equal to the 100(1 − α)th percentile(2) of {e_q^μ}_{μ=1}^N, where 0 < α < 1 (e.g. α = 0.05 or 0.01) is the statistical significance of the test, a parameter associated with the probability of making a type I error (false alarm).
Step 4: Set τ_new = e_q^new.
Step 5: If τ_new < τ+, then H_0 is accepted; otherwise it is rejected. A significance level α = 0.05 is commonly used.
Step 6: Steps 2–5 are repeated for every new input vector.

According to the authors, the system is very reliable and has presented acceptable rates of false negatives and false positives; they conclude that these errors were due to natural changes in user profiles. Similar approaches have been applied to anomaly detection in cellular networks [10], time series modelling [3] and machine monitoring [5].

A single-threshold SOM-based method for anomaly detection in rotating machinery was proposed in [17]. The procedure follows the same steps described previously, except that the decision threshold is computed as follows. Once training is completed, for each new input vector we compute the distances D_{i*j}(t) = ‖w_{i*}(t) − w_j‖, ∀j ∈ V_1(t), where V_1(t) is the set of neurons in the immediate neighborhood of the winning neuron, i.e. ‖r_j − r_{i*}(t)‖ ≤ sqrt(2). The threshold for that input vector is taken as the maximum value among these distances:

τ+ = max_{∀j∈V_1} {D_{i*j}(t)}   (6)

Thus, if e_q(x_new, w_{i*}) > τ+ then the input vector carries novel or anomalous information.

4. Computing interval-based decision thresholds

For certain applications, not only a very high quantization error is indicative of anomaly, but also a very

2 The percentile of a distribution of values is a number N_α such that a percentage 100(1 − α) of the population values are less than or equal to N_α. For example, if α = 0.5 then N_α is the median of the distribution of values.
small one. One can argue that a small quantization error means that the input vector is almost surely normal. This is true if no outliers are present in the data. In more realistic scenarios, there is no guarantee that the training data are outlier-free, and a given neuron could be representing exactly the region the outliers belong to. Since outliers can be handled directly by interval-based anomaly detection methodologies, they are more robust than single-threshold approaches, as will be shown in the simulations.

In this paper we propose a novel technique to detect anomalous states of a cellular system by computing decision intervals (DI) using the concept of nonparametric (i.e. percentile-based) prediction intervals of the set of quantization errors associated with the data vectors used for training the SOM. This distribution is computed after the training is finished, using the training data vectors once again; no weight adjustment is performed during this stage. For a given significance level α, we are interested in an interval within which we can find a percentage 100(1 − α) (e.g. α = 0.05) of normal values of the quantization error. Hence, we compute the lower and upper limits of this interval as follows:

– Lower Limit (τ−): the 100(α/2)th percentile of the distribution of quantization errors associated with the training data vectors.
– Upper Limit (τ+): the 100(1 − α/2)th percentile of the distribution of quantization errors associated with the training data vectors.

These limits can then be used to classify a new state vector as normal/abnormal by means of a simple hypothesis test:

IF e_q(x_new, w_{i*}) ∈ [τ−, τ+] THEN x_new is NORMAL, ELSE x_new is ABNORMAL   (7)
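The single threshold of Section 3 and the decision interval above differ only in which percentiles of the training quantization errors are kept. A minimal sketch follows; it uses NumPy's default percentile interpolation, which may differ slightly from other percentile conventions, and the function names are illustrative:

```python
import numpy as np

def single_threshold(train_errors, alpha=0.05):
    """Section 3, Step 3: tau+ is the 100(1-alpha)th percentile of training errors."""
    return np.percentile(train_errors, 100 * (1 - alpha))

def decision_interval(train_errors, alpha=0.05):
    """Section 4: [tau-, tau+] keeps the central 100(1-alpha)% of training errors."""
    lo = np.percentile(train_errors, 100 * alpha / 2)
    hi = np.percentile(train_errors, 100 * (1 - alpha / 2))
    return lo, hi

def classify(e_new, lo, hi):
    """Eq. (7): NORMAL iff the new quantization error falls inside [tau-, tau+]."""
    return "NORMAL" if lo <= e_new <= hi else "ABNORMAL"
```

Note that both thresholds are computed once, from the training errors only; classifying a new state vector then costs a single comparison against the interval.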
Note that percentiles are specified using percentages 100(1 − α), which may vary from 0 to 100. If a set of quantization error values, {e_q^1, e_q^2, ..., e_q^N}, associated with N training vectors is available, then the percentiles are computed as follows:(3)

(1) The values in the set {e_q^μ}_{μ=1}^N are sorted in ascending order.
(2) The sorted values are taken as the 100(0.5/N), 100(1.5/N), ..., 100((N − 0.5)/N) percentiles.
(3) Simple linear interpolation is used to compute percentiles for percentages 100(1 − α) between 100(0.5/N) and 100((N − 0.5)/N).
(4) The minimum or maximum values of the set {e_q^μ}_{μ=1}^N are assigned to percentiles for percentages outside the range 100(0.5/N) to 100((N − 0.5)/N).

Methods combining the joint use of neural clustering algorithms with decision intervals are not very common in the machine learning community. Indeed, we were able to find only one previous work [12], in which the authors used the well-known box-plot technique to determine the interval [τ−, τ+]. Our approach is much simpler and faster, since the interval [τ−, τ+] is computed in a single stage from the set of N samples of quantization errors {e_q^μ}_{μ=1}^N computed from each data vector in the training set, while in [12] the interval is computed only after a preprocessing stage in which outliers are removed from the training data set. This outlier removal procedure demands the computation of a Median Interneuron Distance matrix, defined as the matrix whose m_ij entry is the median of the Euclidean distances between the weight vector w_i and all neurons within a certain neighborhood, as well as Sammon's mapping [16]. In this sense, the proposed approach is rather novel and suitable for on-line anomaly detection purposes.

3 This is the algorithm implemented in MATLAB©.
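The four steps above can be sketched directly with linear interpolation over the plotting positions 100(k − 0.5)/N. This is a sketch of the convention described in the text, not an exact reproduction of any particular MATLAB routine:

```python
import numpy as np

def percentile_interp(values, p):
    """Percentile via steps (1)-(4): sort, assign plotting positions
    100(k-0.5)/N, interpolate linearly, and clamp to min/max outside the range."""
    v = np.sort(np.asarray(values, dtype=float))      # step (1)
    N = len(v)
    pos = 100.0 * (np.arange(1, N + 1) - 0.5) / N     # step (2)
    # np.interp performs steps (3) and (4): linear interpolation inside
    # [pos[0], pos[-1]], endpoint values outside that range
    return np.interp(p, pos, v)
```

For N = 4 sorted values, for instance, the plotting positions are 12.5, 37.5, 62.5 and 87.5, and any requested percentage beyond 87.5 simply returns the maximum value.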
5. Simulations

Most of the hardware and software which support radio resource management (RRM) algorithms are located at the base station, so the robustness of this network element is a key issue in maintaining the overall quality of the system. Reliable detection of abnormal conditions at base stations is therefore an aspect that should be taken into account in system planning and design. Bearing this in mind, we have chosen the following KPIs:

– Number of Users: Number of initial mobile users attempting to use the services of the cellular network.
– Downlink Throughput: Total throughput in the downlink direction, in Kb/s, summed over all the active links in the cell being analyzed.
– Noise Rise: Ratio between overall interference and thermal noise in the analyzed cell, in dB.
Fig. 1. False positive rates versus the number of neurons. [Plot: false positive rates (%) for the DI, p-Value, Boxplot and Tanaka methods, for 2 to 100 neurons.]
– Other-Cells Interference: This variable measures the interference (in the uplink direction) from other cells to the cell being analyzed, in dBm.

Successive configurations of a CDMA2000 cellular network were simulated through a number of independent Monte Carlo runs (also called drops) of a static simulation tool, thus generating a set of state vectors, {x_1, x_2, ..., x_N}, representing normal and abnormal functioning of the cellular network. All the KPIs have been normalized to zero mean and unit variance. Quality parameters, such as the E_b/N_t target and the maximum Noise Rise level, were set to 5 dB and 6 dB, respectively. Mobile users can be removed from the system by a power control algorithm. Due to the high number of input parameters that can be handled, some specific scenarios were selected, focusing on the analysis of traffic and CDMA interference behavior.

Abnormal situations were generated by purposely simulating the occurrence of the following "defects" of the cellular system: (i) restriction of the number of Walsh codes available, (ii) reduction of the Base Station transmission power, and (iii) increase in the Noise Rise of the cellular system. These three anomalies can distort the normal functioning of the cellular system. For example, in the power-control cycle, mobile units are usually removed from the system by the lack of available Walsh codes. The reduction in the transmission power of the Base Stations leads directly to an unusually low number of mobile units covered by the system. Finally, the increase in Noise Rise, which can be viewed as a foreign source of interference disturbing the system, has the effect of increasing the power level of some mobile stations, causing their disconnection from the system by the power-control cycle.

SOM training was carried out using a total of 400 "normal" state vectors. During testing, 70 normal and 30 abnormal state vectors were used to evaluate the SOM in the classification of a given state vector x_new as normal/abnormal using the single- and interval-based decision threshold methods described in Sections 3 and 4. The false positive rates obtained for the SOM-based anomaly detectors as a function of the number of neurons are shown in Fig. 1. By false positive we mean a normal state vector that was wrongly classified as an abnormal one. The SOM was trained for 50 epochs
Fig. 2. False positive rates versus the number of training epochs. [Plot: false positive rates (%) for the DI, p-Value, Boxplot and Tanaka methods, for 1 to 100 training epochs.]
only with normal state vectors. It can be noted that the pair (SOM, DI), formed by the SOM and the proposed decision threshold method, produced the best false positive rates for SOM architectures with more than 50 neurons, followed reasonably closely by the pairs (SOM, Box-plot) and (SOM, p-value). The poor performance of the pair (SOM, Tanaka) for more than 50 neurons, revealed by its high false alarm rates, can be explained by the fact that as the number of neurons increases, the quantization error e_q(x_new, w_{i*}) also tends to decrease, while the decision threshold τ+ computed in Eq. (6) tends to remain approximately constant. So, as the network achieves a better representation of the data, it becomes more and more rare to observe e_q(x_new, w_{i*}) < τ+. The opposite reasoning explains its good performance for SOM architectures with fewer than 50 neurons.

The second set of simulations evaluates the sensitivity of the SOM-based anomaly detectors to changes in the number of training epochs, as shown in Fig. 2. The training parameters were the same as those used for the first set of simulations, except that the number of neurons was fixed at 40. The overall performances remain the same as in Fig. 1, with the pair (SOM, DI) achieving the lowest false positive rates for fewer than 50 training epochs.

It is interesting to note that the performance of the pair (SOM, Tanaka) gets better as the number of training epochs increases. An intuitive explanation for this behavior may again be based on the very nature of Tanaka's test. Once the SOM has more time to converge, it better fits the data manifold. We can then observe that the quantization error e_q(x_new, w_{i*}) tends to decrease, while the detection threshold τ+ computed in Eq. (6) tends to remain constant. So, as the network achieves a better representation of the data, it becomes more and more rare to observe e_q(x_new, w_{i*}) > τ+, and hence the test is almost never positive for anomalies, even when the presented data vector is truly abnormal; thus we will have fewer false positive (false alarm) cases.

The third set of simulations compares the influence of the size of the training set on the SOM-based anomaly detectors. The purpose of these tests is to give a rough idea of which method requires less training data to obtain a high classification accuracy. In these simulations, the number of neurons and the number of training epochs were set to 40 and 50, respectively. In Fig. 3 we observe a decrease in the false positive rates as the training set grows. At the end, we observe an increase in these rates, caused by the need for more training epochs to better handle the larger amount of data.

Fig. 3. False positive rates versus the size of the training set. [Plot: false positive rates (%) for the DI, p-Value, Boxplot and Tanaka methods, for training sets of 10% to 90% of the total data set.]

As a general conclusion we can state that the best overall detection performances were provided by the pairs (SOM, DI) and (SOM, Tanaka). However, it is worth noting that the detection test implemented by the pair (SOM, DI) is computationally faster than the pair (SOM, Tanaka), since the decision threshold of Tanaka's method must be computed for every new input vector, while the two thresholds, τ− and τ+, of the proposed DI method are computed only once at the end of the SOM's training phase.
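The false positive rates plotted in Figs. 1–3 can be estimated directly from the quantization errors of the labelled test vectors. A sketch using the interval test of Section 4 as the example (variable names are illustrative):

```python
import numpy as np

def false_positive_rate(normal_test_errors, lo, hi):
    """Percentage of truly normal vectors flagged as abnormal by the
    interval test of Eq. (7), i.e. whose error falls outside [tau-, tau+]."""
    flagged = (normal_test_errors < lo) | (normal_test_errors > hi)
    return 100.0 * np.mean(flagged)
```

For the single-threshold tests of Section 3 the same computation applies with only the upper comparison against τ+.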
6. Conclusion and further work

In this paper we proposed a general procedure to design anomaly detection systems for mobile communication networks based on Kohonen's Self-Organizing Map (SOM). We illustrate through simulations that the proposed method performs better on average than standard methods with respect to the false positive (false alarm) rates produced.

Currently we are developing SOM-based anomaly detectors for serially correlated data, such as financial time series, in order to detect changes in regime and novelties associated with the temporal evolution of a given stock market. Applications in engineering, such as the detection of faults in an electric induction motor, are also being developed.

References

[1] A. Baraldi and P. Blonda, A survey of fuzzy clustering algorithms for pattern recognition-part II, IEEE Transactions on Systems, Man, and Cybernetics B-29(6) (1999), 786–801.
[2] A. Flexer, On the use of self-organizing maps for clustering and visualization, Intelligent Data Analysis 5(5) (2001), 373–384.
[3] F. Gonzalez and D. Dasgupta, Neuro-immune and Self-Organizing Map Approaches to Anomaly Detection: A Comparison, Proceedings of the First International Conference on Artificial Immune Systems (Canterbury, UK), 2002, 203–211.
[4] D.J. Hand, H. Mannila and P. Smyth, Principles of Data Mining, MIT Press, 2001.
[5] T. Harris, A Kohonen SOM based machine health monitoring system which enables diagnosis of faults not seen in the training set, Proceedings of the International Joint Conference on Neural Networks (IJCNN'93) 1 (1993), 947–950.
[6] V.J. Hodge and J. Austin, A survey of outlier detection methodologies, Artificial Intelligence Review 22(2) (2004), 85–126.
[7] A.J. Höglund, K. Hätönen and A.S. Sorvari, A Computer Host-Based User Anomaly Detection System Using the Self-Organizing Map, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN'00) (Como, Italy), Vol. 5, 2000, 411–416.
[8] T. Kohonen, The self-organizing map, Proceedings of the IEEE 78(9) (1990), 1464–1480.
[9] J. Laiho, M. Kylväjä and A. Höglund, Utilisation of Advanced Analysis Methods in UMTS Networks, Proceedings of the IEEE Vehicular Technology Conference (VTS/spring) (Birmingham, Alabama), 2002, 726–730.
[10] J. Laiho, K. Raivio, P. Lehtimäki, K. Hätönen and O. Simula, Advanced analysis methods for 3G cellular networks, IEEE Transactions on Wireless Communications 4(3) (2005), 930–942.
[11] H.-J. Lee and S. Cho, SOM-based novelty detection using novel data, Lecture Notes in Computer Science 3578 (2005), 359–366.
[12] A. Muñoz and J. Muruzábal, Self-organising maps for outlier detection, Neurocomputing 18 (1998), 33–60.
[13] R. Prasad, W. Mohr and W. Konhäuser, Third Generation Mobile Communication Systems – Universal Personal Communications, Artech House Publishers, 2000.
[14] J.C. Principe, N.R. Euliano and W.C. Lefebvre, Neural and Adaptive Systems: Fundamentals Through Simulations, John Wiley & Sons, 2000.
[15] K. Raivio, O. Simula, J. Laiho and P. Lehtimäki, Analysis of Mobile Radio Access Network Using the Self-Organizing Map, Proceedings of the IFIP/IEEE International Symposium on Integrated Network Management (Colorado Springs, Colorado), 2003, 439–451.
[16] J.W. Sammon, Jr., A nonlinear mapping for data structure analysis, IEEE Transactions on Computers C-18 (1969), 401–409.
[17] M. Tanaka, M. Sakawa, I. Shiromaru and T. Matsumoto, Application of Kohonen's Self-Organizing Network to the Diagnosis System for Rotating Machinery, Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC'95), Vol. 5, 1995, 4039–4044.