Proc. of Int. Conf. on Recent Trends in Information, Telecommunication and Computing, ITC
Effect of Decomposition of Classes on Software Architecture Stability

Sumeet Kaur Sehra¹, Harpreet Kaur² and Dr. Navdeep Kaur³

¹Assistant Professor, GNDEC, Ludhiana-06, India
²Student, GNDEC, Ludhiana-06, India
³Associate Professor, SGGSWU, Fatehgarh Sahib, India

DOI: 02.ITC.2014.5.114
© Association of Computer Electronics and Electrical Engineers, 2014
Abstract— The stability of software is related to the decomposition of its classes. In most software, a major part of the code suffers from the Yoyo problem, with multiple issues related to the readability, understandability and maintainability of the code. Due to these issues, there is a need to rethink, redesign and re-factor these pieces of code. The best way is to simplify the inter-relationships of class objects in such a manner that the code becomes concise and complies with the Liskov Substitution Principle through the decomposition of classes. However, this may lead to unknown or unwanted issues affecting the stability of the overall application, which may even lead to software erosion.

Index Terms— Design Stability, Metrics, Software Architecture, Dependency, Software Modularity
I. INTRODUCTION

Object-oriented programming (OOP) is expected to support software maintenance and reuse by introducing concepts like abstraction, encapsulation, aggregation, inheritance and polymorphism. However, years of experience have revealed that this support is not enough [1, 5, 12]. Whenever a crosscutting concern needs to be changed, a developer has to make a considerable effort to localize the code that implements it [10]. This may require him to inspect many different modules, since the code may be scattered across several of them [4, 6, 11]. Object-oriented programming addresses three major software engineering goals, namely modularity, extensibility and flexibility, as shown in Figure 1. An essential problem with traditional programming paradigms is the tyranny of the dominant decomposition: no matter how well a software system is decomposed into modules, there will always be concerns (typically non-functional ones) whose code cuts across the chosen decomposition. The implementation of these crosscutting concerns spreads across different modules, which has a negative impact on maintainability, stability and reusability [8].

Decomposition of classes is important because it influences the extendibility and stability of the system architecture [5]. Decomposing classes supports overall system modularity and minimizes the manifestation of ripple effects in the presence of heterogeneous changes. It has been empirically observed that design stability is directly dependent on the underlying decomposition mechanisms [13]. For instance, certain studies have found that the versatility of multiple inheritance is one of the main causes of ripple effects in OO systems. Superior modularity and stability are obtained through the use of new composition mechanisms, and it is often claimed that such mechanisms support enhanced incremental development and avoid early design degradation.

Modularity has been playing a significant role in the context of software design.
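To make the scattering of a crosscutting concern concrete, consider the following illustrative Python fragment (ours, not from the paper), in which a logging concern is duplicated across otherwise unrelated modules; changing the logging policy then requires touching every module:

```python
import logging

logging.basicConfig(level=logging.INFO)

class OrderService:
    """Business logic tangled with a logging concern."""
    def place_order(self, item: str) -> None:
        logging.info("place_order called with %s", item)  # crosscutting code
        # ... core order-handling logic ...
        logging.info("place_order finished")              # crosscutting code

class InventoryService:
    """A second module repeating the same scattered concern."""
    def reserve(self, item: str) -> None:
        logging.info("reserve called with %s", item)      # duplicated concern
        # ... core inventory logic ...

# Changing the logging policy now requires edits in every module above.
```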
Figure 1. Software Engineering Goals
Many software engineering methods and techniques are based on the premise that a modular structure of software can improve its quality to some extent [3]. According to a number of quality models, modularity is an internal quality attribute that influences external quality attributes such as maintainability and reliability [17]. It can be considered a fundamental engineering principle as it allows, among other things:
• To develop different parts of the same system by distinct people.
• To test systems in a simultaneous fashion.
• To substitute or repair defective parts of a system without affecting other parts.
• To reuse existing parts in different contexts.
• To restrict change propagation.

Modularity can be defined as the degree to which a system or program is composed of discrete components such that a change to one component has minimal impact on other components [9]. IEEE's definition is closely related to Booch's (1994), which states that modularity is the property of a system that has been decomposed into a set of cohesive and loosely coupled modules. More recently, [7, 10] have defined a theory that considers modularity a key factor in innovation and market growth. This theory can be applied to different industries, including software development.
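As a brief illustration of the substitution benefit listed above (our sketch, not from the source), the following Python fragment isolates change behind a stable interface so that one module can be replaced without touching its clients:

```python
from abc import ABC, abstractmethod

class Storage(ABC):
    """Stable interface: the design rule visible to other modules."""
    @abstractmethod
    def save(self, key: str, value: str) -> None: ...

class InMemoryStorage(Storage):
    def __init__(self) -> None:
        self._data = {}
    def save(self, key: str, value: str) -> None:
        self._data[key] = value

class FileStorage(Storage):
    def save(self, key: str, value: str) -> None:
        with open(key + ".txt", "w") as f:
            f.write(value)

def client(store: Storage) -> None:
    # The client depends only on the interface, so either module can be
    # substituted or repaired without change propagation.
    store.save("report", "stable architecture")

client(InMemoryStorage())  # swap in FileStorage() without touching client()
```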
II. MODULARITY METRICS

A. CK Metrics

Chidamber and Kemerer proposed a first version of these metrics, and the definitions of some of them were later improved. Only three of the seven CK metrics are available for a UML class diagram. The metrics are discussed in Table I.

III. DECOMPOSITION PROCESS

The data collected for the coupling, cohesion and size metrics have mostly favoured the decomposition implementations [14]. In fact, decomposition mechanisms show improvements in modularity, despite some shortcomings in expressiveness. The concept of modularity is employed in several scientific domains such as computer science, management, engineering and manufacturing. While no single generally accepted definition exists, the concept is most commonly associated with the process of subdividing a system into several subsystems. This decomposition of complex systems is said to result in a certain degree of complexity reduction and to assist change by allowing modifications at the level of a single subsystem instead of having to adapt the whole system at once [14, 16, 18]. Design rules are visible to all subsystems so that they can be assembled into one working system later on, while the other design parameters are visible only within a module itself. Modularity allows multiple (parallel) experiments [15]. Systems evolution is generally characterized by the following six modular operators:
1. Splitting a design and its tasks into modules
2. Substituting one module design for another
3. Augmenting, i.e., adding a new module to the system
4. Excluding a module from the system
5. Inverting, i.e., isolating common functionality in a new module and creating new design rules
6. Porting a module to another system
TABLE I. CK METRICS

1. WMC (Weighted Methods per Class): defined as WMC = Σ_{i=1}^{n} c_i, where c_1, ..., c_n are the complexities of the methods M_1, ..., M_n of a class. If all method complexities are considered to be unity, then WMC = n, the number of methods.
2. DIT (Depth of Inheritance Tree): the depth of inheritance of the class. In cases involving multiple inheritance, the DIT is the maximum length from the node to the root of the tree.
3. NOC (Number of Children): the number of immediate subclasses subordinated to a class in the class hierarchy.
4. CBO (Coupling Between Objects): classes are coupled if methods or instance variables in one class are used by the other. CBO for a class is the number of other classes coupled with it.
5. RFC (Response For a Class): the count of all methods in the class plus all methods called in other classes.
6. NDepIN (Number of Dependencies In): the number of classes that depend on a given class. When the dependencies are reduced, the class can function more independently. This metric was introduced by Brian.
7. NDepOUT (Number of Dependencies Out): the number of classes on which a given class depends. When the metric value is low, the class can function more independently.
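For illustration (ours, not part of the original study), the following Python sketch computes simplified proxies for three of the metrics in Table I on plain Python classes; CBO and RFC would require full dependency analysis and are omitted:

```python
import inspect

def wmc(cls) -> int:
    """WMC with all method complexities taken as unity: the method count."""
    return sum(1 for member in cls.__dict__.values()
               if inspect.isfunction(member))

def dit(cls) -> int:
    """Depth of Inheritance Tree, counting `object` as the root (depth 0)."""
    return max((dit(base) + 1 for base in cls.__bases__), default=0)

def noc(cls) -> int:
    """Number of Children: immediate subclasses only."""
    return len(cls.__subclasses__())

class Base:
    def a(self): pass

class Child(Base):
    def b(self): pass
    def c(self): pass

print(wmc(Child), dit(Child), noc(Base))  # 2, 2 (object -> Base -> Child), 1
```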
A. Class Decomposition

In class decomposition, a problem is broken down into a set of sub-problems according to the inherent class relations among the data. In contrast to explicit decomposition, this method requires only some common knowledge concerning the class relations among the data.

B. Inheritance

Inheritance is the idea of one set inheriting the properties of another set [2]; it is also known as class composition. For example, suppose class A contains two member functions, add and subtract, and class B contains two different functions, multiply and divide. If we want to use all of these functions through one object, we need inheritance: class B inherits all the public properties of class A, but class B cannot use the private properties of class A.
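A minimal Python rendering of the example just described (our sketch; Python's name mangling stands in for private members, since Python has no strict access control):

```python
class A:
    def __init__(self) -> None:
        self.__secret = 42          # name-mangled: closest analogue to "private"

    def add(self, x, y):
        return x + y

    def subtract(self, x, y):
        return x - y

class B(A):                          # B inherits A's public interface
    def multiply(self, x, y):
        return x * y

    def divide(self, x, y):
        return x / y

b = B()
print(b.add(6, 2), b.subtract(6, 2), b.multiply(6, 2), b.divide(6, 2))
# b.__secret would raise AttributeError: the "private" member is not exposed to B
```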
IV. METHODOLOGY

The methodology for calculating the stability of software consists of several steps. It starts from building a dataset of the projects and identifying their components. Those parts where the degree of dependency is high are then identified. After the degree of stability is calculated, decomposition is performed, and the stability is then re-calculated. The methodology is shown as a flow chart in Figure 2.

Figure 2. Methodology for calculating the stability of software
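The flow of Figure 2 can be paraphrased in code. The following self-contained Python sketch uses a toy dependency graph with fan-out as a stand-in for the paper's degree-of-dependency and stability measures (our assumption, not the authors' exact formulas):

```python
# Steps 1-2: a toy dataset of components and their outgoing dependencies.
deps = {
    "Billing": {"DB", "Mail", "PDF", "Auth"},   # heavily coupled component
    "Search":  {"DB"},
    "Auth": set(), "DB": set(), "Mail": set(), "PDF": set(),
}

def fan_out(c):
    return len(deps[c])

# Step 3: flag components whose degree of dependency is high.
hotspots = [c for c in deps if fan_out(c) > 2]

# Step 4: stability proxy: fewer outgoing dependencies = more stable.
before = {c: fan_out(c) for c in deps}

# Step 5: decompose each hotspot into two more cohesive halves.
for c in hotspots:
    out = sorted(deps.pop(c))
    half = len(out) // 2
    deps[c + "_core"] = set(out[:half])
    deps[c + "_io"] = set(out[half:])

# Step 6: re-calculate and compare.
after = {c: fan_out(c) for c in deps}
print("before:", before)
print("after: ", after)
```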
V. RESULTS

The purpose of the evaluation is to determine which classifier is most accurate for automation. The process of finding unstable and stable components in a project is based on the metrics. The results shown below are based on the dataset before and after the application of decomposition. The performance of the different classifiers (Logistic, RBF, SVM and Naïve Bayes) was compared in terms of Mean Absolute Error, Area Under the Curve, False Positive Rate, Kappa Statistics, Precision, Recall, Root Mean Square Error and True Positive Rate. The detailed results are discussed in the following sections.

Stability is considered by many to be at the core of process management. It is central to each organization's ability to produce products according to plan and to improve processes so as to produce better and more competitive products. The stability of a process with respect to any given attribute is determined by measuring the attribute and tracking the results over time. If one or more measurements fall outside the range of chance variation, or if systematic patterns are apparent, the process may not be stable. We must then look for the causes of deviation, and remove any that we find, if we want to achieve a stable and predictable state of operation.
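For readers who wish to reproduce a comparison of this shape, the sketch below trains scikit-learn stand-ins for the four classifiers on synthetic data and computes the same measures; it is illustrative only (an RBF-kernel SVM stands in for an RBF network, the data are synthetic, and the numbers will not match the paper's results):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import (mean_absolute_error, roc_auc_score,
                             cohen_kappa_score, mean_squared_error,
                             confusion_matrix)

# Synthetic stand-in for a metric dataset labelled stable/unstable.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

classifiers = {
    "Logistic": LogisticRegression(max_iter=1000),
    "RBF":      SVC(kernel="rbf"),      # stand-in for an RBF network
    "SVM":      SVC(kernel="linear"),
    "NB":       GaussianNB(),
}

for name, clf in classifiers.items():
    pred = clf.fit(Xtr, ytr).predict(Xte)
    a, b, c, d = confusion_matrix(yte, pred).ravel()  # TN, FP, FN, TP
    # AUC from hard labels for simplicity; decision scores would be more usual.
    print(f"{name:8s} MAE={mean_absolute_error(yte, pred):.3f} "
          f"AUC={roc_auc_score(yte, pred):.3f} "
          f"Kappa={cohen_kappa_score(yte, pred):.3f} "
          f"RMSE={mean_squared_error(yte, pred) ** 0.5:.3f} "
          f"FPR={b / (a + b):.3f} TPR={d / (c + d):.3f}")
```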
A. Mean Absolute Error

The mean absolute error (MAE) is a quantity used to measure how close forecasts or predictions are to the eventual outcomes. It is an average of the absolute errors:

MAE = \frac{1}{n} \sum_{i=1}^{n} |f_i - y_i|

where f_i is the prediction and y_i is the true value. The closer the predictions are to the actual values, the smaller the MAE. A comparative graph of the MAEs for the classifiers under investigation is shown in Figure 3.
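A tiny worked check of the formula, with made-up numbers:

```python
# MAE = (1/n) * sum(|f_i - y_i|); illustrative values only.
preds = [0.9, 0.2, 0.8, 0.1]
truth = [1,   0,   1,   0  ]
mae = sum(abs(f - y) for f, y in zip(preds, truth)) / len(truth)
print(mae)  # (0.1 + 0.2 + 0.2 + 0.1) / 4 = 0.15
```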
Figure 3. Comparison of MAE for different classifiers
Figure 4. A comparison of AUC
As is evident from the graph in Figure 3, the values of MAE for RBF and Logistic are very high, while MAE is lowest for Naïve Bayes, followed by SVM. Since a smaller MAE means the predictions are closer to the actual values, the Naïve Bayes classifier performs satisfactorily on this parameter for predicting the stability and instability of software.

B. Area Under the Curve

The area under the ROC curve (AUC) is a well-known measure of ranking performance, estimating the probability that a random positive is ranked before a random negative, without committing to a particular decision threshold. It is also often used as a measure of aggregated classification performance, on the grounds that AUC in some sense averages over all possible decision thresholds and operating conditions. AUC can be interpreted as the expected true positive rate, averaged over all false positive rates. For any given classifier we do not have direct access to the false positive rate, and so we average over possible decision thresholds. The larger the area under the curve, the better the performance. A comparison of AUC for the classifiers being studied is depicted in Figure 4. It indicates that AUC does not vary significantly across the classifiers. A similar AUC trend for all the classifiers indicates that they are equally effective in predicting true positive rates averaged over all false positive rates.

C. False Positive Rate

The false positive rate (FP) is the proportion of negative cases that were incorrectly classified as positive, as calculated using the equation
FP = \frac{b}{a + b}

where b is the number of negative instances incorrectly predicted as positive and a is the number of negative instances correctly predicted as negative. A high false positive rate indicates that the algorithm makes a large number of incorrect predictions and hence is not reliable.
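The quantities a and b (and the c and d used for the true positive rate in Section F) can be read directly off a confusion matrix. A small illustrative example using scikit-learn:

```python
from sklearn.metrics import confusion_matrix

actual    = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
predicted = [0, 0, 0, 1, 1, 1, 1, 1, 0, 1]

# ravel() order for binary labels: a = TN, b = FP, c = FN, d = TP
a, b, c, d = confusion_matrix(actual, predicted).ravel()

fpr = b / (a + b)   # negatives wrongly flagged positive (Section C)
tpr = d / (c + d)   # positives correctly identified (Section F)
print(a, b, c, d, round(fpr, 2), round(tpr, 2))  # 3 1 1 5 0.25 0.83
```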
Figure 5. False Positive Rate for different algorithms
Figure 6. Kappa Statistics
Figure 5 indicates that the RBF and SVM classifiers exhibit low false positive rates, with RBF showing the lowest FPR. The FPR values for the Logistic and Naïve Bayes algorithms are the highest. A low false positive rate is an indication of a high level of accuracy.

D. Kappa Statistics

The kappa statistic is a measure of inter-rater (or inter-annotator) agreement for qualitative (categorical) items. It is generally considered a more robust measure than a simple percent-agreement calculation, since κ takes into account the agreement occurring by chance. Kappa measures the agreement between two raters who each classify N items into C mutually exclusive categories. Landis and Koch characterized values < 0 as indicating no agreement, 0–0.20 as slight, 0.21–0.40 as fair, 0.41–0.60 as moderate, 0.61–0.80 as substantial, and 0.81–1 as almost perfect agreement. Kappa statistics were used to judge the accuracy of the different classifiers. The value of the kappa statistic is equal to 1 for Naïve Bayes and RBF, indicating almost perfect agreement. Figure 6 depicts the results for the kappa statistics.

E. Root Mean Square Error

The Root Mean Square Error (RMSE), also called the root mean square deviation (RMSD), is a frequently used measure of the difference between values predicted by a model and the values actually observed from the environment being modelled. These individual differences are also called residuals, and the RMSE serves to aggregate them into a single measure of predictive power. The RMSE of a model prediction with respect to the estimated variable X_model is defined as the square root of the mean squared error:

RMSE = \sqrt{\frac{\sum_{i=1}^{n} (X_{obs,i} - X_{model,i})^2}{n}}
where X_{obs,i} is the observed value and X_{model,i} is the modelled value at time/place i. Figure 7 depicts the root mean squared error for the classifiers being studied. Since the RMSE represents the difference between prediction and reality, the lower the RMSE, the closer the prediction is to reality. Figure 7 reveals that the root mean square value is high for Logistic and SVM, which can be due to a large difference between the observed and modelled values, or to a small value of n, which makes the overall term larger. For RBF and NB the RMSE value is very low, and it is lowest for Naïve Bayes, which may be due to the small difference between the observed and modelled values.

F. True Positive Rate

The true positive rate (TP) is the proportion of positive cases that were correctly identified, as calculated using the equation

TP = \frac{d}{c + d}
Figure 7. Root mean squared error for different classifiers
Figure 8. True Positive rate for different classifiers
where d is the number of correct predictions that an instance is positive, and c is the number of incorrect predictions that an instance is negative. Figure 8 depicts the true positive rates for the different algorithms. A high true positive rate indicates that the classifier produces predictions very near the specified criteria. Figure 8 reveals that the true positive rate is maximum for SVM, which occurs when c, the number of positive instances incorrectly predicted as negative, is zero, so that the ratio d/(c + d) equals 1. For Logistic, NB and RBF the value is less than 1, owing to a non-zero c inflating the denominator (c + d).

VI. CONCLUSION

A summary of the results of the study is shown in Table II.
TABLE II. SUMMARY OF THE RESULTS

| Metric/Algorithm | LR Before | LR After | RBF Before | RBF After | SVM Before | SVM After | NB Before | NB After |
|------------------|-----------|----------|------------|-----------|------------|-----------|-----------|----------|
| MAE              | 1.9       | 1        | 1.75       | 1         | 0.9        | 0.68      | 0.5       | 0.33     |
| AUC              | 0.87      | 1        | 0.9        | 1         | 0.9        | 1         | 0.95      | 1        |
| KS               | 0.89      | 0.98     | 0.978      | 1         | 0.87       | 0.967     | 0.958     | 1        |
| RMSE             | 0.089     | 0.07     | 0.0981     | 0.1       | 0.086      | 0.079     | 0.006     | 0.0021   |
| FP               | 0.012     | 0.01     | 0.01       | 0.002     | 0.015      | 0.006     | 0.018     | 0.01     |
| TP               | 0.85      | 0.99     | 0.9        | 0.96      | 0.97       | 1         | 0.93      | 0.99     |
The results of the study indicate that the Naïve Bayes classifier is better for detecting the stability of software. The least MAE indicates that there is minimal deviation in prediction. AUC, kappa value, TP, Precision and Recall all have values nearly equal to 1. This is highly significant since, as already stated, AUC describes effectiveness in predicting true positive rates averaged over all false positive rates; a value of 1 effectively means successful prediction of true positive rates. Unity values of the kappa statistic, True Positive rate (TP) and Precision (accuracy) further reinforce the effectiveness of the Naïve Bayes classifier. The Recall value is an indicator of the ability of the algorithm to reproduce the same results over a period of time. A Recall value of 0.99 for Naïve Bayes indicates that the algorithm, when subjected to the same conditions at different times, gives the same results, indicating high reliability. It is clear from the results that, after implementing decomposition, the dataset became more discriminable and the difference between the classes increased, making classification easier for all algorithms. This may be attributed to the fact that the number of classes/components for which accuracy is calculated increased, which changes the previous metric values and raises the new values.

VII. FUTURE SCOPE

In our current research work we have technically explored how stability behaves when decomposition leads to a change in the count of concrete classes, abstract classes, interfaces and methods that keep the class objects coupled and cohesive in nature. To begin the research we explored how the Liskov Substitution Principle and the Yoyo problem need to be considered for having stable, mature and reliable software. As future scope, we suggest that more metrics with a deep causal relation to the stability of the software be developed and validated, for understanding the effect on not only the stability but also the readability of the software/application; we also intend to focus on artificial intelligence for automating the process.

REFERENCES

[1] Adam Przybyłek, "Systems Evolution and Software Reuse in Object-Oriented Programming and Aspect-Oriented Programming", University of Gdansk, 2011.
[2] Arti Chhikara and R.S. Chhillar, "Evaluating the Impact of Different Types of Inheritance on the Object Oriented Software Metrics", International Journal of Enterprise Computing and Business Systems, July 2011.
[3] A. Mockus, D. Weiss and P. Zhang, "Understanding and Predicting Effort in Software Projects", Proc. 25th International Conference on Software Engineering (ICSE 2003), Portland, Oregon, May 3-10, 2003, pp. 274-284.
[4] A. Podgurski and L. Clarke, "A Formal Model of Program Dependencies and its Implications for Software Testing, Debugging, and Maintenance", IEEE Transactions on Software Engineering, vol. 16, no. 9, 1990, pp. 965-979.
[5] "Dependable Software for Undependable Hardware", Industrial Embedded Systems (SIES), 7th IEEE International Symposium on Digital Object, 2012.
[6] Foad Dabiri and Miodrag Potkonjak, "Hardware Aging-Based Software Metering", Proceedings of the Conference on Design, Automation and Test in Europe, 2009, pp. 460-465.
[7] Guang-yi Tang and Hong-wei Xuan, "Research on Measurement of Software Package Dependency based on Component", Journal of Software, vol. 7, no. 9, September 2012.
[8] E. Gamma, R. Helm, R. Johnson and J. Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley Professional Computing Series, London, UK, 1995.
[9] Herwig Mannaert, Jan Verelst and Kris Ven, "Towards Evolvable Software Architectures based on Systems Theoretic Stability", published online 27 January 2011 in Wiley Online Library.
[10] José M. Conejero, Eduardo Figueiredo, Alessandro Garcia, Juan Hernández and Elena Jurado, "Early Crosscutting Metrics as Predictors of Software Instability", Lecture Notes in Business Information Processing, vol. 33, 2009, pp. 136-156.
[11] Jennifer Louise Bevan and E.J. Whitehead, "Identification of Software Instabilities", Proc. 10th Working Conference on Reverse Engineering (WCRE 2003), 13-16 Nov. 2003.
[12] J. Stafford and A. Wolf, "Architecture-Level Dependence Analysis for Software Systems", Int'l Journal of Software Engineering and Knowledge Engineering, vol. 11, no. 4, 2001, pp. 431-451; I. Levendel, "The Consequences of Variability in Software", 12th IEEE International On-Line Testing Symposium (IOLTS 2006), 2006.
[13] M. M. Lehman and J. F. Ramil, "EpiCS: Evolution Phenomenology in Component-Intensive Software", Proc. Seventh IEEE Workshop on Empirical Studies of Software Maintenance (WESS 2001), November 2001.
[14] Mohammad Alshayeb and Wei Li, "An Empirical Study of System Design Instability Metric and Design Evolution in an Agile Software Process", Journal of Systems and Software, vol. 74, no. 3, 1 February 2005, pp. 269-274.
[15] Maya Yadav, Pradeep Baniya and Ganesh Wayal, "Comparison Between Inheritance & Interface UML Design Through the Coupling Metrics", International Journal of Engineering and Advanced Technology (IJEAT), 5 June 2012.
[16] Phil Greenwood, Thiago Bartolomei, Eduardo Figueiredo, Marcos Dosea, Alessandro Garcia, Nelio Cacho, Cláudio Sant'Anna, Sergio Soares, Paulo Borba, Uirá Kulesza and Awais Rashid, "On the Impact of Aspectual Decompositions on Design Stability: An Empirical Study", 2007.
[17] Peter De Bruyn and Herwig Mannaert, "Towards Applying Normalized Systems Concepts to Modularity and the Systems Engineering", Proc. The Seventh International Conference on Systems, 2012.
[18] E. Shihab, C. Bird and T. Zimmermann, "The Effect of Branching Strategies on Software Quality", 2012 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), 2012, pp. 301-310.