Effective Bug Triage using Software Data Reduction Techniques

Page 1

IJIRST –International Journal for Innovative Research in Science & Technology| Volume 4 | Issue 1 | June 2017 ISSN (online): 2349-6010

Effective Bug Triage using Software Data Reduction Techniques Rinku A. Chaudhari ME Student Department of Computer Engineering P.V.P.I.T Bavdhan, Pune, Maharashtra, India

Prof. Sarika V. Bodake Assistant Professor Department of Computer Engineering P.V.P.I.T Bavdhan, Pune, Maharashtra, India

Abstract Bug triage is a very much important step during bug fixing. Bug triage is the way toward fixing bug whose primary target is to accurately apportion a designer to another bug additionally taking care of. Many software organizations spend their too much cost in managing these bugs. To reduce the time cost in manual work and to improve the working of programmed bug triage, two procedures are connected in particular content characterization and double arrangement. In writing different papers address the issue of information diminishment for bug triage, i.e., how to lessen the scale and enhance the nature of bug information. By joining the example determination and the component choice calculations to at the same time reduce the information scale and upgrade the correctness of the bug reports in the bug triage. According to writing, need to build up a powerful model for doing information lessening on bug information set which will decrease the size of the information and increment the nature of the information, by minimizing the time and cost to get accurate result. System produces bug report and assigns that bug to appropriate developer according to his domain. Keywords: Bug Triage, Data Reduction, Instance Selection, Feature Selection, Data Mining _______________________________________________________________________________________________________ I.

INTRODUCTION

Various product organizations spend a large portion of the cash in settling the bugs. Vast programming ventures have bug storehouse that gathers together all the data identified with bugs. In bug store, every product bug has a bug report. The bug report comprises of literary data with respect to the bug and overhauls identified with status of bug settling [1]. Once a bug report is shaped, a human triager doles out this bug to a designer, who will attempt to settle this bug. This engineer is recorded in a thing allocated to the appointed to will change to another designer if the already relegated engineer can't settle this bug [2]. The way toward allotting a right developer for settling the bug is called bug triage. Bug triage is a standout amongst the most tedious stride in treatment of bugs in programming ventures for solving it [3]. Manual bug triage by a human triager is tedious and blunder inclined since the quantity of day by day bugs is substantial and absence of learning in engineers about all bugs. Due to every one of these things, bug triage brings 5678 about costly time misfortune, high cost and low precision. The data put away in bug reports has two main primary difficulties [4]. Firstly the extensive scale information and also low nature of information. Because of vast number of daily reported bugs, the quantity of bug reports is scaling up in the store [5]. Uproarious and repetitive bugs are debasing the nature of bug reports. In this paper a powerful bug triage framework is proposed which will lessen the bug information to spare the 745-63. II. RELATED WORK Present a system for discovering flaws in PHP Web applications that depends on joined concrete and typical execution [1]. The work is novel in a few regards. To start with, the strategy distinguishes runtime mistakes as well as utilizations a HTML validator as a prophet to decide circumstances where deformed HTML is made. Second, address various PHP-particular issues, for example, the recreation of intelligent client input that happens when UI components on created HTML pages are enacted, bringing about the execution of extra PHP scripts. Third, we play out a robotized examination to minimize the span of disappointment actuating inputs. Audits the issues with utilizing basic K-Means as a part of the order of datasets [2]. The viability of Quad Tree based EM grouping calculation in foreseeing deficiencies while characterizing a dataset, when contrasted with other existing calculations, for example, K-Means has been assessed. The Quad Tree approach appoints proper introductory group focuses and wipes out the exceptions. K-Means is thought to be one of the most straightforward strategies to group information. Nonetheless, the proposed EM calculation is utilized to group information adequately. Consolidating the Quad Tree approach and the EM calculation gives a bunching strategy that not just fits the information better in the groups additionally tries to make them minimal and more significant. Utilizing EM alongside Quad Tree makes the grouping procedure quicker. With K-implies, merging is not ensured but rather EM ensures rich union. It conducts a contextual analysis on high effect bugs, which characterized bugs answered to four open source ventures into six sorts of high effect bugs [3].

All rights reserved by www.ijirst.org

214


Effective Bug Triage using Software Data Reduction Techniques (IJIRST/ Volume 4 / Issue 1/ 036)

For the situation study, one hundred bug reports were physically reviewed for every venture and are arranged into six sorts of high effect bugs in light of past reviews which concentrate on high effect bugs. Our contextual investigation expected to uncover dispersions of high effect bugs in reported bugs and covered connections among high effect bugs. It has actualized cost-triage calculation which abuses the cost and exactness of bug settling or bug forecast [4]. This paper has likewise actualized the element determination procedure which lessens the quantity of components utilized by a machine learning classifier for bug forecast. It is investigated the utilization of highlight determination strategies to foresee programming bugs. A critical sober minded outcome is that component determination can be performed in augmentations of half of every single outstanding element, permitting it to continue rapidly. Somewhere around 3.12 and 25 percent of the aggregate list of capabilities yielded ideal characterization comes about. The lessened list of capabilities allows better and speedier bug expectations. The procedure is quick performing and can be connected to anticipating bugs on different activities [5]. The most imperative outcomes originating from utilizing the element determination process are found in Table, which presents F-measure improved outcomes for the Naive Bayes classifier. A helpful even minded outcome is that component determination can be performed in augmentations of half of every single residual element, permitting it to continue rapidly. The normal carriage is accuracy is very high at 0.97, with a sensible review of 0.70. This output beats comparative classifier-based bug forecast strategies and the outcomes prepare for reasonable reception of classifier-based bug expectation. From the perspective of an engineer accepting bug expectations on their work, these figures imply that if the classifier says a code change has a bug, it is quite often right.work cost of engineers. It additionally expects to manufacture a brilliant arrangement of bug information by expelling the excess and non-useful bug reports [7]. Drawback of Existing System    

Bugs are manually triaged by an expert developer. Manual bug triage is expensive in time cost and low in accuracy. Lack of data reduction technique. Developers needs to study whole bug repository to solve bug. III. PROPOSED SCHEME

Manual Bug fixing is time consuming task and didn’t get accurate result. So that proposed system is provided. There is problem of getting accurate bug solution according to domain. In existing approach, get reduced bug dataset and high quality bug dataset. For that point, proposed system is provided. We used existing system instance selection and feature selection for reducing bug dataset. And additionally use Top-K pruning algorithm for improving results of data reduction quality as compared to existing system and get domain wise bug solution.

Fig. 1: Proposed System Architecture

Bug triage is one of the most time consuming step in handling of bugs in software projects. Manual bug triage by a human triager is time consuming and error-prone since the number of daily bugs is large and lack of knowledge in developers about all bugs. Because of all these things, bug triage results in expensive time loss, high cost and low accuracy [1]. The information stored in bug reports has two main challenges. Firstly the large scale data and secondly low quality of data. Due to large number of daily reported bugs, the number of bug reports are added up in the repository. Noisy and redundant bugs are degrading the quality of bug reports. The effective bug triage system is proposed which will reduce the bug data to save the labour cost of developers. It also aims to build a high quality set of bug data by removing the redundant and non-informative bug reports .The proposed system consist of following modules:

All rights reserved by www.ijirst.org

215


Effective Bug Triage using Software Data Reduction Techniques (IJIRST/ Volume 4 / Issue 1/ 036)

Instance Selection Instance selection and feature selection are widely used techniques in data processing. For a given data set in a certain application, instance selection is to obtain a subset of relevant instances (i.e., bug reports in bug data) while feature selection aims to obtain a subset of relevant features (i.e., words in bug data). In our work, we employ the combination of instance selection and feature selection. Data Reduction In our work, to save the labor cost of developers, the data reduction for bug triage has two goals.  Reducing the data scale.  Improving the accuracy of bug triage. In contrast to modelling the textual content of bug reports in existing work, we aim to augment the data set to build a preprocessing approach, which can be applied before an existing bug triage approach. We explain the two goals of data reduction as follows. Create Bug Report According to error it will create the bug report. Assign Bug to Developer Assign bug report to appropriate developer. IV. IMPLEMENTATION PLAN Reducing Bug Data for Bug TRIAG Bug data reduction in our work, which is applied as a phase in data preparation of bug triage. We combine existing techniques of instance selection and feature selection to remove certain bug reports and words, i.e., in Fig. 2. A problem for reducing the bug data is to determine the order of applying instance selection and feature selection, which is denoted as the prediction of reduction orders.

Fig. 2: Illustration of reducing bug data for bug triage

All rights reserved by www.ijirst.org

216


Effective Bug Triage using Software Data Reduction Techniques (IJIRST/ Volume 4 / Issue 1/ 036)

Contribution 1)    

Bug Summery is generalized in pdf file. Pdf is very nice contribution for this project. Pdf contains details as follows. Bug Summary: We have use reduction algorithm here to reduce the result and show that in pdf. Bug Deadline: It contains date by which the bug should be solved. Bug Description: Here the details of bug are reported. This is nothing but the bug reported by the manager. Suggestions: Developer who has already worked on some or have some knowledge have a bug can be solved then he or she can give suggestion to the user. 2) Graphical representation for bug assigning and completion by developer.

Fig. 3: Project work flow

Algorithm Used Data reduction based on FS → IS FS- Feature Selection IS- Instance Selection Input: training set T with n words and m bug reports, reduction order FS→ IS Final number nF of words, Final number m1 of bug reports, Output: reduced data set TFT for bug triage Apply FS to n words of T and calculate objective values for all the words; Select the top nF words of T and generate a training set TF; Apply IS to m1 bug reports of TF; Terminate IS when the number of bug reports is equal to or less than ml and generate the final training set TFI. In Algorithm 1, we briefly describe how to reduce the bug Data based on FS ! IS. Given a bug data set, the output of Bug data reduction is a emerging new trend and reduced data set. Two algorithms FS and IS are applied sequentially. Note that in Step2, some of bug reports may be blank during feature Xuan et al.: towards effective bug triage with software data all the words in a bug report are removed. Such Blank bug reports are also removed in the feature selection. In our work, FS ! IS and IS ! FS are viewed as two Orders of bug data reduction. To avoid the bias from a single Algorithm, we examine results of four typical algorithms of Instance selection and feature selection. Naive Bayes Naive Bayes model is easy to build and particularly useful for very large data sets. Along with simplicity, Naive Bayes is also known to outperform even highly sophisticated classification methods. Bayes theorem provides a way of calculating posterior probability P(c|x) from P(c), P(x) and P(x|c). Look at the equation below:          

All rights reserved by www.ijirst.org

217


Effective Bug Triage using Software Data Reduction Techniques (IJIRST/ Volume 4 / Issue 1/ 036)

    1) 2) 3)

Above, P(c|x) is the posterior probability of class (c, target) given predictor (x, attributes). P(c) is the prior probability of class. P(x|c) is the likelihood which is the probability of predictor given class. P(x) is the prior probability of predictor. Step 1: Convert the data set into a frequency table Step 2: Create Likelihood table by finding the probabilities Step 3: Now, use Naive Bayesian equation to calculate the posterior probability for each class. The class with the highest posterior probability is the outcome of prediction. Mathematical Model

Let, S = {I, P, IO, O} Where, S = Set of bug report (text file) I = Set of inputs P = Set of processes IO = Intermediate output O = Set of outputs/ final outputs 1) I = {i} Where, i = Folder containing bug file in text format 2) P = {p1, p2, p3, p4} Where, p1 = Load data set, p2=Vector quantization and applying LVQ algorithm (instance selection), p3 = Feature selection, p4 = Result in the cluster file (new report). 3) IO = {io1, io2, io3} Where, io1 = Check the file format, io2 = Data reduction, io3 = Checking the path for output file. 4) O= {o} Where, o=cluster of document (new bug report). V. RESULT TABLES Following table shows result comes from the system performance, it shows that the required to assign bug to appropriate developer with respect to file size of bug. Table - 1 Time required to triage bug Sr. No Time(Sec) Image Size(KB) 1 100 7 2 160 8 3 200 11 4 310 12 5 400 13

Fig. 4: Developers / Solved Bugs

All rights reserved by www.ijirst.org

218


Effective Bug Triage using Software Data Reduction Techniques (IJIRST/ Volume 4 / Issue 1/ 036)

Fig. 5: Result graph

As shown in figure 5 the graph represents time require to assign bug to developer. If bug size is increasing rapidly then the time is also increased. Outcome of System Getting result from this dissertation, it’s just an actual result. It’s part of knowledge about Developer and Manager how they can work. The actual output of modules is as follows: Developer  Developer can register the details and self-login into the system.  Developers views the assign bug task and solves it and update the status. Manager  Manager login to the system and upload the bug data.  After uploading the bug data system do the data reduction and create appropriate report.  Manager can view the developer’s task status. VI. CONCLUSIONS AND FUTURE WORK Bug triage is an expensive walk of programming upkeep in both work cost and time cost. In this paper, we combine highlight assurance with case decision to diminish the span of bug data sets and also improve the data quality. To choose the demand of applying event decision and highlight decision for another bug data set, we remove properties of each bug data set and set up a perceptive model in perspective of bona fide data sets. We tentatively investigate the data diminishment for bug triage in bug storage facilities of two immense open source endeavours, to be particular Eclipse and Mozilla. Our work gives an approach to manage using procedures on data get ready to outline diminished and first rate bug data in programming headway and support. We have included many other, modules which are helpful in many ways. Those are Bug summery that will summarize the bug generated report in the form of pdf file. It will contain bug summery, bug deadline, bug description and suggestions by develop if any. Graphical representation for bug assigning and completion by developer is also provided. This makes the analysis easier for the higher authority to decide the developer to assign further bug to repair. In future work, we expect improving the eventual outcomes of data diminishment in bug triage to explore how to set up a first rate bug data set and handle a space specific programming task. For predicting reducing orders, we plan to pay attempts to find the potential relationship between the characteristics of bug data sets and the diminishment orders. In Future System used in as:  For IT industry to manage bug solution process task.  Used automatic bug triage in industry. REFERENCES [1] [2] [3] [4] [5] [6]

Shay Artzi, Adam Kie zun, Julian Dolby, Frank Tip, Danny Dig, Amit Paradkar, Senior Member, IEEE, and Michael D. Ernst,” Finding Bugs in Web Applications Using Dynamic Test Generation and Explicit-State Model Checking” 2010. Partha Sarathi Bishnu and Bandana Bhattacherjee, Member, IEEE,” Software Fault Prediction Using Quad Tree-Based K-Means Clustering Algorithm”, 2012. Yutaro Kashiwa Hayato Yoshiyuki “A Pilot Study of Diversity in High Impact Bugs” 2014. Shivkumar Shivaji, E. James Whitehead, Jr., Ram Akella “Reducing Features to Improve Bug Prediction.” 2009. Ram Akella, Sunghun Kim” Reducing Features to Improve Code Change-Based Bug Prediction” 2013. K. Balog, L. Azzopardi, and M. de Rijke, “Formal models for expert finding in enterprise corpora,” in Proc. 29th Annu. Int. ACM SIGIR Conf. Res. Develop. Inform. Retrieval, Aug. 2006, pp. 43–50.

All rights reserved by www.ijirst.org

219


Effective Bug Triage using Software Data Reduction Techniques (IJIRST/ Volume 4 / Issue 1/ 036) P. S. Bishnu and V. Bhattacherjee, “Software fault prediction using quad tree-based k-means clustering algorithm,” IEEE Trans. Knowl. Data Eng., vol. 24, no. 6, pp. 1146–1150, Jun. 2012. [8] H. Brighton and C. Mellish, “Advances in instance selection for instance-based learning algorithms,” Data Mining Knowl. Discovery, vol. 6, no. 2, pp. 153– 172, Apr. 2002. [9] S. Breu, R. Premraj, J. Sillito, and T. Zimmermann, “Information needs in bug reports: Improving cooperation between developers and users,” in Proc. ACM Conf. Comput. Supported Cooperative Work, Feb. 2010, pp. 301–310. [10] V. Bol_on-Canedo, N. S anchez-Maro~no, and A. Alonso-Betanzos, “A review of feature selection methods on synthetic data,” Knowl. Inform. Syst., vol. 34, no. 3, pp. 483–519, 2013. [7]

All rights reserved by www.ijirst.org

220


Turn static files into dynamic content formats.

Create a flipbook
Issuu converts static files into: digital portfolios, online yearbooks, online catalogs, digital photo albums and more. Sign up and create your flipbook.