2nd International Symposium on Search Based Software Engineering
September 7 – 9, 2010 Benevento, Italy
The Human Competitiveness of Search Based Software Engineering
Jerffeson Teixeira de Souza, Camila Loiola Maia, Fabrício Gomes de Freitas, Daniel Pinto Coutinho
Optimization in Software Engineering Group (GOES.UECE) State University of Ceará, Brazil
Nice to meet you,
Jerffeson Teixeira de Souza, Ph.D.
Professor, State University of Ceará, Brazil
http://goes.comp.uece.br/
prof.jerff@gmail.com
Our time will be divided as follows:
Part 01: Research Questions
Part 02: Experimental Design
Part 03: Results and Analyses
Part 04: Final Considerations
“The question regarding the human competitiveness of SBSE ... has already been raised ... no comprehensive work has been published to date.”
Mark Harman, The Current State and Future of Search Based Software Engineering, Proceedings of the International Conference on Software Engineering / Future of Software Engineering 2007 (ICSE/FOSE '07), Minneapolis: IEEE Computer Society, pp. 342-357, 2007.
why?
one may argue ...
!
The human competitiveness of SBSE is not in doubt within the SBSE community
But, even if that is the case,...
Strong research results on this issue would likely, at the very least, contribute to the increasing acceptance of SBSE outside its research community
!
?
Can the results generated by Search Based Software Engineering be said to be human competitive?
SBSE human competitiveness
but ...
How to evaluate the Human Competitiveness of SBSE?
“The result holds its own or wins a regulated competition involving human contestants (in the form of either live human players or human-written computer programs).”
FROM THE SOLD-OUT MUSEUM OF SANNIO ARENA, BENEVENTO, ITALY
SSBSE 2010
HUMANS VS MACHINE
THURSDAY, SEPTEMBER 9 – 11:30 CEST / 05:30 ET, LIVE ON PAY-PER-VIEW
FROM THE SOLD-OUT MUSEUM OF SANNIO ARENA, BENEVENTO, ITALY
SSBSE 2010
HUMANS VS SBSE ALGORITHMS
THURSDAY, SEPTEMBER 9 – 11:30 CEST / 05:30 ET, LIVE ON PAY-PER-VIEW
FROM THE SOLD-OUT MUSEUM OF SANNIO ARENA, BENEVENTO, ITALY
SSBSE 2010
HUMANS VS SBSE ALGORITHMS ... but which ones?
THURSDAY, SEPTEMBER 9 – 11:30 CEST / 05:30 ET, LIVE ON PAY-PER-VIEW
?
Can the results generated by Search Based Software Engineering be said to be human competitive?
SBSE human competitiveness
??
Can the results generated by Search Based Software Engineering be said to be human competitive?
SBSE human competitiveness
How do different metaheuristics compare in solving a variety of search based software engineering problems?
SBSE algorithms comparison
The Problems
The Next Release Problem
The Multi-Objective Next Release Problem
The Workgroup Formation Problem
The Multi-Objective Test Case Selection Problem

Motivation
They can be considered “classical” formulations
Together they cover three different general phases of the software development life cycle
THE NEXT RELEASE PROBLEM
Involves determining a set of customers which will have their selected requirements delivered in the next software release. This selection prioritizes customers with higher importance to the company and must respect a pre-determined budget:

$$\max \sum_{i \in S} w_i \quad \text{subject to} \quad \mathrm{cost}\Big(\bigcup_{i \in S} R_i\Big) \le B$$
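For concreteness, here is a minimal evaluation sketch of this formulation in Python; the names `weights`, `reqs_of_customer`, `req_cost` and `budget` are illustrative assumptions, not identifiers from the paper:

```python
# NRP evaluation sketch (illustrative): a candidate solution S is a set of
# customer indices; selections that exceed the budget are rejected outright.
def nrp_fitness(S, weights, reqs_of_customer, req_cost, budget):
    needed = set()
    for i in S:
        needed |= reqs_of_customer[i]       # union of the selected customers' requirements
    total_cost = sum(req_cost[r] for r in needed)
    if total_cost > budget:                 # cost(union of R_i) <= B must hold
        return float("-inf")                # infeasible solution
    return sum(weights[i] for i in S)       # total weight of satisfied customers
```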
THE MULTI-OBJECTIVE NEXT RELEASE PROBLEM
The cost of implementing the selected requirements is taken as an independent objective to be optimized, not as a constraint, along with a score representing the importance of a given requirement:

$$\sum_{i=1}^{n} cost_i \cdot x_i \qquad \text{and} \qquad \sum_{i=1}^{n} score_i \cdot x_i$$

(minimizing the total cost while maximizing the total score)
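A minimal sketch of these two objectives; the 0/1 vector `x` marks which requirements are selected, and the names are illustrative:

```python
# MONRP objective sketch: returns (total cost to minimize, total score to maximize).
def monrp_objectives(x, cost, score):
    total_cost = sum(c * xi for c, xi in zip(cost, x))     # sum of cost_i * x_i
    total_score = sum(s * xi for s, xi in zip(score, x))   # sum of score_i * x_i
    return total_cost, total_score
```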
THE WORKGROUP FORMATION PROBLEM
deals with the allocation of human resources to project tasks. The formulation displays a single objective function to be minimized, which composes salary costs, skill and preference factors:

$$\min \; \sum_{p=1}^{P}\sum_{a=1}^{N} Sal_p \cdot A_{ap} \cdot Dur_a \;-\; \lambda \sum_{p=1}^{P}\sum_{s=1}^{S}\sum_{a=1}^{N} R_{ps} \cdot A_{ap} \cdot SI_s \;-\; \eta \left( \sum_{a=1}^{N}\sum_{p=1}^{P} Pp_{pa} \cdot A_{ap} + \sum_{a=1}^{N}\sum_{p=1}^{P} Pm_{pa} \cdot A_{ap} \right) \;-\; \eta \sum_{a=1}^{N}\sum_{p1=1}^{P}\sum_{p2=1}^{P} P_{p1p2} \cdot A_{ap1} \cdot A_{ap2} \cdot X_{p1p2}$$
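The following sketch evaluates that objective as read from the formula above; all matrix names and index conventions (e.g. `A[a][p]` for the allocation of person p to activity a) are assumptions made for illustration, not the paper's code:

```python
# Workgroup formation objective sketch (one reading of the slide's formula,
# not a definitive implementation).
def work_objective(A, Sal, Dur, R, SI, Pp, Pm, Pair, X, lam, eta):
    P, N, S = len(Sal), len(Dur), len(SI)   # persons, activities, skills
    salary = sum(Sal[p] * A[a][p] * Dur[a] for p in range(P) for a in range(N))
    skill = sum(R[p][s] * A[a][p] * SI[s]
                for p in range(P) for s in range(S) for a in range(N))
    pref = sum((Pp[p][a] + Pm[p][a]) * A[a][p]
               for a in range(N) for p in range(P))
    pair = sum(Pair[p1][p2] * A[a][p1] * A[a][p2] * X[p1][p2]
               for a in range(N) for p1 in range(P) for p2 in range(P))
    # Salary is a cost; skill, preference and pairing terms are bonuses
    # weighted by lambda and eta, so they are subtracted before minimizing.
    return salary - lam * skill - eta * pref - eta * pair
```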
THE MULTI-OBJECTIVE TEST CASE SELECTION PROBLEM extends previously published mono-objective formulations
The paper discusses two variations: one considering two objectives (code coverage and execution time), used here, and another covering three objectives (code coverage, execution time and fault detection).
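A minimal sketch of the two-objective variant used here; the names `covers` and `exec_time` are illustrative, not identifiers from the paper:

```python
# Bi-objective test case selection sketch: maximize block coverage,
# minimize total execution time.
def test_objectives(x, covers, exec_time):
    # x: 0/1 selection vector; covers[i]: set of code blocks exercised by
    # test case i; exec_time[i]: its running time.
    covered = set().union(*(covers[i] for i, xi in enumerate(x) if xi))
    total_time = sum(t * xi for t, xi in zip(exec_time, x))
    return len(covered), total_time
```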
For each problem (NRP, MONRP, WORK and TEST), two instances, A and B, with increasing sizes, were synthetically generated.
The Data

Instance Name   # Customers   # Requirements
NRPA            10            20
NRPB            20            40

INSTANCES FOR PROBLEM NRP
Instance Name   # Customers   # Requirements
MONRPA          10            20
MONRPB          20            40

INSTANCES FOR PROBLEM MONRP
Instance Name   # Persons   # Skills   # Activities
WORKA           10          5          5
WORKB           20          10         10

INSTANCES FOR PROBLEM WORK
Instance Name   # Test Cases   # Code Blocks
TESTA           20             40
TESTB           40             80

INSTANCES FOR PROBLEM TEST
The Algorithms
For Mono-Objective Problems: Genetic Algorithm (GA), Simulated Annealing (SA)
For Multi-Objective Problems: NSGA-II, MOCell
For Mono- and Multi-Objective Problems: Random Search
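To illustrate how one of the mono-objective metaheuristics operates, here is a generic simulated annealing loop over bit vectors; the temperature, cooling rate and iteration budget are placeholder values, not the settings used in the experiments:

```python
import math
import random

# Generic simulated annealing sketch for a maximization fitness over bit vectors.
def simulated_annealing(fitness, n_bits, temp=100.0, cooling=0.95, iters=10000):
    x = [random.randint(0, 1) for _ in range(n_bits)]
    fx = fitness(x)
    best, best_f = x[:], fx
    for _ in range(iters):
        y = x[:]
        y[random.randrange(n_bits)] ^= 1     # neighbor: flip one random bit
        fy = fitness(y)
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature cools (Boltzmann criterion).
        if fy >= fx or random.random() < math.exp((fy - fx) / temp):
            x, fx = y, fy
            if fx > best_f:
                best, best_f = x[:], fx
        temp *= cooling                      # geometric cooling schedule
    return best, best_f
```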
Human Subjects
A total of 63 professional software engineers solved some or all of the instances.
NUMBER OF HUMAN RESPONDENTS PER INSTANCE
Human Subjects
Besides solving the problem instance, each respondent was asked to answer the following questions about each instance:
How hard was it to solve this problem instance?
How hard would it be for you to solve an instance twice this size?
What do you think the quality of a solution generated by you over an instance twice this size would be?
In addition to these specific questions on each problem instance, general questions were asked about the respondents' theoretical and practical experience in software engineering.
Comparison Metrics
For Mono-Objective Problems: Quality
For Multi-Objective Problems: Hypervolume, Spread, Number of Solutions
For Mono- and Multi-Objective Problems: Execution Time
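To make the hypervolume metric concrete, here is a minimal sketch for a two-objective minimization front; the reference points and any normalization used in the experiments are not reproduced here:

```python
# 2-D hypervolume sketch: area dominated by a minimization front, bounded
# by a reference point that every front point dominates.
def hypervolume_2d(front, ref):
    pts = sorted(front)          # ascending f1 implies descending f2 on a Pareto front
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        hv += (ref[0] - f1) * (prev_f2 - f2)   # rectangular slab per point
        prev_f2 = f2
    return hv
```

For example, with ref = (10, 10) and front = [(1, 5), (3, 2)], the dominated area is 45 + 21 = 66.0.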
RESULTS AND ANALYSES
?
How do different metaheuristics compare in solving a variety of search based software engineering problems?
SBSE algorithms comparison
RESULTS

Problem   GA                    SA                      RAND
NRPA      26.45 ± 0.500         25.74 ± 0.949           15.03 ± 5.950
NRPB      95.41 ± 0.190         90.47 ± 7.023           45.74 ± 11.819
WORKA     16,026.17 ± 51.700    18,644.71 ± 1,260.194   19,391.34 ± 1,220.17
WORKB     24,831.23 ± 388.107   35,174.19 ± 2,464.733   36,892.64 ± 2,428.269

Quality of Results for NRP and WORK (averages and standard deviations, over 100 executions)
RESULTS

Problem   GA                   SA                   RAND
NRPA      40.92 ± 11.112       23.01 ± 7.476        0.00 ± 0.002
NRPB      504.72 ± 95.665      292.62 ± 55.548      0.06 ± 0.016
WORKA     242.42 ± 44.117      73.35 ± 19.702       0.04 ± 0.010
WORKB     4,797.89 ± 645.360   2,211.28 ± 234.256   1.75 ± 0.158

Time (in milliseconds) Results for NRP and WORK (averages and standard deviations, over 100 executions)
[Boxplots (panels NRPA, NRPB, WORKA, WORKB) showing average (+), maximum, minimum (×) and 25%-75% quartile ranges of quality for mono-objective problems NRP and WORK, instances A and B, for GA, SA and Random Search.]
RESULTS

Problem   NSGA-II          MOCell           RAND
MONRPA    0.6519 ± 0.009   0.6494 ± 0.013   0.5479 ± 0.0701
MONRPB    0.6488 ± 0.015   0.6470 ± 0.017   0.5462 ± 0.0584
TESTA     0.5997 ± 0.009   0.5867 ± 0.019   0.5804 ± 0.0648
TESTB     0.6608 ± 0.020   0.6243 ± 0.044   0.5673 ± 0.1083

Hypervolume Results for MONRP and TEST (averages and standard deviations, over 100 executions)
RESULTS

Problem   NSGA-II          MOCell           RAND
MONRPA    0.4216 ± 0.094   0.3973 ± 0.031   0.5492 ± 0.1058
MONRPB    0.4935 ± 0.098   0.3630 ± 0.032   0.5504 ± 0.1081
TESTA     0.4330 ± 0.076   0.2659 ± 0.038   0.5060 ± 0.1029
TESTB     0.3503 ± 0.178   0.2963 ± 0.072   0.4712 ± 0.1410

Spread Results for MONRP and TEST (averages and standard deviations, over 100 executions)
RESULTS

Problem   NSGA-II              MOCell               RAND
MONRPA    1,420.48 ± 168.858   993.09 ± 117.227     25.30 ± 10.132
MONRPB    1,756.71 ± 138.505   1,529.32 ± 141.778   30.49 ± 7.204
TESTA     1,661.03 ± 125.131   1,168.47 ± 142.534   25.24 ± 11.038
TESTB     1,693.37 ± 138.895   1,370.96 ± 127.953   32.89 ± 9.335

Time (in milliseconds) Results for MONRP and TEST (averages and standard deviations, over 100 executions)
RESULTS

Problem   NSGA-II         MOCell          RAND
MONRPA    31.97 ± 5.712   25.01 ± 5.266   12.45 ± 1.572
MONRPB    60.56 ± 4.835   48.04 ± 4.857   20.46 ± 2.932
TESTA     35.43 ± 4.110   26.20 ± 5.971   12.54 ± 1.282
TESTB     41.86 ± 9.670   19.93 ± 8.514   11.58 ± 2.184

Number of Solutions Results for MONRP and TEST (averages and standard deviations, over 100 executions)
RESULTS
[Scatter plot (value × cost): example of the obtained solution sets for NSGA-II, MOCell and Random Search over problem MONRP, instance A.]
RESULTS
[Scatter plot (value × cost): example of the obtained solution sets for NSGA-II, MOCell and Random Search over problem MONRP, instance B.]
RESULTS
[Scatter plot (% coverage × cost): example of the obtained solution sets for NSGA-II, MOCell and Random Search over problem TEST, instance A.]
RESULTS
[Scatter plot (% coverage × cost): example of the obtained solution sets for NSGA-II, MOCell and Random Search over problem TEST, instance B.]
?
Can the results generated by Search Based Software Engineering be said to be human competitive?
SBSE human competitiveness
RESULTS AND ANALYSES
RESULTS

          SBSE                                         Humans
Problem   Quality               Time                   Quality                  Time
NRPA      26.48 ± 0.512         40.57 ± 9.938          16.19 ± 6.934            1,731,428.57 ± 2,587,005.57
NRPB      95.77 ± 0.832         534.69 ± 91.133        77.85 ± 23.459           3,084,000.00 ± 2,542,943.10
WORKA     16,049.72 ± 121.858   260.00 ± 50.384        28,615.44 ± 12,862.590   2,593,846.15 ± 1,415,659.62
WORKB     25,047.40 ± 322.085   4,919.30 ± 1,219.912   50,604.60 ± 20,378.740   5,280,000.00 ± 3,400,588.14

Quality and Time (in milliseconds) for NRP and WORK (averages and standard deviations)
[Boxplots (panels NRPA, NRPB, WORKA, WORKB) showing average (+), maximum, minimum (×) and 25%-75% quartile ranges of quality for mono-objective problems NRP and WORK, instances A and B, for SBSE and Human Subjects.]
RESULTS

          SBSE                                  Humans
Problem   HV               Time                 HV       Time
MONRPA    0.6519 ± 0.009   1,420.48 ± 168.858   0.4448   1,365,000.00 ± 1,065,086.42
MONRPB    0.6488 ± 0.015   1,756.71 ± 138.505   0.2870   2,689,090.91 ± 2,046,662.91
TESTA     0.5997 ± 0.009   1,661.03 ± 125.131   0.4878   1,472,307.69 ± 892,171.07
TESTB     0.6608 ± 0.020   1,693.37 ± 138.895   0.4979   3,617,142.86 ± 3,819,431.52

Hypervolume and Time (in milliseconds) Results for SBSE and Humans for MONRP and TEST (averages and standard deviations)
RESULTS
[Scatter plot (value × cost): solutions generated by humans, and non-dominated solution sets produced by NSGA-II and MOCell for problem MONRP, instance A.]
RESULTS
[Scatter plot (value × cost): solutions generated by humans, and non-dominated solution sets produced by NSGA-II and MOCell for problem MONRP, instance B.]
RESULTS
[Scatter plot (% coverage × cost): solutions generated by humans, and non-dominated solution sets produced by NSGA-II and MOCell for problem TEST, instance A.]
RESULTS
[Scatter plot (% coverage × cost): solutions generated by humans, and non-dominated solution sets produced by NSGA-II and MOCell for problem TEST, instance B.]
FURTHER HUMAN COMPETITIVENESS ANALYSES
Human participants were asked to rate how difficult they found each problem instance and how confident they were in their solutions.
Bar chart showing percentage of human respondents who considered each problem “hard” or “very hard”
Bar chart showing percentage of human respondents who were “confident” or “very confident”
FURTHER HUMAN COMPETITIVENESS ANALYSES
[Bar charts (panels NRP, MONRP, WORK, TEST) showing percentage differences in quality between the solutions generated by SBSE and by the human subjects]
FURTHER HUMAN COMPETITIVENESS ANALYSES
57.33% of the human participants who responded to instance A indicated that solving instance B would be “harder” or “much harder”, and 55.00% predicted that their solution for that instance would be “worse” or “much worse”. Among the instance B respondents, 62.50% pointed out the increased difficulty of a problem instance twice as large, and 57.14% predicted that their solution would be “worse” or “much worse”.
FURTHER HUMAN COMPETITIVENESS ANALYSES
These results suggest that, for larger problem instances, the potential of SBSE to generate more accurate results than humans increases. In fact, this suggests that SBSE may be particularly useful for solving real-world, large-scale software engineering problems.
Threats to Validity
Small instance sizes
Artificial instances
Number and diversity of human participants
Number of problems
This paper reports the results of an extensive experimental study aimed at evaluating the human competitiveness of SBSE. Secondarily, several tests were performed over four classical SBSE problems in order to evaluate the performance of well-known metaheuristics in solving both mono- and multi-objective problems.
FINAL CONSIDERATIONS
Regarding the comparison of algorithms:
GA generated more accurate solutions for the mono-objective problems than SA
NSGA-II consistently outperformed MOCell in terms of hypervolume and number of generated solutions
MOCell outperformed NSGA-II in terms of spread and execution time
All of these results are consistent with previously published research
FINAL CONSIDERATIONS
Regarding the human competitiveness question:
The experiments strongly suggest that the results generated by search based software engineering can, indeed, be said to be human competitive
Results indicate that for real-world, large-scale software engineering problems, the benefits from applying SBSE may be even greater
FINAL CONSIDERATIONS
That is it!
Thanks for your time and attention.
2nd International Symposium on Search Based Software Engineering
September 7 – 9, 2010 Benevento, Italy
The Human Competitiveness of Search Based Software Engineering
prof.jerff@gmail.com http://goes.comp.uece.br/ Optimization in Software Engineering Group (GOES.UECE) State University of Ceará, Brazil