The Human Competitiveness of Search Based Software Engineering


2nd International Symposium on Search Based Software Engineering

September 7 – 9, 2010 Benevento, Italy

The Human Competitiveness of Search Based Software Engineering
Jerffeson Teixeira de Souza, Camila Loiola Maia, Fabrício Gomes de Freitas, Daniel Pinto Coutinho

Optimization in Software Engineering Group (GOES.UECE) State University of Ceará, Brazil


Nice to meet you,

Jerffeson Teixeira de Souza, Ph.D.
Professor, State University of Ceará, Brazil
http://goes.comp.uece.br/
prof.jerff@gmail.com


Our limited time will be divided as follows:

Part 01: Research Questions
Part 02: Experimental Design
Part 03: Results and Analyses
Part 04: Final Considerations


“The question regarding the human competitiveness of SBSE ... has already been raised; no comprehensive work has been published to date.”

Mark Harman, The Current State and Future of Search Based Software Engineering, Proceedings of the International Conference on Software Engineering / Future of Software Engineering 2007 (ICSE/FOSE '07), Minneapolis: IEEE Computer Society, pp. 342-357, 2007.

Why?


one may argue ...


!

The human competitiveness of SBSE is not in doubt within the SBSE community

But, even if that is the case,...


Strong research results regarding this issue would likely, at the very least, contribute to the increasing acceptance of SBSE outside its research community

!


?

Can the results generated by Search Based Software Engineering be said to be human competitive?

SBSE human competitiveness


but ...

How to evaluate the Human Competitiveness of SBSE?


“

The result holds its own or wins a regulated competition involving human contestants (in the form of either live human players or human-written computer programs).

”


FROM THE SOLD-OUT MUSEUM OF SANNIO ARENA, BENEVENTO, ITALY

SSBSE 2010
HUMANS VS MACHINE

THURSDAY, SEPTEMBER 9 – 11:30 CEST / 05:30 ET
LIVE ON PAY-PER-VIEW


FROM THE SOLD-OUT MUSEUM OF SANNIO ARENA, BENEVENTO, ITALY

SSBSE 2010
HUMANS VS SBSE ALGORITHMS

THURSDAY, SEPTEMBER 9 – 11:30 CEST / 05:30 ET
LIVE ON PAY-PER-VIEW


FROM THE SOLD-OUT MUSEUM OF SANNIO ARENA, BENEVENTO, ITALY

SSBSE 2010
HUMANS VS SBSE ALGORITHMS (but which ones?)

THURSDAY, SEPTEMBER 9 – 11:30 CEST / 05:30 ET
LIVE ON PAY-PER-VIEW


?

Can the results generated by Search Based Software Engineering be said to be human competitive?

SBSE human competitiveness


??

Can the results generated by Search Based Software Engineering be said to be human competitive?

SBSE human competitiveness

How do different metaheuristics compare in solving a variety of search based software engineering problems?

SBSE algorithms comparison



The Problems
The Next Release Problem
The Multi-Objective Next Release Problem
The Workgroup Formation Problem
The Multi-Objective Test Case Selection Problem

Motivation
They can be considered “classical” formulations
Together, they cover three different general phases of the software development life cycle


THE NEXT RELEASE PROBLEM
Involves determining a set of customers which will have their selected requirements delivered in the next software release.
This selection prioritizes customers with higher importance to the company and must respect a pre-determined budget:

maximize $\sum_{i \in S} w_i$
subject to $\sum_{i \in S} R_i \le B$

where $S$ is the set of selected customers, $w_i$ the importance of customer $i$, $R_i$ the cost of the requirements requested by customer $i$, and $B$ the available budget.
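To make the formulation concrete, here is a minimal sketch (not the authors' implementation; function name, variable names and data are illustrative) of how a candidate NRP selection could be evaluated in Python:

```python
# Minimal sketch of evaluating a Next Release Problem solution.
# Assumptions (illustrative only): each customer i has an importance
# weight w[i] and an aggregate requirements cost cost[i]; B is the budget.

def nrp_fitness(selected, w, cost, B):
    """Return total importance of the selected customers,
    or None if the selection violates the budget constraint."""
    total_cost = sum(cost[i] for i in selected)
    if total_cost > B:
        return None  # infeasible: budget exceeded
    return sum(w[i] for i in selected)

# Tiny usage example with made-up data
w = [4, 2, 7, 5]          # customer importance
cost = [10, 5, 20, 15]    # cost of each customer's requirements
print(nrp_fitness({0, 2}, w, cost, B=35))     # -> 11 (cost 30, within budget)
print(nrp_fitness({0, 2, 3}, w, cost, B=35))  # -> None (cost 45 exceeds 35)
```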


THE MULTI-OBJECTIVE NEXT RELEASE PROBLEM
The cost of implementing the selected requirements is taken as an independent objective to be optimized, not as a constraint, along with a score representing the importance of a given requirement:

minimize $\sum_{i=1}^{n} cost_i \cdot x_i$
maximize $\sum_{i=1}^{n} score_i \cdot x_i$
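As an illustration only (the cost and score vectors below are made up, not taken from the experiments), the two MONRP objectives can be computed for a binary selection vector and candidate solutions compared by Pareto dominance, roughly as follows:

```python
# Sketch of the two MONRP objectives for a binary selection vector x,
# plus a Pareto-dominance test (minimize cost, maximize score).

def monrp_objectives(x, cost, score):
    total_cost = sum(c * xi for c, xi in zip(cost, x))    # to be minimized
    total_score = sum(s * xi for s, xi in zip(score, x))  # to be maximized
    return total_cost, total_score

def dominates(a, b):
    """True if solution a = (cost, score) Pareto-dominates b."""
    cost_a, score_a = a
    cost_b, score_b = b
    return (cost_a <= cost_b and score_a >= score_b) and \
           (cost_a < cost_b or score_a > score_b)

cost = [3, 8, 5, 2]
score = [6, 9, 4, 3]
a = monrp_objectives([1, 0, 1, 0], cost, score)  # (8, 10)
b = monrp_objectives([0, 1, 1, 0], cost, score)  # (13, 13)
print(a, b, dominates(a, b))  # neither dominates: a is cheaper, b scores higher
```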


THE WORKGROUP FORMATION PROBLEM
Deals with the allocation of human resources to project tasks.
The formulation displays a single objective function to be minimized, which composes salary costs, skill and preference factors:

$\min \; \sum_{p=1}^{P}\sum_{a=1}^{N} Sal_p \times A_{ap} \times Dur_a \;-\; \lambda \sum_{p=1}^{P}\sum_{s=1}^{S}\sum_{a=1}^{N} R_{ps} \times A_{ap} \times SI_s \;-\; \eta \left( \sum_{a=1}^{N}\sum_{p=1}^{P} Pp_{pa} \times A_{ap} + \sum_{a=1}^{N}\sum_{p=1}^{P} Pm_{pa} \times A_{ap} \right) \;-\; \eta \sum_{a=1}^{N}\sum_{p_1=1}^{P}\sum_{p_2=1}^{P} P_{p_1 p_2} \times A_{a p_1} \times A_{a p_2} \times X_{p_1 p_2}$
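A rough sketch of how such an objective could be evaluated for a given assignment matrix is shown below; the NumPy layout, variable names and weights are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

# Illustrative evaluation of the workgroup-formation objective for an
# assignment matrix A (A[a, p] = 1 if person p is allocated to activity a).
# Sal, Dur, R, SI, Pp, Pm, Ppair, X and the weights lam, eta are assumed inputs.

def work_objective(A, Sal, Dur, R, SI, Pp, Pm, Ppair, X, lam, eta):
    salary = (A * Sal[np.newaxis, :] * Dur[:, np.newaxis]).sum()  # salary costs
    skills = lam * np.einsum('ps,ap,s->', R, A, SI)               # skill-match reward
    prefs = eta * ((Pp * A).sum() + (Pm * A).sum())               # project/module preferences
    pairs = eta * np.einsum('ap,aq,pq->', A, A, Ppair * X)        # pairwise preferences
    return salary - skills - prefs - pairs                        # value minimized by the search

# Tiny random usage example with made-up dimensions
rng = np.random.default_rng(0)
N, P, S = 4, 3, 2   # activities, persons, skills
A = rng.integers(0, 2, size=(N, P))
print(work_objective(A,
                     Sal=rng.random(P), Dur=rng.random(N),
                     R=rng.random((P, S)), SI=rng.random(S),
                     Pp=rng.random((N, P)), Pm=rng.random((N, P)),
                     Ppair=rng.random((P, P)), X=rng.random((P, P)),
                     lam=0.5, eta=0.5))
```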


THE MULTI-OBJECTIVE TEST CASE SELECTION PROBLEM extends previously published mono-objective formulations

The paper discusses two variations: one considering two objectives (code coverage and execution time), which is used here, and another covering three objectives (code coverage, execution time and fault detection).
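For illustration, a minimal sketch of the two objectives used here (code coverage to maximize, execution time to minimize), assuming a simple boolean coverage matrix; the data layout is an assumption, not the instances used in the study:

```python
# Sketch of the two objectives for multi-objective test case selection.
# covers[t][b] == True means test case t exercises code block b (assumed layout).

def test_selection_objectives(selected, covers, exec_time):
    n_blocks = len(covers[0])
    covered = set()
    for t in selected:
        covered.update(b for b in range(n_blocks) if covers[t][b])
    coverage = 100.0 * len(covered) / n_blocks        # % code coverage (maximize)
    total_time = sum(exec_time[t] for t in selected)  # execution time (minimize)
    return coverage, total_time

covers = [
    [True, True, False, False],
    [False, True, True, False],
    [False, False, True, True],
]
exec_time = [2.0, 1.5, 3.0]
print(test_selection_objectives({0, 2}, covers, exec_time))  # (100.0, 5.0)
```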


For each problem (NRP, MONRP, WORK and TEST), two instances, A and B, with increasing sizes, were synthetically generated.

The Data


Instance Name   # Customers   # Tasks
NRPA            10            20
NRPB            20            40

INSTANCES FOR PROBLEM NRP

Instance Name   # Customers   # Requirements
MONRPA          10            20
MONRPB          20            40

INSTANCES FOR PROBLEM MONRP


Instance Name   # Persons   # Skills   # Activities
WORKA           10          5          5
WORKB           20          10         10

INSTANCES FOR PROBLEM WORK

Instance Name   # Test Cases   # Code Blocks
TESTA           20             40
TESTB           40             80

INSTANCES FOR PROBLEM TEST


The Algorithms
For Mono-Objective Problems: Genetic Algorithm (GA), Simulated Annealing (SA)
For Multi-Objective Problems: NSGA-II, MOCell
For Mono- and Multi-Objective Problems: Random Search
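As an illustration of the kind of metaheuristic applied to the mono-objective instances, a generic simulated-annealing loop is sketched below; the cooling schedule, parameters and toy fitness are arbitrary choices, not the configuration used in the experiments:

```python
import math
import random

# Generic simulated-annealing sketch for maximizing a fitness over bit vectors.

def simulated_annealing(fitness, n_bits, t0=10.0, cooling=0.95,
                        steps_per_temp=50, t_min=1e-3):
    current = [random.randint(0, 1) for _ in range(n_bits)]
    best = current[:]
    t = t0
    while t > t_min:
        for _ in range(steps_per_temp):
            neighbor = current[:]
            i = random.randrange(n_bits)
            neighbor[i] = 1 - neighbor[i]          # flip one bit
            delta = fitness(neighbor) - fitness(current)
            # Accept improving moves always, worsening moves with probability exp(delta / t)
            if delta >= 0 or random.random() < math.exp(delta / t):
                current = neighbor
            if fitness(current) > fitness(best):
                best = current[:]
        t *= cooling                               # geometric cooling
    return best

# Usage with a toy NRP-style fitness (made-up weights, costs and budget)
w, cost, B = [4, 2, 7, 5], [10, 5, 20, 15], 35

def nrp_fitness(x):
    c = sum(ci for ci, xi in zip(cost, x) if xi)
    value = sum(wi for wi, xi in zip(w, x) if xi)
    return value - 100 * max(0, c - B)   # penalize budget violations

print(simulated_annealing(nrp_fitness, n_bits=4))
```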


Human Subjects A total of 63 professional software engineers solved some or all of the instances.

NUMBER OF HUMAN RESPONDENTS PER INSTANCE


Human Subjects
Besides solving the problem instance, each respondent was asked to answer the following questions related to each problem instance:
How hard was it to solve this problem instance?
How hard would it be for you to solve an instance twice this size?
What do you think the quality of a solution generated by you over an instance twice this size would be?

In addition to these specific questions regarding each problem instance, general questions were asked about each respondent's theoretical and practical experience with software engineering.


Comparison Metrics
For Mono-Objective Problems: Quality
For Multi-Objective Problems: Hypervolume, Spread, Number of Solutions
For Mono- and Multi-Objective Problems: Execution Time
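For reference, a small sketch of how the hypervolume of a two-objective front could be computed against a reference point (assuming minimization on both axes and a front of mutually non-dominated points); this is a generic illustration, not the tool used in the study:

```python
# Hypervolume of a 2-D Pareto front under minimization of both objectives,
# measured against a reference point dominated by every front member.
# Assumes `front` contains only mutually non-dominated points.

def hypervolume_2d(front, ref):
    # Keep only points that dominate the reference point, sorted by first objective.
    pts = sorted(p for p in front if p[0] < ref[0] and p[1] < ref[1])
    hv = 0.0
    prev_x = ref[0]
    # Sweep from the largest first-objective value toward the smallest,
    # adding the rectangle each point contributes up to the reference point.
    for x, y in reversed(pts):
        hv += (prev_x - x) * (ref[1] - y)
        prev_x = x
    return hv

front = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
print(hypervolume_2d(front, ref=(5.0, 5.0)))  # 11.0
```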


RESULTS AND ANALYSES

?

How do different metaheuristics compare in solving a variety of search based software engineering problems?

SBSE algorithms comparison


RESULTS

Problem   GA                    SA                       RAND
NRPA      26.45±0.500           25.74±0.949              15.03±5.950
NRPB      95.41±0.190           90.47±7.023              45.74±11.819
WORKA     16,026.17±51.700      18,644.71±1,260.194      19,391.34±1,220.17
WORKB     24,831.23±388.107     35,174.19±2,464.733      36,892.64±2,428.269

Quality of Results for NRP and WORK
(averages and standard deviations, over 100 executions)


RESULTS

Problem   GA                   SA                    RAND
NRPA      40.92±11.112         23.01±7.476           0.00±0.002
NRPB      504.72±95.665        292.62±55.548         0.06±0.016
WORKA     242.42±44.117        73.35±19.702          0.04±0.010
WORKB     4,797.89±645.360     2,211.28±234.256      1.75±0.158

Time (in milliseconds) Results for NRP and WORK
(averages and standard deviations, over 100 executions)


NRPA   NRPB   WORKA   WORKB

Boxplots showing average (+), maximum, minimum (×) and 25%-75% quartile ranges of quality for mono-objective problems NRP and WORK, instances A and B, for GA, SA and Random Search.


RESULTS

Problem   NSGA-II          MOCell           RAND
MONRPA    0.6519±0.009     0.6494±0.013     0.5479±0.0701
MONRPB    0.6488±0.015     0.6470±0.017     0.5462±0.0584
TESTA     0.5997±0.009     0.5867±0.019     0.5804±0.0648
TESTB     0.6608±0.020     0.6243±0.044     0.5673±0.1083

Hypervolume Results for MONRP and TEST
(averages and standard deviations, over 100 executions)


RESULTS

Problem   NSGA-II          MOCell           RAND
MONRPA    0.4216±0.094     0.3973±0.031     0.5492±0.1058
MONRPB    0.4935±0.098     0.3630±0.032     0.5504±0.1081
TESTA     0.4330±0.076     0.2659±0.038     0.5060±0.1029
TESTB     0.3503±0.178     0.2963±0.072     0.4712±0.1410

Spread Results for MONRP and TEST
(averages and standard deviations, over 100 executions)


RESULTS

Problem   NSGA-II              MOCell               RAND
MONRPA    1,420.48±168.858     993.09±117.227       25.30±10.132
MONRPB    1,756.71±138.505     1,529.32±141.778     30.49±7.204
TESTA     1,661.03±125.131     1,168.47±142.534     25.24±11.038
TESTB     1,693.37±138.895     1,370.96±127.953     32.89±9.335

Time (in milliseconds) Results for MONRP and TEST
(averages and standard deviations, over 100 executions)


RESULTS

Problem   NSGA-II          MOCell          RAND
MONRPA    31.97±5.712      25.01±5.266     12.45±1.572
MONRPB    60.56±4.835      48.04±4.857     20.46±2.932
TESTA     35.43±4.110      26.20±5.971     12.54±1.282
TESTB     41.86±9.670      19.93±8.514     11.58±2.184

Number of Solutions Results for MONRP and TEST
(averages and standard deviations, over 100 executions)


RESULTS

Example of the obtained solution sets for NSGA-II, MOCell and Random Search over problem MONRP, instance A (plotted as cost versus value).


RESULTS

Example of the obtained solution sets for NSGA-II, MOCell and Random Search over problem MONRP, instance B (plotted as cost versus value).


RESULTS

Example of the obtained solution sets for NSGA-II, MOCell and Random Search over problem TEST, instance A (plotted as cost versus % coverage).


RESULTS

Example of the obtained solution sets for NSGA-II, MOCell and Random Search over problem TEST, instance B (plotted as cost versus % coverage).


?

Can the results generated by Search Based Software Engineering be said to be human competitive?

SBSE human competitiveness

RESULTS AND ANALYSES


RESULTS

Problem   SBSE Quality           SBSE Time               Humans Quality            Humans Time
NRPA      26.48 ±0.512           40.57 ±9.938            16.19 ±6.934              1,731,428.57 ±2,587,005.57
NRPB      95.77 ±0.832           534.69 ±91.133          77.85 ±23.459             3,084,000.00 ±2,542,943.10
WORKA     16,049.72 ±121.858     260.00 ±50.384          28,615.44 ±12,862.590     2,593,846.15 ±1,415,659.62
WORKB     25,047.40 ±322.085     4,919.30 ±1,219.912     50,604.60 ±20,378.740     5,280,000.00 ±3,400,588.14

Quality and Time (in milliseconds) for NRP and WORK
(averages and standard deviations)


NRPA   NRPB   WORKA   WORKB

Boxplots showing average (+), maximum, minimum (×) and 25%-75% quartile ranges of quality for mono-objective problems NRP and WORK, instances A and B, for SBSE and Human Subjects.


RESULTS

Problem   SBSE HV           SBSE Time             Humans HV   Humans Time
MONRPA    0.6519 ±0.009     1,420.48 ±168.858     0.4448      1,365,000.00 ±1,065,086.42
MONRPB    0.6488 ±0.015     1,756.71 ±138.505     0.2870      2,689,090.91 ±2,046,662.91
TESTA     0.5997 ±0.009     1,661.03 ±125.131     0.4878      1,472,307.69 ±892,171.07
TESTB     0.6608 ±0.020     1,693.37 ±138.895     0.4979      3,617,142.86 ±3,819,431.52

Hypervolume and Time (in milliseconds) Results for SBSE and Humans for MONRP and TEST
(averages and standard deviations)


RESULTS

Solutions generated by humans, and non-dominated solution sets produced by NSGA-II and MOCell for problem MONRP, instance A (plotted as cost versus value).


RESULTS

Solutions generated by humans, and non-dominated solution sets produced by NSGA-II and MOCell for problem MONRP, instance B (plotted as cost versus value).


RESULTS

Solutions generated by humans, and non-dominated solution sets produced by NSGA-II and MOCell for problem TEST, instance A (plotted as cost versus % coverage).


RESULTS

Solutions generated by humans, and non-dominated solution sets produced by NSGA-II and MOCell for problem TEST, instance B (plotted as cost versus % coverage).


FURTHER HUMAN COMPETITIVENESS ANALYSES
Human participants were asked to rate how difficult they found each problem instance and how confident they were in their solutions.

Bar chart showing percentage of human respondents who considered each problem “hard” or “very hard”


FURTHER HUMAN COMPETITIVENESS ANALYSES
Human participants were asked to rate how difficult they found each problem instance and how confident they were in their solutions.

Bar chart showing percentage of human respondents who were “confident” or “very confident”


FURTHER HUMAN COMPETITIVENESS ANALYSES

NRP

MONRP

WORK

TEST

Bar charts showing percentage differences in quality, for mono- and multi-objective problems, between the solutions generated by SBSE and those produced by the human subjects


FURTHER HUMAN COMPETITIVENESS ANALYSES
57.33% of the human participants who responded to instance A indicated that solving instance B would be “harder” or “much harder”, and 55.00% predicted that their solution for this instance would be “worse” or “much worse”.
62.50% of the instance B respondents pointed out the increased difficulty of a problem instance twice as large, and 57.14% predicted that their solution would be “worse” or “much worse”.


FURTHER HUMAN COMPETITIVENESS ANALYSES

These results suggest that, for larger problem instances, the potential of SBSE to generate even more accurate results, when compared to humans, increases.
In fact, this suggests that SBSE may be particularly useful in solving real-world, large-scale software engineering problems.


Threats to Validity
Small instance sizes
Artificial instances
Number and diversity of human participants
Number of problems


This paper reports the results of an extensive experimental study aimed at evaluating the human competitiveness of SBSE.
Secondarily, several tests were performed over four classical SBSE problems in order to evaluate the performance of well-known metaheuristics in solving both mono- and multi-objective problems.

FINAL CONSIDERATIONS


Regarding the comparison of algorithms:
GA generated more accurate solutions for mono-objective problems than SA
NSGA-II consistently outperformed MOCell in terms of hypervolume and number of generated solutions
MOCell outperformed NSGA-II when considering spread and the execution time
All of these results are consistent with previously published research

FINAL CONSIDERATIONS


Regarding the human competitiveness question

The experiments strongly suggest that the results generated by search based software engineering can, indeed, be said to be human competitive.
The results also indicate that, for real-world, large-scale software engineering problems, the benefits from applying SBSE may be even greater.

FINAL CONSIDERATIONS


That is it!

Thanks for your time and attention.


2nd International Symposium on Search Based Software Engineering

September 7 – 9, 2010 Benevento, Italy

The Human Competitiveness of Search Based Software Engineering

prof.jerff@gmail.com http://goes.comp.uece.br/ Optimization in Software Engineering Group (GOES.UECE) State University of Ceará, Brazil

