The Human Competitiveness of Search Based Software Engineering


2nd International Symposium on Search Based Software Engineering

September 7 – 9, 2010 Benevento, Italy

The Human Competitiveness of Search Based Software Engineering
Jerffeson Teixeira de Souza, Camila Loiola Maia, Fabrício Gomes de Freitas, Daniel Pinto Coutinho

Optimization in Software Engineering Group (GOES.UECE) State University of Ceará, Brazil


Nice to meet you,

Jerffeson Teixeira de Souza, Ph.D.
Professor, State University of Ceará, Brazil
http://goes.comp.uece.br/
prof.jerff@gmail.com


Our limited time will be divided as follows:

Part 01: Research Questions
Part 02: Experimental Design
Part 03: Results and Analyses
Part 04: Final Considerations


“The question regarding the human competitiveness of SBSE ... has already been raised; no comprehensive work has been published to date.”

Mark Harman, The Current State and Future of Search Based Software Engineering, Proceedings of the International Conference on Software Engineering / Future of Software Engineering 2007 (ICSE/FOSE '07), Minneapolis: IEEE Computer Society, pp. 342-357, 2007.

Why?


one may argue ...


!

The human competitiveness of SBSE is not in doubt within the SBSE community

But, even if that is the case,...


Strong research results regarding this issue would likely, at the very least, contribute to the increasing acceptance of SBSE outside its research community

!


?

Can the results generated by Search Based Software Engineering be said to be human competitive?

SBSE human competitiveness


but ...

How to evaluate the Human Competitiveness of SBSE?


“

The result holds its own or wins a regulated competition involving human contestants (in the form of either live human players or human-written computer programs).

”


FROM THE SOLD-OUT MUSEUM OF SANNIO ARENA, BENEVENTO, ITALY

SSBSE 2010
HUMANS VS MACHINE

THURSDAY, SEPTEMBER 9 – 11:30 CEST / 05:30 ET
LIVE ON PAY-PER-VIEW


FROM THE SOLD-OUT MUSEUM OF SANNIO ARENA, BENEVENTO, ITALY

SSBSE 2010
HUMANS VS SBSE ALGORITHMS

THURSDAY, SEPTEMBER 9 – 11:30 CEST / 05:30 ET
LIVE ON PAY-PER-VIEW


FROM THE SOLD-OUT MUSEUM OF SANNIO ARENA, BENEVENTO, ITALY

SSBSE 2010
HUMANS VS SBSE ALGORITHMS (but which ones?)

THURSDAY, SEPTEMBER 9 – 11:30 CEST / 05:30 ET
LIVE ON PAY-PER-VIEW


?

Can the results generated by Search Based Software Engineering be said to be human competitive?

SBSE human competitiveness


??

Can the results generated by Search Based Software Engineering be said to be human competitive?

SBSE human competitiveness

How do different metaheuristics compare in solving a variety of search based software engineering problems?

SBSE algorithms comparison



The Problems
The Next Release Problem
The Multi-Objective Next Release Problem
The Workgroup Formation Problem
The Multi-Objective Test Case Selection Problem

Motivation
They can be considered “classical” formulations
Together, they cover three different general phases of the software development life cycle


THE NEXT RELEASE PROBLEM
Involves determining a set of customers which will have their selected requirements delivered in the next software release.
This selection prioritizes customers with higher importance to the company and must respect a pre-determined budget:

maximize $\sum_{i \in S} w_i$
subject to $\sum_{i \in S} R_i \le B$

where $S$ is the set of selected customers, $w_i$ the importance of customer $i$, $R_i$ the cost of the requirements requested by customer $i$, and $B$ the available budget.
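To make the formulation concrete, here is a minimal sketch (not the authors' implementation; function name, variable names and data are illustrative) of how a candidate NRP selection could be evaluated in Python:

```python
# Minimal sketch of evaluating a Next Release Problem solution.
# Assumptions (illustrative only): each customer i has an importance
# weight w[i] and an aggregate requirements cost cost[i]; B is the budget.

def nrp_fitness(selected, w, cost, B):
    """Return total importance of the selected customers,
    or None if the selection violates the budget constraint."""
    total_cost = sum(cost[i] for i in selected)
    if total_cost > B:
        return None  # infeasible: budget exceeded
    return sum(w[i] for i in selected)

# Tiny usage example with made-up data
w = [4, 2, 7, 5]          # customer importance
cost = [10, 5, 20, 15]    # cost of each customer's requirements
print(nrp_fitness({0, 2}, w, cost, B=35))     # -> 11 (cost 30, within budget)
print(nrp_fitness({0, 2, 3}, w, cost, B=35))  # -> None (cost 45 exceeds 35)
```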


THE MULTI-OBJECTIVE NEXT RELEASE PROBLEM
The cost of implementing the selected requirements is taken as an independent objective to be optimized, not as a constraint, along with a score representing the importance of a given requirement:

minimize $\sum_{i=1}^{n} cost_i \cdot x_i$
maximize $\sum_{i=1}^{n} score_i \cdot x_i$
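As an illustration only (the cost and score vectors below are made up, not taken from the experiments), the two MONRP objectives can be computed for a binary selection vector and candidate solutions compared by Pareto dominance, roughly as follows:

```python
# Sketch of the two MONRP objectives for a binary selection vector x,
# plus a Pareto-dominance test (minimize cost, maximize score).

def monrp_objectives(x, cost, score):
    total_cost = sum(c * xi for c, xi in zip(cost, x))    # to be minimized
    total_score = sum(s * xi for s, xi in zip(score, x))  # to be maximized
    return total_cost, total_score

def dominates(a, b):
    """True if solution a = (cost, score) Pareto-dominates b."""
    cost_a, score_a = a
    cost_b, score_b = b
    return (cost_a <= cost_b and score_a >= score_b) and \
           (cost_a < cost_b or score_a > score_b)

cost = [3, 8, 5, 2]
score = [6, 9, 4, 3]
a = monrp_objectives([1, 0, 1, 0], cost, score)  # (8, 10)
b = monrp_objectives([0, 1, 1, 0], cost, score)  # (13, 13)
print(a, b, dominates(a, b))  # neither dominates: a is cheaper, b scores higher
```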


THE WORKGROUP FORMATION PROBLEM
Deals with the allocation of human resources to project tasks.
The formulation displays a single objective function to be minimized, which composes salary costs, skill and preference factors:

$\min \; \sum_{p=1}^{P}\sum_{a=1}^{N} Sal_p \times A_{ap} \times Dur_a \;-\; \lambda \sum_{p=1}^{P}\sum_{s=1}^{S}\sum_{a=1}^{N} R_{ps} \times A_{ap} \times SI_s \;-\; \eta \left( \sum_{a=1}^{N}\sum_{p=1}^{P} Pp_{pa} \times A_{ap} + \sum_{a=1}^{N}\sum_{p=1}^{P} Pm_{pa} \times A_{ap} \right) \;-\; \eta \sum_{a=1}^{N}\sum_{p_1=1}^{P}\sum_{p_2=1}^{P} P_{p_1 p_2} \times A_{a p_1} \times A_{a p_2} \times X_{p_1 p_2}$
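A rough sketch of how such an objective could be evaluated for a given assignment matrix is shown below; the NumPy layout, variable names and weights are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

# Illustrative evaluation of the workgroup-formation objective for an
# assignment matrix A (A[a, p] = 1 if person p is allocated to activity a).
# Sal, Dur, R, SI, Pp, Pm, Ppair, X and the weights lam, eta are assumed inputs.

def work_objective(A, Sal, Dur, R, SI, Pp, Pm, Ppair, X, lam, eta):
    salary = (A * Sal[np.newaxis, :] * Dur[:, np.newaxis]).sum()  # salary costs
    skills = lam * np.einsum('ps,ap,s->', R, A, SI)               # skill-match reward
    prefs = eta * ((Pp * A).sum() + (Pm * A).sum())               # project/module preferences
    pairs = eta * np.einsum('ap,aq,pq->', A, A, Ppair * X)        # pairwise preferences
    return salary - skills - prefs - pairs                        # value minimized by the search

# Tiny random usage example with made-up dimensions
rng = np.random.default_rng(0)
N, P, S = 4, 3, 2   # activities, persons, skills
A = rng.integers(0, 2, size=(N, P))
print(work_objective(A,
                     Sal=rng.random(P), Dur=rng.random(N),
                     R=rng.random((P, S)), SI=rng.random(S),
                     Pp=rng.random((N, P)), Pm=rng.random((N, P)),
                     Ppair=rng.random((P, P)), X=rng.random((P, P)),
                     lam=0.5, eta=0.5))
```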


THE MULTI-OBJECTIVE TEST CASE SELECTION PROBLEM extends previously published mono-objective formulations

The paper discusses two variations: one considering two objectives (code coverage and execution time), which is used here, and another covering three objectives (code coverage, execution time and fault detection).
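For illustration, a minimal sketch of the two objectives used here (code coverage to maximize, execution time to minimize), assuming a simple boolean coverage matrix; the data layout is an assumption, not the instances used in the study:

```python
# Sketch of the two objectives for multi-objective test case selection.
# covers[t][b] == True means test case t exercises code block b (assumed layout).

def test_selection_objectives(selected, covers, exec_time):
    n_blocks = len(covers[0])
    covered = set()
    for t in selected:
        covered.update(b for b in range(n_blocks) if covers[t][b])
    coverage = 100.0 * len(covered) / n_blocks        # % code coverage (maximize)
    total_time = sum(exec_time[t] for t in selected)  # execution time (minimize)
    return coverage, total_time

covers = [
    [True, True, False, False],
    [False, True, True, False],
    [False, False, True, True],
]
exec_time = [2.0, 1.5, 3.0]
print(test_selection_objectives({0, 2}, covers, exec_time))  # (100.0, 5.0)
```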


For each problem (NRP, MONRP, WORK and TEST), two instances, A and B, with increasing sizes, were synthetically generated.

The Data


Instance Name   # Customers   # Tasks
NRPA            10            20
NRPB            20            40

INSTANCES FOR PROBLEM NRP

Instance Name   # Customers   # Requirements
MONRPA          10            20
MONRPB          20            40

INSTANCES FOR PROBLEM MONRP


Instance Name   # Persons   # Skills   # Activities
WORKA           10          5          5
WORKB           20          10         10

INSTANCES FOR PROBLEM WORK

Instance Name   # Test Cases   # Code Blocks
TESTA           20             40
TESTB           40             80

INSTANCES FOR PROBLEM TEST


The Algorithms
For Mono-Objective Problems: Genetic Algorithm (GA), Simulated Annealing (SA)
For Multi-Objective Problems: NSGA-II, MOCell
For Mono- and Multi-Objective Problems: Random Search
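As an illustration of the kind of metaheuristic applied to the mono-objective instances, a generic simulated-annealing loop is sketched below; the cooling schedule, parameters and toy fitness are arbitrary choices, not the configuration used in the experiments:

```python
import math
import random

# Generic simulated-annealing sketch for maximizing a fitness over bit vectors.

def simulated_annealing(fitness, n_bits, t0=10.0, cooling=0.95,
                        steps_per_temp=50, t_min=1e-3):
    current = [random.randint(0, 1) for _ in range(n_bits)]
    best = current[:]
    t = t0
    while t > t_min:
        for _ in range(steps_per_temp):
            neighbor = current[:]
            i = random.randrange(n_bits)
            neighbor[i] = 1 - neighbor[i]          # flip one bit
            delta = fitness(neighbor) - fitness(current)
            # Accept improving moves always, worsening moves with probability exp(delta / t)
            if delta >= 0 or random.random() < math.exp(delta / t):
                current = neighbor
            if fitness(current) > fitness(best):
                best = current[:]
        t *= cooling                               # geometric cooling
    return best

# Usage with a toy NRP-style fitness (made-up weights, costs and budget)
w, cost, B = [4, 2, 7, 5], [10, 5, 20, 15], 35

def nrp_fitness(x):
    c = sum(ci for ci, xi in zip(cost, x) if xi)
    value = sum(wi for wi, xi in zip(w, x) if xi)
    return value - 100 * max(0, c - B)   # penalize budget violations

print(simulated_annealing(nrp_fitness, n_bits=4))
```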


Human Subjects A total of 63 professional software engineers solved some or all of the instances.

NUMBER OF HUMAN RESPONDENTS PER INSTANCE


Human Subjects
Besides solving the problem instance, each respondent was asked to answer the following questions related to each problem instance:
How hard was it to solve this problem instance?
How hard would it be for you to solve an instance twice this size?
What do you think the quality of a solution generated by you over an instance twice this size would be?

In addition to these specific questions regarding each problem instance, general questions were asked about each respondent's theoretical and practical experience with software engineering.


Comparison Metrics
For Mono-Objective Problems: Quality
For Multi-Objective Problems: Hypervolume, Spread, Number of Solutions
For Mono- and Multi-Objective Problems: Execution Time
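For reference, a small sketch of how the hypervolume of a two-objective front could be computed against a reference point (assuming minimization on both axes and a front of mutually non-dominated points); this is a generic illustration, not the tool used in the study:

```python
# Hypervolume of a 2-D Pareto front under minimization of both objectives,
# measured against a reference point dominated by every front member.
# Assumes `front` contains only mutually non-dominated points.

def hypervolume_2d(front, ref):
    # Keep only points that dominate the reference point, sorted by first objective.
    pts = sorted(p for p in front if p[0] < ref[0] and p[1] < ref[1])
    hv = 0.0
    prev_x = ref[0]
    # Sweep from the largest first-objective value toward the smallest,
    # adding the rectangle each point contributes up to the reference point.
    for x, y in reversed(pts):
        hv += (prev_x - x) * (ref[1] - y)
        prev_x = x
    return hv

front = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
print(hypervolume_2d(front, ref=(5.0, 5.0)))  # 11.0
```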


RESULTS AND ANALYSES

?

How do different metaheuristics compare in solving a variety of search based software engineering problems?

SBSE algorithms comparison


RESULTS

Problem   GA                    SA                       RAND
NRPA      26.45±0.500           25.74±0.949              15.03±5.950
NRPB      95.41±0.190           90.47±7.023              45.74±11.819
WORKA     16,026.17±51.700      18,644.71±1,260.194      19,391.34±1,220.17
WORKB     24,831.23±388.107     35,174.19±2,464.733      36,892.64±2,428.269

Quality of Results for NRP and WORK
(averages and standard deviations, over 100 executions)


RESULTS

Problem   GA                   SA                    RAND
NRPA      40.92±11.112         23.01±7.476           0.00±0.002
NRPB      504.72±95.665        292.62±55.548         0.06±0.016
WORKA     242.42±44.117        73.35±19.702          0.04±0.010
WORKB     4,797.89±645.360     2,211.28±234.256      1.75±0.158

Time (in milliseconds) Results for NRP and WORK
(averages and standard deviations, over 100 executions)


NRPA   NRPB   WORKA   WORKB

Boxplots showing average (+), maximum, minimum (×) and 25%-75% quartile ranges of quality for mono-objective problems NRP and WORK, instances A and B, for GA, SA and Random Search.


RESULTS

Problem   NSGA-II          MOCell           RAND
MONRPA    0.6519±0.009     0.6494±0.013     0.5479±0.0701
MONRPB    0.6488±0.015     0.6470±0.017     0.5462±0.0584
TESTA     0.5997±0.009     0.5867±0.019     0.5804±0.0648
TESTB     0.6608±0.020     0.6243±0.044     0.5673±0.1083

Hypervolume Results for MONRP and TEST
(averages and standard deviations, over 100 executions)


RESULTS

Problem   NSGA-II          MOCell           RAND
MONRPA    0.4216±0.094     0.3973±0.031     0.5492±0.1058
MONRPB    0.4935±0.098     0.3630±0.032     0.5504±0.1081
TESTA     0.4330±0.076     0.2659±0.038     0.5060±0.1029
TESTB     0.3503±0.178     0.2963±0.072     0.4712±0.1410

Spread Results for MONRP and TEST
(averages and standard deviations, over 100 executions)


RESULTS

Problem   NSGA-II              MOCell               RAND
MONRPA    1,420.48±168.858     993.09±117.227       25.30±10.132
MONRPB    1,756.71±138.505     1,529.32±141.778     30.49±7.204
TESTA     1,661.03±125.131     1,168.47±142.534     25.24±11.038
TESTB     1,693.37±138.895     1,370.96±127.953     32.89±9.335

Time (in milliseconds) Results for MONRP and TEST
(averages and standard deviations, over 100 executions)


RESULTS

Problem   NSGA-II          MOCell          RAND
MONRPA    31.97±5.712      25.01±5.266     12.45±1.572
MONRPB    60.56±4.835      48.04±4.857     20.46±2.932
TESTA     35.43±4.110      26.20±5.971     12.54±1.282
TESTB     41.86±9.670      19.93±8.514     11.58±2.184

Number of Solutions Results for MONRP and TEST
(averages and standard deviations, over 100 executions)


RESULTS

Example of the obtained solution sets for NSGA-II, MOCell and Random Search over problem MONRP, instance A (plotted as cost versus value).


RESULTS

Example of the obtained solution sets for NSGA-II, MOCell and Random Search over problem MONRP, instance B (plotted as cost versus value).


RESULTS

Example of the obtained solution sets for NSGA-II, MOCell and Random Search over problem TEST, instance A (plotted as cost versus % coverage).


RESULTS

Example of the obtained solution sets for NSGA-II, MOCell and Random Search over problem TEST, instance B (plotted as cost versus % coverage).


?

Can the results generated by Search Based Software Engineering be said to be human competitive?

SBSE human competitiveness

RESULTS AND ANALYSES


RESULTS

Problem   SBSE Quality           SBSE Time               Humans Quality            Humans Time
NRPA      26.48 ±0.512           40.57 ±9.938            16.19 ±6.934              1,731,428.57 ±2,587,005.57
NRPB      95.77 ±0.832           534.69 ±91.133          77.85 ±23.459             3,084,000.00 ±2,542,943.10
WORKA     16,049.72 ±121.858     260.00 ±50.384          28,615.44 ±12,862.590     2,593,846.15 ±1,415,659.62
WORKB     25,047.40 ±322.085     4,919.30 ±1,219.912     50,604.60 ±20,378.740     5,280,000.00 ±3,400,588.14

Quality and Time (in milliseconds) for NRP and WORK
(averages and standard deviations)


NRPA   NRPB   WORKA   WORKB

Boxplots showing average (+), maximum, minimum (×) and 25%-75% quartile ranges of quality for mono-objective problems NRP and WORK, instances A and B, for SBSE and Human Subjects.


RESULTS

Problem   SBSE HV           SBSE Time             Humans HV   Humans Time
MONRPA    0.6519 ±0.009     1,420.48 ±168.858     0.4448      1,365,000.00 ±1,065,086.42
MONRPB    0.6488 ±0.015     1,756.71 ±138.505     0.2870      2,689,090.91 ±2,046,662.91
TESTA     0.5997 ±0.009     1,661.03 ±125.131     0.4878      1,472,307.69 ±892,171.07
TESTB     0.6608 ±0.020     1,693.37 ±138.895     0.4979      3,617,142.86 ±3,819,431.52

Hypervolume and Time (in milliseconds) Results for SBSE and Humans for MONRP and TEST
(averages and standard deviations)


RESULTS

Solutions generated by humans, and non-dominated solution sets produced by NSGA-II and MOCell for problem MONRP, instance A (plotted as cost versus value).


RESULTS

Solutions generated by humans, and non-dominated solution sets produced by NSGA-II and MOCell for problem MONRP, instance B (plotted as cost versus value).


RESULTS

Solutions generated by humans, and non-dominated solution sets produced by NSGA-II and MOCell for problem TEST, instance A (plotted as cost versus % coverage).


RESULTS

Solutions generated by humans, and non-dominated solution sets produced by NSGA-II and MOCell for problem TEST, instance B (plotted as cost versus % coverage).


FURTHER HUMAN COMPETITIVENESS ANALYSES
Human participants were asked to rate how difficult they found each problem instance and how confident they were in their solutions.

Bar chart showing percentage of human respondents who considered each problem “hard” or “very hard”


FURTHER HUMAN COMPETITIVENESS ANALYSES
Human participants were asked to rate how difficult they found each problem instance and how confident they were in their solutions.

Bar chart showing percentage of human respondents who were “confident” or “very confident”


FURTHER HUMAN COMPETITIVENESS ANALYSES

NRP

MONRP

WORK

TEST

Bar charts showing percentage differences in quality, for mono- and multi-objective problems, between the solutions generated by SBSE and those produced by the human subjects


FURTHER HUMAN COMPETITIVENESS ANALYSES
57.33% of the human participants who responded to instance A indicated that solving instance B would be “harder” or “much harder”, and 55.00% predicted that their solution for this instance would be “worse” or “much worse”.
62.50% of the instance B respondents pointed out the increased difficulty of a problem instance twice as large, and 57.14% predicted that their solution would be “worse” or “much worse”.


FURTHER HUMAN COMPETITIVENESS ANALYSES

These results suggest that, for larger problem instances, the potential of SBSE to generate even more accurate results, when compared to humans, increases.
In fact, this suggests that SBSE may be particularly useful in solving real-world, large-scale software engineering problems.


Threats to Validity
Small instance sizes
Artificial instances
Number and diversity of human participants
Number of problems


This paper reports the results of an extensive experimental study aimed at evaluating the human competitiveness of SBSE.
Secondarily, several tests were performed over four classical SBSE problems in order to evaluate the performance of well-known metaheuristics in solving both mono- and multi-objective problems.

FINAL CONSIDERATIONS


Regarding the comparison of algorithms:
GA generated more accurate solutions for mono-objective problems than SA
NSGA-II consistently outperformed MOCell in terms of hypervolume and number of generated solutions
MOCell outperformed NSGA-II when considering spread and the execution time
All of these results are consistent with previously published research

FINAL CONSIDERATIONS


Regarding the human competitiveness question

The experiments strongly suggest that the results generated by search based software engineering can, indeed, be said to be human competitive.
The results also indicate that, for real-world, large-scale software engineering problems, the benefits from applying SBSE may be even greater.

FINAL CONSIDERATIONS


That is it!

Thanks for your time and attention.


2nd International Symposium on Search Based Software Engineering

September 7 – 9, 2010 Benevento, Italy

The Human Competitiveness of Search Based Software Engineering

prof.jerff@gmail.com http://goes.comp.uece.br/ Optimization in Software Engineering Group (GOES.UECE) State University of Ceará, Brazil

