Integrating Supervised Data Mining into Performance Oriented Design
The Problem
Contemporary optimisation techniques, which support Performance Oriented Design, are commonly used to analyse a combination of parameters for a specific design problem by seeking an optimum solution. Although they have proved useful, they have major disadvantages:
• They focus only on the final result, eliminating intermediate steps.
• They do not fully support the informative study of design solutions, which is essential for architectural design.
• They lack methods for accumulating case-specific knowledge and reusing it.
• They require an iterative evaluation of each instance, which is computationally demanding.
[Figure: Data Mining Flow Diagram. Labels: Data Sampling; Parametric Model; instances i1, i2, ..., in; input parameters x11, x12, ..., xnm; values y1, y2, ..., yn.]
Symbolic Regression via Genetic Programming
The methodology is realised by implementing Supervised Data Mining techniques, which extract knowledge from a collection of previously gathered data whose target values are already known and use it for predictive purposes. The extraction is based on a statistical process called Symbolic Regression, which searches the space of mathematical expressions to find the model that best fits a given dataset in terms of both accuracy and simplicity.
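How a search might rank candidate expressions by accuracy and simplicity can be sketched as follows. The training values, the three candidate expressions, the hand-counted node sizes, and the rule of comparing error first and size second are all illustrative assumptions, not the criteria used in the study.

```python
# Training set: ((x, y) inputs, target value). Values are made up for illustration.
training_set = [
    ((1.0, 2.0), 1.5),
    ((2.0, 1.0), 2.0),
    ((4.0, 3.0), 8.0),
    ((6.0, 0.0), 3.0),
]

# Candidate expressions: (text, callable, number of syntax-tree nodes).
# The node counts are written by hand here; a real system would count them.
candidates = [
    ("x+y", lambda x, y: x + y, 3),
    ("((y+1)*(x/2))+(x-x)", lambda x, y: (y + 1) * (x / 2) + (x - x), 11),
    ("(y+1)*(x/2)", lambda x, y: (y + 1) * (x / 2), 7),
]


def total_abs_error(g, data):
    """Sum of absolute differences between target values and predictions."""
    return sum(abs(target - g(*inputs)) for inputs, target in data)


def score(candidate):
    """Rank by accuracy first, then by size (fewer nodes counts as simpler)."""
    _text, g, size = candidate
    return (total_abs_error(g, training_set), size)


best = min(candidates, key=score)
print(best[0])
```

Two of the candidates fit the data equally well, so the smaller one is preferred.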
[Figure labels, Data Mining Flow Diagram: Performance evaluation (e.g. total solar radiation); Model Generation; Verification data; approximated performance values; "true" performance values.]
The Basic Genetic Programming flow diagram. Steps shown:
• Create initial population of programs from the available primitives.
• Execute each program and measure its fitness.
• Termination Criterion Satisfied? If yes: Desired Outcome.
• Individual = 0; Individual = M?; Gen = Gen+1.
• Select Genetic Operation Probabilistically.
• Mutation: Select One Individual Based on Fitness; Perform Mutation; Insert Mutant into New Population; Individuals = Individuals+1.
• Crossover: Select Two Individuals Based on Fitness; Perform Crossover; Insert Offspring into New Population; Individuals = Individuals+1.
Primitive Set    Kind of Primitive    Example(s)
Functional Set   Arithmetic           +, -, *, /
                 Mathematical         sin, cos, exp
                 ...                  ...
Terminal Set     Variables            x, y
                 Constant values      3, 0.45
                 ...                  ...
[Figure: Crossover. parent1: (x+y)+3; parent2: (y+1)*(x/2); offspring: (x/2)+y. Crossover points are marked on each tree.]
[Figure: Mutation. parent: (y+1)*2; randomly generated sub-tree: x/(x-3); offspring: (y+1)*(x/(x-3)).]
Visualisation of test data and verification. The drawings show a snapshot of test configuration samples of input parameters (radar charts) and, as bar charts, the comparison between performance values obtained from the original simulations (solSim) and from the approximated functions (solGP). The skin models were generated from a B-Spline surface with a varying number of control points placed on a 2D grid and variable, independent z components.
No particular model is provided as a starting point to the algorithm. Instead, initial expressions are formed by randomly combining mathematical building blocks from a primitive set. New equations are then formed by recombining previous equations using genetic operators (e.g. mutation, crossover).
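Random initialisation and the two genetic operators can be sketched on expressions stored as nested tuples. The tuple encoding, the helper names (`random_tree`, `paths`, `get`, `replace`, `crossover`, `mutate`), the 0.3 stopping probability, and the depth values are assumptions made for this illustration only.

```python
import random

FUNCTIONS = {"+": 2, "-": 2, "*": 2, "/": 2}   # function set: symbol -> arity
TERMINALS = ["x", "y", 1.0, 2.0, 3.0]          # terminal set: variables, constants


def random_tree(rng, depth):
    """Grow a random expression: a terminal, or a tuple (op, child, child)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(sorted(FUNCTIONS))
    children = tuple(random_tree(rng, depth - 1) for _ in range(FUNCTIONS[op]))
    return (op,) + children


def paths(tree, prefix=()):
    """List the path (tuple of child indices) to every node of the tree."""
    found = [prefix]
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            found += paths(child, prefix + (i,))
    return found


def get(tree, path):
    """Return the sub-tree reached by following path from the root."""
    for i in path:
        tree = tree[i]
    return tree


def replace(tree, path, subtree):
    """Return a copy of tree with the node at path swapped for subtree."""
    if not path:
        return subtree
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], subtree),) + tree[i + 1:]


def crossover(rng, parent1, parent2):
    """Copy parent1, replacing a random node with a random sub-tree of parent2."""
    point1 = rng.choice(paths(parent1))
    point2 = rng.choice(paths(parent2))
    return replace(parent1, point1, get(parent2, point2))


def mutate(rng, parent, depth=2):
    """Copy parent, replacing a random node with a freshly grown sub-tree."""
    return replace(parent, rng.choice(paths(parent)), random_tree(rng, depth))


rng = random.Random(42)
parent = ("*", ("+", "y", 1.0), 2.0)            # (y+1)*2
subtree = ("/", "x", ("-", "x", 3.0))           # x/(x-3)
print(replace(parent, (2,), subtree))           # mutation at the constant 2
print(crossover(rng, random_tree(rng, 3), random_tree(rng, 3)))
```

The first print reproduces a mutation of (y+1)*2 with the sub-tree x/(x-3) at a fixed point; the second shows a crossover at randomly chosen points.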
[Figure label: Test Set, randomly generated input parameters not included in the training set.]
An analytical function in a tree-based form. The red arrows show the direction of recursive evaluation of the syntax tree.
[Figure labels: Training Set, input parameters + target performance values; the candidate function to verify.]
infix notation: ((y-1)+(x-1)) - ((y+1)*(x/2))
prefix notation: -( +( -(y 1) -(x 1) ) *( +(y 1) /(x 2) ) )
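Recursive evaluation of a syntax tree, and its rendering in both notations, can be sketched as below. The nested-tuple encoding, the function names, and the sample values x = 4, y = 3 are assumptions chosen for illustration.

```python
import operator

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

# The example expression, written as (operator, left child, right child).
tree = ("-",
        ("+", ("-", "y", 1), ("-", "x", 1)),
        ("*", ("+", "y", 1), ("/", "x", 2)))


def evaluate(node, env):
    """Evaluate both children first, then apply the operator at this node."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(node, str):
        return env[node]      # variable: look up its value
    return node               # numeric constant


def to_prefix(node):
    """Render the tree with the operator written before its arguments."""
    if isinstance(node, tuple):
        op, left, right = node
        return f"{op}({to_prefix(left)} {to_prefix(right)})"
    return str(node)


def to_infix(node):
    """Render the tree with the operator written between its arguments."""
    if isinstance(node, tuple):
        op, left, right = node
        return f"({to_infix(left)}{op}{to_infix(right)})"
    return str(node)


print(to_prefix(tree))
print(to_infix(tree))
print(evaluate(tree, {"x": 4, "y": 3}))   # ((3-1)+(4-1)) - ((3+1)*(4/2)) = -3.0
```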
$$\min_{g} \sum_{i=1}^{n} \left| y_i - g(x_{i1}, x_{i2}, \ldots, x_{im}) \right|$$
The methodology offers an opportunity to handle many performance evaluations simultaneously and eliminates the need to rerun the exploration process when a new objective is introduced. It therefore allows flexibility in design space exploration.
For the most complex skin model (25 input parameters), the test error was 5.7% and the execution time was less than 1 ms (283 times faster than the original simulation).
To achieve this, 75 million candidate functions were evolved over 750 generations using precomputed training sets.
As a test case, the method was used to learn an analytical function that directly computes the total solar radiation of any given building envelope.
The Solution
To overcome the aforementioned issues, the presented methodology aims to learn a function that directly maps any given configuration of input parameters to a performance value. The method offers a favourable trade-off between speed and accuracy, and the learned function can be widely reused, efficiently replacing time-consuming simulations.
Genetic Programming (GP) is a type of evolutionary computation that, generation by generation, transforms a population of programs into a new, ideally better, population of programs. Programs are represented in memory as tree-based structures, which are executed recursively. In symbolic regression, the programs being evolved are analytical functions.
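A generational loop of this kind can be sketched in a few dozen lines. Everything specific here is an assumption for illustration: tournament selection, elitism, the depth cap of 6, the absence of division (to avoid division by zero), the parameter defaults, and the toy target x*y + x. It is not the configuration used in the study.

```python
import operator
import random

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
TERMINALS = ["x", "y", 1.0, 2.0]


def random_tree(rng, depth):
    """Grow a random expression as a terminal or a tuple (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(sorted(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(node, env):
    """Execute a program recursively on the variable values in env."""
    if isinstance(node, tuple):
        return OPS[node[0]](evaluate(node[1], env), evaluate(node[2], env))
    return env[node] if isinstance(node, str) else node


def fitness(tree, data):
    """Sum of absolute errors over the training set (lower is better)."""
    return sum(abs(target - evaluate(tree, env)) for env, target in data)


def depth(tree):
    return 1 + max(depth(tree[1]), depth(tree[2])) if isinstance(tree, tuple) else 0


def random_path(rng, tree):
    """Walk down from the root and stop at a randomly chosen node."""
    path = ()
    while isinstance(tree, tuple) and rng.random() < 0.7:
        i = rng.choice((1, 2))
        path, tree = path + (i,), tree[i]
    return path


def subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace(tree, path, new):
    """Return a copy of tree with the node at path swapped for new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]


def tournament(rng, population, scores, k=3):
    """Select one individual based on fitness: the best of k random picks."""
    picks = rng.sample(range(len(population)), k)
    return population[min(picks, key=scores.__getitem__)]


def evolve(data, rng, pop_size=100, generations=20, p_crossover=0.9, max_depth=4):
    population = [random_tree(rng, max_depth) for _ in range(pop_size)]
    best, best_score = None, float("inf")
    for _gen in range(generations):
        scores = [fitness(tree, data) for tree in population]
        i = min(range(pop_size), key=scores.__getitem__)
        if scores[i] < best_score:
            best, best_score = population[i], scores[i]
        if best_score == 0:                      # termination criterion
            break
        new_population = [best]                  # elitism (assumption)
        while len(new_population) < pop_size:
            parent = tournament(rng, population, scores)
            if rng.random() < p_crossover:       # crossover: second parent donates
                donor = tournament(rng, population, scores)
                new = subtree(donor, random_path(rng, donor))
            else:                                # mutation: fresh random sub-tree
                new = random_tree(rng, 2)
            child = replace(parent, random_path(rng, parent), new)
            new_population.append(child if depth(child) <= 6 else parent)
        population = new_population
    return best


data = [({"x": x, "y": y}, x * y + x) for x in (0.0, 1.0, 2.0, 3.0) for y in (0.0, 1.0, 2.0)]
best = evolve(data, random.Random(0))
print(best, fitness(best, data))
```

Because the best individual found so far is always carried over, the returned program is never worse on the training set than the best member of the initial population.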
Evolution and Output Verification
[Figure: bar charts comparing solSim and solGP performance values.]