RESEARCH SUMMARIES: Indoor environment thermal comfort prediction using several machine learning applications
JULY – DECEMBER 2021 JONG JOO KIM
#ThermalComfort #MachineLearning #Data #MatLab
Contents
1. Comparing the performance of data-driven prediction models using small-sized thermal comfort datasets
2. Utilization of Machine Learning Algorithms for predicting Thermal Comfort in Korean Climate
3. Collection and Analysis of Occupant Behavior in Thermal Environment Using Occupant Voting System
2021 KIEAE International Conference
Comparing the performance of data-driven prediction models using small-sized thermal comfort datasets
Jong Joo Kim 김종주
1
INTRODUCTION
1.1 Background
• Thermal comfort prediction: PMV vs. data-driven models [1] [2]
• Fanger’s PMV model (calculation-based)
  • Indoor environmental factors: indoor air temperature, indoor mean radiant temperature, indoor air velocity, indoor relative humidity
  • Personal factors: metabolic rate, clothing
  • Output scale: -3 Cold, -2 Cool, -1 Slightly Cool, 0 Neutral, +1 Slightly Warm, +2 Warm, +3 Hot
• Data-driven models
  • Machine Learning: Random Forest (RF), Support Vector Machine (SVM)
  • Deep Learning (neural network): Multilayer neural network (MLP), CNN_LSTM
  • Transfer Learning (TL) + Deep Learning: TL_MLP, TL_CNN_LSTM
1.1 Background: Transfer Learning Strategy [3] [4]
• Two neural network models for similar tasks
  a. Source model
  b. Target model
Transfer learning steps
• Pre-train the first model (source) on a large dataset.
• Apply the knowledge gained to the new model (target) for the desired, related task.
  a. Transfer the task-related layers to the target model.
1.1 Background: Challenge
• Use of thermal comfort datasets
  a. Thermal comfort data are highly subjective.
• Thermal comfort dataset collection
  a. Requires steady participation over a long time period.
1.2 Research Gap
• Previous studies demonstrated their proposed data-driven models using large thermal comfort datasets.
Question: Are data-driven models still effective for small-sized thermal comfort datasets?
1.3 Research Objective
• Extract small-sized thermal comfort datasets based on the human subject ID from the experiment.
• Apply the small-sized datasets to the data-driven prediction models and analyze the results through a series of comparisons.
1.4 Research Goal
• Determine whether data-driven models are still effective with small-sized thermal comfort datasets.
2
METHODOLOGY
Modeling Program
2.1 Algorithm Selection
• Three algorithms known for high classification performance
  a. Random Forest (RF)
  b. Convolutional Neural Network with Long Short-Term Memory (CNN_LSTM) [4]
  c. Transfer-learning Convolutional Neural Network with Long Short-Term Memory (TL_CNN_LSTM)
2.2 Thermal Comfort Dataset Collection
• Medium US office dataset [6]
  a. Thermal comfort data collected from the Friends Center office building in Philadelphia, USA, with 24 participants from 2012 to 2013.
• ASHRAE Global Thermal Comfort Database II [5]
  a. An extensive set of thermal comfort data collected from around the world over a period of several decades, released in 2018.
  b. Source dataset: used only to pre-train a model for the transfer learning strategy.
2.3 Feature Selection
• Feature set 1 (6 features): Indoor Air Temperature, Indoor Relative Humidity, Indoor Mean Radiant Temperature, Indoor Air Velocity, Clothing, Metabolic Rate
• Feature set 2 (7 features): feature set 1 plus Outdoor Temperature
• Response class for both sets: Thermal Sensation Vote (TSV)
• Factor groups: outdoor environmental factors (Outdoor Temperature), indoor environmental factors (air temperature, relative humidity, mean radiant temperature, air velocity), personal factors (clothing, metabolic rate)

Feature ranges in the two datasets
Feature                           Units   ASHRAE          US Medium office
Indoor Air Temperature            ℃       0.6 – 37        16.4 – 27.8
Indoor Relative Humidity          %       0.4 – 100       15.7 – 72.4
Indoor Mean Radiant Temperature   ℃       4 – 49.5        16.4 – 27.6
Indoor Air Velocity               m/s     0 – 56.2        0.02 – 0.19
Metabolic Rate                    met     0.7 – 6.8       1.00 – 6.80
Clothing Insulation               clo     0.04 – 2.89     0.31 – 1.83
Outdoor Air Temperature           ℃       -18.4 – 45.1    -5 – 33
2.4 Data Preprocessing
• Delete every row that has any missing entries.
• Data conversion
  • Transform the continuous TSV range into discrete class points.
  • Convert the scale from 7-point to 5-point.
  • Continuous TSV (-3 to +3) → Class TSV, 7-point scale (-3, -2, -1, 0, +1, +2, +3) → Class TSV, 5-point scale (-2, -1, 0, +1, +2)
2.5 Small Dataset Extraction
• Source: Medium US office dataset
• Four participants’ datasets: A, B, C, and D

Dataset   Final instances
A         133
B         112
C         132
D         120
2.6 Dataset Split: Training and Test Sets
• When splitting, keep the prediction class (TSV) distribution proportional in both sets.
• Raw dataset → Training set (50%) + Test set (50%)
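A class-proportional 50/50 split of this kind can be sketched with `cvpartition`, which the appendix code also uses. This is only an illustration under stated assumptions: the votes and predictor values below are synthetic placeholders, and the variable names are hypothetical. It needs the Statistics and Machine Learning Toolbox.

```matlab
% Synthetic stand-in for one participant's data (hypothetical values).
rng('default');                               % reproducible split
tsv = categorical([-2 -1 -1 0 0 0 0 1 1 2 ...
                   -2 -1 -1 0 0 0 0 1 1 2])'; % 20 votes on the 5-point scale
features = rand(numel(tsv), 6);               % placeholder for the 6 predictors

% Passing the class labels makes the holdout split stratified by class.
part     = cvpartition(tsv, 'Holdout', 0.5);
idxTrain = training(part);
idxTest  = test(part);

XTrain = features(idxTrain, :);  YTrain = tsv(idxTrain);
XTest  = features(idxTest,  :);  YTest  = tsv(idxTest);

% Compare class counts in both halves; they should be close to equal.
disp(table(categories(tsv), countcats(YTrain), countcats(YTest), ...
    'VariableNames', {'TSV', 'Train', 'Test'}));
```

Stratifying matters most for the rare extreme votes: with only about 120 instances per participant, an unstratified split could leave a class entirely out of one half.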
2.6 Dataset Split: Training Set Partitions
• Training set partitions: 50%, 60%, 70%, 80%, 90%, and 100% of the training set
• Test set: fixed at 100% of the test set
• The training set partitions are made to analyze the influence of changing the training set size.
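The partitions can be produced by subsampling the training set while leaving the test set untouched. The sketch below mirrors the holdout fractions used in Appendix-1, but the labels are synthetic and the variable names are hypothetical. It needs the Statistics and Machine Learning Toolbox.

```matlab
rng('default');
% Hypothetical training labels: 60 votes on the 5-point scale.
YTrain = categorical(repmat([-2 -1 0 0 0 1 1 2 0 -1], 1, 6))';

fractions = [0.5 0.6 0.7 0.8 0.9 1.0];        % share of the training set kept
for frac = fractions
    if frac < 1
        % Hold out (1 - frac) of the training set, stratified by class.
        sub  = cvpartition(YTrain, 'Holdout', 1 - frac);
        keep = training(sub);
    else
        keep = true(size(YTrain));            % 100%: use the full training set
    end
    fprintf('%3.0f%% partition: %d training samples\n', 100*frac, sum(keep));
end
```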
2.7 Experiment Setup
• The training/test split of the raw dataset (training set 50%, test set 50%) is repeated 10 times, and the metrics are averaged.

Run       Acc    wF1    MCC
1st       50%    33%    0.2
2nd       42%    30%    0.23
…         …      …      …
10th      43%    32%    0.25
Average   45%    31%    0.24
2.8 Performance Evaluation Metrics
a. Accuracy
b. Weighted F1-score (wF1)
c. Matthews correlation coefficient (MCC)
• Weighted F1-score and MCC are appropriate metrics for evaluating models on imbalanced datasets.
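All three metrics can be computed from a multiclass confusion matrix. The sketch below uses a made-up 3-class matrix and the standard multiclass generalization of MCC; it is not the helper code used in the appendix (`statsOfMeasure`, `confusion.getMatrix`), whose internals are not shown in the source.

```matlab
% Hypothetical 3-class confusion matrix: rows = true class, columns = predicted.
C = [5 2 0;
     1 6 1;
     0 3 2];

tp        = diag(C);          % correct predictions per class
trueCnt   = sum(C, 2);        % samples per true class (support)
predCnt   = sum(C, 1)';       % samples per predicted class
n         = sum(C(:));

accuracy  = sum(tp) / n;

precision = tp ./ max(predCnt, 1);   % guard against classes never predicted
recall    = tp ./ max(trueCnt, 1);   % guard against classes with no samples
f1        = 2 * precision .* recall ./ max(precision + recall, eps);
wF1       = sum(f1 .* trueCnt) / n;  % per-class F1 weighted by class support

% Multiclass Matthews correlation coefficient (Gorodkin's generalization).
mcc = (sum(tp) * n - trueCnt' * predCnt) / ...
      sqrt((n^2 - predCnt' * predCnt) * (n^2 - trueCnt' * trueCnt));

fprintf('Accuracy = %.3f, wF1 = %.3f, MCC = %.3f\n', accuracy, wF1, mcc);
```

If a model predicts a single class for every sample, the MCC denominator is zero and the result is NaN, which is one reason the appendix averages with the 'omitnan' option.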
3
RESULTS
3.1 Experiment Results 1: 6 features
Cells show ACC / wF1 / MCC. PMV is reported with ACC only.

Dataset A
Train data   PMV     TL_CNN_LSTM             CNN_LSTM                Random Forest
50%          28.9%   36.5% / 35.2% / 0.22    35.4% / 34.2% / 0.20    37.6% / 37.6% / 0.22
60%          28.9%   39.7% / 38.3% / 0.25    39.0% / 38.7% / 0.25    42.7% / 41.4% / 0.26
70%          28.9%   42.1% / 40.4% / 0.27    39.8% / 39.1% / 0.25    44.1% / 42.9% / 0.29
80%          28.9%   41.2% / 39.0% / 0.26    39.6% / 39.0% / 0.25    41.9% / 40.9% / 0.25
90%          28.9%   43.1% / 41.5% / 0.28    41.1% / 40.3% / 0.25    44.8% / 44.1% / 0.29
100%         28.9%   43.7% / 42.2% / 0.29    41.0% / 39.9% / 0.26    44.6% / 43.9% / 0.29

Dataset B
Train data   PMV     TL_CNN_LSTM             CNN_LSTM                Random Forest
50%          34.0%   37.9% / 36.8% / 0.16    37.2% / 40.1% / 0.19    40.3% / 42.1% / 0.24
60%          34.0%   36.7% / 34.5% / 0.15    34.3% / 34.6% / 0.13    39.7% / 42.5% / 0.22
70%          34.0%   37.3% / 34.0% / 0.17    36.8% / 35.3% / 0.18    38.7% / 36.5% / 0.19
80%          34.0%   38.3% / 37.9% / 0.22    37.8% / 38.2% / 0.18    38.7% / 38.0% / 0.19
90%          34.0%   38.2% / 35.3% / 0.21    36.3% / 37.4% / 0.19    38.4% / 36.6% / 0.19
100%         34.0%   37.8% / 34.8% / 0.22    36.0% / 38.9% / 0.21    38.3% / 38.5% / 0.21

Dataset C
Train data   PMV     TL_CNN_LSTM             CNN_LSTM                Random Forest
50%          30.0%   31.5% / 30.6% / 0.18    30.9% / 30.3% / 0.18    29.0% / 29.7% / 0.16
60%          30.0%   28.7% / 27.0% / 0.16    27.3% / 27.8% / 0.16    29.1% / 29.6% / 0.16
70%          30.0%   30.5% / 32.7% / 0.20    31.4% / 30.7% / 0.16    31.7% / 30.0% / 0.18
80%          30.0%   30.4% / 27.8% / 0.15    29.9% / 28.1% / 0.17    30.5% / 30.7% / 0.17
90%          30.0%   29.4% / 29.5% / 0.17    30.5% / 30.3% / 0.19    31.6% / 31.5% / 0.18
100%         30.0%   30.1% / 27.7% / 0.18    30.6% / 28.7% / 0.17    31.5% / 32.4% / 0.19

Dataset D (ACC)
Train data   PMV     TL_CNN_LSTM   CNN_LSTM   Random Forest
50%          25.3%   39.9%         38.9%      35.2%
60%          25.3%   43.4%         42.6%      38.6%
70%          25.3%   42.3%         41.1%      36.5%
80%          25.3%   43.9%         42.2%      36.8%
90%          25.3%   46.9%         43.5%      36.2%
100%         25.3%   46.8%         43.2%      36.2%
Dataset D wF1 (12 values listed in the source): 31.2% 39.2% 37.3% 39.6% 41.5% 37.7% 38.1% 45.4% 39.1% 41.2% 43.1% 38.6%
Dataset D MCC (12 values listed in the source): 0.18 0.25 0.21 0.23 0.22 0.25 0.23 0.29 0.23 0.25 0.26 0.20
Data A Prediction Performance (Dataset A, 6 features)
• Model performance comparison
  • Random Forest > rest
  • Data-driven models > PMV
• More training data gives more stable performance.
[Figure: Accuracy, wF1-score, and MCC versus training set partition (50% to 100%) for Dataset A. Series: PMV (accuracy only), TL_CNN_LSTM, CNN_LSTM, Random Forest.]
• Comparison between PMV and data-driven models: Dataset A with 100% training set partition
[Figure: PMV versus TL_CNN_LSTM]
3.1 Experiment Results 2: 7 features (Datasets A and B)
Cells show ACC / wF1 / MCC.

Dataset A
Train data   TL_CNN_LSTM             CNN_LSTM                Random Forest
50%          39.2% / 38.1% / 0.24    36.4% / 36.8% / 0.23    37.8% / 36.5% / 0.22
60%          41.1% / 40.1% / 0.26    41.1% / 40.3% / 0.26    41.6% / 39.6% / 0.24
70%          40.6% / 39.0% / 0.26    39.9% / 39.1% / 0.25    41.8% / 40.4% / 0.25
80%          43.8% / 41.5% / 0.29    42.3% / 41.7% / 0.27    44.1% / 42.2% / 0.27
90%          43.5% / 41.6% / 0.28    42.2% / 40.9% / 0.26    43.2% / 41.5% / 0.26
100%         43.1% / 41.3% / 0.28    40.9% / 39.5% / 0.25    43.6% / 41.8% / 0.26
Dataset B (7 features; cells show ACC / wF1 / MCC; "-" marks a value not reported in the source)
Train data   TL_CNN_LSTM             CNN_LSTM                Random Forest
50%          36.3% / 33.2% / 0.14    35.4% / 37.8% / 0.17    36.8% / 39.0% / 0.20
60%          41.4% / 49.8% / 0.32    38.2% / 32.6% / 0.19    43.9% / 43.6% / 0.26
70%          41.4% / 42.1% / 0.22    38.3% / 39.8% / 0.22    42.1% / 42.4% / 0.26
80%          42.8% / - / -           40.2% / 34.2% / 0.19    42.2% / 37.4% / 0.16
90%          43.5% / - / -           39.3% / 40.7% / 0.21    43.0% / 43.8% / 0.26
100%         42.6% / - / -           38.8% / 40.1% / 0.21    43.3% / 43.6% / 0.29
3.1 Experiment Results 2: 7 features (Datasets C and D)
Cells show ACC / wF1 / MCC.

Dataset C
Train data   TL_CNN_LSTM             CNN_LSTM                Random Forest
50%          28.5% / 29.7% / 0.15    27.8% / 28.1% / 0.17    27.5% / 28.6% / 0.16
60%          32.7% / 31.8% / 0.17    31.3% / 32.1% / 0.18    29.7% / 28.3% / 0.17
70%          31.6% / 28.4% / 0.15    30.6% / 29.7% / 0.18    30.4% / 30.6% / 0.18
80%          29.2% / 31.9% / 0.24    29.9% / 29.9% / 0.17    29.6% / 29.7% / 0.18
90%          31.9% / 27.7% / 0.17    29.6% / 30.2% / 0.17    29.2% / 28.8% / 0.18
100%         32.4% / 32.8% / 0.17    30.1% / 30.7% / 0.16    30.1% / 28.9% / 0.17

Dataset D
Train data   TL_CNN_LSTM             CNN_LSTM                Random Forest
50%          36.1% / 38.7% / 0.19    36.9% / 40.9% / 0.23    37.3% / 40.6% / 0.23
60%          40.2% / 36.0% / 0.20    38.2% / 37.2% / 0.22    36.7% / 38.0% / 0.22
70%          41.6% / 37.4% / 0.19    41.2% / 43.9% / 0.26    38.6% / 39.2% / 0.24
80%          41.3% / 45.8% / 0.32    39.1% / 40.2% / 0.24    37.4% / 39.4% / 0.23
90%          40.6% / 42.7% / 0.24    40.3% / 43.5% / 0.25    37.7% / 38.8% / 0.22
100%         40.4% / 44.6% / 0.26    40.4% / 42.0% / 0.26    37.2% / 37.3% / 0.20
Data A Prediction Performance (Dataset A, 7 features)
• Model performance comparison
• More training data gives more stable performance.
• TL_CNN_LSTM > Random Forest with 7 features
[Figure: Accuracy, weighted F1-score, and MCC versus training set partition (50% to 100%) for Dataset A. Series: TL_CNN_LSTM, CNN_LSTM, Random Forest.]
3.2 Summary
• The random forest model achieved the highest performance overall, but the TL_CNN_LSTM model performed slightly better with 7 features.
  • TL_CNN_LSTM may perform better with more features.
  • Additionally, TL_CNN_LSTM shows better performance with imbalanced datasets.
• No large difference in performance was observed between the training set partitions, but performance became more stable as the training set grew.
• Overall, the data-driven models did not perform as well as expected, but they still outperformed PMV.
4
CONCLUSION
4. Conclusion
Question: Are data-driven models still effective for small-sized thermal comfort datasets?
Answer: Because thermal comfort data are inherently subjective, small-sized thermal comfort datasets are not yet suitable for data-driven prediction modeling in practical use.
4
APPENDIX-1 (Running algorithms)
load("7US_ID_TSV.mat")
rng('default')  % For reproducibility

%% ID
i = 1;
for k = 1:4
    for ID = 1
        % Select one participant's dataset
        if ID == 1
            Dataset = ID3;
        elseif ID == 3
            Dataset = ID8;
        elseif ID == 4
            Dataset = ID14;
        else
            Dataset = ID16;
        end

        % Stratified 50/50 holdout split into training and test tables
        hpartition = cvpartition(Dataset.TSV,'Holdout',0.5);
        idxTrain = training(hpartition);
        tblTrain = Dataset(idxTrain,:);
        idxTest = test(hpartition);
        tblTest = Dataset(idxTest,:);
        Dataset = tblTrain;

        %% Data Partition
        for Data = 6
            % Data = 1..5 keeps 50%..90% of the training set; Data = 6 keeps all of it
            holdoutFraction = [0.5 0.4 0.3 0.2 0.1];
            if Data <= 5
                hpartition = cvpartition(Dataset.TSV,'Holdout',holdoutFraction(Data));
                idxTrain = training(hpartition);
                tblTrain = Dataset(idxTrain,:);
            end

            % Classifiers
            for Mode = 1
                if Mode == 1
                    [MeanLog_Test] = TL_CNN_LSTM(tblTrain, tblTest);
                elseif Mode == 2
                    [MeanLog_Test] = CNN_LSTM(tblTrain, tblTest);
                elseif Mode == 3
                    [MeanLog_Test] = RF(tblTrain, tblTest);
                end
                % Log: repetition, dataset, partition, model, accuracy, wF1, MCC
                Val_Log(i,:) = [k,ID,Data,Mode,MeanLog_Test(1),MeanLog_Test(2),MeanLog_Test(3)];
                i = i+1;
            end
        end
    end
end
4
APPENDIX-2 (TL_CNN_LSTM modeling)
function [MeanLog_Test] = TL_CNN_LSTM(tblTrain, tblTest)
% helperNormalize, freezeWeights, createLgraphUsingConnections, statsOfMeasure
% and confusion.getMatrix are helper functions defined outside this listing.
training_raw = rmmissing(tblTrain);
test_raw = rmmissing(tblTest);
%%
predictor_col = 1:6;
response_col = 7;
d = 1;
training_oversampled = training_raw;

%% Standardization (statistics taken from the training set only)
tMean = varfun(@mean, training_oversampled, 'InputVariables', @isnumeric);
tSigma = varfun(@std, training_oversampled, 'InputVariables', @isnumeric);
training_oversampled = helperNormalize(training_oversampled, tMean, tSigma);
test = helperNormalize(test_raw, tMean, tSigma);

%% Training, Validation
for n = 1:(size(training_oversampled,1)-d+1)
    Time_train = training_oversampled{n+d-1,1};
    predictor_train = (training_oversampled{n:n+d-1,predictor_col})';
    response_train = (training_oversampled{n+d-1,response_col})';
    XTrain_pre(n,2) = {predictor_train};
    YTrain_pre(n,2) = response_train;
end
XTrain = XTrain_pre(:,2);
YTrain = YTrain_pre(:,2);
YTrain = categorical(YTrain);

%% Test dataset
for n = 1:(size(test,1)-d+1)
    Time_test = test{n+d-1,1};
    predictor_test = (test{n:n+d-1,predictor_col})';
    response_test = (test{n+d-1,response_col})';
    XTest_pre(n,2) = {predictor_test};
    YTest_pre(n,2) = response_test;
end
XTest = XTest_pre(:,2);
YTest = YTest_pre(:,2);
YTest = categorical(YTest);

%% Network generation
inputSize = size(predictor_col,2);
numClasses = numel(categories(YTrain));

%% Load the trained source layer
load('sourceNet.mat')
% analyzeNetwork(net);
inputSize = net.Layers(1).InputSize;
numClasses = numel(categories(YTrain));

%% Transfer layers
lgraph = layerGraph(net.Layers);
newlayers = [ ...
    sequenceInputLayer(inputSize)
    convolution1dLayer(5,128,Padding="causal",Name='conv1d')
    reluLayer('Name','relu_1')
    layerNormalizationLayer('Name','layernorm')
    globalAveragePooling1dLayer('Name','globalavgpool1d')
    lstmLayer(256,'OutputMode','sequence','Name','lstm1')
    dropoutLayer(0.1,'Name','dropout_1')
    lstmLayer(256,'OutputMode','sequence','Name','lstm2')
    dropoutLayer(0.1,'Name','dropout_2')
    flattenLayer('Name','flatten')
    lgraph.Layers(11:17)];
%% Freeze the layers transferred from the source network
newlayers(11:17) = freezeWeights(newlayers(11:17));
connections = lgraph.Connections;
newlgraph = createLgraphUsingConnections(newlayers,connections);

%% Hyperparameter settings
miniBatchSize = 32;
options = trainingOptions('adam', ...
    'MiniBatchSize',miniBatchSize, ...
    'GradientThreshold',1, ...
    'InitialLearnRate',3e-4, ...
    'Shuffle',"every-epoch", ...
    'MaxEpochs',30);
% Optional setting left disabled in the original: 'Verbose',false

%% Modeling iteration & performance outputs
for i = 1:10
    try
        net = trainNetwork(XTrain,YTrain,newlgraph,options);
        % Validation, Test accuracy
        [Ypred, scores_test] = classify(net,XTest, ...
            'MiniBatchSize',miniBatchSize, ...
            'ExecutionEnvironment','auto');
        C = confusionmat(YTest, Ypred);
        confusionchart(YTest,Ypred)
        [stats, weightedf1] = statsOfMeasure(C);
        accuracy = stats.microAVG(8);
        [c_matrix,Result,RefereceResult] = confusion.getMatrix(double(string(YTest))', double(string(Ypred))');
        MCC = Result.MatthewsCorrelationCoefficient;
        Afold(i,1) = accuracy;
        Afold(i,2) = weightedf1;
        Afold(i,3) = MCC;
        Perform_test = [accuracy weightedf1 MCC];
    catch
        fprintf('loop number %d failed\n',i)
    end
end
% Drop rows left at zero by failed iterations, then average ignoring NaN
indices(:,1) = find(Afold(:,1)==0);
Afold(indices,:) = [];
MeanLog_Test = mean(Afold,1,'omitnan');
end
4
REFERENCE
[1] M. Luo et al., “Comparing machine learning algorithms in predicting thermal sensation using ASHRAE Comfort Database II,” Energy Build., vol. 210, p. 109776, Mar. 2020, doi: 10.1016/J.ENBUILD.2020.109776.
[2] S. Lu, W. Wang, C. Lin, and E. C. Hameen, “Data-driven simulation of a thermal comfort-based temperature setpoint control with ASHRAE RP884,” Build. Environ., vol. 156, pp. 137–146, Jun. 2019, doi: 10.1016/j.buildenv.2019.03.010.
[3] N. Somu, A. Sriram, A. Kowli, and K. Ramamritham, “A hybrid deep transfer learning strategy for thermal comfort prediction in buildings,” Build. Environ., vol. 204, p. 108133, Oct. 2021, doi: 10.1016/j.buildenv.2021.108133.
[4] N. Gao, W. Shao, M. S. Rahaman, J. Zhai, K. David, and F. D. Salim, “Transfer learning for thermal comfort prediction in multiple cities,” Build. Environ., vol. 195, p. 107725, May 2021, doi: 10.1016/j.buildenv.2021.107725.
[5] V. Földváry Ličina et al., “Development of the ASHRAE Global Thermal Comfort Database II,” Build. Environ., vol. 142, pp. 502–512, Sep. 2018, doi: 10.1016/J.BUILDENV.2018.06.022.
[6] J. Langevin, P. L. Gurian, and J. Wen, “Tracking the human-building interaction: A longitudinal field study of occupant behavior in air-conditioned offices,” J. Environ. Psychol., vol. 42, pp. 94–115, 2015, doi: 10.1016/j.jenvp.2015.01.007.
Utilization of Machine Learning Algorithms for Predicting Thermal Comfort in Korean Climate
2
1. Introduction
[Figure: HVAC control in Korea based on PMV versus HVAC control based on a prediction model built from occupants]
• The purpose of this study is to experimentally evaluate the performance of occupant thermal comfort prediction using machine learning algorithms.
• Thermal comfort data collected in the same climate zones as Korea were selected from internationally recognized thermal comfort datasets.
• The results of this study are valuable in that they can be used as reference data for developing an occupant-centered thermal control system suited specifically to the Korean climate.
2. Research Methodology
2.1. Data Collection
• ASHRAE Global Thermal Comfort Database II: Published in 2018 and collected over 20 years by researchers worldwide, it is an extensive and systematic thermal comfort dataset [1]. It contains about 82,000 sets of thermal comfort data.
• Medium US office dataset: Developed by Langevin et al. [3], it is a thermal comfort dataset collected over a long period of time from 24 participants at the Friends Center office building in Philadelphia, USA.
• According to the Köppen climate classification, Korea includes a temperate climate (Cwa, Cfa) and a cold climate (Dwa, Dfa).
[1] Development of the ASHRAE global thermal comfort database II
[2] The scales project, a cross-national dataset on the interpretation of thermal perception scales
[3] Tracking the human-building interaction: A longitudinal field study of occupant behavior in air-conditioned offices
2.2. Feature Selection (feature: input variable)
• Feature set 1 (6 features): 1. Indoor Temperature (℃) 2. Indoor Relative Humidity (%) 3. Indoor Mean Radiant Temperature (℃) 4. Indoor Air Velocity (m/s) 5. Clothing (clo) 6. Metabolic Rate (met)
• Feature set 2 (7 features): feature set 1 plus 7. Outdoor Temperature (℃)
• Feature set 3 (9 features): feature set 2 plus 8. Gender (-) 9. Age (-)
2.3. Data Preprocessing
• Missing data elimination
• Feature label and scale conversions

Data feature    Label / method
Outdoor Temp.   ASHRAE: monthly average; Medium office: temperature at the time of survey
Gender          1: Female, 2: Male
Age             1: <20, 2: 21–30, 3: 31–40, and 4: 40+
TSV             Scales reclassified on a 5-point scale:
                -3 to -1.5 → -2
                -1.5 to -0.5 → -1
                -0.5 to 0.5 → 0
                0.5 to 1.5 → 1
                1.5 to 3 → 2
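The TSV reclassification can be sketched with `discretize`. The bin edges come from the slide; which bin owns a boundary value such as -1.5 is not stated there, so the lower-edge-inclusive behavior below is an assumption, and the sample votes are made up.

```matlab
% Hypothetical continuous TSV votes on the -3..+3 scale.
tsvContinuous = [-3 -1.7 -1.5 -0.4 0 0.5 1.2 1.5 2.6 3];

% Bin edges from the slide. ASSUMPTION: each bin includes its lower edge
% (e.g. -1.5 maps to -1); the slide does not say which side owns a boundary.
edges  = [-3 -1.5 -0.5 0.5 1.5 3];
labels = [-2 -1 0 1 2];

binIdx   = discretize(tsvContinuous, edges);  % last bin also includes +3
tsvClass = labels(binIdx);

disp([tsvContinuous; tsvClass]);              % row 1: vote, row 2: class
```

Votes outside the -3 to +3 range would come back from `discretize` as NaN and make the indexing step fail, so such rows need to be removed or clipped first.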
2.4. Data Balancing
• When a model is trained on an imbalanced dataset, it may fail to classify the minority classes properly, so its predictions can become biased. Oversampling is therefore applied in advance to counter this tendency.
• Oversampling: a technique that deliberately increases the number of samples by creating synthetic data in the minority class. Common techniques include random resampling, SMOTE, edited nearest neighbors (a cleaning step usually paired with SMOTE), TabularGAN, TableGAN, etc.
• In this study, SMOTE was used.
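The core idea of SMOTE is to place each synthetic sample on the line segment between a minority sample and one of its nearest minority neighbors. The sketch below is a minimal illustration with made-up data and hypothetical variable names; the source does not say which SMOTE implementation or settings were used.

```matlab
rng(1);                                   % reproducible synthetic points
% Hypothetical minority-class samples (rows = samples, cols = features).
Xmin = [24.1 45; 24.8 50; 25.3 48; 26.0 52; 25.1 47];
nNew = 10;                                % synthetic samples to create
k    = 3;                                 % neighbours considered per sample

nMin  = size(Xmin, 1);
nFeat = size(Xmin, 2);
Xsyn  = zeros(nNew, nFeat);
for s = 1:nNew
    i = randi(nMin);                      % pick a random minority sample
    d = sum((Xmin - Xmin(i, :)).^2, 2);   % squared distance to every sample
    d(i) = inf;                           % exclude the sample itself
    [~, order] = sort(d);
    j = order(randi(k));                  % one of its k nearest neighbours
    gap = rand;                           % position along the segment i -> j
    Xsyn(s, :) = Xmin(i, :) + gap * (Xmin(j, :) - Xmin(i, :));
end
Xbalanced = [Xmin; Xsyn];                 % minority class after oversampling
disp(size(Xbalanced));
```

In practice the features would be standardized before the distance computation, and oversampling would be applied to the training portion only so that synthetic points never leak into the test set.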
2.5. Machine Learning Algorithms
1. RF: The ensemble method is set to bagging, the learner type is decision tree, the maximum number of splits is 371, and the number of learners is 30.
2. SVM: Several studies show that SVM with an RBF kernel is effective for thermal comfort modeling. In this study, the RBF kernel (kernel function: Gaussian, the same as RBF; kernel scale: auto) was used.
3. MLP: A similar study constructed the MLP with 2 fully connected layers with 64 neurons in each layer, and another study constructed 2 fully connected layers with 32, 256, 512 neurons in each layer. In this study, 2 fully connected layers with 64 neurons each were used. The batch size was set to 200, the maximum number of epochs to 500, and the iteration limit to 1000.
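The RF and SVM settings listed above map onto Statistics and Machine Learning Toolbox options roughly as follows. This is a sketch under assumptions: the data are random placeholders, and the use of `fitcecoc` for the multiclass SVM is an assumption, since the slide does not say how the binary SVM was extended to five classes.

```matlab
rng('default');
% Hypothetical data: 200 samples, 6 predictors, 5-point TSV labels.
X = randn(200, 6);
Y = categorical(randi([-2 2], 200, 1));

% Random forest as described: bagged decision trees, 30 learners,
% at most 371 splits per tree.
tree  = templateTree('MaxNumSplits', 371);
rfMdl = fitcensemble(X, Y, 'Method', 'Bag', ...
    'NumLearningCycles', 30, 'Learners', tree);

% Multiclass SVM with a Gaussian (RBF) kernel and automatic kernel scale.
% ASSUMPTION: error-correcting output codes via fitcecoc (default coding).
svm    = templateSVM('KernelFunction', 'gaussian', 'KernelScale', 'auto');
svmMdl = fitcecoc(X, Y, 'Learners', svm);

fprintf('RF resubstitution accuracy:  %.2f\n', 1 - resubLoss(rfMdl));
fprintf('SVM resubstitution accuracy: %.2f\n', 1 - resubLoss(svmMdl));
```

Resubstitution accuracy is shown only to confirm that the models train; it says nothing about generalization, which is what the held-out test set measures.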
2.6. Training/Test Data Split
• Dataset split: 80% training, 20% test
• To limit overfitting and improve performance, stratified 10-fold cross-validation was used during training.
• In each fold, the training dataset (80%) is divided into a training part (72% of all data) and a validation part (8% of all data).
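The 80/20 split followed by stratified 10-fold cross-validation can be sketched with `cvpartition`. The labels below are random placeholders; only the split proportions come from the slide.

```matlab
rng('default');
Y = categorical(randi([-2 2], 500, 1));   % hypothetical 5-point TSV labels

holdout = cvpartition(Y, 'Holdout', 0.2); % 80% training, 20% test
YTrain  = Y(training(holdout));

folds = cvpartition(YTrain, 'KFold', 10); % stratified 10-fold on the 80%
for f = 1:folds.NumTestSets
    nFit = sum(training(folds, f));       % about 72% of all samples
    nVal = sum(test(folds, f));           % about 8% of all samples
    fprintf('Fold %2d: fit on %d, validate on %d\n', f, nFit, nVal);
end
```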
2.7. Performance Evaluation Metrics
• Accuracy: The most basic and widely used prediction evaluation metric. However, it is not suitable for evaluating a classification model built on an imbalanced dataset.
• Weighted F1-score: The per-class F1 scores averaged with weights proportional to the number of samples in each class. It is suitable for evaluating a classification model built on an imbalanced dataset. Several studies on thermal comfort modeling also use this indicator to evaluate model performance [3,4].
• Matthews correlation coefficient (MCC): Also suitable for evaluating a classification model built on an imbalanced dataset [1,2].
[1] A hybrid deep transfer learning strategy for thermal comfort prediction in buildings
[2] Heterogeneous transfer learning for thermal comfort modeling
[3] Transfer learning for thermal comfort prediction in multiple cities
[4] Adaptive behavior and different thermal experiences of real people: A Bayesian neural network approach to thermal preference prediction and classification
3. Experiment Result
3.1. Performance Results
Thermal comfort prediction based on various feature sets and algorithms

Feature set   Algorithm   Accuracy (%)   F1-score (%)   MCC
1             RF          54.29          53.60          0.36
1             SVM         53.10          50.72          0.33
1             MLP         50.22          49.70          0.31
1             PMV         34.63          32.04          0.1
2             RF          56.43          55.82          0.39
2             SVM         55.23          53.30          0.36
2             MLP         51.13          50.83          0.33
3             RF          60.73          60.19          0.46
3             SVM         57.30          55.81          0.4
3             MLP         58.24          58.15          0.43
3.1. Performance Results (Feature set 1): Confusion Matrix
[Figure: confusion matrices for PMV and Random Forest]
4. Conclusion
Is it possible to create machine learning-based thermal comfort prediction models suitable for the Korean climate?
• A machine learning-based thermal comfort model suitable for the domestic climate could be created,
• although it is unfortunate that there is no large-scale dataset systematically collected in Korea.
How does the performance of machine learning-based models compare to PMV?
• Machine learning models outperformed PMV.
• Prediction performance improved markedly when additional features were included beyond the six PMV factors.
Which machine learning algorithm performs the best?
• Overall, the random forest model had the best thermal comfort prediction performance.
• In the future, I plan to conduct a field test to collect my own thermal comfort datasets and to use more complex models, such as deep learning algorithms.
Collection and Analysis of Occupant Behavior in Thermal Environment Using Occupant Voting System : Observations of Office occupants specifically in Korean climate
3
1. Introduction
• According to the global thermal comfort database developed in 2018, less than 80% of people are satisfied with the indoor thermal environment [1].
• Occupants' thermal discomfort lowers productivity and sometimes leads to energy-consuming behaviors [2].
• Recent research has paid attention to occupant behavior to address this problem.
• There is active research on monitoring thermal conditions in relation to occupant behavior and on controlling building thermal systems.
Objective: Study occupant thermal behavior and thermal conditions in offices in summer and analyze the data. The results can be used for future development of occupant thermal behavior prediction models and for building system control based on occupant thermal behavior.
[1] Ličina, V. F., Cheung, T., Zhang, H., De Dear, R., Parkinson, T., Arens, E., ... & Zhou, X. (2018). Development of the ASHRAE global thermal comfort database II. Building and Environment, 142, 502-512.
[2] Kükrer, E., & Eskin, N. (2021). Effect of design and operational strategies on thermal comfort and productivity in a multipurpose school building. Journal of Building Engineering, 44, 102697.
2.1 Experiment Settings
• Total participants: 12
• Lab location: 2 offices in Seoul
• Data collection: indoor thermal environment and occupant behavior
• Experiment duration: July 2021 to September 2021, 10:00 am to 06:00 pm
• Thermal control: manual control of air conditioners
[Figure: office layouts showing the positions of the air conditioners and the Testo 400]
2.2 Indoor Thermal Environment Measurement
Followed ASHRAE 55 [1].
• Thermal measurement parameters: air temperature, mean radiant temperature, relative humidity, air velocity
• Equipment location: most populated spot; center of the rooms; at least 1 m away from the wall
• Height: air temperature and average air speed measured at 0.6 m height in the office
• Measurement period: 5-minute time period for accurate measurements of indoor environments
[Figure: Testo 400 and air conditioner positions in the offices]
[1] ANSI/ASHRAE Standard 55-2017, Thermal Environmental Conditions for Human Occupancy, American Society of Heating, Refrigerating and Air-Conditioning Engineers, Inc., 2017.
2.2 Indoor thermal environment measurement: Equipment
Measurement Period: 5 minutes
• Air Temperature
a. Measurement range: -20 to +60 °C
b. Accuracy: ±0.8 °C (-20 to 0 °C), ±0.5 °C (0 to +60 °C)
c. Sensor resolution: 0.1 °C
• Air Velocity
a. Equipment: Turbulence measurement probe (fixed cable)
b. Measurement range: 0 to +5 m/s
c. Accuracy: ±(0.03 m/s + 4%) (0 to 5 m/s)
d. Resolution: 0.01 m/s
• Relative Humidity
a. Equipment: Testo 400 + Testo 605i (thermohygrometer operated via smartphone)
b. Measurement range: 0 to 100 %RH
c. Accuracy: ±(1.8 %RH + 3%) at +25 °C (5 to 80 %RH)
d. Sensor resolution: 0.1 %RH
• Mean Radiant Temperature
a. Equipment: Ø 150 mm black globe temperature probe (Thermocouple Type K)
b. Range: 0 to +120 °C
c. Accuracy (Thermocouple Type K): ±1.5 °C or ±0.004 × t, over -40 to +1000 °C
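The slides do not state how the black globe reading is converted to mean radiant temperature. One common route is the forced-convection globe equation from ISO 7726. The sketch below assumes that equation, a globe emissivity of 0.95, and the 150 mm globe diameter listed above. The function name and the sample readings are illustrative, not taken from the study.

```python
def mean_radiant_temperature(t_globe, t_air, air_velocity,
                             diameter=0.15, emissivity=0.95):
    """Estimate mean radiant temperature (deg C) from a globe reading.

    Assumes the ISO 7726 forced-convection globe equation; the
    conversion actually used in the study is not stated in the slides.
    """
    if air_velocity < 0:
        raise ValueError("air velocity must be non-negative")
    # Convective exchange term between the globe and the surrounding air
    convection = (1.1e8 * air_velocity ** 0.6) / (emissivity * diameter ** 0.4)
    fourth_power = (t_globe + 273.0) ** 4 + convection * (t_globe - t_air)
    return fourth_power ** 0.25 - 273.0


# Illustrative readings only: globe 27.0 deg C, air 26.0 deg C, 0.1 m/s
mrt = mean_radiant_temperature(27.0, 26.0, 0.1)
print(f"MRT = {mrt:.1f} deg C")
```

When the globe and air temperatures are equal, the convective term vanishes and the estimate equals the globe temperature, which is a quick sanity check on any logged data.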
2.3 Data collection: Application of occupant behavior
Occupant thermal comfort voting application developed in MATLAB
• Voter's Identity
a. The voter enters an individual ID
b. Age and gender can be used as variables
c. The ID identifies where the voter sits in the office
• Clothing Insulation (clo)
a. For more accurate values, the voter enters the clo values defined by ASHRAE 55
b. 0.05 clo is added to account for the chair
• Metabolic (Activity) Rate
a. The voter selects from a list of activities that mainly occur in the office
• Thermal comfort labels
a. Thermal preferences were identified through TSV (Thermal Sensation Vote) and TP (Thermal Preference)
b. According to ASHRAE 55, TSV uses a 7-point scale and TP uses a 3-point scale
• Following Behavior
a. When the TSV is greater than or less than 0, the voter selects the behavior taken to improve thermal comfort
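The fields collected by the voting application can be pictured as a single record per vote. The sketch below is a hypothetical Python representation, not the authors' MATLAB implementation. The class name, field names, and the -1/0/+1 coding of TP are assumptions for illustration; the 0.05 clo chair allowance and the rule that a follow-up behavior is asked when TSV is nonzero come from the slide.

```python
from dataclasses import dataclass
from typing import Optional

CHAIR_CLO = 0.05  # chair allowance stated on the slide


@dataclass
class Vote:
    voter_id: str
    clothing_clo: float             # garment insulation from ASHRAE 55 tables
    activity: str                   # office activity chosen from a list
    tsv: int                        # thermal sensation vote, -3 .. +3
    tp: int                         # thermal preference, assumed coded -1, 0, +1
    behavior: Optional[str] = None  # follow-up behavior, asked when tsv != 0

    def __post_init__(self):
        if self.tsv not in range(-3, 4):
            raise ValueError("TSV must be an integer from -3 to +3")
        if self.tp not in (-1, 0, 1):
            raise ValueError("TP must be -1, 0 or +1")
        if self.tsv != 0 and self.behavior is None:
            raise ValueError("a follow-up behavior is required when TSV != 0")

    @property
    def total_clo(self) -> float:
        """Clothing insulation including the chair allowance."""
        return self.clothing_clo + CHAIR_CLO


# Hypothetical vote for illustration
vote = Vote("A", 0.57, "typing", tsv=1, tp=1, behavior="use of personal fan")
print(round(vote.total_clo, 2))
```

Validating at construction time keeps incomplete votes (for example, a nonzero TSV with no behavior selected) out of the stored dataset.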
2.3 Data collection: Application of occupant behavior
Categories of selected occupant behaviors [1], with reference to the number of votes for thermal behavior through surveys [2] and observed occupant behavior by category [3]
• Behaviors in a hot environment
a. Personal activities: no special action, drink a cool drink, leave, remove clothing, use of personal fan
b. Activities that affect others: speak to a colleague, air conditioning temperature control, open or close windows/doors, blind adjustment
• Behaviors in a cold environment
a. Personal activities: no special action, drink hot beverages, leave
b. Activities that affect others: speak to a colleague, air conditioning temperature control, open or close windows/doors, blind adjustment
[1] Chen, C. F., De Simone, M., Yilmaz, S., Xu, X., Wang, Z., Hong, T., & Pan, Y. (2021). Intersecting heuristic adaptive strategies, building design and energy saving intentions when facing discomfort environment: A cross-country analysis. Building and Environment, 204, 108129.
[2] Langevin, J., Gurian, P. L., & Wen, J. (2015). Tracking the human-building interaction: A longitudinal field study of occupant behavior in air-conditioned offices. Journal of Environmental Psychology, 42, 94-115.
[3] Langevin, J. (2019). Longitudinal dataset of human-building interactions in US offices. Scientific Data, 6(1), 1-10.
2.4 Data Collection and Analysis Process
[Figure: process diagram with the stages Field measurement, Voting system, Data storage, and Comparison and analysis]
3.1 Analysis of thermal preference: PMV (classified) vs TSV
Rows: classified PMV. Columns: TSV. Cells: number of votes.

PMV/TSV    -3    -2    -1     0     1     2     3   SUM
-2          0     0     0    10     5     0     0    15
-1          0     1    11   144    33     2     0   191
 0          0     3    25   399    99     7     1   534
 1          0     0     2    97    30     7     0   136
 2          0     0     0     0     0     1     0     1
SUM         0     4    38   650   167    17     1   877

[Figure: PMV plotted against TSV, both axes from -3 to 3]
• Votes of 0 (Neutral) made up 74.1% of all TSV votes (650 of 877), and TSV matched the classified PMV in about 50.2% of votes (440 of 877).
• There is a difference between PMV and the occupants' actual thermal sensation.
3.1 Analysis of thermal preference: TSV (3-scaled) vs TP
Rows: TP. Columns: TSV on a 3-point scale. Cells: number of votes.

TP/TSV     -1     0     1   SUM
-1         14     1     1    16
 0         28   641    47   716
 1          0     8   137   145
SUM        42   650   185   877

[Figure: TP plotted against TSV]
• The individual's thermal sensation (TSV) and preference (TP) match in about 90% of votes.
• This suggests that TSV and TP can be used together in occupant-centered temperature control.
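The 3-point TSV and its agreement with TP can be reproduced from the vote counts. The sketch below assumes the 7-point TSV is collapsed by sign (negative, zero, positive). The slides do not define the mapping, but collapsing the 7-point column totals this way reproduces the 42 / 650 / 185 totals in the TP/TSV table. Function and variable names are illustrative.

```python
def to_three_point(tsv: int) -> int:
    """Collapse a 7-point TSV (-3..+3) to its sign: -1, 0 or +1."""
    return (tsv > 0) - (tsv < 0)


# 7-point TSV totals from the PMV/TSV table
tsv7_totals = {-3: 0, -2: 4, -1: 38, 0: 650, 1: 167, 2: 17, 3: 1}
tsv3_totals = {-1: 0, 0: 0, 1: 0}
for tsv, n in tsv7_totals.items():
    tsv3_totals[to_three_point(tsv)] += n

# TP (outer key) by 3-point TSV (inner key), from the TP/TSV table
tp_by_tsv = {
    -1: {-1: 14, 0: 1, 1: 1},
    0: {-1: 28, 0: 641, 1: 47},
    1: {-1: 0, 0: 8, 1: 137},
}
total = sum(sum(row.values()) for row in tp_by_tsv.values())
agree = sum(tp_by_tsv[k][k] for k in (-1, 0, 1))
agreement = 100 * agree / total

print(tsv3_totals, agree, total, round(agreement, 1))
# prints: {-1: 42, 0: 650, 1: 185} 792 877 90.3
```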
3.2 Occupant behavior in hot environment
[Figure: pie charts of the behaviors chosen by each occupant (A to L) when hot, with a "Not hot" label where applicable. Legend entries recoverable from the slide: no special action, use of personal fan, drink a cool drink]
• People have different preferences for behavior.
• Doing nothing was the most frequent response at 35.7%.
• Personal actions, such as having a cool drink or using a personal fan, accounted for 51.7%.
• Actively adjusting the air conditioner temperature was much less common at 11.8%.
3.2 Occupant behavior in cold environment
[Figure: pie charts of the behaviors chosen by each occupant (A to L) when cold, with "Not cold" labels where applicable, plus an overall chart with shares of 67.5%, 16.3%, 7.5%, 6.3% and 2.5%. Legend: no special action, speak to a colleague, leave, A/C temperature control, drink hot beverages]
• People have different preferences for behavior.
• When it was cold, doing nothing was the most frequent response at 67.5%.
• Actions that affect others, namely adjusting the air conditioner temperature or speaking to a colleague, accounted for 23.8%.
• Personal actions, such as drinking a hot beverage, were relatively rare at 8.8%.
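The category shares reported for hot and cold conditions are sums over individual behaviors. The sketch below shows one way to compute them. The grouping into "no action", "personal" and "affects others" is an assumption based on the categories named in the slides, and the sample votes are hypothetical, not the study data.

```python
from collections import Counter

# Assumed grouping of cold-environment behaviors into three categories
CATEGORY = {
    "no special action": "no action",
    "drink hot beverages": "personal",
    "leave": "personal",
    "speak to a colleague": "affects others",
    "air conditioning temperature control": "affects others",
    "open or close windows/doors": "affects others",
    "blind adjustment": "affects others",
}


def category_shares(behaviors):
    """Return the percentage of votes falling in each behavior category."""
    counts = Counter(CATEGORY[b] for b in behaviors)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: 100 * n / total for cat, n in counts.items()}


# Hypothetical votes for illustration only (not the study data)
sample = ["no special action"] * 6 + ["leave"] + ["speak to a colleague"] * 3
print(category_shares(sample))
```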
3.3 Behavior analysis over time
[Figure: two panels, one for TSV > 0 and one for TSV < 0. Axes: Time of Day (10 to 17), Frequency, and Probability. Series: do no special action, personal actions, affect others, and the vote rate of TSV > 0 or TSV < 0 respectively]
• Voting and thermal behavior were analyzed by thermal condition and time of day.
• When it was hot (TSV > 0), the number of votes was highest just after occupants arrived at the offices. They mainly chose personal actions, such as using a personal fan or having a cool drink.
• When it was cold (TSV < 0), occupants mostly took no special action.
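A time-of-day analysis of this kind can be built by grouping votes by hour and normalizing the behavior categories within each hour. The sketch below is one possible reading of the "Frequency" and "Probability" axes, namely the vote count per hour and the within-hour share of each category. The slides do not define them, and the function name and sample votes are hypothetical.

```python
from collections import Counter, defaultdict


def hourly_profile(votes):
    """Summarize (hour, category) pairs for one condition, e.g. TSV > 0.

    Returns {hour: (vote_count, {category: share_within_hour})}.
    """
    by_hour = defaultdict(Counter)
    for hour, category in votes:
        by_hour[hour][category] += 1
    profile = {}
    for hour in sorted(by_hour):
        n = sum(by_hour[hour].values())
        profile[hour] = (n, {c: k / n for c, k in by_hour[hour].items()})
    return profile


# Hypothetical votes for illustration only (not the study data)
sample = [(10, "personal"), (10, "personal"), (10, "no action"), (11, "no action")]
for hour, (count, shares) in hourly_profile(sample).items():
    print(hour, count, shares)
```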
4
Conclusion
• This study monitored and analyzed the behavior and thermal condition of office occupants during summer in Korea.
• TSV differed clearly from PMV (about 50% concordance) but had a high concordance rate with TP (about 90%).
• When multiple people are present and thermally uncomfortable, occupants show a strong tendency to do nothing.
• When it is hot, occupants also tend to relieve discomfort through personal actions.
• The experiment was limited to summer and had a small number of participants.
• In future work, more occupants will be tracked over a longer period, and an experiment that systematically controls the indoor thermal environment based on these findings will be conducted.
5
REFERENCE
[1] Ličina, V. F., Cheung, T., Zhang, H., De Dear, R., Parkinson, T., Arens, E., ... & Zhou, X. (2018). Development of the ASHRAE global thermal comfort database II. Building and Environment, 142, 502-512.
[2] Kükrer, E., & Eskin, N. (2021). Effect of design and operational strategies on thermal comfort and productivity in a multipurpose school building. Journal of Building Engineering, 44, 102697.
[3] ANSI/ASHRAE Standard 55-2017, Thermal Environmental Conditions for Human Occupancy, American Society of Heating, Refrigerating and Air-Conditioning Engineers, Inc., 2017.
[4] Chen, C. F., De Simone, M., Yilmaz, S., Xu, X., Wang, Z., Hong, T., & Pan, Y. (2021). Intersecting heuristic adaptive strategies, building design and energy saving intentions when facing discomfort environment: A cross-country analysis. Building and Environment, 204, 108129.
[5] Langevin, J., Gurian, P. L., & Wen, J. (2015). Tracking the human-building interaction: A longitudinal field study of occupant behavior in air-conditioned offices. Journal of Environmental Psychology, 42, 94-115.
[6] Langevin, J. (2019). Longitudinal dataset of human-building interactions in US offices. Scientific Data, 6(1), 1-10.