Research


RESEARCH SUMMARIES: Indoor environment thermal comfort prediction using several machine learning applications

JULY – DECEMBER 2021 JONG JOO KIM

#ThermalComfort #MachineLearning #Data #MatLab

Contents

1. Comparing the performance of data-driven prediction models using small-sized thermal comfort datasets
2. Utilization of Machine Learning Algorithms for Predicting Thermal Comfort in Korean Climate
3. Collection and Analysis of Occupant Behavior in Thermal Environment Using Occupant Voting System


2021 KIEAE International Conference

Comparing the performance of data-driven prediction models using small-sized thermal comfort datasets

Jong Joo Kim 김종주



1

INTRODUCTION

1.1 Background

• Thermal comfort prediction: PMV vs. data-driven models [1] [2]

Calculation-based: Fanger’s PMV model
• Indoor environmental factors: indoor air temperature, indoor radiant mean temperature, indoor air velocity, indoor relative humidity
• Personal factors: metabolic rate, clothing

Thermal sensation scale
-3 Cold, -2 Cool, -1 Slightly Cool, 0 Neutral, +1 Slightly Warm, +2 Warm, +3 Hot

Data-driven models
• Machine Learning: Random Forest (RF), Support Vector Machine (SVM)
• Deep Learning (neural network): Multilayer neural network (MLP), CNN_LSTM
• Transfer Learning (TL) + Deep Learning: TL_MLP, TL_CNN_LSTM
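As an illustration of the calculation-based baseline, the sketch below computes Fanger's PMV from the six inputs listed above. It is written in Python for illustration (the study itself used MATLAB) and follows the widely used ISO 7730 reference procedure; the function name `pmv_fanger` and its argument names are assumptions of this sketch, not the author's code.

```python
import math


def pmv_fanger(tdb, tr, vel, rh, met, clo, wme=0.0):
    """Predicted Mean Vote from the six PMV inputs (sketch, ISO 7730 procedure).

    tdb: air temperature (deg C), tr: mean radiant temperature (deg C),
    vel: air velocity (m/s), rh: relative humidity (%),
    met: metabolic rate (met), clo: clothing insulation (clo),
    wme: external work (met), usually 0.
    """
    pa = rh * 10.0 * math.exp(16.6536 - 4030.183 / (tdb + 235.0))  # vapour pressure, Pa
    icl = 0.155 * clo              # clothing insulation, m2.K/W
    m = met * 58.15                # metabolic rate, W/m2
    mw = m - wme * 58.15           # internal heat production, W/m2
    fcl = 1.0 + 1.29 * icl if icl <= 0.078 else 1.05 + 0.645 * icl
    hcf = 12.1 * math.sqrt(vel)    # forced-convection coefficient
    taa, tra = tdb + 273.0, tr + 273.0

    # Iterate for the clothing surface temperature (xn, in units of 100 K).
    p1 = icl * fcl
    p2, p3, p4 = p1 * 3.96, p1 * 100.0, p1 * taa
    p5 = 308.7 - 0.028 * mw + p2 * (tra / 100.0) ** 4
    tcla = taa + (35.5 - tdb) / (3.5 * icl + 0.1)
    xn, xf = tcla / 100.0, tcla / 50.0
    hc = hcf
    for _ in range(150):
        if abs(xn - xf) <= 0.00015:
            break
        xf = (xf + xn) / 2.0
        hcn = 2.38 * abs(100.0 * xf - taa) ** 0.25  # natural-convection coefficient
        hc = max(hcf, hcn)
        xn = (p5 + p4 * hc - p2 * xf ** 4) / (100.0 + p3 * hc)
    else:
        raise ValueError("clothing temperature iteration did not converge")
    tcl = 100.0 * xn - 273.0

    # Heat-loss components, W/m2.
    hl1 = 3.05 * 0.001 * (5733.0 - 6.99 * mw - pa)          # skin diffusion
    hl2 = 0.42 * (mw - 58.15) if mw > 58.15 else 0.0        # sweating
    hl3 = 1.7 * 0.00001 * m * (5867.0 - pa)                 # latent respiration
    hl4 = 0.0014 * m * (34.0 - tdb)                         # dry respiration
    hl5 = 3.96 * fcl * (xn ** 4 - (tra / 100.0) ** 4)       # radiation
    hl6 = fcl * hc * (tcl - tdb)                            # convection

    ts = 0.303 * math.exp(-0.036 * m) + 0.028  # thermal sensation coefficient
    return ts * (mw - hl1 - hl2 - hl3 - hl4 - hl5 - hl6)
```

To compare PMV with the class-based data-driven models, the continuous PMV value would still need to be mapped onto the same sensation classes as the TSV labels.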


Transfer Learning Strategy [3] [4]
• Two neural network models for similar tasks
a. Source Model
b. Target Model

Transfer Learning Steps
• Pre-train the first model (Source) on a large dataset
• Apply the knowledge gained to the new model (Target) for the desired, related task
a. Transfer the task-related layers to the target model.


Challenge
• Use of thermal comfort datasets
a. Thermal comfort data is highly subjective
• Thermal comfort dataset collection
a. Requires steady participation over a long time period

1.2 Research Gap
• Previous studies demonstrated their proposed data-driven models using large thermal comfort datasets.

Question: Are data-driven models still effective for small-sized thermal comfort datasets?


1.3 Research Objective
• Extract small-sized thermal comfort datasets from the experiment data by human subject ID
• Apply the small-sized datasets to the data-driven prediction models and analyze the results through a series of comparisons

1.4 Research Goal
• Learn whether data-driven models are still effective with small-sized thermal comfort datasets


2

METHODOLOGY

Modeling Program

2.1 Algorithm Selection
• Three algorithms known for high classification performance
a. Random Forest (RF)
b. Convolutional Neural Network Long Short-Term Memory network (CNN_LSTM) [4]
c. Transfer-learning Convolutional Neural Network Long Short-Term Memory network (TL_CNN_LSTM)

2.2 Thermal Comfort Dataset Collection
• Medium US office dataset [6]
a. Thermal comfort data collected from 24 participants at the Friends Center office building in Philadelphia, USA, from 2012 to 2013
• ASHRAE Global Thermal Comfort Database II [5]
a. An extensive set of thermal comfort data collected from around the world over a period of several decades, released in 2018
b. Source dataset: used only to pre-train a model for the transfer learning strategy


2.3 Feature Selection
• Feature set 1 (6 features): Indoor Air Temperature, Indoor Relative Humidity, Indoor RMT, Indoor Air Velocity, Clothing, Metabolic Rate
• Feature set 2 (7 features): feature set 1 plus Outdoor Temperature
• Response class for both sets: *Thermal Sensation Vote

*Thermal Sensation Vote (TSV): thermal comfort prediction response class
Factor groups: indoor environmental factors (air temperature, relative humidity, RMT, air velocity), personal factors (clothing, metabolic rate), outdoor environmental factors (outdoor temperature)

Feature value ranges
Feature                     Units   ASHRAE          US Medium office
Indoor Air Temperature      °C      0.6 – 37        16.4 – 27.8
Indoor Relative Humidity    %       0.4 – 100       15.7 – 72.4
Indoor RMT                  °C      4 – 49.5        16.4 – 27.6
Indoor Air Velocity         m/s     0 – 56.2        0.02 – 0.19
Metabolic Rate              met     0.7 – 6.8       1.00 – 6.80
Clothing Insulation         clo     0.04 – 2.89     0.31 – 1.83
Outdoor Air Temperature     °C      -18.4 – 45.1    -5 – 33


2.4 Data Preprocessing
• Delete every row that has any missing entries

Data Conversion
• Transform the continuous TSV range into point categories
• Convert the scale from 7-point to 5-point

Continuous TSV (-3 to +3) → Class TSV, 7-point scale (-3, -2, -1, 0, +1, +2, +3) → Class TSV, 5-point scale (-2, -1, 0, +1, +2)


2.5 Small Dataset Extraction
• Medium US office dataset: 4 participants’ datasets (A, B, C, and D)

Dataset   Final Instances
A         133
B         112
C         132
D         120


2.6 Dataset Split: Training and Test Sets
• When splitting, keep the prediction class (TSV) distribution proportional
• Raw dataset → Training set (50%) + Test set (50%)
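A minimal sketch of a class-proportional (stratified) holdout split is shown here in Python for illustration; the study's own split uses MATLAB's `cvpartition`, as listed in the appendix. The function name `stratified_holdout`, the seed, and the rounding rule are assumptions of this sketch.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, test_fraction=0.5, seed=0):
    """Return (train_idx, test_idx) with each class split in the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)  # per-class test count
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Example: 120 votes on a 3-class TSV scale, split 50/50.
tsv = [-1] * 20 + [0] * 60 + [1] * 40
train_idx, test_idx = stratified_holdout(tsv, test_fraction=0.5)
```

Calling the same function again on the training indices with a smaller holdout fraction would give reduced training partitions while the test set stays fixed.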


2.6 Dataset Split: Training Set Partitions
• Training set partitions: 50%, 60%, 70%, 80%, 90%, 100% of the training set
• Test set: fixed at 100% of the test set
• Training set partitions are made to analyze the influence of changing the training set size.


2.7 Experiment Setup
• Training/test split repeated 10 times (raw dataset → training set 50%, test set 50%), with the metrics averaged over the runs

Run       Acc    wF1    MCC
1st       50%    33%    0.2
2nd       42%    30%    0.23
…
10th      43%    32%    0.25
Average   45%    31%    0.24


2.8 Performance Evaluation Metrics
a. Accuracy
b. Weighted F1-score (wF1)
c. Matthews correlation coefficient (MCC)

• Weighted F1-score and MCC are appropriate metrics for evaluating models on imbalanced datasets
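The three metrics can be sketched as follows, in Python for illustration (the appendix computes them in MATLAB with external helper functions). The function names are assumptions of this sketch; the MCC uses the standard multiclass generalization of the Matthews correlation coefficient.

```python
from collections import Counter
from math import sqrt


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = n - tp
        total += n * (2 * tp / (2 * tp + fp + fn))
    return total / len(y_true)


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient."""
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    classes = set(true_counts) | set(pred_counts)
    cross = sum(true_counts[k] * pred_counts[k] for k in classes)
    sum_t2 = sum(true_counts[k] ** 2 for k in classes)
    sum_p2 = sum(pred_counts[k] ** 2 for k in classes)
    denom = sqrt((n * n - sum_p2) * (n * n - sum_t2))
    return (correct * n - cross) / denom if denom else 0.0


# Toy example on a 3-class TSV scale (values are illustrative only).
y_true = [-1, -1, 0, 0, 0, 1]
y_pred = [-1, 0, 0, 0, 1, 1]
scores = (accuracy(y_true, y_pred), weighted_f1(y_true, y_pred), multiclass_mcc(y_true, y_pred))
```

Accuracy alone can look acceptable when a model mostly predicts the majority class, which is why the weighted F1-score and MCC are reported alongside it.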


3

RESULTS

3.1 Experiment Results - 1
• 6 features
• Datasets: A, B
• Cells show ACC (%) / wF1 (%) / MCC; PMV reports ACC only

Dataset A
Train data   PMV    TL_CNN_LSTM          CNN_LSTM             Random Forest
50%          28.9   36.5 / 35.2 / 0.22   35.4 / 34.2 / 0.20   37.6 / 37.6 / 0.22
60%          28.9   39.7 / 38.3 / 0.25   39.0 / 38.7 / 0.25   42.7 / 41.4 / 0.26
70%          28.9   42.1 / 40.4 / 0.27   39.8 / 39.1 / 0.25   44.1 / 42.9 / 0.29
80%          28.9   41.2 / 39.0 / 0.26   39.6 / 39.0 / 0.25   41.9 / 40.9 / 0.25
90%          28.9   43.1 / 41.5 / 0.28   41.1 / 40.3 / 0.25   44.8 / 44.1 / 0.29
100%         28.9   43.7 / 42.2 / 0.29   41.0 / 39.9 / 0.26   44.6 / 43.9 / 0.29

Dataset B
Train data   PMV    TL_CNN_LSTM          CNN_LSTM             Random Forest
50%          34.0   37.9 / 36.8 / 0.16   37.2 / 40.1 / 0.19   40.3 / 42.1 / 0.24
60%          34.0   36.7 / 34.5 / 0.15   34.3 / 34.6 / 0.13   39.7 / 42.5 / 0.22
70%          34.0   37.3 / 34.0 / 0.17   36.8 / 35.3 / 0.18   38.7 / 36.5 / 0.19
80%          34.0   38.3 / 37.9 / 0.22   37.8 / 38.2 / 0.18   38.7 / 38.0 / 0.19
90%          34.0   38.2 / 35.3 / 0.21   36.3 / 37.4 / 0.19   38.4 / 36.6 / 0.19
100%         34.0   37.8 / 34.8 / 0.22   36.0 / 38.9 / 0.21   38.3 / 38.5 / 0.21


3.1 Experiment Results - 1 (6 features), Datasets C, D
• Cells show ACC (%) / wF1 (%) / MCC; PMV reports ACC only

Dataset C
Train data   PMV    TL_CNN_LSTM          CNN_LSTM             Random Forest
50%          30.0   31.5 / 30.6 / 0.18   30.9 / 30.3 / 0.18   29.0 / 29.7 / 0.16
60%          30.0   28.7 / 27.0 / 0.16   27.3 / 27.8 / 0.16   29.1 / 29.6 / 0.16
70%          30.0   30.5 / 32.7 / 0.20   31.4 / 30.7 / 0.16   31.7 / 30.0 / 0.18
80%          30.0   30.4 / 27.8 / 0.15   29.9 / 28.1 / 0.17   30.5 / 30.7 / 0.17
90%          30.0   29.4 / 29.5 / 0.17   30.5 / 30.3 / 0.19   31.6 / 31.5 / 0.18
100%         30.0   30.1 / 27.7 / 0.18   30.6 / 28.7 / 0.17   31.5 / 32.4 / 0.19

Dataset D, ACC (%)
Train data   PMV    TL_CNN_LSTM   CNN_LSTM   Random Forest
50%          25.3   39.9          38.9       35.2
60%          25.3   43.4          42.6       38.6
70%          25.3   42.3          41.1       36.5
80%          25.3   43.9          42.2       36.8
90%          25.3   46.9          43.5       36.2
100%         25.3   46.8          43.2       36.2

Dataset D, wF1 (%): only 12 values survive, row assignment not recoverable: 31.2, 39.2, 37.3, 39.6, 41.5, 37.7, 38.1, 45.4, 39.1, 41.2, 43.1, 38.6
Dataset D, MCC: only 12 values survive, row assignment not recoverable: 0.18, 0.25, 0.21, 0.23, 0.22, 0.25, 0.23, 0.29, 0.23, 0.25, 0.26, 0.20


Dataset A Prediction Performance (6 features)
• Model performance comparison: Dataset A
• Random Forest > the rest
• Data-driven models > PMV
• The larger the training set, the more stable the performance

[Figure: Dataset A accuracy, wF1-score, and MCC versus training set partition (50% to 100%) for PMV (accuracy only), TL_CNN_LSTM, CNN_LSTM, and Random Forest]


• Comparison between PMV and data-driven models: Dataset A with the 100% training set partition

[Figure: panels for PMV and TL_CNN_LSTM]

3.1 Experiment Results - 2
• 7 features
• Datasets: A, B
• Cells show ACC (%) / wF1 (%) / MCC; "-" marks a value not reported in the source

Dataset A
Train data   TL_CNN_LSTM          CNN_LSTM             Random Forest
50%          39.2 / 38.1 / 0.24   36.4 / 36.8 / 0.23   37.8 / 36.5 / 0.22
60%          41.1 / 40.1 / 0.26   41.1 / 40.3 / 0.26   41.6 / 39.6 / 0.24
70%          40.6 / 39.0 / 0.26   39.9 / 39.1 / 0.25   41.8 / 40.4 / 0.25
80%          43.8 / 41.5 / 0.29   42.3 / 41.7 / 0.27   44.1 / 42.2 / 0.27
90%          43.5 / 41.6 / 0.28   42.2 / 40.9 / 0.26   43.2 / 41.5 / 0.26
100%         43.1 / 41.3 / 0.28   40.9 / 39.5 / 0.25   43.6 / 41.8 / 0.26

Dataset B
Train data   TL_CNN_LSTM          CNN_LSTM             Random Forest
50%          36.3 / 33.2 / 0.14   35.4 / 37.8 / 0.17   36.8 / 39.0 / 0.20
60%          41.4 / 49.8 / 0.32   38.2 / 32.6 / 0.19   43.9 / 43.6 / 0.26
70%          41.4 / 42.1 / 0.22   38.3 / 39.8 / 0.22   42.1 / 42.4 / 0.26
80%          42.8 / - / -         40.2 / 34.2 / 0.19   42.2 / 37.4 / 0.16
90%          43.5 / - / -         39.3 / 40.7 / 0.21   43.0 / 43.8 / 0.26
100%         42.6 / - / -         38.8 / 40.1 / 0.21   43.3 / 43.6 / 0.29


3.1 Experiment Results - 2 (7 features), Datasets C, D
• Cells show ACC (%) / wF1 (%) / MCC

Dataset C
Train data   TL_CNN_LSTM          CNN_LSTM             Random Forest
50%          28.5 / 29.7 / 0.15   27.8 / 28.1 / 0.17   27.5 / 28.6 / 0.16
60%          32.7 / 31.8 / 0.17   31.3 / 32.1 / 0.18   29.7 / 28.3 / 0.17
70%          31.6 / 28.4 / 0.15   30.6 / 29.7 / 0.18   30.4 / 30.6 / 0.18
80%          29.2 / 31.9 / 0.24   29.9 / 29.9 / 0.17   29.6 / 29.7 / 0.18
90%          31.9 / 27.7 / 0.17   29.6 / 30.2 / 0.17   29.2 / 28.8 / 0.18
100%         32.4 / 32.8 / 0.17   30.1 / 30.7 / 0.16   30.1 / 28.9 / 0.17

Dataset D
Train data   TL_CNN_LSTM          CNN_LSTM             Random Forest
50%          36.1 / 38.7 / 0.19   36.9 / 40.9 / 0.23   37.3 / 40.6 / 0.23
60%          40.2 / 36.0 / 0.20   38.2 / 37.2 / 0.22   36.7 / 38.0 / 0.22
70%          41.6 / 37.4 / 0.19   41.2 / 43.9 / 0.26   38.6 / 39.2 / 0.24
80%          41.3 / 45.8 / 0.32   39.1 / 40.2 / 0.24   37.4 / 39.4 / 0.23
90%          40.6 / 42.7 / 0.24   40.3 / 43.5 / 0.25   37.7 / 38.8 / 0.22
100%         40.4 / 44.6 / 0.26   40.4 / 42.0 / 0.26   37.2 / 37.3 / 0.20


Dataset A Prediction Performance (7 features)
• Model performance comparison: Dataset A
• TL_CNN_LSTM > Random Forest with 7 features
• The larger the training set, the more stable the performance

[Figure: Dataset A accuracy, wF1-score, and MCC versus training set partition (50% to 100%) for TL_CNN_LSTM, CNN_LSTM, and Random Forest]


3.2 Summary
• The random forest model achieved the highest performance overall, but the TL_CNN_LSTM model performed slightly better with 7 features.
• TL_CNN_LSTM may perform better as more features are added.
• Additionally, TL_CNN_LSTM shows better performance with imbalanced datasets.
• No large difference in performance was observed across training set sizes, but performance became more stable as the training set grew.
• Overall, the data-driven models did not perform as well as expected, but they still outperformed PMV.


4

CONCLUSION

Question: Are data-driven models still effective for small-sized thermal comfort datasets?

A: Because thermal comfort data is inherently subjective, small-sized thermal comfort datasets are not yet suitable for data-driven prediction modeling in practical use.


4

APPENDIX-1 (Running algorithms)

load("7US_ID_TSV.mat")
rng('default') % For reproducibility

% Participant tables from the .mat file, indexed by ID as in the original branches:
% ID 1 -> ID3, ID 2 -> ID16, ID 3 -> ID8, ID 4 -> ID14
subjects = {ID3, ID16, ID8, ID14};
% Holdout fractions that keep 50%, 60%, 70%, 80%, 90% of the training set
holdoutFrac = [0.5 0.4 0.3 0.2 0.1];

i = 1;
for k = 1:4
    for ID = 1 % single value as in the source; widen to 1:4 to cover all participants
        %% ID: stratified 50/50 training/test split on TSV
        Dataset = subjects{ID};
        hpartition = cvpartition(Dataset.TSV,'Holdout',0.5);
        tblTest = Dataset(test(hpartition),:);
        Dataset = Dataset(training(hpartition),:); % full training set

        %% Data Partition
        for Data = 6 % 1..5 keep 50%..90% of the training set, 6 keeps 100%
            if Data <= 5
                hpartition = cvpartition(Dataset.TSV,'Holdout',holdoutFrac(Data));
                tblTrain = Dataset(training(hpartition),:);
            else
                % Use the full training set (the source reused a possibly
                % already reduced tblTrain here)
                tblTrain = Dataset;
            end

            %% Classifiers
            for Mode = 1 % 1: TL_CNN_LSTM, 2: CNN_LSTM, 3: RF
                if Mode == 1
                    [MeanLog_Test] = TL_CNN_LSTM(tblTrain, tblTest);
                elseif Mode == 2
                    [MeanLog_Test] = CNN_LSTM(tblTrain, tblTest);
                elseif Mode == 3
                    [MeanLog_Test] = RF(tblTrain, tblTest);
                end
                % Log: repetition, participant, partition, model, ACC, wF1, MCC
                Val_Log(i,:) = [k, ID, Data, Mode, MeanLog_Test(1), MeanLog_Test(2), MeanLog_Test(3)];
                i = i+1;
            end
        end
    end
end


4

APPENDIX-2 (TL_CNN_LSTM modeling)

function [MeanLog_Test] = TL_CNN_LSTM(tblTrain, tblTest)
% helperNormalize, freezeWeights, createLgraphUsingConnections, statsOfMeasure,
% and confusion.getMatrix are external helpers that are not listed in this appendix.
training_raw = rmmissing(tblTrain);
test_raw = rmmissing(tblTest);

predictor_col = 1:6;
response_col = 7;
d = 1; % sequence (window) length
training_oversampled = training_raw; % no oversampling is applied here

%% Standardization (statistics taken from the training set only)
tMean = varfun(@mean, training_oversampled, 'InputVariables', @isnumeric);
tSigma = varfun(@std, training_oversampled, 'InputVariables', @isnumeric);
training_oversampled = helperNormalize(training_oversampled, tMean, tSigma);
test = helperNormalize(test_raw, tMean, tSigma);

%% Training sequences
nTrain = size(training_oversampled,1)-d+1;
XTrain = cell(nTrain,1);
for n = 1:nTrain
    XTrain{n} = (training_oversampled{n:n+d-1,predictor_col})';
    YTrain(n,1) = training_oversampled{n+d-1,response_col};
end
YTrain = categorical(YTrain);

%% Test sequences
nTest = size(test,1)-d+1;
XTest = cell(nTest,1);
for n = 1:nTest
    XTest{n} = (test{n:n+d-1,predictor_col})';
    YTest(n,1) = test{n+d-1,response_col};
end
YTest = categorical(YTest);

%% Load the pre-trained source network (variable net)
load('sourceNet.mat')
% analyzeNetwork(net);
inputSize = net.Layers(1).InputSize;
numClasses = numel(categories(YTrain));

%% Transfer layers: new feature-extraction layers plus source layers 11:17
lgraph = layerGraph(net.Layers);
newlayers = [ ...
    sequenceInputLayer(inputSize)
    convolution1dLayer(5,128,'Padding','causal','Name','conv1d')
    reluLayer('Name','relu_1')
    layerNormalizationLayer('Name','layernorm')
    globalAveragePooling1dLayer('Name','globalavgpool1d')
    lstmLayer(256,'OutputMode','sequence','Name','lstm1')
    dropoutLayer(0.1,'Name','dropout_1')
    lstmLayer(256,'OutputMode','sequence','Name','lstm2')
    dropoutLayer(0.1,'Name','dropout_2')
    flattenLayer('Name','flatten')
    lgraph.Layers(11:17)];
newlayers(11:17) = freezeWeights(newlayers(11:17)); % keep transferred weights fixed
connections = lgraph.Connections;
newlgraph = createLgraphUsingConnections(newlayers,connections);

%% Hyperparameter settings
miniBatchSize = 32;
options = trainingOptions('adam', ...
    'MiniBatchSize',miniBatchSize, ...
    'GradientThreshold',1, ...
    'InitialLearnRate',3e-4, ... % 'Verbose',false was commented out in the source
    'Shuffle','every-epoch', ...
    'MaxEpochs',30);

%% Modeling iteration & performance outputs
nRuns = 10;
Afold = zeros(nRuns,3); % columns: accuracy, weighted F1, MCC
for i = 1:nRuns
    try
        net = trainNetwork(XTrain,YTrain,newlgraph,options);
        % Test accuracy
        Ypred = classify(net,XTest, ...
            'MiniBatchSize',miniBatchSize, ...
            'ExecutionEnvironment','auto');
        C = confusionmat(YTest,Ypred);
        confusionchart(YTest,Ypred)
        [stats, weightedf1] = statsOfMeasure(C);
        accuracy = stats.microAVG(8);
        [~,Result] = confusion.getMatrix(double(string(YTest))',double(string(Ypred))');
        MCC = Result.MatthewsCorrelationCoefficient;
        Afold(i,:) = [accuracy weightedf1 MCC];
    catch
        fprintf('loop number %d failed\n',i)
    end
end
Afold(Afold(:,1)==0,:) = []; % drop failed runs, which leave a zero row
MeanLog_Test = mean(Afold,1,'omitnan');
end


4

REFERENCE

[1] M. Luo et al., “Comparing machine learning algorithms in predicting thermal sensation using ASHRAE Comfort Database II,” Energy Build., vol. 210, p. 109776, Mar. 2020, doi: 10.1016/J.ENBUILD.2020.109776.

[2] S. Lu, W. Wang, C. Lin, and E. C. Hameen, “Data-driven simulation of a thermal comfort-based temperature setpoint control with ASHRAE RP884,” Build. Environ., vol. 156, pp. 137–146, Jun. 2019, doi: 10.1016/j.buildenv.2019.03.010.

[3] N. Somu, A. Sriram, A. Kowli, and K. Ramamritham, “A hybrid deep transfer learning strategy for thermal comfort prediction in buildings,” Build. Environ., vol. 204, p. 108133, Oct. 2021, doi: 10.1016/j.buildenv.2021.108133.

[4] N. Gao, W. Shao, M. S. Rahaman, J. Zhai, K. David, and F. D. Salim, “Transfer learning for thermal comfort prediction in multiple cities,” Build. Environ., vol. 195, p. 107725, May 2021, doi: 10.1016/j.buildenv.2021.107725.

[5] V. Földváry Ličina et al., “Development of the ASHRAE Global Thermal Comfort Database II,” Build. Environ., vol. 142, pp. 502–512, Sep. 2018, doi: 10.1016/J.BUILDENV.2018.06.022.

[6] J. Langevin, P. L. Gurian, and J. Wen, “Tracking the human-building interaction: A longitudinal field study of occupant behavior in air-conditioned offices,” J. Environ. Psychol., vol. 42, pp. 94–115, 2015, doi: 10.1016/j.jenvp.2015.01.007.


Utilization of Machine Learning Algorithms for Predicting Thermal Comfort in Korean Climate

2


1. Introduction

[Figure labels: Korea HVAC, PMV, prediction model, occupants]

• The purpose of this study is to experimentally evaluate the performance of occupant thermal comfort prediction using machine learning algorithms.
• Thermal comfort data collected in the same climate zones as Korea were selected from internationally recognized thermal comfort datasets.
• The results can serve as reference data for developing an occupant-centered thermal control system suited to the Korean climate.


2. Research Methodology
2.1. Data Collection
• ASHRAE Global Thermal Comfort Database II: Published in 2018 and collected over 20 years by researchers worldwide, it is an extensive and systematic thermal comfort dataset [1]. It contains about 82,000 sets of thermal comfort data.
• Medium US office dataset: Developed by Langevin et al. [3], it is a thermal comfort dataset collected over a long period from 24 participants at the Friends Center office building in Philadelphia, USA.
• According to the Köppen climate classification, Korea includes temperate climates (Cwa, Cfa) and cold climates (Dwa, Dfa).

[1] Development of the ASHRAE global thermal comfort database II
[2] The scales project, a cross-national dataset on the interpretation of thermal perception scales
[3] Tracking the human-building interaction: A longitudinal field study of occupant behavior in air-conditioned offices


2.2. Feature Selection (feature: input variable)

Feature set 1 (6 features)
1. Indoor Temperature (℃)
2. Indoor Relative Humidity (%)
3. Indoor Mean Radiant Temperature (℃)
4. Indoor Air Velocity (m/s)
5. Clothing (clo)
6. Metabolic Rate (met)

Feature set 2 (7 features): feature set 1 plus
7. Outdoor Temperature (℃)

Feature set 3 (9 features): feature set 2 plus
8. Gender (-)
9. Age (-)


2.3. Data Preprocessing
• Missing data elimination
• Feature label and scale conversions

Data feature     Label / method
Outdoor Temp.    ASHRAE: monthly average; Medium office: temperature at the time of survey
Gender           1: Female, 2: Male
Age              1: <20, 2: 21–30, 3: 31–40, and 4: 40+
TSV              Reclassified on a 5-point scale:
                 -3 to -1.5 → -2
                 -1.5 to -0.5 → -1
                 -0.5 to 0.5 → 0
                 0.5 to 1.5 → 1
                 1.5 to 3 → 2
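The TSV reclassification can be sketched as a simple binning function, shown in Python for illustration. The slide does not say which class a vote exactly on a boundary (such as 1.5) belongs to; this sketch assumes boundary values fall into the class closer to neutral, and the function name `tsv_to_5_point` is hypothetical.

```python
def tsv_to_5_point(tsv):
    """Map a continuous or 7-point TSV in [-3, 3] onto the 5-point scale."""
    if not -3 <= tsv <= 3:
        raise ValueError("TSV must lie between -3 and +3")
    if tsv < -1.5:
        return -2
    if tsv < -0.5:   # assumption: -1.5 itself maps to -1
        return -1
    if tsv <= 0.5:   # assumption: -0.5 and 0.5 map to 0
        return 0
    if tsv <= 1.5:   # assumption: 1.5 itself maps to 1
        return 1
    return 2


# Integer 7-point votes collapse as -3, -2 -> -2 and +2, +3 -> +2.
converted = [tsv_to_5_point(v) for v in (-3, -2, -1, 0, 1, 2, 3)]
```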


2.4. Data Balancing
• When a model is trained on an imbalanced dataset, it may fail to classify the minority classes properly, so its predictions become biased toward the majority classes. Oversampling is therefore applied before training to reduce this tendency.
• Oversampling: a technique that deliberately increases the number of samples by creating synthetic data in the minority class. Common techniques include random oversampling, SMOTE, edited nearest neighbors, TabularGAN, and TableGAN.
• In this study, SMOTE was used.
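The core idea of SMOTE is to create each synthetic sample by interpolating between a minority-class sample and one of its nearest minority-class neighbours. The following is a minimal Python sketch of that idea under stated assumptions (numeric features only, Euclidean distance, a hypothetical function name `smote`); it is not the implementation used in the study.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples from a list of minority-class feature vectors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample, excluding itself
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


# Illustrative minority-class samples: [indoor temperature, relative humidity].
minority = [[22.0, 40.0], [23.0, 45.0], [24.0, 50.0], [22.5, 42.0]]
new_samples = smote(minority, n_new=6, k=2)
```

In practice the features would be standardized first so that no single variable dominates the distance, and oversampling would be applied to the training data only.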


2.5. Machine Learning Algorithms
1. RF: The ensemble method is set to bagging, the learner type is decision tree, the maximum number of splits is 371, and the number of learners is 30.
2. SVM: Several studies show that SVM with an RBF kernel is effective for thermal comfort modeling. In this study, the RBF kernel (kernel function: Gaussian, the same as RBF; kernel scale: auto) was used.
3. MLP: A similar study constructed the MLP with 2 fully connected layers with 64 neurons in each layer, and another study constructed 2 fully connected layers with 32, 256, 512 neurons in each layer. In this study, 2 fully connected layers with 64 neurons each were used. The batch size was set to 200, the maximum number of epochs to 500, and the iteration limit to 1000.


2.6. Training/Test Data Split
• Dataset split: training 80%, test 20%
• To avoid overfitting and improve performance, stratified 10-fold cross-validation was used during training.
• In each fold, the 80% training portion is divided into 72% training data and 8% validation data (percentages of the full dataset).
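The stratified 10-fold scheme can be sketched as follows, in Python for illustration. Each fold holds out one tenth of the training portion for validation, which is where the 72% / 8% figures come from (0.8 × 0.9 = 0.72 and 0.8 × 0.1 = 0.08). The function name `stratified_kfold` and the round-robin assignment are assumptions of this sketch.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs with class proportions kept in every fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    counter = 0
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        for idx in idxs:  # deal each class round-robin across the folds
            folds[counter % k].append(idx)
            counter += 1

    for held_out in range(k):
        val_idx = sorted(folds[held_out])
        train_idx = sorted(i for f, fold in enumerate(folds) if f != held_out for i in fold)
        yield train_idx, val_idx


# Example: 100 training samples with an imbalanced 3-class label.
labels = [0] * 50 + [1] * 30 + [2] * 20
splits = list(stratified_kfold(labels, k=10))
```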


2.7. Performance Evaluation Metrics
• Accuracy: The most basic and widely used prediction evaluation metric. However, it is not suitable for evaluating a classification model built with an imbalanced dataset.
• Weighted F1-score: The per-class F1 score averaged with weights proportional to the number of samples in each class. Suitable for evaluating a classification model built with an imbalanced dataset. Several studies on thermal comfort modeling also use this metric to evaluate model performance [3,4].
• Matthews correlation coefficient (MCC): Also suitable for evaluating a classification model built with an imbalanced dataset [1,2].

[1] A hybrid deep transfer learning strategy for thermal comfort prediction in buildings
[2] Heterogeneous transfer learning for thermal comfort modeling
[3] Transfer learning for thermal comfort prediction in multiple cities
[4] Adaptive behavior and different thermal experiences of real people: A Bayesian neural network approach to thermal preference prediction and classification


3. Experiment Result
3.1. Performance Results
Thermal comfort prediction based on various feature sets and algorithms

Feature set   Algorithm   Accuracy (%)   F1-score (%)   MCC
1             RF          54.29          53.60          0.36
1             SVM         53.10          50.72          0.33
1             MLP         50.22          49.70          0.31
1             PMV         34.63          32.04          0.1
2             RF          56.43          55.82          0.39
2             SVM         55.23          53.30          0.36
2             MLP         51.13          50.83          0.33
3             RF          60.73          60.19          0.46
3             SVM         57.30          55.81          0.4
3             MLP         58.24          58.15          0.43


3.1. Performance Results (Feature set 1)

[Figure: confusion matrices for PMV and Random Forest]


4. Conclusion

Is it possible to create machine learning-based thermal comfort prediction models suitable for the Korean climate?
• A machine learning-based thermal comfort model suitable for the domestic climate could be created,
• although no large-scale dataset systematically collected in Korea is available yet.

How does the performance of machine learning-based models compare to PMV?
• Machine learning models outperformed PMV.
• Prediction performance improved markedly when additional features were included on top of the six PMV factors.

Which machine learning algorithm performs the best?
• Overall, the random forest model had the best thermal comfort prediction performance.
• In the future, I plan to conduct a field test to collect my own thermal comfort datasets and to use more complex models, such as deep learning algorithms.




Collection and Analysis of Occupant Behavior in Thermal Environment Using Occupant Voting System : Observations of Office occupants specifically in Korean climate

3


1 Introduction
• According to the global thermal comfort database developed in 2018, less than 80% of people are satisfied with the indoor thermal environment [1].
• Occupants' thermal discomfort lowers productivity and sometimes leads to energy-consuming behaviors [2].
• Recent research has focused on occupant behavior to solve this problem.
• There is active research on monitoring thermal conditions with respect to occupant behavior and on controlling building thermal systems.
• This study observes occupant thermal behavior and thermal conditions in offices in summer and analyzes the data. The results can be used for future development of occupant thermal behavior prediction models and for building system control based on occupant thermal behavior.

[1] Ličina, V. F., Cheung, T., Zhang, H., De Dear, R., Parkinson, T., Arens, E., ... & Zhou, X. (2018). Development of the ASHRAE global thermal comfort database II. Building and Environment, 142, 502-512.
[2] Kükrer, E., & Eskin, N. (2021). Effect of design and operational strategies on thermal comfort and productivity in a multipurpose school building. Journal of Building Engineering, 44, 102697.


2.1 Experiment settings
• Total participants: 12
• Lab location: 2 offices in Seoul
• Experiment duration: July 2021 to September 2021, 10:00 am to 6:00 pm
• Thermal control: manual control of air conditioners
• Data collection: indoor thermal environment, occupant behavior

[Figure: office floor plans showing the positions of the air conditioners and the Testo 400 instrument]


2.2 Indoor thermal environment measurement
Followed ASHRAE 55 [1]

Thermal measurement parameters
• Air temperature, mean radiant temperature, relative humidity, air velocity

Equipment location
• Most populated spot
• Center of the rooms
• At least 1 m away from the wall

Height
• Air temperature and average air speed measured at 0.6 m height in the office

Measurement period
• 5-minute interval for accurate measurement of the indoor environment

[1] ANSI/ASHRAE Standard 55-2017, Thermal Environmental Conditions for Human Occupancy, American Society of Heating, Refrigerating and Air-Conditioning Engineers, Inc., 2017.


2.2

Indoor thermal environment measurement: Equipment

Measurement period: 5 minutes for every parameter

Air Temperature
• Measurement range: -20 to +60 °C
• Accuracy: ±0.8 °C (-20 to 0 °C), ±0.5 °C (0 to +60 °C)
• Sensor resolution: 0.1 °C

Relative Humidity
• Equipment: Testo 400 + Testo 605i thermohygrometer, operated via smartphone
• Measurement range: 0 to 100 %RH
• Accuracy: ±(1.8 %RH + 3%) at +25 °C (5 to 80 %RH)
• Sensor resolution: 0.1 %RH

Air Velocity
• Equipment: turbulence measurement probe (fixed cable)
• Measurement range: 0 to +5 m/s
• Accuracy: ±(0.03 m/s + 4%) (0 to 5 m/s)
• Resolution: 0.01 m/s

Mean Radiant Temperature
• Equipment: Ø 150 mm black bulb temperature probe (thermocouple Type K)
• Range: 0 to +120 °C
• Thermocouple Type K accuracy: ±1.5 °C or ±0.004 × t (-40 to +1000 °C)
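The accuracy figures above can be turned into a tolerance band for a given reading. The following is a minimal Python sketch, not part of the original study. It assumes that the "4%" term in the air velocity accuracy is a percentage of the measured value, and it assigns a reading of exactly 0 °C to the tighter temperature band because the slide lists 0 °C in both. The function names are hypothetical.

```python
def air_velocity_tolerance(reading_m_s: float) -> float:
    """Return the +/- tolerance in m/s for a reading in the 0 to 5 m/s range.

    Assumption: the 4% term applies to the measured value.
    """
    if not 0.0 <= reading_m_s <= 5.0:
        raise ValueError("reading outside the 0 to 5 m/s range")
    return 0.03 + 0.04 * reading_m_s


def air_temperature_tolerance(reading_c: float) -> float:
    """Return the +/- tolerance in deg C for a reading in the -20 to +60 deg C range."""
    if not -20.0 <= reading_c <= 60.0:
        raise ValueError("reading outside the -20 to +60 deg C range")
    # Assumption: exactly 0 deg C falls in the tighter (+/-0.5 deg C) band.
    return 0.8 if reading_c < 0.0 else 0.5


if __name__ == "__main__":
    # Illustrative readings only, not measurements from the study.
    print(air_velocity_tolerance(0.10))
    print(air_temperature_tolerance(26.0))
```

At the low air speeds typical of offices, the fixed 0.03 m/s term dominates the tolerance, which is worth keeping in mind when comparing readings between rooms.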



2.3

Data collection: Application of occupant behavior

Voter's identity
• Enter the voter's individual ID.
• Age and gender can be used as variables.
• The ID indicates where the voter is located in the office.

Clothing insulation (clo)
• For more accurate values, enter the clo value defined by ASHRAE 55.
• Add 0.05 clo to account for the chair.

Metabolic (activity) rate
• The voter selects from a list of activities that mainly occur in the office.

Thermal comfort labels
• Thermal preferences were identified through TSV (Thermal Sensation Vote) and TP (Thermal Preference).
• According to ASHRAE 55, TSV uses a 7-point scale and TP uses a 3-point scale.

Following behavior
• When the TSV is greater than or less than 0, the voter selects the behavior taken to increase thermal comfort.

The occupant thermal comfort voting application was developed in MATLAB.
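The fields collected by the voting application can be sketched as a single validated record. The original application was built in MATLAB and its data layout is not shown on the slides, so the Python below is only an illustration. The class name, field names, and the coding of TP as -1, 0, +1 are assumptions; the 0.05 clo chair allowance and the rule that a follow-up behavior is recorded whenever TSV is not 0 come from the slide.

```python
from dataclasses import dataclass
from typing import Optional

CHAIR_CLO = 0.05  # chair allowance stated on the slide


@dataclass
class Vote:
    """One hypothetical vote record; field names are illustrative."""
    voter_id: str
    garment_clo: float            # clothing insulation before the chair allowance
    activity: str                 # office activity chosen from a list
    tsv: int                      # Thermal Sensation Vote, 7-point scale (-3..+3)
    tp: int                       # Thermal Preference, 3-point scale (-1..+1)
    behavior: Optional[str] = None  # follow-up behavior, needed when tsv != 0

    def __post_init__(self) -> None:
        if not -3 <= self.tsv <= 3:
            raise ValueError("TSV must be on the 7-point scale -3..+3")
        if not -1 <= self.tp <= 1:
            raise ValueError("TP must be on the 3-point scale -1..+1")
        if self.tsv != 0 and self.behavior is None:
            raise ValueError("a behavior must be selected when TSV is not 0")

    @property
    def total_clo(self) -> float:
        """Clothing insulation including the chair allowance."""
        return self.garment_clo + CHAIR_CLO


if __name__ == "__main__":
    vote = Vote("A", 0.50, "typing", tsv=1, tp=-1, behavior="use of personal fan")
    print(vote.total_clo)
```

Validating at construction time keeps malformed votes out of storage, so later analysis does not need to re-check scale ranges.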



2.3

Data collection: Application of occupant behavior

Categories of selected occupant behaviors

Sources: [1]; number of votes for thermal behavior through surveys [2]; observed occupant behavior by category [3]

Behaviors in a hot environment
• Personal activities: no special action, drink a cool drink, leave, undress, use of personal fan
• Activities that affect others: speak to a colleague, air conditioning temperature control, open or close windows/doors, blind adjustment

Behaviors in a cold environment
• Personal activities: no special action, drink hot beverages, leave
• Activities that affect others: speak to a colleague, air conditioning temperature control, open or close windows/doors, blind adjustment

[1] Chen, C. F., De Simone, M., Yilmaz, S., Xu, X., Wang, Z., Hong, T., & Pan, Y. (2021). Intersecting heuristic adaptive strategies, building design and energy saving intentions when facing discomfort environment: A cross-country analysis. Building and Environment, 204, 108129.
[2] Langevin, J., Gurian, P. L., & Wen, J. (2015). Tracking the human-building interaction: A longitudinal field study of occupant behavior in air-conditioned offices. Journal of Environmental Psychology, 42, 94-115.
[3] Langevin, J. (2019). Longitudinal dataset of human-building interactions in US offices. Scientific Data, 6(1), 1-10.



2.4

Data Collection and Analysis Process

[Figure: process diagram with the stages field measurement, voting system, data storage, and comparison and analysis]


3.1

Analysis of thermal preference: PMV (classified) vs TSV

Rows: PMV (classified). Columns: TSV.

PMV/TSV    -3    -2    -1     0     1     2     3   SUM
-2          0     0     0    10     5     0     0    15
-1          0     1    11   144    33     2     0   191
 0          0     3    25   399    99     7     1   534
 1          0     0     2    97    30     7     0   136
 2          0     0     0     0     0     1     0     1
SUM         0     4    38   650   167    17     1   877

[Figure: chart of PMV against TSV, both on the scale -3 to 3]

• Votes for 0 (Neutral) accounted for 74.1% of all TSV votes (650 of 877), and TSV agreed with PMV in about 50.3% of votes (441 of 877).

• PMV differs from the occupants' actual thermal sensation.
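The concordance rate between PMV and TSV can be reproduced by summing the cells where the two values are equal and dividing by the total number of votes. The sketch below is a minimal Python illustration using the cell counts read from the table on this slide; the variable names are hypothetical, and treating "concordance" as exact agreement on the same scale point is an assumption.

```python
# Columns of every row follow this TSV order.
TSV_SCALE = [-3, -2, -1, 0, 1, 2, 3]

# Vote counts per classified PMV value (rows), as read from the slide's table.
COUNTS_BY_PMV = {
    -2: [0, 0, 0, 10, 5, 0, 0],
    -1: [0, 1, 11, 144, 33, 2, 0],
    0: [0, 3, 25, 399, 99, 7, 1],
    1: [0, 0, 2, 97, 30, 7, 0],
    2: [0, 0, 0, 0, 0, 1, 0],
}

total_votes = sum(sum(row) for row in COUNTS_BY_PMV.values())

# Votes where the classified PMV equals the TSV (the table's diagonal).
matching_votes = sum(
    row[TSV_SCALE.index(pmv)] for pmv, row in COUNTS_BY_PMV.items()
)

# Votes where the occupant reported a neutral sensation (TSV = 0).
neutral_votes = sum(row[TSV_SCALE.index(0)] for row in COUNTS_BY_PMV.values())

concordance = matching_votes / total_votes
neutral_share = neutral_votes / total_votes

if __name__ == "__main__":
    print(f"total votes: {total_votes}")
    print(f"PMV/TSV concordance: {concordance:.1%}")
    print(f"neutral TSV share: {neutral_share:.1%}")
```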


3.1

Analysis of thermal preference: TSV (3-scaled) vs TP

Rows: TP. Columns: TSV (3-scaled).

TP/TSV    -1     0     1   SUM
-1        14     1     1    16
 0        28   641    47   716
 1         0     8   137   145
SUM       42   650   185   877

[Figure: chart of TP against TSV]

• Individual thermal sensation (TSV) and preference (TP) match in about 90% of votes.
• This suggests that TSV and TP can be used together in occupant-centered temperature control.
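The 3-scaled TSV and the roughly 90% match can be checked with a short calculation. The sketch below is illustrative Python, not the authors' code. It assumes the 7-point TSV is collapsed by sign (negative to -1, zero to 0, positive to +1); this rule is not stated on the slides, but it reproduces the column sums 42, 650, and 185 in the table. All names are hypothetical.

```python
def collapse_tsv(tsv: int) -> int:
    """Collapse a 7-point TSV to a 3-point scale by sign (assumed rule)."""
    return (tsv > 0) - (tsv < 0)


# TSV column sums on the 7-point scale, from the PMV vs TSV table.
TSV_COUNTS = {-3: 0, -2: 4, -1: 38, 0: 650, 1: 167, 2: 17, 3: 1}

collapsed = {-1: 0, 0: 0, 1: 0}
for tsv, count in TSV_COUNTS.items():
    collapsed[collapse_tsv(tsv)] += count

# TP (rows) against 3-scaled TSV (columns ordered -1, 0, +1), from the table.
COUNTS_BY_TP = {
    -1: [14, 1, 1],
    0: [28, 641, 47],
    1: [0, 8, 137],
}

total_votes = sum(sum(row) for row in COUNTS_BY_TP.values())
# Column index for a 3-scale value v is v + 1.
matching_votes = sum(row[tp + 1] for tp, row in COUNTS_BY_TP.items())
match_rate = matching_votes / total_votes

if __name__ == "__main__":
    print(collapsed)
    print(f"TSV/TP match rate: {match_rate:.1%}")
```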


3.2

Occupant behavior in hot environment

[Figure: charts of the behaviors chosen in a hot environment, one per occupant (A to L) plus an overall "Behaviors" chart; some occupants are labeled "Not hot". Legend entries include no special action, use of personal fan, and drink a cool drink.]

• Occupants differ in their preferred behaviors.
• Doing nothing was the most common single response, at 35.7%.
• Personal actions, such as drinking a cool drink or using a personal fan, accounted for 51.7%.
• Actively adjusting the air conditioner temperature was much less common, at 11.8%.


3.2

Occupant behavior in cold environment

[Figure: charts of the behaviors chosen in a cold environment, one per occupant (A to L) plus an overall "Behaviors" chart; some occupants are labeled "Not cold". Legend entries include no special action, speak to a colleague, and leave.]

• Occupants differ in their preferred behaviors.
• In a cold environment, doing nothing was the most common response, at 67.5%.
• Actions that affect others, namely adjusting the air conditioner temperature or talking to a co-worker, accounted for 23.8%.
• Personal actions, such as drinking a hot beverage, were relatively rare at 8.8%.

Legend (continued): A/C temperature control; drink hot beverages
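The category percentages reported for the cold environment come from grouping individual behaviors into three categories and computing each category's share of votes. The sketch below shows that grouping in Python. The mapping follows the bullets above (no special action; personal actions; actions that affect others), while the function name and the sample votes are hypothetical and are not data from the study.

```python
from collections import Counter

# Behavior-to-category mapping for the cold environment, following the bullets.
CATEGORY = {
    "no special action": "no special action",
    "drink hot beverages": "personal actions",
    "leave": "personal actions",
    "speak to a colleague": "affect others",
    "air conditioning temperature control": "affect others",
}


def category_shares(behaviors):
    """Return each category's share of the votes as a percentage."""
    counts = Counter(CATEGORY[behavior] for behavior in behaviors)
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}


if __name__ == "__main__":
    # Made-up votes for illustration only.
    sample = (
        ["no special action"] * 6
        + ["speak to a colleague"] * 2
        + ["leave"]
        + ["drink hot beverages"]
    )
    for category, share in category_shares(sample).items():
        print(f"{category}: {share:.1f}%")
```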


3.3

Behavior analysis over time

[Figure: two panels, TSV > 0 and TSV < 0. Horizontal axis: time of day (10 to 17). Vertical axes: frequency and probability. Series: do no special action, personal actions, affect others, and the vote rate of TSV > 0 (first panel) or TSV < 0 (second panel).]

• Voting and thermal behavior were analyzed by thermal sensation and time of day.
• When it was hot (TSV > 0), the number of votes peaked when people had just arrived at the offices, and occupants mainly chose personal actions such as using a personal fan or drinking a cool drink. In a cold environment (TSV < 0), occupants usually took no special action.
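An hourly vote rate of the kind plotted above can be computed by grouping votes by hour of day. The Python sketch below assumes that "vote rate of TSV > 0" means the fraction of votes cast in a given hour that had TSV > 0; the slides do not define it, so this is only one plausible reading. The function name and the sample votes are hypothetical.

```python
from collections import Counter


def hot_vote_rate_by_hour(votes):
    """Return {hour: fraction of that hour's votes with TSV > 0}.

    `votes` is an iterable of (hour_of_day, tsv) pairs.
    """
    total = Counter()
    hot = Counter()
    for hour, tsv in votes:
        total[hour] += 1
        if tsv > 0:
            hot[hour] += 1
    return {hour: hot[hour] / total[hour] for hour in sorted(total)}


if __name__ == "__main__":
    # Made-up votes for illustration only.
    sample = [(10, 1), (10, 2), (10, 0), (11, 0), (11, -1), (14, 1), (14, 0)]
    for hour, rate in hot_vote_rate_by_hour(sample).items():
        print(f"{hour}:00  {rate:.0%}")
```

The same grouping with the condition `tsv < 0` gives the cold-side rate.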


4

Conclusion

• In this study, the behavior and thermal conditions of office occupants in Korea were monitored and analyzed during summer.

• TSV differed significantly from PMV but had a high concordance rate with TP.

• When multiple people are present and thermally uncomfortable, occupants have a strong tendency to do nothing.

• When it is hot, occupants tend to relieve discomfort by taking personal actions.

• The experiment was limited to summer and had a small number of participants.

• In future work, the behavior of more occupants will be tracked over a longer period, and an experiment that systematically controls the indoor thermal environment based on the findings of this study will be conducted.


5

REFERENCE

[1] Ličina, V. F., Cheung, T., Zhang, H., De Dear, R., Parkinson, T., Arens, E., ... & Zhou, X. (2018). Development of the ASHRAE global thermal comfort database II. Building and Environment, 142, 502-512.
[2] Kükrer, E., & Eskin, N. (2021). Effect of design and operational strategies on thermal comfort and productivity in a multipurpose school building. Journal of Building Engineering, 44, 102697.
[3] ANSI/ASHRAE Standard 55-2017, Thermal Environmental Conditions for Human Occupancy, American Society of Heating, Refrigerating and Air-Conditioning Engineers, Inc., 2017.
[4] Chen, C. F., De Simone, M., Yilmaz, S., Xu, X., Wang, Z., Hong, T., & Pan, Y. (2021). Intersecting heuristic adaptive strategies, building design and energy saving intentions when facing discomfort environment: A cross-country analysis. Building and Environment, 204, 108129.
[5] Langevin, J., Gurian, P. L., & Wen, J. (2015). Tracking the human-building interaction: A longitudinal field study of occupant behavior in air-conditioned offices. Journal of Environmental Psychology, 42, 94-115.
[6] Langevin, J. (2019). Longitudinal dataset of human-building interactions in US offices. Scientific Data, 6(1), 1-10.

