Journal of Automation, Mobile Robotics and Intelligent Systems, vol. 14, no. 4 (2020)

Indexed in SCOPUS
pISSN 1897-8649 (print) / eISSN 2080-2145 (online)  •  www.jamris.org

Publisher: ŁUKASIEWICZ Research Network – Industrial Research Institute for Automation and Measurements PIAP


Journal of Automation, Mobile Robotics and Intelligent Systems
A peer-reviewed quarterly focusing on new achievements in the following fields:
•  Fundamentals of automation and robotics  •  Applied automatics  •  Mobile robots control  •  Distributed systems  •  Navigation  •  Mechatronic systems in robotics  •  Sensors and actuators  •  Data transmission  •  Biomechatronics  •  Mobile computing

Editor-in-Chief
Janusz Kacprzyk (Polish Academy of Sciences, Łukasiewicz-PIAP, Poland)

Advisory Board
Dimitar Filev (Research & Advanced Engineering, Ford Motor Company, USA)
Kaoru Hirota (Japan Society for the Promotion of Science, Beijing Office)
Witold Pedrycz (ECERF, University of Alberta, Canada)

Co-Editors
Roman Szewczyk (Łukasiewicz-PIAP, Warsaw University of Technology, Poland)
Oscar Castillo (Tijuana Institute of Technology, Mexico)
Marek Zaremba (University of Quebec, Canada)

Executive Editor
Katarzyna Rzeplińska-Rykała (Łukasiewicz-PIAP, Poland), e-mail: office@jamris.org

Associate Editor
Piotr Skrzypczyński (Poznań University of Technology, Poland)

Statistical Editor
Małgorzata Kaliczyńska (Łukasiewicz-PIAP, Poland)

Typesetting
PanDawer, www.pandawer.pl

Webmaster
Piotr Ryszawa (Łukasiewicz-PIAP, Poland)

Editorial Office
ŁUKASIEWICZ Research Network – Industrial Research Institute for Automation and Measurements PIAP
Al. Jerozolimskie 202, 02-486 Warsaw, Poland (www.jamris.org)
tel. +48-22-8740109, e-mail: office@jamris.org

The reference version of the journal is the e-version. Printed in 100 copies.
Articles are reviewed, excluding advertisements and descriptions of products.
If in doubt about the proper edition of contributions, for copyright and reprint permissions please contact the Executive Editor.

Publishing of "Journal of Automation, Mobile Robotics and Intelligent Systems" – the task financed under contract 907/P-DUN/2019 from funds of the Ministry of Science and Higher Education of the Republic of Poland allocated to science dissemination activities.

Editorial Board:

Chairman – Janusz Kacprzyk (Polish Academy of Sciences, Łukasiewicz-PIAP, Poland)
Plamen Angelov (Lancaster University, UK) • Adam Borkowski (Polish Academy of Sciences, Poland) • Wolfgang Borutzky (Fachhochschule Bonn-Rhein-Sieg, Germany) • Bice Cavallo (University of Naples Federico II, Italy) • Chin Chen Chang (Feng Chia University, Taiwan) • Jorge Manuel Miranda Dias (University of Coimbra, Portugal) • Andries Engelbrecht (University of Pretoria, Republic of South Africa) • Pablo Estévez (University of Chile) • Bogdan Gabrys (Bournemouth University, UK) • Fernando Gomide (University of Campinas, Brazil) • Aboul Ella Hassanien (Cairo University, Egypt) • Joachim Hertzberg (Osnabrück University, Germany) • Evangelos V. Hristoforou (National Technical University of Athens, Greece) • Ryszard Jachowicz (Warsaw University of Technology, Poland) • Tadeusz Kaczorek (Białystok University of Technology, Poland) • Nikola Kasabov (Auckland University of Technology, New Zealand) • Marian P. Kaźmierkowski (Warsaw University of Technology, Poland) • Laszlo T. Kóczy (Szechenyi Istvan University, Gyor and Budapest University of Technology and Economics, Hungary) • Józef Korbicz (University of Zielona Góra, Poland) • Krzysztof Kozłowski (Poznań University of Technology, Poland) • Eckart Kramer (Fachhochschule Eberswalde, Germany) • Rudolf Kruse (Otto-von-Guericke-Universität, Germany) • Ching-Teng Lin (National Chiao-Tung University, Taiwan) • Piotr Kulczycki (AGH University of Science and Technology, Poland) • Andrew Kusiak (University of Iowa, USA) • Mark Last (Ben-Gurion University, Israel) • Anthony Maciejewski (Colorado State University, USA) • Krzysztof Malinowski (Warsaw University of Technology, Poland) • Andrzej Masłowski (Warsaw University of Technology, Poland) • Patricia Melin (Tijuana Institute of Technology, Mexico) • Fazel Naghdy (University of Wollongong, Australia) • Zbigniew Nahorski (Polish Academy of Sciences, Poland) • Nadia Nedjah (State University of Rio de Janeiro, Brazil) • Dmitry A. Novikov (Institute of Control Sciences, Russian Academy of Sciences, Russia) • Duc Truong Pham (Birmingham University, UK) • Lech Polkowski (University of Warmia and Mazury, Poland) • Alain Pruski (University of Metz, France) • Rita Ribeiro (UNINOVA, Instituto de Desenvolvimento de Novas Tecnologias, Portugal) • Imre Rudas (Óbuda University, Hungary) • Leszek Rutkowski (Czestochowa University of Technology, Poland) • Alessandro Saffiotti (Örebro University, Sweden) • Klaus Schilling (Julius-Maximilians-University Wuerzburg, Germany) • Vassil Sgurev (Bulgarian Academy of Sciences, Department of Intelligent Systems, Bulgaria) • Helena Szczerbicka (Leibniz Universität, Germany) • Ryszard Tadeusiewicz (AGH University of Science and Technology, Poland) • Stanisław Tarasiewicz (University of Laval, Canada) • Piotr Tatjewski (Warsaw University of Technology, Poland) • Rene Wamkeue (University of Quebec, Canada) • Janusz Zalewski (Florida Gulf Coast University, USA) • Teresa Zielińska (Warsaw University of Technology, Poland)

Publisher: ŁUKASIEWICZ Research Network – Industrial Research Institute for Automation and Measurements PIAP

All rights reserved ©



Journal of Automation, Mobile Robotics and Intelligent Systems, Volume 14, N° 4, 2020
DOI: 10.14313/JAMRIS/4-2020

Contents

3  An Adversarial Explainable Artificial Intelligence (XAI) Based Approach for Action Forecasting
   Vibekananda Dutta, Teresa Zielińska
   DOI: 10.14313/JAMRIS/4-2020/38

11  Features of Control Processes in Organizational-Technical (Technological) Systems of Continuous Type
   Igor Korobiichuk, Anatoliy Ladanyuk, Regina Boiko, Serhii Hrybkov
   DOI: 10.14313/JAMRIS/4-2020/39

18  Using Brain-Computer Interface Technology for Modeling 3D Objects in Blender Software
   Mateusz Zając, Szczepan Paszkiel
   DOI: 10.14313/JAMRIS/4-2020/40

25  Analysis of the Surrounding Environment Using an Innovative Algorithm Based on Lidar Data on a Modular Mobile Robot
   Nicola Ivan Giannoccaro, Takeshi Nishida
   DOI: 10.14313/JAMRIS/4-2020/41

35  Preface to Special Issue on Modern Intelligent Systems Concepts II
   Abdellah Idrissi
   DOI: 10.14313/JAMRIS/4-2020/42

37  Facial Emotion Recognition Using Average Face Ratios and Fuzzy Hamming Distance
   Khalid Ounachad, Mohamed Oualla, Abdelalim Sadiq, Abdelghani Souhar
   DOI: 10.14313/JAMRIS/4-2020/43

45  Towards a New Deep Learning Algorithm Based on GRU and CNN: NGRU
   Abdelhamid Atassi, Ikram el Azami
   DOI: 10.14313/JAMRIS/4-2020/44

48  Some Efficient Algorithms to Deal With Redundancy Allocation Problems
   Mustapha Es-Sadqi, Abdellah Idrissi, Ahlem Benhassine
   DOI: 10.14313/JAMRIS/4-2020/45

58  Convolutional Neural Networks for P300 Signal Detection Applied to Brain Computer Interface
   Mouad Riyad, Mohammed Khalil, Abdellah Adib
   DOI: 10.14313/JAMRIS/4-2020/46

64  Cloud-Based Sentiment Analysis for Measuring Customer Satisfaction in the Moroccan Banking Sector Using Naïve Bayes and Stanford NLP
   Anouar Riadsolh, Imane Lasri, Mourad ElBelkacemi
   DOI: 10.14313/JAMRIS/4-2020/47

72  A Dropout Predictor System in MOOCs Based on Neural Networks
   Khaoula Mrhar, Otmane Douimi, Mounia Abik
   DOI: 10.14313/JAMRIS/4-2020/48



AN ADVERSARIAL EXPLAINABLE ARTIFICIAL INTELLIGENCE (XAI) BASED APPROACH FOR ACTION FORECASTING
Submitted: 6th April 2020; accepted: 21st August 2020

Vibekananda Dutta, Teresa Zielińska
DOI: 10.14313/JAMRIS/4-2020/38

Abstract: Despite the growing popularity of machine learning technology, vision-based action recognition/forecasting systems are seen as black boxes by the user. Although the effectiveness of such systems depends on the machine learning algorithms, it is difficult (or impossible) to explain the decision-making processes to the users. In this context, an approach that offers the user an understanding of these reasoning models is significant. To this end, we present an Explainable Artificial Intelligence (XAI) based approach to action forecasting using a structured database and an object affordances definition. The structured database supports the prediction process. The method makes it possible to visualize the components of the structured database; later, the components of the base are used for forecasting the nominally possible motion goals. The object affordance, explicated by probability functions, supports the selection of possible motion goals. The presented methodology allows satisfactory explanations of the reasoning behind the inference mechanism. Experimental evaluation was conducted using the WUT-18 dataset, and the efficiency of the presented solution was compared to other baseline algorithms.

Keywords: Action prediction, Explainable artificial intelligence, Object affordances, Structured database, Motion trajectories

1. Introduction

In real-world scenarios, forecasting a human action before it is executed is a crucial problem. Such a forecasting tool is needed for a wide range of applications in assistive and social robotics. The recent events due to the global pandemic emphasized the role of service robots as health care assistants. Moreover, robots supporting the therapy of children with autism are employed to carry out social and assistive tasks, e.g., rewarding the person (by a "musical dance" or "words of appreciation") if they performed the expected assignment (i.e., activities) without debacle. Such service robots are also useful in therapy, providing guidance to the caretaker for avoiding any abnormal activities which could cause potential hazards. When developing safe real-time human-robot interaction (HRI), it must be predicted what a person will do next [9]. Such an ability requires tools and methods describing the temporal structure of human actions. For this purpose, several approaches such as probabilistic methods, machine learning, or deep learning methods are widely used. Since the decision making is shifted from humans to machines, transparency and interpretability with reliable explanations are significant for gaining human trust in intelligent systems, for the easiness of system debugging, and for managing the ethical problems. With such a capability (and with transparency to the users) [15], the intelligent systems will be able to plan their responses ahead, avoiding potential accidents or system faults.

Recent Machine Learning (ML) based intelligent systems are becoming increasingly complex, which makes it difficult for users to understand their actions [3]. Machine learning methods turn out to be un-interpretable "black boxes", which causes problems with concluding about the robustness and reliability of these systems [1, 11]. Explainable Artificial Intelligence (XAI) is a method that is capable of explaining its own behaviour. XAI is known to have a positive influence on user trust and on the understanding of intelligent systems [14]. Fig. 1 illustrates the difference between traditional and XAI based reasoning. Through its explanations, XAI makes the underlying inference mechanism of an intelligent system transparent and interpretable for both: (a) the expert users (system developers) and (b) the non-expert users (end-users) [16, 18, 19]. It is worth mentioning that the concept of XAI follows the workflow of the conventional machine learning approach in the "learning stage". However, the "application stage" offers interpretability of the learning mechanism, e.g., the significance of the features applied in the training stage and how these features are mapped to the corresponding class label. Next, the explainability of the inference mechanism presents the influence of the decision system w.r.t. the selected features during classification.

Forecasting human actions is a difficult problem that requires expertise in the areas of robotics and artificial intelligence (AI). It involves the use of cognitive capabilities, e.g., perception, reasoning, prediction, learning, and planning, and requires the semantics of the observed behaviour. The goal of this work is to create such capabilities for robots, to enhance their potential to perform human service tasks and to help human beings with everyday activities. Having said that, such capabilities require human acceptance (i.e., trust, ethics, and so on), since such robotic platforms lie at the intersection of human-robot interaction and machine intelligence. Therefore, the work concerning explainability and transparency of the prediction system is introduced to enhance human trust in autonomous service robots.




[Fig. 1 content: the "Today's Machine Learning" pipeline (task → training data → machine learning process → learned relation in the learning stage → decision in the application stage → user) leaves unexplained questions: Why such a decision? What can be the other options? How can I check the correctness of the decision? How confident can I be that the decision is correct? The XAI pipeline (task → training data → machine learning process → explainable inference model with explanation → user) yields explainable outcomes: I can see all stages of decision making; I know clearly why such a decision was taken; I can easily correct the learned relation; I can easily detect the source (reason) of errors.]

Fig. 1. The need for an Explainable AI method in terms of transparency and interpretability [12]

[Fig. 2 content: the data pre-processing method (temporal segmentation, feature extraction, data splitting) turns the selected offline dataset (WUT-18) into relevant feature sets for training and testing; the learning process produces the proposed explainable model – a structured database of motion parameters and calculated quantities; the explainable inference combines the inference mechanism with object affordance to output the predicted trajectory, with the unsatisfactory decision rate fed back.]

Fig. 2. General concept of the proposed explainable method

2. Motivation and Objectives


Our work is motivated by the need for trustworthy system behaviour. Explainable systems are needed for action forecasting in robotic autonomous servicing and care-giving. To address this challenge, we offer an adversarial Explainable Artificial Intelligence (XAI) approach for forecasting human actions using a structured database and object affordances. The proposed approach was investigated in a supervised setting.

Compared to our previous work [5], this paper focuses on the conceptual framework of explainability (i.e., XAI) in the following aspects. First, we give a short description of the formalization of the problem. Second, the definition of the structured database (the explainable model) is proposed. Third, the object affordance, explicated by the probability functions and the reasoning (summarized in the graphs), is detailed. Finally, the comparison of the proposed method with the results obtained for other baseline methods is made.

The remaining part of the paper discusses this contribution in detail. Section 3 describes the proposed method, focusing on the concept of the structured database and the object affordance. Section 4 presents the experimental results. The paper ends with the conclusions.

3. Proposed Method

3.1. General Definition

A human action is a state of doing. Since human action is a broad concept, for the sake of simplicity only human actions involving objects are considered in our work. The proposed adversarial explainable artificial intelligence-based action forecasting consists of two phases: (a) the training phase (creating the action models by gathering and processing the data and storing it in the database), and (b) the inference (prediction). The block diagram of the proposed method is depicted in Fig. 2.

Following [6], in this section we give a general overview of the action prediction system. The scene is observed and the objects in the human vicinity are recognized. The objects are used as the discriminants indicating which actions may nominally be taken by a human being. An action can be performed involving some objects; for example, if a bottle, a cup and a box are placed on a table, then for the action "reaching" the involved object can be a bottle, a table, a cup or a box.

First, we delineate the notation employed in this manuscript: (a) a capital letter (i.e., $S$, $O$) denotes a set, (b) a small letter (i.e., $s$, $a$, $o$) denotes an element of a set, (c) the upper script denotes the assignation, e.g., $S^a$ means that the set $S$ is assigned to $a$, (d) the lower script denotes the concrete element. An action $a_i$ is an elementary transformation of the human state. A potentially involved object $o^{a_i}$ belongs to the set of all objects which can be involved in that action, $o^{a_i} \in O^{a_i}$. The action is described by $a_i = a_i(s_{in}^{a_i}, s_{fin}^{a_i}, O^{a_i})$. If a specific object $o_p$ is involved in action $a_i$, we denote it as $a_i(o_p)$. Naturally, each action has its initial and final state, denoted by $(s_{in}^{a_i}, s_{fin}^{a_i})$, where $s_{in}^{a_i} \in S_{in}^{a_i}$ and $s_{fin}^{a_i} \in S_{fin}^{a_i}$; $S_{in}^{a_i}$ is the set of possible initial states and $S_{fin}^{a_i}$ is the set of possible final states. Introducing $S^{a_i} = S_{in}^{a_i} \cup S_{fin}^{a_i}$, the expression $a_i = a_i(s_{in}^{a_i}, s_{fin}^{a_i}, O^{a_i})$ can be rewritten as $a_i = a_i(S^{a_i}, O^{a_i})$.

When forecasting an action, we consider the scenario (observed scene). The scenario delivers the vocabulary: first, the objects are identified, making the elements of the vocabulary used in our database. Considering the set $O$ of observed objects, based on the expression $a_i = a_i(S^{a_i}, O^{a_i})$, all possible actions are indicated. Let $O = (o_p, o_w, o_z)$, where $o_p \in O^{a_i}$, $o_w \in O^{a_m}$, $o_w \in O^{a_j}$, $o_z \in O^{a_i}$; then the actions $a_i$, $a_j$ and $a_m$ will be indicated as possible.

3.2. Data Processing and Building the Structured Database (Explainable Model)

Referring to Fig. 2, the first step of the proposed method is the preprocessing of the recorded observations. The temporal segmentation and feature extraction are made in this step. The goal of the temporal segmentation of the video records is to partition the recorded data into "discretized" segments and to extract the relevant features from the obtained data segments. We extract three groups of features: (a) human position $h_p$, (b) object position $o_p$, and (c) attributes describing the human-object interaction: distance $d$, angle $\theta$, and edge $e$. The temporal distance $d$ and angle $\theta$ represent the distance and the angular variable between the human hand and the object of interest. The edge $e$ is the normalized distance obtained as the distance from

the camera to the human hand, normalized by the distance between the human hand and the object of interest. The description of the data processing phase is detailed in our previous work [4, 10]; here we summarize this step.

The mean value $\mu_i$ and variance $\sigma_i^2$ of the temporal attributes $(d, \theta, e)$ are obtained during the preprocessing (training) phase. For each object which can be manipulated, and for each action performed on it, the values of $\mu_d, \mu_\theta, \mu_e, \sigma_d^2, \sigma_\theta^2, \sigma_e^2$ are calculated. Next, the database is created; the base consists of action lists taking into account the possibly involved objects, as illustrated in Fig. 3a. As we can see, for each object the list of possible actions is given. The database also contains the parameters $(\mu_i, \sigma_i^2)$ obtained from the recorded data segments [10]. With each addition of a new action or object, the model must be additionally "trained" and the additional parameters $(\mu_i, \sigma_i^2)$ associated with the action (object) must be obtained. The quantities $(\mu_i, \sigma_i^2)$ are applied later as the parameters of the affordance (probability) functions used for predicting the human hand motion trajectory and the motion target location. This brings clear interpretability to the inference mechanism.

During the application (or testing) phase, once the objects are identified and recognized in the camera field of view, the database is accessed. Let us assume that the objects $o_z = z$ and $o_w = w$ are noticed (to shorten the notation, these abbreviations were introduced). The parameters of the probability functions assigned to those objects are accessed (see Fig. 3a). Thereafter, considering the object affordance functions for $o_w$ and $o_z$, the action probabilities are obtained. The action with the highest probability is selected as the prediction. The possible future trajectories to the goals of interest are visualized by Bézier curves.

Fig. 3b illustrates the replicated representation of the database in graphical form, reflecting the expressions $a_i = a_i(s_{in}^{a_i}, s_{fin}^{a_i}, O^{a_i})$ and $a_i = a_i(S^{a_i}, O^{a_i})$. The nodes represent the human states, illustrated by initial states ($s_{in}$) or final states ($s_{fin}$), and the edges illustrate the transformations (actions $a_{(.)}$). The graph is built out of sequences of consecutive actions $a_i$ and $a_{i+1}$; therefore, the final state $s_{fin}$ of a previous action naturally makes an initial state $s_{in}$ of a next action. The example graph is made for the actions depicted in Fig. 3a. In this example the possible sequences of actions w.r.t. the objects are: (a) $\{a_i(v), a_j(v)\}$ performed on an object $v$, as shown in Fig. 3a, (b) $\{a_b(z), a_c(z), a_d(z)\}$ performed on one object $z$, and (c) $\{a_n(w), a_k(w), a_d(w), a_p(w)\}$ performed on an object $w$, respectively. Expanding the graph means the proper update of the database, properly feeding it with the objects, actions and calculated quantities collected during the training phase [7].

3.3. Object Affordances Representation (Interpretability of the Inference Mechanism)

In this section, we discuss the inference mechanism of forecasting the human actions. The inputs are the depth information and the video data.

[Fig. 3 content: (a) the structured database lists, for each object (w, z, ...), its possible actions (a_n, a_k, a_p, a_d, a_b, a_c, ...) together with the learned parameters µ(d, θ, e) and σ²(d, θ, e) of the human-object attributes; (b) the graph structure has nodes s_in and s_fin (human states) connected by edges such as a_i(v), a_j(v), a_b(z), a_c(z), a_d(z), a_n(w), a_k(w), a_d(w), a_p(w) (actions on objects).]

Fig. 3. Structured database and its replicated representation in the graphical form [7]

Once the object is recognized and the features $(d, \theta, e)$ are obtained, the set of actions associated with this object is considered (Fig. 3). Then the probability (the value of the affordance function) is calculated for these actions. The affordance in our case results from the angular, distance and edge preference considering the final state of an action (which in our case means the human hand position at the end of the action). For the sake of simplicity, we can say that during the human hand motion, the object indicated as the most likely to be manipulated (this is associated with the action) is the one to which the current distance $d$, angle $\theta$ and edge $e$ are closest (to $\mu_d, \mu_\theta, \mu_e$). More precisely, the applied probability functions deliver the probability of reaching each of the objects of interest, providing for each of them a probability created on the basis of the current values of $d, \theta, e$ and the set of $\mu_d, \mu_\theta, \mu_e, \sigma_d^2, \sigma_\theta^2, \sigma_e^2$. For each possible action $a_i$, the probability is calculated using Eq. (1), and the action $a_k$ with the biggest probability is selected among all possible actions $a_i$ ($i = 1, ..., n$). This is the action selection:

$$p(a_k) = \max_{a_i\,\{i=1,\dots,n\}} \begin{cases} p_{a_i}(e)\cdot p_{a_i}(\theta) & \text{for } d > 20\,\text{cm} \\ p_{a_i}(d)\cdot p_{a_i}(\theta) & \text{for } d \leq 20\,\text{cm} \end{cases} \tag{1}$$

In our case, the threshold of 20 cm was selected heuristically, noticing that when the hand is farther than 20 cm from all the objects, any object can be targeted. Therefore, in this case the probability concerning the edge "preference" $p(e)$ (related to the easiness of motion) w.r.t. the action $a_i$ is used. For a distance not bigger than 20 cm the distance to the object is more relevant, therefore the probability considering the distance "preference" $p(d)$ is used instead of $p(e)$; $p(\theta)$ takes into account the angular position towards the object which represents the action $a_i$ (details are given in [7]). The forecasted trajectory is obtained using the parameterized cubic equation of the Bézier curve [7]. A detailed description of the above functions together with their validation is presented in our publications [7, 10].
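A minimal sketch of the selection rule of Eq. (1) and of the cubic Bézier trajectory evaluation follows. It assumes Gaussian probability ("preference") functions built from the stored (µ, σ²) pairs – our reading of the affordance functions detailed in [7] – and, apart from the 20 cm threshold quoted above, all values are illustrative.

```python
import math

def gaussian(x, mu, var):
    """Affordance 'preference' function: Gaussian density built from the
    stored mean and variance of an attribute."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def select_action(candidates, d, theta, e):
    """Eq. (1): choose the action with the highest affordance probability.
    `candidates` maps an action name to its stored mu/var dictionaries
    (same layout as the structured database sketched in Section 3.2)."""
    best_action, best_p = None, -1.0
    for name, stats in candidates.items():
        mu, var = stats["mu"], stats["var"]
        p_theta = gaussian(theta, mu["theta"], var["theta"])
        if d > 0.20:   # hand farther than 20 cm: edge 'preference' p(e)
            p = gaussian(e, mu["e"], var["e"]) * p_theta
        else:          # hand within 20 cm: distance 'preference' p(d)
            p = gaussian(d, mu["d"], var["d"]) * p_theta
        if p > best_p:
            best_action, best_p = name, p
    return best_action, best_p

def bezier_point(p0, p1, p2, p3, t):
    """Point on the parameterized cubic Bezier curve for t in [0, 1],
    used to visualize the forecasted hand trajectory."""
    s = 1.0 - t
    return tuple(s**3 * a + 3 * s**2 * t * b + 3 * s * t**2 * c + t**3 * d
                 for a, b, c, d in zip(p0, p1, p2, p3))
```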

4. Experimental Results

The proposed solution was evaluated in two ways: (a) a comparison with state-of-the-art baseline algorithms (model test), and (b) the quality of the prognosis depending on the amount of transparency of the decision system (explainability test). The method was implemented on an Intel Core i7 3.10 GHz machine with 16 GB of RAM, running a 64-bit Linux operating system. The C++ and Python programming languages (along with the TensorFlow and Keras packages) were used as the programming means.


[Fig. 4 content: ROC curves (true positive fraction vs. false positive fraction) for (a) the Artificial Neural Network, (c) the Deep Neural Network and (e) the proposed method; confusion matrices (predicted answer vs. correct answer) in (b), (d) and (f), with a color scale depicting the number of tested examples.]

Fig. 4. The ROC curves (a, c, e) together with confidence intervals at the 95% level. For better reference, the straight dotted lines indicate an equal fraction of false and correct responses. The error matrices (b, d, f) for our method and the other baseline algorithms




[Fig. 5 content: (a) failed scenario – frame 573, objects = [table, phone], detection phone: 0.84; predicted for the table: reaching = 0.87, placing = 0.19; for the phone: reaching = 0.0, placing = 0.0, talking = 0.0; (c) successful scenario – frame 392, objects = [monitor, computer], detections monitor: 0.82, computer: 0.89; predicted for the monitor: reaching = 0.91, pressing = 0.0; for the computer: reaching = 0.87, turn on = 0.0; (b), (d) the corresponding decision trees.]

Fig. 5. Experimental results of different scenarios: (a) a misclassified action, (c) a correctly classified action. The results of the explanation process for the inference mechanism, using decision trees, are depicted in (b) and (d), respectively


We created a publicly available dataset (named WUT-18) of the following daily activities: drinking water, activating a computer, talking on the phone, etc. These activities were performed by 6 participants in 3 different settings: (a) an office, (b) a home, (c) a kitchen. The participants had neither prior knowledge of the purpose of the study nor instructions on how to perform each activity. The data sets were collected with an RGB-D setup, at a rate of 60 fps. The camera range for human observations was fixed and covered the relevant space.

Evaluating the proposed method, we used the benchmark WUT-18 human activity dataset and compared our method with two baseline algorithms: (a) an Artificial Neural Network (ANN) based approach [13], and (b) a Deep Neural Network (DNN) based approach [17]. Based on trial and error, we selected the parameters needed in the methods used for comparison. The following parameters of the ANN were defined: two hidden layers with 784 nodes each, activation functions (ReLU, Softmax), optimizer (Adam), batch size (64), epochs (500), loss function (categorical crossentropy).

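For reference, a baseline of this shape can be written in a few lines of Keras (one of the packages listed in the implementation). The sketch below uses the hyper-parameters listed above; the input dimensionality and the number of action classes are placeholders assumed for illustration.

```python
import tensorflow as tf

NUM_FEATURES = 5   # assumed input dimensionality (placeholder)
NUM_ACTIONS = 8    # assumed number of action classes (placeholder)

# ANN baseline with the parameters listed above: two hidden layers of
# 784 ReLU nodes, Softmax output, Adam optimizer, categorical
# crossentropy loss; training would use batch size 64 and 500 epochs.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(784, activation="relu", input_shape=(NUM_FEATURES,)),
    tf.keras.layers.Dense(784, activation="relu"),
    tf.keras.layers.Dense(NUM_ACTIONS, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, batch_size=64, epochs=500)
```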
Similarly, in the DNN the chosen parameters were: four hidden layers with the following numbers of nodes (560, 560, 256, 120), activation functions (ReLU, Softmax), optimizer (Adam), batch size (64), epochs (500), loss function (categorical crossentropy). The description of the parameters used in the baseline algorithms is given in [2, 8]. All these algorithms were tested using the same observation dataset.

Evaluation results with respect to Receiver Operating Characteristic (ROC) curves and confusion matrices are shown in Fig. 4. The true positive fraction means the number of correct responses normalised over the number of all samples (the decision is yes and the true response is yes as well); the false positive fraction means the number of wrong responses normalised over the number of all samples (the decision is yes but should be no). As seen in Fig. 4a, 4c and 4e, the number of correct decisions is significantly greater than the number of wrong decisions.

Fig. 4b, 4d and 4f show the so-called confusion matrices. Each matrix contains the normalised numbers of predicted answers set against the correct answers, with the color scale visually depicting the number of samples for each result. The best accuracy was achieved for the drinking action and the worst accuracy for the placing action.

Fig. 5 shows the example images (left-hand side) together with a visualization of the prediction process (right-hand side). This figure justifies the applied explainable approach, which visualizes the forecasting process. Fig. 5 illustrates the prediction for both a correct prediction and a failed scenario. As can be seen, the inference mechanism is annotated, displaying the decision taken. All objects considered as possible for performing an action are given (list – objects); moreover, at each time instant the names of all possible actions are displayed (list – actions).
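As a rough illustration of how the fractions defined above can be computed, the snippet below derives them from lists of predicted and true labels. It is a schematic reading of the definitions given in the text, not the authors' evaluation code.

```python
def decision_fractions(y_true, y_pred, positive):
    """True/false positive fractions normalised over the number of all
    samples, following the definitions used for Fig. 4."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / n, fp / n

y_true = ["drink", "place", "drink", "reach"]
y_pred = ["drink", "drink", "drink", "reach"]
print(decision_fractions(y_true, y_pred, positive="drink"))  # (0.5, 0.25)
```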

5. Conclusion

Due to their inability to explain decisions and actions, non-transparent machine learning algorithms should not be directly used in critical applications such as assistive robots and service robots: wrong decisions of the system can result in harmful consequences. Explainability is needed when addressing such problems. To this end, an adversarial Explainable Artificial Intelligence (XAI) based method was proposed and discussed in this paper, with the emphasis laid on explainability in the training and application stages.

The paper concerns the very important and up-to-date problem of broadly perceived artificial intelligence, namely the so-called explainable AI, in which the reasoning process, the analyses and the actions undertaken are clearly visible and understandable for the human being. In this way the models and procedures, as well as the results obtained, are trustworthy and hence much easier to implement. This paper can be viewed as proposing a conceptual framework and its proof, but not a complete final implementation.

The applied method was tested using the benchmark dataset WUT-18. Fig. 5 delineates the use of the proposed probabilistic approach in conjunction with explainability and interpretability. It offers an enhancement of the transparency of the prediction system, which makes our solution more comprehensible to the end-user. From the series of conducted experiments it is also inferred that the proposed approach provides a significant improvement in terms of evaluation metrics when validated against pre-specified testing sets. Moreover, following the statistical significance tests depicted in Fig. 4, we came to the verdict that the suggested approach outperforms the state-of-the-art baseline machine learning classifiers. Briefly, the proposed framework can eliminate the challenge of providing transparency of the decision system and offers acceptable accuracy in forecasting human actions. The obtained results were quantified and the method was validated as satisfactory. For selecting the possible actions we considered probability functions based on normal distributions. We expect to broaden the scope of applications, focusing on the needs of a wide range of possible end-users.

AUTHORS

Vibekananda Dutta* – Institute of Micromechanics and Photonics, Faculty of Mechatronics, Warsaw University of Technology, ul. Św. Andrzeja Boboli 8, 02-525 Warsaw, Poland, e-mail: vibek@meil.pw.edu.pl, www: https://ztmir.meil.pw.edu.pl/web/Pracownicy/dr-Vibekananda-Dutta.

Teresa Zielińska – Institute of Aeronautics and Applied Mechanics, Faculty of Power and Aeronautical Engineering, Warsaw University of Technology, ul. Nowowiejska 24, 00-665 Warsaw, Poland, e-mail: teresaz@meil.pw.edu.pl, www: https://ztmir.meil.pw.edu.pl/web/Pracownicy/prof.-Teresa-Zielinska.

* Corresponding author

ACKNOWLEDGEMENTS

The research was supported by the funds of the Institute of Aeronautics and Applied Mechanics, Faculty of Power and Aeronautical Engineering, Warsaw University of Technology. The work on this manuscript and part of the research was also supported by the Preludium 11 grant (No. 2016/21/N/ST7/01614) funded by the National Science Center (NCN), Poland.

REFERENCES

[1] S. Anjomshoae, A. Najjar, D. Calvaresi, and K. Främling, "Explainable Agents and Robots: Results from a Systematic Literature Review". In: Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, Montreal QC, Canada, Richland, SC, 2019, 1078–1088.

[2] F. Chollet, Deep Learning mit Python und Keras: Das Praxis-Handbuch vom Entwickler der Keras-Bibliothek, MITP Verlags GmbH: Frechen, 2018.

[3] M. G. Core, H. C. Lane, M. van Lent, D. Gomboc, S. Solomon, and M. Rosenberg, "Building explainable artificial intelligence systems". In: Proceedings of the 21st National Conference on Artificial Intelligence and the 18th Innovative Applications of Artificial Intelligence Conference, AAAI-06/IAAI-06, 2006, 1766–1773.

[4] V. Dutta and T. Zielinska, "Action prediction based on physically grounded object affordances in human-object interactions". In: 2017 11th International Workshop on Robot Motion and Control (RoMoCo), 2017, 47–52, 10.1109/RoMoCo.2017.8003891.

[5] V. Dutta and T. Zielinska, "Action based activities prediction by considering human-object relation", Prace Naukowe Politechniki Warszawskiej. Elektronika, vol. 196, 2018.


[6] V. Dutta and T. Zielinska, "Activities Prediction Using Structured Data Base". In: 2019 12th International Workshop on Robot Motion and Control (RoMoCo), 2019, 80–85, 10.1109/RoMoCo.2019.8787354.

[7] V. Dutta and T. Zielinska, "Prognosing Human Activity Using Actions Forecast and Structured Database", IEEE Access, vol. 8, 2020, 6098–6116, 10.1109/ACCESS.2020.2963933.

[8] V. Dutta, M. Choraś, M. Pawlicki, and R. Kozik, "A Deep Learning Ensemble for Network Anomaly and Cyber-Attack Detection", Sensors, vol. 20, no. 16, 2020, 4583, 10.3390/s20164583.

[9] V. Dutta and T. Zielinska, "Predicting the Intention of Human Activities for Real-Time Human-Robot Interaction (HRI)". In: A. Agah, J.-J. Cabibihan, A. M. Howard, M. A. Salichs, and H. He, eds., Social Robotics, vol. 9979, 723–734, Springer, Cham, 2016, 10.1007/978-3-319-47437-3_71.

[10] V. Dutta and T. Zielinska, "Predicting Human Actions Taking into Account Object Affordances", Journal of Intelligent & Robotic Systems, vol. 93, no. 3-4, 2019, 745–761, 10.1007/s10846-018-0815-7.

[11] R. Goebel, A. Chander, K. Holzinger, F. Lecue, Z. Akata, S. Stumpf, P. Kieseberg, and A. Holzinger, "Explainable AI: The New 42?". In: A. Holzinger, P. Kieseberg, A. M. Tjoa, and E. Weippl, eds., Machine Learning and Knowledge Extraction, vol. 11015, 295–303, Springer, Cham, 2018.

[12] D. Gunning, "Explainable artificial intelligence (XAI)", Defense Advanced Research Projects Agency (DARPA), nd Web, vol. 2, 2017.

[13] M. T. Hagan, H. B. Demuth, and M. Beale, Neural Network Design, PWS Publishing Co.: USA, 1997.

[14] M. Harbers, J. Broekens, K. van den Bosch, and J.-J. Meyer, "Guidelines for Developing Explainable Cognitive Models". In: Proceedings of the 10th International Conference on Cognitive Modeling (ICCM 2010), 2010, 85–90.

[15] T. Lan, T.-C. Chen, and S. Savarese, "A Hierarchical Representation for Future Action Prediction". In: D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, eds., Computer Vision – ECCV 2014, vol. 8691, 689–704, Springer, Cham, 2014, 10.1007/978-3-319-10578-9_45.

[16] Z. C. Lipton, "The mythos of model interpretability", Queue, vol. 16, no. 3, 2018, 31–57.

[17] W. Liu, Z. Wang, X. Liu, N. Zeng, Y. Liu, and F. E. Alsaadi, "A survey of deep neural network architectures and their applications", Neurocomputing, vol. 234, 2017, 11–26, 10.1016/j.neucom.2016.12.038.

[18] T. Miller, "Explanation in artificial intelligence: Insights from the social sciences", Artificial Intelligence, vol. 267, 2019, 1–38, 10.1016/j.artint.2018.07.007.

[19] M. T. Ribeiro, S. Singh, and C. Guestrin, ""Why Should I Trust You?": Explaining the Predictions of Any Classifier". In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 2016, 1135–1144, 10.1145/2939672.2939778.



Features of Control Processes in Organizational-Technical (Technological) Systems of Continuous Type
Submitted: 28th February 2020; accepted: 17th August 2020

Igor Korobiichuk, Anatoliy Ladanyuk, Regina Boiko, Serhii Hrybkov
DOI: 10.14313/JAMRIS/4-2020/39

Abstract: Technological complexes of various industries are characterized by certain modes of operation (technological regulations), which correspond to sets of variables of different nature that have high dynamics of change and determine the main technical and economic performance of the object. The aim of the research is to identify information software approaches to support decision-making in organizational-technical (technological) systems. The research results were obtained through grouping, generalization and comparison methods. The scientific significance of the results lies in determining the objective need to use intelligent decision support subsystems for the operative management of complex organizational-technical systems based both on clear, formalized data and knowledge and on qualitative fuzzy estimates.

Keywords: sugar production, control object, technological monitoring, automated complexes, uncertainty of information, neuro-fuzzy networks

1. Introduction

Organizational-technical (technological) systems of continuous type are distinguished by the fact that, for the implementation of technological processes, they support significant flows of raw materials, energy and energy resources, which in turn generates significant amounts of information that must be processed promptly to obtain effective management actions during preset operating periods, which may be weeks, months, seasons, etc. [1]. Most food, processing, chemical and other industries operate in this mode. At the same time, an improvement of the information support significantly changes the modes of operation, increasing the resource and energy efficiency of the enterprise [2–4]. One of the trends in the development of the modern theory and practice of managing complex objects is the identification of organizational-technical (technological) systems (OTS) as a separate class, which includes both technological complexes and the enterprise as a whole [5–8]. In fact, the most effective approach to ensuring the operational management of OTS is the development and use of intelligent systems of various levels and purposes. In the structure of intelligent control systems (ICS) one can use the

most advanced methods of forming control actions at different levels of the hierarchy: at the lower (executive) level, standard automatic regulators operate; the most used is the PID regulator, which in recent years has been supplemented by fuzzy (logic) regulators that use, for example, production rules of the "IF ... THEN" type. The general structure of an ICS necessarily contains databases and knowledge bases, for which technological monitoring subsystems and facilities for the operative assessment of the object's state are used [9], while neural networks are used for the identification of the interconnections between the process variables.

Technological complexes of various industries are characterized by certain modes of operation (technological regulations), which correspond to sets of variables of different nature that have high dynamics of change and determine the main technical and economic performance of the object.

The aim of the research is to identify information software approaches to support decision-making in organizational-technical (technological) systems based on clear and formalized data and knowledge, as well as on qualitative fuzzy estimates.

2. Materials and Methods of Research

The authors investigated the works [5–9] devoted to information software approaches to support decision-making in organizational-technical (technological) systems. The Matlab tool environment with the Fuzzy Logic Toolbox extension (fuzzy logic package) was used, which includes ANFIS (Adaptive-Network-Based Fuzzy Inference System), a subsystem for the development of neuro-fuzzy structures.

Neuro-fuzzy networks use differentiable realizations of triangular norms (multiplication and probabilistic OR), as well as smooth membership functions. This makes it possible to apply fast neural network learning algorithms, based on the backpropagation (reverse error propagation) method, to configure neuro-fuzzy networks. The architecture and the rules for each layer of the ANFIS network are described below.

ANFIS implements Sugeno's fuzzy inference system as a five-layer feedforward (direct propagation) neural network. The purpose of the layers is as follows: the first layer contains the terms of the input variables; the second layer, the antecedents (premises) of the fuzzy rules; the third layer


is the normalization of the degrees of fulfilment of the rules; the fourth layer, the conclusions of the rules; the fifth layer, the aggregation of the result obtained by the different rules. The network inputs are not allocated to a separate layer. For the linguistic evaluation of the input variable $x_1$, 3 terms are used; for the variable $x_2$, 3 terms; for the variable $x_3$, 3 terms.

The following notation is required for the further presentation: $x_1, x_2, \dots, x_n$ – network inputs; $y$ – network output; $R_r$: IF $x_1 = a_{1,r}$ AND ... AND $x_n = a_{n,r}$ THEN $y = b_{0,r} + b_{1,r} x_1 + \dots + b_{n,r} x_n$ – the fuzzy rule with ordinal number $r$; $m$ – the number of rules, $r = \overline{1, m}$; $a_{i,r}$ – a fuzzy term with membership function $\mu_r(x_i)$, used for the linguistic evaluation of the variable $x_i$ in the $r$-th rule ($r = \overline{1, m}$, $i = \overline{1, n}$); $b_{q,r}$ – real numbers in the conclusion of the $r$-th rule ($r = \overline{1, m}$, $q = \overline{0, n}$).

The ANFIS network works as follows.

Layer 1. Each node of the first layer represents one term with a bell-shaped membership function. The network inputs $x_1, x_2, \dots, x_n$ are connected only to their terms. The number of nodes in the first layer is equal to the sum of the cardinalities of the term sets of the input variables. The output of a node is the degree of membership of the value of the input variable in the corresponding fuzzy term (1):

$$\mu_r(x_i) = \frac{1}{1 + \left| \dfrac{x_i - c}{a} \right|^{2b}} \tag{1}$$

where a, b and c are the tuning parameters of the membership function.

Layer 2. The number of nodes in the second layer is equal to m. Each node in this layer corresponds to one fuzzy rule and is connected to those nodes of the first layer that form the antecedent of the corresponding rule; therefore, each node of the second layer can receive from 1 to n input signals. The output of the node is the degree of fulfilment of the rule, calculated as the product of the input signals. Denote the outputs of the nodes of this layer by $\tau_r$, $r = \overline{1, m}$.

Layer 3. The number of nodes in the third layer is also equal to m. Each node in this layer calculates the relative degree of fulfilment of the fuzzy rule (2):

$$\tau_r^{*} = \frac{\tau_r}{\sum_{j=1}^{m} \tau_j} \tag{2}$$

Layer 4. The number of nodes of the fourth layer is also equal to m. Each node is connected to one node of the third layer. The fourth-layer node calculates the contribution of one fuzzy rule to the network output (3):

$$y_r = \tau_r^{*} \cdot (b_{0,r} + b_{1,r} x_1 + \dots + b_{n,r} x_n) \tag{3}$$

Layer 5. A single node in this layer summarizes the contributions of all the rules (4):

$$y = y_1 + \dots + y_r + \dots + y_m \tag{4}$$


Typical neural network training procedures can be used to set up an ANFIS network, because it uses only differentiable functions. Typically, a combination of gradient descent, in the form of the backpropagation algorithm, and the method of least squares is used. The backpropagation algorithm adjusts the antecedents of the rules, that is, the membership functions. The method of least squares estimates the coefficients of the rule conclusions, because they are linearly related to the output of the network. Each iteration of the tuning procedure is performed in two steps. In the first step, a training sample is fed to the inputs, and the optimal parameters of the nodes of the fourth layer are determined from the deviation between the desired and the actual network behaviour by the iterative method of least squares. In the second step, the residual deviation is propagated from the output of the network back to the inputs, and the parameters of the first-layer nodes are modified by the backpropagation method; at the same time, the coefficients of the rule conclusions found in the first step do not change. The iterative tuning procedure continues until the residual error falls below the preset value.
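A minimal sketch of the five-layer forward pass of Eqs. (1)–(4) follows (written in Python rather than Matlab, for compactness). The rule set and parameter values are illustrative assumptions, and the hybrid training step described above is omitted.

```python
import numpy as np

def bell_mf(x, a, b, c):
    """Layer 1, Eq. (1): generalized bell-shaped membership function."""
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))

def anfis_forward(x, premises, conclusions):
    """Sugeno-type ANFIS forward pass for one input vector x.
    premises[r][i] = (a, b, c) of the term of input i in rule r;
    conclusions[r] = (b0, b1, ..., bn) of rule r."""
    # Layer 2: rule firing strengths as products of membership degrees
    tau = np.array([np.prod([bell_mf(x[i], *prem[i]) for i in range(len(x))])
                    for prem in premises])
    tau_star = tau / tau.sum()                     # Layer 3, Eq. (2)
    y_r = [w * (c[0] + np.dot(c[1:], x))           # Layer 4, Eq. (3)
           for w, c in zip(tau_star, conclusions)]
    return sum(y_r)                                # Layer 5, Eq. (4)

# Two illustrative rules over two inputs
premises = [[(1.0, 2.0, 0.0), (1.0, 2.0, 0.0)],
            [(1.0, 2.0, 1.0), (1.0, 2.0, 1.0)]]
conclusions = [(0.1, 0.5, -0.2), (0.3, 0.2, 0.4)]
print(anfis_forward(np.array([0.4, 0.7]), premises, conclusions))
```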

3. The Results of Research

Complex technological objects, which are part of an OTS, always function in conditions of intense perturbations and changing characteristics of the external environment. This leads to the need to take into account significant uncertainties; neglecting them produces inadequate decisions and management actions that do not correspond to the production situation and lead to significant risks. Considering the information provision of management processes, it is advisable to take as a basis decision support subsystems, in particular in intelligent control systems, where management actions are taken or controlled by the person who makes the decision (PMD). In this regard, uncertainty is interpreted as incomplete, vague, obscure or ambiguous information about the object and its state in different operating modes. At the same time, the preparation of management actions is clearly related to risk as a certain possibility of obtaining the planned result; risk and uncertainty have the same essence and are measured in the same units, for example in percent. That is, uncertainty can become a risk during the implementation of management actions, and the implementation of a risky decision can lead to uncertainty. Thus, uncertainty as a phenomenon means fuzzy and blurred, contradictory descriptions of the object and production situations, and insufficient or mutually exclusive information. When managing complex objects, it is necessary to take into account force majeure events that arise independently of human will and consciousness and change the course of the control processes. It should also be taken into account that the actions of the PMD can, in their turn, lead to uncertainty



– incompetence or an unjustified decision made without consideration of the problem situation and under lack of time. In the management process, uncertainty at any level leads to risk, and the greater the uncertainty, the higher the risks.

The technical literature gives enough space to the classification of variants and types of uncertainty of information, although in many ways it depends on the object of management and the peculiarities of its operation [10–16]. First of all, it is advisable to distinguish two main classes – structural and parametric uncertainty – and each of them, for the tasks of controlling and monitoring the state of the object, is complemented by many indicators, for example:
• low accuracy of operational information about the state of the object (low accuracy of the technical means for receiving and processing data, failure of communication channels, transmission latency between levels of management, etc.);
• inaccuracy of object models and of the description of information arrays, in particular systematic errors: incorrectly carried out decomposition (allocation of subsystems) relating both to the object and to the task of management, linearization of models and their sampling;
• vagueness of information due to the difficulties of formalizing a number of indicators, the use of linguistic variables and "soft" calculations;
• the presence of a large number of management (optimization) criteria and the need to take into account a variety of constraints of different nature;
• decision-making in multilevel hierarchical systems of an iterative nature, with the need to solve numerous auxiliary tasks of coordinating decisions between levels.

Finally, with the current development of information technologies, uncertainty is separately identified as an information resource (knowledge), which in developed countries accounts for up to 30% of gross domestic product (GDP) due to the creation of value added. To characterize information resources, the term "content" is used as the content of a particular resource, considered outside the forms of its material representation – audio and video texts, individual images, verbal texts, symbols, etc. In a modern approach, the content may be unstructured (Web content) or structured, for example, at the enterprise – tables, databases and knowledge bases, data warehouses; content displayed in a certain form is an information resource as a collection of data, information and knowledge that accumulates in the process of human activity and the operation of a particular object.

In the information-theoretic view (C. Shannon, 1948), the amount of information is considered as a measure of certainty or uncertainty of the state of the system,


which is used directly in the management process. Defining this uncertainty as entropy, it is assumed that obtaining information (its increase) reduces entropy. A measured uncertainty is, in fact, a risk, and is so different from an unmeasured one that it is not really an uncertainty at all [13–16].

When managing technological complexes of the OTS class, the representation of data and knowledge in the form of fuzzy sets with different membership functions becomes of particular importance. At the same time, the level of uncertainty is unambiguously connected with the value of the membership function, whose decrease characterizes the deterioration of the quality of management decisions based on fuzzy inference with the operations of union and intersection of fuzzy sets. The most important are the fuzzy rules "IF ... THEN ...", which form the control of poorly (vaguely) defined objects. The comparison of a fuzzy set A with some base set A* whose level of fuzziness is zero is used. To bring the comparison result into the range from 0 to 1, the formula [14–18] is used (5):

$$\rho(A, A^{*}) = \frac{2}{n}\, d(A, A^{*}) \tag{5}$$

where ρ(A, A*) is the uncertainty level of the fuzzy set A in comparison with the fuzzy set A*, and d(A, A*) is the linear Hamming distance between the sets A and A* (6):

$$d(A, A^{*}) = \sum_{i=1}^{n} \left| \mu_A(x_i) - \mu_{A^{*}}(x_i) \right| \tag{6}$$

where µ_A(x_i), µ_{A*}(x_i) are the membership functions of the sets A and A*, and n is the number of compared elements in the sets. As can be seen from Fig. 1 – the result of the union of fuzzy sets – the uncertainty of the output (resulting) set increases [15–16].
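As a small sketch of the fuzziness measure of Eqs. (5)–(6): here the reference set A* is taken as the nearest crisp set (memberships rounded to 0 or 1), a common choice we assume for illustration.

```python
def fuzziness_index(mu):
    """Eqs. (5)-(6): linear index of fuzziness of a fuzzy set, given its
    membership values, measured against the nearest crisp set A*."""
    mu_star = [round(m) for m in mu]                    # nearest crisp set A*
    d = sum(abs(m - ms) for m, ms in zip(mu, mu_star))  # Hamming distance, Eq. (6)
    return 2.0 * d / len(mu)                            # Eq. (5), result in [0, 1]

print(fuzziness_index([0.1, 0.6, 0.9, 0.4]))  # 0.5
```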

[Fig. 1a plot: the initial membership functions µ(x) – curves 1, 2 and 3 – over x (ticks at 0, 5 and 9), with µ levels 0.5 and 1.0 marked.]



[Fig. 1b plot: the membership function resulting from the union – built from curves 3, 4 and 5 – over x from 0 to 9; the term labels TM, TC and TB belonging to Fig. 3 also appear in this page region.]

Fig. 1. Functions of membership: a – initial, b – result of the union

[Fig. 2 plot: triangular membership functions of the terms D2, D1, Z, J1 and J2 over the normalized range from 0 to 1.]

Fig. 2. Term set of the linguistic variable "cost change"


Technological complexes of various industries are characterized by certain operating regimes (technological regulations), which correspond to sets of variables of different nature – temperature, level, pressure, etc.; in many cases material flows are involved (raw materials, intermediate liquid flows, steam, etc.) which have high dynamics of change and determine the main technical and economic performance of the object. An approach based on fuzzy logic is increasingly used to evaluate and manage the material flows. For example, Fig. 2 and Fig. 3 show the membership functions of the linguistic variables "cost change" and "cost". In Fig. 2 the marks are: D2 – "significantly reduce", D1 – "reduce somewhat", Z – "do not change", J1 – "increase somewhat", J2 – "significantly increase". In Fig. 3 the marks are: TM – "small", TC – "medium", TB – "big". In Fig. 2 and Fig. 3 the membership functions have a triangular shape; in specific cases they could have another form, for example, exponential.

One of the most powerful sources of information for decision-making tasks is time rows (time series) [19–20], as sets of consistent observational results.

[Fig. 3 plot: triangular membership functions of the terms TM, TC and TB.]

Fig. 3. Term set of the linguistic variable "cost"

Time rows, as a rule, arise as a result of measuring a certain indicator, for example, technological variables. In a time row, for each reading, the measurement time or the measurement number must be specified in order. Time rows significantly differ from a simple sample in that it is necessary to take into account the connection of the measurements with time. Time rows consist of two elements: the times for which the values are given, and the actual numerical values of the analyzed indicator (the row levels). Time rows allow one to objectively evaluate the state of an object, analyze its behavior, perform forecasting (extrapolation), determine the prospects of its development, identify characteristic manifestations of the behavior of the object (patterns), and form a base of precedents. The solution of the above tasks will, of course, sharply increase the efficiency of decision-making [21].

A time row subjected to filtering has two components: a useful signal and noise, whose parameters are unknown and whose intensity depends situationally on various factors [22]. An effective approach for filtering time rows with unstable noise characteristics is the use of wavelet analysis [19–24].

The problem of identifying production processes based on a neuro-fuzzy network (NFN) is relevant, given the conceptual complexity of the interconnections between the input and output variables. In this case, we consider the technological complex of sugar production, the parametric scheme of which is shown in Fig. 4.

Fig. 4. Parametric scheme of the main flows of the technological complex of sugar production (inputs: productivity, dispense of diffusion juice, costs of Ca(OH)2 milk, costs of washing water; outputs: sugar output, costs of sugar in the process)



We propose to handle this problem in two steps. For a correct representation of the data, and based on the peculiarities of implementing the Sugeno fuzzy inference algorithm, two neuro-fuzzy networks are constructed. The problem is solved using the MatLab toolkit; specifically, it is proposed to use the internal Matlab subsystem for the development of neuro-fuzzy structures, ANFIS. ANFIS is the abbreviation of Adaptive-Network-Based Fuzzy Inference System – an adaptive network for fuzzy inference. ANFIS is one of the first variants of hybrid neuro-fuzzy networks – a special type of feed-forward neural network. The architecture of a neuro-fuzzy network is isomorphic to a fuzzy knowledge base. In neuro-fuzzy networks, differentiable realizations of triangular norms are used (multiplication and probabilistic OR), as well as smooth membership functions. This allows fast neural network training algorithms based on back-propagation of the error to be used to set up neuro-fuzzy networks. The following describes the architecture and the role of each layer of the ANFIS network. ANFIS implements a Sugeno fuzzy inference system in the form of a five-layer feed-forward neural network. The layers are assigned as follows:
• the first layer – the terms of the input variables;
• the second layer – the antecedents (premises) of the fuzzy rules;
• the third layer – normalization of the degrees of execution of the rules;
• the fourth layer – the conclusions of the rules;
• the fifth layer – aggregation of the results obtained by the different rules.
To explain the content of the response surface of the knowledge base, we introduce the following designations on the graphs: Input 1 – the productivity of the plant; Input 2 – the dispense of diffusion juice; Input 3 – the consumption of lime milk; Input 4 – the flow of flushing water; Output – the sugar output. The second stage of the solution is to identify the main cause-effect relationships, which allow analyzing the impact of factors such as productivity, discharge of diffusion juice, cost of lime milk and cost of washing water on the loss of sugar in the production process (Fig. 5).

Fig. 5. The structure of the neuro-fuzzy network used to identify the causal relationships for assessing the loss of sugar in the production process
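A sketch of how such a network can be assembled with the ANFIS subsystem mentioned above; the file name, the number of membership functions per input and the epoch count are assumptions:

    % Columns 1-4: productivity, dispense of diffusion juice, costs of
    % Ca(OH)2 milk, costs of washing water; column 5: sugar output
    data = readmatrix('sugar_flows.csv');       % hypothetical training set
    initFis = genfis1(data, 3, 'gbellmf');      % 3 bell-shaped MFs per input
    opt = anfisOptions('InitialFIS', initFis, 'EpochNumber', 50);
    fis = anfis(data, opt);                     % hybrid training of the Sugeno FIS
    sugarHat = evalfis(fis, data(:, 1:4));      % network estimate of the output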


The information for the time series is obtained from real enterprises for the same time intervals, namely for the last 2 years. Time series from six months to two years long were used in constructing and testing the neural networks. The above approach simplifies the work of experts in identifying the main relationships between the input and output variables of any technological complex, and generates information material for the person who makes decisions (PMD). Solving the identification problem on the basis of the neuro-fuzzy approach has shown that causal relationships between the input and output variables of the sugar production processes can be established in the form of fuzzy rules [25]. An effective modern approach to the creation of information management systems is a combination of methods accounting for both uncertainties and risks [22]. Of particular importance is the formation of management (managerial) decisions in large systems, where the choice among implementation options is evaluated by economic efficiency within the financial capabilities of the enterprise. The formalization of the choice of decision options is based on the introduction of an evaluation target function: to each alternative xi the PMD attributes a probabilistic result P(xi) that characterizes all the consequences of this decision. A number of approaches exist in the technical literature, implemented with various criteria, for example:
• maximum mathematical expectation (hope) of gain, when the PMD can actually rely on an estimate of the probability distribution of the state of the object and the environment;
• minimal dispersion, which reduces the risk of obtaining a slight gain when the mathematical expectation of the range of its change is significant;
• the marginal level of gain within the existing interval;
• the most probable result, when it is possible to obtain all the necessary information for decision-making;
• minimum average risk, when each of its values corresponds to a state of the environment.
In order to use the statistical approach, a significant representative sample of observations or expert knowledge must be available for forming control actions under changing characteristics of the object and the environment. Modeling and testing were performed using Matlab and the Fuzzy Logic Toolbox extension (fuzzy logic package), but this environment cannot be included in real OTS control systems. The authors therefore created software modules in the C# language in which the proposed methods and the ANFIS neuro-fuzzy structures are fully implemented. The created software modules can be included both in existing information systems and in those being created; because they are written in C#, integration goes smoothly.
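The first two criteria above can be made concrete with a short numerical sketch; the payoff matrix and the state probabilities below are invented for illustration:

    % Rows: alternatives x_i; columns: states of the object and environment
    V = [10  4 -2;
          6  5  3;
          8  2  1];                        % hypothetical gains
    P = [0.5 0.3 0.2]';                    % state probabilities assumed by the PMD
    expGain = V * P;                       % mathematical expectation of gain
    disp2   = ((V - expGain).^2) * P;      % dispersion (risk) of each alternative
    [~, iMaxGain] = max(expGain);          % maximum-expectation criterion
    [~, iMinRisk] = min(disp2);            % minimum-dispersion criterion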


4. Conclusion

The scientific significance of the results consists in establishing the objective need to use intelligent decision-support subsystems for the operational management of complex OTS, based both on clear, formalized data and knowledge and on qualitative fuzzy estimates. The uncertainty of information is highlighted, the monitored technological values are characterized, and the possibility of using neural networks to assess the status of control objects is shown. For the efficient management of OTS of different nature and purpose, the priority task is to create the necessary information support, for the assessment of which the information resource (content) is used independently of the forms of its material representation. It is shown that the formation of management actions in the OTS is unambiguously connected with the uncertainty of the information and the risks it causes; the types of uncertainty of information are given. Examples are given of the use of membership functions for the estimation of fuzzy knowledge, and of an adaptive neuro-fuzzy network for identifying the state of an object. For the conditions of long-term functioning of technological complexes in the OTS, a number of criteria for statistical analysis have been selected for the assessment of technical and economic indicators.

AUTHORS Igor Korobiichuk* – ŁUKASIEWICZ Research Network – Industrial Research Institute for Automation and Measurements PIAP, Jerozolimskie 202, 02-486 Warsaw, Poland, e-mail: igor.korobiichuk@piap.lukasiewicz.gov.pl Anatoliy Ladanyuk – National University of Food Technologies, 68 Volodymyrska Street, 01033, Kyiv, Ukraine, e-mail: ladanyuk@ukr.net. Regina Boiko – National University of Food Technologies, 68 Volodymyrska Street, 01033, Kyiv, Ukraine, e-mail: rela@ukr.net. Serhii Hrybkov – National University of Food Technologies, 68 Volodymyrska Street, 01033, Kyiv, Ukraine, e-mail: sergio_nuft@nuft.edu.ua. *Corresponding author

References

[1] I. Korobiichuk, A. Ladanyuk, L. Vlasenko and N. Zaiets, “Modern Development Technologies and Investigation of Food Production Technological Complex Automated Systems”. In: Proceedings of the 2nd International Conference on Mechatronics Systems and Control Engineering – ICMSCE 2018, 2018, 52–56, DOI: 10.1145/3185066.3185075.
[2] I. Korobiichuk, N. Lutskaya, A. Ladanyuk, S. Naku, M. Kachniarz, M. Nowicki and R. Szewczyk, “Synthesis of Optimal Robust Regulator for Food Processing Facilities”. In: R. Szewczyk, C. Zieliński and M. Kaliczyńska (eds.), Automation 2017, vol. 550, 2017, 58–66, DOI: 10.1007/978-3-319-54042-9_5.
[3] V. Tregub, I. Korobiichuk, O. Klymenko, A. Byrchenko and K. Rzeplińska-Rykała, “Neural Network Control Systems for Objects of Periodic Action with Non-linear Time Programs”. In: R. Szewczyk, C. Zieliński and M. Kaliczyńska (eds.), Automation 2019, vol. 920, 2020, 155–164, DOI: 10.1007/978-3-030-13273-6_16.
[4] V. Sidletskyi, I. Korobiichuk, A. Ladaniuk, I. Elperin and K. Rzeplińska-Rykała, “Development of the Structure of an Automated Control System Using Tensor Techniques for a Diffusion Station”. In: R. Szewczyk, C. Zieliński and M. Kaliczyńska (eds.), Automation 2019, vol. 920, 2020, 175–185, DOI: 10.1007/978-3-030-13273-6_18.
[5] A. P. Ladanyuk, N. M. Lutska, V. D. Kishenko, L. O. Vlasenko and V. V. Ivaschuk, Methods of conventional theory of management, Lira: Kyiv, 2018 (in Ukrainian).
[6] J. Stahre, L. Mårtensson (eds.), Proceedings of 8th IFAC Symposium on Automated Systems Based on Human Skill and Knowledge, Elsevier, 2003.
[7] M. P. Kazmierkowski, “Integration Technologies for Industrial Automated Systems”, IEEE Industrial Electronics Magazine, vol. 1, no. 1, 2007, 51–52, DOI: 10.1109/MIE.2007.357179.
[8] A. Colombo, T. Bangemann, S. Karnouskos, J. Delsing, P. Stluka, R. Harrison, F. Jammes and J. L. Lastra (eds.), Industrial Cloud-Based Cyber-Physical Systems: The IMC-AESOP Approach, Springer International Publishing, 2014, DOI: 10.1007/978-3-319-05624-1.
[9] W. Q. Yan, Introduction to Intelligent Surveillance: Surveillance Data Capture, Transmission, and Analytics, Springer International Publishing, 2019, DOI: 10.1007/978-3-030-10713-0.
[10] T. O. Prokopenko and A. P. Ladanyuk, Information technology management organizational and technological systems, Vertikal: Cherkasi, 2015 (in Ukrainian).
[11] J. Yu, Y. Li, M. Chen, B. Zhang and W. Xu, “Decision-theoretic rough set in lattice-valued decision information system”, Journal of Intelligent & Fuzzy Systems, vol. 36, no. 4, 2019, 3289–3301, DOI: 10.3233/JIFS-172111.



[12] S. Hrybkov and H. Oliinyk, “Modeling of the decision support system structure in the planning and controlling of contracts implementation”, Ukrainian Journal of Food Science, vol. 3, no. 1, 2015, 123–130.
[13] T. Bakshi, B. Sarkar and S. K. Sanyal, “An Evolutionary Algorithm for Multi-criteria Resource Constrained Project Scheduling Problem based On PSO”, Procedia Technology, vol. 6, 2012, 231–238, DOI: 10.1016/j.protcy.2012.10.028.
[14] S. Hrybkov, H. Oliinyk and V. Litvinov, “Web-oriented decision support system for planning agreements execution”, Eastern-European Journal of Enterprise Technologies, vol. 3, no. 2 (93), 2018, 13–24, DOI: 10.15587/1729-4061.2018.132604.
[15] N. M. Lutska and A. P. Ladanyuk, Optimal and robust control systems for technological objects, Lira: Kyiv, 2015 (in Ukrainian).
[16] F. Jabari, S. Nojavan, B. Mohammadi-Ivatloo, H. Ghaebi and M.-B. Bannae-Sharifian, “Robust Unit Commitment Using Information Gap Decision Theory”. In: B. Mohammadi-ivatloo and M. Nazari-Heris (eds.), Robust Optimal Planning and Operation of Electrical Energy Systems, 2019, 79–93, DOI: 10.1007/978-3-030-04296-7_5.
[17] H.-J. Zimmermann, “Fuzzy set theory”, WIREs Computational Statistics, vol. 2, no. 3, 2010, 317–332, DOI: 10.1002/wics.82.
[18] W. Pedrycz and P. Rai, “Collaborative clustering with the use of Fuzzy C-Means and its quantification”, Fuzzy Sets and Systems, vol. 159, no. 18, 2008, 2399–2427, DOI: 10.1016/j.fss.2007.12.030.


[19] E. Lughofer, A.-C. Zavoianu, M. Pratama and T. Radauer, “Automated Process Optimization in Manufacturing Systems Based on Static and Dynamic Prediction Models”. In: E. Lughofer and M. Sayed-Mouchaweh (eds.), Predictive Maintenance in Dynamic Systems, 2019, 485–531, DOI: 10.1007/978-3-030-05645-2_17.
[20] J. Beran, Mathematical Foundations of Time Series Analysis: A Concise Introduction, Springer International Publishing, 2017, DOI: 10.1007/978-3-319-74380-6.
[21] N. T. Son and D. T. Anh, “Discovering Time Series Motifs Based on Multidimensional Index and Early Abandoning”. In: N.-T. Nguyen, K. Hoang and P. Jȩdrzejowicz (eds.), Computational Collective Intelligence. Technologies and Applications, 2012, 72–82, DOI: 10.1007/978-3-642-34630-9_8.
[22] A. Tapinos and P. Mendes, “A Method for Comparing Multivariate Time Series with Different Dimensions”, PLOS ONE, vol. 8, no. 2, 2013, DOI: 10.1371/journal.pone.0054201.
[23] L. Wei and E. Keogh, “Semi-Supervised Time Series Classification”. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006, 748–753, DOI: 10.1145/1150402.1150498.
[24] S. Hira and P. S. Deshpande, “Data Analysis using Multidimensional Modeling, Statistical Analysis and Data Mining on Agriculture Parameters”, Procedia Computer Science, vol. 54, 2015, 431–439, DOI: 10.1016/j.procs.2015.06.050.
[25] J. Burgess, Wavelets: Principles, Analysis and Applications, Nova Science Publishers, Incorporated, 2018.


Using Brain-Computer Interface Technology for Modeling 3D Objects in Blender Software

Submitted: 6th February 2020; accepted: 17th August 2020

Mateusz Zając, Szczepan Paszkiel

DOI: 10.14313/JAMRIS/4-2020/40

Abstract: Since its beginning, computer 3D modeling has primarily relied on the Windows, Icons, Menus, Pointer (WIMP) interface, in which user input takes the form of pointer movements and keystrokes. The brain-computer interface (BCI) is a technology which allows users to act on a computer by using their brain signals. The purpose of this paper is to briefly, yet illustratively, present the application of the EMOTIV EPOC+ Neuroheadset in Blender software for executing specific Blender functions for the editing of 3D objects.

Keywords: BCI, EMOTIV EPOC+ Neuroheadset, Blender, 3D modeling

1. Introduction


3D modeling is a process of creating and modifying three-dimensional objects using a specialized computer program that provides the user with a set of necessary tools. 3D modeling usually starts with basic shapes (primitives) such as cubes, spheres, tori and others. These shapes are then modified by different functions provided in the software. The user usually activates these functions by pressing a combination of keys on the keyboard or by selecting them from the user interface. Nowadays, there are many powerful 3D modeling packages which allow creating 3D assets, animations and special effects and rendering images. The most popular paid applications are Autodesk Maya, Autodesk 3ds Max and Cinema 4D. There are also many free applications available, the most popular of which is called Blender. Blender is a free, open-source 3D computer graphics software toolset. It is written in the C, C++ and Python programming languages. The Blender Foundation is a non-profit organization responsible for Blender development. Blender is also developed by the community, which creates additional plugins written in Python (called add-ons). Add-ons add new or improved functionality to Blender. Blender has recently gained substantial financial support from Epic Games, Nvidia and Intel through the creation of the Blender Development Fund. This allowed the Blender Foundation to recruit new team members and, as a result, to develop Blender faster.

The main features included in Blender are 3D modeling, texturing, animating and rendering. It mainly relies on the Windows, Icons, Menus, Pointer (WIMP) interface, which forces users to memorize multiple hotkeys or use graphic icons to complete an action. Because of that, the user’s capabilities and performance in the modeling process are limited. User interaction with a computer should include aspects of human–human interaction (HHI) in the interface [1]. HHI involves the simultaneous application and perception of behavioral signals such as gesturing, speaking or moving [2]. There have already been many interfaces that took this approach in order to combine speech with gesture, speech with mouse or speech with touch input [3–5]. A very promising technology is developing nowadays: the Brain-Computer Interface (BCI). Because of BCIs’ great possibilities, using a connection between the human brain and a computer might have a great impact on the way people communicate with a computer [1]. It can find application in 3D modeling but also in many other areas. BCI is a field of research developed to collect human brainwaves and relate patterns in brain signals to the users’ thoughts and intentions via electroencephalography (EEG). BCI is also known as Brain Machine Interface (BMI) or sometimes as Direct Neural Interface [6]. BCI appears in many areas, including the control of various software applications (video games, web browsers or typewriters) and modern smart home or environment control, but it mainly targets disabled people (wheelchairs, neural prosthetics, robotic arms) [7–11]. A big advantage of BCI is its non-invasive nature [12]. BCI uses electroencephalography, which is a way to record and analyse human brainwaves. Brain waves are divided into groups such as Gamma, Beta, Alpha, Mu, Theta and Delta. As shown in Table 1, each of the brain waves has a different frequency and function [13].

Tab. 1. Brainwave rhythms and their functions

Rhythm (frequency) – Function
Gamma (>30 Hz) – Awake with large brain activity
Beta (12–28 Hz) – Awake with mental activity
SMR (13–15 Hz) – Awake with physical activity
Alpha (8–12 Hz) – Awake and resting
Mu (7–11 Hz) – Awake and resting
Theta (4–7 Hz) – Sleeping
Delta (1–3 Hz) – Deep sleep
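A sketch of how the rhythms in Tab. 1 can be separated numerically from one recorded channel (MatLab Signal Processing Toolbox); the surrogate signal stands in for real data, and 128 Hz is the EPOC+ sampling rate quoted in Section 3:

    fs = 128;                               % EPOC+ sampling rate (Section 3)
    x  = randn(30*fs, 1);                   % surrogate 30 s single-channel EEG
    pTheta = bandpower(x, fs, [4 7]);       % Theta: sleeping
    pAlpha = bandpower(x, fs, [8 12]);      % Alpha: awake and resting
    pBeta  = bandpower(x, fs, [12 28]);     % Beta: awake with mental activity
    [~, k] = max([pTheta pAlpha pBeta]);    % crude dominant-rhythm index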

EEG is not the only input that can be used to communicate with a computer. Electromyography (EMG) is an electrodiagnostic medicine technique for evaluating and recording the electrical activity produced by skeletal muscles. Many low-cost, non-invasive EEG devices are available now. The research was performed using the EPOC+ Neuroheadset developed by EMOTIV (according to [14], the best commercial neuroheadset in terms of usability). The use of the EMOTIV EPOC+ Neuroheadset involves using EEG signals simultaneously. Every user has an individual thought pattern when performing the same task. All thoughts can be assigned to specific Blender functions. The user-specific signals are converted into the intended operations performed on the edited object. Those operations include moving, scaling, rotating or extruding objects. This paper describes the research process, followed by the software and overall system used in the study. It presents the measurement results, followed by the implementation concept, and summarizes the tests and conclusions at the end.

2. Research Methodology

In the study, electrical signals based on the user’s brain activity (electroencephalography, EEG) and on facial muscles were used for interfacing with the EPOC Control Panel suites, which were later used to work with Blender. First of all, the EMOTIV EPOC+ Neuroheadset had to be prepared for work. The battery had to be charged, which took about 20 minutes. In order to acquire an accurate signal, the sensors had to be hydrated using a saline solution. After that, the sensors were installed in the headset, and the headset was fitted on the head. The EMOTIV EPOC+ Neuroheadset uses a Bluetooth connection to communicate with a computer; the USB Bluetooth receiver was plugged into a USB port. Secondly, the EPOC Control Panel application was opened to check the contact quality of each sensor. The sensor electrode figures correspond to the quality of contact with the skin: green, yellow, orange, red and black indicate high to low quality (Fig. 1). Figure 1 also shows the intermediate states, showing the lack of correlation between an electrode and the scalp at the calibration stage.

Fig. 1. Sensors distribution on the human head

The third step was to train some actions to manipulate the virtual 3D cube, starting with a neutral state in which the user had to relax and clear his thoughts. Then the first action was selected (how the cube should behave) and training began. During training, the user tried to imagine certain things (for example, moving the cube to the left). After training, an attempt at moving the cube to the left was made (Fig. 2). The goal was to achieve all four levels in order to manipulate the cube in four possible ways. During the whole research session, all brain waves were monitored via EMOTIV Xavier TestBench (Fig. 3).

Fig. 2. Interface of EPOC Control Panel application

Fig. 3. Interface of EMOTIV Xavier TestBench application


3. Measuring Device

The EMOTIV EPOC+ Neuroheadset is equipped with 14 biopotential sensors with gold-plated connectors. These sensors are optimally positioned on the head to record the signal from many brain areas. The sensor locations are: AF3, AF4, F3, F4, F7, F8, FC5, FC6, T7, T8, P7, P8, O1, O2. The communication between the EMOTIV EPOC+ Neuroheadset and the computer is based on a wireless Bluetooth 2.4 GHz connection via a USB Bluetooth receiver, which does not require custom drivers [15]. The headset is powered by a rechargeable Lithium Polymer 640 mAh battery. It can capture brain activity for up to 12 hours. The EMOTIV EPOC+ Neuroheadset is equipped with a 3-axis accelerometer and a 3-axis magnetometer, which gives users a total range of motion. The headset has inbuilt filtering: the EEG is filtered by the EPOC+ hardware with a fifth-order digital Sinc filter, using a bandpass of 0.16–43 Hz and notches at 50 Hz and 60 Hz. The sampling rate of the headset is 128 Hz [16]. It is supported on Windows, MAC, iOS and Android. It recognizes facial expressions like blink, wink, surprise, smile and more. It also measures emotions like excitement, stress, focus or relaxation [17].
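The filtering quoted above is performed in the EPOC+ hardware; purely as an illustration, a software approximation of the same band limits can be sketched with MatLab IIR designs (the filter orders are assumptions, not the hardware's Sinc filter):

    fs = 128;                                             % headset sampling rate
    xRaw = randn(10*fs, 1);                               % surrogate raw channel
    bp = designfilt('bandpassiir', 'FilterOrder', 8, ...
         'HalfPowerFrequency1', 0.16, 'HalfPowerFrequency2', 43, ...
         'SampleRate', fs);                               % 0.16-43 Hz bandpass
    notch50 = designfilt('bandstopiir', 'FilterOrder', 2, ...
         'HalfPowerFrequency1', 49, 'HalfPowerFrequency2', 51, ...
         'SampleRate', fs);                               % 50 Hz notch
    xFilt = filtfilt(notch50, filtfilt(bp, xRaw));        % zero-phase filtering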


Fig. 4. EMOTIV EPOC+ Neuroheadset [17]

4. Measurement Results

The various results recorded from an EEG-based cognitive BCI experiment designed to convert human thoughts into specific virtual cube manipulation actions via electroencephalography are presented and described in this section. The results shown here are for one subject (one of the authors). The first part of the experiment includes training of the neutral state of the brain and training of four different actions performed on the virtual cube. The whole process is supported by monitoring brain activity in real time, represented by chart records. During neutral state training, the thoughts were cleared and no large variations were noticed on the EMOTIV Xavier TestBench charts. All 14 charts were oscillating around zero.

During the second training (Easy), the selected action was moving the cube to the right. As a result, minor variations appeared on the charts in many areas. The most significant changes were detected in the AF3, AF4, F7 and F8 brain areas, as shown in Fig. 5. The attempt at moving the cube to the right was successful, but some problems appeared. The cube was not fully controllable: it was performing the chosen action at random moments, because of the many thoughts in the brain which were interfering with the signal. It took about 5 minutes to start controlling the cube, which was achieved by focusing more on the cube and trying to stay calm. After training another action (moving the cube to the left), the complexity of controlling the cube increased. Many unintended actions, i.e. random movements to the right or to the left, appeared. The adaptation time needed to fully control the cube increased to about 8 minutes. The third action (moving the cube up) was partly achieved. The level of complexity and the time needed to adapt to controlling the cube rose to 10 minutes. The cube was only partly controllable and many unintended actions appeared. Full control was not achieved due to the limited battery charge and the available laboratory session time.

Fig. 5. EEG signals from the AF3, AF4, F7 and F8 sensors (respectively)

Fig. 6. EEG signal artifacts from the F7 sensor



Fig. 7. EEG signal artifacts from the F8 sensor

The second experiment concerns the usage of facial muscles. The one difference between the first and the second experiment is that facial expressions were recorded during training instead of thought patterns. Significant changes were detected in the areas corresponding to specific facial muscle movements. Fig. 6 and Fig. 7 show the signal strength from the F7 and F8 sensors after raising the eyebrows. The signal strength is very large compared to the EEG thought signal and appears as artifacts on the charts. After training the first action, the cube moved to the right instantly with maximum power. All four available levels (Easy, Medium, Hard and Extreme) were achieved in approximately 7 minutes. Thanks to that, the virtual cube could be manipulated in four possible ways by performing the assigned face actions (Table 2).



Tab. 2. Mapping of face actions used for cube manipulation

Cube manipulation – Face action
moving right – Close right eye
moving left – Close left eye
moving up – Raise eyebrows
rotate – Clench teeth

EEG thought training took more time than facial expression training to fully control the cube, and the complexity of manipulating it was much higher.

5. Implementation Concept

The implementation concept involves connecting the corresponding user thoughts and facial movements with specific Blender functions. The processes that allow this to happen are training thought patterns/facial expressions and assigning them to a specific key/keystroke. This can be done using some of the EMOTIV applications. The EMOTIV APIs are used as a connector between the user and Blender. The functions used in this study are the EPOC Control Panel and EMOTIV Xavier EmoKey. The EPOC Control Panel is used as a training platform in which the user is able to configure the headset in the desired way. There the user can display the sensors’ signal quality or the battery power. The EPOC Control Panel consists of the Expressive suite, Affective suite, Cognitiv suite and Mouse emulator. The most significant are the Cognitiv and Expressive suites. The Cognitiv suite is responsible for acquiring the EEG signal and associating it with a specific action. During cognitive training, the user has to imagine manipulating a cube in the virtual environment (see Fig. 2). The Expressive suite uses signals that are generated by the movement of facial muscles. It offers assigning specific keys/strings to seven different facial expressions (look left/right, blink, left wink, right wink, raise brow, clench teeth, smile). It is possible to adjust the sensitivity of triggering the actions assigned to a specific facial expression. Another function used is EMOTIV Xavier EmoKey. EmoKey is a background process that runs behind applications. It is used to emulate keys or keystrokes which are assigned to a specific user thought or facial expression via the user interface. During the assignment of EEG signals it is possible to set specific conditions defining when the specific action will be triggered. These conditions are: ‘greater than’, ‘equal to’, ‘lesser than’, and ‘occurs’. This enables the user to calibrate the strength of the signal required to perform an action. There is also the possibility to choose the window of the application in which the key will be emulated. Therefore, EMOTIV Xavier EmoKey can serve as the user’s configuration tool, allowing specific EEG signals to be assigned to specific keys that will activate the intended functions in Blender.

Fig. 8. EMOTIV Xavier EmoKey application interface [18]
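EmoKey itself is configured through its graphical interface, but the effect it produces (system-wide key emulation) can be illustrated from MatLab through the Java Robot class bundled with it; the mapping follows Tab. 3:

    import java.awt.Robot java.awt.event.KeyEvent
    kb = Robot();
    kb.keyPress(KeyEvent.VK_R);           % "R": Rotate in Blender (Tab. 3)
    kb.keyRelease(KeyEvent.VK_R);
    kb.keyPress(KeyEvent.VK_CONTROL);     % CTRL+Z: Undo
    kb.keyPress(KeyEvent.VK_Z);
    kb.keyRelease(KeyEvent.VK_Z);
    kb.keyRelease(KeyEvent.VK_CONTROL);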

6. Tests

The keystrokes shown in Table 3 are used to execute specific functions in Blender. Every action shown in Table 3 was assigned to a specific key/keystroke during the training session described in the “Measurement Results” section. Six functions in total were assigned to specific actions: two of them use the EEG signal and four of them use signals generated by facial expressions. As shown in Fig. 9, every action performed by the user is detected by the BCI system and processed by the EMOTIV APIs. The output of this process is an emulated key/keystroke, which is then used to execute a specific function in Blender.


As shown in Fig. 10, the Rotate function is executed by pressing the “R” key on the keyboard. In order to press “R” without using a keyboard, the EPOC Control Panel Cognitiv suite was used. The user tried to imagine the rotation of an object, just like during the training. The “Rotate” function was then executed in Blender without any physical interaction with a keyboard or a mouse. It was then possible to specify the rotation axis (X, Y, Z) and the angle of rotation. The rotation angle could be set using the mouse movement or by typing a specific value on the keyboard. The “Scale” function was assigned to a facial expression action. After raising the eyebrows, the scale function was executed in Blender. The scale value could be specified by typing a value on the keyboard or by moving the mouse. One problem occurred when using the neuroheadset. At the beginning of the laboratory session the neuroheadset was not fully charged, and the battery ran out after 1 hour and 40 minutes of working. The EMOTIV EPOC+ Neuroheadset was less responsive for about ten to fifteen minutes before the battery ran out, while the battery power status was displayed as “Critical” in the EPOC Control Panel. This affected the attempts at manipulating the cube in the EPOC Control Panel Cognitiv suite, i.e. after completing the training on any level, the virtual cube did not perform any action. The signal variations displayed on the charts in EMOTIV Xavier TestBench were much weaker than before, and variations were detected only after facial muscle movements.

Tab. 3. Sample mapping of keystrokes using EMOTIV Xavier EmoKey

Function – Action – Key/Keystroke
Grab – Thought 1 (Easy) – G
Rotate – Thought 2 (Medium) – R
Scale – Raise eyebrows – S
Extrude – Clench teeth – E
Undo – Close left eye – CTRL+Z
Redo – Close right eye – SHIFT+CTRL+Z

Fig. 9. The process of converting a brain signal to an emulated key/keystroke


Fig. 10. Functions performed on the monkey’s head model in Blender using emulated keys

7. Conclusion

In this paper, a BCI-based system is evaluated as a tool in the 3D modeling process in Blender. However, the presented concept can be used in many other applications; Blender is only one example. Many 3D applications use a WIMP interface based on mouse and keyboard and require memorizing many keystrokes to perform specific actions, just like Blender. The application of this concept in popular 3D software might have a big influence on the 3D industry and change the way 3D objects are modeled in many applications. The research was focused on recording EEG signals and converting them into user-specified actions using the EMOTIV EPOC+ Neuroheadset and its inbuilt APIs. The processed information is then used to carry out 3D modeling in Blender. Several tests were performed in order to evaluate the usability of this concept. The tests involved training actions to manipulate the virtual 3D cube in the EPOC Control Panel using brain signals and facial expressions. From the tests it can be concluded that the complexity of manipulating the cube increases with the number of trained actions. Controlling the cube by using facial muscle movements is much more responsive and intuitive for the user, and also requires less training time than using brain signals. However, in the case of cognitive control, the number of possible functions that could be assigned to human thoughts is almost limitless; this is the biggest advantage over facial expression control. Nevertheless, BCI can be used in 3D modeling. The main issue to focus on in the future is decreasing the time needed by the user to configure the BCI system and to adapt to it. The more time the user spends with the BCI, the more controllable and useful it becomes. The other issue to improve is better recognition and analysis of thought patterns. More precise study in this field in the future might allow increasing the usability of BCI in 3D modeling but also in many other areas. The development of and progress in BCI might be essential in the future for human interaction with electronic devices. The effect might be that people will not need a mouse and keyboard to control a computer, and the pace and efficiency of their work will then increase.

AUTHORS Mateusz Zając* – Opole University of Technology, Faculty of Electrical Engineering, Automatic Control & Informatics, 76 Prószkowska Street, 45-758 Opole, Poland, e-mail: mateusz.zajac@student.po.edu.pl. Szczepan Paszkiel – Opole University of Technology, Faculty of Electrical Engineering, Automatic Control & Informatics, 76 Prószkowska Street, 45-758 Opole, Poland, e-mail: s.paszkiel@po.edu.pl. *Corresponding author

References

[1] S. S. Shankar and R. Rai, “Human factors study on the usage of BCI headset for 3D CAD modeling”, Computer-Aided Design, vol. 54, 2014, 51–55, DOI: 10.1016/j.cad.2014.01.006.
[2] H. Gürkök and A. Nijholt, “Brain–Computer Interfaces for Multimodal Interaction: A Survey and Principles”, International Journal of Human-Computer Interaction, vol. 28, no. 5, 2012, 292–307, DOI: 10.1080/10447318.2011.582022.
[3] R. A. Bolt, ““Put-that-there”: Voice and gesture at the graphics interface”, ACM SIGGRAPH Computer Graphics, vol. 14, no. 3, 1980, 262–270, DOI: 10.1145/965105.807503.
[4] M. W. Salisbury, J. H. Hendrickson, T. L. Lammers, C. Fu and S. A. Moody, “Talk and draw: bundling speech and graphics”, Computer, IEEE, vol. 23, no. 8, 1990, 59–65, DOI: 10.1109/2.56872.
[5] A. Sharma, S. Madhvanath, A. Shekhawat and M. Billinghurst, “MozArt: a multimodal interface for conceptual 3D modeling”. In: Proceedings of the 13th international conference on multimodal interfaces – ICMI ‘11, 2011, DOI: 10.1145/2070481.2070538.
[6] M. A. A. Kasim, C. Y. Low, M. A. Ayub, N. A. C. Zakaria, M. H. M. Salleh, K. Johar and H. Hamli, “User-Friendly LabVIEW GUI for Prosthetic Hand Control Using Emotiv EEG Headset”, Procedia Computer Science, vol. 105, 2017, 276–281, DOI: 10.1016/j.procs.2017.01.222.


[7] J. d. R. Millán, R. Rupp, G. Mueller-Putz, R. Murray-Smith, C. Giugliemma, M. Tangermann, C. Vidaurre, F. Cincotti, A. Kubler, R. Leeb, C. Neuper, K. R. Mueller and D. Mattia, “Combining Brain–Computer Interfaces and Assistive Technologies: State-of-the-Art and Challenges”, Frontiers in Neuroscience, vol. 4, 2010, DOI: 10.3389/fnins.2010.00161.
[8] K. J. De Laurentis, Y. Arbel, R. Dubey and E. Donchin, “Implementation of a P-300 Brain Computer Interface for the Control of a Wheelchair Mounted Robotic Arm System”. In: ASME 2008 Summer Bioengineering Conference, Parts A and B, 2008, 721–722, DOI: 10.1115/SBC2008-193253.
[9] B. Obermaier, G. R. Muller and G. Pfurtscheller, ““Virtual keyboard” controlled by spontaneous EEG activity”, IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 11, no. 4, 2003, 422–426, DOI: 10.1109/TNSRE.2003.816866.
[10] L. Bianchi, L. Quitadamo, G. Garreffa, G. Cardarilli and M. Marciani, “Performances Evaluation and Optimization of Brain Computer Interface Systems in a Copy Spelling Task”, IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 15, no. 2, 2007, 207–216, DOI: 10.1109/TNSRE.2007.897024.
[11] C. Wang, B. Xia, J. Li, W. Yang, Dianyun, A. C. Velez and H. Yang, “Motor imagery BCI-based robot arm system”. In: 2011 Seventh International Conference on Natural Computation, 2011, 181–184, DOI: 10.1109/ICNC.2011.6021923.
[12] K.-R. Muller and B. Blankertz, “Toward noninvasive brain-computer interfaces”, IEEE Signal Processing Magazine, vol. 23, no. 5, 2006, 128–126, DOI: 10.1109/MSP.2006.1708426.
[13] S. Paszkiel, “Data Acquisition Methods for Human Brain Activity”. In: Analysis and Classification of EEG Signals for Brain–Computer Interfaces, vol. 852, 2020, DOI: 10.1007/978-3-030-30581-9_2.
[14] K. Stamps and Y. Hamam, “Towards Inexpensive BCI Control for Wheelchair Navigation in the Enabled Environment – A Hardware Survey”. In: Y. Yao, R. Sun, T. Poggio, J. Liu, N. Zhong and J. Huang (eds.), Brain Informatics, 2010, 336–345, DOI: 10.1007/978-3-642-15314-3_32.
[15] T. D. Sunny, T. Aparna, P. Neethu, J. Venkateswaran, V. Vishnupriya and P. S. Vyas, “Robotic Arm with Brain – Computer Interfacing”, Procedia Technology, vol. 24, 2016, 1089–1096, DOI: 10.1016/j.protcy.2016.05.241.
[16] Z. Koudelkova and R. Jasek, “Capturing Brain Activity During Driving Automobile”, Transportation Research Procedia, vol. 40, 2019, 1434–1440, DOI: 10.1016/j.trpro.2019.07.198.


[17] S. Paszkiel, “Augmented Reality of Technological Environment in Correlation with Brain Computer Interfaces for Control Processes”. In: R. Szewczyk, C. Zieliński and M. Kaliczyńska (eds.), Recent Advances in Automation, Robotics and Measuring Techniques, vol. 267, 2014, 197–203, DOI: 10.1007/978-3-319-05353-0_20.
[18] S. Paszkiel and M. Sikora, “The Use of Brain-Computer Interface to Control Unmanned Aerial Vehicle”. In: R. Szewczyk, C. Zieliński and M. Kaliczyńska (eds.), Automation 2019, vol. 920, 2020, 583–598, DOI: 10.1007/978-3-030-13273-6_54.




Analysis of the Surrounding Environment Using an Innovative Algorithm Based on Lidar Data on a Modular Mobile Robot

Submitted: 18th September 2019; accepted: 18th March 2020

Nicola Ivan Giannoccaro, Takeshi Nishida

DOI: 10.14313/JAMRIS/4-2020/41

Abstract: In this paper, a low-cost mobile robot with a modular design that permits easily changing the number of wheels is considered for the generation of 3D digital maps using ROS tools and a 3D light detection and distance measurement (LiDAR) sensor. The modular robot is intended for travelling through several environments while saving energy by changing the number and arrangement of the wheels according to the environment. The presented robot can construct a 3D map of a particular structured environment, and its running performance was investigated through an extensive characterization. Furthermore, in light of the experimental tests, a new simple algorithm based exclusively on the processing of the LiDAR data is proposed, with the aim of characterizing the surrounding environment with fixed landmarks and mobile targets. Finally, the limits of this prototype and of the proposed algorithm are analyzed, highlighting improvements for future development toward autonomous environment perception with a simple, modular and low-cost device.

Keywords: mobile robot, driving module, 3D digital map, LiDAR, ROS, SLAM

1. Introduction

In several robotic applications, it is necessary to analyze the surrounding environment to obtain information about the presence of objects and about the trajectory of the robot with respect to landmarks or mobile targets (e.g., counting cut trees in forestry [1]). Often it is necessary to measure the environment in detail even when it is difficult to identify the position by global navigation satellite system (GNSS) signals; in these cases, the generation of three-dimensional digital maps by light detection and distance measurement (LiDAR) has been studied [2–4]. Currently, research based on the application of simultaneous localization and mapping (SLAM) using only LiDAR data is widely conducted. Several methods of measuring the surrounding environment using 2D LiDAR [5, 6] have been proposed and applied to four-wheeled mobile robots that run these algorithms and generate digital maps with real-time SLAM. In addition, an unmanned aerial vehicle (UAV) equipped with LiDAR [7] and measurement by a six-wheeled robot having a rocker-bogie mechanism have been developed, and their performance has been tested experimentally [8]. On the other hand, it is difficult to predetermine the mechanism and size of mobile robots suitable for various environments and tasks. In each environment, the heterogeneity of the ground, the aspects of the ground surface, and so on are different. Increasing the contact area by increasing the number of wheels improves the running performance of the robot but reduces the energy efficiency of the movement. A reduction of energy efficiency should be avoided in designing autonomous mobile robots that must carry limited energy sources. Namely, there is a trade-off relationship between the number of wheels and the running energy. On relatively flat ground, a robot can travel sufficiently with an independent two-wheel mechanism and has high energy efficiency. For traveling on a rough meadow, running performance can be improved by running on four or more wheels. In addition, in the case of driving in forest and sand, a six-wheel rocker-bogie mechanism is suitable for ensuring runnability; however, its energy efficiency is low [9]. Recent works [10–11] have highlighted that the robot operating system (ROS) platform is particularly advantageous for environment map generation using a LiDAR mounted on a mobile robot. Moreover, the possibility of using the LiDAR data (point cloud data) with efficient algorithms for simultaneous robot localization and mapping has also been recently demonstrated (i.e. [12]). In this research, a simple, specially designed low-cost robot, driven by ROS and realized with modularized wheels and a frame for connecting them and for supporting a scanning LiDAR, is presented, with an extensive characterization of its performance in relation to the LiDAR vibrations with respect to the type of surface with which the robot is in contact. This is important because the 3D LiDAR sensor mounted on the robot, depending on the vibrations acting on the system, can scan the surrounding environment more efficiently with lower oscillations by using recent algorithms introduced in the literature. The developed robot, which can be constructed by combining the modules, can adaptively change the number and arrangement of the wheels according to the environment to be measured, taking into account the results of the present research.


The paper has two further aims:
1) To provide an algorithm for tracking the odometry of the mobile robot. This practice is very common in the robotic field, and several SLAM algorithms have already been implemented. In most cases, however, the input data of the algorithms are taken from IMU and GPS sensors or, more rarely, from fixed cameras which inspect a certain control environment. It is clear, though, that both methods have their limitations: in the first case, applying the method is difficult in shielded environments, due to the impossibility of reaching the GPS signal; in the second case, the equipment needed (such as cameras) would allow control only under very limited conditions (light, brightness). The algorithm proposed here, on the other hand, uses only the data produced by the LiDAR sensor. LiDAR can provide frames with point cloud data which characterize the surrounding environment; hence, the method could become more versatile and able to confront the critical issues described above.
2) To link the movement of the proposed modular robot with the surrounding environment, forecasting the kinematics of possible targets and collisions between the moving elements in the examined space. This algorithm, as a future development, aims to support research studies related to autonomous driving, a topic of current and great interest.


2. Design of Robot and Features

2.1. Devices of Robot
In order to control the robot, a laptop computer with an Intel Core m5 (1.10 GHz) and Ubuntu 14.04 LTS was used, and the robot operating system (ROS) [13] was adopted as middleware. The following devices were incorporated into the robot: a motor driver (MDD10A), which generates the PWM signal, a motor controller (iMCs01) including a pulse counter, a 3D LiDAR (YVT-X002, Hokuyo), and a 24 V battery. The robot system scheme is shown in Figure 1. Two driving modules can be controlled with one set of motor driver and motor controller by a signal from the laptop over a USB connection; a sketch of reading one LiDAR frame through ROS is given after Fig. 2 below. The LiDAR is powered directly from the 24 V battery.

Fig. 1. Signal and electric scheme of the developed robot system

2.2. Frame of the Robot
The frame of the robot is designed so that the number of drive wheels can be changed according to the environment where the robot is traveling. Depending on the environment, the robot can easily be changed by the user between an energy-efficient two-wheel version, a six-wheel version with high running ability on complicated terrain, and a four-wheel version with intermediate capabilities. The frame is designed as an x shape, as shown in Figure 2(a); since the width and length of the robot are then the same, the robot can execute a spin turn by reverse rotation of the wheels. In the two-wheel version, two driving wheels and one slave wheel are used. In the four-wheel version, four driving wheels are connected to the frame in the same direction. In addition, for the six-wheel version, an attachment frame was developed so that it becomes a rocker-bogie mechanism. The proposed driving wheels, the frame, and the attachment frame are shown in Figure 2(b).

Fig. 2. Combination of the driving modules with the developed frames
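A minimal sketch (MatLab ROS Toolbox) of pulling one LiDAR frame over the ROS middleware described in Section 2.1; the topic name is an assumption:

    rosinit;                                  % connect to the robot's ROS master
    sub = rossubscriber('/hokuyo3d/point_cloud2', 'sensor_msgs/PointCloud2');
    msg = receive(sub, 10);                   % wait up to 10 s for one frame
    xyz = readXYZ(msg);                       % N-by-3 matrix of point coordinates
    rosshutdown;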

2.3. Driving Module
The driving module, shown in Figure 3, consists of a DC motor (TE-38F16-24-64), a rotary encoder (E6A2-CW3E), and two wheels connected by a shaft and gears. The total length is 382.5 mm, the total width is 217 mm, and the weight is 2500 g. The wheel diameter is 150 mm and the maximum velocity is 0.4 m/s. The module is waterproof. Also, by mounting two tires in parallel, the driving module became less likely to catch on weeds and branches while the robot was moving. After connecting the connectors of each module and the frame, the module is fixed to the frame with screws. By measuring the rotation angle of the tire with the rotary encoder, the rotation speed of the tire is controlled. Also, since there are sponges inside the tires, they do not puncture [14].



The anterior part of the case has been designed for receiving and correctly fixing the LiDAR sensor, also giving it an adequate protective structure against possible impacts during the robot’s motion (Fig. 5). The solid angle that can be inspected with this sensor model (Hokuyo XVT-35 LX 3D LiDAR) is quite extensive in both directions (Fig. 6).

Fig. 3. The driving module consists of a DC motor, a rotary encoder, four gears and two wheels: (a) structure and (b) overview of a driving module

2.4. Configuration of the Robot
The developed robot can be changed into three different configurations by changing the combination of drive modules, as shown in Figure 4. The first type consists of six driving modules and two rocker-bogie joint attachments; the second one has four driving wheels; the third type has two driving wheels and one non-driving wheel. Each type is designed for the following assumed fields:
• The six-wheel type: for traveling across all different road conditions (mud, grass, asphalt); it can climb obstacles [15]. It is influenced by the rocker-bogie mechanism [16-18]. These characteristics are perfect for driving the robot through a forest environment.
• The four-wheel type: it keeps great stability during up-hill and cross-hill driving because of its symmetrical body; however, the robot has some problems climbing obstacles due to the lack of a bogie joint and a flexible structure.
• The two-wheel type: the simplest configuration; the robot has good stability but risks tipping over if it climbs obstacles or travels across rough ground.

Fig. 4. Three types of the robot configuration with changing drive wheel combinations: (a) two-wheel type with one non-driving wheel, (b) four-wheel type, and (c) six-wheel type with rocker-bogie joint attachment

Fig. 5. a) Structure for placing the LiDAR sensor; b) the LiDAR sensor mounted

Fig. 6. Serviceable solid angle range of the LiDAR sensor


3. Preliminary Tests

3.1. Relationship Between the Number of Tires and Running Performance
The vibrations induced on the LiDAR sensor, placed on the top of the robot, during its movement could compromise the accuracy of the acquired cloud of points and could also cause the robot’s parts to unscrew (as verified during the experiments). For this reason, reducing the vibrations by choosing the best robot configuration (2, 4 or 6 wheels) could become a strategic factor for the accuracy and the maintenance of the robot. To this end, an extensive experimental analysis has been carried out considering the vertical acceleration applied to the LiDAR, measured by means of an Inertial Measurement Unit (IMU) placed inside the LiDAR box. The tests were conducted with a first calibration phase, consisting of a preliminary evaluation of the characteristics of the accelerometer used; in the preliminary tests the robot was still on the floor (see Fig. 7) and the accelerometer acquired the vertical oscillations for a period of 30 seconds at a sampling frequency of 50 Hz. The standard deviations σ, expressed by Eq. (1), for 3 different tests are reported in Table 1; the similarity of the standard deviation values gives confidence in the vertical accelerometer accuracy.

\sigma = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( a(i) - \bar{a} \right)^2 }    (1)

In (1), N is the total number of acceleration samples a(i), and \bar{a} is the mean value, which is considered as the accelerometer offset. The parameter introduced in (1) will be taken as the comparison parameter for the vertical oscillations in all the experimental tests presented in the following.
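Eq. (1) reduces to a one-line computation on the logged samples; the file name below is hypothetical:

    a = readmatrix('imu_vertical_still.csv');    % vertical IMU samples, 50 Hz
    aBar  = mean(a);                             % offset (mean value)
    sigma = sqrt(sum((a - aBar).^2) / numel(a)); % Eq. (1), equal to std(a, 1)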


All the possible robot configurations, in relation to the possible surface scenarios, were measured while the robot was running at a constant speed. When the acceleration is small, it indicates that the wheels are appropriately grounded to the environment. If the acceleration is large, the wheels do not properly contact the ground, and jumps accompanying the rotation of the wheels occur. That is, by comparing the acceleration in the vertical direction for each type of robot, it is possible to evaluate the running performance in the given running environment. The values of the vertical acceleration sensor installed in the LiDAR were measured while running the three types of robot at a constant speed in three types of environment: grassland, asphalt and gravel road. The three environments are shown in Fig. 8. In each environment at least 3 experiments for each robot configuration (2, 4 or 6 wheels) were carried out in order to characterize the robot behavior and to guarantee repeatability; a sampling frequency of 50 Hz was used. The robot was driven by a joystick at its maximum speed in the longitudinal direction. Fig. 9 shows an example of the results for environment b) in Fig. 8 (asphalt); a large difference in vertical acceleration among the 3 considered configurations is evident. A clearer view is given by Tables 2–4, where for each type of environment – asphalt in Table 2, grassland in Table 3 and gravel in Table 4 – the standard deviations of 3 different tests, named Test 1, Test 2 and Test 3, are reported for each configuration (2, 4 or 6 wheels). For the tests on the gravel surface, the 2-wheel configuration was not able to move the robot, so only the results (2 tests) for the other configurations are reported.

Fig. 8. Three environments for experiments: a) gravel, b) asphalt, c) grass

Fig. 7. Setup of the preliminary calibration tests

Tab. 1. Standard deviation of the vertical acceleration for 3 tests with the still robot

Vertical acceleration standard deviation: First test – 0.138; Second test – 0.156; Third test – 0.188

Fig. 9. Vertical acceleration (in m/s²) of Test 1 on asphalt for the 3 configurations, without gravity



Several considerations can be made by analyzing the results in Tables 2–4, also taking into account several other experiments that were conducted on mixed surfaces, hill profiles and so on:
– The standard deviation parameter σ effectively gives a precise trend for the different tests in the same configuration/surface and can assist in choosing the optimized modular size with respect to the surface of movement;
– Globally, the 2-wheel configuration, despite its lowest weight, has the worst behavior, with vibrations amplified by a factor of 2 on every surface; moreover, movement is not possible with the 2-wheel configuration on very irregular surfaces (like the gravel considered here);
– The 4- and 6-wheel configurations have similar behavior in all the tests; only a light preference may be given to the 6-wheel configuration on soft surfaces (such as the grassland considered here), where the use of 2 more wheels probably reduces the fluctuations and may improve the LiDAR scanning.

Tab. 2. Standard deviation σ of the vertical acceleration (environment: asphalt)

Test – Two-wheel type – Four-wheel type – Six-wheel type
Test 1 – 0.99 – 0.42 – 0.37
Test 2 – 0.97 – 0.35 – 0.37
Test 3 – 1.1 – 0.37 – 0.39
Tab. 3. Standard deviation σ of the vertical acceleration (environment: grassland)

Test – Two-wheel type – Four-wheel type – Six-wheel type
Test 1 – 0.95 – 0.42 – 0.37
Test 2 – 1.1 – 0.45 – 0.35
Test 3 – 0.93 – 0.41 – 0.37
Tab. 4. Standard deviation σ of the vertical acceleration (environment: gravel)

Test – Two-wheel type – Four-wheel type – Six-wheel type
Test 1 – - – 0.48 – 0.48
Test 2 – - – 0.53 – 0.49
To conclude this analysis of the different configurations, an energetic evaluation of the new robot was also carried out, estimating the autonomy of the robot from the motor power, the total robot weight and plausible hypotheses of movement. The autonomy was estimated at 25 minutes for the 6-wheel configuration and 32 minutes for the 4-wheel robot with the current 3800 mAh, 24 V battery. These estimated autonomy times were confirmed by the experiments conducted.


3.2. The Use of LiDAR Sensor: Preliminary Tests and Pre-Processing of the Data

The LiDAR sensor generates frames containing Point Cloud Data (PCD) mapping the neighboring environment at a sampling frequency of about 5 Hz (5 frames per second). Preliminary tests were carried out using the LiDAR data produced by the robot during its exploration. Fig. 10 shows an environment inside the University Campus (Kyushu Institute of Technology, Tobata Campus), also localized in Fig. 10 by a satellite image. Fig. 11 shows a photo of the test setup, where it is evident that the environment also contains a second robot (Fig. 11a) that acts as a tracking object to be detected by the proposed robot, used here in the 4-wheel configuration in order to test the proposed strategy in conditions similar to real ones. The extracted point clouds related to one frame are shown in Fig. 12a. In Fig. 12b the same frame is visualized after a preliminary pre-processing of the point clouds. The preliminary pre-processing performs the following operations:
– modifying the frame orientation so as to align the image to the robot, with respect to the LiDAR position on it;
– executing a preliminary filtering of the data, eliminating the outliers and the supporting ground;
– visualization of the resulting pre-processed PCD.
In Fig. 12b it is possible to note the absence of the supporting ground and of outliers. The PCDs after pre-processing are more useful for giving correct information about the surrounding environment. The pre-processing phase aims to eliminate noise sources from the PCDs to facilitate the following phase of feature and target search, and it is applied in all the analyses presented in the following section. After the alignment of the LiDAR to the robot (i.e. the z axis has to be inverted due to the sensor position), the outliers are removed by considering a maximum distance threshold from the other points. Then the ground points are eliminated, by considering a minimum-distance threshold from the fitted xy plane, which permits eliminating the points belonging to the ground. Finally, the points included in a sphere having its center at the axis origin (center of the robot) and a radius equal to 1 meter are eliminated, because the LiDAR sensor detects some lateral projections of the sensor itself included in its field of view.
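A sketch of the three pre-processing operations with the MatLab Computer Vision Toolbox point-cloud functions; the file name, the ground-fit threshold and the reference vector are assumptions:

    pcRaw = pcread('frame_0001.pcd');         % one saved LiDAR frame (hypothetical)
    xyz = pcRaw.Location;
    xyz(:,3) = -xyz(:,3);                     % align: invert z for the sensor pose
    pc = pcdenoise(pointCloud(xyz));          % remove statistical outliers
    [groundPlane, ~, nonGround] = pcfitplane(pc, 0.05, [0 0 1]); % fit the ground
    pc = select(pc, nonGround);               % drop the ground points
    r = sqrt(sum(pc.Location.^2, 2));         % distance from the robot centre
    pc = select(pc, find(r > 1));             % drop the 1 m self-projection sphere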

Fig. 10. Environment of preliminary tests




Fig. 12. a) Extracted point clouds, b) point clouds after pre-processing

4. Algorithm for Robot Odometry and Target Following Based on LiDAR Data

Fig. 11. a) Target robot, b) test setup


The use of a LiDAR sensor, alone or in conjunction with other sensors, can be very useful for accurately determining the position of the robot and/or the presence and relative position of external obstacles. The sensor has been installed (see Fig. 5) on the external surface of the robot, simplifying signal reception and the choice of landmarks. The basic principle of the algorithm introduced here is similar to that used for solving Simultaneous Localization and Mapping (SLAM) [19-20] problems, but it takes as landmarks, among the signals extracted from the LiDAR frames, fixed structures such as walls. The proposed algorithm is divided into two steps: – the first step operates on each single frame acquired by the LiDAR sensor during the robot exploration, giving as result the position and pose of the mobile robot with respect to the searched landmark; – the second step operates on the sequence of frames related to a specific mapping test and gives as result the trajectory of the robot during the test with respect to landmarks (in fixed positions) and targets (in fixed or mobile positions). The algorithm is mainly effective for indoor tests, or wherever it is easy to identify fixed reference structures (walls) that can be used as landmarks for estimating the robot pose. The idea of the proposed algorithm is to identify vertical walls, estimate their intersection, and consider this line as the reference for defining the spatial coordinates of the proposed robot. In this way, considering a sequence of n frames recorded by the LiDAR sensor and applying the algorithm sequentially by means of a 'while' cycle, it is possible to estimate the relative coordinates (x, y, z) of the landmark with respect to the sensor position on the robot. The relative coordinates (x, y, z) of the identified landmark are stored in an n × 3 matrix (where n is the number of PCDs saved during the test) and then processed to calculate the robot route. The proposed post-processing involves the recognition of two vertical walls included in the considered PCD



[21]; the maximum distance between inliers and the fitting plane has been fixed at 0.5 meters. After some tests, this choice proved not to be robust, and another condition has been added: the iterative plane-fitting process is concluded only if the fitting error (from the command 'pcfit' [21]) is lower than a threshold (chosen equal to 0.065 in the considered tests). Moreover, in order to make the algorithm more robust in indoor tests, a further orthogonality condition between the identified planes is added when the presence of orthogonal walls is guaranteed. Prior to the wall identification, a partition of the points of the analyzed PCD is necessary; for this purpose a clustering procedure has been included in the proposed algorithm. The particularity of the clustering procedure proposed here lies in repeating it iteratively 10 times, so as to obtain a robust partition of the data, and in using, as starting points for each frame, the final centroids obtained for the previous frame. This second refinement is plausible considering that the target objects (walls and the second robot) move slowly with respect to the frame frequency; hence the position of the centroid of each object is unlikely to change much between one frame and the following one (after 0.2 seconds). In this way the computational efficiency of the clustering procedure increases and the results are stable. The most important problem in the clustering operation is the necessity of selecting the number of clusters into which the data have to be classified. All the tests presented in this research use a fixed number of clusters equal to 3 (2 for the walls and one for the target robot), and the clustering is realized by means of the 'kmeans' [19] command. An interesting future development of the research is that of finding an auto-adaptive procedure to automate the choice of the number of clusters used for the classification. Fig. 13 shows the fitted planes, Plane 1 from Cluster 1 and Plane 2 from Cluster 2, obtained by applying the described procedure to the frame shown in Fig. 12b. It is possible to note that the clustering has efficiently divided the data into 3 groups: red points (Cluster 1), blue points (Cluster 2) and green points (Cluster 3, the target robot) for the point clouds in Fig. 12b. Fig. 13 also shows, with a black cross, the centroid position of each cluster. Once the two planes have been estimated from the point clouds, the algorithm calculates the intersection point between these planes and the ground fitting plane calculated in the pre-processing phase (paragraph 3.2). The coordinates (x, y, z) of this point are saved, compared with the robot spatial coordinates at the previous step, and indicated with a black sign in Fig. 13. The algorithm has been implemented so as to separate the clusters that identify the landmarks (Cluster 1 and Cluster 2) from the cluster related to the target robot (Cluster 3) by checking the distance of the target centroid with respect to the fitted landmark planes.
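A sketch of the warm-started clustering described above is given below, expressed with scikit-learn's KMeans rather than the MATLAB 'kmeans' command used by the authors; the function and parameter names are illustrative, not the paper's code.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_frames(frames, n_clusters=3, n_restarts=10):
    """Cluster each pre-processed PCD frame into two walls plus the target.
    The first frame is clustered with several random restarts; later frames
    reuse the previous centroids as starting points, since the targets move
    little between frames (~0.2 s apart)."""
    centroids = None
    labels_per_frame = []
    for pts in frames:                      # pts: (N, 3) array of points
        if centroids is None:
            km = KMeans(n_clusters=n_clusters, n_init=n_restarts).fit(pts)
        else:
            # Warm start: a single run initialised at the previous centroids.
            km = KMeans(n_clusters=n_clusters, init=centroids, n_init=1).fit(pts)
        centroids = km.cluster_centers_
        labels_per_frame.append(km.labels_)
    return labels_per_frame
```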


Fig. 13. Algorithm results on a single frame

The algorithm has been conceived with the aim of solving two tasks: – locating, in the surrounding environment, the fixed and mobile targets; – evaluating the kinematics of the mobile target, so as to define its trajectory and to foresee its future evolution from the analysis of the previous frames. The algorithm has been generalized so that it can process a sequence of consecutive PCDs with the described procedure, estimating for each frame the fitted planes 1 and 2 and the target robot. In this way it is possible to estimate the movement of the robot with respect to the fixed reference landmarks, i.e., the walls of the room or of the space where the robot is moving, and with respect to the target robot.
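As described above, each frame yields a landmark point where the two fitted wall planes meet the ground plane. Writing each plane in Hessian form n · p = d, that point is obtained by solving a 3 × 3 linear system; the sketch below illustrates the computation under that assumption (the numeric planes in the example are hypothetical).

```python
import numpy as np

def plane_intersection(n1, d1, n2, d2, n3, d3):
    """Intersection point of three planes given in Hessian form n . p = d.
    Raises LinAlgError if any two planes are (near-)parallel."""
    N = np.vstack([n1, n2, n3])          # stack the three plane normals
    d = np.array([d1, d2, d3])
    return np.linalg.solve(N, d)         # landmark coordinates (x, y, z)

# Hypothetical example: two orthogonal vertical walls and a horizontal ground.
p = plane_intersection([1, 0, 0], 2.0, [0, 1, 0], 3.0, [0, 0, 1], 0.0)
print(p)   # [2. 3. 0.]
```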

4.1. Use of the Proposed Algorithm for Robot Odometry With Stationary Target Robot

One application of the presented algorithm is shown in this test, where the robot in the four-wheel configuration is driven by a joystick with a forward movement (along the x axis). In reality, the robot moved about 5 meters along the x axis, with a small deviation along the y axis and a null deviation along the z axis, because the movement took place on a flat surface. The application of the proposed procedure gives the results depicted in Fig. 14, where the trajectory of the robot is clearly reconstructed (red circles), matching the experimental behavior, while the target robot position is stationary and corresponds to the points of the figure not included in Plane 1 and Plane 2. The projection of the algorithm results on the xy plane is shown in Fig. 15, which emphasizes the correct reconstruction of the robot trajectory. Regarding the detection of the position of the target robot, the components of the displacements from the landmark of the robot are equal and opposite to the displacements from the centroid of the target cluster, demonstrating the stationary condition of the target. Several experimental tests have been carried out, demonstrating in all cases the capacity of the algorithm to estimate the robot trajectory and to detect the stationarity of the target. It is important to underline that the IMU data collected in all the tests show significant noise, making any trajectory reconstruction close to reality impossible.


Fig. 14. Algorithm results on consecutive frames


Fig. 15. Projection of the results on the xy plane

4.2. Use of the Proposed Algorithm for Robot Odometry With Moving Target Robot


In order to test the capacity of the proposed algorithm to track mobile targets moving in the field of view of the LiDAR sensor, other tests have been carried out moving the target robot (shown in Fig. 11a) during the movement of the robot equipped with the LiDAR. Figures 16-18 show the results of the proposed algorithm for one test in which the two mobile robots started to move after 10 seconds and stopped after 20 seconds. The first robot (named here First Robot) moves with a trajectory directed along the x axis; the second one, starting from a distance of about 10 meters, is manually moved almost perpendicularly to the direction of First Robot, almost crossing its trajectory and stopping very close to it. The point clouds referring to the initial and final situations are reported in Fig. 16, the trajectories reconstructed with the proposed algorithm are depicted in Fig. 17 (red for First Robot, blue for the target robot), and the components of the distance between the two robots along the reference axes, easily calculated after the trajectory reconstruction, are shown in Fig. 18. The proposed algorithm demonstrated its ability to detect mobile elements within the inspection zone and to evaluate their route by using the LiDAR data and the PCD processing. The possibility of estimating in real time the trajectory of the target robot also opens interesting scenarios for predicting its position with good accuracy, provided its movement continues without abrupt accelerations. This possibility is very interesting for autonomous inspection by autonomous machines equipped with a LiDAR sensor.

Fig. 16. a) Point clouds related to the starting position of the test, b) Point clouds related to the final position of the test

Fig. 17. Algorithm results showing the calculated trajectories with mobile target robot

Fig. 18. Components of the distances between the robots in the test with mobile target robot

5. Conclusion



In this paper, a modularized driving-wheel robot, able to balance the mobile robot's runnability and efficiency according to the environment, has been presented, together with an efficient algorithm using LiDAR data for robot self-localization and detection of mobile targets. The developed robot, which is constructed by combining modules, can travel through several environments while saving energy by adaptively changing the number and arrangement of the wheels according to the environment. Moreover, several experiments have been conducted to evaluate the performance and the characteristics of the developed robot. First, it was shown that the developed mobile robot can easily switch among the three types of mechanisms by changing the number of modularized driving wheels and their combination. The experimental tests confirmed that the six-wheel type is suitable for uneven and soft environments, while the four-wheel configuration has similar characteristics and greater autonomy on hard and rigid surfaces. Next, it was shown by testing this mobile robot that it is possible to use the LiDAR data for constructing a 3D map while running in a structured environment. In addition, a specific algorithm for automatically analyzing the LiDAR data has been presented and tested, demonstrating that, in structured scenarios, the robot can localize itself with respect to fixed landmarks and can also evaluate the movement of possible mobile targets, foreseeing their next displacements.

Acknowledgements

The authors thank the students Luigi Ricciato, Alessio Monaco and Kakeru Yamashita for their precious work related to this research.

AUTHORS

Nicola Ivan Giannoccaro* – Department of Innovation Engineering, University of Salento, Lecce, 73100, Italy, e-mail: ivan.giannoccaro@unisalento.it.

Takeshi Nishida – Department of Control Engineering, Kyushu Institute of Technology, Kitakyushu, Japan, e-mail: nishida@cntl.kyutech.ac.jp. *Corresponding author

References

[1] N. Koenig and A. Howard, "Design and use paradigms for Gazebo, an open-source multi-robot simulator". In: 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), vol. 3, 2004, 2149–2154, DOI: 10.1109/IROS.2004.1389727.
[2] P. Forsman and A. Halme, "3-D mapping of natural environments with trees by means of mobile perception", IEEE Transactions on Robotics, vol. 21, no. 3, 2005, 482–490, DOI: 10.1109/TRO.2004.838003.


[3] T. Tsubouchi, A. Asano, T. Mochizuki, S. Kondou, K. Shiozawa, M. Matsumoto, S. Tomimura, S. Nakanishi, A. Mochizuki, Y. Chiba, K. Sasaki and T. Hayami, "Forest 3D Mapping and Tree Sizes Measurement for Forest Management Based on Sensing Technology for Mobile Robots". In: K. Yoshida and S. Tadokoro (eds.), Field and Service Robotics: Results of the 8th International Conference, 2014, 357–368, DOI: 10.1007/978-3-642-40686-7_24.
[4] J. Billingsley, A. Visala, M. Dunn, "Robotics in Agriculture and Forestry". In: B. Siciliano and O. Khatib (eds.), Springer Handbook of Robotics, 2008, DOI: 10.1007/978-3-540-30301-5_47.
[5] X. Liang, P. Litkey, J. Hyyppa, H. Kaartinen, M. Vastaranta and M. Holopainen, "Automatic Stem Mapping Using Single-Scan Terrestrial Laser Scanning", IEEE Transactions on Geoscience and Remote Sensing, vol. 50, no. 2, 2012, 661–670, DOI: 10.1109/TGRS.2011.2161613.
[6] M. A. Juman, Y. W. Wong, R. K. Rajkumar and L. J. Goh, "A novel tree trunk detection method for oil-palm plantation navigation", Computers and Electronics in Agriculture, vol. 128, 2016, 172–180, DOI: 10.1016/j.compag.2016.09.002.
[7] R. A. Chisholm, J. Cui, S. K. Y. Lum and B. M. Chen, "UAV LiDAR for below-canopy forest surveys", Journal of Unmanned Vehicle Systems, 2013, DOI: 10.1139/juvs-2013-0017.
[8] M. Morita, T. Nishida, Y. Arita, M. Shige-eda, E. di Maria, R. Gallone and N. I. Giannoccaro, "Development of Robot for 3D Measurement of Forest Environment", Journal of Robotics and Mechatronics, vol. 30, no. 1, 2018, 145–154, DOI: 10.20965/jrm.2018.p0145.
[9] K. Kamikawa, T. Arai, K. Inoue and Y. Mae, "Omni-directional gait of multi-legged rescue robot". In: IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA '04, vol. 3, 2004, 2171–2176, DOI: 10.1109/ROBOT.2004.1307384.
[10] S.-A. Li, H.-M. Feng, K.-H. Chen, J.-M. Lin and L.-H. Chou, "Auto-maps-generation through Self-path-generation in ROS-based Robot Navigation", Journal of Applied Science and Engineering, vol. 21, no. 3, 2018, 351–360, DOI: 10.6180/jase.201809_21(3).0006.
[11] M. G. Ocando, N. Certad, S. Alvarado and Á. Terrones, "Autonomous 2D SLAM and 3D mapping of an environment using a single 2D LIDAR and ROS". In: 2017 Latin American Robotics Symposium (LARS) and 2017 Brazilian Symposium on Robotics (SBR), 2017, DOI: 10.1109/SBR-LARS-R.2017.8215333.
[12] Y.-T. Wang, C.-C. Peng, A. A. Ravankar and A. Ravankar, "A Single LiDAR-Based Feature Fusion Indoor Localization Algorithm", Sensors, vol. 18, no. 4, 2018, DOI: 10.3390/s18041294.


[13] M. Quigley, K. Conley, B. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler and A. Ng, "ROS: an open-source Robot Operating System", Proceedings of the ICRA Workshop on Open Source Software, 2009.
[14] D. Shrivastava, "Designing of All Terrain Vehicle (ATV)", International Journal of Scientific and Research Publications, vol. 4, no. 12, 2014.
[15] D. Pradhan, J. Sen and N. B. Hui, "Design and development of an automated all-terrain wheeled robot", Advances in Robotics Research, 2014, 21–39, DOI: 10.12989/arr.2014.1.1.021.
[16] R. Shah, S. Ozcelik and R. Challoo, "Design of a Highly Maneuverable Mobile Robot", Procedia Computer Science, vol. 12, 2012, 170–175, DOI: 10.1016/j.procs.2012.09.049.
[17] D. B. Bickler, US Patent No. 4,840,394, "Articulated Suspension Systems", US Patent Office, Washington, DC, 1989.
[18] B. D. Harrington and C. Voorhees, "The Challenges of Designing the Rocker-Bogie Suspension for the Mars Exploration Rover". In: Proceedings of the 37th Aerospace Mechanisms Symposium, 2004.
[19] S. Thrun, W. Burgard and D. Fox, Probabilistic Robotics, MIT Press, 2005.
[20] J. Tang, Y. Chen, A. Kukko, H. Kaartinen, A. Jaakkola, E. Khoramshahi, T. Hakala, J. Hyyppä, M. Holopainen and H. Hyyppä, "SLAM-Aided Stem Mapping for Forest Inventory with Small-Footprint Mobile LiDAR", Forests, vol. 6, no. 12, 2015, 4588–4606, DOI: 10.3390/f6124390.
[21] "Computer Vision Toolbox". MathWorks, https://www.mathworks.com/products/computer-vision.html. Accessed on: 2021-02-03.


Preface to Special Issue on Modern Intelligent Systems Concepts II

DOI: 10.14313/JAMRIS/4-2020/42

Over the last decades, technological advancements have broken their records, and we are witnessing an unprecedented technological revolution. The use of Intelligent Systems, IoT, Cloud Computing, Big Data and, more particularly, Artificial Intelligence, including Deep Learning and Machine Learning, is becoming the norm in almost all activities. These disciplines are seeing their applications developed in all sectors, including education, industry, health, agriculture, economy, finance, energy, environment, security, transport, societal challenges and, in general, any field that one can think of. Our goal, as researchers, is to guide all these disciplines to streamline and facilitate human daily activities. This JAMRIS special issue offers a variety of ideas discussing different subjects having as a common denominator important technical proposals that contribute to the evolution of intelligent systems. The papers included in this issue are selected from the International Conference on Modern Intelligent Systems Concepts, which was held in Morocco in December 2018 (MISC'2018). In this special issue, six papers are presented. In the first paper, K. Ounachad, M. Oualla, A. Sadiq and A. Souhar propose a novel method for human Face Emotion Recognition (FER). FER systems aim to recognize a face emotion in a dataset. The authors explain how they generate seven referential faces suitable for each kind of facial emotion based on perfect face ratios and five classical averages: arithmetic mean, geometric mean, harmonic mean, contraharmonic mean and quadratic mean. The authors' idea is to extract perfect face ratios for the emotional face and for each referential face as features, and to calculate the distance between them using the fuzzy Hamming distance. To do so, they use the landmark points of the face and sixteen features based on the distance between different portions of the perfect face. The authors conclude that their method gives very promising results. In the second paper, A. Atassi and I. El Azami propose a hybrid deep learning system based on a combination of a Gated Recurrent Unit (GRU) and a Convolutional Neural Network (CNN). First, they study a system based on a CNN; then they compare it with a second system that uses a GRU. The hybrid system takes into account the positive points of the two previous systems and adopts the choice of hyper-parameters recommended by the authors of both systems. In addition, the authors propose an approach to apply their system to datasets in different languages, as used particularly in social networks. In the third paper, A. Riadsolh, I. Lasri and M. ElBelkacemi tackle the problem of measuring customer satisfaction in the banking sector using Naïve Bayes and Stanford NLP based on sentiment analysis. The authors propose and evaluate a real-time processing pipeline using open-source tools in Microsoft Azure. They use Apache Kafka as a data ingestion system, Apache Spark as a real-time data processing system, Apache HBase for persistent distributed storage, and ElasticSearch and Kibana for visualization. In their paper, M. Es-Sadqi, A. Idrissi and A. Benhassine present different methods for solving optimization problems of redundancy in multi-state systems. Those methods are based on a genetic algorithm and constraint satisfaction methods, including the Forward Checking algorithm. The authors propose an extension of the Forward Checking algorithm.
The latter gave the best results in the comparative case studied, while also allowing the verification of the results of the cost and availability obtained. The configurations obtained are simple, homogeneous and rarely varied. The authors state that the orientation of the selection according to the application is a strong point of those methods. Their work has also helped not only to prove that constraint-based search improves complexity, but also to find varied and high-quality solutions. The authors M. Riyad, M. Khalil and A. Adib tackle Convolutional Neural Networks (CNN) applied to Brain Computer Interfaces (BCI). A BCI is an instrument capable of commanding a machine with brain signals. They propose an EEG classification system applied to BCI using a CNN for the P300 problem. The system consists of three stages. The first stage is a spatiotemporal convolutional layer, which is a succession of temporal and spatial convolutions. The second stage contains 5 standard convolutional layers. Finally, a logistic regression is applied to classify the input EEG signal. The authors justify experimentally the necessity of a deep neural network rather than a shallow one. They state that their model allows the visualization of the learned features, and they finally show that their approach outperforms existing models. The last but not least paper, authored by K. Mrhar, O. Douimi and M. Abik, deals with the high dropout rate in MOOCs, which can be related to diverse aspects, such as the motivation of the learners, their expectations and the lack of social interactions. To tackle this problem, the authors present a dropout


predictor model based on a neural network algorithm and a sentiment analysis feature, using clickstream logs and forum post data. They choose the nine features that are the most correlated with learner dropout. Then they construct an artificial neural network (ANN) model to predict whether the learner will drop out the next week. According to their study, they conclude that the neural network performs better than other baseline algorithms, including KNN, SVM and Decision Tree. We consider that this special issue presents some real advances in the field of Intelligent Systems. It contributes to its evolution, its development, its emergence and, particularly, to its orientation and good practice in the service of humanity. We would like to thank all the authors for their interactions and interesting contributions, as well as all the reviewers for their time, advice and suggestions. In addition, we will not close this preface without warmly acknowledging the great efforts of the Editors, especially Professor Janusz Kacprzyk and the Managing Editor Katarzyna Rzeplinska-Rykala, for their great help and support, and anyone who contributed to promoting the international journal JAMRIS.

Editor: Abdellah Idrissi
MISC'2018 General Chair
Artificial Intelligence Group, Intelligent Processing Systems Team (IPSS), Computer Science Laboratory (LRI), Computer Science Department, Faculty of Sciences, Mohammed V University in Rabat, Morocco
email: idrissi@fsr.ac.ma, idrissi@ieee.org


Facial Emotion Recognition Using Average Face Ratios and Fuzzy Hamming Distance Submitted: 26th June 2019; accepted: 25th March 2020

Khalid Ounachad, Mohamed Oualla, Abdelalim Sadiq, Abdelghani Souhar

DOI: 10.14313/JAMRIS/4-2020/43

Abstract: Facial emotion recognition (FER) is an important topic in the fields of computer vision and artificial intelligence, owing to its significant academic and commercial potential. Nowadays, emotional factors are as important as classic functional aspects of customer purchasing behavior: purchasing choices and decision making are the result of a careful analysis of the product's advantages and disadvantages and of affective and emotional aspects. This paper presents a novel method for human emotion classification and recognition. We generate seven referential faces, suitable for each kind of facial emotion, based on perfect face ratios and some classical averages. The basic idea is to extract perfect face ratios for the emotional face and for each referential face as features, and to calculate the distance between them by using the fuzzy Hamming distance. To extract the perfect face ratios, we use the landmark points of the face, from which sixteen features are extracted. An experimental evaluation demonstrates the satisfactory performance of our approach on the WSEFEP dataset, and it can be applied to any existing facial emotion dataset. The proposed algorithm is competitive with related approaches; the recognition rate reaches more than 90%.

Keywords: Average Face Ratios, Facial Emotion Recognition, Fuzzy Hamming Distance, Perfect Face Ratios

1. Introduction

The face is the most exposed part of the human body, and emotion is one of its most important attributes. Emotions are used in the process of non-verbal communication; they help us to understand the interactions of others, and people can immediately recognize the emotional state of a person. There are seven basic kinds of emotions: Anger, Disgust, Fear, Happiness, Neutral, Sadness and Surprise [1]. Accurate emotion recognition enables sophisticated emotion-based tasks, such as customer satisfaction analysis in advertising. Face Emotion Recognition (FER) systems aim to recognize a face emotion in a dataset of photo images. The task of emotion recognition is particularly difficult: no very large dataset of training images exists, and classifying emotion can also be difficult. In a face emotion recognition system,

feature extraction is the core block, while matching is used to recognize the right kind of emotion using the precomputed features. The preprocessing step can considerably boost the final performance of the system. Feature extraction aims to transform the input face emotion image into a set of features. Matching is a general concept describing the task of finding correspondences between two elements, which has to be carried out for recognition or classification; it can be a simple comparison between the extracted features or a more complex comparison using some distance. The objective of this paper is to propose a method based on the fuzzy Hamming distance with average face ratios for recognizing a face emotion. We generate seven referential faces, suitable for each kind of facial emotion, based on perfect face ratios and five classical averages: arithmetic mean, geometric mean, harmonic mean, contraharmonic mean and quadratic mean. The basic idea is to extract perfect face ratios for the emotional face and for each referential face as features, and to calculate the distance between them by using the fuzzy Hamming distance. To extract the perfect face ratios, we use the landmark points of the face, from which sixteen features are extracted. To compute distances between the input facial emotion image and each image in the set of referential faces, the Hamming distance, based on the logical exclusive-or (XOR) function, evaluates the number of bits that differ between two binary vectors. The Fuzzy Hamming distance [2] has been introduced to overcome the limitations of the Hamming distance on real numbers; it is used in this work because real-valued features are computed, not binary numbers, and it ensures great performance in terms of speed and accuracy. This article is organized as follows: Section 2 presents related works; Section 3 gives background information about the classical averages, the perfect face ratios and the fuzzy Hamming distance, with their formalisms and definitions; in Section 4, the proposed architecture of our approach is explained in depth; experimental results are given in Section 5; and a conclusion is presented in Section 6.

2. Related Works

Over the last decades, there has been a wealth of research on the Hamming distance, as well as on its application


in computer vision [10], especially in face sketch recognition [3], in banknote validation [4] and in Content-Based Image Retrieval Systems (CBIRS) [5]:
– In [3] we presented a new facial sketch recognition method based on the fuzzy Hamming distance with geometric relationships (face ratios). We proposed to simplify the procedure based on the fuzzy Hamming distance so as to use only vectors of simple real-valued characteristics. An interesting contribution of [3] is that it can accurately recognize the photo corresponding to a sketched face; the recognition rate reaches 100%, especially on the CUHK dataset.
– The Fuzzy Hamming Distance based approach proposed for banknote validation in [4] combines the versatility of an automatic system with basic banknote-specific information. Subsequently, the system can be updated to use in-depth security features provided by an expert. The fuzzy Hamming distance is used to measure the similarity between banknotes.
– The study by M. Ionescu et al. [5, 6] presented initial results on a new approach to measuring similarity between images using the notion of Fuzzy Hamming Distance (FHD) and its use in CBIR. The main advantage of the FHD is that the extent to which two different images are considered indeed different can be tuned to become more context dependent and to capture (implicit) semantic image information. The study shows good results using complete-linkage agglomerative clustering. The FHD proved to be efficient in a content-based image retrieval system that outputs the images in the database closest to a given query image. In [6], the results obtained for image retrieval based on the Hamming distance showed the effectiveness of their approach.
– In [7], character recognition is carried out using a template-matching scheme based on correlation. In their approach, in addition to correlation, the Hamming distance is applied in the scenarios where the correlation fails; the text image is segmented into lines and then into characters by basic pre-processing techniques. After that, the character image is classified and converted into text by template matching using either correlation or the Hamming distance. For the test stage, four text images were created with a total of 560 characters. Their system works successfully, with average recognition rates of 72.39% and 94.90% for correlation only and correlation plus Hamming distance, respectively.
Contribution. This article proposes a new facial emotion recognition method based on the fuzzy Hamming distance with face ratios and their geometric relationships. Our work is inspired in part by the recent


and successful methods that have shown that relatively simple benchmark features can perform well in a fuzzy-Hamming-distance-based face emotion recognition framework. In this paper, we simplify the procedure based on the fuzzy Hamming distance so as to use only vectors of simple real-valued characteristics. A key technical contribution of our paper is a method, based on these simple features, that can accurately recognize the right kind of emotion of a face. The method achieves our goal, producing a recognition rate of more than 93%.

3. Background Information

Symbolically, we have a data set containing the values $x_1, x_2, \ldots, x_n$. The arithmetic mean (average) is equal to the sum of all numerical values of a set divided by the number of items in that set; the average $A$ is defined by the formula:

$$A = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{S}{N} \quad (1)$$

where $N$ is the number of items being averaged and $S$ is the sum of the numbers being averaged.

The geometric mean is the $n$-th root of the product of $n$ numbers; for a set of numbers, the average $G$ is defined by the formula:

$$G = \left(\prod_{i=1}^{n} x_i\right)^{1/n} = \sqrt[n]{x_1 x_2 \cdots x_n} \quad (2)$$

The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the given numbers; the average $H$ is defined by the formula:

$$H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} \quad (3)$$

The contraharmonic mean is the arithmetic mean of the squares of the values divided by the arithmetic mean of the values; the average $C$ is defined by the formula:

$$C = \frac{(x_1^2 + x_2^2 + \cdots + x_n^2)/n}{(x_1 + x_2 + \cdots + x_n)/n} = \frac{x_1^2 + x_2^2 + \cdots + x_n^2}{x_1 + x_2 + \cdots + x_n} \quad (4)$$

The quadratic mean is calculated as the square root of the mean of the squares of the given numbers; the average $Q$ is defined by the formula:

$$Q = \sqrt{\frac{x_1^2 + x_2^2 + \cdots + x_n^2}{n}} \quad (5)$$

Let $F$ be a finite field with $q$ elements. The Hamming distance [9] $d(x, y)$ between two vectors $x, y \in F^{(n)}$ is the number of coefficients in which they differ:

$$d(x, y) = \mathrm{Card}\{i \,/\, x_i \neq y_i\} \quad (6)$$
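As a compact illustration, the snippet below evaluates the five averages of Eqs. (1)-(5) for a vector of positive values; it is a direct transcription of the formulas, not code from the paper.

```python
import numpy as np

def classical_means(x: np.ndarray) -> dict:
    """The five averages of Eqs. (1)-(5) for a vector of positive values."""
    return {
        "arithmetic":     x.mean(),
        "geometric":      np.exp(np.log(x).mean()),   # nth root of the product
        "harmonic":       len(x) / np.sum(1.0 / x),
        "contraharmonic": np.sum(x**2) / np.sum(x),
        "quadratic":      np.sqrt(np.mean(x**2)),
    }

print(classical_means(np.array([1.0, 2.0, 4.0])))
```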



The degree of difference [9]: given the real values $x$ and $y$, the degree of difference between $x$ and $y$, modulated by $\alpha > 0$ and denoted by $d_\alpha(x, y)$, is defined as:

$$d_\alpha(x, y) = 1 - e^{-\alpha (x - y)^2} \quad (7)$$

The parameter $\alpha$ modulates the degree of difference in the sense that, for the same value of $|x - y|$, different values of $\alpha$ will result in different values of $d_\alpha(x, y)$.

The difference fuzzy set for two vectors [9]: let $x$ and $y$ be $n$-dimensional real vectors, and let $x_i$, $y_i$ denote their corresponding $i$-th components. The degree of difference between $x$ and $y$ along the component $i$, modulated by the parameter $\alpha$, is $d_\alpha(x_i, y_i)$. The difference fuzzy set corresponding to $d_\alpha(x_i, y_i)$ is $D_\alpha(x, y)$ with membership function:

$$\mu_{D_\alpha(x, y)}(i): \{1, \ldots, n\} \to [0, 1], \qquad \mu_{D_\alpha(x, y)}(i) = d_\alpha(x_i, y_i) \quad (8)$$

$\mu_{D_\alpha(x, y)}(i)$ is the degree to which the vectors $x$ and $y$ are different along their $i$-th component.

The cardinality of a fuzzy set [9]: let $A \equiv \sum_{i=1}^{n} x_i / \mu_i$ denote the discrete fuzzy set $A$ over the universe of discourse $\{x_1, \ldots, x_n\}$, where $\mu_i = \mu_A(x_i)$ denotes the degree of membership of $x_i$ in $A$. The cardinality, $\mathrm{Card}A$, of $A$ is a fuzzy set:

$$\mathrm{Card}A \equiv \sum_{i=1}^{n} i / \mu_{\mathrm{Card}(A)}(i) \quad (9)$$

where $\mu_{\mathrm{Card}(A)}(i) = \mu_{(i)} \wedge (1 - \mu_{(i+1)})$, $\mu_{(i)}$ denotes the $i$-th largest value of $\mu_i$, the values $\mu_{(0)} = 1$ and $\mu_{(n+1)} = 0$ are introduced for convenience, and $\wedge$ denotes the min operation. The non-fuzzy cardinality $n\mathrm{Card}(A)$ is:

$$n\mathrm{Card}(A) = \mathrm{card}\left(\overline{\{x;\, \mu_A(x) > 0.5\}}\right) \quad (10)$$

where, for a set $S$, $\overline{S}$ denotes the closure of $S$.

The fuzzy Hamming distance [9] between $x$ and $y$, denoted by $\mathrm{FHD}_\alpha(x, y)$, is the fuzzy cardinality of the difference fuzzy set $D_\alpha(x, y)$:

$$\mu_{\mathrm{FHD}(x, y)}(\cdot, \alpha): \{1, \ldots, n\} \to [0, 1], \qquad \mu_{\mathrm{FHD}(x, y)}(k, \alpha) = \mu_{\mathrm{Card}\,D_\alpha(x, y)}(k) \quad (11)$$

for $k \in \{1, \ldots, n\}$, where $n = |\mathrm{Support}\,D_\alpha(x, y)|$; for a given value $k$, $\mu_{\mathrm{FHD}(x, y)}(k, \alpha)$ is the degree to which the vectors $x$ and $y$ are different on exactly $k$ components (with the modulation constant $\alpha$).

Based on the rules used to choose the perfect face, on the ideas of the famous Italian doctor, one of the founders of criminology, and also on the ideal proportions of the face described by rhinoplastic


doctors, the ratios of the distances, taken as the scientific canons of beauty in a perfect face, are calculated. The same distances are computed for each face in the dataset.
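A minimal sketch of the fuzzy Hamming distance of Eqs. (7)-(11) is given below, assuming real-valued feature vectors; the scalar non-fuzzy cardinality of Eq. (10) is also included, since a single number is convenient for ranking the referential faces. This is an illustrative transcription, not the authors' code.

```python
import numpy as np

def fuzzy_hamming(x, y, alpha=1.0):
    """FHD membership function: the degree to which x and y differ on
    exactly k components (index k of the returned array)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mu = 1.0 - np.exp(-alpha * (x - y) ** 2)      # Eq. (7), per component
    mu = np.sort(mu)[::-1]                         # mu_(i): i-th largest value
    mu = np.concatenate(([1.0], mu, [0.0]))        # mu_(0) = 1, mu_(n+1) = 0
    # Eq. (9): mu_Card(k) = min(mu_(k), 1 - mu_(k+1)) for k = 0..n
    return np.minimum(mu[:-1], 1.0 - mu[1:])

def ncard(x, y, alpha=1.0):
    """Non-fuzzy cardinality (Eq. (10)): count of clearly different components."""
    d = 1.0 - np.exp(-alpha * (np.asarray(x, float) - np.asarray(y, float)) ** 2)
    return int(np.sum(d > 0.5))

# Hypothetical usage: pick the referential face with the smallest distance.
print(fuzzy_hamming([0.2, 0.5, 0.9], [0.2, 0.9, 0.9]))
print(ncard([0.2, 0.5, 0.9], [0.2, 0.9, 0.9]))
```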

Fig. 1. Photos of Colgate, an 18-year-old Briton who won a beauty pageant where the contestants were judged on scientific criteria. The scientific rules used, from left to right and from top to bottom: AB=1/5CD, AB=1/10CD, AB=1/5CD, AB=1/2CD, BC=CD=DE=1/3BE; AB=1/2BC, CD=1/2AB, AB=5 eyes, nose surface <= 5% of total facial surface

Fig. 1 illustrates the scientific rules used to select the beauty queen. The different distances used to calculate the proportions of the face are inspired by the canons of a perfect face. We define the following distances:

d0: the face width
d1: the distance between the end of the right eye and the right end of the face
d2: the length of the right eye
d3: the distance between the eyes
d4: the length of the left eye
d5: the distance between the end of the left eye and the left end of the face
d6: the distance between the centers of the pupils
d7: the mouth length
d8: the distance between the eyes
d9: the nose width
d10: the mouth length
d11: the jaw length
d12: the distance between the eyes and the last point of the head
d13: the distance between the eyes and the chin
d14: the distance between the center of the forehead and the last point of the head
d15: the distance between the center of the forehead and the nose
d16: the distance between the nose and the chin
d17: the distance between the eye and the eyebrow
d18: the length of one eye
d19: the distance between the low lip and the chin
d20: the facial length


4. Approach

Our system has two modes; in both, the input facial emotion photo and all the facial emotion photos of the dataset are converted to gray level, resized and cropped to 200×250 pixels. These dimensions are chosen because they are the proposed default in the datasets used and the dimensions used in related works. Haar cascades [11] are used for detecting the face in each photo image. The first step of the system is to pretrain and normalize all the emotional photos in the offline phase: they are transformed into gray-level images and all cropped to 200×250 pixels. The same technique is used in the online mode. After this step, we apply the famous Viola-Jones algorithm to detect the faces in the images. The following process locates the 68 landmark points [8] in each face. These 68 points are the parameters of our descriptor, which extracts an identity for each face via the calculation of the ratios of the perfect face. A vector groups these harmonious distances in order; this vector represents a real proportionality with any other similar vector. The ratios of these distances in the vector are stored as already detailed in this section. An overview of our proposed Fuzzy Hamming Distance based framework is shown in Fig. 3. In the online process of the Facial Emotion Recognition System, given a facial emotion, sixteen features are extracted; the series of these characteristics composes a vector of real values, considered as an identifier of the face from which the values have been extracted and calculated. In the offline process of the Facial Emotion Recognition System, we group the dataset of facial emotion photos into seven data subsets based on the kind of emotion represented. The same distances (used in the online process) are extracted and calculated for each facial emotion photo in each data subset of the facial emotion dataset. We generate seven referential faces, suitable for each kind of facial emotion, based on the classical means or averages: arithmetic mean, contraharmonic mean, geometric mean, harmonic mean and quadratic mean; we use the landmark points to generate all the referential faces. After such a transformation of the facial emotion photos and referential faces into vectors of real values, facial emotion recognition becomes straightforward: we can compare the facial emotion with the referential facial emotions using the Fuzzy Hamming Distance. In fact, we first compute the landmark points for each facial emotion image and for all the referential faces; the face ratios of each one are then used as feature vectors for the final classification and recognition. The detailed algorithm can be summarized as follows.

1. Pretrain and normalize all the facial emotion photos.
2. Extract the landmark points of the facial emotion photos according to the 68_face_landmarks model.
3. Calculate the distances d0 to d20 as defined in Section 3; the ratios of these distances provide the features x1 to x14.
4. Calculate S, the surface of the nose (Fig. 2).

Fig. 2. The principle used for the calculation of the surface of the nose S: the Manhattan distance is used to calculate the sides of the triangle encompassing the nose

The Manhattan distance is used here to calculate the sides a, b, c of the triangle encompassing the nose:

$$S = \frac{1}{4}\sqrt{(a + b + c)(a + b - c)(-a + b + c)(a - b + c)} \quad (12)$$

We define: $x_{15} = S$.

5. Let E be the tragus center of the ear, I the center between the eyebrows, F the center of the forehead, C the chin and N the dorsum of the nose. The following angles are calculated:

$$\theta_1 = \widehat{FEI} = \arccos\frac{EF^2 + EI^2 - FI^2}{2 \cdot EF \cdot EI} \quad (13)$$

$$\theta_2 = \widehat{IEN} = \arccos\frac{EN^2 + EI^2 - NI^2}{2 \cdot EN \cdot EI} \quad (14)$$

$$\theta_3 = \widehat{NEC} = \arccos\frac{EN^2 + EC^2 - NC^2}{2 \cdot EN \cdot EC} \quad (15)$$

$$x_{16} = \frac{\theta_1 + \theta_2}{\theta_3} \quad (16)$$

Then:

$$x_{16} = \frac{\arccos\frac{EF^2 + EI^2 - FI^2}{2 \cdot EF \cdot EI} + \arccos\frac{EN^2 + EI^2 - NI^2}{2 \cdot EN \cdot EI}}{\arccos\frac{EN^2 + EC^2 - NC^2}{2 \cdot EN \cdot EC}} \quad (17)$$
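Assuming the landmark points E, I, F, C, N have already been extracted as 2-D coordinates (e.g., with a 68-landmark detector), the features x15 and x16 of Eqs. (12)-(17) can be sketched as follows; the helper names are ours, not the authors', and Euclidean distances are assumed for the angle computations.

```python
import numpy as np

def manhattan(p, q):
    """Manhattan distance, used for the sides of the nose triangle."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def nose_surface(a, b, c):
    """Eq. (12): Heron-style area from the three side lengths (feature x15)."""
    return 0.25 * np.sqrt((a + b + c) * (a + b - c) * (-a + b + c) * (a - b + c))

def angle_at(E, P, Q):
    """Law-of-cosines angle at E between points P and Q, as in Eqs. (13)-(15)."""
    ep = np.linalg.norm(np.subtract(E, P))
    eq = np.linalg.norm(np.subtract(E, Q))
    pq = np.linalg.norm(np.subtract(P, Q))
    return np.arccos((ep**2 + eq**2 - pq**2) / (2 * ep * eq))

def x16(E, F, I, N, C):
    """Eq. (16): (theta1 + theta2) / theta3."""
    return (angle_at(E, F, I) + angle_at(E, I, N)) / angle_at(E, N, C)
```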



Fig. 3. An overview of our proposed framework based on the fuzzy Hamming distance with face ratios for facial emotion recognition. Given an input facial emotion: the image is converted to gray level and cropped to 200×250 (the standard size chosen in the related database), the face is detected and 68 landmark points are localized. Sixteen features are calculated, based on the proportions of the perfect face, and an array of them is generated. These steps are shared by the two modes of the facial emotion recognition system (FER). In the offline process of the FER, the system associates the right kind of emotion with each input image: the parameters of the fuzzy Hamming distance are the features extracted from the facial emotion image and from the referential faces; on the other side of the system, the vector generated from the facial emotion is used to recognize its kind of emotion. The output of our FER is the probe kind of emotion

6. Create the vector $V = (x_i),\ i \in \{1, \ldots, 16\}$.
7. Compute the FHD between $V_i$ (the vector of the input facial emotion photo) and each vector $V_f$ in the list of referential-face vectors. Their FHD is the same as that between the vector 0 and $|x - y|$; therefore, the FHD between x and y is the cardinality of the fuzzy difference set $D(|x - y|, 0)$.
8. Recognition: the output is the kind of emotion of the referential face giving the minimum distance.

5. Experiments and Results

To demonstrate the effectiveness of the proposed method, we acquired a dataset of emotional facial expression pictures: the WSEFEP [11], the Warsaw Set of Emotional Facial Expression Pictures. The WSEFEP comprises the pictures that received the highest recognition marks (e.g., accuracy with intended display) from independent judges, totaling 210 high-quality photographs of 30 individuals. We divide the dataset into seven data subsets, one for each of the basic kinds of emotion: Anger, Disgust, Fear, Happiness, Neutral, Sadness and Surprise; Tab. 1 shows the percentage of each kind of emotion.

Tab. 1. The percentage of each kind of emotion in the WSEFEP dataset

Emotion   Anger   Disgust   Fear    Happiness   Neutral   Sadness   Surprise
%         14.28   14.28     14.28   14.28       14.28     14.28     14.28

The WSEFEP is used for training and testing our approach. Fig. 5 illustrates, step by step, the results obtained as we progress through the framework described previously. In the first step: the extracted emotional photos of the dataset; in the second: the cropped photos.

In the third: the extracted 68 face landmark points; in the fourth: the calculation of the sixteen features; and in the last step: the vector generated from the facial emotion photo, which is used to recognize the kind of facial emotion in the input photo. The output of the FER is the probe kind of emotion. We generate seven referential faces, suitable for each kind of facial emotion, based on perfect face ratios and five classical averages: arithmetic mean, geometric mean, harmonic mean, contraharmonic mean and quadratic mean; the result is 35 referential faces. Fig. 4 shows our referential face landmark photos for each type of average; the images for the usual means are in blue and those for the author means are in red. Using our new method, we performed five experiments according to the average used during the generation of the referential face. In each one of them, we

Fig. 4. Our referential 68 face-emotion landmarks, attributed to their kind of emotion and to the kind of mean used to calculate them (rows: Anger, Disgust, Fear, Happiness, Neutral, Sadness, Surprise; columns: arithmetic, geometric, harmonic, contraharmonic and quadratic mean). Each model contains: "Mouth", "Right_Eyebrow", "Left_Eyebrow", "Right_Eye", "Left_Eye", "Nose" and "Jaw". The images for the usual means are in blue and those for the author means are in red.

compare the input facial emotion image (its features) with the seven referential faces, using the Fuzzy Hamming distance. The output probe kind of emotion is the one proper to the referential face having the minimal distance. Tab. 2 shows the cumulative match scores for our approach based on the Fuzzy Hamming Distance (FHD) and Average Face Ratios; the result can be considered as a benchmark for facial emotion recognition systems to compare against. All

experimental results of the tests are shown in Fig. 6. The cumulative match score is used to evaluate the performance of the algorithms; it measures the percentage of the probe emotion. Tab. 2 reports the facial emotion recognition accuracies using five different methods: arithmetic mean, geometric mean, harmonic mean, contraharmonic mean and quadratic mean. Our algorithm proves that the recognition rate reaches more than 63% for the anger



Fig. 5. The process of our FER. Line 1: extract of the dataset photos / input facial emotion; line 2: cropped emotional photos; line 3: extracted 68 face landmark points; line 4: calculation of the sixteen features; line 5: the vector generated from the facial emotion photos, which is used to recognize the kind of emotion. The output of the FER is the probe kind of emotion of the input facial photos

emotion, more than 46% for the disgust and fear emotions, more than 93% for the happiness emotion, more than 73% for the neutral emotion, 50% for the sadness emotion and more than 86.67% for the surprise emotion. Fig. 6 shows the cumulative match scores for our approach: the x axis represents the kind of emotion and the y axis the recognition rate. The results clearly demonstrate the superiority of our algorithm in recognizing the happiness emotion, while the lowest recognition rate is that of the sadness emotion. The quadratic mean helps to better recognize the happiness emotion, but is not able to better recognize the disgust and sadness emotions. The contraharmonic mean helps to better recognize the anger emotion. The geometric mean helps to better recognize the disgust and fear emotions. The arithmetic mean helps to better recognize the sadness emotion, and the neutral and surprise emotions are recognized with the same rate for all the averages. We tested our approach with five different facial emotion recognition methods using the classical averages and the Fuzzy Hamming distance; in these methods, the cumulative match score proves the performance of the algorithms. The recognition rate reaches more than 93% on the WSEFEP dataset. The algorithmic complexity of our approach is O(n).

Fig. 6. Comparison of cumulative match scores between our various facial emotion recognition methods using five classical averages

6. Conclusion

This paper proposes a new geometrical method for facial emotion recognition. The method is based on the Fuzzy Hamming Distance and the referential face ratios. We used sixteen features based on the distance between different portions of the perfect face. We tested our method on the WSEFEP dataset and the results are very satisfactory. Our work is inspired by the recent successful methods that showed that relatively


simple features could be used to give good performance in a Fuzzy Hamming Distance based framework. Future work will include a decrease in the number of features used.

Tab. 2. The cumulative match scores for our approach based on Fuzzy Hamming Distance (FHD) and Average Face Ratios (AFR)

Averages              Anger   Disgust   Fear    Happiness   Neutral   Sadness   Surprise
Arithmetic Mean       50.00   36.67     43.34   86.67       73.34     50.00     86.67
Geometric Mean        56.67   46.67     46.67   90.00       73.34     33.34     86.67
Harmonic Mean         60.00   40.00     36.67   93.34       73.34     36.67     86.67
Contraharmonic Mean   63.34   36.67     40.00   93.34       73.34     36.67     86.67
Quadratic Mean        60.00   36.67     40.00   93.34       73.34     33.34     86.67

AUTHORS

Khalid Ounachad* – Department of Informatics, Faculty of Sciences, Ibn Tofail University, Kenitra, Morocco, e-mail: khalid.ounachad@uit.ac.ma.

Mohamed Oualla – Department of Informatics, Faculty of Sciences, Ibn Tofail University, Kenitra, Morocco, e-mail: mohamedoualla76@gmail.com.

Abdelalim Sadiq – Department of Informatics, Faculty of Sciences, Ibn Tofail University, Kenitra, Morocco, e-mail: a.sadiq@uit.ac.ma.

Abdelghani Souhar – Department of Informatics, Faculty of Sciences, Ibn Tofail University, Kenitra, Morocco, e-mail: houssouhar@gmail.com. *Corresponding author

References


[1] P. Ekman, W. V. Friesen, M. O'Sullivan, A. Chan, I. Diacoyanni-Tarlatzis, K. Heider, R. Krause, W. A. LeCompte, T. Pitcairn and P. E. Ricci-Bitti, "Universals and cultural differences in the judgments of facial expressions of emotion", Journal of Personality and Social Psychology, vol. 53, no. 4, 1987, 712–717, DOI: 10.1037//0022-3514.53.4.712.
[2] A. Bookstein, S. T. Klein and T. Raita, "Fuzzy Hamming Distance: A New Dissimilarity Measure". In: G. M. Landau and A. Amir (eds.), Combinatorial Pattern Matching, 2001, 86–97, DOI: 10.1007/3-540-48194-X_7.


[3] K. Ounachad, A. Sadiq and A. Souhar, "Fuzzy Hamming Distance and Perfect Face Ratios Based Face Sketch Recognition". In: 2018 IEEE 5th International Congress on Information Science and Technology (CiSt), 2018, 317–322, DOI: 10.1109/CIST.2018.8596665.
[4] M. Ionescu and A. Ralescu, "Fuzzy Hamming Distance Based Banknote Validator". In: The 14th IEEE International Conference on Fuzzy Systems, 2005. FUZZ '05, 2005, 300–305, DOI: 10.1109/FUZZY.2005.1452410.
[5] M. Ionescu, "Image clustering for a fuzzy hamming distance based CBIR system". In: Proceedings of the Sixteenth Midwest Artificial Intelligence and Cognitive Science Conference, 2005, 102–108.
[6] M. Ionescu and A. Ralescu, "Fuzzy hamming distance in a content-based image retrieval system". In: 2004 IEEE International Conference on Fuzzy Systems, vol. 3, 2004, 1721–1726, DOI: 10.1109/FUZZY.2004.1375443.
[7] G. S. Shehu, A. M. Ashir and A. Eleyan, "Character recognition using correlation hamming distance". In: 2015 23rd Signal Processing and Communications Applications Conference (SIU), 2015, 755–758, DOI: 10.1109/SIU.2015.7129937.
[8] S. Xiao, S. Yan and A. A. Kassim, "Facial Landmark Detection via Progressive Initialization". In: 2015 IEEE International Conference on Computer Vision Workshop (ICCVW), 2016, 986–993, DOI: 10.1109/ICCVW.2015.130.
[9] K. Ounachad, M. Oualla, A. Souhar and A. Sadiq, "Structured learning and prediction in face sketch gender classification and recognition", International Journal of Computational Vision and Robotics, vol. 10, no. 6, 2020, DOI: 10.1504/IJCVR.2020.110645.
[10] R. Chellappa, C. L. Wilson and S. Sirohey, "Human and machine recognition of faces: a survey", Proceedings of the IEEE, vol. 83, no. 5, 1995, 705–741, DOI: 10.1109/5.381842.
[11] M. Olszanowski, G. Pochwatko, K. Kuklinski, M. Scibor-Rylski, P. Lewinski and R. K. Ohme, "Warsaw set of emotional facial expression pictures: a validation study of facial display photographs", Frontiers in Psychology, vol. 5, 2015, DOI: 10.3389/fpsyg.2014.01516.



Towards a New Deep Learning Algorithm Based on GRU and CNN: NGRU Submitted: 26th June 2019; accepted: 25th March 2020

Abdelhamid Atassi, Ikram el Azami

DOI: 10.14313/JAMRIS/4-2020/44

Abstract: This paper describes our new deep learning system, based on a comparison between GRU and CNN. We start with a first system that uses a Convolutional Neural Network (CNN), which we compare with a second system that uses a Gated Recurrent Unit (GRU). Through this comparison we propose a new system based on the positive points of the two previous systems; the new system therefore takes the right choice of hyper-parameters recommended by the authors of both systems. Finally, we propose a method to apply this new system to datasets in different languages (used especially in social networks).

Keywords: Convolutional Neural Network, CNN, Gated Recurrent Unit, GRU, SemEval, Twitter, word2vec, Keras, TensorFlow, Adadelta, Adam, soft-max, deep learning

1. Introduction

The vast use of micro-blogs such as Twitter, with more than 500 million tweets per day, makes sentiment analysis the main way to deal with e-reputation, and many problems have to be taken into account, namely: 1) the informal language used by the users; 2) users can switch between languages; 3) users can use multiple languages in one word or one sentence; 4) emoticons; 5) hashtags; 6) usernames used to call or notify other users; 7) URLs; 8) images; 9) videos. This mixture requires a pre-treatment to normalize the text before starting the learning for both systems, the first using a CNN model [1, 2] and the second a GRU model [3]. We compare the two approaches to sentiment analysis and propose a new method, "New GRU" (NGRU), for a more efficient system applied to Arabic tweets. The Arabic language was introduced in SemEval-2017 because of the large number of Arabic tweets in recent years; in our case, however, we have the Moroccan dialect and not the official Arabic language, which leads us to collect the tweets and assign their polarity manually.

2. Data Preparation

Both systems use word embeddings, but differently: the first, CNN-based system [4] uses word embeddings with 50M unsupervised tweets to train word2vec [5,

6] and 10M supervised tweets to train the CNN, while the second, GRU-based system uses a first word embedding trained on 20.5M tweets and a second set obtained by training on supervised data using another GRU model, and adds a method for splitting hashtags and inserting them into the body before forwarding the data to this step [7]. The amount of training data is therefore very important, without forgetting the quality obtained by treating the tweets before the learning phase; CNN has taken a step forward with the amount of data, while there is parity in the pre-treatment of the text, e.g., suppression of words without value and treatment of the hashtags with tokenizing. Fig. 1 shows the architecture of our deep learning model: our method combines the last two methods, CNN and GRU, and we add three types of treatment on the input tweets. Manual sentiment analysis: the user assigns the polarization manually to the data. Emoticon treatment: we apply a treatment on tweets containing emoticons to assign polarities, e.g., :), :-), :=) → happy emotions (+); :(, :-(, :=( → sad emotions (-). The third type takes the results of the last two types and adds new tweets, assigning polarity by similarity.

Fig. 1. The architecture of the NGRU deep learning model

Collecting more than 20k tweets in different languages takes a lot of time, and assigning a polarity to these tweets takes even more. This is why we do not have the data today; they are in preparation.

3. The Architecture of the GRU, CNN and NGRU Deep Learning Models

The GRU layer is the core of the GRU network; it is more computationally efficient than CNN models [8]


and it can capture long semantic patterns without tuning model parameters, unlike CNN models, which depend on the length of the convolutional feature maps to capture long patterns; it thereby achieved performance superior to CNNs [7]. The network architecture is composed of a word-embeddings layer, a merge layer, dropout layers, a GRU layer, a hyperbolic tangent (tanh) layer and a soft-max classification layer. The CNN architecture is fundamentally inspired by the architectures used in [1, 2] for various sentence classification tasks and requires a large corpus. Unlike [2], which presents an architecture with several layers of convolutional feature maps, the author of [4] adopts a single-level architecture, which has been shown in [1] to perform equally well. Our architecture (Fig. 1) is inspired by the GRU system [7], with changes in the input data; we use the Adadelta method for the learning rate and the Adam method for the optimization of the weights, and NGRU will be applied to tweets in different languages.
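A minimal Keras sketch of a GRU-based classifier of the kind described above is given below. The layer sizes and the vocabulary parameters are illustrative assumptions (the paper does not publish its code); only the embedding dimension of 100 comes from Section 4, and the merge layer of [7] is omitted for brevity.

# Illustrative Keras sketch: embeddings -> dropout -> GRU -> tanh -> soft-max.
from tensorflow.keras import Sequential, layers

vocab_size, seq_len, n_classes = 20000, 50, 3  # assumed values

model = Sequential([
    layers.Embedding(vocab_size, 100, input_length=seq_len),
    layers.Dropout(0.5),
    layers.GRU(128),                       # assumed number of units
    layers.Dense(64, activation="tanh"),   # hyperbolic tangent layer
    layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()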

4. Network Parameters and Training


Before addressing the weights, let us start with the hyper-parameters. For the CNN system, the author uses stochastic gradient descent to train the network, the back-propagation algorithm to compute the gradients, and Adadelta [9] to automatically tune the learning rate, with the number of convolutional feature maps set to 300. The second system, GRU, uses Adam [10], which is new and computationally efficient for optimizing the weights; all its experiments were developed using Keras with a word-embedding dimension of 100. We use TensorFlow, an open-source software library used for research and production at Google, to implement our system as quickly as possible and to focus on the architecture, performance tuning and parameter optimization. The aim is not to implement both methods but to use them, since they are already implemented in the TensorFlow framework. According to [9], Adadelta requires no manual adjustment of the learning rate and is insensitive to hyper-parameters: it adjusts them without the intervention of the user. Even though [11] shows in "An overview of gradient descent optimization algorithms" that the different gradient descent algorithms are closely related, Adadelta proves faster than SGD, Momentum, NAG, Adagrad and RMSProp; see Fig. 2. Regarding the Adam (Adaptive Moment Estimation) method, which we use to optimize the weights, it combines the benefits of momentum with those of RMSProp and differs mainly in two ways. First, the first-order moment moving-average coefficient is decayed over time. Second, because the first- and second-order moment estimates are initialized to zero, a bias correction is used to counteract the resulting bias towards zero.
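For concreteness, the two optimizers discussed above can be instantiated in Keras with their default settings, as in the following minimal sketch (purely illustrative):

from tensorflow.keras.optimizers import Adadelta, Adam

adadelta = Adadelta()   # adaptive learning rate, no manual tuning [9]
adam = Adam()           # momentum plus RMSProp-style second moments,
                        # with bias correction of the moment estimates [10]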


Fig. 2. A comparison of the different methods, which shows the speed of the Adadelta method

5. Initializing the Model Parameters

The CNN system is initialized from word embeddings already trained on unsupervised data, while the GRU system is trained with two sets of word embeddings: the first obtained from unsupervised data and the second from another GRU. The NGRU requires three data sources and uses both methods: Adadelta for the learning rate and Adam for the optimization of the weights.

6. Evaluation

The CNN system achieved very good results in SemEval-2015, which demonstrates that the amount of data matters most, together with adjustments of the hyper-parameters and the pre-treatment done before training. The GRU system is the newest type of deep learning model; its results, however, were weak and its hyper-parameters insufficient, even though the Adam method was used for optimizing the weights. Our model requires in part a manual treatment; for this reason we do not yet have results.

7. Conclusion

We propose to use the best points of the two systems, feeding a large amount of data to train our new model; we use the Adadelta method for the learning rate and the Adam method for the optimization of the weights, and this proposal will be applied to tweets containing a mix of Modern Standard Arabic and Dialectal Arabic.

AUTHORS Abdelhamid Atassi* – LaRI Laboratory, Faculty of Sciences, Ibn Tofail University, Kénitra, Morocco, email: abdelhamid.atassi@gmail.com.



Ikram el Azami – LaRI Laboratory, Faculty of Sciences, Ibn Tofail University, Kénitra, Morocco, e-mail: akram_elazami@yahoo.fr.
*Corresponding author

References
[1] Y. Kim, "Convolutional Neural Networks for Sentence Classification", http://arxiv.org/abs/1408.5882. Accessed on: 2021-02-03.
[2] N. Kalchbrenner, E. Grefenstette and P. Blunsom, "A Convolutional Neural Network for Modelling Sentences". In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014, 655–665, DOI: 10.3115/v1/P14-1062.
[3] D. Bahdanau, K. Cho and Y. Bengio, "Neural Machine Translation by Jointly Learning to Align and Translate", http://arxiv.org/abs/1409.0473. Accessed on: 2021-02-03.
[4] A. Severyn and A. Moschitti, "UNITN: Training Deep Convolutional Neural Network for Twitter Sentiment Classification". In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), 2015, 464–469, DOI: 10.18653/v1/S15-2079.
[5] R. Collobert and J. Weston, "A unified architecture for natural language processing: deep neural networks with multitask learning". In: Proceedings of the 25th International Conference on Machine Learning, 2008, 160–167, DOI: 10.1145/1390156.1390177.
[6] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado and J. Dean, "Distributed Representations of Words and Phrases and their Compositionality", Advances in Neural Information Processing Systems, vol. 26, 2013, 3111–3119.
[7] M. Nabil, A. Atyia and M. Aly, "CUFE at SemEval-2016 Task 4: A Gated Recurrent Model for Sentiment Classification". In: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), 2016, 52–57, DOI: 10.18653/v1/S16-1005.
[8] S. Lai, L. Xu, K. Liu and J. Zhao, "Recurrent convolutional neural networks for text classification". In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015, 2267–2273.
[9] M. D. Zeiler, "ADADELTA: An Adaptive Learning Rate Method", http://arxiv.org/abs/1212.5701. Accessed on: 2021-02-03.
[10] D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization", http://arxiv.org/abs/1412.6980. Accessed on: 2021-02-03.
[11] S. Ruder, "An overview of gradient descent optimization algorithms", http://arxiv.org/abs/1609.04747. Accessed on: 2021-02-03.




Some Efficient Algorithms to Deal With Redundancy Allocation Problems Submitted: 26th June 2019; accepted: 25th March 2020

Mustapha Es-Sadqi, Abdellah Idrissi, Ahlem Benhassine

DOI: 10.14313/JAMRIS/4-2020/45

Abstract: In this paper, we discuss some algorithms for better optimizing redundancy allocation problems in multi-state systems. The goal is to find the optimal configuration of the system that maximizes availability and minimizes investment cost. Availability is evaluated using the universal generating function. In a first step, our contribution consists in improving the genetic algorithm. In a second step, within the framework of Constraint Programming, we propose a new optimization method based on Forward Checking as a solver. Finally, we use the top-k method, which helps us get the best k elements, with the highest availability, from all possible values. In comparison with the chosen reference study, our methods yield better results that satisfy the constraints of the problem in a shorter time.

Keywords: Redundancy Allocation Problem, Constraint Programming, Forward Checking, Optimization, Genetic Algorithm, Top-k

1. Introduction


In our study we treat the electrical network, whose structure is multi-state and series-parallel. The primary function of a power grid is to provide electricity to its customers at optimal operating costs, with assured quality and reasonable continuity at all times [1]. To plan an electrical system, it is imperative to identify the variables and the constraints that model it [2]. Because of their sensitivity to faults, these systems have increasing complexity, so it is necessary to improve their reliability and to install redundant components in parallel [3]. As we know, the energy supply process is a high-level complex installation (production, transport, distribution and consumption). The process requires several interconnected subsystems to achieve the expected high-level objectives. The components used in each step often work in essentially different operating modes, characterized by varying loads and performance, demand management of a different nature, or different environmental conditions [4]. These modes result in different failure rates and life distributions. However, in terms of reliability analysis, this problem is not formally solved without quantifying the effect of multiple loads on system reliability [5]. The architecture of these

systems is a series-parallel structure consisting of several components whose performance states vary from nominal to complete failure; the state of the global system is thus described by the states of its components, and such systems are called multi-state systems (MSS). The most efficient tool for this study is the universal generating function (UGF) [6].

Fig. 1. Architecture of an electrical parallel-series system [6]

In the next section, we describe the mathematical formulation of the problem; our new methods (and the various parameters associated with them) are reported in Section 3, followed by the experimental results and their comparison with those of the literature. A discussion and a conclusion complete the text.

2. Mathematical Formulation of the Problem

In this paper, we adopt the mathematical formulation detailed by the authors in [7, 8, 9, 10]. A series-parallel multi-state system is often made up of n subsystems in series. Each subsystem i (1 ≤ i ≤ n) contains ki parallel components available in Vi versions. The version of each component is v, such that 1 ≤ v ≤ Vi. All these components are characterized by their availability Aiv, their performance Giv and their market cost Civ. The structure of each subsystem i is defined by the number Kiv of parallel components of each version [10].

Fig. 2. Series-parallel system composed of n subsystems each with ki components



2.1. Cost

Since the cost function is linear, the structure of the whole system is defined by the set of vectors k_i = {K_{iv}}, with 1 ≤ i ≤ n and 1 ≤ v ≤ V_i. Given the set of vectors {k_1, k_2, ..., k_n}, the cost of the entire system is given by the formula:

C = \sum_{i=1}^{n} \sum_{v=1}^{V_i} K_{iv} C_{iv}    (1)
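To make equation (1) concrete, a small Python sketch is given below (the authors' implementation is in Java 8; this rendering and its data layout are ours, for illustration only).

# Sketch of equation (1): total system cost summed over subsystems i
# and versions v of K_iv * C_iv.
def system_cost(K, C):
    """K[i][v]: number of components of version v in subsystem i.
       C[i][v]: unit cost of version v in subsystem i."""
    return sum(K[i][v] * C[i][v]
               for i in range(len(K))
               for v in range(len(K[i])))

# Example: 2 subsystems, 2 versions each.
K = [[2, 0], [1, 3]]
C = [[0.5, 0.7], [1.2, 0.3]]
print(system_cost(K, C))  # 2*0.5 + 1*1.2 + 3*0.3 = 3.1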

2.2. Availability and Demand

We define the availability of a component as its ability to be operational at a given time t. It can be formulated mathematically by the following formula:

a(t) = Pr[S functioning at time t]    (2)

The system must satisfy a demand g_0, predicted by a cumulative curve often distributed over four periods; to meet this demand, the availability of the system must be greater than or equal to it [11]. If a(t) represents the instantaneous availability of the system, g(t) its performance and g_0 its required demand over a period T_m, we can write, using (2):

a = Pr[g(t) ≥ g_0]    (3)

The demand described above is used to divide the operation period T into M time intervals T_m (1 ≤ m ≤ M); the availability of the MSS can then be written as follows:

A = \frac{1}{\sum_{m=1}^{M} T_m} \sum_{m=1}^{M} \Pr(G(t) \ge G_m)\, T_m    (4)

In engineering, for a given system S, the availability A is related to the loss-of-load probability index, defined by [12, 13] as LOLP = Pr(G < G_0). This allows us to write:

LOLP = 1 − A    (5)

LOLP represents the probability that the system cannot provide the given demand G_0. In the case of the components with total failure that we consider in this study, each component j is characterized by its nominal performance G_j and its availability A_j. Thus we can write:

Pr{G = G_j} = A_j,    Pr{G = 0} = 1 − A_j    (6)
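As an illustration of equation (4), the following Python sketch computes the duration-weighted availability over the demand periods (our code, not the authors'; the probability model passed in is purely illustrative, whereas in the paper Prob(G(t) ≥ G_m) is evaluated with the UGF).

# Sketch of equation (4): availability of an MSS over M demand periods.
def mss_availability(demand_levels, durations, prob_meet):
    """prob_meet(Gm) stands for Prob(G(t) >= G_m)."""
    total = sum(durations)
    return sum(prob_meet(g) * t
               for g, t in zip(demand_levels, durations)) / total

# Example with the demand curve of Tab. 2 (demand in % of peak).
levels    = [100, 80, 50, 20]
durations = [4203, 788, 1228, 2536]
print(mss_availability(levels, durations,
                       lambda g: 0.99 if g < 100 else 0.95))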

2.3. Universal Generating Function (UGF)

The authors in [10] have presented in detail the UGF technique, which we describe below. UGFs were introduced by I. Ushakov in 1986, and since then many scientists, such as G. Levitin and A. Lisnianski, have proved the effectiveness of the method [14–16].


Let n be the number of discrete and independent random variables X_1, …, X_n, and let us assume that each variable X_i can be represented by the vectors x_i and p_i, such that:

x_i = (x_{i1}, …, x_{ik_i}),   p_i = (p_{i1}, …, p_{ik_i}),   s.t.   p_{ij} = \Pr\{X_i = x_{ij}\},  j = 1, …, k_i    (7)

The 𝒵-transform of the variable X_i is defined by the distribution function of polynomial form:

u_i(z) = \sum_{j=1}^{k_i} p_{ij} Z^{x_{ij}}    (8)

Let ⊗_f be a multiplicative operator; it acts on the functions u_i(z) as follows:

\otimes_f (u_1(z), …, u_n(z)) = \sum_{j_1=1}^{k_1} \sum_{j_2=1}^{k_2} \cdots \sum_{j_n=1}^{k_n} \left( \prod_{i=1}^{n} p_{ij_i} \right) Z^{f(x_{1j_1}, …, x_{nj_n})}    (9)

If each random variable is identified with a given performance, as is the case in our study (X_i = G_i), then, according to (9), the 𝒰-function resulting from the combination of a set of m components is:

\otimes_f (u_1(z), u_2(z), …, u_m(z)) = \sum_{j_1=1}^{k_1} \sum_{j_2=1}^{k_2} \cdots \sum_{j_m=1}^{k_m} \left( \prod_{i=1}^{m} p_{ij_i} \right) Z^{f(G_{1j_1}, …, G_{mj_m})}    (10)

Note that f(G_1, G_2, …, G_m) represents the equivalent productivity of the m components. When these components are connected in series, the function takes the following form:

f(G_1, G_2, …, G_m) = min(G_1, G_2, …, G_m)    (11)

and in the case where the m components are connected in parallel, the function becomes:

f(G_1, G_2, …, G_m) = \sum_{i=1}^{m} G_i    (12)

To satisfy a requested performance level, we introduce a satisfaction operator φ defined as follows:

\varphi(p\,Z^{G_i}\,Z^{-G_0}) = \varphi(p\,Z^{G_i - G_0}) = \begin{cases} p & \text{if } G_i \ge G_0 \\ 0 & \text{if } G_i < G_0 \end{cases}    (13)

This operator also satisfies the following property:

\varphi\left(\sum_{i=1}^{I} p_i Z^{G_i - G_0}\right) = \sum_{i=1}^{I} \varphi\left(p_i Z^{G_i - G_0}\right)    (14)

The objective of this operator is to eliminate all the terms that do not satisfy the given performance demand, i.e., those with G_i < G_0.

2.4. Application of the UGF to the MSS

To assess the availability of an MSS, the operator 𝒮 is introduced for the series composition and the operator 𝒫 for the parallel one. These operators determine the polynomial U(z) for a group of elements.

Parallel components. When the performance G is linked to productivity or to system capacity, the overall performance of parallel components is the sum of their performances, according to equation (12). Therefore the 𝒰-function u_par(z) of a component i containing X_i elements in parallel can be obtained using the operator 𝒫:

u_par(z) = 𝒫{u_1(z), u_2(z), …, u_{X_i}(z)},  where  𝒫(G_1, G_2, …, G_n) = \sum_{i=1}^{X_i} G_i

Thus, using the properties of 𝒰-functions given by equations (8), (10) and (14), we find for two parallel components:

𝒫(u_1(z), u_2(z)) = 𝒫\left\{\sum_{i=1}^{n} p_i Z^{a_i}, \sum_{j=1}^{m} q_j Z^{b_j}\right\} = \sum_{i=1}^{n} \sum_{j=1}^{m} p_i q_j Z^{a_i + b_j}

Here a_i and b_j are physically interpreted as the successive performances of the two elements, n and m are the numbers of levels of these performances, and p_i and q_j represent the steady-state probabilities of each level. We can see that the operator 𝒫 performs a simple multiplication of the individual 𝒰-functions of each component:

U_par(z) = \prod_{i=1}^{X_i} u_i(z)    (15)

Series components. In the case of s elements in series, the functionality of the system is dictated by the element with the lowest performance, which acts as a bottleneck for the system. The 𝒰-function u_ser(z) is given using the operator 𝒮, which also performs simple multiplications of the individual 𝒰-functions of each component, with:

𝒮(G_1, G_2, …, G_n) = min{G_1, G_2, …, G_n}

So, for two elements in series, equations (8), (9) and (14) give:

u_ser(z) = 𝒮(u_1(z), u_2(z)) = 𝒮\left\{\sum_{i=1}^{n} p_i Z^{a_i}, \sum_{j=1}^{m} q_j Z^{b_j}\right\} = \sum_{i=1}^{n} \sum_{j=1}^{m} p_i q_j Z^{\min\{a_i, b_j\}}    (16)

Series-parallel systems. The 𝒰-function of the entire series-parallel system is obtained by applying the operators 𝒫 and 𝒮 consecutively. In addition, in the case of systems with total failure, each component of version v of subsystem i satisfies, according to equation (6):

Pr{G = G_{iv}} = A_{iv},   Pr{G = 0} = 1 − A_{iv}

The 𝒰-function of each component therefore contains only two terms:

u*_i(z) = (1 − A_{iv}) Z^0 + A_{iv} Z^{G_{iv}} = 1 − A_{iv} + A_{iv} Z^{G_{iv}}

So the 𝒰-function of a component i with k_i identical elements in parallel is:

u_i(z) = [u*_i(z)]^{k_i} = [1 − A_{iv} + A_{iv} Z^{G_{iv}}]^{k_i}

Therefore, for a system consisting of n subsystems in series, each subsystem i with component versions v is modeled as:

u^i_par(z) = \prod_{v=1}^{V_i} \left(A_{iv} Z^{G_{iv}} + (1 − A_{iv})\right)^{K_{iv}}

The whole system contains n subsystems; thus, introducing the operator 𝒮, we obtain:

u_ser(z) = 𝒮{u^1_par(z), u^2_par(z), …, u^n_par(z)}    (17)

Once u_ser(z) is determined, we need to assess the probability of meeting the given demand W_0; for this, using the satisfaction operator of equation (13), we have:

Pr(G ≥ W_0) = φ(u_ser(z) Z^{−W_0})    (18)

2.5. Formulation of the Problem

The problem can be summarized as follows:
– Maximize the availability resulting from equation (18), while respecting the given demand: min{a_i, b_j} ≥ W_0;
– Minimize the cost: min C = \sum_{i=1}^{n} \sum_{v=1}^{V_i} K_{iv} C_{iv}.
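To make the u-function machinery of Sections 2.3–2.4 concrete, a small Python sketch is given below (the authors' implementation is in Java 8; this rendering, with u-functions stored as dictionaries, is ours).

# A u-function is a dict {performance G: probability p}; the composition
# operator of eq. (9) specializes to parallel (f = sum, eqs. (12)/(15))
# and series (f = min, eqs. (11)/(16)) compositions.
from functools import reduce

def compose(u1, u2, f):
    out = {}
    for g1, p1 in u1.items():
        for g2, p2 in u2.items():
            g = f(g1, g2)
            out[g] = out.get(g, 0.0) + p1 * p2
    return out

def parallel(*us):                    # operator P
    return reduce(lambda a, b: compose(a, b, lambda x, y: x + y), us)

def series(*us):                      # operator S
    return reduce(lambda a, b: compose(a, b, min), us)

def availability(u, w0):              # satisfaction operator phi, eq. (18)
    return sum(p for g, p in u.items() if g >= w0)

def component(A, G):                  # total-failure component, eq. (6)
    return {G: A, 0.0: 1.0 - A}

# Two 40-MW components in parallel, in series with a 100-MW component,
# evaluated against a demand W0 = 50.
u = series(parallel(component(0.9, 40.0), component(0.9, 40.0)),
           component(0.95, 100.0))
print(availability(u, 50.0))          # 0.81 * 0.95 = 0.7695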

3. Effective Genetic Algorithm (EGA)

Genetic algorithms attempt to simulate the process of natural evolution following the Darwinian model in



a given environment. An individual is represented by a chromosome consisting of genes that carry its hereditary characteristics; the principles of selection, crossover and mutation are inspired by the natural processes of the same names. Each individual is associated with the value of the criterion to be optimized, its fitness. We then iteratively generate populations of individuals to which we apply selection, crossover and mutation. We start by generating a random population of individuals. To go from generation k to generation k + 1, the following operations are performed. First, the population is reproduced by selection, where good individuals reproduce more than bad ones. Then, crossover is applied to pairs of individuals (the parents) in a certain proportion of the population to produce new ones (the children). A mutation operator is also applied to a certain proportion of the population. Finally, the new individuals are evaluated and integrated into the population of the next generation. Several stopping criteria are possible: the number of generations can be fixed a priori (constant time), or the algorithm can be stopped when the population no longer evolves sufficiently quickly. We apply our Effective Genetic Algorithm (EGA) to an MSS with the objectives of minimizing the cost and maximizing the availability of the given system. The EGA algorithm is the following:

EGA (inputs: file)
  Number_generation ← 0;
  Population ← Init_population(P[NUMINDIVIDUAL]);
  Evalpopulation(P[Number_generation]);
  While (Number_generation < MAXNUMBER)
    PS ← selection(Population);
    PC ← crossover(PS);
    mutation(PC);
    Population ← addNewIndividualsIfDoesNotExist(Population, PC);
    Population ← RemoveWeakestIndividuals(Population);
    Number_generation++;
  End While
  Return best;
End

3.1. Encoding of Solutions

Our solutions take the form of a set of strings representing the subsystems of the system; each string is a set of integers whose length equals the number of devices of the subsystem, each integer being the id of a device.

3.2. Evaluation of Each Solution

Each time an individual is created, a fitness value is associated with it. This value is used by the selection process to favor the most suitable solutions, as it reflects the performance of the individual on our problem.


We propose two fitness functions to evaluate the solutions, the first used in EGA1 and the second in EGA2. The evaluation function of EGA1, which computes the fitness so as to satisfy the objectives of maximizing availability and minimizing cost, is the following:

Fitness_function1(System) = max(A_S × C_0 / C_S)

where C_S is the cost of the individual, A_S is its availability, and C_0 is the initial cost used as an upper bound in the constraints.

The evaluation function of EGA2, with the same objectives, is the following:

Fitness_function2(System) = max(α × A_S + β × C_S)

where α and β are the weights given respectively to the availability and the cost of the system, and satisfy α + β = 1.
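For illustration, the two fitness functions above translate directly into Python (our sketch, not the authors' Java code):

def fitness_ega1(A_s, C_s, C_0):
    # EGA1: reward high availability and low cost relative to the bound C_0.
    return A_s * C_0 / C_s

def fitness_ega2(A_s, C_s, alpha, beta):
    # EGA2: weighted combination with alpha + beta = 1.
    assert abs(alpha + beta - 1.0) < 1e-9
    return alpha * A_s + beta * C_s

print(fitness_ega1(0.999, 10.0, 15.0))       # 1.4985
print(fitness_ega2(0.999, 10.0, 0.7, 0.3))   # 3.6993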

3.3. Initial Population

The initial population is built from totally random solutions: each subsystem of the given system initially has a random number of devices, and these random solutions respect the constraints A ≥ A0, G ≥ G0 and C ≤ C0, where G is the system's performance and G0 the energy demand.

3.4. Selection

The selection process defines how many times an individual takes part in reproduction: the individuals with the best fitness values are selected more often than the others and are used in the following step, reproduction. Selection is done using the roulette-wheel scheme.

3.5. Reproduction

Reproduction, or variation, helps to find better individuals by producing new ones from the best solutions of the current population. It is carried out using two main operators: crossover and mutation. We first create the set A of the genes common to the two selected chromosomes P1 and P2, then the sets B and C, which contain respectively P1 − A and P2 − A. We add all elements of A to both children Z1 and Z2, then repeatedly choose at random from which set, B or C, the next element is taken and add it to Z1, adding the corresponding element of the other set to Z2.
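A minimal Python sketch of one plausible reading of this crossover is given below (the original layout is ambiguous about how B and C are interleaved; this rendering is ours).

import random

def crossover(p1, p2):
    a = [g for g in p1 if g in p2]        # common genes A
    b = [g for g in p1 if g not in a]     # P1 - A
    c = [g for g in p2 if g not in a]     # P2 - A
    z1, z2 = list(a), list(a)
    for gene_b, gene_c in zip(b, c):
        if random.random() < 0.5:
            z1.append(gene_b); z2.append(gene_c)
        else:
            z1.append(gene_c); z2.append(gene_b)
    return z1, z2

print(crossover([4, 4, 6, 7], [4, 4, 3, 5]))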


Fig. 3. An example of the crossover operator

After the crossover, mutations must occur with a low probability Pm. For EGA, the mutation probability depends on the length of the individual encoding, and we use swap mutation since, as mentioned before, changing the order of the integers of a system does not change anything. The new individuals are then evaluated and selected for replacement.

3.6. Termination

The termination criterion we chose for EGA is the number of generations: once this number is reached, EGA returns the best solution found.

4. Constraint Satisfaction Problem in the Four Algorithms: EFC1, EFC2, TopK-FC and GenGA-FC

4.1. Constraint Satisfaction Problem

We consider here a CSP defined by a triplet (X, D, C), where X is a set of n variables (X1, X2, ..., Xn), D their respective finite domains (D(X1), D(X2), ..., D(Xn)) and C a set of relations or constraints between these variables (a constraint on Xi1, Xi2, ..., Xik is a subset of the Cartesian product D(Xi1) × D(Xi2) × ... × D(Xik)). For an optimization problem (here, maximizing a function), we also consider a cost function f and a constraint on this cost, f(X1, X2, ..., Xn) ≤ C, where C is a constant that the optimization strategy makes evolve. We express our problem in constraint logic programming (CLP) [17] and use the ECLiPSe system [18], which implements all the classical constraints, linear (#=, #≤, ...) and others (alldistinct, element, ...), and also allows new ones to be defined simply (direct operations on domains, precise control of coroutining, ...). The min_max predicate (minimize, maximize, ...) optimizes a linear expression by integrating the resolution goal of the problem (usually variable instantiation, labeling) within a Branch & Bound (i.e., a search-tree traversal with pruning by limitation of the cost function).

4.2. Modeling the RAP Applied to a Series-Parallel System as a VCSP

As explained above, we propose a modelling of the redundancy allocation problem as a VCSP. For that, we have to define variables, domains of variables, constraints and objective functions.

i. Variables:
X = {X1, ..., Xn} = {subsystem1, ..., subsystemn, ji, Dev};

ii. Domains:
D = {DX1, ..., DXn};  Dsubsystem_i = {(Ciji, Aiji, Giji)};
DDev = {Deviji} = {1, ..., Devi,max};  and  Dji = {1, ..., ki};

where:
– n is the number of subsystems and i the index of a subsystem;
– ji is the number of devices for each subsystem i;
– {Ciji} is the set of costs (floats), where Ciji is the cost of a device ji of subsystem i;
– {Aiji} is the set of availability values (floats), where Aiji is the availability of a device ji of subsystem i;
– {Giji} is the set of performance values (floats), where Giji is the performance of a device ji of subsystem i;
– {Deviji} is the set of numbers of devices that can be chosen, where Deviji is the number of devices ji we can choose for subsystem i.

iii. Constraints:
– \sum_{j_i=1}^{k_i} Dev_{ij_i} ≤ Dev_{i,max}: this constraint ensures that the choice of devices respects the required number of devices for each subsystem of the system;
– UGF(S, G ≥ G0) ≥ A0: this constraint ensures that the solution S is available for a performance that respects the given demand; it is calculated using the UGF.

iv. Objective functions:
– Maximize the availability A, s.t. A ≥ A0, under the constraint G ≥ G0;
– Minimize the cost C, s.t. C ≤ C0.

The problem here is to conceive the configuration of a system by making a choice of components and allocating an appropriate level of redundancy.

4.3. Solver Based on Forward Checking Algorithm

To solve our problem using the VCSP model and the UGF introduced above, and to compare it with previous approaches, we adapted and extended the forward checking algorithm [19], which consists in constructing a solution by considering assignments to variables in a particular order, an order in which the constraints are satisfied. The vector, in our case, is a set of assignments of devices to subsystems. A solution vector is a set in which the device choices satisfy the constraints (cost, availability and performance). Forward checking examines partial solutions, which



are assignments to a subset of the variables, and tries to extend these partial solutions until all variables are assigned; it prevents assignments that guarantee later failure. When we consider a possible value vil for the current variable Vi, it is sufficient to look for a zero in Domain_i; hence we do not need the backwards consistency checks that are characteristic of backtracking. The price, of course, is that when we make a successful assignment to the current variable, we must check it against all outstanding values of the future variables, updating Domain as necessary. After initialization, the call of Forward-Checking(i) will print all solutions. An assignment fails if there is a domain wipe-out (DWO), which means that we have discovered that every value of some future variable is inconsistent with our choices so far. In our case, the set S = {s1, ..., sn} is the set of devices that leads to a solution respecting the objectives of performance and cost. The availability and cost of that solution will let us compare our method with some of the most relevant works of the literature. In EFC1, EFC2 and TopK-FC, we added a third constraint, which is:

\sum_{i=1}^{n} \sum_{j_i=1}^{k_i} r_i\, Dev_{i,j_i}\, C_{i,j_i} \le C_0

This constraint allows each subsystem to be optimized according to the number of devices that constitute it, and it enforces a given maximum cost for the system. Note that:

r_i = \frac{Dev_{ij_i}}{\sum_{j_i=1}^{k_i} Dev_{ij_i}}
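A small Python sketch of one plausible reading of this partial-cost constraint follows (the exact grouping in the original layout is ambiguous; the function name and data layout are ours).

def respects_partial_cost(dev, cost, C0):
    """dev[i][j]:  number of devices of type j chosen for subsystem i;
       cost[i][j]: unit cost of that device type."""
    total = 0.0
    for sub_dev, sub_cost in zip(dev, cost):
        n = sum(sub_dev)
        for d, c in zip(sub_dev, sub_cost):
            r = d / n if n else 0.0   # r_i: share of this device type
            total += r * d * c
    return total <= C0

print(respects_partial_cost([[2, 1], [0, 3]],
                            [[0.5, 0.7], [1.2, 0.3]], C0=5.0))  # True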

The forward checking method, which is common to all our algorithms, can be summarized in the following algorithm:

Forward-Checking (i)
  % loop on each value of the domain of subsystem i and check
  % the constraints in order to find a solution
  For each (Ciji, Aiji, Giji) ∊ Dsubsystemi
    si ← (Ciji, Aiji, Giji);
    If Ci < ri · Devi,ji · C0 then
      If i = n then
        If UGF(S1, ..., Sn, G ≥ G0) ≥ A0 then
          Print S1, ..., Sn;
      Else
        If Check-Forward(i) then
          Forward-Checking(i+1);
        Restore(i);
        End If
      End If
    End If
  End For each
End


4.4. Extended Forward Checking 1: EFC1

Since we added the partial cost to the constraints, the number of possible values remains high, so we developed a specific method to address that need. The domain initialization method is detailed in the following algorithm:

Domain-Initialization (Devi,ji, ki)
  result : list;
  i, i1, j, k : integer;
  temp, temp1 : string;
  For i1 from 0 to Devi,ji – 1 do
    temp ← i1;   // empty string
    k ← 0;
    While (k < ki) do
      temp ← i1;
      k++;
    End while
    result.add(temp);
    j ← temp.length();
    For i from i1 + 1 to ki – 1 do
      temp1 ← temp.substring(0, j/i);
      For z from j/i to temp.length() do
        If z mod Devi,ji = 0 then
          temp1 ← temp1 + 1;
        Else temp1 ← temp1 + (z mod (Devi,ji – 1));
        End if
      End for
      result.add(temp1);
      temp ← temp1;
    End for
  End for
  Return result;
End

4.5. Extended Forward Checking 2: EFC2

The domain initialization method of this algorithm gets all the possible values respecting the partial cost and returns domains chosen randomly from that list, without exceeding a given limit. It is detailed in the following algorithm:

Domain-Initialization (i, Devi,ji, ki, Ci, limit)
  % i is the number of the subsystem
  % Ci is the partial cost
  % limit bounds the returned list: number of possible values ≤ limit · Devi,ji
  result : list;
  result ← Possible-Values(0, result, ki, Ci, i);
  If result.size() > limit · Devi,ji then
    result ← Make-to-limit(result, limit · Devi,ji);
  End if
  Return result;
End

Possible-Values (j, lst, ki, Ci, i)
  newlist : list;
  e, temp : string;


  k : integer;
  cost : float;
  If j = ki then
    Return lst;
  End if
  For each e ∊ lst do
    For k from 0 to ki – 1 do
      temp ← e + k;
      cost ← cost(temp);
      If !newlist.contains(temp) && cost ≤ Ci then
        newlist.add(temp);
      End if
    End for
  End for
  Return Possible-Values(j+1, newlist, ki, Ci, i);
End

Make-to-limit (list, limit)
  % choose randomly limit elements of the list and return the new list
End

4.6. Top-K Forward Checking: TopK-FC

The result of a ranking query is a set of objects (n-tuples in relational databases) sorted by score; each object is represented by an identifier and a score measuring its relevance and similarity to the request. The result of a ranking query is usually the set of the top k objects, most often those with the highest scores; this set is called the top-k, and the query is simply called a top-k query. A detailed study of this algorithm can be found in [20]. In this method, we used the top-k approach in our choice, which helps us get the best k elements, with the highest availability, from all possible values. The domain initialization method is detailed in the following algorithm:

Domain-Initialization (i, Devi,ji, ki, Ci, limiti)
  % i is the number of the subsystem
  % Ci is the partial cost
  % limiti bounds the returned list of possible values
  result : list;
  result ← Possible-Values(0, result, ki, Ci, i);
  If result.size() > limiti · Devi,ji then
    result ← Make-to-limit(result, limiti, Devi,ji);
  End if
  Return result;
End

Possible-Values (j, lst, ki, Ci, i)
  newlist : list;
  e, temp : string;
  k : integer;
  cost : float;


  If j = ki then
    Return lst;
  End if
  For each e ∊ lst do
    For k from 0 to ki – 1 do
      temp ← e + k;
      cost ← cost(temp);
      If !newlist.contains(temp) && cost ≤ Ci then
        newlist.add(temp);
      End if
    End for
  End for
  Return Possible-Values(j+1, newlist, ki, Ci, i);
End

Make-to-limit (list, limiti)
  % choose the limiti elements of the list with the highest availability
  % and return the new list
End

4.7. Generating GA Forward Checking: GenGA-FC

In this case, we observed that subsystems can compensate each other on cost: some subsystems may exceed their partial cost while others stay well below it, with the sum still meeting the overall cost constraint, which helps to reach solutions with the greatest availability. So we used an algorithm inspired by the initial-population generation step of the genetic algorithm. We decomposed the method into two procedures: the first gets all possible values for each subsystem under the constraint of the cost C; the second takes the resulting list and, for a certain number of iterations, chooses randomly in the generated domains, checks whether the choices respect the constraint and, if so, adds them to the result list. A sketch of these two steps is given below.
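The following Python sketch illustrates the two GenGA-FC procedures just described (the original implementation is in Java 8; the names and data layout here are ours).

import random

def possible_values(domain, C0):
    # Procedure 1: keep, for one subsystem, the values whose cost is
    # within the global bound (a subsystem may exceed its "fair share").
    return [v for v in domain if v["cost"] <= C0]

def generate_solutions(domains, C0, iterations=1000):
    # Procedure 2: draw random combinations from the generated domains
    # and keep those whose total cost respects the constraint.
    pools = [possible_values(d, C0) for d in domains]
    kept = []
    for _ in range(iterations):
        pick = [random.choice(p) for p in pools]
        if sum(v["cost"] for v in pick) <= C0:
            kept.append(pick)
    return kept

doms = [[{"cost": 2.0}, {"cost": 4.5}], [{"cost": 3.0}, {"cost": 1.5}]]
print(len(generate_solutions(doms, C0=6.0)))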

5. Experimental Results

In this section, we present and discuss the results obtained with each method described above.

5.1. Experimentation

We implemented the above algorithms in Java 8 and ran them on an i7 laptop. As inputs, we chose two tables given in the literature, so as to be able to compare our results and prove the performance of our proposed methods.

5.2. Comparisons

To demonstrate this efficiency, we conduct a comparative study with the best results obtained in the literature [21, 22].


Tab. 1. Data of available different power components technologies [21–22]

[In the original layout, Tab. 1 lists, for each of the five subsystems (MT Lines, MT Transformer, HT Line, HT Transformer, Power Units), the available device versions with their availability (around 97–99%), their performance in MW and their cost in Mln$.]

Tab. 2. Parameters of power demand curve [21–22]

Power Demand Level (%)    Duration (Hour)    Probability
100                       4203               0.480
80                         788               0.09
50                        1228               0.14
20                        2536               0.290

We present in Tab. 3 the results of our experimentation.

Tab. 3. Optimal solutions obtained by HS, AC, GA and our approaches EGA1, EGA2, EFC1, EFC2, TopK-FC and GenGA-FC

Method                       Cost C (m$)    Availability A
Harmony Search HS [21]       13.75          0.992
Ant Colony [22]              14.302         0.9906
Genetic Algorithm [21, 22]   15.87          0.992
EGA1                         10.165         0.999154
EGA2                         10.322         0.999116
EFC1                          9.795         0.999111
EFC2                         10.009         0.99915
TopK-FC                      11.148         0.999154
GenGA-FC                      9.819         0.999111

[For each method, Tab. 3 in the original also gives the optimal topology, i.e., the devices chosen for each of the five subsystems Sub1–Sub5 at each demand level.]

5.3. Discussion

As shown in Tab. 3, our methods are better with regard to both availability and cost. The topologies obtained by our methods offer more flexibility to the system. In addition, these configurations make an identical choice of components within each subsystem, which is a great advantage for component designers.

6. Conclusion

We presented in this paper different methods for solving redundancy optimization problems in multi-state systems. These methods, based on the genetic algorithm and on constraint satisfaction, including the extensions of the forward checking algorithm, gave the best results in the comparative case we studied; they also allow the verification of the cost and availability results obtained. The configurations obtained are simple, homogeneous and rarely varied, and the possibility of orienting the selection according to the application is a strong point of these methods. Our work has also helped to prove that constraint-oriented search improves complexity on one side and allows finding varied, high-quality solutions on the other.

AUTHORS
Mustapha Es-Sadqi – Intelligent Processing Systems Team, Computer Science Laboratory (LRI), Faculty of Science, Mohammed V University in Rabat, Morocco.
Abdellah Idrissi* – Intelligent Processing Systems Team, Computer Science Laboratory (LRI), Faculty of Science, Mohammed V University in Rabat, Morocco, e-mail: idrissi@fsr.ac.ma.
Ahlem Benhassine – College of Computer Science and Engineering, University of Jeddah, Saudi Arabia.
*Corresponding author


References


[1] D. K. Sambariya and R. Prasad, "Design and performance analysis of robust conventional power system stabiliser using cuckoo search algorithm", International Journal of Power and Energy Conversion, vol. 8, no. 3, 2017, DOI: 10.1504/IJPEC.2017.084914.
[2] A. H. Bhat, V. Muneer and A. Firdous, "Performance investigation of nine-level cascaded H-bridge inverter-based STATCOM for mitigation of various power quality problems", International Journal of Industrial Electronics and Drives, vol. 3, no. 3, 2017, DOI: 10.1504/IJIED.2017.084101.
[3] K.-H. Chang and P.-Y. Kuo, "An efficient simulation optimization method for the generalized redundancy allocation problem", European Journal of Operational Research, vol. 265, no. 3, 2018, 1094–1101, DOI: 10.1016/j.ejor.2017.08.049.
[4] M. Heydari and K. M. Sullivan, "An integrated approach to redundancy allocation and test planning for reliability growth", Computers & Operations Research, vol. 92, 2018, 182–193, DOI: 10.1016/j.cor.2017.12.013.
[5] A. Shahid, "Power management, intelligent control and protection in micro-grids – a review". In: 2017 International Smart Cities Conference (ISC2), 2017, DOI: 10.1109/ISC2.2017.8090807.
[6] I. A. Ushakov, "A universal generating function", Soviet Journal of Computer and Systems Sciences, vol. 24, no. 5, 1986, 118–129.
[7] M. Es-Sadqi, A. Laghrissi and A. Idrissi, "Reducing carbon footprint in redundancy allocation problem applied to multi-state systems". In: 2016 International Renewable and Sustainable Energy Conference (IRSEC), 2016, 1125–1129, DOI: 10.1109/IRSEC.2016.7984038.
[8] A. Amarir, M. Es-Sadqi and A. Idrissi, "An Effective Genetic Algorithm for Solving Series-Parallel Power System Problem". In: Proceedings of the 2nd International Conference on Big Data, Cloud and Applications, 2017, 1–6, DOI: 10.1145/3090354.3090377.
[9] A. Laghrissi, M. Es-Sadqi and A. Idrissi, "Solving redundancy allocation problem applied to electrical systems using CSP and forward checking". In: 2016 International Renewable and Sustainable Energy Conference (IRSEC), 2016, 1120–1124, DOI: 10.1109/IRSEC.2016.7984036.
[10] M. Es-Sadqi, A. Idrissi and A. Amarir, "An Effective Oriented Genetic Algorithm for solving redundancy allocation problem in multi-state power systems", Procedia Computer Science, vol. 127, 2018, 170–179, DOI: 10.1016/j.procs.2018.01.112.



[11] R. K. Saket, B. B. Sagar and G. Singh, "ATM reliability and risk assessment issues based on fraud, security and safety", International Journal of Computer Aided Engineering and Technology, vol. 4, no. 3, 2012, DOI: 10.1504/IJCAET.2012.046637.
[12] R. Kumar, "Redundancy effect on coal-fired power plant availability", International Journal of Intelligent Enterprise, vol. 3, no. 1, 2015, DOI: 10.1504/IJIE.2015.073458.
[13] R. N. Allan and R. Billinton, Reliability Evaluation of Power Systems, Springer, 1996, DOI: 10.1007/978-1-4899-1860-4.
[14] S. Nikolovski, G. Slipac and E. Alibašić, "Generator Assessment of Hydro Power Station Adequacy after Reconstruction from Classical to HIS SF6 Substation", International Journal of Electrical and Computer Engineering (IJECE), vol. 7, no. 2, 2017, 729–740, DOI: 10.11591/ijece.v7i2.pp729-740.
[15] A. Lisnianski and G. Levitin, Multi-State System Reliability: Assessment, Optimization and Applications, World Scientific, 2003, DOI: 10.1142/5221.
[16] G. Levitin, The Universal Generating Function in Reliability Analysis and Optimization, Springer-Verlag, 2005, DOI: 10.1007/1-84628-245-4.
[17] P. van Hentenryck, Constraint Satisfaction in Logic Programming, MIT Press, 1989.
[18] M. Meier and J. Schimpf, "ECLiPSe, ECRC common logic programming system, User manual". Technical Report TTI/3/93, 1993.
[19] A. Darehshoor, L. Zadeh, L. Cerdà-Alabern and V. Pla, "Candidate selection algorithms in opportunistic routing based on distance progress", International Journal of Ad Hoc and Ubiquitous Computing, vol. 20, no. 3, 2015, DOI: 10.1504/IJAHUC.2015.073168.
[20] G. Li, X. Gao, M. Liao and B. Han, "An iterative algorithm to process the top-k query for the wireless sensor networks", International Journal of Embedded Systems, vol. 7, no. 1, 2015, DOI: 10.1504/IJES.2015.066139.
[21] A. Zeblah, E. Chatelet, M. El Samrout, F. Yalaoui and Y. M., "Series-parallel power system optimisation using a harmony search algorithm", International Journal of Power and Energy Conversion, vol. 1, no. 1, 2009, 15–30, DOI: 10.1504/IJPEC.2009.023474.
[22] M. Nourelfath and D. Ait-Kadi, "Optimization of series–parallel multi-state systems under maintenance policies", Reliability Engineering & System Safety, vol. 92, no. 12, 2007, 1620–1626, DOI: 10.1016/j.ress.2006.09.016.




CONVOLUTIONAL NEURAL NETWORKS FOR P300 SIGNAL DETECTION APPLIED TO BRAIN COMPUTER INTERFACE Submitted: 26th June 2019; accepted: 25th March 2020

Mouad Riyad, Mohammed Khalil, Abdellah Adib

DOI: 10.14313/JAMRIS/4-2020/46

Abstract: A Brain-Computer Interface (BCI) is an instrument capable of commanding a machine with brain signals. The multiple types of signals allow many applications to be designed, such as the oddball paradigms with the P300 signal. We propose an EEG classification system applied to BCI, using a convolutional neural network (ConvNet) for the P300 problem. The system consists of three stages. The first stage is a spatiotemporal convolutional layer, a succession of temporal and spatial convolutions. The second stage contains 5 standard convolutional layers. Finally, a logistic regression is applied to classify the input EEG signal. The model includes Batch Normalization, Dropout and Pooling. It also uses the Exponential Linear Unit (ELU) activation function and L1-L2 regularization to improve the learning. For the experiments, we use Dataset II of the BCI Competition III. As a result, we obtain an F1-score of 53.26%, which is higher than that of the BN3 model.

Keywords: Deep Learning, Convolutional neural network, Brain Computer Interface, P300, Classification

1. Introduction


A Brain-Computer Interface (BCI) is a means of communication between the brain and the machine [16]. It consists in translating neural activity into instructions using machine learning algorithms and neuroscience. Such an interface can bring improvements in the health field, such as the detection of neurological diseases or prosthesis control [2, 15]. Besides, many non-medical applications are possible, such as those in the security and educational fields [10, 21]. Mainly, electroencephalography is used to record the brain's waves from the scalp, giving a multi-channel signal called the electroencephalogram (EEG) signal [23]. There are many types of EEG signals; each has its own frequency band, shape and related zone in the brain. The most common signals are Motor Imagery and P300 [9]. The P300 is an Event-Related Potential (ERP) signal occurring 300 ms after a visual or acoustic stimulus, characterized by a low signal-to-noise ratio.

A BCI follows the same steps as classic pattern recognition. First, data acquisition using EEG or other techniques. Second, a preprocessing step where the data is filtered and cleaned. Third, the extraction of the most discriminating features. Fourth, the classification step, where a classifier is trained with the data to recognize the pattern. Finally, the translation step, where the decision of the classifier is translated into a command. In this work, we are only interested in the feature extraction and classification steps.

Several techniques have been proposed in the literature. For feature extraction, the most used are parametric modeling, e.g. the autoregressive model [17], time-frequency domain transformations such as the Short-Time Fourier Transform (STFT) and the Wavelet Transform (WT), and the Filter-Bank Common Spatial Pattern (FBCSP) [12]. For the classification stage, the Linear Discriminant Analysis (LDA), the Support Vector Machine (SVM) and Neural Networks (NN) are widely used in several schemes [13]. Deep Learning is a newer approach, mainly used in computer vision and natural language processing, that has also been exploited in BCI [20]. There are many advantages to using Deep Learning to solve BCI problems: it allows the feature extraction and classification steps to be merged into a single step, and it allows the learned features to be visualized to understand more about brain function [19]. In many studies, we observe that the networks begin with consecutive convolutions similar to a spatial filter and a temporal one; this block is called a "Space-Time Convolutional Layer" (STCL). For a P300 problem, [3] used a 4-layer ConvNet whose first two layers formed the STCL and whose last two were dense layers; the architecture outperformed the SVM-based method of [18]. In the same vein, [11] improved the performance of the previous model by using three dense layers with the rectified linear unit (ReLU) activation, Batch Normalization in the STCL and Dropout between the dense layers. Both architectures used tanh and sigmoid functions, which can cause a vanishing-gradient problem and slow computation [6]. Also, they are shallow and do not use more convolutional layers, e.g. for visualization of the hidden layers as in [19].

In this paper, we propose a ConvNet architecture, fully data-driven, capable of understanding the P300 signal used in the P300 Speller. The choice is motivated by its ability to overcome the main problems of standard methods, such as overfitting or the curse of dimensionality. Moreover, ConvNets are compatible with the nature of EEG signals. Also, a ConvNet needs a minimum of preprocessing and does not need a handcrafted feature-extraction step, with its high complexity (cost of processing, choice of method, ...). Hence, we create a ConvNet with 7 layers, unlike the existing architectures, which are shallow [3, 11]. The first stage is dedicated to a space-time



convolution; we choose to follow the suggestion of [9] for the design of the STCL, which simulates the FBCSP. The second stage consists of 5 standard convolutional layers instead of dense layers. The third stage is a logistic regression. Naturally, Dropout and Batch Normalization are used. Furthermore, our model is designed to avoid overfitting by the use of the ELU [4].

The paper is organized as follows. In Section 2, we provide details about the P300 Speller and the ConvNet. In Section 3, we introduce the proposed model and justify its hyperparameters. The obtained results are discussed in Section 4. Section 5 contains the conclusion.

Fig. 1. The P300 Speller matrix [3]

2. Background

2.1. P300 Speller

The P300 Speller is based on the P300 wave, the neural response to an event, which manifests itself as a positive voltage peak appearing 300 ms after the event, essentially in the occipital and parietal lobes. The speller is based on the oddball paradigm, where a row or a column of a 6×6 character matrix, as illustrated in Fig. 1, is randomly flashed. When the subject is aiming at a character, a P300 wave is detected 300 ms after the flashing of the column and of the row corresponding to the selected character. Hence, the problem is transposed to a binary one: detecting a P300 wave or not. In the binary case, the speller paradigm generates an unbalanced number of P300/non-P300 signals.

2.2. ConvNet

ConvNets are hierarchical neural networks inspired by the architecture of the visual field, designed to process matrix-like data such as signals, but especially images and video [7]. They can learn the most prominent features automatically from raw data, with multiple levels of abstraction, and are characterized by sparse interactions, parameter sharing and equivariant representations. They are based on several kinds of layers, such as:
- Convolution layer: applies the convolution to extract the essential features.
- Pooling layer: down-samples the data to reduce the number of hyper-parameters in the network.

- Activation layer: increases the non-linearity of the network.
- Regularization layer: penalizes non-pertinent information and prevents the network from overfitting.
- Fully connected layer: classifies the features extracted in the previous stages.

Fig. 2. The EEG montage for the P300 data set [3]

3. Methods

3.1. Dataset and Preprocessing

For the dataset, we choose the "BCI Competition III Dataset II". The speller follows the paradigm described previously, for 2 subjects. The signal is composed of 64 channels, schematized in Fig. 2, initially sampled at 240 Hz and filtered with a band-pass filter between 0.1–60 Hz. We adopt the same preprocessing protocol as in [11]: we extract the segment 0–667 ms after the end of the flashing and, as a result, get an EEG signal with a dimension of 64 × 160, where the first dimension represents space and the second time (S × T). Then we apply a 0.1–20 Hz 8th-order band-pass Butterworth filter to the obtained EEG signals. To overcome the unbalanced numbers of P300 and non-P300 signals, we replicate the P300 signals with offsets of {−2; −1; 0; 1; 2} samples in the training dataset. We choose to use only subject 'B'. For validation, we use a validation set with a proportion of 6.66% of the training dataset after balancing. A sketch of this preprocessing is given below.
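# Illustrative sketch of the preprocessing protocol above (not the authors'
# code): 0-667 ms epoch extraction after a flash, 0.1-20 Hz 8th-order
# Butterworth band-pass filtering, and offset-based replication to balance
# the P300 class. `eeg` is assumed to be a (64, n_samples) array at 240 Hz.
import numpy as np
from scipy.signal import butter, filtfilt

FS = 240
B, A = butter(8, [0.1, 20.0], btype="bandpass", fs=FS)

def extract_epoch(eeg, flash_end):
    """Return the filtered 64 x 160 segment (667 ms at 240 Hz) after a flash."""
    seg = eeg[:, flash_end:flash_end + 160]
    return filtfilt(B, A, seg, axis=1)

def replicate_p300(eeg, flash_end, offsets=(-2, -1, 0, 1, 2)):
    """Balance the classes by replicating a P300 epoch with small offsets."""
    return [extract_epoch(eeg, flash_end + o) for o in offsets]

epochs = replicate_p300(np.random.randn(64, 5000), flash_end=1000)
print(len(epochs), epochs[0].shape)   # 5 epochs of shape (64, 160)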

3.2. Architecture

The proposed ConvNet is composed of three parts: the STCL, multiple convolutional layers and a regression. The first stage, the STCL, is composed of two convolutions that simulate a temporal filter (a convolution across the time axis, of shape 1 × n_t) and a spatial filter (a convolution across the spatial axis, of shape C × 1, where C is the number of channels). This approach is inspired by the Filter-Bank Common Spatial Pattern (FBCSP) and used in many studies [3, 9, 19]. For [19], splitting the two convolutions is better than merging them into one convolution (e.g. n_s × n_t), as proved in [22]. Also,


the Batch Normalization is used after the convolutions. Linear activation is used for both convolutions, because a non-linear activation does not bring any improvement; the non-linear activation function is used after the third Batch Normalization [9]. The L1 and L2 regularizations are used for the convolutions, with Dropout at the end of the block. The stage follows these equations:

a^{(0)} = BN(X)    (1)

a^{(t)}_{N_t} = BN(pad_{same}(a^{(0)}) ∗ W^{(t)}_{N_t})    (2)

a^{(s)}_{N_s} = g(BN(a^{(t)}_{N_t} ∗ W^{(s)}_{N_s}))    (3)

x^{(1)}_{N_1} = r ∗ a^{(s)}_{N_s}    (4)

where x^{(i)}_{N_i} is the output of layer i with N_i feature maps, g(x) the activation function, pad_{same}(x) a function that applies padding so that the output of a convolution keeps the same dimension, BN(x) the Batch Normalization function, and W^{(i)}_{N_i} the weights of layer i with N_i feature maps. As input to the layer, we give a matrix of size C × T. The first convolutional kernel has a size of 1 × f_s/2, where f_s is the sampling frequency, as in [9]. The second convolutional kernel has a shape of C × 1, where C is the number of electrodes. As output, we get a matrix of size F × T, where F is the number of feature maps, which we keep the same across the convolutions.

The second stage is composed of multiple convolutional layers based on EEGNet and DeepConvNet. Unlike BN3, our configuration contains 5 such layers, which is deeper than the others. The convolutional kernels follow the same paradigm as DeepConvNet, which gave good performance in our tests, and we follow the same disposition of the layers as EEGNet. The stage thus begins with a convolutional layer whose kernel size follows the pattern given in Tab. 1, with no padding, and we use a high number of feature maps for better performance. The convolutional layers are followed by Batch Normalization and an activation with a non-linear function. To reduce the number of parameters, pooling is performed with max-pooling, which gave better scores. Finally, each layer is regularized by a Dropout only. The equations below describe a single layer, where k denotes its number:

a^{(k)}_{N_k} = g(BN(pad_{same}(x^{(k−1)}) ∗ W^{(k)}_{N_k}))    (5)

x^{(k)}_{N_k} = r ∗ pool(a^{(k)}_{N_k})    (6)
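A minimal Keras sketch of the first two stages described above follows (illustrative dimensions; only C = 64, T = 160, f_s = 240 and the overall layer pattern come from the text, while F = 16 and the second-stage kernel size are assumptions).

from tensorflow.keras import layers, models, regularizers

C, T, fs, F = 64, 160, 240, 16
reg = regularizers.l1_l2(l1=0.001, l2=0.001)

inp = layers.Input(shape=(C, T, 1))
x = layers.BatchNormalization()(inp)
x = layers.Conv2D(F, (1, fs // 2), padding="same",
                  kernel_regularizer=reg)(x)    # temporal filter, linear
x = layers.BatchNormalization()(x)
x = layers.Conv2D(F, (C, 1), padding="valid",
                  kernel_regularizer=reg)(x)    # spatial filter
x = layers.BatchNormalization()(x)
x = layers.Activation("elu")(x)
x = layers.Dropout(0.5)(x)                      # end of the STCL block

# One standard convolutional layer of the second stage, eqs. (5)-(6);
# the (1, 12) kernel is only an example of the Tab. 1 pattern.
x = layers.Conv2D(4 * F, (1, 12), padding="valid")(x)
x = layers.BatchNormalization()(x)
x = layers.Activation("elu")(x)
x = layers.MaxPooling2D(pool_size=(1, 2))(x)
x = layers.Dropout(0.5)(x)

model = models.Model(inp, x)
model.summary()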

The third stage is a regression layer where the features are classified into their corresponding classes. Given the nature of the problem, the chosen activation function is a sigmoid. We omit the fully connected layer to reduce any risk of overfitting, unlike BN3, where it can lead to overfitting and increases the complexity of the network.

To be more specific, the goal of the Batch Normalization is to reduce the internal covariate shift in the neural network: the data shift towards the saturation region of some activation functions, like tanh or the sigmoid, causes a decrease of the gradient, a phenomenon that slows the training when the network is deep. The solution is to normalize every batch to avoid saturation. The formula is as follows:

\hat{y}_i^{(j)} = \frac{y_i^{(j)} - \mu^{(j)}}{\sqrt{\sigma^{2(j)} + \epsilon}}    (7)

where µ is the mean of the batch, σ the variance and ϵ a small constant for numerical stability. Also, the dropout [1] is a technique allowing to de‑ crease the complex co‑adaptation between neurons on the training data. It adds a noise by giving to every neuron a probability of 1 − p to be set to 0 in the for‑ ward propagation. That means it gives an average neu‑ ral network from many possibilities.
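As a small illustration of Eq. (7), a batch can be normalized as follows (this sketch omits the learnable scale and shift that Batch Normalization layers also apply):

```python
# A minimal NumPy sketch of the batch normalization of Eq. (7).
import numpy as np

def batch_norm(y, eps=1e-5):
    mu = y.mean(axis=0)     # per-feature mean of the batch
    var = y.var(axis=0)     # per-feature variance of the batch
    return (y - mu) / np.sqrt(var + eps)
```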

3.3. Training Setting

For the implementation, we use the Keras framework with TensorFlow as backend and an NVIDIA K80 GPU. We choose the Adam optimizer with default settings, and a binary cross-entropy loss function. We choose the hyperparameters as follows:
- We use the Adam optimizer because it is the most widely used; it is also faster and needs less computational power than the other techniques. We use the default values for its parameters [8].

- For the parameter F introduced previously, we use cross-validation with the proportions described above, comparing the values 4, 16, 32 and 64.
- We use the ELU function rather than the sigmoid or tanh functions because it has proven its superiority in terms of accuracy and speed.
- For the dropout, we choose the value 0.5, as in [1].

- From our tests, around 100 epochs is the best choice, so we use it.
- The batch size should be small; we choose 32, following the recommendation in [14].

- The Glorot Uniform method is used to initialize the parameters [5].
- As in [9], we omit the use of biases, but only in the convolutional layers.
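As a concrete illustration of these settings, the sketch below compiles and trains a model under the stated choices (default Adam, binary cross-entropy, 100 epochs, batch size 32); `model` and the training arrays are placeholders. Note that Glorot Uniform is already the default kernel initializer in Keras, and biases can be disabled in the convolutional layers with `use_bias=False`.

```python
# A minimal sketch of the training setup described above; `model`,
# `X_train`, `y_train`, `X_val` and `y_val` are placeholders.
from tensorflow.keras.optimizers import Adam

model.compile(optimizer=Adam(),          # default Adam parameters [8]
              loss='binary_crossentropy')
model.fit(X_train, y_train,
          validation_data=(X_val, y_val),
          epochs=100,                    # ~100 epochs worked best
          batch_size=32)                 # small batch size, as in [14]
```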

4. Results and Discussion

The goal of our work is to build a P300 speller system capable of translating the P300 signals in a small amount of time; this is why we first design our model by its architecture, based on a convolutional neural network. To evaluate the performance of our method, we use Recognition, Precision, Recall and F1-score as metrics:

$$Recognition = \frac{TP + TN}{TP + TN + FP + FN} \quad (8)$$

$$Precision = \frac{TP}{TP + FP} \quad (9)$$

$$Recall = \frac{TP}{TP + FN} \quad (10)$$

$$F1\text{-}score = 2 \cdot \frac{Recall \cdot Precision}{Recall + Precision} \quad (11)$$

with true positive (TP), false positive (FP), true negative (TN) and false negative (FN).
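A minimal sketch of Eqs. (8)-(11), checked here against the MCNN-3 row of Tab. 3:

```python
# A minimal sketch of the metrics of Eqs. (8)-(11).
def metrics(tp, tn, fp, fn):
    recognition = (tp + tn) / (tp + tn + fp + fn)       # Eq. (8)
    precision = tp / (tp + fp)                          # Eq. (9)
    recall = tp / (tp + fn)                             # Eq. (10)
    f1 = 2 * recall * precision / (recall + precision)  # Eq. (11)
    return recognition, precision, recall, f1

# e.g. the MCNN-3 row of Tab. 3:
print(metrics(2077, 11997, 3003, 923))  # ~(0.7819, 0.409, 0.692, 0.514)
```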



Tab. 1. The details of the proposed model in Keras codification

Block 0: Input; Reshape; Permute. Output: (T, C).
Block 1: BatchNormalization (4 params); Conv2D, F filters, size (1, 120), 120·F params, Linear, mode = same, l1 = l2 = 0.001; BatchNormalization (4·F params); Conv2D, F filters, size (64, 1), 64·F² params, Linear, mode = valid, l1 = l2 = 0.001; BatchNormalization; Activation (ELU); Dropout (p = 0.5); Permute (2, 1, 3).
Block 2: Conv2D, F filters, size (F, 12), 12·F² params, Linear, mode = valid; BatchNormalization (4·F params); Activation (ELU); MaxPooling2D (2, 2); Dropout (p = 0.5); Permute (2, 1, 3).
Block 3: Conv2D, 2·F filters, size (F/2, 6), 6·F² params, Linear, mode = valid; BatchNormalization (8·F params); Activation (ELU); MaxPooling2D (2, 2); Dropout (p = 0.5); Permute (2, 1, 3).
Block 4: Conv2D, 4·F filters, size (F, 3), 12·F² params, Linear, mode = valid; BatchNormalization (16·F params); Activation (ELU); MaxPooling2D (2, 2); Dropout (p = 0.5); Permute (2, 1, 3).
Block 5: Conv2D, 8·F filters, size (2·F, 3), 48·F² params, Linear, mode = valid; BatchNormalization (32·F params); Activation (ELU); MaxPooling2D (2, 2); Dropout (p = 0.5); Permute (2, 1, 3).
Block 6: Conv2D, 16·F filters, size (4·F, 3), 192·F² params, Linear, mode = valid; BatchNormalization (32·F params); Activation (ELU); MaxPooling2D (2, 2); Dropout (p = 0.5); Permute (2, 1, 3).
Block 7: Dense, 1 unit, Sigmoid.

We focus only on the F1-score because the results of [11] suggest that it is the most suitable indicator of performance. We use cross-validation to choose the right value of F; the obtained results are presented in Tab. 2.

Tab. 2. Cross validation results

F | Recognition | Precision | Recall | F1-score
64 | 0.9029 | 0.8751 | 0.94 | 0.9064
32 | 0.9035 | 0.8744 | 0.9423 | 0.9071
16 | 0.7882 | 0.9209 | 0.6305 | 0.7486
4 | 0.5211 | 0.9500 | 0.044 | 0.085

We observe that the best recognition rate, recall and F1-score are obtained with the value 32, while the best precision is obtained with the value 4. The other values did not yield notable results, so the value 32 is validated.


Tab. 3. Compared metrics for multiple methods

Architecture | TP | TN | FP | FN | Recognition | Precision | Recall | F1-score
MCNN-3 [3] | 2077 | 11997 | 3003 | 923 | 0.7819 | 0.409 | 0.692 | 0.514
BN3 [11] | 2084 | 12139 | 2861 | 916 | 0.7902 | 0.4214 | 0.6947 | 0.5246
Proposed with Max pooling | 1996 | 12501 | 2499 | 1004 | 0.7992 | 0.4440 | 0.6653 | 0.5326
Proposed with Average pooling | 1952 | 12343 | 2657 | 1048 | 0.7941 | 0.4235 | 0.6506 | 0.5130

Tab. 4. Compared metrics for different depths of the second part of the proposed method

Number of Layers | TP | TN | FP | FN | Recognition | Precision | Recall | F1-score
0 | 1960 | 12273 | 2727 | 1040 | 0.7907 | 0.4181 | 0.6533 | 0.5099
1 | 1797 | 12674 | 2326 | 1203 | 0.8039 | 0.4358 | 0.599 | 0.5045
2 | 1911 | 12398 | 2602 | 1089 | 0.7949 | 0.4234 | 0.637 | 0.5087
3 | 1840 | 12622 | 2378 | 1160 | 0.8034 | 0.4362 | 0.6133 | 0.5098
4 | 1595 | 13257 | 1743 | 1405 | 0.8251 | 0.4778 | 0.5316 | 0.5033
5 | 1996 | 12501 | 2499 | 1004 | 0.7992 | 0.4440 | 0.6653 | 0.5326

Tab. 3 shows the results for the test dataset, comparing our model with BN3 [11] and the MCNN-3 of [3]. We train two versions of our model, one with MaxPooling and the other with AveragePooling. We observe that our model with MaxPooling outperforms the BN3 model: it has a higher recognition rate, precision and F1-score. Besides, the main metric is higher by 0.008 for our model, which makes it more satisfying than BN3.

This paper is based on the argument that a deeper network is better than a shallow one, an argument originating in the computer vision improvements of the last years. To demonstrate the effect of depth in the BCI case, we train multiple versions of our model with a different number of layers in the second part: we progressively add one layer following the architecture of the proposed model and test all the generated models on the test dataset. The results are in Tab. 4; as expected, the deeper model gets the better F1-score and recall, while the 4-layer model gets the best recognition and precision.

There are many factors behind those results. First, our model is deeper than the others, using more layers, which implies a greater number of parameters. Secondly, we omit the fully connected layers and replace them with multiple convolutional layers, allowing the network to learn more accurate features while decreasing the risk of overfitting and the complexity of the computation. Regarding the pooling, MaxPooling seems to be the appropriate method for our model compared with AveragePooling. However, our model contains more parameters than the others, making the learning time longer; it also has a high loss at the end. Furthermore, the architecture is designed for offline classification.

5. Conclusion


We presented a new ConvNet architecture for a P300 speller application through the translation of the P300 signal. The proposed model is based on EEGNet and DeepConvNet. The model contains 7 convolutional layers with batch normalization and dropout that improved the performance.

The model is composed of an STCL simulating the FBCSP, which is one of the main traditional techniques. Then, five standard convolutional layers follow the same logic as DeepConvNet. Finally, a logistic regression is applied to obtain the final decision. The model is capable of outperforming the existing models, which leads to a new important architecture. Also, the design allows visualizing the learned features. Moreover, we justified our hyperparameters based on previous work. Furthermore, we experimentally justified the necessity of a deep neural network rather than a shallow one. Further work will focus on designing lighter architectures with better performance. Also, we will introduce a hybrid neural network based on a ConvNet and a recurrent neural network.

AUTHORS

Mouad Riyad* – Hassan II University Of Casablanca, LIM@II-FSTM, B.P. 146, Mohammedia 20650, Morocco, e-mail: riyadmouad1@gmail.com.
Mohammed Khalil – Hassan II University Of Casablanca, LIM@II-FSTM, B.P. 146, Mohammedia 20650, Morocco, e-mail: mohammed.khalil@univh2c.ma.
Abdellah Adib – Hassan II University Of Casablanca, LIM@II-FSTM, B.P. 146, Mohammedia 20650, Morocco, e-mail: abdellah.adib@fstm.ac.ma.
*Corresponding author

REFERENCES
[1] P. Baldi and P. Sadowski, "Understanding dropout". In: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, Red Hook, NY, USA, 2013, 2814–2822.
[2] S. Biswal, J. Kulas, H. Sun, B. Goparaju, M. B. Westover, M. T. Bianchi, and J. Sun, "SLEEPNET: Automated Sleep Staging System via Deep Learning", arXiv:1707.08262 [cs], 2017.
[3] H. Cecotti and A. Graser, "Convolutional Neural Networks for P300 Detection with Application to Brain-Computer Interfaces", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 3, 2011, 433–445, 10.1109/TPAMI.2010.125.
[4] D.-A. Clevert, T. Unterthiner, and S. Hochreiter, "Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)", arXiv:1511.07289 [cs], 2016.
[5] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks". In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010, 249–256.
[6] X. Glorot, A. Bordes, and Y. Bengio, "Deep Sparse Rectifier Neural Networks". In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011, 315–323.
[7] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, The MIT Press: Cambridge, Massachusetts, 2016.
[8] D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization", arXiv:1412.6980 [cs], 2017.
[9] V. J. Lawhern, A. J. Solon, N. R. Waytowich, S. M. Gordon, C. P. Hung, and B. J. Lance, "EEGNet: a compact convolutional neural network for EEG-based brain-computer interfaces", Journal of Neural Engineering, vol. 15, no. 5, 2018, 10.1088/1741-2552/aace8c.
[10] L.-D. Liao, C.-Y. Chen, I.-J. Wang, S.-F. Chen, S.-Y. Li, B.-W. Chen, J.-Y. Chang, and C.-T. Lin, "Gaming control using a wearable and wireless EEG-based brain-computer interface device with novel dry foam-based sensors", Journal of NeuroEngineering and Rehabilitation, vol. 9, no. 1, 2012, 10.1186/1743-0003-9-5.
[11] M. Liu, W. Wu, Z. Gu, Z. Yu, F. Qi, and Y. Li, "Deep learning based on Batch Normalization for P300 signal detection", Neurocomputing, vol. 275, 2018, 288–297, 10.1016/j.neucom.2017.08.039.
[12] F. Lotte, L. Bougrain, A. Cichocki, M. Clerc, M. Congedo, A. Rakotomamonjy, and F. Yger, "A review of classification algorithms for EEG-based brain-computer interfaces: a 10 year update", Journal of Neural Engineering, vol. 15, no. 3, 2018, 10.1088/1741-2552/aab2f2.

[13] F. Lotte, M. Congedo, A. Lécuyer, F. Lamarche, and B. Arnaldi, "A review of classification algorithms for EEG-based brain-computer interfaces", Journal of Neural Engineering, vol. 4, no. 2, 2007, R1–R13, 10.1088/1741-2560/4/2/R01.
[14] D. Masters and C. Luschi, "Revisiting Small Batch Training for Deep Neural Networks", arXiv:1804.07612 [cs], 2018.
[15] P. W. Mirowski, Y. LeCun, D. Madhavan, and R. Kuzniecky, "Comparing SVM and convolutional networks for epileptic seizure prediction from intracranial EEG". In: 2008 IEEE Workshop on Machine Learning for Signal Processing, 2008, 244–249, 10.1109/MLSP.2008.4685487.
[16] L. F. Nicolas-Alonso and J. Gomez-Gil, "Brain Computer Interfaces, a Review", Sensors, vol. 12, no. 2, 2012, 1211–1279, 10.3390/s120201211.
[17] J. Pardey, S. Roberts, and L. Tarassenko, "A review of parametric modelling techniques for EEG analysis", Medical Engineering & Physics, vol. 18, no. 1, 1996, 2–11, 10.1016/1350-4533(95)00024-0.
[18] A. Rakotomamonjy and V. Guigue, "BCI competition III: dataset II - ensemble of SVMs for BCI P300 speller", IEEE Transactions on Bio-medical Engineering, vol. 55, no. 3, 2008, 1147–1154, 10.1109/TBME.2008.915728.
[19] R. T. Schirrmeister, J. T. Springenberg, L. D. J. Fiederer, M. Glasstetter, K. Eggensperger, M. Tangermann, F. Hutter, W. Burgard, and T. Ball, "Deep learning with convolutional neural networks for EEG decoding and visualization", Human Brain Mapping, vol. 38, no. 11, 2017, 5391–5420, 10.1002/hbm.23730.
[20] J. Schmidhuber, "Deep learning in neural networks: An overview", Neural Networks, vol. 61, 2015, 85–117, 10.1016/j.neunet.2014.09.003.
[21] J. Sohankar, K. Sadeghi, A. Banerjee, and S. K. Gupta, "E-BIAS: A Pervasive EEG-Based Identification and Authentication System". In: Proceedings of the 11th ACM Symposium on QoS and Security for Wireless and Mobile Networks, New York, NY, USA, 2015, 165–172, 10.1145/2815317.2815341.
[22] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the Inception Architecture for Computer Vision", arXiv:1512.00567 [cs], 2015.
[23] M. Teplan, "Fundamentals of EEG measurement", Measurement Science Review, vol. 2, no. 2, 2002, 1–11.




Cloud-Based Sentiment Analysis for Measuring Customer Satisfaction in the Moroccan Banking Sector Using Naïve Bayes and Stanford NLP

Submitted: 26th June 2019; accepted: 25th March 2020

Anouar Riadsolh, Imane Lasri, Mourad ElBelkacemi

DOI: 10.14313/JAMRIS/4-2020/47

Abstract: In a world where we produce 2.5 quintillion bytes of data every day, sentiment analysis has been key to making sense of that data. However, processing huge text data in real time requires building a data processing pipeline in order to minimize the latency of processing data streams. In this paper, we explain and evaluate our proposed real-time customer sentiment analysis pipeline for the Moroccan banking sector, using data from the web and social networks and open-source big data tools: Apache Kafka for data ingestion, Apache Spark for in-memory data processing, Apache HBase for storing the Tweets and the satisfaction indicator, ElasticSearch and Kibana for visualization, and Node.js for building a web application. The performance evaluation of the Naïve Bayesian model shows that for French Tweets the accuracy reached 76.19%, while for English Tweets the result was unsatisfactory, with an accuracy of 56%. To remedy this problem, we used Stanford CoreNLP, which, for English Tweets, reaches a precision of 80.7%.

Keywords: Big Data Processing; Apache Spark; Apache Kafka; Real-time Text Processing; Sentiment Analysis; Stanford core NLP; Naïve Bayes classifier

1. Introduction


Sentiment analysis, or opinion mining, has obtained much attention in recent years with the advent of social networks like Twitter, a popular microblogging service where users give their opinions. The aim of sentiment analysis is to define automatic tools able to extract subjective information, such as opinions and sentiments, from texts in natural language, in order to create structured and actionable knowledge to be used by either a decision support system or a decision maker [1, 3]. With the rise of Big Data and data lakes, machine learning is taking on a whole new dimension, as businesses now have access to a huge number of different variables that, when correlated, can be an extremely powerful decision asset. That is why the Moroccan banking sector is particularly interested in using various machine learning algorithms for sentiment analysis: it wants to go beyond the statistical approach to predict changes in financial markets through the analysis of Tweets, to predict that a customer will leave his bank, to detect fraud, and to improve customer satisfaction. To achieve these goals, we propose two methods to analyze the great number of data available on Twitter and distribute them into three categories (positive, negative or neutral). This is very useful because it allows banks to improve products and services, marketing, customer service, business performance and risk management, and to calculate key performance indicators (KPIs). The first method is based on the Stanford CoreNLP API [2] for English Tweets and the second method is based on a Naïve Bayes classifier [4] for French Tweets. In this paper, we propose and evaluate a real-time processing pipeline using open-source tools on Microsoft Azure that can capture a large amount of data efficiently. Our suggested system uses Apache Kafka [5] as the data ingestion system, Apache Spark [6] as the real-time data processing system, Apache HBase [7] for persistent distributed storage, and ElasticSearch and Kibana for visualization. Apache Kafka accepts incoming data and sends it to a Kafka broker rapidly; Apache Spark then consumes the data and performs predictive analytics using Spark's MLlib module and Stanford CoreNLP; finally, we use Apache HBase to store the data. We developed a simple visualization component to analyze the results using Kibana and ElasticSearch. The rest of the paper is organized as follows. In Section 2 we introduce related work on sentiment classification. In Section 3 we explain the system components used to build the real-time sentiment analysis pipeline, and in Section 4 we describe the data preprocessing steps. Section 5 presents the two methods used to analyze sentiment. Finally, in Sections 6 and 7 we discuss the results and future extensions of our work.

2. Related Work

In recent years, the growth of social media networks has made sentiment analysis a popular area of research. Accordingly, several recent articles have focused on the analysis of media sentiment for a variety of purposes, such as public opinion and the prediction of election results. Shoro et al. [8] extract and process meaningful information from a sample big data source, Twitter, using Apache Spark streaming. Bollen



et al. [9] analyzed Tweets to find correlations between the overall public mood and social, economic and other major events. The authors extracted six mood states (anger, tension, depression, vigor, confusion, fatigue) from the Tweets, then compared the results to a record of popular events gathered from media sources. In [10], the authors used two supervised machine learning algorithms, K-Nearest Neighbour (K-NN) and Naïve Bayes, to facilitate the quick discovery of the sentimental content of movie reviews and hotel reviews on the web. Our work differs in the scale of the Twitter dataset as well as in the number of banks we analyze. Our objective in this paper is to build a real-time sentiment analysis pipeline on Microsoft Azure using open-source big data tools. In addition, we look at customer sentiment towards banks. Finally, we derive some meaningful insights from this analysis and demonstrate the value of such an analysis for measuring customer satisfaction and assessing the impact of banks on social media.

3. Real-Time Sentiment Analysis Pipeline

The customer sentiment analysis of the Moroccan banking sector requires a collection of data representing the Moroccan citizens' opinion of the services offered by the banks. For this, we extracted the data from Twitter in real time through Apache Kafka, processed it with Spark Streaming, and stored the result in Apache HBase in order to visualize the indicators and measure customer satisfaction in Kibana through the ElasticSearch search engine. Figure 1 shows the proposed real-time processing pipeline architecture:

Fig. 1. Proposed data analytics pipeline architecture

In the rest of this section, we briefly explain each of these components in turn.

3.1. Twitter Data Extraction

With the rise of the internet and mobile telecommunications, the need for and importance of extracting data from the web is becoming increasingly clear. Online social networks attract the most users, and users of these new technologies provide their data through multiple sources. Of those networks, Twitter makes these data relatively easy to obtain by providing users' data through two APIs, the streaming API and the REST API, which we used in this project. It allows us to retrieve the data of each Tweet as a JavaScript Object Notation (JSON) document via OAuth, which is required for API authentication.


Nonetheless, because of Twitter’s popularity, there are a large number of software libraries to access Twitter, including in Python, R and Spark.

3.2. Data Ingestion Using Apache Kafka

Nowadays, businesses collect large volumes of structured and unstructured data in order to discover real-time insights that inform decision making and support digital transformation [3]. Data ingestion is the process of importing, loading and processing data and storing it in a database. It requires fetching data from a variety of sources, such as social media sites, web logs, RDBMS and streaming data. There are many tools available today for data ingestion, such as Apache Kafka, Apache Flume [11] and RabbitMQ [12]. In this paper, we used Apache Kafka as the data ingestion system. It is a distributed publish-subscribe messaging system where multiple producers publish messages on a topic and multiple consumers consume the messages by subscribing to that topic. Kafka uses ZooKeeper [13], internally or externally, for the leadership election of Kafka broker/topic-partition pairs and for managing service discovery for Kafka brokers. The following terms are used in Apache Kafka (a producer sketch is given after this list):
• Topic: identifies a class of messages.
• Partition: used to scale a topic across many servers.
• Producer: pushes messages to topics.
• Consumer: pulls messages from topics.
• Broker: an instance of the Kafka service.
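As an illustration, the snippet below is a minimal sketch of a producer publishing Tweets to a Kafka topic; it assumes the kafka-python client, a broker on localhost:9092 and a hypothetical `tweets` topic.

```python
# Minimal sketch of a Kafka producer for raw Tweets (kafka-python client);
# the broker address and the 'tweets' topic name are assumptions.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    value_serializer=lambda v: json.dumps(v).encode('utf-8'))

tweet = {'id': 1, 'lang': 'fr', 'text': 'Très satisfait de ma banque !'}
producer.send('tweets', tweet)   # publish on the 'tweets' topic
producer.flush()                 # block until the message is delivered
```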

3.3. Parallel Data Processing With Apache Spark

There are multiple parallel and distributed tools, such as Apache Hadoop [14], Apache Spark and Apache Storm [15]. We used Apache Spark as the real-time in-memory processing system. It was developed at the University of Berkeley to overcome the limitations of Apache Hadoop, and it includes libraries for SQL, streaming, machine learning and graph processing. The resilient distributed dataset (RDD) [16] is the abstract data type for distributed and parallel computing in Apache Spark. Specifically, the SparkContext allocates resources across applications. Spark uses executors to run computations and store data for the application; it sends the application code to the executors and, finally, the SparkContext sends tasks to the executors to run. Figure 2 below illustrates the execution architecture of an Apache Spark application.

Fig. 2. Execution Flow of Apache Spark application
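For concreteness, the following is a minimal sketch of the Spark side of the pipeline, assuming Spark 2.x with the spark-streaming-kafka package; `classify` is a placeholder for the sentiment classification described in Section 5.

```python
# Minimal sketch of consuming the 'tweets' topic with Spark Streaming
# (assumes Spark 2.x with the spark-streaming-kafka package); `classify`
# stands in for the sentiment classification of Section 5.
import json
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="TweetSentimentPipeline")
ssc = StreamingContext(sc, 5)                      # 5-second micro-batches

stream = KafkaUtils.createDirectStream(
    ssc, ['tweets'], {'metadata.broker.list': 'localhost:9092'})
tweets = stream.map(lambda kv: json.loads(kv[1]))  # (key, value) pairs
labeled = tweets.map(lambda t: (t['text'], classify(t)))
labeled.pprint()                                   # replace with HBase sink

ssc.start()
ssc.awaitTermination()
```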




3.4. Distributed Data Storage on Apache HBase

NoSQL database systems [17] are non-relational, distributed database systems that enable the fast analysis of high-velocity data with disparate data types. The key factors of NoSQL database systems are scalability, high availability and fault tolerance. There are many NoSQL databases, such as Apache HBase, MongoDB [18] and Apache Cassandra [19]. In our project we use Apache HBase as the NoSQL distributed database for the real-time system. HBase is a Java-based, open-source, NoSQL, non-relational, column-oriented, distributed database built on top of the Hadoop Distributed Filesystem (HDFS) [20], modeled after Google's BigTable paper. HBase brings to the Hadoop ecosystem most of the BigTable capabilities. HBase is built to be a fault-tolerant application hosting a few large tables of sparse data (billions/trillions of rows by millions of columns), while allowing very low-latency and near real-time random reads and writes [21]. These are the major terms used in Apache HBase:
• Table: in HBase the data is organized into tables. Table names are strings.
• Row: within each table the data is organized in rows. A row is identified by a unique key (RowKey). The RowKey does not have a type; it is treated as an array of bytes.
• Column family: data within a row is grouped by column family. Each row of the table has the same column families, which are defined when the table is created in HBase. The names of the column families are strings.
• Column qualifier: allows access to data within a column family. Like RowKeys, the column qualifier is not typed; it is treated as an array of bytes.
• Cell: the combination of RowKey, column family and column qualifier uniquely identifies a cell. The data stored in a cell is called the value of that cell. Values have no type; they are always treated as byte arrays.
• Version: values within a cell are versioned. The versions are identified by their timestamp. The number of versions is configured through the column family; by default, this number is equal to three.
HBase is composed of three types of servers in a master-slave architecture. Region servers serve data for reads and writes. Region assignment and DDL operations are handled by the HBase Master. ZooKeeper, which is part of HDFS, maintains the live cluster state. The NameNode maintains metadata information for all the physical data blocks that comprise the files. Figure 3 shows the architecture of Apache HBase (a storage sketch is given after the figure):


Fig. 3. HBase Architecture
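A minimal sketch of persisting a classified Tweet in HBase through the Thrift gateway, using the happybase client; the table name, column families and RowKey scheme are our assumptions.

```python
# Minimal sketch of storing a classified Tweet in HBase (happybase client);
# 'tweets' table, 'content'/'analysis' column families and the RowKey
# scheme are assumptions.
import happybase

connection = happybase.Connection('localhost')    # HBase Thrift server
table = connection.table('tweets')

table.put(b'tweet-0001', {                        # RowKey
    b'content:text': 'Très satisfait de ma banque !'.encode('utf-8'),
    b'content:bank': b'BankX',
    b'analysis:sentiment': b'Positive',
})
```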

3.5. Visualization Component

The visualization component displays a real-time dashboard in Kibana based on the real-time processed data stored in ElasticSearch, which helps both decision making and visualization. For the Twitter sentiment analysis application, we created a web application in Node.js on Microsoft Azure that displays graphs and a geo map of positive, negative and neutral Tweets. The web application also allows users to search for a particular bank and find its occurrences in positive, neutral and negative Tweets, and it shows the customer satisfaction rate.

4. Data Preprocessing

Sentiment analysis requires many preprocessing steps, such as tokenization, stop-word removal and stemming. These steps consume approximately 80% of the time and effort. In addition to their preparation role, the preprocessing steps also play a data reduction role by excluding worthless features from the bag of words. A sketch of these steps is given after the list below.
• Lower uppercase letters: the first preprocessing step consists in changing uppercase letters to lowercase letters, so that the analysis is not case sensitive.
• Remove numbers: numbers can be removed from the sentences because they are not significant for the data analysis. This preprocessing step uses regular expressions.
• Remove URLs and user references: a Twitter user can mention another user with @username in Tweets, send him a message, or link to his profile. It is therefore necessary to remove symbols such as '@' and '#' from the words, because they are not relevant for analyzing the content of a text.
• Remove stop words: stop words are words which are commonly used in a language and are not useful for the analysis. The stop words of the English and French languages are stored in a list in HDFS. In this step, the items in the token list are compared with each word in the stop-word table in order to delete the stop words from the token table of each Tweet.


Journal of Automation, Mobile Robotics and Intelligent Systems

• Tokenize: the tokenization annotator splits sentences into smaller units called tokens, which can be words, numbers, n-grams and symbols. The tokens are created by splitting the sentence on each space.
• Detect POS tags: the part-of-speech tagging task aims to assign every word/token in plain text a category such as noun, verb, adjective, etc.
• Lemmatize: when processing samples, "word" and "words" would be considered as two different features. Hence, in order to improve the feature reduction process, the unigrams can be lemmatized. This preprocessing step mainly allows removing plurals and conjugations.
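The first of these steps can be sketched as follows; the stop-word list is a placeholder for the one stored in HDFS.

```python
# Minimal sketch of the first preprocessing steps (lowercasing, removing
# URLs, user references, numbers and stop words); STOP_WORDS is a
# placeholder for the list stored in HDFS.
import re

STOP_WORDS = {'le', 'la', 'de', 'the', 'a', 'is'}   # placeholder list

def preprocess(text):
    text = text.lower()                              # case folding
    text = re.sub(r'https?://\S+', ' ', text)        # remove URLs
    text = re.sub(r'[@#]\w+', ' ', text)             # remove @user / #tag
    text = re.sub(r'\d+', ' ', text)                 # remove numbers
    tokens = text.split()                            # naive tokenization
    return [t for t in tokens if t not in STOP_WORDS]
```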

5. Material and Methods

The main goal of this research is to analyze customer sentiment about Moroccan banks on Twitter. We used two methods to classify Tweets into three categories ('Positive', 'Negative' or 'Neutral'). The first approach is based on the Stanford CoreNLP API for English Tweets; the second approach is based on a Naïve Bayes classifier for French Tweets, because the Stanford NLP API does not contain sentiment models for languages other than English. In this section, we explain these two methods.

5.1. Methodology

1) Stanford CoreNLP

Natural language processing (NLP) is a branch of artificial intelligence focused on the interactions between human language and computers. Nowadays, many tools have been published for natural language processing jobs. Stanford CoreNLP is a great NLP tool for analyzing text: given a paragraph, CoreNLP splits it into sentences, then analyzes them to return the base forms of the words, their dependencies, parts of speech, named entities and more. Stanford CoreNLP only supports English for sentiment analysis. It is implemented in Java, and since our main code base is written in Spark using Scala, we access it through an API. This model proposes many linguistic analysis tools, such as:
• The part-of-speech (POS) tagger: a process that associates the words of a text with a corresponding type.
• Named-Entity Recognition (NER): recognizes in a text certain types of categorizable concepts, such as names of people, names of organizations or companies, names of places, quantities, distances, values, dates, etc.
• Parser: consists in highlighting the structure of a text. This structure is often a hierarchy of syntagms, represented by a syntax tree (ParseTree) whose nodes can carry additional information.


• Sentiment Analysis: predicts the sentiment of a text (positive, negative, neutral) based on a new type of recursive neural network that relies on grammatical structures.
Stanford sentiment analysis calculates the sentiment by relying on how words make up the meaning of the text. This is done by introducing the Stanford Sentiment Treebank, a dataset with fully labeled parse trees that allow a complete analysis. The dataset is based on data introduced by Pang and Lee (2005) [22] and includes 11,855 sentences taken from film reviews. These sentences were analyzed with the Stanford parser, which yields a total of 215,154 phrases, each annotated by 3 human judges. This dataset allows sentiment to be analyzed with subtlety.

Algorithm 1

Input: a text t
Output: a predicted class sf ∈ {'Positive', 'Negative', 'Neutral'}
Steps:

a) Segmentation of the text into sentences with the ssplit annotator, by detecting the full stops.
b) Fragmentation of the sentences, using the tokenize annotator, into smaller units called tokens, which can be words, n-grams (groups of n consecutive tokens), numbers, symbols and punctuation.
c) Association of a morphosyntactic class with each fragment according to its context.
d) Application of the lemma annotator to the words of the text in order to detect which family the word belongs to and replace it with its canonical form, which is the infinitive for a verb and the masculine singular for other words.
e) Application of the parse-tree annotator for a syntactic analysis in the form of a tree with two main branches, corresponding to the nominal phrase and to the verbal phrase if it exists.
f) Use of the sentiment annotator to attach a binarized tree of the sentence to the sentence-level CoreMap. The nodes of the tree then contain the annotations from RNNCoreAnnotations indicating the predicted class and the scores for that subtree.

g) Calculation of the weight ws of the sentiment in each sentence, which is equal to the sentiment class s multiplied by the size of the sentence sz:

$$w_s = s \cdot sz \quad (1)$$

h) Calculation of the weight w of the sentiment of the whole text: we divide the sum of the sentiment weights ws of all the sentences by the size of the text st:

$$w = \frac{\sum w_s}{st} \quad (2)$$

i) If 0 < w < 2 then sf = 'Negative'
Else if 3 < w < 5 then sf = 'Positive'
Else sf = 'Neutral'
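A minimal sketch of steps g)-i), assuming each sentence has already received a CoreNLP sentiment class s ∈ {0, ..., 4} (0 = very negative, 4 = very positive):

```python
# A minimal sketch of Algorithm 1, steps g)-i); `sentences` is a list of
# (sentiment_class, sentence_size) pairs produced by the CoreNLP annotators.
def text_sentiment(sentences):
    st = sum(sz for _, sz in sentences)          # size of the whole text
    w = sum(s * sz for s, sz in sentences) / st  # Eqs. (1)-(2)
    if 0 < w < 2:
        return 'Negative'
    elif 3 < w < 5:
        return 'Positive'
    return 'Neutral'
```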




2) Naïve Bayes Classifier

The Naïve Bayes classifier is a popular algorithm in machine learning. It is a supervised classification algorithm based on Bayes' theorem, which deals with conditional probabilities. The basic idea is to find the probability of a category given a text by using the joint probabilities of words and categories, under the assumption of word independence. The starting point is Bayes' theorem for conditional probability, stating that, for a given data point x and class C:

$$P(C_j \mid x_i) = \frac{P(x_i \mid C_j)\, P(C_j)}{P(x_i)} \quad (3)$$

where $C_j$ represents the $j$-th class, $j \in \{1, 2, \ldots, n\}$, and $x_i$ represents the feature vector of the $i$-th sample, $i \in \{1, 2, \ldots, m\}$.

We used Naïve Bayes from Apache Spark MLlib for the text classification. It takes as input an RDD of LabeledPoint, an optional smoothing parameter lambda and an optional model type parameter (the default is "multinomial"), and it outputs a NaiveBayesModel, which can be used for evaluation and prediction.

Algorithm 2

Input: a text t, a list of stop words sw
Output: a predicted class s ∈ {'Positive', 'Negative', 'Neutral'}
Steps:
a) Removing stop words, links, e-mails and worthless features from the text.
b) Transforming the text into linear vectors.
c) Training the model on a dataset that gives the expected class, which is the polarity of the text, in relation to the features.
d) Classification of the predicted polarity p:
If p = 0 then s = 'Negative'
Else if p = 4 then s = 'Positive'
Else s = 'Neutral'
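A minimal sketch of this training with the Spark MLlib RDD API; `training_texts`, a list of (polarity, tokens) pairs, is a placeholder for the preprocessed French Tweets.

```python
# A minimal sketch of the Naive Bayes training described above, using the
# Spark MLlib RDD API; `training_texts` is a placeholder for the
# preprocessed (polarity, tokens) training pairs.
from pyspark import SparkContext
from pyspark.mllib.classification import NaiveBayes
from pyspark.mllib.feature import HashingTF
from pyspark.mllib.regression import LabeledPoint

sc = SparkContext(appName="FrenchTweetSentiment")
tf = HashingTF()                                  # tokens -> sparse vector

data = sc.parallelize(training_texts).map(
    lambda r: LabeledPoint(r[0], tf.transform(r[1])))
model = NaiveBayes.train(data, lambda_=1.0)       # optional smoothing
polarity = model.predict(tf.transform(['service', 'rapide', 'merci']))
```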

5.2. Customer Satisfaction Rate Calculation

Key performance indicators (KPIs) are like milestones on the road to online retail success: e-commerce entrepreneurs identify progress toward sales, marketing and customer service goals by monitoring them. In this research, we calculated the customer satisfaction rate by bank, which is equal to the number of positive Tweets for a bank multiplied by 100 and divided by the total number of Tweets. Figure 4 shows the customer satisfaction rate by bank stored in HBase; a minimal sketch of this computation is given below.
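```python
# A minimal sketch of the satisfaction-rate KPI described above;
# the example counts are hypothetical.
def satisfaction_rate(positive_tweets, total_tweets):
    # percentage of positive Tweets among all Tweets for a bank
    return positive_tweets * 100.0 / total_tweets

print(satisfaction_rate(382, 1000))   # -> 38.2
```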

Fig. 4. The customer satisfaction rate by bank

6. Experimental Results

6.1. Performance Evaluation

Accuracy is the ratio of the number of correct predictions to the total number of input samples:

$$Accuracy = \frac{\text{Number of correct predictions}}{\text{Total number of predictions made}}$$

Our performance evaluation shows that for French Tweets the accuracy of the Naïve Bayesian model reached 76.19%, while for English Tweets the resulting accuracy was only 56%. That is why we used Stanford CoreNLP, which, for English Tweets, reaches a precision of 80.7%.

6.2. Result

In the rest of this section, we present the visualizations of the results in Kibana. We first visualize the data in the form of a heat map, which expresses the number of Tweets according to the sentiment deduced for each bank. Figure 5 shows the occurrence of banks in positive, neutral and negative Tweets.


Fig. 5. The number of Tweets by bank according to the sentiment


The second graph, in Figure 6, is a histogram expressing the customer satisfaction rate for each bank.

Fig. 6. Customer satisfaction by bank

Figure 7 is a pie chart that shows the rate of positivity, neutrality and negativity of customers' opinions of the Moroccan banking sector.

Fig. 7. A pie chart for the rate of positivity, neutrality and negativity by bank

The fourth graph, in Figure 8, is a map showing the number of Tweets by region. A large number of Tweets suggests that the region is more banked than the others; this graph gives an idea of the regions that banks must target to improve financial inclusion.

Fig. 8. A map for the number of Tweets by region

Figures 9 and 10 below represent the number of Tweets by bank and the number of Tweets by region.

Fig. 9. Number of Tweets by bank

Fig. 10. Number of Tweets by region


The Kibana Dashboard page is where we combine multiple visualizations onto a single page and then filter them by providing a search query or by selecting filters by clicking elements in the visualizations. Figure 11 shows the Kibana dashboard, which contains all the visualizations mentioned above.

Fig. 11. The sentiment Analysis Dashboard

7. Conclusion

In this paper, we analyzed the sentiment of customers on Twitter towards 23 banks in Morocco. To the best of our knowledge, no other work has attempted to analyze the sentiment of users towards Moroccan banks. We first began by studying the business needs in terms of the types of data, the expected results and the right tools. Then we analyzed customer sentiment using two methods to classify Tweets into three categories ('Positive', 'Negative' or 'Neutral'): the first method is based on the Stanford CoreNLP API for English Tweets and the second method is based on a Naïve Bayes classifier for French Tweets. Performing such Twitter sentiment analysis against a large amount and high velocity of data requires large-scale processing on multiple machines using big data tools. The real-time system we proposed and evaluated in this paper is able to perform Twitter sentiment analysis over large and high-velocity data in near real time. In the future, we will work on setting up an analysis solution that predicts yield and turnover while tracking multiple features. We would also like to compare our approach with other efficient sentiment analyzers, such as random forest, Support Vector Machine, etc.

Acknowledgements

The authors would like to thank Bank of Maghreb and Faculty of Sciences for the support during this research work.

AUTHORS
Anouar Riadsolh* – Laboratory of Conception and Systems, Faculty of Sciences, Mohammed V University in Rabat, Morocco, e-mail: anouarriadsolh@yahoo.fr.


Imane Lasri – Laboratory of Conception and Systems, Faculty of Sciences, Mohammed V University in Rabat, Morocco, e-mail: imanelasri95@gmail.com.
Mourad ElBelkacemi – Laboratory of Conception and Systems, Faculty of Sciences, Mohammed V University in Rabat, Morocco, e-mail: mourad_prof@yahoo.fr.
*Corresponding author

References

[1] F. A. Pozzi, E. Fersini, E. Messina and B. Liu (eds.), Sentiment Analysis in Social Networks, Morgan Kaufmann, 2017, DOI: 10.1016/C2015-0-01864-0.
[2] C. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. Bethard and D. McClosky, "The Stanford CoreNLP Natural Language Processing Toolkit". In: Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2014, 55–60, DOI: 10.3115/v1/P14-5010.
[3] A. Riadsolh and M. E. Belkacemi, "Toward a Good Decision to Improve the Weight of Control of Expenditure for Local Communities", American Journal of Applied Sciences, vol. 13, no. 3, 2016, 299–306, DOI: 10.3844/ajassp.2016.299.306.
[4] K. Ming Leung, "Naive Bayesian Classifier", Polytechnic University, 2007, https://cse.engineering.nyu.edu/~mleung/FRE7851/f07/naiveBayesianClassifier.pdf. Accessed on: 2021-02-05.
[5] "Apache Kafka: a distributed streaming platform", kafka.apache.org. Accessed on: 2021-02-05.
[6] M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker and I. Stoica, "Spark: Cluster Computing with Working Sets". In: Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing (HotCloud'10), 2010.
[7] "Apache HBase Home", https://hbase.apache.org. Accessed on: 2021-02-05.
[8] A. G. Shoro and T. R. Soomro, "Big Data Analysis: Apache Spark Perspective", Global Journal of Computer Science and Technology, 2015.
[9] J. Bollen, H. Mao and A. Pepe, "Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena". In: ICWSM, vol. 11, 2011, 450–453.
[10] L. Dey, S. Chakraborty, A. Biswas, B. Bose and S. Tiwari, "Sentiment Analysis of Review Datasets Using Naïve Bayes' and K-NN Classifier", International Journal of Information Engineering and Electronic Business (IJIEEB), vol. 8, no. 4, 2016, DOI: 10.5815/ijieeb.2016.04.07.
[11] "Welcome to Apache Flume", http://flume.apache.org/. Accessed on: 2021-02-05.
[12] A. Videla and J. J. W. Williams, RabbitMQ in Action, Manning, 2012.
[13] "Apache ZooKeeper", https://zookeeper.apache.org. Accessed on: 2021-02-05.



[14] O. O'Malley, "Terabyte sort on apache hadoop", 2008, sortbenchmark.org/YahooHadoop.pdf. Accessed on: 2021-02-05.
[15] A. Toshniwal, S. Taneja, A. Shukla, K. Ramasamy, J. M. Patel, S. Kulkarni, J. Jackson, K. Gade, M. Fu, J. Donham, N. Bhagat, S. Mittal and D. Ryaboy, "Storm@twitter". In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, 2014, 147–156, DOI: 10.1145/2588555.2595641.
[16] M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker and I. Stoica, "Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing". In: Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, 2012.
[17] H. Jing, E. Haihong, L. Guan and D. Jian, "Survey on NoSQL database". In: 2011 6th International Conference on Pervasive Computing and Applications, 2011, 363–366, DOI: 10.1109/ICPCA.2011.6106531.
[18] K. Chodorow, "Introduction to MongoDB", https://archive.fosdem.org/2010/schedule/events/nosql_mongodb_intro.html. Accessed on: 2021-02-05.
[19] "Apache Cassandra", https://cassandra.apache.org/. Accessed on: 2021-02-05.
[20] K. Shvachko, H. Kuang, S. Radia and R. Chansler, "The Hadoop Distributed File System". In: Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), 2010, 1–10, DOI: 10.1109/MSST.2010.5496972.
[21] J.-M. Spaggiari and K. O'Dell, Architecting HBase Applications: a Guidebook for Successful Development and Design, O'Reilly Media, 2016.
[22] B. Pang and L. Lee, "Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales". In: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, 2005, 115–124, DOI: 10.3115/1219840.1219855.




A Dropout Predictor System in MOOCs Based on Neural Networks

Submitted: 26th June 2019; accepted: 25th March 2020

Khaoula Mrhar, Otmane Douimi, Mounia Abik

DOI: 10.14313/JAMRIS/4-2020/48

Abstract: Massive open online courses (MOOCs) are a recent phenomenon that has attracted tremendous media attention in the online education world. Certainly, MOOCs have brought interest among learners, given the number of learners enrolled in these courses. Nevertheless, the dropout rate in MOOCs is very high: only a limited number of the enrolled learners complete their courses. The high dropout rate in MOOCs is perceived by the educators' community as one of the most important problems. It is related to diverse aspects, such as the motivation of the learners, their expectations and the lack of social interactions. To solve this problem, it is necessary to predict the likelihood of dropout in order to propose an appropriate intervention for learners at risk of dropping out of their courses. In this paper, we present a dropout predictor model based on a neural network algorithm and a sentiment analysis feature, using clickstream logs and forum post data. Our model achieved an average AUC (area under the curve) as high as 90%, and the model with the learner sentiment analysis feature attained an average increase in AUC of 0.5%.

Keywords: Massive open online courses (MOOCs), Student Attrition, Dropout prediction, Neural Network, Sentiment Analysis

1. Introduction


More and more, massive open online courses (MOOCs) have witnessed a tremendous development in recent years [1]. Through MOOCs, learners have the opportunity to self-organize their participation according to their learning goals, knowledge, abilities and interests. Certainly, MOOCs have brought interest among learners, given the number of learners enrolled in these courses; nevertheless, there are many unresolved questions related to MOOCs. One of the major recurring issues raised is the high dropout rate of MOOC learners. Indeed, MOOCs generally experience an attrition of 90% [2][3]. In this sense, according to the latest study elaborated by edX, only 17% of the enrolled learners consulted the courses and only 8% obtained the certification at the end of the MOOC [4], which means that most of the users who join a MOOC eventually do not complete it. However, dropout in MOOCs is related to diverse aspects that can be classified as learner-related factors and MOOC-related factors. The learner-related factors are especially the motivation of the learners and their expectations [5][6][7], insufficient background knowledge and the lack of required skills [8]. The MOOC-related factors concern course design and the lack of social interactions in MOOCs, leading to a feeling of isolation [9]. There has been increasing attention to this issue by several researchers; in this sense, we can cite the papers of Kizilcec et al. [10] and Hill [11], who proposed classifications of learners according to their interactions with the platform. Hill [11] classified the various MOOC participants into five categories: active, passive, drop-ins, observers and no-shows. Besides, we can distinguish three different approaches to minimize the dropout rate [12][13]:
– The pedagogical strategy approach: a number of theoretical strategies validated by empirical studies; among these educational strategies we can cite the work in [14].
– The personalization and adaptation approach: several research works focus more and more on the importance of personalization to reduce the dropout rate; in this context, we can cite different projects which propose a personalization system of the pedagogical objectives within a MOOC [15][16].
– The gamification approach: this mode of learning captures a large audience; it integrates serious games into the learning process in order to increase learners' motivation and engagement [17].
However, to reduce the dropout rate, the adoption of these approaches is not sufficient, by reason of the low pedagogical tutoring in these platforms, the lack of personalization of the courses according to the real learner profile (a profile updated with the knowledge and skills acquired in MOOCs), and the fact that the majority of serious games could not influence the intrinsic and extrinsic motivation of learners. As a consequence, there is a need to predict learner dropout in MOOCs in order to propose a suitable pedagogical strategy and/or complementary resources to help learners complete their courses. Our objective in this paper is to provide a solution to reduce dropout rates through the prediction of learner dropout in MOOCs. Our approach consists in using a machine learning algorithm based on the clickstream log and



forum post data from an edX MOOC. We can summarize our research problems in three questions:
• How can we identify the features to be considered in the prediction task to obtain a better accuracy of our algorithm?
• What is the best machine learning algorithm to use to predict the dropout of a learner?
• Can sentiment analysis influence the accuracy of our algorithm?

2. Related Work

Recently, there have been several efforts to predict learner dropout in MOOCs by analyzing learner interactions, extracting a variety of features and applying machine learning algorithms. Most works use clickstream features, which capture the interaction events between learners and the MOOC courseware, including discussion forum content, video lectures, quiz answers and more. Some works use features such as the number of threads viewed, the number of forum posts and the number of videos viewed to predict learner attrition [19]; others use the learner's social interactions, quiz scores and number of peer reviews [20]. Some works try to classify the various features to understand their relationships and their relative importance in order to classify the learners in MOOCs [21]. All these works use a variety of classification algorithms and adopt different approaches for extracting features. In [22], the authors propose a support vector machine (SVM) and extract features from the clickstream to predict learner dropout each week. The authors of [23] apply a logistic regression to identify the learners who seem unable to complete the course. The authors of [24] use k-means to discover inactive learners in the MOOC environment. However, among the various ML algorithms, only some researchers focus on the Artificial Neural Network (ANN) [25] and the Recurrent Neural Network (RNN) [26]. Tab. 1 presents a synopsis of prior works on dropout prediction in MOOCs.

Tab. 1. Survey of prior works on the dropout prediction in MOOCs

Study | MOOC numbers | Dataset | Algorithm used
Balakrishnan et Coetzee [27] | 1 | Clickstream, forum posts | HMM + SVM
Boyer et Veeramachaneni [28] | 3 | Clickstream | TL + LR
Taylor et al. [29] | 1 | Clickstream, forum posts | LR
Chaplot et al. [25] | 1 | Clickstream, forum posts | ANN
Coleman et al. [30] | 1 | Clickstream | LDA + LR
Jiang et al. [31] | 1 | Social network, grades | LR
Kloft et al. [32] | 1 | Clickstream | SVM
Xing et al. [33] | 1 | Clickstream | PCA + {BN, DT}
Wang et al. [26] | 1 | Clickstream | CNN + RNN

However, several challenges face dropout prediction using machine learning methods [34] (Fig. 1), such as the large mass of unstructured data contained in MOOC platforms, which needs specific management, mainly when missing data occur. In this sense, we cannot apply machine learning techniques that require a finite set of data and no missing observations, such as HMM. The non-interoperability of learning platforms and the non-standardization of MOOC data lead to another challenge, which is the generalization of a machine learning solution. Each learning platform has its own data definitions and vocabulary, and the clickstream data are represented in a different format; therefore, the process of creating, training and validating an ML model is specific to each learning platform (Open edX, Coursera, Canvas...) and cannot be reused for other learning platforms. Another challenge is the data variance in MOOC platforms, which leads to imbalanced classes. A high data imbalance may result in poor performance, less accuracy and reliability in ML models such as SVM. In addition, the privacy and non-availability of learners' data in MOOC platforms, with only the clickstream data available, is one of the important challenges of ML learner dropout prediction in MOOCs, because the obtained results are limited and not representative.


Fig. 1. ML challenges for dropout prediction

Furthermore, according to some works [18], we observe a significant correlation between the sentiment


expressed in the discussion forum of the course and the number of learners who drop the course. Unfortunately, there has not been much work on the use of learner sentiments in predicting dropout. We propose in this paper a model that predicts learner attrition. This model is based on the most relevant information impacting learner dropout [34], selected by correlating several features with learner dropout. The features are related to the clickstream log (number of video views, number of subsections viewed, ...) and to the discussion forum (number of forum posts viewed, number of forum post votes, student sentiments in discussion forum posts, etc.). We decided to use an ANN model with two hidden layers and nine input features after several experiments and accuracy comparisons with the standard dropout prediction architectures. In addition, this choice of algorithm also answers some of the machine learning challenges cited before (large mass of unstructured data, high class imbalance). In the following section, we introduce our proposed model and the machine learning algorithm used to address the MOOC dropout prediction problem.


3. Proposed Model

3.1. Prediction Dropout Problem Formulation

Since their appearance, MOOCs have had several limits, mainly the very high dropout rate observed across all MOOCs, no matter their subject. On average, just 8% of the enrolled learners finish the courses and get a certification. The high dropout in MOOCs has been attributed to many factors, such as the lack of time, learners' motivation, the feeling of isolation, and the lack of interactivity in MOOCs. This problem raised the following research question: "How can we predict the attrition of the learners in the MOOC?". To answer this question, we propose a dropout predictor that can be used by educators during the courses to propose the necessary interventions for the learners at risk of dropping out. In order to reduce learner attrition in MOOCs, we suggest applying four types of machine learning algorithms to the tracking log, discussion forum clickstream and forum post data from the Open edX MOOC 'Introduction to Computer Science', prepared exclusively for this work by Stanford University. The objective of this step is to choose the algorithm which gives the best accuracy in our context. We then test our algorithm by adding new features to further improve the performance of the chosen algorithm. We also study, in this paper, the influence of the sentiment analysis feature on the prediction accuracy.

3.2. Dataset

The experimentation and analyses in this paper are based on a dataset prepared by Stanford University exclusively for our work, after a request via their official website (https://datastage.stanford.edu/). The data was collected from the MOOC 'Introduction to Computer Science', which was launched in March 2016. The course lasted twelve weeks, with 11,607 participants at the beginning and 3,861 participants staying until the last week of the course. Globally, 20,828 learners participated, with approximately 81.4% as dropout rate. Fig. 2 summarizes the various data sources received from Stanford:


Fig. 2. Overview of our data sources

• A file of the clickstream log data from the navigation history and the edX server, in CSV format; for example, every page visited by every learner was stored as a CSV event.
• The forum posts, comments and answers, stored in a CSV file.
• The wiki pages visited, stored in a CSV file.
• A CSV file containing information about the state of the learners; for example, the database contained their final answers to a problem.
• A CSV file of the calendar of the MOOC, which included information such as the deadlines for sending the homework.

We visualized several properties of our dataset in Fig. 3.



Fig. 3. Properties of our dataset

We observe that the number of active learners decreases quickly. Furthermore, the probability of dropout is high during the first two weeks.

3.3. Features Extraction

After the cleaning of our dataset, we process the data and transform it into the proper format for the classification task. Firstly, we translate the interactions in our dataset into a format adapted to the classification task. In particular, we analyze the interactions and extract a set of features, which are the input of our machine learning algorithms. Every feature represents an aspect of the dataset that we want to take into account when we predict whether a learner will drop out or not. To build our set of features, we first sort the interactions in chronological order according to their timestamps; then, the missing attributes are replaced by the median of the week. After that, we select the features based on a manual feature selection method: we calculate the correlation between each pair of features and choose nine features based on the correlation matrix plot shown in Figure 4; a sketch of this selection step is given below.
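```python
# A minimal sketch of the manual feature-selection step; `features` is
# assumed to be a pandas DataFrame of weekly features, and the `dropout`
# label column name is our assumption.
import pandas as pd

corr = features.corr()                       # pairwise correlation (Fig. 4)
# rank candidate features by the strength of their correlation with the
# dropout label and keep the nine most informative ones
ranking = corr['dropout'].drop('dropout').abs().sort_values(ascending=False)
selected = ranking.head(9).index.tolist()
```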

Fig. 4. Correlation Matrix Plot

The complete list of the extracted features is presented in Tab. 2. After the extraction of these features, several important preprocessing steps were applied. First, we removed duplicate rows to clean our dataset, followed by a normalization procedure. Furthermore, class imbalance is one of the issues that occur in MOOC datasets: the number of learners who drop out far exceeds the number of learners who complete the courses. The presence of class imbalance in our dataset can lead the classifier to incorrectly predict which learners may drop out. This is particularly harmful for learners who in reality complete their courses but are misclassified by our prediction model. To solve this problem, we undersampled the majority class (‘learner dropping out’); a sketch of this step follows Tab. 2.

Tab. 2. Features extracted for MOOC dropout prediction

Course week – the number of weeks since the course began.
Number of problems answered – the number of questions that the learner answered.
Number of video views – the number of videos played by the learner in the current week.
Number of problems answered correctly – the number of questions that the learner answered correctly.
Number of subsections viewed – the number of sections viewed by the learner in the current week.
Number of forum posts – the number of forum pages viewed by the learner in the current week.
Number of forum post votes – the number of forum posts voted on by the learner in the current week.
Learner sentiment analysis – the sentiment analysis of the learner’s forum posts in the current week.
Student started week – the number of weeks since the learner began the course.
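The undersampling step could be sketched as follows. This is a generic random undersampler written under our own assumptions (a binary dropout column), not the authors’ implementation.

```python
import pandas as pd

def undersample_majority(df, label="dropout", seed=0):
    """Randomly drop rows of the majority class until both classes match."""
    counts = df[label].value_counts()
    majority, minority = counts.idxmax(), counts.idxmin()
    kept = df[df[label] == majority].sample(n=counts[minority], random_state=seed)
    balanced = pd.concat([kept, df[df[label] == minority]])
    return balanced.sample(frac=1, random_state=seed)  # shuffle rows

# Toy check: 8 dropouts vs 2 completers become 2 of each after undersampling.
toy = pd.DataFrame({"dropout": [1] * 8 + [0] * 2, "videos": range(10)})
print(undersample_majority(toy)["dropout"].value_counts())
```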

3.4. Proposed Model

To predict learner attrition in MOOCs, we propose a machine learning model based on the nine features presented in the previous section, namely: course week, number of video views, number of problems answered, number of problems answered correctly, number of subsections viewed, number of forum posts, number of forum post votes, learner sentiment analysis, and student started week. After the feature extraction, we try to answer the remaining questions:
– What is the best machine learning algorithm for predicting learner dropout?
– Can the sentiment analysis feature influence the accuracy of our learning algorithm?
In our work, we selected several features related to the discussion forum and clickstream based on their correlation with learner dropout. We decided to build an artificial neural network because it is well suited to modeling the learner attrition prediction problem: there are a large number of inputs and, unlike many other machine learning techniques, it does not require the mathematical relation between the inputs and the output to be known. Our neural network model consists of nine nodes in the input layer, which hold the values of the features for the current week, and an output layer composed of one node, which predicts whether the learner will drop out the next week. Every input is normalized to take values between 0 and 1. Between the input and output layers we add two hidden layers with eight neurons each; the number of neurons in the hidden layers was determined experimentally to obtain the best possible results. To evaluate the prediction performance of the proposed model, we compare it with baseline algorithms used in MOOC dropout prediction, namely SVM, KNN, and decision tree. Fig. 5 shows the structure of our neural network for predicting learner attrition.

Fig. 5. Structure of our neural network used in learner attrition prediction

To build the neural network, we use resilient backpropagation, which gave better results in our experimentation than standard backpropagation and quick propagation. A hedged implementation sketch of this architecture follows.
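The paper does not name its implementation, so the following is only a sketch, in PyTorch, of the architecture described above: nine normalized inputs, two hidden layers of eight neurons each, one output node, trained with resilient backpropagation (torch.optim.Rprop). The sigmoid activations and the binary cross-entropy loss are our assumptions.

```python
import torch
import torch.nn as nn

# 9-8-8-1 network; activation choices are assumed, not stated in the paper.
model = nn.Sequential(
    nn.Linear(9, 8), nn.Sigmoid(),   # first hidden layer, 8 neurons
    nn.Linear(8, 8), nn.Sigmoid(),   # second hidden layer, 8 neurons
    nn.Linear(8, 1), nn.Sigmoid(),   # output node: probability of dropout next week
)
optimizer = torch.optim.Rprop(model.parameters())  # resilient backpropagation
loss_fn = nn.BCELoss()

# Placeholder full batch: features normalized to [0, 1], binary labels.
X = torch.rand(256, 9)
y = (torch.rand(256, 1) > 0.8).float()

for _ in range(100):        # Rprop updates on the full batch each step
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
```

Rprop adapts a per-weight step size from the sign of the gradient, which is one reason it is usually run on the full batch rather than on mini-batches.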

According to our experimentation with the four machine learning algorithms (Tab. 3), the neural network gives better accuracy than the others, namely KNN, SVM, and decision tree. So, we choose the neural network algorithm for our dropout prediction system. To answer our research question concerning the impact of the sentiment analysis feature on the accuracy of the results, we added the learner sentiment analysis feature for discussion forum posts, which was already prepared in our dataset. Tab. 3 below summarizes the results of our experimentation with the four algorithms on our dataset.

Tab. 3. Comparison of average AUC between learning algorithms

Algorithm – Average AUC
KNN – 0.72
SVM – 0.84
Decision Tree – 0.74
Neural Network (without sentiment analysis) – 0.90
Neural Network (with sentiment analysis) – 0.95
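A comparison of this kind could be reproduced along the following lines with scikit-learn. The data here is a random placeholder, so the printed values will not match Tab. 3, and the default hyperparameters are our assumption.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Placeholder arrays standing in for the nine weekly features and the labels.
rng = np.random.default_rng(0)
X = rng.random((500, 9))
y = (rng.random(500) > 0.5).astype(int)

baselines = {
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(),
}
for name, clf in baselines.items():
    auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean AUC = {auc:.2f}")
```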

According to these results, which will be discussed in detail in the Results section, the neural network algorithm achieves the best accuracy. We also notice that the model including the “sentiment analysis” feature attains an average increase in AUC of 0.05. This comparative study allows us to consider the neural network algorithm the best performing for learner attrition prediction in MOOCs.
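The sentiment scores were already prepared in our dataset, so we did not compute them ourselves. Purely as an illustration, a comparable weekly sentiment feature could be derived from forum posts with, for example, NLTK’s VADER analyzer; this tool choice is an assumption, not the one used to produce our data.

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

# A learner's forum posts for the current week (toy examples).
posts = [
    "I really enjoy this course, the videos are great!",
    "I am lost, the assignments are too hard and I have no time.",
]
# Average VADER compound score in [-1, 1] as the weekly sentiment feature.
weekly_sentiment = sum(sia.polarity_scores(p)["compound"] for p in posts) / len(posts)
print(round(weekly_sentiment, 3))
```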

4. Results

To measure the performance of our prediction algorithm, we use two metrics: the AUC and the accuracy. The AUC allows us to determine the validity of the model; it gives us information about the probability of the test result. When the test discriminates perfectly, the area under the curve (AUC) is 1. This means that, given two learners (one at risk of dropping out and one not at risk), the test allows us to distinguish the learner who will drop out of the MOOC from the learner who will complete it. Tab. 4 and Fig. 6 present the prediction performance of the various algorithms we applied; a short sketch of how the two metrics can be computed follows.
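Both metrics can be computed as in the following toy sketch (labels and probabilities invented for illustration). The AUC scores a model by how well its probabilities rank dropouts above completers, while the accuracy thresholds the probabilities at 0.5.

```python
from sklearn.metrics import accuracy_score, roc_auc_score

# Toy labels (1 = dropped out) and model probabilities for one week.
y_true = [0, 0, 1, 1, 1, 0, 1, 1]
y_score = [0.10, 0.65, 0.85, 0.70, 0.95, 0.30, 0.45, 0.80]

print("AUC:", round(roc_auc_score(y_true, y_score), 2))                  # 0.93
print("Accuracy:", accuracy_score(y_true, [s > 0.5 for s in y_score]))   # 0.75
```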

Fig. 6. Prediction performance for training data for AUC

Tab. 4. AUC and accuracy results for our algorithms over the weeks (values for weeks 1 to 12, left to right; ANN includes the sentiment analysis feature)

KNN – AUC: 0.79, 0.78, 0.70, 0.71, 0.70, 0.71, 0.79, 0.70, 0.71, 0.72, 0.73, 0.70
KNN – Accuracy: 0.82, 0.83, 0.87, 0.90, 0.89, 0.86, 0.86, 0.82, 0.82, 0.83, 0.80, 0.81
SVM – AUC: 0.80, 0.80, 0.84, 0.83, 0.89, 0.81, 0.87, 0.80, 0.83, 0.89, 0.88, 0.84
SVM – Accuracy: 0.90, 0.90, 0.91, 0.91, 0.94, 0.96, 0.94, 0.91, 0.90, 0.93, 0.94, 0.92
DT – AUC: 0.78, 0.75, 0.78, 0.73, 0.79, 0.71, 0.77, 0.76, 0.77, 0.79, 0.78, 0.79
DT – Accuracy: 0.92, 0.91, 0.92, 0.91, 0.91, 0.94, 0.93, 0.91, 0.92, 0.93, 0.92, 0.90
ANN – AUC: 0.96, 0.95, 0.95, 0.96, 0.97, 0.98, 0.98, 0.97, 0.96, 0.94, 0.95, 0.93
ANN – Accuracy: 0.94, 0.94, 0.95, 0.97, 0.96, 0.95, 0.98, 0.96, 0.97, 0.95, 0.97, 0.96

The AUC value for KNN lies between 0.70 and 0.79, and for the decision tree between 0.71 and 0.79. For SVM, the AUC varies between 0.80 and 0.89. For our algorithm, based on neural networks, the AUC varies between 0.93 and 0.98. These AUC values show that even the lowest prediction performance of our neural network exceeds the upper limit of the KNN and decision tree algorithms. Our neural network has the highest average AUC, 0.95, and surpasses the baseline algorithms. In fact, in this study a single layer was used most of the time in the construction of the machine learning algorithms, so there is still room to refine the performance of our neural network.

Overall, our neural network presented a much better performance for identifying learners at risk of dropping out of a MOOC. Concerning the accuracy of the dropout prediction, the range for KNN varied between 0.82 and 0.90 according to Tab. 4. The accuracy range for SVM was between 0.90 and 0.94, and the interval for the decision tree was between 0.90 and 0.94. For our algorithm, the prediction accuracy ranged between 0.94 and 0.98. The accuracy of the neural network is always better than that of the baseline algorithms, although we notice that the SVM achieves a prediction accuracy very close to that of the neural network model.

5. Conclusion

MOOCs have become more and more popular because of several elements that distinguish them from other online education courses (they are completely open and free to anyone). However, only a limited number of enrolled learners complete the courses. Moreover, MOOC teachers are unable to identify the learners at risk of dropping out using traditional methods (interviews, observations, and questionnaires).


Furthermore, researchers have begun to explore the use of machine learning to develop dropout prediction models for the effective identification of learners at risk of attrition, so as to offer them suitable interventions. Nevertheless, in the educational context, and in MOOCs in particular, the data generated by learners is enormous and varied, and the selection of the most informative features is an important task for machine learning algorithms. In our work, we chose the nine features related to discussion forum and clickstream data that are the most correlated with learner dropout, and we constructed an artificial neural network (ANN) model to predict whether a learner will drop out the next week. According to our study, we conclude that in our context the neural network performs better than the other baseline algorithms, including KNN, SVM, and decision tree. By comparing the prediction performance of the various algorithms on the training and test data, the results show that the prediction performance of the neural network is more stable than that of the others. Besides, we note that the learner sentiment analysis feature, derived from the discussion forum, improved the AUC value of our algorithm by 0.05.
In future work, we plan to:
– explore the social aspect of discussion forums in the detection of learners at risk of dropping out;
– further improve the prediction performance of our neural network by adding other features and hidden layers;
– exploit the results obtained from our model to design a recommendation system that can help and motivate learners to complete their MOOC.

AUTHORS
Khaoula Mrhar* – Faculty of Science, Mohammed V University in Rabat, Morocco, e-mail: khaoula_mrhar@um5.ac.ma.

Otmane Douimi – ENSIAS, Mohammed V University in Rabat, Morocco. Mounia Abik – Information Retrieval and Data Analytics Research Team, ENSIAS, Mohammed V University in Rabat, Morocco. *Corresponding author

References
[1] A. Watters, “MOOC Mania: Debunking the hype around massive open online courses”, 2013, http://www.thedigitalshift.com/2013/04/featured/got-mooc-massive-open-onlinecourses-are-poised-to-change-the-face-of-education/. Accessed on: 2021-02-06.
[2] K. Jordan, “Initial trends in enrolment and completion of massive open online courses”, The


International Review of Research in Open and Distributed Learning, vol. 15, no. 1, 2014, DOI: 10.19173/irrodl.v15i1.1651.
[3] R. Meyer, “What It’s Like to Teach a MOOC (and What the Heck’s a MOOC?)”, 2012, https://www.theatlantic.com/technology/archive/2012/07/what-its-like-to-teach-amooc-and-what-the-hecks-a-mooc/260000/. Accessed on: 2021-02-06.
[4] D. F. O. Onah, J. Sinclair and R. Boyatt, “Dropout rates of massive open online courses: behavioural patterns”. In: L. Gómez Chova, A. López Martínez and I. Candel Torres (eds.), EDULEARN14 Proceedings, 2014, 5825–5834.
[5] Y. Belanger and J. Thornton, “Bioelectricity: A Quantitative Approach. Duke University’s First MOOC”, 2013, https://dukespace.lib.duke.edu/dspace/handle/10161/6216. Accessed on: 2021-02-06.
[6] C. Gütl, R. H. Rizzardini, V. Chang and M. Morales, “Attrition in MOOC: Lessons Learned from Drop-Out Students”. In: L. Uden, J. Sinclair, Y.-H. Tao and D. Liberona (eds.), Learning Technology for Education in Cloud. MOOC and Big Data, 2014, 37–48, DOI: 10.1007/978-3-319-10671-7_4.
[7] P. Hill, “Emerging Student Patterns in MOOCs: A (Revised) Graphical View”, 2013, https://eliterate.us/emerging-student-patterns-in-moocs-a-revised-graphical-view/. Accessed on: 2021-02-06.
[8] H. Khalil and M. Ebner, “MOOCs Completion Rates and Possible Methods to Improve Retention - A Literature Review”. In: Proceedings of World Conference on Educational Multimedia, Hypermedia and Telecommunications, 2014, 1236–1244.
[9] H. Khalil and M. Ebner, ““How satisfied are you with your MOOC?” - A Research Study on Interaction in Huge Online Courses”. In: J. Herrington, A. Couros & V. Irvine (Eds.), Proceedings of EdMedia + Innovate Learning, 2013, 830–839.
[10] R. F. Kizilcec, C. Piech and E. Schneider, “Deconstructing disengagement: analyzing learner subpopulations in massive open online courses”. In: Proceedings of the third international conference on learning analytics and knowledge, 2013, 170–179.
[11] P. Hill, “Some validation of MOOC student patterns graphic”, 2013, https://eliterate.us/validation-mooc-student-patterns-graphic/. Accessed on: 2021-02-06.
[12] A. Bakki, L. Oubahssi, C. Cherkaoui and S. George, “cMOOC: How to Assist Teachers in Integrating Motivational Aspects in Pedagogical Scenarios?”. In: T. Brinda, N. Mavengere, I. Haukijärvi, C. Lewin and D. Passey (eds.), Stakeholders and Information Technology in Education, 2016, 72–81, DOI: 10.1007/978-3-319-54687-2_7.


Journal of Automation, Mobile Robotics and Intelligent Systems

[13] A. Bakki, L. Oubahssi, C. Cherkaoui and S. George, “Motivation and Engagement in MOOCs: How to Increase Learning Motivation by Adapting Pedagogical Scenarios?”. In: G. Conole, T. Klobučar, C. Rensing, J. Konert and E. Lavoué (eds.), Design for Teaching and Learning in a Networked World, 2015, 556–559, DOI: 10.1007/978-3-319-24258-3_58.
[14] J. J. Williams, “Improving learning in MOOCs with Cognitive Science”. In: International Conference on Artificial Intelligence in Education 2013 Workshops Proceedings, 2013.
[15] S. Downes, “Creating the Connectivist Course”, 2012, https://halfanhour.blogspot.com/2012/01/creating-connectivist-course.html. Accessed on: 2021-02-06.
[16] F. Brouns, J. Mota, L. Morgado, D. Jansen, S. Fano, A. Silva and A. Teixeira, “A Networked Learning Framework for Effective MOOC Design: The ECO Project Approach”. In: Doing Things Better – Doing Better Things – EDENRW8 Conference Proceedings, 2014, 161–172.
[17] S. Deterding, D. Dixon, R. Khaled and L. Nacke, “From game design elements to gamefulness: defining “gamification””. In: Proceedings of the 15th International Academic MindTrek Conference: Envisioning Future Media Environments, 2011, 9–15, DOI: 10.1145/2181037.2181040.
[18] M. Wen, D. Yang and C. P. Rosé, “Sentiment Analysis in MOOC Discussion Forums: What does it tell us?”. In: Proceedings of the 7th International Conference on Educational Data Mining, 2014, 130–137.
[19] T. Sinha, N. Li, P. Jermann and P. Dillenbourg, “Capturing “attrition intensifying” structural traits from didactic interaction sequences of MOOC learners”. In: Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, 2014, 42–49, DOI: 10.3115/v1/W14-4108.
[20] M. Vitiello, S. Walk, R. Rizzardini, D. Helic and C. Gütl, “Classifying students to improve MOOC dropout rates”, Proceedings of the European Stakeholder Summit on experiences and best practices in and around MOOCs (EMOOCS 2016), 2016, 501–508.
[21] S. Crossley, L. Paquette, M. Dascalu, D. S. McNamara and R. S. Baker, “Combining click-stream data with NLP tools to better understand MOOC completion”. In: Proceedings of the Sixth International Conference on Learning Analytics & Knowledge, 2016, 6–14, DOI: 10.1145/2883851.2883931.
[22] M. Kloft, F. Stiehler, Z. Zheng and N. Pinkwart, “Predicting MOOC Dropout over Weeks Using Machine Learning Methods”. In: Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, 2014, 60–65, DOI: 10.3115/v1/W14-4111.

VOLUME 14,

N° 4

2020

[23] J. He, J. Bailey, B. I. P. Rubinstein and R. Zhang, “Identifying at-risk students in massive open online courses”. In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015, 1749–1755.
[24] T.-Y. Liu and X. Li, “Finding out Reasons for Low Completion in MOOC Environment: An Explicable Approach Using Hybrid Data Mining Methods”, Proceedings of 2017 International Conference on Modern Education and Information Technology (MEIT 2017), 2017, DOI: 10.12783/dtssehs/meit2017/12893.
[25] D. S. Chaplot, E. Rhim and J. Kim, “Predicting Student Attrition in MOOCs using Sentiment Analysis and Neural Networks”. In: Proceedings of the Workshops at the 17th International Conference on Artificial Intelligence in Education AIED 2015, vol. 3, 2015, 7–12.
[26] W. Wang, H. Yu and C. Miao, “Deep Model for Dropout Prediction in MOOCs”. In: Proceedings of the 2nd International Conference on Crowd Science and Engineering, 2017, 26–32, DOI: 10.1145/3126973.3126990.
[27] G. Balakrishnan, “Predicting Student Retention in Massive Open Online Courses using Hidden Markov Models”, Technical Report, 2013, https://www2.eecs.berkeley.edu/Pubs/TechRpts/2013/EECS-2013-109.html. Accessed on: 2021-02-06.
[28] S. Boyer and K. Veeramachaneni, “Transfer Learning for Predictive Models in Massive Open Online Courses”. In: C. Conati, N. Heffernan, A. Mitrovic and M. F. Verdejo (eds.), Artificial Intelligence in Education, 2015, 54–63, DOI: 10.1007/978-3-319-19773-9_6.
[29] C. Taylor, K. Veeramachaneni and U.-M. O’Reilly, “Likely to stop? Predicting Stopout in Massive Open Online Courses”, arXiv:1408.3382 [cs], 2014.
[30] C. A. Coleman, D. T. Seaton and I. Chuang, “Probabilistic Use Cases: Discovering Behavioral Patterns for Predicting Certification”. In: Proceedings of the Second (2015) ACM Conference on Learning @ Scale, 2015, 141–148, DOI: 10.1145/2724660.2724662.
[31] S. Jiang, A. Williams, K. Schenke, M. Warschauer and D. O’Dowd, “Predicting MOOC performance with Week 1 Behavior”. In: Proceedings of the 7th International Conference on Educational Data Mining, 2014, 273–275.
[32] M. Kloft, F. Stiehler, Z. Zheng and N. Pinkwart, “Predicting MOOC Dropout over Weeks Using Machine Learning Methods”. In: Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, 2014, 60–65, DOI: 10.3115/v1/W14-4111.
[33] W. Xing, X. Chen, J. Stein and M. Marcinkowski, “Temporal predication of dropouts in MOOCs: Reaching the low hanging fruit through stacking generalization”, Computers in Human Behavior, vol. 58, 2016, 119–129, DOI: 10.1016/j.chb.2015.12.007.


[34] F. Dalipi, A. S. Imran and Z. Kastrati, “MOOC dropout prediction using machine learning techniques: Review and research challenges”. In: 2018 IEEE Global Engineering Education Conference (EDUCON), 2018, 1007–1014, DOI: 10.1109/EDUCON.2018.8363340.


