WWW.JAMRIS.ORG
Indexed in SCOPUS
pISSN 1897-8649 (PRINT)/eISSN 2080-2145 (ONLINE)
VOLUME 17, N° 1, 2023
Journal of Automation, Mobile Robotics and Intelligent Systems
A peer-reviewed quarterly focusing on new achievements in the following fields: automation, systems and control, autonomous systems, multiagent systems, decision-making and decision support, robotics, mechatronics, data sciences, new computing paradigms.

Editor-in-Chief: Janusz Kacprzyk (Polish Academy of Sciences, Łukasiewicz-PIAP, Poland)

Advisory Board: Dimitar Filev (Research & Advanced Engineering, Ford Motor Company, USA), Kaoru Hirota (Tokyo Institute of Technology, Japan), Witold Pedrycz (ECERF, University of Alberta, Canada)

Co-Editors: Roman Szewczyk (Łukasiewicz-PIAP, Warsaw University of Technology, Poland), Oscar Castillo (Tijuana Institute of Technology, Mexico), Marek Zaremba (University of Quebec, Canada)

Executive Editor: Katarzyna Rzeplinska-Rykała, e-mail: office@jamris.org (Łukasiewicz-PIAP, Poland)

Associate Editor: Piotr Skrzypczyński (Poznań University of Technology, Poland)

Statistical Editor: Małgorzata Kaliczyńska (Łukasiewicz-PIAP, Poland)

Typesetting: SCIENDO, www.sciendo.com

Webmaster: TOMP, www.tomp.pl

Editorial Office: ŁUKASIEWICZ Research Network – Industrial Research Institute for Automation and Measurements PIAP, Al. Jerozolimskie 202, 02-486 Warsaw, Poland (www.jamris.org), tel. +48-22-8740109, e-mail: office@jamris.org

The reference version of the journal is the e-version. Printed in 100 copies. Articles are reviewed, excluding advertisements and descriptions of products. Papers published currently are available for non-commercial use under the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) license. Details are available at: https://www.jamris.org/index.php/JAMRIS/LicenseToPublish
Editorial Board: Chairman – Janusz Kacprzyk (Polish Academy of Sciences, Łukasiewicz-PIAP, Poland), Plamen Angelov (Lancaster University, UK), Adam Borkowski (Polish Academy of Sciences, Poland), Wolfgang Borutzky (Fachhochschule Bonn-Rhein-Sieg, Germany), Bice Cavallo (University of Naples Federico II, Italy), Chin Chen Chang (Feng Chia University, Taiwan), Jorge Manuel Miranda Dias (University of Coimbra, Portugal), Andries Engelbrecht (University of Stellenbosch, Republic of South Africa), Pablo Estévez (University of Chile), Bogdan Gabrys (Bournemouth University, UK), Fernando Gomide (University of Campinas, Brazil), Aboul Ella Hassanien (Cairo University, Egypt), Joachim Hertzberg (Osnabrück University, Germany), Tadeusz Kaczorek (Białystok University of Technology, Poland), Nikola Kasabov (Auckland University of Technology, New Zealand), Marian P. Kaźmierkowski (Warsaw University of Technology, Poland), Laszlo T. Kóczy (Szechenyi Istvan University, Gyor and Budapest University of Technology and Economics, Hungary), Józef Korbicz (University of Zielona Góra, Poland), Eckart Kramer (Fachhochschule Eberswalde, Germany), Rudolf Kruse (Otto-von-Guericke-Universität, Germany), Ching-Teng Lin (National Chiao-Tung University, Taiwan), Piotr Kulczycki (AGH University of Science and Technology, Poland), Andrew Kusiak (University of Iowa, USA), Mark Last (Ben-Gurion University, Israel), Anthony Maciejewski (Colorado State University, USA), Krzysztof Malinowski (Warsaw University of Technology, Poland), Andrzej Masłowski (Warsaw University of Technology, Poland), Patricia Melin (Tijuana Institute of Technology, Mexico), Fazel Naghdy (University of Wollongong, Australia), Zbigniew Nahorski (Polish Academy of Sciences, Poland), Nadia Nedjah (State University of Rio de Janeiro, Brazil), Dmitry A. Novikov (Institute of Control Sciences, Russian Academy of Sciences, Russia), Duc Truong Pham (Birmingham University, UK), Lech Polkowski (University of Warmia and Mazury, Poland), Alain Pruski (University of Metz, France), Rita Ribeiro (UNINOVA, Instituto de Desenvolvimento de Novas Tecnologias, Portugal), Imre Rudas (Óbuda University, Hungary), Leszek Rutkowski (Czestochowa University of Technology, Poland), Alessandro Saffiotti (Örebro University, Sweden), Klaus Schilling (Julius-Maximilians-University Wuerzburg, Germany), Vassil Sgurev (Bulgarian Academy of Sciences, Department of Intelligent Systems, Bulgaria), Helena Szczerbicka (Leibniz Universität, Germany), Ryszard Tadeusiewicz (AGH University of Science and Technology, Poland), Stanisław Tarasiewicz (University of Laval, Canada), Piotr Tatjewski (Warsaw University of Technology, Poland), Rene Wamkeue (University of Quebec, Canada), Janusz Zalewski (Florida Gulf Coast University, USA), Teresa Zielińska (Warsaw University of Technology, Poland)
Publisher: Łukasiewicz Research Network – Industrial Research Institute for Automation and Measurements PIAP

Copyright © 2023 by Łukasiewicz Research Network – Industrial Research Institute for Automation and Measurements PIAP. All rights reserved.
Journal of Automation, Mobile Robotics and Intelligent Systems
VOLUME 17, N˚1, 2023 DOI: 10.14313/JAMRIS/1-2023
Contents

3   A Model of Continual and Deep Learning for Aspect Based in Sentiment Analysis
    Dionis López, Fernando Artigas-Fuentes
    DOI: 10.14313/JAMRIS/1-2023/1

13  Feature Selection for the Low Industrial Yield of Cane Sugar Production Based on Rule Learning Algorithms
    Yohan Gil Rodríguez, Raisa Socorro Llanes, Alejandro Rosete, Lisandra Bravo Ilisástigui
    DOI: 10.14313/JAMRIS/1-2023/2

22  Inverse Kinematics Model for a 18 Degrees of Freedom Robot
    Miguel Angel Ortega-Palacios, Amparo Dora Palomino-Merino, Fernando Reyes-Cortes
    DOI: 10.14313/JAMRIS/1-2023/3

30  EEG Signal Analysis for Monitoring Concentration of Operators
    Łukasz Rykała
    DOI: 10.14313/JAMRIS/1-2023/4

40  Automated Anonymization of Sensitive Data on Production Unit
    Marcin Kujawa, Robert Piotrowski
    DOI: 10.14313/JAMRIS/1-2023/5

45  Hybrid Adaptive Beamforming Approach for Antenna Array Fed Parabolic Reflector for C-Band Applications
    Sheetal Bawane, Debendra Kumar Panda
    DOI: 10.14313/JAMRIS/1-2023/6

51  Real-Time Face Mask Detection in Mass Gathering to Reduce COVID-19 Spread
    Swapnil Soner, Ratnesh Litoriya, Ravi Khatri, Ali Asgar Hussain, Shreyas Pagrey, Sunil Kumar Kushwaha
    DOI: 10.14313/JAMRIS/1-2023/7

59  People Tracking in Video Surveillance Systems Based on Artificial Intelligence
    Abir Nasry, Abderrahmane Ezzahout, Fouzia Omary
    DOI: 10.14313/JAMRIS/1-2023/8

69  Model-Free Sliding Mode Control for a Nonlinear Teleoperation System with Actuator Dynamics
    Henni Mansour Abdelwaheb, Kacimi Abderrahmane, Belaidi AEK
    DOI: 10.14313/JAMRIS/1-2023/9
A MODEL OF CONTINUAL AND DEEP LEARNING FOR ASPECT BASED IN SENTIMENT ANALYSIS
Submitted: 10th January 2023; accepted: 18th February 2023
Dionis López, Fernando Artigas-Fuentes
DOI: 10.14313/JAMRIS/1-2023/1

Abstract: Sentiment analysis is a useful tool in several social and business contexts. Aspect sentiment classification is a subtask of sentiment analysis that provides information about features, or aspects, of people, entities, products, or services mentioned in reviews. The deep learning models proposed to solve aspect sentiment classification focus on a specific domain, such as restaurant, hotel, or laptop reviews, and there are few proposals for creating a single model with high performance in multiple domains. The continual learning approach with neural networks has been used to solve aspect classification in multiple domains; however, avoiding low aspect classification performance in continual learning is challenging, because the neural network weights can shift during the learning process across different domains or datasets. In this paper, a novel aspect sentiment classification approach is proposed. Our approach combines a transformer deep learning technique with a continual learning algorithm across different domains. The input layer used is the pretrained model Bidirectional Encoder Representations from Transformers. The experiments show the efficacy of our proposal with 78% F1-macro. Our results improve on other approaches from the state of the art.

Keywords: Continual Learning, Deep Learning, Catastrophic Forgetting, Sentiment Analysis
1. Introduction
Sentiment analysis is a useful tool in several social and business contexts, such as social networks, online shops (Amazon1, Alibaba2), and blogs. It is an important task in natural language processing (NLP) and natural language understanding (NLU) [15]. Aspect based sentiment analysis (ABSA) is a fundamental subtask of sentiment analysis through which users and decision-makers can obtain more information about the sentiments in reviews [8]. An aspect term refers to features of products, services, events, and people [19].
ABSA has three essential subtasks: (i) opinion target extraction (OTE), (ii) aspect category detection (ACD), and (iii) sentiment polarity (SP), also called aspect sentiment classification (ASC). OTE is mainly concerned with the extraction of aspect terms (i.e., an entity or attribute), ACD associates entities and
attributes to a global category (e.g., comfort or cleanness in the hotel domain), whereas ASC focuses on the sentiment polarity classification of aspects [8].
The ASC subtask has been studied by several researchers, with the best results obtained using deep learning approaches [2, 39]. The proposed ASC models have usually been associated with a single domain; when they are applied to different domains, their effectiveness decreases [2]. Suitable F-measure values were obtained with the same model applied to different single domains [10]: laptops (83%) and hotels (89%). Nevertheless, a retrieval system can process objects or instances from more than two domains. For that reason, approaches such as continual learning (CL), which is capable of learning in an incremental process from more than two domains, have emerged [5]. CL takes advantage of the local learning of several domains by identifying the main features or patterns found in the previous learning process without losing effectiveness (e.g., the price aspect is common to the restaurant, hotel, and electronic device domains) [5, 6].
The constraint in the CL setting is that a computational model cannot access the data from the previous tasks; it can only access a limited amount of information [5]. This learning problem is challenging: if the same model is retrained using the currently available dataset D_M, it forgets how to predict for the datasets D_m, m < M. This is known as the catastrophic forgetting problem. It occurs when networks are sequentially trained on many tasks; for instance, the network weights learned for a task A can be modified by the learning process of a task B [22]. Several proposals have tried to mitigate it in image classification [9]. Nevertheless, there are few proposals addressing this challenge in the ABSA subtask [2].
In this paper, we propose a hybrid model that combines the continual and deep learning approaches for ASC. First, a text preprocessing module extracts the aspect word candidates (i.e., nouns, adverbs), and the proposed model classifies each aspect into one of three possible classes: positive, negative, or neutral. Our model starts from a Bidirectional Encoder Representations from Transformers (BERT) [7] model and addresses the CL disadvantages by:
- Combining a CL regularization approach in NLP (i.e., ABSA) with a gradient descent modification algorithm to preserve relevant weights in a CL scenario.
2023 © López and Artigas-Fuentes. This is an open access article licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International (CC BY-NC-ND 4.0) license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
- Using the output of a pretrained BERT model to improve the results and tuning the BERT model during the CL process.
The rest of this paper is organized as follows. The subsection "Related Work" describes the methods based on deep learning and CL in ABSA. Section 2 presents the proposed model based on deep and continual learning. Section 3 discusses the evaluation of the model with respect to the state of the art (SOTA). Toward the end, we provide concluding remarks and future research directions.

1.1. Related Work
As mentioned in the previous section, CL represents a long-standing challenge for machine learning and neural network systems [9]. This is due to the tendency of learning models to catastrophically forget existing knowledge when learning from novel observations [22, 25]. The most common CL strategies [2, 6, 16, 21] are described below:
- Architectural strategy: Specific architectures, layers, activation functions, and/or weight-freezing strategies are used to decrease forgetting. This adds further neural network architectures for each domain, as proposed in [14, 18].
- Regularization strategy: The deep learning model loss function is extended with loss terms promoting selective consolidation of the weights that are important to retain memories. This strategy includes basic regularization techniques such as weight specification, dropout, or early stopping, as described in [1].
- Rehearsal strategy: Past information is periodically replayed to the model to strengthen connections for memories it has already learned. A common approach keeps part of the previous training data and interleaves it with new tasks or domains for future training, as described in [20].
The architectural and rehearsal strategies propose the creation of new structures for new domains [7, 16]. In the case of rehearsal, it is also necessary to preserve instances of the previous domains, which is computationally expensive. A lower cost is achieved by regularization [16], which does not add any architecture or additional memory during the learning process. The target of this research is a CL model with a regularization strategy.
The Elastic Weight Consolidation (EwC) model is one of the more successful regularization approaches. It tries to control forgetting by selectively constraining (i.e., freezing to some extent) the model weights that are important for the previous tasks. The EwC regularization uses a Fisher Information Matrix in a stochastic gradient descent (SGD) computation. Computing the Fisher Information Matrix in a neural network is expensive, because all the neural network weights must be preserved in external memory. Other CL strategies, such as hard attention to the task (HARD) [29] and incremental moment matching (IMM) [14], have not proved better for CL in sentiment analysis [35].
The synaptic intelligence (SI) model [38] is an optimization of EwC. In the SI approach, the neural
network weights are calculated online as SGD is applied. SI reduces the EwC computational cost. To the best of our knowledge, there are no works in ABSA with SI. Architectural and regularization 1 (AR1) [16] is an SI optimization with batch-instance results and a last-layer weight-average computation. Although AR1 was used in image classification tasks, its architecture and computational results were studied in this research.
In sentiment analysis, "lifelong learning memory" (LLM), proposed in [35], is a CL regularization approach. It incorporates mined knowledge into its learning process, where two types of knowledge are involved, named aspect sentiment attention and context sentiment effect. Because only datasets of household appliances (reviews on cameras, laptops, smartphones, etc.) are used, the CL model does not learn from diverse domains. Its performance of 82% F1-macro suggests taking it as an element of comparison in our research.
Another CL model is the "knowledge accessibility network" (KAN) proposed in [12]. Its target is to classify sentiment in sentences (i.e., not the ABSA subtask) in a task incremental learning (TIL) scenario (e.g., in a CL process, each new task has new instances). The KAN model is closely related to the HARD model (i.e., a CL architectural strategy and a more expensive regularization strategy). Although it is not similar to our research objective, KAN is one of the more recent works in sentiment analysis with a CL approach.
In the ASC subtask, several works combine BERT and deep learning architectures. For instance, "local context focus with BERT" (LC) [37] receives as input, on an upper layer of the BERT model, the words that correspond to the sentence where the aspect appears and a set of words in the aspect neighborhood. It reaches 82% accuracy on a laptop dataset. Another approach, the attentional encoder network with BERT (AE) [31], applies a multi-head attention architecture to the BERT model output. It has two inputs: the words of the context (a sentence) and the words that make up the aspect. It reaches 83% accuracy on a restaurant dataset.
The attentional encoder network with "long short-term memory networks" (LSTM), identified with the acronym AT [11], applies an attention mechanism and concatenates the aspects and their context. AT enables aspects to participate in computing attention weights. It uses GloVe as input and LSTM as the deep learning model.
The above approaches have good classification measures; however, they are evaluated in a single-domain scenario and not in a CL process. Their deep learning architectures are interesting as base models in a CL framework. Sentiment analysis models improve services such as tourism or e-commerce (e.g., analyzing sold-product reviews). Building datasets for each domain is expensive (time, specialists). A model such as the one proposed in this research can be used in various social services (tourism, e-commerce, government) and reduces model learning costs (time and memory) and economic resources.
2. Content
A new model for ASC in multiple domains, based on deep learning and CL with the regularization approach, is introduced in this section. First, a formal definition of the problem is given. Then, we introduce the different stages of the new model and its main inputs, outputs, and activities.
In the ABSA subtask, given a sentence (a sequence of words) w^c = {w_1^c, w_2^c, …, w_n^c}, an aspect is a sequence of words defined as w^t = {w_1^t, w_2^t, …, w_m^t}, where w^t is a subsequence of w^c. The goal of this task is to predict the sentiment polarity s of the aspect w^t, where s ∈ {positive, negative, neutral}.
The new model is evaluated in a particular CL setting called domain incremental learning (DIL) [34]. Each task comes from a different domain or dataset; however, the classes are the same (i.e., positive, negative, neutral). The DIL setting is particularly suited to ASC because, when testing the system, it does not need to know the task/domain to which the test data belongs.
A CL model with the regularization approach has three main components [6]:
- A base machine learning model (e.g., CNN, BLSTM, or a pretrained model such as ResNet).
- The CL approach to preserve knowledge (e.g., the weights in a neural network-based classifier) during the learning process from one domain to another.
- The knowledge base, which is usually the common or a new part of the model (e.g., neural network weights that do not vary between learning domains, or new neurons inserted into the neural network) to be added according to the CL approach.

2.1. Model Description and Main Stages
The model components are presented in Figure 1. The CL model is represented at the top (labeled "CL model"). This model preserves the knowledge learned during model training in the previous domains. The base model is represented by the square named "BERT neural network" in the middle of the figure, and the classification neural node gives the input values to the CL model. The bottom shows how the domains were used by the CL process.
The model training process has four stages:
Stage 1. Textual representation: This stage receives the original textual opinions and returns the vector of tokens for each sentence (applying a sentence splitter) and the aspects in a sentence (nouns, adjectives, and noun phrases). The aspects are selected by a part-of-speech (POS) tagger. Although in our model word tags (i.e., noun, adjective) are used to identify aspects, there are other approaches in which a deep learning model (or another machine learning approach) is used in the aspect extraction process, as in [17]. If a sentence has more than one aspect, a sentence token vector is associated with each aspect. The spaCy3 NLP tool offers the implementation and documentation for developing this stage.
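As a minimal sketch of Stage 1 (assuming spaCy's English pipeline; the exact candidate POS tags below are illustrative, not taken from the authors' code):

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumed English pipeline

def extract_aspect_candidates(review: str):
    """Return (sentence_tokens, aspect) pairs, one pair per candidate
    aspect (nouns, adjectives, noun phrases), as described in Stage 1."""
    pairs = []
    for sent in nlp(review).sents:
        tokens = [t.text for t in sent]
        candidates = [t.text for t in sent if t.pos_ in ("NOUN", "ADJ")]
        candidates += [chunk.text for chunk in sent.noun_chunks]
        for aspect in dict.fromkeys(candidates):  # deduplicate, keep order
            pairs.append((tokens, aspect))        # one sentence copy per aspect
    return pairs
```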
Figure 1. Input and output information representation in the CL model
Stage 2. Train the CL model: This stage learns the knowledge to be included in the knowledge base, depending on BERT and the CL training process for each current domain. This stage is divided into the following steps:
- Train the BERT neural network for each domain, where the output is a classification neuron.
- Train the CL approach with the last layer of the pretrained BERT model output.
The stage input is the token vector of a sentence and the token vector associated with an aspect, from Stage 1. In this stage, the BERT embedding input layer is built. To build the input vector to the BERT pre-model, a vector of weights is obtained from a word vector model (i.e., WordPiece) and the position of each token in the sentence. This vector is the first layer of the deep learning base model represented by BERT. Traditional Word2Vec or GloVe embedding layers provide a single context-independent representation for each token. On the contrary, in BERT, the representation of each token is related to the data obtained from the sentence used as input [7]. This provides more information about the word context when training the models. The stage's output is the neuron associated with the classification token in the BERT last layer.
Stage 3. Knowledge base upgrade: In this stage, catastrophic forgetting is avoided through the analysis of the training process results. The input is the classification neuron value (Stage 2 output), which feeds a layer with three neurons (i.e., positive, negative, neutral). The obtained loss/error is used in the weight optimization process by the regularization strategy. The output is the knowledge base vector updated with the new weights obtained from the CL approach and the BERT neural network.
Stage 4. Aspect classifier creation: This stage makes the continual deep learning model available for solving the ABSA subtask in multiple domains. The final configuration of the model is obtained from the parameters in the knowledge base.
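The following sketch condenses Stages 2-3 as described: BERT's classification-token output feeds a three-neuron layer. It assumes a recent HuggingFace transformers version and simplifies the two-step description in the text; it is not the authors' exact implementation.

```python
import torch
from transformers import BertModel

bert = BertModel.from_pretrained("bert-base-uncased")
classifier = torch.nn.Linear(bert.config.hidden_size, 3)  # pos/neg/neutral

def forward(input_ids, attention_mask):
    out = bert(input_ids=input_ids, attention_mask=attention_mask)
    cls_vec = out.last_hidden_state[:, 0]   # vector of the [CLS] token
    return classifier(cls_vec)              # logits over the 3 classes
```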
A CL regularization approach has a machine learning base model for learning the raw features of each task or domain (dataset). BERT Special (BSp) [31] is the base model in the new CL model proposed in this article. It was selected for the attention mechanism (i.e., the transformers) in the BERT neural network architecture. BSp associates the words in a sentence with the aspect (represented by "BERT neural network" in Figure 1). The new CL model is general enough to replace this base model with others, as shown in the Results section. The BSp method constructs the input sequence as:

<CLS> tokens <SEP> asp <SEP>   (1)

The first token of every input sequence is the special classification embedding (<CLS>), and the context words associated with the sentence are separated from the aspect words (i.e., asp) by a special token (<SEP>).
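As a brief illustration of Eq. (1), this input pair can be produced with a standard BERT tokenizer (HuggingFace transformers assumed; the sentence and aspect strings are invented examples):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

sentence = "the service was slow but the food was amazing"
aspect = "service"

# Passing the aspect as the text pair yields [CLS] tokens [SEP] asp [SEP]
enc = tokenizer.encode_plus(sentence, aspect)
print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))
# ['[CLS]', 'the', 'service', ..., '[SEP]', 'service', '[SEP]']
```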
In the BERT neural network, the output of the neuron associated with the token <CLS> in the last layer is the classification value in the Stage 2 learning process. The base model architecture was validated in the experiments against complex models such as AE, which reaches 73% F1-macro for a single dataset of restaurant reviews [31].

2.2. The Continual Learning Model
The new proposal, named Lifelong Learning of Aspects (LLA), is inspired by AR1 [16] and the synaptic intelligence (SI) [38] learning process, because they achieve better classification results [24] (LLA is represented in Figure 1 by the "Lifelong Learning of Aspects" label). Although both models were applied to the image classification task, we adapted them for NLP, classifying aspects (i.e., words in a sentence) into three classes (positive, negative, and neutral) across different datasets during the learning process. AR1 improves on SI [16] in image classification challenges; however, in our model, we combine the updated descending gradient and the weight preservation mechanism from SI within the AR1 regularization model. The new CL model LLA combined with BERT (as base model) has the acronym BSp_LLA, and it is the new computational model proposed.
In contrast to the original AR1 proposal in [21], our approach is not extended with new classes for each new domain. For each domain, the same three classes already mentioned are used. The main objective of LLA is to obtain the set of weights w⃗ in the output layer, as shown in Algorithm 1. One of the main settings of this algorithm, inspired by the AR1 approach, is that w⃗ is initialized to 0 (as input) and does not use any random initialization as in other CL approaches. The parameters of each output layer from the previous domains are stored in w⃗.
In the CL process, the base deep learning model is represented by BSp, and its output layer is the input of the LLA CL model (i.e., the AR1 modification for aspect classification in the NLP context). The LLA model loss function and optimization process tune the neural network weights of both models (BSp and LLA) during the learning process.
Algorithm 1: LLA
Input:
  cw⃗ = 0  ▷ the consolidated weights used for inference
  M̄ = 0   ▷ the deep learning base algorithm weights
  M = 0   ▷ the optimal shared weights resulting from training
  F̂ = 0   ▷ the weight importance matrix (SI algorithm)
  Text    ▷ domain sentence datasets
Output:
  cw⃗     ▷ the trained weights used for inference
1: From Text, group each sentence x and its candidate aspect word y into batches B.
2: loop  ▷ for each batch B_i in B, process all pairs (x, y)
3:   Train the base deep learning model with the pairs (x, y).
4:   Learn M̄ and cw⃗ subject to the SI algorithm with F̂ and M.
5:   Save weights: M = M̄, and keep cw⃗.
6:   Update F̂ according to the trajectories computed on the batch B_i.
7:   Test the model by using M̄ and cw⃗.
8: end loop
BSp is trained using each batch B of sentences in the datasets, as shown in line 3. The CL approach is described in lines 4 and 5. Line 6 updates the parameters of the neural network using the regularization and gradient descent. The w⃗ vector is the knowledge base in the new classification domain. The combination of the F̂ weight importance and the w⃗ vector is the regularization approach that reduces catastrophic forgetting, because it balances the learning process in each domain and the output layer evaluation in the classification algorithm.
The BERT and LLA combination adopts the SI mechanism [38] to compute the weight importance during SGD. This mechanism is important in LLA to preserve the common neuronal weights (transfer learning) during BERT tuning and model learning. The loss function in LLA is defined as in [38]:

L̃_µ = L_µ + c Σ_k Ω_k^µ (θ′_k − θ_k)²   (2)

where k indexes the neural network weight parameters, c is a strength parameter that trades off old versus new tasks, Ω_k^µ is the parameter regularization strength, and (θ′_k − θ_k)² is the squared difference between the reference weights at the end of the previous task and the parameters of the current task.
3. Experiments and Results
Experiments were designed to compare the ASC performance of the new LLA model against the CL SOTA [6, 24]. In our experimental design, each deep learning model for ABSA was used as the base approach in each CL model (combinations of deep learning and CL models).
To verify the effectiveness of our proposal against the SOTA approaches, the following experiments were conducted:
- Compare the LLA approach with the main models of the SOTA.
- Analyze whether the domain order affects the quality of the results.

3.1. Datasets and Continual Learning Scenario
To evaluate the performance of BSp_LLA, experiments were conducted using seven of the most used ABSA datasets, as described in Table 1. They were taken from four sources: laptops and restaurants from SemEval-2014 Task 4 subtask 2 [26], datasets about electronic devices used in [27], and hotel reviews from TripAdvisor [32]. The training and testing subsets considered are the same as those defined by the datasets' authors. Dataset instances are sentences and can contain more than one word tagged as an aspect. During model training and testing, sentences that have more than one aspect are split into one sentence (i.e., the same sentence text) per aspect.
The CL scenario is DIL, with all tasks sharing the same fixed classes (i.e., positive, negative, and neutral). A factor that could influence the results of BSp_LLA and the SOTA is the possible semantic closeness of the domains. For instance, restaurant and hotel reviews could be semantically close (reviews with words related to cleanliness, comfort, price).
The datasets (domains) used in the BSp_LLA learning process were grouped by humans and can be considered clusters. To estimate semantic closeness, we computed each domain centroid (the mean of the BERT output vectors over all sentences in the domain) and the cosine similarity between the centroids [30]. The results are shown in Figure 2: the similarity indicates that the restaurant and hotel review domains are close, whereas others, such as routers and laptops, are not close to restaurants.
Another important measure to evaluate the quality of the clusters is the silhouette coefficient. It takes values in the interval [−1, 1], and values near zero indicate overlapping clusters [33]. The silhouette coefficient of the datasets (with cosine similarity) was −0.017. The value near zero indicates that some sentences are semantically close across domains, and a negative value means a sample has been assigned to the wrong cluster (the datasets were created by humans and not by the same authors). This value indicates that there is common information between the domains and confirms the performance of the BSp_LLA model, which learns common aspects in ASC across different domains and reduces forgetting of past knowledge.
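A sketch of these two closeness estimates; `embed` (a function returning one BERT sentence vector) and the `domains` mapping of domain names to sentence lists are assumed inputs, not part of the published code:

```python
import numpy as np
from sklearn.metrics import silhouette_score

def closeness_report(domains, embed):
    # centroid = mean BERT output vector per domain
    cents = {d: np.mean([embed(s) for s in sents], axis=0)
             for d, sents in domains.items()}
    names = list(cents)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            u, v = cents[a], cents[b]
            cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
            print(f"{a} vs {b}: cosine {cos:.3f}")
    # silhouette over all sentences, labeled by their domain
    X = np.vstack([embed(s) for d in names for s in domains[d]])
    y = [d for d in names for _ in domains[d]]
    print("silhouette:", silhouette_score(X, y, metric="cosine"))
```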
Figure 2. Cosine similarity between domain centroids

3.2. Compared Baselines
The new proposal BSp_LLA was evaluated against three SOTA strategies of CL with the regularization approach (see the model descriptions in Table 3):
- "Lifelong Learning Memory" (LLM)
- "Elastic Weight Consolidation" (EwC)
- "Architectural and Regularization 1" (AR1)
The CL regularization approach employs a base model (e.g., a deep learning model) for learning the raw features of the datasets. During the evaluation, each of the CL models mentioned was combined with a deep learning approach used in ABSA:
- "Local Context Focus with BERT" (LC) [37]
- "Attentional Encoder Network with BERT" (AE) [31]
- "Attentional Encoder Network with LSTM" (AT) [11]
These methods were selected because they have relevant accuracy performance in ABSA, and they take as input a BERT pretrained model, or GloVe in the case of AT.

3.3. Hyperparameters
Table 1. Labeled dataset description.

Domain            Sentences  Aspects  Train  Test
Digital Cameras   597        237      477    120
Smart Phones      546        302      436    110
Routers           701        307      877    176
Speakers          687        440      549    138
Restaurants       3841       4722     3041   800
Laptops           3845       2951     3045   800
Hotels            4856       3810     3371   1485
The pretrained uncased BERT base model was applied in the learning process.4 The neural network architecture has 12 layers, 768 hidden units, 12 attention heads, and 110M parameters, trained on lower-cased English text. GloVe pretrained vectors of dimension 300 were used for the word embeddings. The weights of the LLM, EwC, LLA, and AR1 models are initialized with the Glorot initialization,5 the coefficient of L2 regularization is 10⁻⁵, and the dropout rate is 0.1. The BERT model was implemented with the PyTorch library transformers 2.1.0.6
All combinations of deep learning and CL models were trained with a batch size of 64 for 10 epochs on all datasets (i.e., each dataset (domain) trains the model for 10 epochs with batches of 64 instances). The optimization function was Adam with a 2e-5 learning rate. The training and evaluation processes were run on 2 x Intel Xeon L5520 with 64 GB RAM on the high-performance computing cluster at the Central University of Las Villas, Cuba. The source code is public.7
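The reported configuration, condensed into a sketch (the head class is our choice for illustration; the paper only specifies the uncased BERT base model with a classification output):

```python
import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)          # positive/negative/neutral
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)
BATCH_SIZE, EPOCHS = 64, 10                     # per domain, as reported
```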
Table 2. Example of imbalance between classes in the used datasets.

Domain       Positive  Negative  Neutral
Restaurants  2892      1001      829
Laptops      1328      994       629
Hotels       2343      656       811
3.4. Evaluation Measures
Taking into account that the selected datasets are imbalanced (see an example in Table 2), we use F1-macro (the F1-score averaged over all classes) in addition to accuracy (Accr) in the experimentation. The performance of the proposed model was also evaluated with the Cohen's Kappa (Kappa) measure [40]. This selection is motivated by the fact that it allows considering the effectiveness of a model on imbalanced datasets [40]. Kappa is computed as shown in Equation (3), where ρ_o is the observed probability of agreement between the model's labels and the reference labels, and ρ_e is the expected agreement with the annotators' labels:

K = (ρ_o − ρ_e)/(1 − ρ_e)   (3)

Kappa takes values in the interval [−1, 1], where 0 or lower values mean the model training is not relevant [40].
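Both measures are available off the shelf; an illustrative check on invented labels (scikit-learn assumed, which the paper does not mention):

```python
from sklearn.metrics import cohen_kappa_score, f1_score

y_true = ["pos", "neg", "neu", "pos", "pos", "neg"]
y_pred = ["pos", "neg", "pos", "pos", "neu", "neg"]
print(f1_score(y_true, y_pred, average="macro"))  # F1-macro
print(cohen_kappa_score(y_true, y_pred))          # Cohen's Kappa (Eq. 3)
```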
3.5. Catastrophic Forgetting Evaluation Measure
Several authors have proposed different catastrophic forgetting measures; the definition of a standard measure is a research challenge [2, 6, 23]. The measure selected to evaluate catastrophic forgetting in this research was proposed in [12], because it comes from a CL work oriented to sentiment analysis in sentences, very close to the target of this research. The catastrophic forgetting measure in [12] averages the results of the final classifier on the test sets of the tasks before the last one. This measure is named OvrcForgtt in this research.

3.6. Evaluation Settings
Two configurations were taken into account during the experiments. Initially, a test was run without adjusting the BERT architecture weights, since BERT is supported by a neural network architecture and is a pretrained model. But this yielded low classification results (accuracy), and it was rejected. In the final configuration, the weights of the BERT architecture and the deep learning model were adjusted during the backpropagation steps. Other authors, as in [28], exploit this possibility of training BERT to obtain efficient classification results in ABSA for specific domains.
Table 3. Acronyms and names of the models considered in the evaluation of the new proposal.

Acronym   Compared Baselines
LLM       Lifelong Learning Memory
AE_EwC    Attentional Encoder Network with BERT and EwC
AE_LLA    Attentional Encoder Network with BERT and the new model LLA
AE_AR1    Attentional Encoder Network with BERT and AR1
BSp_EwC   BERT Special with EwC
BSp_LLA   BERT Special with the new model LLA
AT_EwC    Attentional Encoder Network with LSTM and EwC
AT_LLA    Attentional Encoder Network with LSTM and the new model LLA
LC_LLA    Local Context Focus with BERT and the new model LLA
Table 4. Average results using different deep learning base models and the LLA (CL approach).

Model     Accr   F1     Kappa  OvrcForgtt
AE_LLA    0.69   0.49   0.40   0.497
AT_LLA    0.64   0.38   0.12   0.38
LC_LLA    0.79   0.66   0.59   0.66
BSp_LLA   0.80   0.73   0.62   0.73
Table 5. Average results of BSp_LLA and other SOTA models.

Model     Accr   F1     Kappa  OvrcForgtt
LLM       0.39   0.23   0.03   0.23
AE_EwC    0.68   0.50   0.42   0.49
AE_AR1    0.57   0.33   0.16   0.66
BSp_EwC   0.61   0.62   0.53   0.62
BSp_LLA   0.80   0.73   0.62   0.73
The evaluation results are the average of the performance measures over all possible domain permutations in the CL process. The BSp_LLA model obtains the best results against other base models with the same CL approach (i.e., LLA), as shown in Table 4. Besides, it is possible to modify or improve our base model in the future and use the LLA proposal as part of a general framework.
The results obtained by BSp_LLA outperform the rest of the models and demonstrate that our LLA approach can improve results in ASC during CL in a multiple-domain scenario (as shown in Table 5).
In all tests (Tables 4-5), BERT-based models perform better than word-embedding models (i.e., AT_LLA and AT_EwC have word embedding vectors as input), because BERT takes better account of the context where an aspect occurs.
In an ablation study, BSp (the base deep learning model) was run without the LLA algorithm (the CL approach) in the same evaluation scenario as BSp_LLA, as shown in Table 6. The BSp_LLA results were better, showing the influence of the CL approach. This experimentation shows that the LLA algorithm cannot be eliminated without loss of effectiveness.
Table 6. Ablation: averaged experimental results for BSp_LLA and BSp.

Model     Accr   F1     Kappa  OvrcForgtt
BSp       0.64   0.52   0.39   0.52
BSp_LLA   0.80   0.73   0.62   0.73
Figure 3. Ranking models with Holm’s test with a significance level of 0.05 for F‐measure
Table 7. Results of [13] against BSp_LLA.

Model         Accr   F1
CLASSIC [13]  0.90   0.85
BSp_LLA       0.80   0.73
Figure 4. Ranking models with Holm's test with a significance level of 0.05 for the Kappa measure

The Friedman test and Holm's method for the post-hoc analysis [36] were used to verify significant differences between the models on accuracy, Kappa, and F1 (see Figures 3 and 4). The experiments show that BSp_LLA has no significant differences from the other SOTA approaches.
The execution time of the methods was estimated during each test. The training time of BERT-based methods (24 hours on average) was longer than that of word-embedding-based methods (three hours on average). This time is associated with the complex transformer architecture and the attention mechanism learning process. The experimentation shows that a less complex architecture such as BSp_LLA obtains better results than others (i.e., AE_LLA, AE_AR1). The difference is that in BSp_LLA the input to the BERT model is the context (the words in a sentence) and the aspect. The successful results are due to three main characteristics: BERT is the base model in BSp_LLA; it has an attention mechanism with weights obtained from huge datasets; and LLA's regularization approach avoids changing the values of the weights in a CL process.

3.7. The BSp_LLA Evaluation against a Recent State-of-the-art Proposal
The proposal presented in [13] constitutes one of the most recent state-of-the-art proposals. In it, a model is proposed that follows the contrastive learning strategy [4] and is named CLASSIC, modifying the BERT architecture at two points (i.e., adding two fully connected network layers) and adjusting only the weights of these new components during training.
According to the authors of CLASSIC, this model performs better than LLA (see Table 7). But when the model proposal and its evaluation method were analyzed, notable differences were observed with respect to those used in the LLA model:
- Experimentation with 19 datasets (13 more than those used in BSp_LLA).
- In the CLASSIC model training, the process took five datasets at random to estimate the experimental results. But these five datasets are not named in [13] and cannot be compared with those of the LLA model training.
- In [13], the number of datasets used to train the model (i.e., in a CL framework) was not declared. This value would determine whether the sample size is significant.
- There are different adjustments of the CLASSIC neural network hyperparameters in the learning process; for instance, 30 epochs for the electrical device datasets and 10 for the laptop and restaurant review datasets, as a consequence of the number of instances.
- For all datasets, the same hyperparameters (e.g., epochs, batch size, etc.) were used by the LLA model in the training process.
- A different form of input to the BERT architecture was found by code analysis of the CLASSIC model. It is the opposite of that used as LLA input.
- To estimate catastrophic forgetting, CLASSIC used the measure proposed in [3], which differs from that used in the LLA model experimentation.
- In CLASSIC, there is no semantic closeness analysis of the datasets. It is not determined whether the learning and the final results are on close datasets or not.
The generalization (i.e., the same neural network hyperparameters for all datasets) is relevant in a CL model training framework (e.g., a homogeneous training process in incremental learning). Another disadvantage of configuration changes is the need to distinguish the type of dataset to adjust the settings (e.g., the number of epochs), because it is necessary to use another model or external tool for this purpose (i.e., increased computational cost in training time and memory).
Based on these differences, a comparative experiment was conducted for both models. The evaluated criteria were:
1) A comparison of both models (i.e., CLASSIC and BSp_LLA) on the same training and test datasets for continual learning (the datasets used in
Table 8. Experimental results (F1-macro) estimating the best performance between [13] and BSp_LLA.

Experiment       CLASSIC  BSp_LLA
Same-PhD         0.311    0.316
Same-parameters  0.182    0.316
Invert-input     0.311    0.316
the BSp_LLA model training process, because they have a semantic closeness study).
2) Use of the same measure to estimate catastrophic forgetting as proposed in CLASSIC.
3) Use of the same hyperparameters for all datasets, as in BSp_LLA.
4) The model classification effectiveness was estimated based on the averaged values of the F1-macro.
In the experiment with the same datasets (see the "Same-PhD" results in Table 8), the BSp_LLA model did not obtain a significant difference (i.e., BSp_LLA is ahead by 0.005). However, during this experiment, the CLASSIC model kept different hyperparameter values per dataset (e.g., the electronic device datasets have a higher number of epochs and a larger batch than the laptop or restaurant datasets). The hyperparameter values influence the results because the model can learn better by performing a more extensive search for better solutions (depending on the type of dataset or domain). This is a disadvantage of the CLASSIC model with respect to BSp_LLA, as explained above.
The CLASSIC disadvantages in reducing catastrophic forgetting are shown by the result of the experiment where the same hyperparameters were kept for all datasets (see the "Same-parameters" results in Table 8). The difference between the results of these models was 0.134, and the CLASSIC model is not better than LLA because it does not generalize the hyperparameters for all datasets and does not selectively preserve neural network weights during CL (as BSp_LLA does).
The "Invert-input" experiment obtains an outcome similar to the "Same-PhD" experiment, because in "Invert-input" the neural network hyperparameters of CLASSIC have been maintained. The main difference in "Invert-input" is that the representation of the input data to the CLASSIC model was similar to that of the LLA training. This experiment shows that this modification did not have an impact on the results.
Finally, the results of the experiments (Table 8) show that the BSp_LLA model has a positive influence on avoiding catastrophic forgetting and on the final results, because it avoids changes in the neural network weight values during the computation of gradient descent. This conclusion was established by analyzing Equation (2): in this equation, terms such as Ω_k^µ compensate for or prevent weight changes during gradient descent.
Although omission or forgetting occurs when different instances appear in new domains, weight
compensation is fundamental in a CL process. It allows previous knowledge not to be completely or partially modified. The adjustment of the weights of only a part of the BERT neural network (as proposed by CLASSIC) is not better than a CL model based on regularization (i.e., BSp_LLA), which preserves the weight values in the last layer associated with the aspect classification process.
In CLASSIC, two new components were added to the BERT network architecture, only in a specific part. Therefore, updating the neural network weights only in these components decreases the computational cost in terms of training execution time. However, this model has no compensation or regularization during the neural network weight updating.
Models with BERT's output as their input or base model perform better than those that use word embeddings. This result is similar to those reported in the SOTA [7, 8] and is associated with the architecture of the BERT model and its learning process.
The results obtained by applying the Friedman and Holm's tests do not show significant differences from other SOTA approaches. However, the results validate the selection of the SI approach as a mechanism for reducing catastrophic forgetting, and it has a lower computational cost than EwC [25]. The SI weight importance update method during SGD is part of BSp_LLA.
Although the experimentation did not evaluate models that follow the few-shot learning approach, BSp_LLA outperformed other SOTA models with high classification measure values as a result of its architecture and main components.
4. Conclusion
In the state of the art, several sentiment analysis models are trained on a single domain (i.e., restaurant or hotel reviews) or dataset. Their effectiveness decreases when these models learn patterns in a new domain in a CL framework.
This paper presents a novel model that combines an attentional deep learning approach with a CL model to classify aspects in the sentiment analysis context. This model allows improvement of products and services (as part of information retrieval systems) in areas such as tourism, government, and health. The model learning process uses data from multiple domains and retains common information patterns for a new domain with relevant results (F1-macro = 73%). The input layer was the pretrained model BERT. The CL approach, named Lifelong Learning of Aspects (LLA), follows a regularization approach. The evaluation results are better than those obtained by existing regularization approaches such as EwC, AR1, and CLASSIC.
LLA reduces catastrophic forgetting in the multi-domain context, and it is a novel approach in the ABSA context. Although the influence of the datasets' order on the learning process has been evaluated, it is necessary to deepen these experiments.
There are few studies on the effectiveness of linguistic rules in classifying aspects in the sentiment analysis task. However, the use of linguistic rules in
combinations of deep and continual learning models is an interesting methodology that could be evaluated, given the small number of labeled datasets in multiple domains. Future work will extend our approach to models for other languages, such as Spanish, and combine it with linguistic rules and few-shot learning strategies.
Notes
1 https://amazon.com/
2 https://alibaba.com/
3 https://spacy.io
4 https://huggingface.co/models
5 Follow a uniform distribution U(−1, 1) at the time of assigning the values to the initial weights of the network.
6 https://pypi.org/project/transformers/2.1.0/
7 https://github.com/dionis/ABSA-DeepMultidomain/
AUTHORS
Dionis López* – Faculty of Engineering in Telecommunications, Informatics and Biomedical, Universidad de Oriente, Ave. Patricio Lumumba s/n, Santiago de Cuba, Cuba, e-mail: dionis@uo.edu.cu, www: https://www.linkedin.com/in/dionis-lopez-ramos.

Fernando Artigas-Fuentes – Center for Neuroscience Studies and Image and Signal Processing, Faculty of Engineering in Telecommunications, Informatics and Biomedical, Universidad de Oriente, Ave. Patricio Lumumba s/n, Santiago de Cuba, Cuba, e-mail: artigas@uo.edu.cu.

*Corresponding author

ACKNOWLEDGEMENTS
This work was supported by the Center of Informatics Research, Universidad Central "Marta Abreu" de Las Villas, Santa Clara, Cuba.

References
[1] R. Aljundi, F. Babiloni, M. Elhoseiny, M. Rohrbach, and T. Tuytelaars. "Memory aware synapses: Learning what (not) to forget", Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 139–154.
[2] M. Biesialska, K. Biesialska, and M. R. Costa-jussà. "Continual lifelong learning in natural language processing: A survey", Proceedings of the 28th International Conference on Computational Linguistics, 2020, pp. 6523–6541.
[3] A. Chaudhry, P. K. Dokania, T. Ajanthan, and P. H. Torr. "Riemannian walk for incremental learning: Understanding forgetting and intransigence", Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 532–547.
[4] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton. "A simple framework for contrastive learning of visual representations", International Conference on Machine Learning, 2020, pp. 1597–1607.
[5] Z. Chen and B. Liu. "Lifelong machine learning", Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 12, no. 3, 2018, pp. 1–207.
[6] M. Delange, R. Aljundi, M. Masana, S. Parisot, X. Jia, A. Leonardis, G. Slabaugh, and T. Tuytelaars. "A continual learning survey: Defying forgetting in classification tasks", IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
[7] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. "BERT: Pre-training of deep bidirectional transformers for language understanding", arXiv preprint arXiv:1810.04805, 2018.
[8] H. H. Do, P. Prasad, A. Maag, and A. Alsadoon. "Deep learning for aspect-based sentiment analysis: a comparative review", Expert Systems with Applications, vol. 118, 2019, pp. 272–299.
[9] R. M. French. "Catastrophic forgetting in connectionist networks", Trends in Cognitive Sciences, vol. 3, no. 4, 1999, pp. 128–135.
[10] M. Hoang and A. Bihorac. "Aspect-based sentiment analysis using the pre-trained language model BERT", 2019.
[11] M. Huang, Y. Wang, X. Zhu, and L. Zhao. "Attention-based LSTM for aspect-level sentiment classification", Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, Texas, USA, 2016, pp. 606–615.
[12] Z. Ke, B. Liu, H. Wang, and L. Shu. "Continual learning with knowledge transfer for sentiment classification", Proceedings of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, vol. 3, 2020, pp. 683–698.
[13] Z. Ke, B. Liu, H. Xu, and L. Shu. "CLASSIC: Continual and contrastive learning of aspect sentiment classification tasks", Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021, pp. 6871–6883.
[14] S.-W. Lee, J.-H. Kim, J. Jun, J.-W. Ha, and B.-T. Zhang. "Overcoming catastrophic forgetting by incremental moment matching", Advances in Neural Information Processing Systems, vol. 30, 2017.
[15] B. Liu. Sentiment Analysis: Mining Opinions, Sentiments, and Emotions, Cambridge University Press, 2020.
[16] V. Lomonaco. Continual Learning with Deep Architectures. PhD thesis, University of Bologna, Italy, 2019.
[17] D. López and L. Arco. "Multi-domain aspect extraction based on deep and lifelong learning", Iberoamerican Congress on Pattern Recognition, 2019, pp. 556–565.
[18] D. Lopez-Paz. "Gradient episodic memory for continual learning", Advances in Neural Information Processing Systems, 2017, pp. 6467–6476.
[19] D. López Ramos and L. Arco García. "Aprendizaje profundo para la extracción de aspectos en opiniones textuales", Revista Cubana de Ciencias Informáticas, vol. 13, no. 2, 2019, pp. 105–145.
[20] A. Mallya and S. Lazebnik. "PackNet: Adding multiple tasks to a single network by iterative pruning", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7765–7773.
[21] D. Maltoni and V. Lomonaco. "Continuous learning in single-incremental-task scenarios", Neural Networks, vol. 116, 2019, pp. 56–73.
[22] M. McCloskey and N. J. Cohen. "Catastrophic interference in connectionist networks: The sequential learning problem", Psychology of Learning and Motivation, vol. 24, 1989, pp. 109–165.
[23] A. Nazir, Y. Rao, L. Wu, and L. Sun. "Issues and challenges of aspect-based sentiment analysis: a comprehensive survey", IEEE Transactions on Affective Computing, vol. 13, no. 2, 2020.
[24] G. I. Parisi, R. Kemker, J. L. Part, C. Kanan, and S. Wermter. "Continual lifelong learning with neural networks: a review", Neural Networks, vol. 113, 2019, pp. 54–71.
[25] G. I. Parisi and V. Lomonaco. "Online continual learning on sequences". In Recent Trends in Learning From Data, pp. 197–221, Springer, New York, 2020.
[26] M. Pontiki, D. Galanis, J. Pavlopoulos, H. Papageorgiou, I. Androutsopoulos, and S. Manandhar. "SemEval-2014 Task 4: aspect based sentiment analysis", Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), 2014, pp. 27–35.
[27] Y. Ren, Y. Zhang, M. Zhang, and D. Ji. "Improving Twitter sentiment classification using topic-enriched multi-prototype word embeddings", Thirtieth AAAI Conference on Artificial Intelligence, 2016.
[28] A. Rietzler, S. Stabinger, P. Opitz, and S. Engl. "Adapt or get left behind: domain adaptation through BERT language model finetuning for aspect-target sentiment classification", arXiv preprint arXiv:1908.11860, 2019.
[29] J. Serra, D. Suris, M. Miron, and A. Karatzoglou. "Overcoming catastrophic forgetting with hard attention to the task", International Conference on Machine Learning, 2018, pp. 4548–4557.
[30] R. Singh and S. Singh. "Text similarity measures in news articles by vector space model using NLP", The Institution of Engineers (India): Series B, vol. 102, no. 2, 2021, pp. 329–338.
[31] Y. Song, J. Wang, T. Jiang, Z. Liu, and Y. Rao. "Attentional encoder network for targeted sentiment classification", arXiv preprint arXiv:1902.09314, 2019.
[32] F. Tang, L. Fu, B. Yao, and W. Xu. "Aspect based fine-grained sentiment analysis for online reviews", Information Sciences, vol. 488, 2019, pp. 190–204.
[33] E. Terra, A. Mohammed, and H. Hefny. "An approach for textual based clustering using word embedding". In Machine Learning and Big Data Analytics Paradigms: Analysis, Applications and Challenges, pp. 261–280, Springer, 2021.
[34] G. M. Van de Ven and A. S. Tolias. "Three scenarios for continual learning", NeurIPS Continual Learning Workshop, vol. 1, no. 9, 2018.
[35] S. Wang, G. Lv, S. Mazumder, G. Fei, and B. Liu. "Lifelong learning memory networks for aspect sentiment classification", 2018 IEEE International Conference on Big Data (Big Data), 2018, pp. 861–870.
[36] F. Wu, X.-Y. Jing, Z. Wu, Y. Ji, X. Dong, X. Luo, Q. Huang, and R. Wang. "Modality-specific and shared generative adversarial network for cross-modal retrieval", Pattern Recognition, vol. 104, 2020, 107335.
[37] B. Zeng, H. Yang, R. Xu, W. Zhou, and X. Han. "LCF: a local context focus mechanism for aspect-based sentiment classification", Applied Sciences, vol. 9, no. 16, 2019, 3389.
[38] F. Zenke, B. Poole, and S. Ganguli. "Continual learning through synaptic intelligence", International Conference on Machine Learning, 2017, pp. 3987–3995.
[39] J. Zhou, J. X. Huang, Q. Chen, Q. V. Hu, T. Wang, and L. He. "Deep learning for aspect-level sentiment classification: survey, vision and challenges", IEEE Access, vol. 7, 2019, pp. 78454–78483.
[40] K. M. Zorn, D. H. Foil, T. R. Lane, D. P. Russo, W. Hillwalker, D. J. Feifarek, F. Jones, W. D. Klaren, A. M. Brinkman, and S. Ekins. "Machine learning models for estrogen receptor bioactivity and endocrine disruption prediction", Environmental Science & Technology, vol. 54, no. 19, 2020, pp. 12202–12213.
FEATURE SELECTION FOR THE LOW INDUSTRIAL YIELD OF CANE SUGAR PRODUCTION BASED ON RULE LEARNING ALGORITHMS
Submitted: 10th August 2022; accepted: 19th October 2022
Yohan Gil Rodríguez, Raisa Socorro Llanes, Alejandro Rosete, Lisandra Bravo Ilisástigui
DOI: 10.14313/JAMRIS/1-2023/2

Abstract: This article presents a model based on machine learning for the selection of the characteristics that most influence the low industrial yield of cane sugar production in Cuba. The dataset used in this work corresponds to a period of ten years of sugar harvests, from 2010 to 2019. A process of understanding the business and of understanding and preparing the data is carried out. The accuracy of six rule learning algorithms is evaluated: CONJUNCTIVERULE, DECISIONTABLE, RIDOR, FURIA, PART, and JRIP. The results obtained allow us to identify R417, R379, R378, R419a, R410, R613, R1427, and R380 as the indicators that most influence low industrial performance.

Keywords: Feature selection, Rule learning, Data mining, CRISP-DM, Industrial yield
1. Introduction

The volume and variety of information that is computerized in digital databases and other sources has grown significantly in recent decades. Much of this information is historical, that is, it represents transactions or situations that have occurred. Apart from its function as organizational memory, historical information is useful for explaining the past, understanding the present, and predicting future information. Most of the decisions of companies, organizations, and institutions are also based on information about past experiences extracted from very diverse sources. In addition, since the data can come from different sources and may belong to different domains, the need to analyze them to obtain useful information for the organization is clear [17].

In many situations, the traditional method of turning data into knowledge involves manual analysis and interpretation. This way of acting is slow, expensive, and subjective. In fact, manual analysis is impracticable in domains where the volume of data is growing: the enormous abundance of data overwhelms the human capacity to understand it without the help of powerful tools. Consequently, many important decisions are made not on the basis of the large amount of data available, but rather following the user's own intuition, for lack of the necessary tools. This
is the main task of data mining: to solve problems by analyzing the data present in the databases [17].

In the Cuban sugar industry there is a large database that needs to be used effectively to guide productive development towards more profitable scenarios. The correct use of this information would help decision-making on objective bases. The Cuban sugar sector needs to implement methods that allow people to quantify with greater precision the influence of the technological variables of the process on industrial performance. It is necessary to foresee the behavior of its production process in order to plan and optimize the use of technical, human, and financial resources, improving those technological variables that have the greatest weight on industrial performance [27].

At present, as an important step in data preprocessing, feature selection has become a popular research direction [29]. It allows one to remove redundant or irrelevant features and keep the important features in the data. In view of this, it can improve classification accuracy and speed up the model building procedure [15].

In this work, the characteristics of the process that influence the low industrial performance are determined using data mining techniques. The CRISP-DM methodology and the KNIME tool are used for the development of this research.

The article is structured in five sections, described below. Related works are reviewed in Section 2. In Section 3, we carry out an understanding of the business, analyze and prepare the data used, and detail the proposed methods. We then carry out the modeling and discussion in Section 4. Finally, the conclusions appear in Section 5.
2023 © Rodríguez et al. This is an open access article licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) (http://creativecommons.org/licenses/by-nc-nd/4.0/)

2. Related Works

As a result of the bibliographic study carried out, there are several works on feature determination and on the use of prediction techniques that are related to ours. Among these works are the following.

In [6] the authors explain that machine learning techniques benefit performance models. They applied protocols through the entire model development process: splitting data for expected sets, feature selection, cross-validation for model fitting, and model evaluation. They used three different machine learning techniques to create models in each protocol: Boosted Regression Trees (BRT), Random Forest (RF), and Support Vector Regression (SVR).

In [16] the authors explain the hierarchical importance of the factors that influence the yield of sugarcane. They use three different machine learning techniques, Random Forest (RF), Boosting, and Support Vector Machine (SVM), with which they initially propose to identify and order the main variables that condition the yield of sugarcane according to their relative importance.

On the other hand, in [5] the authors propose that Random Forests (RF) can cope with the generation of a prediction model when the search space of predictor variables is large, because there are many different combinations of climatic and seasonal variables, climate prediction indices, and crop model outputs that could be useful in explaining the size of the sugarcane crop.

In [31] the authors use the C4.5 algorithm to find the climatic parameter that most influences the yield of selected crops in districts of Madhya Pradesh.

In [23] the authors identify the most important risk factors from a highly dimensional data set, which helps in the accurate classification of heart diseases with fewer complications. The identification of the most relevant medical features aids the prediction of heart disease using a filter-based feature selection technique. Different ML classification models, such as Logistic Regression (LR), Decision Tree (DT), Naive Bayes (NB), Random Forest (RF), and Multi-Layer Perceptron (MLP), are applied to the data sets to identify the models suitable for the problem.

In [18] the authors propose a feature selection algorithm based on association rules and an integrated classification algorithm based on random equilibrium sampling. The experimental results show that the association-rule-based feature selection algorithm is better than the CART, ReliefF, and RFE-SVM algorithms in terms of classification accuracy and feature dimension. The proposed integrated classification algorithm based on random equalization sampling is superior to the comparative SMOTE-Boost and SMOTE-RF algorithms in macro accuracy, macro recall, and macro F1 value, demonstrating the robustness of the algorithm.

In [34] the authors propose an improved filter feature selection method to select effective features to predict the listing statuses of Chinese-listed companies. Models based on C4.5 and C5.0 decision trees are employed and compared with several other widely used models. To assess the robustness of the models over time, they are also tested under moving time windows. The empirical results demonstrate the efficacy of the proposed feature selection method and of the C5.0 decision tree model.

In [30] the authors present a novel oil spill feature selection and classification technique based on a forest of decision trees. The work seeks to minimize the number of input features used and, at the same time, to maximize the overall test classification accuracy. Examination of the robustness of the above result showed that the proposed combination
achieved higher classification accuracy than other well-known statistical separation indices. Furthermore, comparisons with previous findings converge on classification accuracy (up to 84.5%) and on the number of features selected, but differ on the actual features. This observation leads to the conclusion that there is no single optimal combination of characteristics.

In [4] the authors state that determining the quality and authenticity of food and detecting adulterations are problems of growing importance in food chemistry. The objective of that study was to consider parameters that contribute to differentiating beer according to its degree of quality. Chemical features (e.g., pH, acidity, dry matter, alcohol content, CO2 content) and sensory features (e.g., bitter taste, color) were determined in 70 beer samples and used as variables in decision tree techniques. These pattern recognition techniques applied to the data set allowed useful information to be extracted to obtain a satisfactory classification of the beer samples according to their quality grade.

In [2] the authors state that the inductive learning of a fuzzy rule-based classification system (FRBCS) is hampered by the presence of a large number of features, which increases the dimensionality of the problem to be solved. The difficulty comes from the exponential growth of the fuzzy rule search space with the increase in the number of features considered in the learning process. In that work, a genetic feature selection process is presented that can be integrated into a multistage genetic learning method to obtain, more efficiently, FRBCSs composed of a set of comprehensible fuzzy rules with high classification capacity. The proposed process fixes, a priori, the number of selected characteristics and, therefore, the size of the candidate fuzzy rule search space. The experimentation carried out, using the Sonar example base, shows a significant improvement in the simplicity, accuracy, and efficiency achieved by adding the proposed feature selection processes to the multistage genetic learning method or to other learning methods.

According to [19], in many systems, such as fuzzy neural networks, linguistic labels (such as large, medium, small, etc.) are often adopted to split the original feature into several fuzzy features. To reduce the computational complexity of the system after feature fuzzification, the optimal fuzzy feature subset should be selected. They propose a new heuristic algorithm in which the criterion is based on the min-max learning rule and a fuzzy extension matrix is designed as the search strategy.

In [26] the authors propose a new feature selection method based on the bee colony algorithm and the gradient boosting decision tree, with the aim of addressing issues such as the efficiency and informative quality of the selected features. This method achieves global optimization of the decision tree inputs using the bee colony algorithm to identify informative features.

According to [33], to improve classification accuracy, a preprocessing step is used to pre-filter redundant data or irrelevant features before the construction of the decision tree. The authors
propose a new decision tree algorithm based on feature weight. The experimental results show that the proposed method performs better on the precision, recall, and F1-score measures; furthermore, it can reduce the time required for the construction of the decision tree.

For their part, the authors in [7] state that rough sets have proven to be effective in developing machine learning techniques, including methods for discovering classification rules. In that work, they present an algorithm to generate classification rules based on similarity relations, which makes it applicable in cases where the attributes have a discrete or continuous domain. The experimental results show a satisfactory performance compared to other algorithms such as C4.5 and MODLEM.

In [10] the authors state that existing rule-based classification algorithms tend to generate a number of rules with a large number of conditions in the antecedent part. However, these algorithms fail to demonstrate high predictive accuracy while balancing coverage and simplicity, so generating an optimal rule set with high predictive accuracy becomes a challenging task. They propose a biogeography-based optimization (BBO) method. The performance of the proposed algorithm is compared with a variety of rule miners such as OneR, PART, JRip, Decision Table, Conjunctive Rule, J48, and Random Tree, among others.

For their part, the authors in [24] develop two hybrid machine learning models, AdaBoost-DT and Bagging-DT, based on the Decision Table classifier, for evaluating and mapping flood risk susceptibility in Quang Nam.

The authors of [12] state that the Fuzzy Unordered Rule Induction Algorithm (FURIA) is a recent algorithm, proposed by Hühn and Hüllermeier, responsible for creating fuzzy logic rules from a given database and for classifying it using the generated rules. In that work they analyze the effectiveness of FURIA as a classification method applied in different contexts. It was found that the algorithm performed better on databases with a greater number of instances, whether quantitative or qualitative, and in most cases it produced a good agreement coefficient.

Based on the previously reviewed literature, the most used feature selection method is rule-based, and the most used classification algorithm is decision trees, followed by random forests.
3. Data and Methods

3.1. Data Business Understanding

Currently, given the large amount of data that is collected and stored in the harvest database, traditional data management and statistical tools are not adequate for extracting useful, understandable, and previously unknown knowledge; that is why it is necessary to apply data mining techniques to the historical records of the sugar harvest.

The computerization of the processes of the sugar industry generates abundant data. At present, the application of the programs of the existing agro-industrial platform of the AZCUBA¹ group has guaranteed the speed and quality of the harvest information. The platform is made up of several systems, including the IPlus² system, the group's harvest information system, which consolidates the operational results of the agro-industrial process and is displayed at different management levels.

The influence that some technological variables have on industrial performance is known, either from empirical knowledge or from scientific research such as [27], where the annual values of previously selected technological variables over a three-year harvest period are analyzed to predict industrial performance.

At present, it is necessary to know, based on the historical behavior of the production process, interesting relationships between the technological variables that have the greatest weight in the low industrial performance. From the analysis of the historical information, solid rules, either unknown or confirming the relationships currently used, will be identified.

Understanding Data. The data on the historical behavior of the production process of the sugar harvest used in this work are provided by the AZCUBA Group's Information Technology, Communications and Analysis Department. There is a database (MS-SQL Server) for each year of the period from 2010 to 2019. The databases have the following dimensions:

- Number of Records: on average more than 4 million. The database with the fewest records is that of 2011, with 2,369,119 records, and the one with the most is that of 2019, with 6,652,282 records (Fig. 1).
- Number of Indicators: the number of indicators managed by the system is 3,605 on average, but only 578 on average are stored in the records of each database. The database with the fewest indicators is that of 2010, with 518 indicators, and the one with the most is that of 2019, with 676 indicators (Fig. 2).

Figure 1. Number of records per year

Figure 2. Number of indicators per year

An initial exploration of the available data sources is carried out, revealing interesting information about the behavior of the indicators of the sugar harvests in the country. Some of the problems detected are the following:

- The indicators increase over the years, which implies that the first data warehouses have fewer indicators than the last ones. This does not constitute an inconsistency in the coding, since the indicators adapt to the needs as time goes by; they vary by addition or deletion from one year to the next.
- 65.03% of the transactional records have zero value. In some specific cases this can be interpreted as a real value, but most are data from unmanaged indicators; the zero data is related to the configuration of each sugar mill. The database with the smallest share of zero transactional records is that of 2011, with 60.60% of the records, and the one with the largest is that of 2010, with 70% of the records (Fig. 3).

Figure 3. Percentage of records at zero per year

Data Preparation. The selection of the attributes or characteristics of interest for the current investigation is carried out in the database where all the information regarding the values of the indicators analyzed in the sugar harvest is stored. The attributes of the Indust_Daily_Indicator table are very useful; they are detailed in Table 1. Transformations are made to the original transactional data set, obtained by means of an SQL query, with a view to obtaining the mineable view:

- From the ID_INDICATOR attribute and its value contained in the VALUE_DAY attribute, a new numerical attribute is generated for each different ID_INDICATOR, named according to the indicator's description. These new attributes are generated by transposing rows into columns for all the indicators, producing attributes of the form i10, i11, R325a, R42a, etc. This process is carried out on each data source using the KNIME Pivoting node. Transposing rows into columns transforms transactional records into mineable records.
- Several data sources are available, one for each year of the sugar harvest. The basic addition method is used to integrate two or more data sets with similar attributes but different records; in the KNIME workflow this is done with a Concatenate node.
- From the value of the indicator "Yield_Reported", the categorical attribute Performance Evaluation (EVAL_LOWYIELD) is generated, which takes the value Low for R295 < 10 (coded 1) and Not Low for R295 >= 10 (coded 0). A Math Formula node, through its if(x,y,z) function, assigns the numerical value according to the previously defined ranges and conditions; a Cell Replacer node then replaces the numerical values (0, 1) with the values (Low, Not Low), respectively.
- Rows with missing values are filtered out using the Row Filter node, which eliminates 10.24% of the records of the mineable data set.
- Due to the large number of missing values (65.03% of the transactional records), omitting the affected attributes is taken as the alternative to mitigate missing data. This is performed with the Missing Value Column Filter node, where all attributes with less than 10% missing values are selected, thus omitting 83.22% of the attributes.
- Outliers are detected for each attribute individually with the Numeric Outliers node, which generates a numerical outlier model; the Numeric Outliers (Apply) node then uses this model to treat outliers in the input data according to the model parameters.

Once this process is done, the mineable data set is obtained and saved for later use in a .CSV file.
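For readers who prefer code to node names, the following is a minimal pandas sketch of the preparation steps above. It is not the authors' KNIME workflow: the yearly file names, the R295 column produced by the pivot, and the mean aggregation in pivot_table are assumptions for illustration.

```python
import pandas as pd

# Hypothetical yearly extracts of Indust_Daily_Indicator (Table 1 attributes).
years = range(2010, 2020)
data = pd.concat([pd.read_csv(f"harvest_{y}.csv") for y in years],
                 ignore_index=True)  # stands in for the Concatenate node

# Pivoting node: one column per indicator, turning transactional
# records into mineable records (attributes i10, i11, R325a, ...).
mineable = data.pivot_table(index=["ID_ENTITY", "DATE_LOAD_DATA"],
                            columns="ID_INDICATOR",
                            values="VALUE_DAY").reset_index()

# Math Formula + Cell Replacer nodes: derive the class attribute from R295.
mineable["EVAL_LOWYIELD"] = (mineable["R295"] < 10).map(
    {True: "Low", False: "Not Low"})

# Missing Value Column Filter (keep columns with < 10% missing values),
# then Row Filter (drop the remaining rows with missing values).
keep = mineable.columns[mineable.isna().mean() < 0.10]
mineable = mineable[keep].dropna()
mineable.to_csv("mineable.csv", index=False)
```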
Table 1. Attributes of the Indust_Daily_Indicator table

Name            Data Type  Detail
ID_INDICATOR    INT        Main; identifies the analyzed indicator through a relationship with Iplus_Indicator.
ID_ENTITY       INT        Excluded; unimportant for the present study.
DATE_LOAD_DATA  DATETIME   Excluded; unimportant for the present study.
VALUE_DAY       NUMERIC    Main; stores the values needed for the investigation.
VALUE_HF        NUMERIC    Excluded; unimportant for the present study (accumulated value).
VALUE_WEEK      NUMERIC    Excluded; unimportant for the present study (accumulated value).
3.2. Methods

In this section, the machine-learning-based methods used for feature selection for the low industrial yield of cane sugar production are presented, together with a brief description of the rule learning algorithms and of the methods for identifying informative features.

Inductive rule learning is one of the most traditional fields in machine learning [1]. It solves a classification problem through the induction of a set of rules or a decision list. The main approach is the so-called separate-and-conquer (covering) algorithm, which learns one rule at a time, successively eliminating the covered examples; the individual algorithms within this framework differ mainly in the way they learn individual rules [8]. Rule-based methods are useful and well known in machine learning because they are capable of creating interpretable models [25]. However, noisy examples and outliers can harm the performance of the final model [9].

The data in many real-world applications may have many dimensions, and the characteristics of those data are often highly redundant. The identification of informative features has become an important step in data mining, not only to circumvent the curse of dimensionality, but also to reduce the amount of data for processing [26].

The algorithms analyzed for the selection of characteristics are the following:

CONJUNCTIVERULE: It learns a single conjunctive rule from a validation of the data set [21]. A rule consists of several conjoined antecedents and the class value for classification; for a numeric class, the rule predicts its mean value. If a test instance is not covered by this rule, it is predicted using the default class distribution of the data not covered by the rule [22].

DECISIONTABLE: Decision tables are classification models used for prediction [24] and are one of the simplest forms of knowledge representation in classification. In their most basic form they store the occurrences of the most relevant attributes for each of the classes. The accuracy of this classifier depends largely on the attribute selection carried out in its first stage, which generally uses the accuracy of the decision table itself, estimated by cross-validation, as the evaluation function [28].

RIDOR: It is based on the Ripple-Down Rule algorithm. It generates a default rule and then a set of exception rules that predict classes other than the default one with the least error, generating the best exception rules until the error is reduced and performing a tree-like expansion of exceptions [21].

FURIA: It learns fuzzy rules instead of conventional rules, and unordered rule sets instead of rule lists; to deal with uncovered examples it makes use of an efficient rule stretching mechanism. The experimental results presented show that FURIA significantly outperforms the original RIPPER algorithm, as well as other classifiers such as C4.5, in terms of classification accuracy [3]. The main difference between a fuzzy rule and a conventional rule is that the fuzzy rule tends to cover more, which gives it an advantage over the conventional rule [25].

PART: It generates a list of decision rules in hierarchical order. In essence, it builds a rule, removes the instances it covers, and continues recursively creating rules for the remaining instances until none are left [21]. It is considered an industry standard as a classification algorithm, and much improved in terms of prediction accuracy [25]. The algorithm uses pessimistic pruning: it generates a decision tree in which the tree building and pruning operations are combined to produce a partial subtree that cannot be expanded further, and a rule is derived from that partial tree [11].

JRIP: It is based on the RIPPER algorithm (Repeated Incremental Pruning to Produce Error Reduction). It uses several comparisons at the same time, builds sets of rules separately, and then performs comparisons between them [21]. It creates a set of rules that identifies the possible classes while minimizing the number of errors, the error being defined by the number of training examples misclassified by the rules; the algorithm assumes that the data on which it was trained are in some way similar to the unseen data on which the rules will be applied [25]. It uses sequential covering to create ordered lists of rules, and it goes through four stages: rule growth, pruning, optimization, and selection [14].

The training model for classification is defined with the KNIME tool and Weka nodes for rule-based classification algorithms. A workflow is designed where the six rule learning algorithms are applied; the model generated by each algorithm is used to classify the test data, and a series of accuracy statistics is calculated. Subsequently, a comparative analysis between the algorithms is carried out. The data described in the previous section were used in the experiments. The models were built using 70% of the data for the training set and 30% for the test set.
Table 2. Data set partitioning

          Training Set            Test Set
          Nr. Records   %         Nr. Records   %
Total     10 904        70.00     4 674         30.00
Low       5 596         51.32     2 399         51.33
Not Low   5 308         48.68     2 275         48.67
A stratified sampling is applied: out of a total of 15,578 records, 10,904 are used for the training set and 4,674 for the test set. The partition of the data set is detailed in Table 2.

Then, in a similar way, a workflow is built to automate the selection of the attribute subsets that best explain a target attribute in the sense of supervised classification, that is, to explore which attribute subsets are the best for classifying the instance class. The target attribute to explain is the categorical attribute EVAL_LOWYIELD. At the beginning of the feature selection cycle, all the features of the input data set that will be taken into account for the construction of the model are selected, as well as those that will be kept fixed during the selection process.
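A sketch of the stratified partition is shown below; scikit-learn's train_test_split is only a stand-in for the KNIME partitioning used by the authors, and the mineable.csv file comes from the preparation sketch in Section 3.1.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

mineable = pd.read_csv("mineable.csv")
X = mineable.drop(columns=["EVAL_LOWYIELD"])
y = mineable["EVAL_LOWYIELD"]

# 70/30 stratified split; on the full data set this should reproduce the
# 10,904 / 4,674 records and the ~51/49 class shares reported in Table 2.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.70, stratify=y, random_state=0)
print(len(X_train), len(X_test))
print(y_train.value_counts(normalize=True))
```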
4. Results and Discussion

When evaluating the algorithms, the accuracy statistics for each one are obtained; they are detailed in Table 3.

The Recall metric measures how good the model is at detecting positive events [32]. The algorithm that best identifies low performance is DECISIONTABLE (1.00), followed by RIDOR (0.93).

The Precision metric measures how good the model is at assigning positive events to the positive class [32]. The algorithm with the highest precision for classifying low performance is CONJUNCTIVERULE (0.94), followed by JRIP (0.92).

The Sensitivity metric measures how apt the model is at detecting events of the positive class [32]. The algorithm with the highest sensitivity for classifying low performance is CONJUNCTIVERULE (0.94), followed by JRIP (0.92).
The Specificity metric measures how exact the assignment to the positive class is [32]. The algorithm with the highest specificity for classifying low performance is CONJUNCTIVERULE (0.87), followed by FURIA (0.62).

The F-measure metric is the harmonic mean of recall and precision [32]. The algorithm with the best balance of precision and recall for classifying low performance is DECISIONTABLE (0.92), followed by RIDOR (0.91).

Cohen's kappa coefficient (κ), a concordance statistic between two raters that corrects for chance [13], shows that the most reliable algorithm for the training performed is FURIA (0.35), followed by JRIP (0.33).

The Accuracy metric measures the percentage of cases in which the model is correct [20]. The algorithms with the highest accuracy for classifying low performance are DECISIONTABLE and RIDOR (0.85), followed by PART (0.80).

The algorithm that selected the largest number of attributes was RIDOR, with 75 attributes, while the one that selected the fewest was JRIP, with 52 attributes. For its part, the algorithm with the lowest prediction error was PART, while the one with the highest was CONJUNCTIVERULE, as shown in Table 4.

As a result of the number of rules generated by these algorithms, the process of selecting the attribute subsets that best explain the target attribute is automated.

Table 4. Statistics of the feature selection

Filter            Error   Nr. of Features
CONJUNCTIVERULE   0.057   65
DECISIONTABLE     0.012   53
RIDOR             0.004   75
FURIA             0.004   62
PART              0.003   69
JRIP              0.005   52
Table 3. Accuracy statistics (per class: Low / Not Low)

Statistic         CONJUNCTIVERULE  DECISIONTABLE  RIDOR         FURIA         PART          JRIP
True Positives    1792 / 751       5061 / 8       4744 / 278    4179 / 537    4451 / 276    4194 / 504
False Positives   113 / 3283       856 / 14       586 / 331     327 / 896     588 / 624     360 / 881
True Negatives    751 / 1792       8 / 5061       278 / 4744    537 / 4179    276 / 4451    504 / 4194
False Negatives   3283 / 113       14 / 856       331 / 586     896 / 327     624 / 588     881 / 360
Recall            0.35 / 0.87      1.00 / 0.01    0.93 / 0.32   0.82 / 0.62   0.88 / 0.32   0.83 / 0.58
Precision         0.94 / 0.19      0.86 / 0.36    0.89 / 0.46   0.93 / 0.37   0.88 / 0.31   0.92 / 0.36
Sensitivity       0.35 / 0.87      1.00 / 0.01    0.93 / 0.32   0.82 / 0.62   0.88 / 0.32   0.83 / 0.58
Specificity       0.87 / 0.35      0.01 / 1.00    0.32 / 0.93   0.62 / 0.82   0.32 / 0.88   0.58 / 0.83
F-measure         0.51 / 0.31      0.92 / 0.02    0.91 / 0.38   0.87 / 0.47   0.88 / 0.31   0.87 / 0.45
Accuracy          0.43             0.85           0.85          0.79          0.80          0.79
Cohen's kappa     0.09             0.01           0.29          0.35          0.19          0.33
Nr. Rules         1                8119           91            45            948           64
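The per-class statistics in Table 3 all derive from the four confusion-matrix counts; the small sketch below reproduces the CONJUNCTIVERULE column for the Low class.

```python
def class_statistics(tp, fp, tn, fn):
    """Statistics of Table 3 from the confusion-matrix counts of one class."""
    recall = tp / (tp + fn)                       # equals sensitivity here
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    # Cohen's kappa corrects the observed agreement for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return recall, precision, specificity, f_measure, accuracy, kappa

# CONJUNCTIVERULE, class Low (first column of Table 3):
print(class_statistics(tp=1792, fp=113, tn=751, fn=3283))
# -> (0.35, 0.94, 0.87, 0.51, 0.43, 0.09), matching the reported values
```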
Table 5. Frequency of appearance

Frequency of Appearance   Nr. of Attributes   Selected Attributes
6 out of 6 (100%)         6    R740a, R230a, R365a, R638, R390, R421
5 out of 6 (83.33%)       16   R613, R346, R334, R3d, R375, R1a, R345, R313d, R350, R349, R160b, R314, R299e, R591, R371, R457a
4 out of 6 (66.67%)       41   R379, R378, R419a, R410, R380, R336, R170, R544, R545, R347a, R1426, R476a, R3a, R419, R464, R313c, R333a, i115, R4, R5a, R434, R572, R351, R594, R364, R547, R703, R282c, R160, R145, i64, R296, R370, R365, R462, R588, R534d, R, R463, R628, R457
3 out of 6 (50%)          20   R613a, R1427, R230, R190, R3g, R344, R434a, R1d, R574, i146, R333, R337a, R167, R365c, R364a, i113, i42a, R6, R420, R1524
2 out of 6 (33.33%)       13   R417, R418, R381, R497, R345a, R567, R311, R324, R744b, R5d, R282e2, R458, R529
1 out of 6 (16.67%)       4    R548, R551, R590, R583
Table 5 presents a summary of the analysis carried out with the frequency of appearance of each attribute in the algorithms, which allows us to know those that predominate. The most predominant attributes according to their frequency are the 6 attributes present in 100% of the algorithms and the 16 attributes present in 83.33% of the algorithms; 41 attributes, the largest number, are present in 66.67% of the algorithms.

From the analysis carried out, the attributes with the highest frequency of appearance are described below:

- Last Juice Extracted Total Brix (R740a): the amount of dissolved solids in the juice contained in the bagasse when it leaves the mills.
- Sugar 96 in Operation (R230a): the amount of sugar that remains in the technological equipment.
- Juices Total Pol (R365a): the sucrose content in the juice.
- Cubic Meters Mass Cooked A / t Cane (R638): an indicator of the relationship between the mixture of sugar and mother liquor discharged from the tank and the tons of cane.
- Recovered % Pol Cane (R390): the percentage of sucrose extracted from the cane juice in the manufacturing process.
- Total Cane % Pol (R421): the percentage of sucrose content in the total cane.

A ROC curve analysis is performed to select the possibly optimal attributes that most influence poor performance.
Table 6. Area under the curve

Selected Attribute   Description                          Area Under Curve
R417                 Pol Bagasse % Pol Cane               0.87
R379                 Bagasse Loss % Pol Cane              0.81
R378                 Final Honey Loss % Pol Cane          0.79
R419a                Boiler House Losses % Pol Cane       0.75
R410                 Total Final Honey % Pol Cane         0.74
R613                 Hours Removal without extractions    0.73
R1427                Total Foreign Matter %               0.70
R380                 Filter Cake Loss % Pol Cane          0.70
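The text does not specify the tool used for this step (KNIME provides a ROC Curve node); the following is a scikit-learn stand-in for the per-attribute ROC analysis behind Table 6, assuming the mineable data set from Section 3 with Low encoded as the positive class. Attributes that correlate negatively with the class are flipped so the score reads as area above the diagonal.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

mineable = pd.read_csv("mineable.csv")
y = (mineable["EVAL_LOWYIELD"] == "Low").astype(int)

for col in ["R417", "R379", "R378", "R419a", "R410", "R613", "R1427", "R380"]:
    auc = roc_auc_score(y, mineable[col])
    print(col, round(max(auc, 1.0 - auc), 2))  # direction-independent AUC
```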
The analysis uses the column containing the two classes, EVAL_LOWYIELD, with Low as the class to which the high probabilities are assigned. In Table 6, the attributes that lie above the random estimation line (the diagonal) are listed in descending order, representing good classification results for the selected class.

It is interesting to point out that the attribute R417, which is the one closest to the perfect classification point, was selected as a characteristic by only two algorithms in the previous analysis. On the other hand, the attributes R379, R378, R419a, and R410 were selected as characteristics by only four algorithms, while R613 was selected by five algorithms despite having a lower area under the curve.

Based on the analysis carried out, the attributes considered in this study as the ones that most influence low industrial performance are described thus:

- Pol Bagasse % Pol Cane (R417): parts of Pol that come out with the bagasse for every 100 parts of bagasse. It is used to measure the efficiency of the grinding work.
- Bagasse Loss % Pol Cane (R379): expresses the value of the Pol in the bagasse produced by the sugar mill for every 100 parts of Pol that entered with the cane.
- Final Honey Loss % Pol Cane (R378): expresses the value of the Pol in the final molasses produced by the mill for every 100 parts of Pol that entered with the cane.
- Boiler House Losses % Pol Cane (R419a): the losses in the boiler house, made up of the final molasses, filter cake, and indeterminate losses.
- Total Final Honey % Pol Cane (R410): same as R378, but it also covers the losses produced in streams of products that leave the process through extractions, such as rich honey and different juices, hence "Total".
- Hours Removal without extractions (R613): expresses the time it takes the sugar brought in by the cane to travel through the entire process until it comes out as final product and as losses. Normal values are between 24 and 36 hours; the lower the values, the lower the indeterminate losses.
- Total Foreign Matter % (R1427): foreign matter consists of parts of the cane plant that contain substances
that are harmful to the process. It is expressed as the percentage of the weight of the ground cane that is made up of these impurities: earth, buds, green leaves, dry leaves, and others.

- Filter Cake Loss % Pol Cane (R380): expresses the value of the Pol in the filter cake produced by the mill as residue for every 100 parts of Pol that came in with the cane.
5. Conclusion

- The work provides a broad understanding of the business and of the data, as well as the preparation needed to carry out the modeling with different techniques. Many of the data set's attributes were found to be of no value for the task.
- The work allowed the comparison of different rule learning algorithms and the construction of an automated feature selection process that identifies the characteristics that best fit the stated objectives.
- The indicators R417, R379, R378, R419a, R410, R613, R1427, and R380 were identified as the ones that most influence the classification of low industrial performance.
- The work constitutes a starting point for deeper evaluation and validation of the rules and characteristics obtained.
Notes
1. Grupo Azucarero AZCUBA, https://www.azcuba.cu
2. Industrial Plus, https://www.datazucar.cu/?featured_item=iplus
AUTHORS
Yohan Gil Rodríguez∗ – ESI DATAZUCAR, AZCUBA, Avenida 23 No. 171 / N y O, Vedado, Plaza de la Revolución, La Habana, Cuba, e-mail: yohan.gil@datazucar.cu, ORCID: https://orcid.org/0000-0002-8239-4124.
Raisa Socorro Llanes – CUJAE, Calle 114 No. 11901 / Ciclovía y Rotonda, Marianao, La Habana, Cuba, e-mail: raisa@ceis.cujae.edu.cu, ORCID: https://orcid.org/0000-0002-2627-1912.
Alejandro Rosete – CUJAE, Calle 114 No. 11901 / Ciclovía y Rotonda, Marianao, La Habana, Cuba, e-mail: rosete@ceis.cujae.edu.cu, ORCID: https://orcid.org/0000-0002-4579-3556.
Lisandra Bravo Ilisástigui – CUJAE, Calle 114 No. 11901 / Ciclovía y Rotonda, Marianao, La Habana, Cuba, e-mail: lbravo@ceis.cujae.edu.cu, ORCID: https://orcid.org/0000-0002-8209-4121.
∗Corresponding author
References
[1] F. Beck and J. Fürnkranz. "An Empirical Investigation Into Deep and Shallow Rule Learning", Frontiers in Artificial Intelligence, vol. 4, 2021.
[2] J. Casillas, O. Cordón, M. J. Del Jesus, and F. Herrera. "Genetic feature selection in a fuzzy rule-based classification system learning process for high-dimensional problems", Information Sciences, vol. 136, no. 1, 2001, 135–157, doi: 10.1016/S0020-0255(01)00147-5.
[3] J. Coto Palacio, Y. Jiménez Martínez, and A. Nowé. "Aplicación de sistemas neuroborrosos en la clasificación de reportes en problemas de secuenciación", Revista Cubana de Ciencias Informáticas, vol. 14, no. 4, 2020, 34–47.
[4] B. Dębska and B. Guzowska-Świder. "Decision trees in selection of featured determined food quality", Analytica Chimica Acta, vol. 705, no. 1, 2011, 261–271, doi: 10.1016/j.aca.2011.06.030.
[5] Y. Everingham, J. Sexton, D. Skocaj, and G. Inman-Bamber. "Accurate prediction of sugarcane yield using a random forest algorithm", Agronomy for Sustainable Development, vol. 36, no. 2, 2016, 27, doi: 10.1007/s13593-016-0364-z.
[6] M. A. Ferraciolli, F. F. Bocca, and L. H. A. Rodrigues. "Neglecting spatial autocorrelation causes underestimation of the error of sugarcane yield models", Computers and Electronics in Agriculture, vol. 161, 2019, 233–240, doi: 10.1016/j.compag.2018.09.003.
[7] Y. Filiberto, R. Bello, Y. Caballero, and M. Frías. "Algoritmo para el aprendizaje de reglas de clasificación basado en la teoría de los conjuntos aproximados extendida", DYNA, vol. 78, no. 169, 2011, 62–70.
[8] J. Fürnkranz. "Rule Learning". In: C. Sammut and G. I. Webb, eds., Encyclopedia of Machine Learning, 875–879. Springer US: Boston, MA, 2010.
[9] S. García, J. Luengo, and F. Herrera. Data Preprocessing in Data Mining, volume 72 of Intelligent Systems Reference Library, Springer International Publishing: Cham, 2015, doi: 10.1007/978-3-319-10247-4.
[10] P. K. Giri, S. S. De, S. Dehuri, and S. Cho. "Biogeography based optimization for mining rules to assess credit risk", Intelligent Systems in Accounting, Finance and Management, vol. 28, no. 1, 2021, 35–51, doi: 10.1002/isaf.1486.
[11] S. Gnanambal, M. Thangaraj, V. T. Meenatchi, and V. Gayathri. "Classification Algorithms with Attribute Selection: an evaluation study using WEKA", International Journal of Advanced Networking and Applications, vol. 9, no. 6, 2018, 3640–3644.
[12] E. A. d. M. Gomes Soares, L. C. Leite Damascena, L. M. Mendes de Lima, and R. Marcos de Moraes. "Analysis of the Fuzzy Unordered Rule Induction Algorithm as a Method for Classification", 2018.
[13] J. J. T. Gordillo and V. H. P. Rodríguez. "Cálculo de la fiabilidad y concordancia entre codificadores de un sistema de categorías para el estudio del foro online en e-learning", vol. 27, 2009, 17.
[14] A. Gupta, A. Mohammad, A. Syed, and M. N. "A Comparative Study of Classification Algorithms using Data Mining: Crime and Accidents in Denver City the USA", International Journal of Advanced Computer Science and Applications, vol. 7, no. 7, 2016, doi: 10.14569/IJACSA.2016.070753.
[15] I. Guyon and A. Elisseeff. "An introduction to variable and feature selection", The Journal of Machine Learning Research, vol. 3, 2003, 1157–1182.
[16] R. G. Hammer, P. C. Sentelhas, and J. C. Q. Mariano. "Sugarcane Yield Prediction Through Data Mining and Crop Simulation Models", Sugar Tech, vol. 22, no. 2, 2020, 216–225, doi: 10.1007/s12355-019-00776-z.
[17] J. Hernández Orallo, M. J. Ramírez Quintana, and C. Ferri Ramírez. Introducción a la Minería de Datos, Pearson Educación: España, 2004.
[18] C. Huang, X. Huang, Y. Fang, J. Xu, Y. Qu, P. Zhai, L. Fan, H. Yin, Y. Xu, and J. Li. "Sample imbalance disease classification model based on association rule feature selection", Pattern Recognition Letters, vol. 133, 2020, 280–286, doi: 10.1016/j.patrec.2020.03.016.
[19] Y. Li and Z.-F. Wu. "Fuzzy feature selection based on min–max learning rule and extension matrix", Pattern Recognition, vol. 41, no. 1, 2008, 217–226, doi: 10.1016/j.patcog.2007.06.007.
[20] J. Martínez Heras. "Precision, Recall, F1, Accuracy en clasificación", October 2020.
[21] V. B. Núñez, R. Velandia, F. Hernández, J. Meléndez, and H. Vargas. "Atributos Relevantes para el Diagnóstico Automático de Eventos de Tensión en Redes de Distribución de Energía Eléctrica", Revista Iberoamericana de Automática e Informática Industrial RIAI, vol. 10, no. 1, 2013, 73–84, doi: 10.1016/j.riai.2012.11.007.
[22] R. A. V. Ortega and F. L. H. Suárez. "Evaluación de algoritmos de extracción de reglas de decisión para el diagnóstico de huecos de tensión", 2010, 127.
[23] M. S. Pathan, A. Nag, M. M. Pathan, and S. Dev. "Analyzing the impact of feature selection on the accuracy of heart disease prediction", Healthcare Analytics, vol. 2, 2022, 100060, doi: 10.1016/j.health.2022.100060.
[24] B. T. Pham, C. Luu, T. V. Phong, H. D. Nguyen, H. V. Le, T. Q. Tran, H. T. Ta, and I. Prakash. "Flood risk assessment using hybrid artificial intelligence models integrated with multi-criteria decision analysis in Quang Nam Province, Vietnam", Journal of Hydrology, vol. 592, 2021, 125815, doi: 10.1016/j.jhydrol.2020.125815.
[25] F. M. Pérez. "Estudio y análisis del funcionamiento de técnicas de minería de datos en conjuntos de datos relacionados con la Biología", 35.
[26] H. Rao, X. Shi, A. K. Rodrigue, J. Feng, Y. Xia, M. Elhoseny, X. Yuan, and L. Gu. "Feature selection based on artificial bee colony and gradient boosting decision tree", Applied Soft Computing, vol. 74, 2019, 634–642, doi: 10.1016/j.asoc.2018.10.036.
[27] M. Ribas García, R. Consuegra del Rey, and M. Alfonso Alfonso. "Análisis de los factores que más inciden sobre el rendimiento industrial azucarero", vol. 43, no. 1, 2016, 10.
[28] A. Rivas Méndez. "Estudio experimental sobre algoritmos de clasificación supervisada basados en reglas en conjuntos de datos de alta dimensión", Universidad de Holguín, 2014.
[29] M. Schiezaro and H. Pedrini. "Data feature selection based on Artificial Bee Colony algorithm", EURASIP Journal on Image and Video Processing, vol. 2013, no. 1, 2013, 47, doi: 10.1186/1687-5281-2013-47.
[30] K. Topouzelis and A. Psyllos. "Oil spill feature selection and classification using decision tree forest on SAR image data", ISPRS Journal of Photogrammetry and Remote Sensing, vol. 68, 2012, 135–143, doi: 10.1016/j.isprsjprs.2012.01.005.
[31] S. Veenadhari, B. Misra, and C. Singh. "Machine learning approach for forecasting crop yield based on climatic parameters". In: 2014 International Conference on Computer Communication and Informatics, Coimbatore, India, 2014, 1–5, doi: 10.1109/ICCCI.2014.6921718.
[32] M. Widmann. "From Modeling to Scoring: Confusion Matrix and Class Statistics", May 2019.
[33] H. Zhou, J. Zhang, Y. Zhou, X. Guo, and Y. Ma. "A feature selection algorithm of decision tree based on feature weight", Expert Systems with Applications, vol. 164, 2021, 113842, doi: 10.1016/j.eswa.2020.113842.
[34] L. Zhou, Y.-W. Si, and H. Fujita. "Predicting the listing statuses of Chinese-listed companies using decision trees combined with an improved filter feature selection method", Knowledge-Based Systems, vol. 128, 2017, 93–101, doi: 10.1016/j.knosys.2017.05.003.
INVERSE KINEMATICS MODEL FOR AN 18 DEGREES OF FREEDOM ROBOT
Submitted: 1st September 2022; accepted: 17th July 2023
Miguel Angel Ortega-Palacios, Amparo Dora Palomino-Merino, Fernando Reyes-Cortes
DOI: 10.14313/JAMRIS/1-2023/3

Abstract: The study of humanoid robots is still a challenge for the scientific community; although there are several related works in this area, a number of limitations found in the literature drive the need to develop an inverse kinematic model of biped robots. This paper presents a research proposal for the Bioloid Premium robot. The objective is to propose a complete solution to the inverse kinematics model for an 18-DOF (Degrees Of Freedom) biped robot. This model will serve as a starting point for obtaining the dynamic model of the robot in subsequent work. The proposed methodology can be extended to other biped robots.

Keywords: Bioloid Premium robot, forward kinematics, inverse kinematics, kinematic chain.
1. Introduction

The kinematics of biped robots has been widely studied by the scientific community; nevertheless, several limitations have been found in the literature on biped robot kinematic models. This drives the need to develop an inverse kinematics model for the 18-DOF Bioloid robot. Specifically, the kinematics of the upper body of biped robots has received little study [1–13, 19–26]. Due to the high number of degrees of freedom and the complexity involved in deriving the inverse and forward kinematics equations, most authors model only the lower body of the robots, using either commercial robots such as Nao with 12-DOF legs [1], HYDROïD with 8 active DOF per leg [2], Scout [3] and NWPUBR-1 [4] with 12-DOF legs, Ostrich Bionic with 13-DOF legs [5], Cassie with 20-DOF legs [6], or robots of the authors' own design with 12 DOF [7–9], 10 DOF [10–12], and 9 DOF [13]. All of these research papers calculate the forward kinematics model by taking one of the robot's feet as the supporting foot.

In other works, the forward and inverse kinematics solution is obtained for both legs and arms, using the HRP-2 robot with 12-DOF legs [14], DARwIn-OP with 6 DOF per leg [15], AXIS with 12-DOF legs [16], NAO with 21 DOF [17], and the Digit robot with 20 DOF [18], but these models take the torso or pelvis of the robot as the initial frame.

The Bioloid robot has been used by the scientific community in several studies related to kinematics, dynamics, and control. Most of these works
obtain the kinematic model of the legs, taking only one foot as the initial frame [19–24]; in [25] two different cases are proposed, where the supporting foot is either the right or the left foot, and in [26] the torso is taken as the initial frame. In [27] the kinematic model of the robot's legs and arms is obtained, but the torso and pelvis are used as initial frames. All the works mentioned previously derive the kinematic model using the Denavit-Hartenberg method to represent the position and orientation of the end-effector.

On the other hand, the authors have not established a complete inverse kinematic model for an 18-DOF biped robot. Therefore, the Bioloid Premium robot with 18 DOF is proposed as the study target. The main motivation of this paper is to develop a methodology based on the Denavit-Hartenberg method to obtain the forward and inverse kinematic model of the 18-DOF Bioloid Premium robot.

In the present work we propose to obtain the complete kinematic model of the Bioloid robot, considering four open kinematic chains, where the initial frames are the supporting feet and the left and right pelvis are end-effector frames; the pelvis is also proposed as another initial frame, with the left and right hands as the other end-effector frames.

The paper is organized as follows. In Section 2 the Denavit-Hartenberg method is applied to calculate the geometric parameters of the robot. In Section 3 the forward kinematic model is obtained. The inverse kinematics equations of the robot are computed in Section 4. Finally, the conclusions are given in Section 5.
2. Denavit-Hartenberg Parameters

The key idea is to generate four open kinematic chains to describe the position and orientation of each link of the Bioloid Premium robot. Using the Denavit-Hartenberg method, the frames and parameters of each link, as well as the position and orientation of each joint of the robot, are presented in Figure 1. The supporting right and left feet are proposed as the initial frames Σd0(xd0, yd0, zd0) and Σi0(xi0, yi0, zi0), and the first two kinematic chains go up to the pelvis frame. From this point three open kinematic chains can be considered: one of them has the left-leg end-effector frame Σ12(x12, y12, z12), the second takes into account the right-hand end-effector frame Σd3(xd3, yd3, zd3), and the third considers the left-hand end-effector frame Σi3(xi3, yi3, zi3).
2023 © Ortega-Palacios et al. This is an open access article licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Figure 1. Frames assigned to the joints of the Bioloid robot

Table 1 presents the Denavit-Hartenberg parameters for the kinematic chain corresponding to the robot legs, which relates the frame Σ1(x1, y1, z1) to the frame Σ12(x12, y12, z12). Table 2 shows the Denavit-Hartenberg parameters for the kinematic chain corresponding to the right arm of the robot, which relates the frame Σd1(xd1, yd1, zd1) to the frame Σd3(xd3, yd3, zd3). Table 3 gives the Denavit-Hartenberg parameters for the kinematic chain corresponding to the left arm of the robot, which relates the frame Σi1(xi1, yi1, zi1) to the frame Σi3(xi3, yi3, zi3).

Table 1. Denavit-Hartenberg parameters of the legs

Link   αi     li   θi   di
1      π/2    0    0    d1
2      −π/2   0    θ1   0
3      0      l1   θ2   0
4      0      l2   θ3   0
5      π/2    0    θ4   0
6      π/2    0    θ5   0
7      0      0    θ6   0

Table 2. Denavit-Hartenberg parameters of the right arm

Link   αi     li   θi     di
1      π/2    0    −π/2   d2
2      −π/2   l3   θb1    d3
3      0      l4   θb2    0
4      0      l5   θb3    0

Table 3. Denavit-Hartenberg parameters of the left arm

Link   αi     li   θi     di
1      −π/2   0    −π/2   d2
2      π/2    l3   θb4    d3
3      0      l4   θb5    0
4      0      l5   θb6    0

The robot's home position is given by the angles shown in Tables 4 and 5.

Table 4. Value of the joints corresponding to the home position of the robot legs

θ1     θ2   θ3   θ4   θ5     θ6
π/2    0    0    0    −π/2   0

Table 5. Value of the joints corresponding to the home position of the robot arms

θb1    θb2   θb3   θb4   θb5   θb6
−π/2   0     0     π/2   0     0

To define the values of the link variables, the real measurements of the Bioloid Premium robot joints were used: d1 = 33 mm, d2 = 118 mm, d3 = 73 mm, l1 = l2 = 76 mm, l3 = 16 mm, l4 = 66 mm, l5 = 108 mm.

3. Forward Kinematics

To calculate the forward kinematics of the robot, the transformation matrix defined in Equation (1) was used:

$$H_{i-1}^{i} = \begin{pmatrix} \cos\theta_i & -\sin\theta_i\cos\alpha_i & \sin\theta_i\sin\alpha_i & l_i\cos\theta_i\\ \sin\theta_i & \cos\theta_i\cos\alpha_i & -\cos\theta_i\sin\alpha_i & l_i\sin\theta_i\\ 0 & \sin\alpha_i & \cos\alpha_i & d_i\\ 0 & 0 & 0 & 1 \end{pmatrix} \tag{1}$$

where the superscript i represents the number of the current joint and the subscript i − 1 indicates the number of the previous joint. Therefore, $H_{i-1}^{i}$ is the homogeneous transformation matrix representing the rotation and translation of joint i with respect to joint i − 1. To simplify the results obtained, the following compact notation is used: $\sin\theta_i = S_i$, $\cos\theta_i = C_i$, $\sin(\theta_i + \theta_j) = S_{i,j}$, $\cos(\theta_i + \theta_j) = C_{i,j}$, where i, j denote the joint number.

The transformation matrix $H_0^1$ relating the frame Σ0(x0, y0, z0) to the frame Σ1(x1, y1, z1), corresponding to the robot's foot, is shown in (2), with α = π/2 and θ = l = 0:

$$H_0^1 = \begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 1 & 0 & d_1\\ 0 & 0 & 0 & 1 \end{pmatrix} \tag{2}$$

The homogeneous transformation matrices corresponding to the leg joints, from the frame Σ1(x1, y1, z1) to the frame Σ7(x7, y7, z7), are as follows:

$$H_1^2 = \begin{pmatrix} C_1 & 0 & -S_1 & 0\\ S_1 & 0 & C_1 & 0\\ 0 & -1 & 0 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}$$
$$H_2^3 = \begin{pmatrix} C_2 & -S_2 & 0 & l_1 C_2\\ S_2 & C_2 & 0 & l_1 S_2\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix},\quad H_3^4 = \begin{pmatrix} C_3 & -S_3 & 0 & l_2 C_3\\ S_3 & C_3 & 0 & l_2 S_3\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}$$

$$H_4^5 = \begin{pmatrix} C_4 & 0 & S_4 & 0\\ S_4 & 0 & -C_4 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix},\quad H_5^6 = \begin{pmatrix} C_5 & 0 & S_5 & 0\\ S_5 & 0 & -C_5 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix},\quad H_6^7 = \begin{pmatrix} C_6 & -S_6 & 0 & 0\\ S_6 & C_6 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}$$

The transformation matrix $H_b^{b1}$ relating the frame Σb(xb, yb, zb) to the frame Σb1(xb1, yb1, zb1), corresponding to the right shoulder of the robot, is shown in (3); the transformation matrix $H_b^{b5}$ relating the frame Σb(xb, yb, zb) to the frame Σb5(xb5, yb5, zb5), corresponding to the left shoulder, is shown in (4):

$$H_b^{b1} = \begin{pmatrix} 0 & 0 & -1 & 0\\ -1 & 0 & 0 & 0\\ 0 & 1 & 0 & d_2\\ 0 & 0 & 0 & 1 \end{pmatrix} \tag{3}$$

$$H_b^{b5} = \begin{pmatrix} 0 & 0 & 1 & 0\\ -1 & 0 & 0 & 0\\ 0 & -1 & 0 & d_2\\ 0 & 0 & 0 & 1 \end{pmatrix} \tag{4}$$

The homogeneous transformation matrices corresponding to the joints of the right arm, from the frame Σb1(xb1, yb1, zb1) to the frame Σb4(xb4, yb4, zb4), are as follows:

$$H_{b1}^{b2} = \begin{pmatrix} C_{b1} & 0 & -S_{b1} & l_3 C_{b1}\\ S_{b1} & 0 & C_{b1} & l_3 S_{b1}\\ 0 & -1 & 0 & d_3\\ 0 & 0 & 0 & 1 \end{pmatrix},\quad H_{b2}^{b3} = \begin{pmatrix} C_{b2} & -S_{b2} & 0 & l_4 C_{b2}\\ S_{b2} & C_{b2} & 0 & l_4 S_{b2}\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix},\quad H_{b3}^{b4} = \begin{pmatrix} C_{b3} & -S_{b3} & 0 & l_5 C_{b3}\\ S_{b3} & C_{b3} & 0 & l_5 S_{b3}\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}$$

The homogeneous transformation matrices corresponding to the joints of the left arm, from the frame Σb5(xb5, yb5, zb5) to the frame Σb8(xb8, yb8, zb8), are as follows:

$$H_{b5}^{b6} = \begin{pmatrix} C_{b4} & 0 & S_{b4} & l_3 C_{b4}\\ S_{b4} & 0 & -C_{b4} & l_3 S_{b4}\\ 0 & 1 & 0 & d_3\\ 0 & 0 & 0 & 1 \end{pmatrix},\quad H_{b6}^{b7} = \begin{pmatrix} C_{b5} & -S_{b5} & 0 & l_4 C_{b5}\\ S_{b5} & C_{b5} & 0 & l_4 S_{b5}\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix},\quad H_{b7}^{b8} = \begin{pmatrix} C_{b6} & -S_{b6} & 0 & l_5 C_{b6}\\ S_{b6} & C_{b6} & 0 & l_5 S_{b6}\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}$$
Therefore, the forward kinematics relating the right foot frame Σ0(x0, y0, z0) to the pelvis end-effector frame Σ7(x7, y7, z7) is calculated employing Eq. (5):

$$H_0^7 = H_0^1 H_1^2 H_2^3 H_3^4 H_4^5 H_5^6 H_6^7 \tag{5}$$

The forward kinematics relating the pelvis frame Σb(xb, yb, zb) to the right-hand end-effector frame Σb4(xb4, yb4, zb4) is calculated employing Eq. (6):

$$H_b^{b4} = H_b^{b1} H_{b1}^{b2} H_{b2}^{b3} H_{b3}^{b4} \tag{6}$$

The forward kinematics relating the pelvis frame Σb(xb, yb, zb) to the left-hand end-effector frame Σb8(xb8, yb8, zb8) is calculated using Eq. (7):

$$H_b^{b8} = H_b^{b5} H_{b5}^{b6} H_{b6}^{b7} H_{b7}^{b8} \tag{7}$$
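As a plausibility check of Eqs. (1)-(5), the short numpy sketch below (not part of the original paper) builds each link transform from the Table 1 parameters and composes the leg chain at the home position of Table 4.

```python
import numpy as np

def dh(theta, alpha, l, d):
    """Homogeneous transformation of Eq. (1) for one Denavit-Hartenberg link."""
    ct, st, ca, sa = np.cos(theta), np.sin(theta), np.cos(alpha), np.sin(alpha)
    return np.array([[ct, -st * ca,  st * sa, l * ct],
                     [st,  ct * ca, -ct * sa, l * st],
                     [0.,       sa,       ca,      d],
                     [0.,       0.,       0.,     1.]])

# Leg chain of Table 1 (lengths in mm) at the home position of Table 4.
d1, l1, l2 = 33.0, 76.0, 76.0
th = [np.pi / 2, 0.0, 0.0, 0.0, -np.pi / 2, 0.0]   # theta1 .. theta6
links = [(0.0,   np.pi / 2, 0.0, d1),              # H_0^1 (theta = 0)
         (th[0], -np.pi / 2, 0.0, 0.0),            # H_1^2
         (th[1], 0.0,        l1,  0.0),            # H_2^3
         (th[2], 0.0,        l2,  0.0),            # H_3^4
         (th[3], np.pi / 2,  0.0, 0.0),            # H_4^5
         (th[4], np.pi / 2,  0.0, 0.0),            # H_5^6
         (th[5], 0.0,        0.0, 0.0)]            # H_6^7
H = np.eye(4)
for theta, alpha, l, d in links:
    H = H @ dh(theta, alpha, l, d)                 # Eq. (5)
print(np.round(H, 3))   # pose of the pelvis frame relative to the foot
```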
4. Inverse Kinematics

The matrix $H_0^7$ can be computed using the forward kinematic model. Then, by successively multiplying $H_0^7$ by the inverse matrices $(H_{i-1}^{i})^{-1}$, seven matrix equations can be obtained:
$$\begin{aligned}
H_0^7 &= H_0^1 H_1^2 H_2^3 H_3^4 H_4^5 H_5^6 H_6^7\\
(H_0^1)^{-1} H_0^7 &= H_1^2 H_2^3 H_3^4 H_4^5 H_5^6 H_6^7\\
(H_1^2)^{-1}(H_0^1)^{-1} H_0^7 &= H_2^3 H_3^4 H_4^5 H_5^6 H_6^7\\
(H_2^3)^{-1}(H_1^2)^{-1}(H_0^1)^{-1} H_0^7 &= H_3^4 H_4^5 H_5^6 H_6^7\\
(H_3^4)^{-1}(H_2^3)^{-1}(H_1^2)^{-1}(H_0^1)^{-1} H_0^7 &= H_4^5 H_5^6 H_6^7\\
(H_4^5)^{-1}(H_3^4)^{-1}(H_2^3)^{-1}(H_1^2)^{-1}(H_0^1)^{-1} H_0^7 &= H_5^6 H_6^7\\
(H_5^6)^{-1}(H_4^5)^{-1}(H_3^4)^{-1}(H_2^3)^{-1}(H_1^2)^{-1}(H_0^1)^{-1} H_0^7 &= H_6^7
\end{aligned}$$

The elements of the matrix $H_{i-1}^{i}$ are as follows:

$$H_{i-1}^{i} = \begin{pmatrix} n_x & o_x & a_x & p_x\\ n_y & o_y & a_y & p_y\\ n_z & o_z & a_z & p_z\\ 0 & 0 & 0 & 1 \end{pmatrix}$$

where the matrix noa can be defined as follows:

$$noa = \begin{pmatrix} n_x & o_x & a_x\\ n_y & o_y & a_y\\ n_z & o_z & a_z \end{pmatrix}$$
Journal of Automation, Mobile Robotics and Intelligent Systems
−1
−1
−1
(H23 ) (H12 ) (H01 ) H04 = H34 C2 S2 0 −l1 C 1 S1 0 0 −S2 C2 0 0 −1 0 0 0 0 −S1 C1 0 0 0 1 0 0 0 0 1 0 0 0 1 1 0 0 0 n x o x a x px 0 0 1 −d1 ny oy ay py 0 −1 0 0 n z o z a z pz 0 0 0 1 0 0 0 1 C3 −S3 0 l2 C3 S3 C 3 0 l 2 S3 (8) = 0 0 1 0 0 0 0 1 r1,1 oy S2 + ox C1 C2 + oz C2 S1 r2,1 oy C2 − ox C1 S2 − oz S1 S2 r3,1 o z C 1 − o x S1 0 0 a y S2 + a x C 1 C 2 + a z C 2 S1 a y C 2 − a x C 1 S2 − a z S1 S2 a z C 1 − a x S1 0 P y S 2 − l 1 + Px C 1 C 2 + Pz C 2 S 1 − d 1 C 2 S 1 P y C 2 − Px C 1 C 2 − Pz S 1 S 2 + d 1 S 1 S 2 Pz C 1 − Px S 1 − d 1 C 1 1 C3 −S3 0 l2 C3 S3 C 3 0 l 2 S3 = (9) 0 0 1 0 0 0 0 1 where r1,1 = ny S2 + nx C1 C2 + nz C2 S1
2023
r2,1 = ny C2 − nx C1 S2 − nz S1 S2
4.1. Inverse Kinematics of Legs The kinematic decoupling method presented in [28, 29] is used to simplify the robot’s legs inverse kinematic model, which consists of the separation of orientation and position in robots with 6 degrees of freedom; Robots usually have three additional degrees of freedom, located at the end of the kinematic chain, and those axes generally intersect at a point informally called the robot’s wrist. Thus, given a desired inal position and orientation, the position of the cutting point (robot wrist) is established by calculating the values of θ1 , θ2 and θ3 , and then from the orientation data and those already calculated, the values of the rest of the joint variables θ4 , θ5 and θ6 are obtained. Similarly, the three hip axes of the robot are considered as the wrist of a robot manipulator, for which reason the position of the cutting point of the three axes of the hip, at this point, the origins of the reference systems of the three coincide. degrees of freedom of the hip. Then, the irst three joints of the leg can be calculated taking into account the matrixes H01 , H12 , H23 , H34 , which were obtained in the direct kinematics model. Therefore, using the inverse matrix, the following matrix equation can be determined:
Analyzing Eq. (9), it is possible to match the 16 terms that each matrix contains; in other words, 16 equations can be proposed, and the one most convenient for isolating a given joint variable can be chosen. Therefore, from (9) the angles $\theta_1$, $\theta_2$, $\theta_3$ can be calculated. First, $\theta_1$ is calculated using the (3,4) term on both sides of the equation, as follows:

$$P_z \cos\theta_1 - P_x \sin\theta_1 - d_1 \cos\theta_1 = 0$$
$$\theta_1 = \arctan\left(\frac{P_z - d_1}{P_x}\right) \quad (10)$$

Then $\theta_2$ is calculated using the (1,2) term on both sides of the equation:

$$o_y \sin\theta_2 + o_x \cos\theta_1 \cos\theta_2 + o_z \cos\theta_2 \sin\theta_1 = 0$$
$$\theta_2 = \arctan\left(\frac{o_x \cos\theta_1 + o_z \sin\theta_1}{-o_y}\right) \quad (11)$$

Subsequently, $\theta_3$ is calculated using the (2,4) and (1,4) terms on both sides of the equation:

$$A = P_y \cos\theta_2 - P_x \cos\theta_1 \sin\theta_2 - P_z \sin\theta_1 \sin\theta_2 + d_1 \sin\theta_1 \sin\theta_2$$
$$B = P_y \sin\theta_2 - l_1 + P_x \cos\theta_1 \cos\theta_2 + P_z \cos\theta_2 \sin\theta_1 - d_1 \cos\theta_2 \sin\theta_1$$
$$\frac{l_2 \sin\theta_3}{l_2 \cos\theta_3} = \frac{A}{B}, \qquad \theta_3 = \arctan\left(\frac{A}{B}\right) \quad (12)$$
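Equations (10)–(12) translate directly into code. The sketch below is illustrative only and not the authors' implementation: it assumes numpy, takes the $o$ and $P$ columns of the target $noa$ matrix together with the link constants $l_1$ and $d_1$ ($l_2$ cancels in the quotient and does not appear in the final angles), and uses atan2 so that quadrant information is preserved:

```python
import numpy as np

def leg_position_ik(o, P, l1, d1):
    """Closed-form theta1..theta3 of one leg, following Eqs. (10)-(12).
    o: orientation column (ox, oy, oz); P: position (Px, Py, Pz)."""
    ox, oy, oz = o
    Px, Py, Pz = P
    th1 = np.arctan2(Pz - d1, Px)                             # Eq. (10)
    th2 = np.arctan2(ox * np.cos(th1) + oz * np.sin(th1), -oy)  # Eq. (11)
    A = (Py * np.cos(th2) - Px * np.cos(th1) * np.sin(th2)
         - Pz * np.sin(th1) * np.sin(th2) + d1 * np.sin(th1) * np.sin(th2))
    B = (Py * np.sin(th2) - l1 + Px * np.cos(th1) * np.cos(th2)
         + Pz * np.cos(th2) * np.sin(th1) - d1 * np.cos(th2) * np.sin(th1))
    th3 = np.arctan2(A, B)                                    # Eq. (12)
    return th1, th2, th3
```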
The next step is to find the joint variables $\theta_4$, $\theta_5$, and $\theta_6$ using the matrix equation (8). Here it is not necessary to use the full homogeneous transformation matrices: because there are no translations involved, only rotations, one can use only the rotation submatrices. The total rotation matrix can be written in a generic way through the $noa$ matrix, which is nothing more than the rotation carried out up to the last coordinate system, corresponding to the hip on the transversal axis. Using the Denavit-Hartenberg parameters from Table 3, it is possible to define the rotation matrix $R_4^7$:

$$R_4^7 = R_4^5 R_5^6 R_6^7 \quad (13)$$

where:
$$R_4^5 = \begin{bmatrix} C_4 & 0 & S_4 \\ S_4 & 0 & -C_4 \\ 0 & 1 & 0 \end{bmatrix}, \quad R_5^6 = \begin{bmatrix} C_5 & 0 & S_5 \\ S_5 & 0 & -C_5 \\ 0 & 1 & 0 \end{bmatrix}, \quad R_6^7 = \begin{bmatrix} C_6 & -S_6 & 0 \\ S_6 & C_6 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
Thus:

$$R_4^7 = \begin{bmatrix} S_4 S_6 + C_4 C_5 C_6 & C_6 S_4 - C_4 C_5 S_6 & C_4 S_5 \\ C_5 C_6 S_4 - C_4 S_6 & -C_4 C_6 - C_5 S_4 S_6 & S_4 S_5 \\ C_6 S_5 & -S_5 S_6 & -C_5 \end{bmatrix} \quad (14)$$

The rotation matrix from frame 0 to frame 4 is found with the parameters $\alpha$, $\theta$, and $l$ from Table 2:
$$R_0^4 = R_0^1 R_1^2 R_2^3 R_3^4 \quad (15)$$

$$R_0^1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix}, \quad R_1^2 = \begin{bmatrix} C_1 & 0 & -S_1 \\ S_1 & 0 & C_1 \\ 0 & -1 & 0 \end{bmatrix}, \quad R_2^3 = \begin{bmatrix} C_2 & -S_2 & 0 \\ S_2 & C_2 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad R_3^4 = \begin{bmatrix} C_3 & -S_3 & 0 \\ S_3 & C_3 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Thus:

$$R_0^4 = \begin{bmatrix} C_{2,3} C_1 & -S_{2,3} C_1 & -S_1 \\ S_{2,3} & C_{2,3} & 0 \\ C_{2,3} S_1 & -S_{2,3} S_1 & C_1 \end{bmatrix}$$

$$(R_0^4)^{-1} = (R_0^4)^T = \begin{bmatrix} C_{2,3} C_1 & S_{2,3} & C_{2,3} S_1 \\ -S_{2,3} C_1 & C_{2,3} & -S_{2,3} S_1 \\ -S_1 & 0 & C_1 \end{bmatrix} \quad (16)$$

Thus:

$$R_4^7 = (R_0^4)^T R_0^7 \quad (17)$$

Substituting Eqs. (15) and (16) and the matrix $noa$ into this equation, we have:

$$\begin{bmatrix} S_4 S_6 + C_4 C_5 C_6 & C_6 S_4 - C_4 C_5 S_6 & C_4 S_5 \\ C_5 C_6 S_4 - C_4 S_6 & -C_4 C_6 - C_5 S_4 S_6 & S_4 S_5 \\ C_6 S_5 & -S_5 S_6 & -C_5 \end{bmatrix} = \begin{bmatrix} C_{2,3} C_1 & S_{2,3} & C_{2,3} S_1 \\ -S_{2,3} C_1 & C_{2,3} & -S_{2,3} S_1 \\ -S_1 & 0 & C_1 \end{bmatrix} \begin{bmatrix} n_x & o_x & a_x \\ n_y & o_y & a_y \\ n_z & o_z & a_z \end{bmatrix} \quad (18)$$
From Eq. (18), the terms that generate a convenient equation are chosen to isolate the joint variables $\theta_4$, $\theta_5$, and $\theta_6$. First, $\theta_4$ is calculated using the (2,3) and (1,3) terms on both sides of Eq. (18), as follows:

$$C = a_y \cos(\theta_2+\theta_3) - a_x \sin(\theta_2+\theta_3)\cos\theta_1 - a_z \sin(\theta_2+\theta_3)\sin\theta_1$$
$$D = a_y \sin(\theta_2+\theta_3) + a_x \cos(\theta_2+\theta_3)\cos\theta_1 + a_z \cos(\theta_2+\theta_3)\sin\theta_1$$
$$\frac{\sin\theta_4 \sin\theta_5}{\cos\theta_4 \sin\theta_5} = \frac{C}{D}, \qquad \theta_4 = \arctan\left(\frac{C}{D}\right) \quad (19)$$

Then $\theta_5$ is calculated using the (1,3) and (3,3) terms, as follows:

$$\frac{\sin\theta_5}{\cos\theta_5} = \frac{D/\cos\theta_4}{-(a_z \cos\theta_1 - a_x \sin\theta_1)}$$
$$\theta_5 = \arctan\left(\frac{D}{-(\cos\theta_4 (a_z \cos\theta_1 - a_x \sin\theta_1))}\right) \quad (20)$$

Then $\theta_6$ is calculated using the (3,2) and (3,1) terms, as follows:

$$\frac{\sin\theta_5 \sin\theta_6}{\cos\theta_6 \sin\theta_5} = \frac{-(o_z \cos\theta_1 - o_x \sin\theta_1)}{n_z \cos\theta_1 - n_x \sin\theta_1}$$
$$\theta_6 = \arctan\left(\frac{-(o_z \cos\theta_1 - o_x \sin\theta_1)}{n_z \cos\theta_1 - n_x \sin\theta_1}\right) \quad (21)$$

The equations to find the angles of the legs are shown in Table 6. It is important to mention that the same process is used to calculate the values of the joint positions $\theta_1$, $\theta_2$, $\theta_3$, $\theta_4$, $\theta_5$, and $\theta_6$ of both legs.

Table 6. Inverse kinematics equations of the robot's legs

Link | Equation
1 | $\theta_1 = \arctan\left(\frac{P_z - d_1}{P_x}\right)$
2 | $\theta_2 = \arctan\left(\frac{o_x \cos\theta_1 + o_z \sin\theta_1}{-o_y}\right)$
3 | $\theta_3 = \arctan\left(\frac{A}{B}\right)$
4 | $\theta_4 = \arctan\left(\frac{C}{D}\right)$
5 | $\theta_5 = \arctan\left(\frac{D}{-(\cos\theta_4 (a_z \cos\theta_1 - a_x \sin\theta_1))}\right)$
6 | $\theta_6 = \arctan\left(\frac{-(o_z \cos\theta_1 - o_x \sin\theta_1)}{n_z \cos\theta_1 - n_x \sin\theta_1}\right)$
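The orientation step of Eqs. (17)–(21) can be sketched as follows. This is an illustrative reading of the closed-form solution, not the authors' implementation: it assumes numpy, the rotation $R_0^7$ extracted from the target $noa$ matrix, and the $\sin\theta_5 > 0$ branch when splitting the elements of $R_4^7$:

```python
import numpy as np

def leg_orientation_ik(th1, th2, th3, R07):
    """theta4..theta6 from R_4^7 = (R_0^4)^T R_0^7, per Eqs. (19)-(21)."""
    c1, s1 = np.cos(th1), np.sin(th1)
    c23, s23 = np.cos(th2 + th3), np.sin(th2 + th3)
    R04 = np.array([[c23 * c1, -s23 * c1, -s1],
                    [s23,       c23,      0.0],
                    [c23 * s1, -s23 * s1,  c1]])   # Eq. (16)
    R47 = R04.T @ R07                               # Eq. (17)
    # Elements of R47 per Eq. (14): (1,3)=C4*S5, (2,3)=S4*S5,
    # (3,3)=-C5, (3,1)=C6*S5, (3,2)=-S5*S6 (sin(th5) > 0 assumed).
    th4 = np.arctan2(R47[1, 2], R47[0, 2])
    th5 = np.arctan2(np.hypot(R47[0, 2], R47[1, 2]), -R47[2, 2])
    th6 = np.arctan2(-R47[2, 1], R47[2, 0])
    return th4, th5, th6
```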
4.2. Inverse Kinematics of Arms

To obtain the inverse kinematics of the left arm, consider the elements of the matrix $H_b^{b4}$ shown in Eq. (6):

$$H_b^{b4} = \begin{bmatrix} n_{bx} & o_{bx} & a_{bx} & p_x \\ n_{by} & o_{by} & a_{by} & p_y \\ n_{bz} & o_{bz} & a_{bz} & p_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \quad (22)$$

Then, from (6) the following matrix equation is defined:

$$(H_{b2}^{b3})^{-1}(H_{b1}^{b2})^{-1}(H_{b0}^{b1})^{-1} H_b^{b4} = H_{b3}^{b4}$$

$$\begin{bmatrix} C_{b2} & S_{b2} & 0 & -l_4 \\ -S_{b2} & C_{b2} & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} C_{b1} & S_{b1} & 0 & -l_3 \\ 0 & 0 & -1 & d_3 \\ -S_{b1} & C_{b1} & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & -1 & 0 & 0 \\ 0 & 0 & 1 & -d_2 \\ -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} n_{bx} & o_{bx} & a_{bx} & p_{bx} \\ n_{by} & o_{by} & a_{by} & p_{by} \\ n_{bz} & o_{bz} & a_{bz} & p_{bz} \\ 0 & 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} C_{b3} & -S_{b3} & 0 & l_5 C_{b3} \\ S_{b3} & C_{b3} & 0 & l_5 S_{b3} \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

Evaluating the left-hand side gives:

$$\begin{bmatrix} n_{bx} S_{b2} - n_{by} C_{b1} C_{b2} + n_{bz} C_{b2} S_{b1} & o_{bx} S_{b2} - o_{by} C_{b1} C_{b2} + o_{bz} C_{b2} S_{b1} & a_{bx} S_{b2} - a_{by} C_{b1} C_{b2} + a_{bz} C_{b2} S_{b1} & r_{1,4} \\ n_{bx} C_{b2} + n_{by} C_{b1} S_{b2} - n_{bz} S_{b1} S_{b2} & o_{bx} C_{b2} + o_{by} C_{b1} S_{b2} - o_{bz} S_{b1} S_{b2} & a_{bx} C_{b2} + a_{by} C_{b1} S_{b2} - a_{bz} S_{b1} S_{b2} & r_{2,4} \\ n_{bz} C_{b1} + n_{by} S_{b1} & o_{bz} C_{b1} + o_{by} S_{b1} & a_{bz} C_{b1} + a_{by} S_{b1} & r_{3,4} \\ 0 & 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} C_{b3} & -S_{b3} & 0 & l_5 C_{b3} \\ S_{b3} & C_{b3} & 0 & l_5 S_{b3} \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \quad (23)$$

where:

$$r_{1,4} = P_x S_{b2} - l_4 - l_3 C_{b2} + d_3 S_{b2} - P_y C_{b1} C_{b2} + P_z C_{b2} S_{b1} - d_2 C_{b2} S_{b1}$$
$$r_{2,4} = P_x C_{b2} + d_3 C_{b2} + l_3 S_{b2} + P_y C_{b1} S_{b2} - P_z S_{b1} S_{b2} + d_2 S_{b1} S_{b2}$$
$$r_{3,4} = P_z C_{b1} + P_y S_{b1} - d_2 C_{b1}$$

Taking the quotient of the (3,4) elements of both sides of Eq. (23), the angle $\theta_{b1}$ is calculated as follows:

$$P_z \cos\theta_{b1} + P_y \sin\theta_{b1} - d_2 \cos\theta_{b1} = 0$$
$$\theta_{b1} = \arctan\left(\frac{P_z - d_2}{-P_y}\right) \quad (24)$$

Considering the (1,3) element of both sides of Eq. (23), the angle $\theta_{b2}$ is calculated as follows:

$$a_x \sin\theta_{b2} - a_y \cos\theta_{b1}\cos\theta_{b2} + a_z \cos\theta_{b2}\sin\theta_{b1} = 0$$
$$\theta_{b2} = \arctan\left(\frac{a_y \cos\theta_{b1} - a_z \sin\theta_{b1}}{a_x}\right) \quad (25)$$

Using the (2,4) and (1,4) elements of both sides of the equation, the angle $\theta_{b3}$ is calculated:

$$E = P_x \cos\theta_{b2} + d_3 \cos\theta_{b2} + l_3 \sin\theta_{b2} + P_y \cos\theta_{b1}\sin\theta_{b2} - P_z \sin\theta_{b1}\sin\theta_{b2} + d_2 \sin\theta_{b1}\sin\theta_{b2}$$
$$F = P_x \sin\theta_{b2} - l_4 - l_3 \cos\theta_{b2} + d_3 \sin\theta_{b2} - P_y \cos\theta_{b1}\cos\theta_{b2} + P_z \cos\theta_{b2}\sin\theta_{b1} - d_2 \cos\theta_{b2}\sin\theta_{b1}$$
$$\frac{l_5 \sin\theta_{b3}}{l_5 \cos\theta_{b3}} = \frac{E}{F}, \qquad \theta_{b3} = \arctan\left(\frac{E}{F}\right) \quad (26)$$

The equations to find the angles of the left arm are shown in Table 7.

Table 7. Inverse kinematics equations of the robot's left arm

Link | Equation
1 | $\theta_{b1} = \arctan\left(\frac{P_z - d_2}{-P_y}\right)$
2 | $\theta_{b2} = \arctan\left(\frac{a_y \cos\theta_{b1} - a_z \sin\theta_{b1}}{a_x}\right)$
3 | $\theta_{b3} = \arctan\left(\frac{E}{F}\right)$
To obtain the inverse kinematics of the right arm, consider the equation shown in Eq. (27). The following matrix equation is defined:

$$(H_{b6}^{b7})^{-1}(H_{b5}^{b6})^{-1}(H_{b}^{b5})^{-1} H_b^{b8} = H_{b7}^{b8}$$

$$\begin{bmatrix} C_{b5} & S_{b5} & 0 & -l_4 \\ -S_{b5} & C_{b5} & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} C_{b4} & S_{b4} & 0 & -l_3 \\ 0 & 0 & 1 & -d_3 \\ S_{b4} & -C_{b4} & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & d_2 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} n_{bx} & o_{bx} & a_{bx} & p_{bx} \\ n_{by} & o_{by} & a_{by} & p_{by} \\ n_{bz} & o_{bz} & a_{bz} & p_{bz} \\ 0 & 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} C_{b6} & -S_{b6} & 0 & l_5 C_{b6} \\ S_{b6} & C_{b6} & 0 & l_5 S_{b6} \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

Evaluating the left-hand side gives:

$$\begin{bmatrix} n_{bx} S_{b5} - n_{by} C_{b4} C_{b5} - n_{bz} C_{b5} S_{b4} & o_{bx} S_{b5} - o_{by} C_{b4} C_{b5} - o_{bz} C_{b5} S_{b4} & a_{bx} S_{b5} - a_{by} C_{b4} C_{b5} - a_{bz} C_{b5} S_{b4} & u_{1,4} \\ n_{bx} C_{b5} + n_{by} C_{b4} S_{b5} + n_{bz} S_{b4} S_{b5} & o_{bx} C_{b5} + o_{by} C_{b4} S_{b5} + o_{bz} S_{b4} S_{b5} & a_{bx} C_{b5} + a_{by} C_{b4} S_{b5} + a_{bz} S_{b4} S_{b5} & u_{2,4} \\ n_{bz} C_{b4} - n_{by} S_{b4} & o_{bz} C_{b4} - o_{by} S_{b4} & a_{bz} C_{b4} - a_{by} S_{b4} & u_{3,4} \\ 0 & 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} C_{b6} & -S_{b6} & 0 & l_5 C_{b6} \\ S_{b6} & C_{b6} & 0 & l_5 S_{b6} \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \quad (27)$$

where:

$$u_{1,4} = P_x S_{b5} - l_4 - l_3 C_{b5} - d_3 S_{b5} - P_y C_{b4} C_{b5} - P_z C_{b5} S_{b4} + d_2 C_{b5} S_{b4}$$
$$u_{2,4} = P_x C_{b5} - d_3 C_{b5} + l_3 S_{b5} + P_y C_{b4} S_{b5} + P_z S_{b4} S_{b5} - d_2 S_{b4} S_{b5}$$
$$u_{3,4} = P_z C_{b4} - P_y S_{b4} - d_2 C_{b4}$$

Taking the quotient of the (3,4) elements of both sides of Eq. (27), the angle $\theta_{b4}$ is calculated as follows:

$$P_z \cos\theta_{b4} - P_y \sin\theta_{b4} - d_2 \cos\theta_{b4} = 0$$
$$\theta_{b4} = \arctan\left(\frac{P_z - d_2}{P_y}\right) \quad (28)$$

Considering the (1,3) element of both sides of Eq. (27), the angle $\theta_{b5}$ is calculated as follows:

$$a_x \sin\theta_{b5} - a_y \cos\theta_{b4}\cos\theta_{b5} - a_z \cos\theta_{b5}\sin\theta_{b4} = 0$$
$$\theta_{b5} = \arctan\left(\frac{a_y \cos\theta_{b4} + a_z \sin\theta_{b4}}{a_x}\right) \quad (29)$$

Using the (2,4) and (1,4) elements of both sides of the equation, the angle $\theta_{b6}$ is calculated:

$$G = P_x \cos\theta_{b5} - d_3 \cos\theta_{b5} + l_3 \sin\theta_{b5} + P_y \cos\theta_{b4}\sin\theta_{b5} + P_z \sin\theta_{b4}\sin\theta_{b5} - d_2 \sin\theta_{b4}\sin\theta_{b5}$$
$$H = P_x \sin\theta_{b5} - l_4 - l_3 \cos\theta_{b5} - d_3 \sin\theta_{b5} - P_y \cos\theta_{b4}\cos\theta_{b5} - P_z \cos\theta_{b5}\sin\theta_{b4} + d_2 \cos\theta_{b5}\sin\theta_{b4}$$
$$\frac{l_5 \sin\theta_{b6}}{l_5 \cos\theta_{b6}} = \frac{G}{H}, \qquad \theta_{b6} = \arctan\left(\frac{G}{H}\right) \quad (30)$$

The equations to find the angles that correspond to the joints of the right arm are shown in Table 8.

Table 8. Inverse kinematics equations of the robot's right arm

Link | Equation
1 | $\theta_{b4} = \arctan\left(\frac{P_z - d_2}{P_y}\right)$
2 | $\theta_{b5} = \arctan\left(\frac{a_y \cos\theta_{b4} + a_z \sin\theta_{b4}}{a_x}\right)$
3 | $\theta_{b6} = \arctan\left(\frac{G}{H}\right)$
5. Conclusion

This paper presents a complete solution of the inverse kinematics model using the Denavit-Hartenberg methodology for an 18-DOF robot. The forward kinematics model allowed the Bioloid Premium robot to be represented. Unlike other geometric methods, our research proposal considers the kinematic decoupling method, taking the feet and the pelvis as points of origin and generating four open kinematic chains to calculate the joint positions of both arms and legs of the robot in three-dimensional space (x, y, z); consequently, it is possible to determine the final position of each end-effector of the robot, taking the supporting feet as the fixed reference frame. This methodology is an important step towards obtaining the differential kinematics and subsequently calculating the dynamic model of the robot in later work. Moreover, the proposed methodology can be extended to other biped robots.

AUTHORS
Miguel Angel Ortega-Palacios∗ – Language and Knowledge Engineering (LKE), Benemérita Universidad Autónoma de Puebla, Puebla, México, e-mail: miguel.ortegap@alumno.buap.mx.
Amparo Dora Palomino-Merino – Facultad de Ciencias de la Electrónica, Benemérita Universidad Autónoma de Puebla, Puebla, México, e-mail: amparo.palomino@correo.buap.mx.
Fernando Reyes-Cortes – Facultad de Ciencias de la Electrónica, Benemérita Universidad Autónoma de Puebla, Puebla, México, e-mail: fernando.reyes@correo.buap.mx.
∗Corresponding author

References
[1] J. Fierro, J. A. Pámanes, V. Santibanez, G. Ruiz and J. Ollervides. "Condiciones para una marcha elemental del robot NAO," AMRob Journal, Robotics: Theory and Applications, no. 4(1), pp. 13–18, 2014.
[2] S. Bertrand, O. Bruneau, F. B. Ouezdou and S. Alfayad. "Closed-form solutions of inverse kinematic models for the control of a biped robot with 8 active degrees of freedom per leg," Mechanism and Machine Theory, vol. 49, pp. 117–140, 2012, doi: 10.1016/j.mechmachtheory.2011.10.014.
[3] O. Ruiz, Análisis cinemático y dinámico de un robot bípedo de 12 GDL internos utilizando la formulación Newton-Euler, Universidad Nacional Autónoma de México, México: MS Thesis, 2014.
[4] J. Zhang, Z. Yuan, S. Dong, M. T. Sadiq, F. Zhang and J. Li. "Structural design and kinematics simulation of hydraulic biped robot," Applied Sciences, vol. 10, no. 18, p. 6377, 2020, doi: 10.3390/app10186377.
[5] J. Che, Y. Pan, W. Yan and J. Yu. "Kinematics Analysis of Leg Configuration of An Ostrich Bionic Biped Robot," International Conference on Robotics and Control Engineering, pp. 19–22, 2021, doi: 10.1145/3462648.3462652.
[6] Y. Gong, R. Hartley, X. Da, A. Hereid, O. Harib, J. K. Huang and J. Grizzle. "Feedback control of a cassie bipedal robot: Walking, standing, and riding a segway," In 2019 American Control Conference (ACC), pp. 4559–4566, 2019, doi: 10.23919/ACC.2019.8814833.
[7] J. Che, Y. Pan, W. Yan and J. Yu. "Leg Configuration Analysis and Prototype Design of Biped Robot Based on Spring Mass Model," In Actuators, vol. 11, no. 3, p. 75, 2022, doi: 10.3390/act11030075.
[8] Y. Hu, X. Wu, H. Ding, K. Li, J. Li and J. Pang. "Study of Series-parallel Mechanism Used in Legs of Biped Robot," 7th International Conference on Control, Automation and Robotics (ICCAR), pp. 97–102, 2021, doi: 10.1109/ICCAR52225.2021.9463499.
[9] E. Yılmazlar and H. Kuşçu. "Walking pattern generation and control for a bipedal robot," Machines. Technologies. Materials, vol. 15, no. 3, pp. 99–102, 2021.
[10] T. D. Huy, N. C. Cuong and N. T. Phuong. "Control of biped robot with stable walking," American
Journal of Engineering Research (AJER), vol. 2, pp. 129–150, 2013.
[11] D. Bharadwaj and M. Prateek. "Kinematics and dynamics of lower body of autonomous humanoid biped robot," International Journal of Innovative Technology and Exploring Engineering (IJITEE), vol. 8(4), pp. 141–146, 2019.
[12] K. Cherfouh, J. Gu, U. Farooq, M. U. Asad, R. Dey and V. E. Balas. "Bilateral Teleoperation Control of a Bipedal Robot Gait Using a Manipulator," IFAC-PapersOnLine, vol. 55, no. 1, pp. 765–770, 2022, doi: 10.1016/j.ifacol.2022.04.125.
[13] D. A. a. A. Vivas. "Modelado y control de un robot bípedo de nueve grados de libertad," In VIII Congreso de la Asociación Colombiana de Automática, 2009.
[14] S. Kajita, H. Hirukawa, K. Harada and K. Yokoi. "Kinematics," in Introduction to Humanoid Robotics, Springer Berlin Heidelberg, 2014, pp. 19–67, doi: 10.1007/978-3-642-54536-8.
[15] E. H. Franco and R. V. Guerrero. "Diseño Mecánico y Análisis Cinemático del Robot Humanoide AXIS," Pistas Educativas, no. 35(108), 2018.
[16] R. L. Williams. "DARwin-OP humanoid robot kinematics," In International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. American Society of Mechanical Engineers, vol. 45035, pp. 1187–1196, 2012, doi: 10.1115/DETC2012-70265.
[17] N. Kofinas, E. Orfanoudakis and M. G. Lagoudakis. "Complete analytical inverse kinematics for NAO," In 2013 13th International Conference on Autonomous Robot Systems, pp. 1–6, 2013, doi: 10.1109/Robotica.2013.6623524.
[18] G. A. Castillo, B. Weng, W. Zhang and A. Hereid. "Robust feedback motion policy design using reinforcement learning on a 3D digit bipedal robot," International Conference on Intelligent Robots and Systems (IROS), pp. 5136–5143, 2021, doi: 10.1109/IROS51168.2021.9636467.
[19] M. A. Meggiolaro, M. S. Neto and A. L. Figueroa. "Modeling and Optimization with Genetic Algorithms of Quasi-Static Gait Patterns in Planar Biped Robots," In Congreso Internacional de Ingeniería Mecatrónica y Automatización (CIIMA 2016), pp. 1–10, 2016.
[20] G. Reyes, J. A. Pamanes, J. E. Fierro and V. Nunez. "Optimum Walking of the Bioloid Humanoid Robot on a Rectilinear Path," In Computational Kinematics. Springer, Cham, pp. 143–151, 2018, doi: 10.1007/978-3-319-60867-9_17.
[21] A. B. Krishnan, S. Aswath and G. Udupa. "Real Time Vision Based Soccer Playing Humanoid Robotic Platform," In Proceedings of the 2014 International Conference on Interdisciplinary Advances in Applied Computing, pp. 1–8, 2014, doi: 10.1145/2660859.2660966.
[22] J. R. Cerritos-Jasso, K. A. Camarillo-Gómez, J. A. Monsiváis-Medina, G. Castillo-Alfaro, G. I. Pérez-Soto and J. A. Pámanes-García. "Kinematic Modeling of a Humanoid Soccer-Player: Applied to BIOLOID Premium Type A Robot," In FIRA RoboWorld Congress, Springer, Berlin, Heidelberg, pp. 49–63, 2013, doi: 10.1007/978-3-642-40409-2_5.
[23] H. D. Chiang and C. S. Tsai. "Kinematics Analysis of a Biped Robot," In Proceeding of International Conference on Service and Interactive Robots, 2011.
[24] C. A. M. Domínguez and E. M. Sánchez. "Análisis estático y dinámico de un robot bípedo durante la fase de soporte simple de un ciclo de marcha," In Memorias del XXIII Congreso Internacional Anual de la SOMIM, 2017.
[25] L. E. Arias, L. I. Olvera, P. J. A. and J. V. Núñez. "Patrón de marcha 3D de tipo cicloidal para humanoides y su aplicación al robot Bioloid," Revista Iberoamericana de Ingeniería Mecánica, vol. 18(1), 3, 2014.
[26] D. A. B. Montenegro, Generación de Trayectorias para un Robot Bípedo basadas en Captura de Movimiento Humano, 2016.
[27] J. V. Nunez, A. Briseno, D. A. Rodriguez, J. M. Ibarra and V. M. Rodriguez. "Explicit analytic solution for inverse kinematics of bioloid humanoid robot," In 2012 Brazilian Robotics Symposium and Latin American Robotics Symposium, pp. 33–38, 2012, doi: 10.1109/SBR-LARS.2012.62.
[28] M. V. Granja Oramas. "Modelación y análisis de la cinemática directa e inversa del manipulador Stanford de seis grados de libertad," Bachelor's thesis, Quito, 2014.
[29] E. H. Franco, R. V. Guerrero. "Diseño Mecánico y Análisis Cinemático del Robot Humanoide AXIS," Pistas Educativas, 35(108), 2018.
EEG SIGNAL ANALYSIS FOR MONITORING CONCENTRATION OF OPERATORS Submitted: 9th August 2022; accepted: 29th September 2022
Łukasz Rykała

DOI: 10.14313/JAMRIS/1-2023/4

Abstract: Often, operators of machines, including unmanned ground vehicles (UGVs) or working machines, are forced to work in unfavorable conditions, such as high temperatures, continuously for a long period of time. This has a huge impact on their concentration, which usually determines the success of many tasks entrusted to them. Electroencephalography (EEG) allows the study of the electrical activity of the brain. It allows the determination, for example, of whether the operator is able to focus on the realization of his tasks. The main goal of this article was to develop an algorithm for determining the state of brain activity by analyzing the EEG signal. For this purpose, methods of EEG signal acquisition and processing were described, including EEG equipment and the types and location of electrodes. Particular attention was paid to EEG signal acquisition, EEG signal artifacts and disturbances, and elements of the adult's correct EEG recording were described in detail. In order to develop the algorithm mentioned, basic types of brain waves were discussed, and exemplary states of brain activity were recorded. The influence of technical aspects on the recording of EEG signals was also emphasized. Additionally, a block diagram was created which is the basis for the operation of the said algorithm. The LabVIEW environment was used to implement the created algorithm. The results of the research showing the operation of the developed EEG signal analyzer were also presented. Based on the results of the study, the EEG analyzer was able to accurately determine the condition of the examined person and could be used to study the concentration of machine operators.

Keywords: Electroencephalography, EEG, signal processing, Fourier analysis, LabVIEW, biofeedback, operator concentration, UGV
1. Introduction

The method of studying the electrical activity of the brain is called electroencephalography (EEG). The complex electrical activity of the brain produces highly irregular EEG signals. Attempting to record the electrical representation of brain activity is technically difficult. The main problem in this type of measurement is the need to amplify the human brain potentials about a million times and convert them into a waveform. The extra-cerebral potentials, which mainly result from the movements of the examined person, are also amplified, while their amplitude
often exceeds the amplitude of the cortical potentials. The lack of consideration and correction of this phenomenon, of similar artifacts, and of the interference itself makes the recording of the mentioned signals unreadable [1].

In recent years, progress has been made in the field of electroencephalography, which has resulted in many new applications [1, 2]. Advances in technology have significantly improved EEG machines. As a result, the availability and the number of users of these devices have increased significantly. For many years, EEG has been a basic examination in medicine, for example in the diagnosis and treatment of epilepsy. It is often the only possible alternative to an imaging examination, such as computed tomography.

Electroencephalography is also used in psychiatry and psychology. Moreover, neurofeedback enables people to improve their health by using signals from their bodies. Using the phenomenon of biological feedback (biofeedback) of the EEG signal, children suffering from concentration disorders are successfully treated. Biological feedback is also used to help people with muscle paralysis regain movement [1–4].

Biofeedback can also be used in non-medical areas, which is the subject of scientific research [5–12]. It could be used to study the concentration of operators of unmanned ground vehicles (UGVs) or working machines. It is known that such employees often have a huge responsibility resulting from the work they perform. The use of, for example, heavy equipment requires enormous concentration and carries a considerable risk of a large amount of damage in the event of a potential operator error. UGV operators are also exposed to enormous stress during the execution of tasks (especially in the case of remote control or teleoperation), which can contribute to the failure of the mission. Therefore, it is necessary to maintain concentration at a high level and to check this factor periodically, which is possible with the use of EEG.

The main purpose of this article is to develop an algorithm, with an implementation, that enables the determination of the state of brain activity. In realizing this goal, particular attention was paid to the selection of the programming environment with which the algorithm was developed and to the possibility of its subsequent application to study the concentration of machine operators.
2. Methods of Signal Acquisition and Processing

Complex brain activity produces highly irregular EEG signals, and the aforementioned irregularity makes them very interesting because of their importance in modern technology and medicine [1, 13, 14].

2.1. Source of EEG Signals
Most likely, the main sources of the EEG signal are neurons; more precisely, they may be action potentials, inhibitory postsynaptic potentials (IPSPs), and long-term depolarization of neurons. Action potentials induce short (up to 10 ms) local currents in the axon with a limited electric field. In turn, the postsynaptic potentials are longer (50–200 ms) and have a larger electric field [1, 13, 14].

2.2. EEG Electrodes
The electrodes are transmitters through which the electrical potentials of the cortex are transferred to the amplification device. Due to the shape and hairiness of the skin of the head, EEG electrodes should meet two conditions: they should have a relatively small contact surface and provide comfort to the examined person. Standard EEG electrodes are small disks made of non-reactive metals (Fig. 1). For this purpose, several types of metals are used, including gold, silver, or silver chloride. The electrode must be in close contact with the skin to ensure low impedance and thus minimize environmental and electrode artifacts. There are also other types of electrodes, the so-called needle electrodes, but due to their invasiveness and high resistance they are used rarely [1]. In the research carried out as part of this study, contact electrodes were used (Fig. 1), which are a combination of a contact surface (diameter of about 5 mm) and a plastic holder mounted in a cap covering the entire head. The most commonly used electrodes are Ag/AgCl [13].
Figure 2. Standard electrode positions. Own elaboration based on [1]
2.3. Electrode Location
The arrangement of the electrodes is standardized. The most commonly used electrode placement system is the International 10–20 System [1]. This system corresponds approximately to the anatomical structure of the brain and is based on precise measurements of the skull with the use of several characteristic points. Figure 2 shows the locations of the electrodes in the mentioned system. The use of specialized caps, as in Fig. 1, allows each measurement to be omitted, which greatly facilitates the examination. Each of the electrodes corresponds to a large anatomical region of the brain. Moreover, odd numbers refer to the left hemisphere of the brain, and even numbers refer to the right hemisphere. The symbols of the electrodes correspond to the Latin names of the regions: F – frontal area, Fp – prefrontal area, P – parietal area, O – occipital area, T – temporal area, A – ear electrodes, C – central area, Sp1/Sp2 – wedge electrode [1, 13, 14].

2.4. Types of Electrode Leads
Figure 1. EEG cap used in the measurements
The summation of the inhibitory postsynaptic (IPSP) and excitatory (EPSP) potentials in the neural network causes the generation of electric currents flowing in the cells. The phenomenon of the current flow creates fields that move centrifugally away from the place where the electric phenomenon occurs. The field impact decreases with increasing distance from the source. This necessitates the most accurate placement of the electrodes, so that the recorded signals best reflect the phenomena under study. Therefore, two types of leads are used: unipolar and bipolar [1, 13, 14]. Measurement with the unipolar leads (used in the measurements) records changes in voltage between one electrode and the point representing the reference potential. This method, however, is not free from artifacts, such as the artifact from the alternating-current network with a frequency of 50 Hz. The bipolar
leads, on the other hand, extend the number of electrode combinations. In this solution, both electrodes represent the bioelectric activity of the brain, and the resulting record is a representation of the potential difference between the two measuring points used. In this case, a signal that has an equal effect on both sources will not cause a potential difference. This method of connecting the electrodes offers a greater number of possible connections depending on the diagnostic need [1].

2.5. EEG Equipment
Electroencephalographic recorders are digital devices in which the analog signal is converted into digital form; the following technical parameters are associated with their operation: sampling frequency, recording speed, and sensitivity. In addition, all EEG devices must also include resistive-capacitive circuits, which act as both low-pass and high-pass filters. Low-pass filters attenuate unwanted high frequencies, such as muscle action potentials. Usually, they are set at 60 Hz to cut the 50 Hz artifact disturbance. In turn, high-pass filters similarly attenuate low-frequency signals [15]. The typical characteristics of the analyzed EEG recorder are a large number of channels (8–32) and the use of a programmable switch at the input of the device, which enables the connection of each measurement path with each electrode (unipolar system) or with a pair of electrodes (bipolar system). The ALIEN recorder with 32 channels was used in the research (Fig. 3) [16]. Another characteristic feature of the device is the built-in electrode impedance measurement system. A small value of impedance allows good electrical contact to be obtained, and thus a better measurement with a larger margin from noise. The correct value of the impedance between the electrode and the scalp should not exceed 5 kΩ, although in some measurements even 20 kΩ is allowed [16].
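For a digital recording, the low-pass/high-pass conditioning described above can be mirrored in software. The sketch below is an assumption-laden illustration, not the ALIEN recorder's firmware: it uses scipy, the 128 Hz sampling rate configured later in Section 3, and illustrative cutoffs of 0.5 Hz and 45 Hz plus a 50 Hz notch for the mains artifact:

```python
import numpy as np
from scipy import signal

FS = 128.0  # sampling rate [Hz], as set in TruScan (Section 3)

def condition_eeg(x):
    """Suppress the 50 Hz mains artifact and band-limit raw EEG."""
    # IIR notch at 50 Hz for the alternating-current interference
    b, a = signal.iirnotch(50.0, 30.0, fs=FS)
    x = signal.filtfilt(b, a, x)
    # 4th-order Butterworth band-pass 0.5-45 Hz (high-pass + low-pass pair)
    sos = signal.butter(4, [0.5, 45.0], btype="bandpass", fs=FS, output="sos")
    return signal.sosfiltfilt(sos, x)  # zero-phase filtering

cleaned = condition_eeg(np.random.randn(10 * int(FS)))  # placeholder signal
```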
Figure 3. EEG module used in the measurements
Figure 4. Graphical representation of the electrode resistances
3. EEG Signal Acquisition

TruScan software was used to acquire the EEG signal. The default sampling frequency is set to 128 Hz; it is also necessary to set the values of other parameters, such as the sensitivity (70 µV) and the band-pass filter, which determines the recorded frequency band of interest for the analysis [16]. Single electrodes can be connected, or the above-mentioned cap can be used (it contains 32 optimally positioned electrodes). Examples of the resistance values of individual electrodes are shown in Fig. 4. The red color means an unacceptable resistance value, which makes the measurement impossible. Yellow indicates resistance at the limit of admissibility, while the minimum transient resistance is described by two colors: blue and green. In the case of red color, the position of the corresponding electrode should be corrected.

3.1. EEG Signal Artifacts and Disturbances
The phenomenon of artifacts, or undesirable phenomena that distort the analyzed signal, is associated with the measurement of all signals, especially those with such a small amplitude. Depending on their origin, artifacts are divided into:
- physiological: their cause is the functioning of human organs that are not the subject of the electrodiagnostic examination carried out at a given moment,
- technical: their cause is primarily the measurement method itself, imperfections of the equipment used, and the occurrence of physical phenomena completely unrelated to a given electrodiagnostic measurement in the measuring space [1, 13].
Physiological artifacts cannot be eliminated by using more precise equipment, but it is possible to significantly reduce their impact by trying to create the best possible conditions [1, 13]. The most important EEG signal artifacts are [1]:
- "crackling" of the electrode: caused by leaky contact of the electrode with the human skin.
This causes a very sudden but short-term increase in the impedance of the electrode.
- muscle action potentials: the most common artifact which, when present in large numbers, completely interferes with EEG measurements.
- 50 Hz artifact (from the AC network): an artifact arising as a result of high impedance or, often, bad grounding related to the proximity of electrical apparatus, which appears as a rhythmic 50 Hz component in Europe.
- tremor artifacts: caused by repetitive limb movements, which also cause head movements. The artifact causes small oscillations that affect the occipital electrodes.
Figure 7. Laboratory‐recorded adult EEG signals containing motion artifacts
- chewing artifacts: muscle action potentials characterized by a frequency consistent with the movement of the jaws (Fig. 5).
- artifacts related to tongue movements: characterized by slow, chaotic potentials indicating delta waves in the analysis (Fig. 6).
- motion artifacts: common, characterized by high amplitudes and a rapid decay of value. They coexist with body and head movements and are directly related to muscle artifacts (Fig. 7).
- blinking artifact: characterized by high-amplitude potentials whose deviations are synchronous with large downward inclinations of the curve (Fig. 8).
Figure 5. Laboratory‐recorded adult EEG signals containing chewing artifacts
Figure 6. Laboratory‐recorded adult EEG signals containing artifacts related to tongue movements
Figure 8. Laboratory‐recorded adult EEG signals containing blink artifacts
Figure 9. Laboratory‐recorded adult EEG signals containing artifacts related to the sideways movement of the eyeballs
- artifacts related to sideways movements of the eyeballs: characteristic, sharply delimited out-of-phase potentials. This phenomenon is directly caused by the action of the lateral rectus muscles (Fig. 9).
If physiological artifacts are found in the EEG signals, their influence can be largely reduced by a proper description. Some artifacts, such as eyelid blinking or eye movements, can be identified because their presence is characterized by a characteristic distinctness of the measurement itself. A very difficult issue is also the problem of the automatic identification
of the disturbances generated as a result of these artifacts [1]. On the other hand, technical artifacts are usually the result of improper electrode placement, poor skin contact, and conducting tests in an environment that may be electromagnetically disturbed (cables, transformers, etc.). However, many EEG laboratories are currently equipped with shielded rooms, which eliminates the risk of this type of interference [1].

3.2. Elements of the Correct EEG Recording of an Adult
The recording of the bioelectrical activity of the brain consists of brain-wave rhythms of different frequencies and amplitudes, and their number and distribution determine the regularity of the recording or possible pathologies. Among the brain waves, the following are distinguished: delta, theta, alpha, beta, and gamma, and each of them is responsible for a specific type of brain activity: sleep, concentration, tension, and so on [1, 13, 14].

Delta waves (0.5–4 Hz) are an indicator of focal brain damage. The delta wave is the slowest of all brain waves. It appears during deep sleep. It does not appear in the normal EEG recording of an adult in the waking state, because its presence always indicates brain dysfunction [1, 13, 14].

Theta waves (4–8 Hz) are customary in the EEG of adult wakefulness, but their absence does not necessarily mean any dysfunction. These waves are associated with states of concentration, intense thinking, and visualization [1, 13, 14].

The alpha rhythm (8–12 Hz) is the main basic rhythm of a normal adult EEG. The alpha rhythm is defined as a rhythmic frequency between 8 Hz and 13 Hz (sometimes 12 Hz), which is usually highest in the occipital region. The alpha wave is the axis of the bioelectrical activity of the brain. Moreover, this wave is directly related to the state of concentration. Excess alpha may indicate problems in learning processes. The basic characteristic of these waves is that they show best when the person is relaxed and awake with their eyes closed [1, 13, 14].

Beta waves (12–40 Hz) are the background of most people's brain waves. The rhythm occurs and dominates in the state of consciousness, when a person is awake and receives signals from the environment with all senses. Beta waves are divided into:
- low waves, the so-called SMR (12–15 Hz),
- medium waves, the so-called Beta 1 (15–20 Hz),
- high waves, the so-called Beta 2 (20–32 Hz) [1, 13, 14].

Gamma waves (32–200 Hz) are responsible for experiencing strong emotions and associative processes. The frequencies in the range 32–50 Hz are the only frequency group found in any part of the brain. This is why it is assumed that, when the brain has to process information in different parts simultaneously, it uses the 40 Hz frequency to process the information simultaneously [1, 13, 14].
3.3. Sample EEG Signals of an Adult
During the EEG measurements, in addition to recording numerous examples of the artifacts described in Section 3.1, the following states were recorded:
- State of relaxation. The record of the state of relaxation shown in Fig. 10 was recorded during the examination of the author of the article, who was awake during the measurements with his eyes closed. Small deviations and a relatively uniform signal flow in all leads indicate the dominant alpha rhythm.
- Hand movement. The recording shown in Fig. 11 was recorded during the examination of the author of the article, who moved his hands during the measurements. Visible significant sudden changes in amplitudes indicate artifacts related to the muscle action potential. The artifact interferes with the results, especially on the frontal, parietal, and temporal electrodes, but the signals from the rest of the electrode pairs are only minimally disturbed.
- Blinking and moving the head. The record shown in Fig. 12 was recorded during the examination of the author of the article, who moved his head and blinked his eyelids during the measurements. The visible significant sudden changes in amplitudes, especially in the Fp1-F3 pair, indicate the dominant nature of the blink artifact. The artifact related to the head movement is visible in the form of "waving", also on the Fp1-F3 electrode pair.
Figure 10. Record of the relaxed state of a healthy adult
Figure 11. Record of muscle artifacts: hand movement
Figure 12. Record of muscle artifacts: blinking eyelids and moving the head
4. Development of the EEG Signal Analysis Algorithm

The main goal of the developed algorithm is to analyze the recorded EEG signals in order to obtain information about the frequency bands present. Confidence that the detected frequencies are the brainwaves being sought is obtained by examining the amplitude of the filtered band and the root-mean-square value of the signal band. The ranges of these indicators for the problem under consideration are presented in Table 1; the first and second columns of the table list the names of the brain waves along with the considered frequency bands.

However, there are numerous exceptions to the accepted norms for brain waves. In such cases, a very helpful indicator in the assessment of the human condition is the percentage share of the Fourier-transform amplitude of specific frequency bands in a noise-free signal. It is especially useful when an analysis based on the amplitude and root-mean-square (RMS) limits described in Table 1 does not give satisfactory results. By calculating what percentage of the entire frequency spectrum is occupied by a given frequency band assigned to a brain wave, it is easy to determine the real state of the examined person. Real-time observation of changes in the content of individual bands in the signal is the basis of biofeedback training.

Figure 13 shows the algorithm for analyzing the recorded EEG signal. To work properly, it needs information about the frequency with which the EEG signal was sampled. This information is necessary to determine the Nyquist limit (half the value of the sampling frequency).

Table 1. Division of brain waves according to frequency bands [1, 13, 14]

Brain wave | Frequency range [Hz] | Amplitude [µV] | RMS [µV]
Delta | 0.5–4 | approx. 50 | <20
Theta | 4–7.5 | <30 | 10 (max. 15)
Alpha | 7.5–13 | 20–100 | 6–10
SMR | 12–15 | <20 | 4
Beta 1 | 12–30 | <20 | 3
Beta 2 | 20–30 | <20 | <6–8
Gamma | 31–45 | ND | ND
Figure 13. Block diagram of the algorithm
The algorithm assumes that it is enough to narrow the frequency range to 0.5–45 Hz in order to analyze the basic brain-wave bands. Of course, it is possible to examine the signal in a broader spectrum, but it does not add any additional information about the condition of the examined person. Moreover, measurements at frequencies close to 50 Hz should be treated carefully due to the presence of the artifact from the electrical network at that frequency [17–20].

After the initial filtering of the signal, a series of seven band-pass filters is applied. The main band of the analyzed signal is divided into smaller bands corresponding to their physiological counterparts (the frequency ranges shown in Table 1). Then, calculation operations (RMS, amplitudes) are performed, as well as a parallel Fourier transform along with the calculation of the partial sum of the amplitudes of a given band. Through these mathematical operations the algorithm provides important information about the individual frequency bands of the signal. These data can be used to build a system supporting the interpretation of the EEG signal itself.

The basic parameter that should be taken into account when interpreting the EEG signal is the so-called theta/beta ratio, which represents the share of slow waves in the work of the brain. For an adult, the factor is 1–2, while for children it is usually 2–3. People who have problems with concentration usually show a value of the discussed coefficient above 3. An important parameter in assessing the correctness of the EEG signal is the content of
the Beta 2 band. This parameter is compared with the content of the SMR and Alpha bands. The Beta 2 band content should be lower than the SMR content; if not, high Beta 2 levels may be due to muscle artifacts (tight neck muscles, etc.). Moreover, Beta 2 often increases its value locally as a result of strongly experienced emotions [1, 13].

Observing the percentage of individual bands may also help in the interpretation of the results, thanks to the physiological consequences of the occurrence of particular bands described in Section 3.2, including:
- a high percentage of gamma waves may indicate that the tested person is moving,
- a high percentage of theta waves may indicate that the examined person is in a state of sleep [1, 13, 14].
This information is not decisive in the process of EEG analysis, but it is a great help because it highlights some deviations from the norm and signals some negative phenomena. Often the signal from individual electrodes is tested to obtain confidence in the results of the EEG signal analysis. This approach is usually more efficient than the simultaneous analysis of the EEG signal from several probes.

4.1. Implementation of the Algorithm in the LabVIEW Environment
The LabVIEW environment was chosen to implement the algorithm. Figure 14 shows the appearance of the front panel of the EEG analyzer. The user has to enter the file path, input the sampling rate, and set how many electrodes are to be taken into account during the signal analysis. To aid in the selection of leads, a map of the person's head was created, showing the position and arrangement of the electrodes. Each time an electrode is selected, the appropriate electrode indicator lights up on the map and thus indicates the region of the brain that is taken into account during the examination. After specifying the electrodes, the user has to input the size of the data packet. It specifies the number of samples analyzed per second and is displayed in the graphs.
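The core of the block diagram, namely the seven band-pass filters plus the RMS, amplitude, and per-band FFT shares, can be outlined in a few lines. This is a sketch under stated assumptions: numpy/scipy rather than LabVIEW, band edges taken from Table 1, and Beta 1 assumed as the beta term of the theta/beta ratio, since the text does not state which sub-band enters it:

```python
import numpy as np
from scipy import signal

FS = 128.0
BANDS = {  # frequency ranges from Table 1 [Hz]
    "delta": (0.5, 4), "theta": (4, 7.5), "alpha": (7.5, 13),
    "smr": (12, 15), "beta1": (12, 30), "beta2": (20, 30), "gamma": (31, 45),
}

def analyze(x):
    """Per-band amplitude, RMS, and share of the 0.5-45 Hz FFT spectrum."""
    freqs = np.fft.rfftfreq(len(x), d=1.0 / FS)
    spectrum = np.abs(np.fft.rfft(x))
    total = spectrum[(freqs >= 0.5) & (freqs <= 45.0)].sum()
    out = {}
    for name, (lo, hi) in BANDS.items():
        sos = signal.butter(4, [lo, hi], btype="bandpass", fs=FS, output="sos")
        xb = signal.sosfiltfilt(sos, x)          # band-limited component
        out[name] = {
            "amp": np.ptp(xb) / 2,               # peak amplitude estimate
            "rms": np.sqrt(np.mean(xb ** 2)),    # compare against Table 1
            "share": spectrum[(freqs >= lo) & (freqs <= hi)].sum() / total,
        }
    return out

res = analyze(np.random.randn(10 * int(FS)))     # placeholder signal
theta_beta = res["theta"]["share"] / res["beta1"]["share"]
```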
Figure 14. The appearance of the front panel of the EEG analyzer
Figure 15. Signals from all electrodes in the time domain and their Fourier transform in the case of person’s state of concentration
4.2. EEG Measurements
The results of the analysis of selected EEG signals are presented in the following sections, which concern the analysis of two signals recorded in the laboratory. The person whose brainwaves were recorded was an adult male (the author of the article). The first subsection (4.2.1) shows the brain waves recorded while solving a crossword puzzle, while the second (4.2.2) shows the EEG signals during everyday activity.

4.2.1. Adult's State of Concentration
Figure 15 shows the obtained graphs of the analyzed signals together with their Fourier transforms. As can be seen in Fig. 15, the time-domain signal waveforms are in the range (−80 µV, 80 µV). The sudden increases in the signal amplitude are related to the presence of muscle artifacts, which in this case means that the examined person made movements with the eyeballs and eyelids. It should be mentioned that the examined person sat on a chair during the examination and was supposed to focus on solving the crossword puzzle while trying not to make any movements with the limbs, torso, or head. In turn, in the frequency domain, the signal has the highest values for low frequencies, up to about 10 Hz. The visible sudden jump in the amplitude recorded at 50 Hz is related to the previously presented "50 Hz" artifact from the alternating-current network.

As can be seen in Fig. 16, the most numerous frequency band is the delta band, with a share of about 33% in the analyzed signal. The theta band also turned out to be numerous in this respect, with a content of about 14%. High values of this indicator for waves with such low frequencies are also due to the factory settings of the EEG adapter (hardware delay). The theta/beta ratio is in the optimal range (1, 2), which means that the subject has no problems with concentration. In turn, the relatively high contents of Beta 1 and Alpha waves, around 11%, testify to the very fast work of the brain, which is dominant when solving intellectual problems. The content of the Beta 1 wave is the lowest among the entire Beta band, which is a normal phenomenon.
Figure 16. Visualization of the results by the EEG analyser in the case of person’s state of concentration
Figure 17. Signals from all electrodes in the time domain and their Fourier transform in the case of person's movement

As can be seen in Fig. 16, the brain waves achieve very low root-mean-square values (the values on the right in Fig. 16), which are not consistent with those shown in Table 1. It is for this reason that none of the bands has met all the amplitude limits (left LED in Fig. 16).

4.2.2. Movement of an Adult
Figure 17 shows the obtained graphs of the analyzed signal together with their Fourier transforms. As can be seen in Fig. 17, the time-domain signal is much more complex than that shown in Fig. 15 in the previous subsection. In this case, the examined person made a large number of different movements, from eye movements and eyelid blinking to more complex hand or head movements. It should be mentioned that the examined person sat on a chair during the examination; the sensitivity of the equipment and the relatively short wiring made it impossible to move the whole body. All the mentioned artifacts contribute to the signal irregularity and large amplitude values in the time domain, with a maximum of about 200 µV. In the frequency domain, on the other hand, most of the signal lies in the low-frequency range, up to about 10 Hz. The "50 Hz" artifact is hardly noticeable here due to the domination of other artifacts in the signal under consideration.

As can be seen in Fig. 18, the most numerous frequency band is the delta band, which accounts for approximately 47% of the analyzed signal.
Figure 18. Visualization of the results by the EEG analyzer in the case of person's movement

This is the result of numerous artifacts overlapping the signal. Some of the artifacts, such as blinking and eye movements, cause sudden spikes in the signal amplitude and, through their almost immediate action, introduce additional high frequencies into the signal spectrum. Some other artifacts, mainly related to movements of the limbs and head, grow slowly, coexisting with the movements of the body and head. As already mentioned, the movements of the examined person during the recording of the signal could not be too dynamic, mainly due to the sensitivity of the equipment used to record the EEG signal. It is for this reason that the discussed movements were slow, which was expressed in the form of such a large percentage share of the Fourier-transform amplitude in the entire EEG signal.

The relatively low content of alpha waves and of the entire beta band results from the dominant influence of artifacts on the EEG signal. Gamma waves at a relatively high level of 12% indicate the presence of the motion artifacts. The theta/beta ratio is around 2.14; it is not in the optimal range (1, 2), but only slightly beyond its limit. As can be seen in Fig. 18, all brain waves except for the delta wave reach very low root-mean-square values (the values on the right side of Fig. 18), which are not consistent with those presented previously in Table 1.

4.3. Discussion
In order to present the results of the EEG analyzer's performance, two characteristic measurements of the EEG signal were selected: the state of concentration and movement. In all cases, the measurements were taken from an adult. While the discussed cases were quite easy to register, the registration of muscle artifacts (presented in Section 3.3) is not an easy activity.

The created analyzer enables a more accurate assessment of the tested person's condition than
the TruScan software provided by the manufacturer and used to acquire the measurements. The program created in the LabVIEW package allows one to view the results of the measurements in real time. In addition, the user can observe a series of waveforms and the Fourier transforms of the signal (before and after the pre-filtering process). It also has the ability to filter the signal in a user-specified band. A number of indicators provide ongoing information about the parameters of the EEG signal.

Based on the presented results, the created analyzer works correctly; however, the limited number of recorded EEG signals and the lack of information on the direct influence of age on the parameters of brain waves meant that some of the analyzer's operating parameters were adopted intuitively. This can lead to an inaccurate mapping of the results of the signal analysis when, for example, the operator is an elderly or very young person.
5. Conclusion

In order to present how the implemented EEG signal analyzer works, two characteristic measurements of the EEG signal of an adult person were selected: the state of concentration and the daily activity of the brain. Based on the results of the implemented solution, it should be emphasized that in each of the cases it was able to accurately determine the condition of the examined person. On this basis, it was assumed that the developed methodology for conducting the analysis and the adopted algorithm are correct.

The implemented solution enables an accurate assessment of the condition of the examined person and could be used to study the concentration of machine operators. The user can preview the results of the analysis and can observe the Fourier transform of the signal. The solution also has the ability to filter the signal in a user-specified band. A number of indicators provide ongoing information about the parameters of the brainwave signal.

The algorithm has several "rigidly" adopted parameters in the analysis of brain waves, and they are not met for every person, which can lead to errors in the results. Attempting to change the current algorithm to one using fuzzy logic would provide an opportunity to develop this work in the future. In the signal analysis process itself, there is also a wide range of tools that could improve the properties of the implemented algorithm, such as wavelet analysis and neural networks.
AUTHOR Łukasz Rykała∗ – Institute of Robots and Machine Design, Faculty of Mechanical Engineering, Military University of Technology, gen. Sylwestra Kaliskiego 2, 00‐908, Warsaw, Poland, e‐mail: lukasz.rykala@wat.edu.pl. ∗
Corresponding author
References
[1] Hoerth M. Rowan's Primer of EEG, Second Edition. Journal of Clinical Neurophysiology. 2018:1.
[2] P. Augustyniak. Przetwarzanie sygnałów elektrodiagnostycznych. AGH. 2001.
[3] M. Kołodziej, A. Majkowski, R. Rak. Interfejs mózg-komputer – wybrane problemy rejestracji i analizy sygnału EEG. Przegląd Elektrotechniczny. 2009.
[4] R. Rak, M. Kołodziej, A. Majkowski. Metrologia w Medycynie, Interfejs-mózg-komputer. WAT. 2011.
[5] Y. Zhang, M. Zhang, Q. Fang. "Scoping Review of EEG Studies in Construction Safety." International Journal of Environmental Research and Public Health. 2019;16(21):4146, doi: 10.3390/ijerph16214146.
[6] P. Li, R. Meziane, M. Otis, H. Ezzaidi, P. Cardou. A Smart Safety Helmet using IMU and EEG sensors for worker fatigue detection. 2014 IEEE International Symposium on Robotic and Sensors Environments (ROSE) Proceedings. 2014, IEEE.
[7] H. Jebelli, S. Hwang, S. Lee. "EEG-based workers' stress recognition at construction sites." Automation in Construction. 2018;93:315–324, doi: 10.1016/j.autcon.2018.05.027.
[8] S. Hwang, H. Jebelli, B. Choi, M. Choi, S. Lee. "Measuring workers' emotional state during construction tasks using wearable EEG." Journal of Construction Engineering and Management. 2018;144(7):04018050, doi: 10.1061/(ASCE)CO.1943-7862.0001506.
[9] S. Saedi, A. Fini, M. Khanzadi, J. Wong, M. Sheikhkhoshkar, M. Banaei. "Applications of electroencephalography in construction." Automation in Construction. 2022;133:103985, doi: 10.1016/j.autcon.2021.103985.
[10] G. N. Ranky, S. Adamovich. Analysis of a commercial EEG device for the control of a robot arm. Proceedings of the 2010 IEEE 36th Annual Northeast Bioengineering Conference (NEBEC). 2010, IEEE, doi: 10.1109/NEBC.2010.5458188.
[11] Y. Li, G. Zhou, D. Graham, A. Holtzhauer. "Towards an EEG-based brain-computer interface for online robot control." Multimed. Tools Appl. 2016;75:7999–8017, doi: 10.1007/s11042-015-2717-z.
[12] X. Gu, Z. Cao, A. Jolfaei, P. Xu, D. Wu, T. Jung, C. Lin. EEG-based Brain-Computer Interfaces (BCIs): A Survey of Recent Studies on Signal Sensing Technologies and Computational Intelligence Approaches and their Applications. arXiv 2020, doi: 10.48550/arXiv.2001.11337.
[13] P. Abhang. Introduction to EEG- and Speech-Based Emotion Recognition. Elsevier Science; 2016.
[14] W. Tatum. Handbook of EEG Interpretation. Demos Medical; 2014.
[15] M. Soufineyestani, D. Dowling, A. Khan. "Electroencephalography (EEG) Technology Applications and Available Devices." Applied Sciences. 2020;10(21):7453, doi: 10.3390/app10217453.
[16] DEYMED: https://deymed.com/truscan-eeg (access 22.06.2022).
[17] R. Lyons. Understanding Digital Signal Processing. Upper Saddle River, N.J.: Prentice Hall; 2011.
[18] R. Typiak, Ł. Rykała, A. Typiak. "Configuring a UWB Based Location System for a UGV Operating in a Follow-Me Scenario." Energies. 2021;14(17):5517, doi: 10.3390/en14175517.
[19] M. Owen. Practical Signal Processing. Cambridge: Cambridge University Press; 2012.
[20] T. Holton. Digital Signal Processing: Principles and Applications. Cambridge: Cambridge University Press; 2021.
AUTOMATED ANONYMIZATION OF SENSITIVE DATA ON PRODUCTION UNIT Submitted: 11th January 2022; accepted: 7th September 2022
Marcin Kujawa, Robert Piotrowski

DOI: 10.14313/JAMRIS/1-2023/5

Abstract: The article presents an approach to data anonymization with the use of generally available tools. The focus is put on the practical aspects of using open-source tools in conjunction with programming libraries provided by suppliers of industrial control systems. This universal approach shows the possibilities of using various operating systems as a platform for process data anonymization. An additional advantage of the described approach is the ease of integration with various types of advanced data analysis tools, based both on the out-of-the-box approach (e.g., business intelligence tools) as well as on customized solutions. The discussed case describes the anonymization of data for the needs of sensitive analysis by a wider group of recipients during the construction of a predictive model used to support decisions.

Keywords: Data anonymization, Sensitive data, Open-source tools, Industrial data processing, Historian data anonymization, Honeywell DCS, IT/OT integration, Operational technology
1. Introduction

In recent years, cybersecurity has been gaining increasing importance, both in the everyday life of an average Facebook user and, if not mainly, in the industrial world. Companies, often standing at the forefront of the Industry 4.0 revolution and handling large volumes of data, must ensure that data is adequately protected against unauthorized access or analysis.

Research activities related to data anonymization nowadays have a big impact on Industry 4.0. Previous studies reported various structures and technologies of data anonymization [1, 3], such as encryption [2], continued data delivery [4], and adaptive predictive models [5]. Various techniques of data protection, such as disturbance, anonymization, and cryptography, are described in the literature [6]; however, these present a different approach from the one presented in this paper. Samad et al. [7] also describe the use of anonymized data in the context of artificial intelligence and machine learning methods.

This article presents an approach to data anonymization with the use of publicly available tools and physical equipment, using the example of process data from a petrochemical plant's production installation. In addition to legal safeguards related to
data protection, it is worth spending some time on maintaining a competitive advantage. In this case, the focus is on cryptographic techniques that hinder unauthorized data processing. When working on a project with sensitive data, it is often necessary to transfer data to external entities, which results in a specific approach to the data. The aim of the discussed case is the sharing of data with a research university by a production facility critical on a national scale.

The analysis of various types of solutions was determined by reservations as to the completeness of the security of data collected and secured with the use of services accessible on the market. When cloud services are used, for example, one of the platform's managed disks handles encryption and decryption in a fully transparent manner, using envelope encryption: data are encrypted with a data encryption key, which is in turn protected by other keys. An unauthorized user is not permitted any access to the data, while authorized users are permitted full access. However, it is not possible to verify whether access to the data is in fact limited, and whether the data is stored in open form or encrypted.

The remainder of this paper is organized as follows. The method of downloading raw data is described in Section 2. The process of data anonymization is presented in Section 3. The results of the analysis are illustrated in Section 4. Concluding remarks are listed in the last section.
2. Technical Aspects

Data are collected from an industrial facility using the available information technology (IT) systems included in the IT/operational technology (OT) layered model shown in Figure 1. It is a general structural outline of the Honeywell Distributed Control System (DCS) on which the analyses were performed.

The value measured by the analog/digital (A/D) converter (see Level 0) is processed by the DCS controllers (see Level 1). Depending on the system configuration, it can be stored by the system server for the purposes of visualization and ongoing data analysis at operator stations (see Level 2). If long-term data archiving is necessary from the company's point of view, it is necessary to implement the so-called
2023 © Kujawa and Piotrowski. This is an open access article licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) (http://creativecommons.org/licenses/by-nc-nd/4.0/)
"historian" in Level 3. For the purposes of artificial intelligence (AI) and machine learning (ML), significant amounts of data should be processed. The exact amount depends on the methods used. The data scientist can operate on data at Level 4, in the IT environment.

Figure 1. Industrial network layers
3. Data Anonymization
The article presents one of the possible approaches to data anonymization. In the following sections, the tools necessary to download and process data, as well as process automation, will be discussed.

3.1. Environment

To properly approach the topic of collecting, anonymizing, and processing data for the purposes of building models of the analyzed objects, many tools need to be applied. The main ones are:
- Visual Studio 2019: the C# development environment;
- PyCharm: the Python development environment, used to generate the Python code from which the publicly available libraries for data encryption come, which is described in more detail in Section 3.4;
- the application integration library for data historization from the historian application supplier;
- Anaconda: an environment for running Python applications on the Windows operating system.

Linux- or Mac-based programs can be used without installing any additional software, if the company policy allows it.

3.2. Data Pre-processing

Regardless of the vendor and architecture of the system from which we obtain the data, special attention should be paid to the appropriate preparation of data in several aspects:
- data ownership,
- data security,
- data processing.

To obtain data in a proper way, it is necessary to regulate the approach to data ownership. In large companies, there are often internal procedures regulating legal aspects depending on the type of data, for example, sensitive data or personal data. Moreover, to make sure that the approach to sharing data is correct, it is worth consulting the SOC department, if the company has one. The Security Operations Center (SOC) opinion is also important when it comes to data classification and possible restrictions related to the General Data Protection Regulation (GDPR).

The production plant from which data have been collected is based mainly on Honeywell production solutions; therefore, process data have been collected via a historian (Honeywell Uniformance Process History Database, a non-SQL database). To download data, a dedicated library has been used which enables the use of the C# language to create an application that aggregates data in an appropriate manner. In this way, the obtained data in an open form are transferred for encryption. It was decided not to use the available programming tools in the .NET environment for data anonymization due to the gradual expansion of the system towards a decision support system and direct integration with the DCS. To maintain consistency in the target solution, data anonymization was performed using Python.

3.3. Data Encryption Algorithm and Use of Open-Source Libraries
In the discussed case, publicly available Werkzeug [8, 9] libraries were applied as starting elements. These libraries are primarily used for user authentication in web applications created in Python. The unconventional use of these libraries allows one to obtain code ensuring data anonymity by using cryptographic techniques in a quick and transparent manner. The diagram of data handling is presented in Figure 2. The applied algorithm guarantees checking the system status, which includes:
- correct communication between the application server and the historian server;
- reliability of the data determined by the historian.

In the first step, starting from the block described as "start," we download data from the historian (PHD – Process History Database). In the next step, the system status is checked for the correctness of the system connections, and when it is correct, we can proceed to the next steps leading to saving anonymized data based on the downloaded data. The data confidence level should then be checked as an internal quality parameter of the PHD system. If the collected data has sufficiently high confidence, we can proceed to the steps characteristic of proper data anonymization. In the next steps, we collect the hardware key used in the code and the process point name.
Figure 3. Diagram of the data anonymization and decision support system
Figure 2. Overall data processing schema

From these, the anonymized point name is generated (in the first run and in each subsequent run). The last step is to encrypt the file content and save it in an anonymous form. Then we wait for the next iteration and start the process anew. Such anonymization of data ensures two-dimensional verification as part of reading anonymous data. It is necessary to know the hardware key for the processed data and to have a list of points, because the generated variable name cannot be processed backwards to the input string. It is only possible to determine whether the anonymous variable string comes from a given input string.

3.4. Automatization of Data Anonymization

The main advantage of automating data encryption is reducing the amount of data processed at a time, which allows using anonymization on an ongoing basis. The use of the presented algorithm allows for downloading current data and updating the output files, despite the variable file name at each application run. Changing the file name with each access, which consists in adding successive lines containing encrypted measurement data, additionally increases data security.

The described operations take place on the application server presented in Figure 3. In the OT section, you can see three elements: the operator station, the DCS server, and the PHD server. Data is downloaded via the PHD server, that is, the server responsible for long-term archiving of process data. This approach guarantees additional separation of the external system from critical systems such as distributed control systems.
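As an illustration of the point-name anonymization and one-way verification described in Section 3.3, a minimal sketch using the Werkzeug salted-hash helpers [8, 9] is given below; the hardware key and point names are illustrative placeholders, and the exact call parameters are assumptions rather than the authors' production code:

```python
# Sketch only: anonymize historian point names so they cannot be reversed,
# while still allowing verification against a known list of points.
# HARDWARE_KEY and the point list are illustrative placeholders.
from werkzeug.security import generate_password_hash, check_password_hash

HARDWARE_KEY = "example-hardware-key"   # in practice, read from the device

points = ["001TI_001A", "001TI_001B", "001TI_002"]

def anonymize(point_name: str) -> str:
    # Salted one-way hash of key + point name; cannot be inverted.
    return generate_password_hash(HARDWARE_KEY + point_name)

def matches(anon_name: str, candidate: str) -> bool:
    # It is only possible to check whether an anonymous string
    # comes from a given input string, as described in Section 3.3.
    return check_password_hash(anon_name, HARDWARE_KEY + candidate)

anon = {p: anonymize(p) for p in points}
assert matches(anon["001TI_002"], "001TI_002")
assert not matches(anon["001TI_002"], "001TI_003")
```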
An additional isolating element is a centrally located firewall in the DMZ zone separating IT from OT. Data separated in this way is downloaded by the application server (APP Server), whose purpose is to continuously download the current data, subject it to anonymization, and save it to flat files to make it available to external entities. Another function of the application server is the use of data to supply predictive models, where the prediction result is saved in a database (DB). To build a universal visualization layer, a visualization of data previously read from the database was implemented on the prepared server hosting the web applications. An additional advantage is the possibility of using the same server to service the operator station, that is, the client on the OT side.

3.5. Data Provided for the Experiment

In production installations, each of the measurements is defined in an unambiguous way that allows its identification on the scale of the entire production plant, together with the type of measured physical quantity. During the experiment, the focus was on representative temperature measurements. These measurements have been renamed for the article. In the naming convention used, the first three digits symbolize the number of the production installation, the letter in the fourth position symbolizes the measured physical quantity (T – temperature, P – pressure, F – flow), the letter in the fifth position indicates the DCS point type (I – indication, C – controller), and the characters after the underscore form a unique point number within the production facility. Table 1 presents the values of the data increase relative to the original files for representative measurements. All discussed measurements saved in flat files represent the same time range (they have the same number of lines). The initial volume of the files may differ due to different measuring points and the different scale of measured values. Based on Table 1, a non-linear data increase rate can be noticed.

3.6. Changing the Length of the Encryption Key

Changing the file content encryption type may affect the smooth operation of the system and its resource capacity. Figure 4 shows the time needed to encrypt the content of representative files in relation to various encryption algorithms (Secure Hash Algorithm – SHA).
Table 1. File attributes

Name before anonymization | Name after anonymization | Number of lines | File size before anonymization [KB] | File size after anonymization [KB] | Size increase [%]
001TI_001A | qel6sW7q$027183e8f1289e8d31bd53262ddc4be8418a013517d83507c09832b8e10c35e5 | 8126 | 299 | 992 | 331.78
001TI_001B | HbWQjw9W$134b5e2b3417d529bf40c392722c08c7e970b784b64ad4380ef5479fc1e0334 | 8126 | 283 | 992 | 350.53
001TI_002 | YqWNmnUI$4d76684336497c7eaffaa70dcdecb548d4e69274a1493010cf0d4e2ef056c889 | 8126 | 275 | 992 | 360.73
001TI_003 | E50OuzmP$e55ad3917b6586527c2e143e67fcc183e39c9057023d0384b65ce5d7550a8f5b | 8126 | 275 | 992 | 360.73
001TI_004 | SjYI5v72$b7864a9b8d42f9365ba99a057e6bb14da13e25735dedcebe29792051c3c0a31b | 8126 | 275 | 992 | 360.73
001TI_005 | 6tm5q8Wo$de72d08788095917f16f95cac3e6010512c1692993578db9dde91e1ff98a5bf1 | 8126 | 275 | 992 | 360.73
001TI_006 | G7mTL5ZW$7fe898c919e78e47e195ab9f42d27c2bdddb0ab1b941f0df968ce2437261b6b7 | 8126 | 275 | 992 | 360.73

Table 2. SHA algorithms

Variant | Algorithm | Output size [bits] | Internal state size [bits] | Block size [bits]
SHA 256 | SHA 2 | 256 | 256 | 512
SHA 1 | SHA 1 | 160 | 160 | 512
SHA 224 | SHA 2 | 224 | 256 | 512
SHA 384 | SHA 2 | 384 | 512 | 1024
SHA 512 | SHA 2 | 512 | 512 | 1024
SHA 3-224 | SHA 3 | 224 | 512 | 1152
SHA 3-256 | SHA 3 | 256 | 512 | 1088
SHA 3-384 | SHA 3 | 384 | 1024 | 832
SHA 3-512 | SHA 3 | 512 | 1024 | 576
SHA 512-224 | SHA 2 | 224 | 512 | 1024
SHA 512-256 | SHA 2 | 256 | 512 | 1024

Figure 4. Change of the type of data encryption inside a batch file

The influence of the encryption algorithm on the time of the anonymization operations is presented in Table 2. By introducing different encryption methods into the algorithm, we obtain different execution times, which has a direct impact on file processing time. An interesting phenomenon is the different effectiveness of the encryption algorithms depending on the input data, which can be seen in Figure 4.
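A comparison like the one in Figure 4 can be reproduced with Python's standard hashlib module, which provides the SHA variants listed in Table 2 (strictly speaking as one-way hashes, as they are used here); the file path and chunk size below are illustrative assumptions:

```python
# Sketch: measure per-algorithm hashing time for a flat file,
# analogous to the comparison in Figure 4; the path is a placeholder.
import hashlib
import time

ALGORITHMS = ["sha1", "sha224", "sha256", "sha384", "sha512",
              "sha3_224", "sha3_256", "sha3_384", "sha3_512"]

def time_hash(path: str, algorithm: str) -> float:
    h = hashlib.new(algorithm)
    start = time.perf_counter()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)          # hash the file contents chunk by chunk
    h.hexdigest()
    return time.perf_counter() - start

for alg in ALGORITHMS:
    print(alg, time_hash("001TI_002.txt", alg))
```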
4. Protection of Access

The data encryption itself can be treated as one of the layers of data access security. However, in many cases it is worth using multi-factor authentication based on software and hardware security. For this reason, data is stored on an encrypted drive, in accordance with corporate policy. Moreover, possible access to a physically encrypted data storage device is granted by a REST API using authentication methods based on a registered user and a variable token with a defined validity, providing double authentication.
5. Conclusion
When anonymizing data, attention should be paid to the selection of tools appropriate for the current needs and the data being processed. Care should be taken that the methodologies used are not redundant in relation to the risks they are intended to counteract. This approach allows one to choose a solution that is optimal in financial terms as well as acceptable from the side of security control. When selecting the appropriate encryption algorithm, pay attention to the following factors:
- Regardless of the scientific aspect, the guidelines of the internal security services play a decisive role in choosing the method of securing sensitive data in enterprises of critical infrastructure. Therefore, the development team and the security team should work closely together.
- Costs related to the consumption of computing resources – despite the relatively efficient work of the algorithm on data processed incrementally (continuously),
when we process large volumes of data, initial costs may accumulate.
- Costs related to the storage and transfer of data should be determined, regardless of whether the model is cloud-based or based on local infrastructure.

Current IT solutions allow the use of the scalability of systems and applications to optimize costs. Moreover, thanks to the possibility of a wide use of open-source tools and applications, we can use resources more effectively thanks to the experience of the community. In this case, the data encryption itself should be seen only as one of the security layers, because the IT security department will have the decisive opinion, and the discussed algorithms can only be a proposal to supplement one of the data security layers.

AUTHORS

Marcin Kujawa – Gdańsk University of Technology, Faculty of Electrical and Control Engineering, Poland, e-mail: marcin.kujawa2@pg.edu.pl.

Robert Piotrowski∗ – Gdańsk University of Technology, Faculty of Electrical and Control Engineering, Poland, e-mail: robert.piotrowski@pg.edu.pl.

∗
Corresponding author
ACKNOWLEDGEMENTS

This work was supported by the Ministry of Science and Higher Education (now: Ministry of Education and Science) under the "Implementing Doctorate" programme, No. DWD/3/41/2019. The authors wish to express their thanks for the support.
References

[1] Sánchez, D., and J. Soria-Comas. Automatic Anonymization of Textual Documents: Detecting Sensitive Information via Word Embeddings. New Zealand, 2019.
[2] Nabywaniec, D. Anonymisation and Masking of Sensitive Data in Companies. ISBN: 978-83-283-5681-8 (in Polish), 2019.
[3] Devaux, E. How "anonymous" is anonymized data? https://medium.com/statice/how-anonymous-is-anonymous-c92ad265a3e3 (accessed 2021-11-20).
[4] Dholakia, J. Building Python APIs with Flask, Flask-RESTPlus and Swagger UI. https://medium.com/analytics-vidhya/swagger-ui-dashboard-with-flask-restplus-api-7461b3a9a2c8 (accessed 2021-11-20).
[5] Grus, J. Data Science from Scratch: First Principles with Python. ISBN: 978-83-283-4603-1, 2018.
[6] Murthy, S., A. Abu Bakar, F. Abdul Rahim, and R. Ramli. A Comparative Study of Data Anonymization Techniques. 2019 IEEE 5th International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC) and IEEE International Conference on Intelligent Data and Security (IDS), 2019, pp. 306–309, doi: 10.1109/BigDataSecurity-HPSC-IDS.2019.00063.
[7] Samad, A.A., M.M. Arshad, and M.M. Siraj. Towards Enhancement of Privacy-Preserving Data Mining Model for Predicting Students' Learning Outcomes Performance. 2021 IEEE International Conference on Computing (ICOCO), 2021, pp. 13–18, doi: 10.1109/ICOCO53166.2021.9673544.
[8] Kenedy, P. What is Werkzeug? https://testdriven.io/blog/what-is-werkzeug/ (accessed 2021-11-20).
[9] Grinberg, M. Flask Web Development: Developing Web Applications with Python, 2nd edition. ISBN: 978-83-283-6384-7, 2020.
HYBRID ADAPTIVE BEAMFORMING APPROACH FOR ANTENNA ARRAY FED PARABOLIC REFLECTOR FOR C‐BAND APPLICATIONS Submitted: 5th September 2022; accepted: 29th September 2022
Sheetal Bawane, Debendra Kumar Panda

DOI: 10.14313/JAMRIS/1-2023/6

Abstract: This paper presents the design of a parabolic reflector fed through a patch antenna array to enhance its directivity and radiation properties. Adaptive beamformers steer and alter an array's beam pattern to increase signal reception and minimize interference. Weight selection is a critical difficulty in achieving a low side lobe level (SLL) and a narrow beam width. A low SLL and a narrow beam limit unwanted antenna radiation and reception. Adjusting the weights reduces the SLL and tilts the nulls. Adaptive beamformers are successful signal processors if their array output converges to the required signal. Smart antenna weights can be determined using any window function. Half Power Beam Width and SLL can be used to compare different algorithms; both must be low for excellent smart antenna performance. In noisy settings, ACLMS and CLMS create narrow beams and side lobes. AANGD offers more control than CLMS and ACLMS. The blend of CLMS and ACLMS is more effective at signal convergence than CLMS and AANGD. The design presents an alternative to the conventionally used horn-based feed network for C-band applications such as satellite communication. Broadside radiation patterns and a 4x4 circular patch antenna array are used in the proposed design. An aperture illumination of 140° is provided by the array-fed parabolic reflector, whose F/D ratio is 0.36. The proposed design's efficacy is assessed using simulation analysis.

Keywords: microstrip patch, adaptive antennas, parabolic reflector, beamforming, antenna arrays, smart antenna
1. Introduction

Long-range signal transmission often makes use of reflector-equipped antennas in satellite communication. The feeding antenna and the parabolic reflector must be calibrated to ensure the best possible performance of this combination. Material losses, efficiency, and the separation between the feeder and the reflector all have an impact on antenna performance and gain for a C-band satellite ground station. The wireless communication paradigm has completely transformed as a result of the rapid proliferation of cellular technology over the past decade. Enhancements to electrical circuits and gadgets have led to a rise in the ability to store, analyze, and transmit massive amounts of data. The size, cost, power, and speed of electronic systems have all taken
on new dimensions as a result of large-scale integration. All engineering fields have effectively exploited these benefits to achieve new levels of development and performance [1].

Long-distance communication systems, such as satellite communication, radar, and remote sensing, commonly use reflector-equipped antennas to achieve high gain over the required distances. With a reflector, a high-gain antenna consumes less power, has low cross-polarization and a low voltage standing wave ratio, uses lightweight carbon fiber material, and allows economical fabrication using cheap components [2]. An antenna with a reflector is ideal for transmitting and receiving signals at a satellite system's ground station. Spherical, hyperbolic, parabolic, and cylindrical reflectors are all options for use with a feeder antenna.

Horn antennas, log-periodic dipole arrays, and spiral antennas are the most popular feeding antennas [3–5]. Most commonly, the horn antenna is used for long-distance point-to-point transmission. Several advantages of microstrip antennas have led to their consideration as feed antennas. Because of their thin design and ease of use, they are well suited for use in a wide range of electronic systems. It is also straightforward to achieve the appropriate beam shape and high gain by grouping them in an array [6].

In order to achieve high-speed wireless communication, high spectral efficiency, and large capacity, antenna design is a critical issue to resolve. The strong spectral features of the parabolic reflector antenna make it a crucial component in high-frequency transmission. Because of its parabolic shape, the antenna is able to produce a narrower radiation beam with greater signal precision. A point-to-point communication system can benefit greatly from such features. The simplicity, high gain, and directivity of the parabolic reflector make it the best option for transmitting radio waves over the air. Because of their high directivity, horn antennas have long been utilized as the feed network for parabolic reflectors in demanding applications like satellite communication. Because of the size and weight of the necessary components, satellite communication has proven difficult. Electrical distribution to the antenna components, on the other hand, has proven to be a major challenge.

The intelligent adaptive antenna system made the most of a favorable situation to improve the performance of the antenna array by adjusting the power
2023 © Bawane and Panda. This is an open access article licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) (http://creativecommons.org/licenses/by-nc-nd/4.0/)
distribution to the individual antenna elements. It is possible to steadily alter the antenna's mass characteristics to improve spectrum efficiency. An antenna array system's radiation pattern can be reshaped using the beamforming approach, which involves altering the weights in the spatial domain. Setting the weights over time ensures that the required signals are segregated from the interfering signal while maximizing the SNR and the array output.

A microstrip array antenna feeder and a parabolic reflector antenna are used in this work to study the ground station performance of a C-band satellite system. The antenna in question is a 4x4 microstrip array antenna for C-band satellite applications. Theoretical calculations and simulations utilizing the reflector and a microstrip antenna determined the gain and directivity parameters of the antenna system. Material losses, efficiency, focal length, and other aspects must be taken into account in order to get the best results. In order to improve the adaptive beamforming technique's performance over the patch array feed, the antenna elements' power distribution was modified. The array feed reflector's mathematical structure represents the most significant contribution of the proposed design.

A Complex-Valued Neural Network (CVNN) uses complex input signals, threshold values, weights, and signal functions. Such models are needed for signal processing: complex-valued signals require specific complex-valued neural processing models, and complex neural models can process linearly complex smart antenna signals. Smart antennas analyze signals from multiple sources and interferences to establish the array's principal beam direction. In this scenario, the signal's direction of arrival must be determined. The Side Lobe Level (SLL) and Half Power Beam Width (HPBW) must be low to avoid interference.

CLMS and AANGD are complex-valued neural networks studied for use with Smart Antenna System signals. Widrow and Hoff's 1959 Least Mean Square (LMS) approach determines the gradient vector. This algorithm's iterative technique reaches the MMSE; however, it cannot handle complex data with noise. According to one study, LMS's delayed convergence when the eigenvalues are widely dispersed depends on the eigenstructure. When the covariance matrix eigenvalues do not match, the LMS is data-dependent and takes a long time to stabilize. CLMS and ACLMS were used to handle complex data in a noise-free environment. CLMS outperforms ACLMS in signal convergence, but ACLMS outperforms it in HPBW and SLL applications. ACLMS performed better in terms of HPBW and SLL in a noisy real-time setting with a short step size. Therefore, ACLMS is best for noisy environments and CLMS for calm ones. The CLMS and ACLMS algorithms adapt HPBW and SLL. When the step size and N vary, the parameters change, making it hard to predict a value. AANGD adds two additional model parameters: the initial nonlinearity and an adaptive amplitude step size [1, 2], from which the required values for regulating HPBW and SLL are obtained. Since each algorithm has advantages and disadvantages when
Figure 1. Patch antenna array feed 4x4 (top view)
it comes to the adaptive beam shaping of signals in smart antennas, CLMS and AANGD have been merged to create the hybrid model in this paper.

The rest of the paper is structured as follows: in Section 2, the mathematical design of a patch antenna array and a parabolic reflector is covered. In Section 3, an adaptive method based on a hybrid smart antenna is developed for use with a patch antenna array feed reflector. In Section 4, we present a simulation analysis of the proposed method to assess its viability, and in Section 5, we conclude the paper.
2. System Preliminaries

2.1. Patch Antenna Array Feed

The horn antenna has long served as the primary source of power for the parabolic reflector it feeds. Due to the horn antenna's size and weight, deploying the feed network in applications requiring exact dimensioning is difficult. To tackle this problem, a patch antenna feed network may be able to balance illumination loss and spillover loss. The edge of the reflector dish is often reduced by 10 dB in order to achieve this trade-off. It is essential that the radiation pattern is on par with or better than standard feed networks, with a beamwidth of up to −10 dB. For C-band satellite applications, a 4x4 microstrip array antenna is discussed in this article [10]. Figure 1 shows the three layers of the 4x4 microstrip patch antenna's structure. The 4x4 patches produce radiation from the top layer. The proximity coupling supplying the microstrip line is found in the middle layer. The ground plane, as the name suggests, is the lowest stratum. The FR-4 substrate used in the antenna design has a permittivity of 4.3 and a thickness of 1.6 mm. Overall, the antenna is 170 × 170 mm. According to the characterization data, the 4x4 microstrip array antenna has a gain of 13.7 dB at 4.148 GHz and a bandwidth of 734 MHz. The top view of the 4x4 microstrip array antenna is shown in Figure 1. The patch configuration was deemed to be the optimum design for the desired requirements after parameterization; Figure 1 depicts the ideal setup.
2.2. Parabolic Reflector

Figure 2. Basic geometry of parabolic antenna

In order to combine the two basic components, a feeder antenna is typically placed before a parabolic reflector. It is important to consider the reflector shape, the reflector angle, and the feed placement in relation to the reflector diameter (the F/D ratio). To achieve high gain and low cross-polarization ratios, the antenna radiation dispersion and its projection on the reflector surface are critical. The use of a parabolic reflector in front of a feed antenna is shown in Figure 3. It is crucial to take into consideration the reflector's diameter, focal length, and parabolic depth in the construction of the reflector itself. A variety of theoretical parabolic antenna approaches are examined in this work in order to make predictions about the gain potential of different sources. The F/D ratio of the parabolic reflector used in this investigation was a deciding factor in its selection. Since 0.36 is a frequent ratio for industrial microwave applications [11], this value has been used. Figure 2 shows a common parabolic reflector's geometric design. It can be deduced from the geometric calculations that the focal distance $F$ of the parabolic reflector can be represented in terms of the diameter $D$ and depth $C$ of the parabola as

$$F = \frac{D^2}{16C} \tag{1}$$

Considering $\tan\theta = \dfrac{D/2}{F - C}$, the aperture illumination can also be written as

$$\text{Aperture Illumination} = 2\tan^{-1}\left[\frac{8(F/D)}{16(F/D)^2 - 1}\right] \cong 140^{\circ} \tag{2}$$

(for $F/D = 0.36$, this gives $2\tan^{-1}(2.88/1.0736) \approx 139^{\circ}$, i.e., approximately $140^{\circ}$).

2.3. Complex Least Mean Square Algorithm (CLMS)

Taking advantage of the LMS's stability and robustness, Widrow et al. introduced the CLMS method [9, 10] in 1975, which can process several complex signals at the same time. It improves complex data modelling and obtains usable findings using stochastic gradient descent in complex-domain statistics. In [2], the CLMS algorithm is explained in detail. The weight update and output of the CLMS algorithm are as follows:

1) The weight vector's stochastic gradient adaptation is given by

$$w(n+1) = w(n) + \mu x(n) e^{*}(n), \quad w(0) = 0 \tag{3}$$

2) The CLMS algorithm's output is calculated as

$$y(n) = x^{H}(n) w(n) \tag{4}$$

2.4. Adaptive Amplitude Nonlinear Gradient Descent Algorithm (AANGD)

Based on the standard weight update, [9–11] present the adaptive learning rates $\alpha(n)$ for the CLMS, normalized CLMS (CNLMS), and normalized adaptive nonlinear gradient descent (ANGD) algorithms:

$$\alpha(n) = \begin{cases} \mu & \text{for CLMS} \\ \dfrac{\mu}{|x(n)|_2^2 + \varepsilon} & \text{for CNLMS} \\ \dfrac{\mu}{|\phi'(n)|^2 |x(n)|_2^2 + \varepsilon} & \text{for ANGD} \end{cases} \tag{5}$$

The regularization parameter $\varepsilon$ is used to prevent divergence for inputs close to zero. Derivative estimates of the cost $J(k)$ are used in the ANGD's updates, so that the step size (amplification factor) of the weight update gradient can be adjusted [11, 12]. The fact that these algorithms are based on two interconnected unconstrained optimization techniques affects their resilience to the initial values of the parameters (the weights and the step size). The core of the AANGD algorithm is as follows:

$$e(n) = d(n) - \phi\!\left(x^{T}(n) w(n)\right) \tag{6}$$

$$\bar{\phi}\!\left(x^{T}(n) w(n)\right) = \lambda(n)\, \phi\!\left(x^{T}(n) w(n)\right) \tag{7}$$

$$w(n+1) = w(n) + \alpha(n)\, e(n)\, \phi'^{*}\!\left(x^{T}(n) w(n)\right) x^{*}(n) \tag{8}$$

$$\alpha(n) = \frac{\mu}{\left|\phi'\!\left(x^{T}(n) w(n)\right)\right|^{2} |x(n)|_2^2 + \varepsilon} \tag{9}$$

$$\lambda(n+1) = \lambda(n) + \frac{\rho}{2} \left| e^{*}(n)\, \bar{\phi}\!\left(x^{T}(n) w(n)\right) + e(n)\, \bar{\phi}^{*}\!\left(x^{T}(n) w(n)\right) \right| \tag{10}$$
3. Proposed Smart Patch Array Feed Design

The feed network for the parabolic reflector's patch antenna array is designed to outperform a traditional horn-based antenna. In order to generate the required radiation characteristics in a specific direction, a smart antenna's excitation level and phase can be adjusted for the array elements. By adjusting the filter's weights, adaptive filtering makes it possible to produce any kind of radiation pattern. The disparity between the ideal and achieved radiation is employed as a controllable variable. An intelligent
Figure 3. Adaptive array structure antenna allows for the modi ication of the radiation pattern. Figure 3 depicts the beamforming process using adaptive algorithms. The antenna array elements are shown as P = {p1 , p2 , . . . , pk } and the respective weights are rep‐ resented as w = {w1 , w2 , . . . , wk }. For adaptive beam shaping of signals in smart antennas, CLMS and AANGD have been combined to form the hybrid model because each method has advantages and limi‐ tations. The control of HPBW vs SLL in CLMS is poor, despite the good convergence of the array output sig‐ nal to the goal signal. HPBW vs SLL control is good with additional control parameters, similar to AANGD, despite the low convergence of the array output signal towards the target signal. In the analysis of CLMS and AANGD based on experimental data, the following model suggests com‐ bining these two algorithms as a hybrid known as Hybrid of CLMS and AANGD. As an example, consider the following hybrid algorithm: yC (n) = xH (n)wC (n) → wC (n + 1) = wC (n) + µC e(n)x(n)
(11)
yA (n) = hT (n)z(n) + g T (n)z ∗ (n)
(12)
h(n + 1) = h(n) + µA e(n)z ∗ (n)
(13)
g(n + 1) = g(n) + µA e(n)z(n)
(14)
eC (n) = d(n) − xT (n)wC (n)
(15)
eA (n) = d(n) − z aT (n)wA (n)
(16)
Yhybrid = λ∗ YC (n) + (1 − λ)∗ YA (n)
(17)
ehybrid (n) = x − Yhybrid (n)
(18)
Whybrid (n + 1) = λ(n)WC (n) + (1 − λ)WA∗ (n) (19) λ(n + 1) = λ(n) + µh Re(ehybrid (n) (YC (n) − YA (n)))
(20)
Here λ is the mixing parameter, µC , µA and µh are the step sizes for CLMS, AANGD and Hybrid model respectively. The outcome of this feed network is then 48
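A simplified sketch of the mixing rule (11)–(20) is shown below; the AANGD branch is reduced to an augmented (widely linear) LMS update as in (12)–(14), and the step sizes and the clipping of lambda are illustrative assumptions:

```python
# Sketch of the hybrid combination: two adaptive filters run in parallel
# and their outputs are blended by an adaptive mixing parameter lambda.
import numpy as np

def hybrid_step(x, d, wC, h, g, lam, muC=0.01, muA=0.01, muh=0.05):
    """One iteration of the hybrid beamformer on snapshot x with target d."""
    yC = np.vdot(wC, x)                       # CLMS branch output, eq. (11)
    yA = h @ x + g @ np.conj(x)               # augmented branch, eq. (12)
    y = lam * yC + (1.0 - lam) * yA           # mixed output, eq. (17)
    e = d - y                                 # hybrid error, eq. (18)
    wC = wC + muC * x * np.conj(d - yC)       # CLMS weight update, eq. (11)
    h = h + muA * (d - yA) * np.conj(x)       # eq. (13)
    g = g + muA * (d - yA) * x                # eq. (14)
    lam = lam + muh * np.real(e * (yC - yA))  # mixing update, eq. (20)
    return y, wC, h, g, float(np.clip(lam, 0.0, 1.0))

# Toy usage on random complex snapshots:
rng = np.random.default_rng(1)
wC = np.zeros(4, dtype=complex)
h = np.zeros(4, dtype=complex)
g = np.zeros(4, dtype=complex)
lam = 0.5
for _ in range(200):
    x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    y, wC, h, g, lam = hybrid_step(x, x[0], wC, h, g, lam)
```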
4. Result Analysis

Simulation analysis in MATLAB is used to evaluate the proposed design's radiation properties. Table 1 lists the antenna characteristics that were taken into consideration throughout this experiment.

Table 1. Antenna parameters

Antenna parameter | Value
Frequency of operation (f0) | 6 GHz
Length of feed line | 9.5 mm
Field of radius | 0.5 mm
Radiated power (W) | 0.046
Directivity (dB) | 14.832
Effective angle | 0.41 steradian
E(theta) | 81.68
E(phi) | 82.68
Gain (dB) | 14.8287
Intensity (W) | 0.112

With these characteristics, an electromagnetic scenario is constructed for a smart-antenna-based array feed for the parabolic reflector. The simulations are performed with a 1 kHz random noise input source. In order to simulate the real-time environment of the Smart Antenna System, the noise component was taken into consideration alongside the input signal, and the effectiveness of the CLMS and AANGD algorithms was examined for different values of N. When the step size parameter value is decreased in a noisy environment, the performance of the selected algorithms improves. It takes more effort and iterations to get the algorithm to behave as expected when random noise is introduced. There is no single direction in which HPBW may be expected to rise or fall, according to the findings for ACLMS. In a noisy environment, low HPBW and SLL can be observed at very low values, but the trend of the data is inconsistent. A decrease in HPBW and SLL can be noticed in this comparison as the adaptive amplitude step size is reduced from its original value of 0.001. One of the most exciting aspects of this development is that it has not been seen in previous neural algorithms. The CLMS method converges faster than the AANGD technique to the target signal. Thus, a hybrid model of these two methods is developed and assessed using the same input and desired signals in order to take advantage of each algorithm's best qualities. Compared to CLMS and ACLMS, this new hybrid algorithm surpassed them in terms of convergence to the target signal. An example of this technique's simulated results is provided in Figure 5, which shows how phase, magnitude, and accuracy all interact. The reflector's radiation properties are shown in Figures 6 and 7. For the patch antenna array, the radiation pattern is shown in Figure 8. Directivity is weak due to an uneven power distribution in feeding the reflector antenna. A correction to this can be accomplished by allowing beam creation and feed networks to adapt as needed to the signal orientation direction, which will increase the SOI's penetration capabilities for sky-wave propagation. Beamforming techniques are used to apply beamforming to the patch array feed.

Figure 4. Normalized pattern of reflector

Figure 5. Directivity of the parabolic reflector

Figure 6. Radiation pattern of patch antenna array without beamforming

Figure 7. Plot of RLE-LMS algorithm for patch array feed

Figure 8. Magnitude response for patch array feed (desired signal at 30 degrees, interferers at −20 and −70 degrees)

The results shown in the above figures are generated through detailed mathematical modelling and the corresponding simulation analysis in MATLAB. The performance of the proposed technique can be expected to carry over to real-time applications such as high-frequency satellite communication in C-band. Owing to the correctness and accuracy of the mathematical model, performance can be expected when the proposed design is implemented using high-end technological resources. Due to the unavailability of high-end, costly resources, the novelty of the proposed research is verified through a highly reliable simulation tool that is very close to real-time scenarios. However, as future prospects of this research, the smartness of the adaptive algorithms could be enhanced using deep learning models such as artificial neural networks, convolutional neural networks, and so on. The performance of the proposed design can also be studied over the spectrum of other satellite communication bands.
5. Conclusion

The patch array feed suggested in this paper is based on smart antennas. A controlled power distribution is applied to the parabolic reflector to enhance
its radiating properties. A thorough analytical analysis was carried out to investigate the antenna design's numerous spectral properties. The proposed work employs a 4x4 circular patch array for the feed network. According to estimates, the parabolic reflector has an F/D ratio of 0.36 and operates at a frequency of 6 GHz. The reflector's aperture illumination is set at 140°. In smart antennas, adaptive nonlinear gradient descent and complex-valued neural networks like CLMS are taken into account while producing adaptive beamforming signals. Various criteria, such as the number of array elements, the learning rate, the initial nonlinearity value, and the adaptive amplitude step size, are taken into account in both quiet and noisy scenarios. While AANGD exceeds CLMS in terms of good control over the adaptation of these two parameters, the analysis of [12] reveals that the latter is superior to the former when it comes to convergence to the target signal, and CLMS outperforms ACLMS in both noisy and noiseless situations. These algorithms are useful for working with signals that exhibit complex dynamic behavior. In order to improve the overall performance of the Smart Antenna System, a hybrid method integrating the best aspects of both models, CLMS and AANGD, is proposed. Further combinations of the CLMS and AANGD models may bring additional improvement.

AUTHORS

Sheetal Bawane∗ – Medicaps University, India, e-mail: bawane11@gmail.com.

Debendra Kumar Panda – Electronics Engineering Dept., Medi-Caps University, India, e-mail: debendrakumarpanda@gmail.com.

∗
Corresponding author
References

[1] F. B. Gross, Smart Antennas for Wireless Communications with Matlab. New York: McGraw-Hill, 2005.
[2] J. Li and P. Stoica, Robust Adaptive Beamforming. New Jersey: John Wiley & Sons, Inc., 2006.
[3] R. S. Elliott, Antenna Theory and Design, Wiley-Interscience, 2005.
[4] Veerendra, Md. Bakhar, and R.M. Vani. "Smart antennas for next generation cellular mobile communications," i-manager's Journal on Digital Signal Processing, 2016; 4(3): 6–11.
[5] Amara Prakasa Rao and N.V.S.N. Sarma. "Performance Analysis of Kernel Based Adaptive Beamforming for Smart Antenna Systems," Proc. of the IEEE Int. Conference on Microwave and RF, 2014, pp. 262–265.
[6] L. Thao, D. Loc, and N. Tuyen. "Study comparative of Parabolic array antenna and phased array antenna," VNU Journal of Science, vol. 30, no. 3, pp. 31–36, 2014.
[7] M. Yasin and Pervez Akhtar. "Performance Analysis of LMS and NLMS Algorithms for a Smart Antenna System," International Journal of Computer Applications, vol. 4, no. 9, 2010, pp. 25–32.
[8] D. K. Panda. "DRLMS Adaptive Beamforming Algorithm for Smart Antenna System," International Journal of Applied Engineering Research, vol. 13, no. 8, pp. 5585–5588, 2018.
[9] D.M. Motiur Rahaman, Md. Moswer Hossain, and Md. Masud Rana. "Least Mean Square (LMS) for Smart Antenna," Universal Journal of Communications and Network, pp. 16–21, 2013.
[10] T. N. Ferreira, S. L. Netto, and P. S. R. Diniz. "Direction-of-arrival estimation using a direct-data approach," IEEE Trans. Aerosp. Electron. Syst., vol. 47, no. 1, pp. 728–733, Jan. 2011.
[11] D. P. Mandic, Yili Xia, and Ali H. Sayed. "An Adaptive Diffusion Augmented CLMS Algorithm for Distributed Filtering of Non-Circular Complex Signals," IEEE Signal Processing Letters, vol. 18, no. 11, 2011.
[12] M. Lertsutthiwong, T. Nguyen, and B. Hamdaoui. "Efficient wireless broadcasting through joint network coding and beamforming," Int. J. Digit. Multimedia Broadcasting, vol. 2012, no. 342512, pp. 1–15, 2012.
REAL‐TIME FACE MASK DETECTION IN MASS GATHERINGS TO REDUCE COVID‐19 SPREAD Submitted: 12th August 2022; accepted: 29th September 2022
Swapnil Soner, Ratnesh Litoriya, Ravi Khatri, Ali Asgar Hussain, Shreyas Pagrey, Sunil Kumar Kushwaha

DOI: 10.14313/JAMRIS/1-2023/7

Abstract: The Covid-19 (coronavirus) pandemic has become one of the most lethal health crises worldwide. The virus is transmitted from person to person by respiratory droplets released when people sneeze or speak. According to leading scientists, wearing face masks and maintaining six feet of social distance are the most substantial protections to limit the virus's spread. In the proposed model we have used the Convolutional Neural Network (CNN) algorithm of Deep Learning (DL) to ensure efficient real-time mask detection. For better understanding, we have divided the system into two parts: 1) train the face mask detector, and 2) apply the face mask detector. This is a real-time application that detects, with the help of a camera, whether a person is wearing a mask in the proper position. The system achieved an accuracy of 99% after being trained with the dataset, which contains around 1376 images of width and height 224×224, and it also gives an alarm beep message after detecting no mask or improper mask usage in a public place.

Keywords: Covid, Machine learning, Face mask detection, Deep Learning
1. Introduction

Covid (coronavirus disease) was first identified in Wuhan, a city in central China, in December 2019, and its first case in India was reported in Kerala on 27th January 2020. Around 152 million people have been affected by this pandemic [20]. By 2022 it had become a part of our lifestyle, like a normal cold and cough. Covid is a contagious disease that is transmitted when an infected person comes in contact with another person, particularly when individuals sneeze or otherwise release droplets into the air. The virus thereby spreads from person to person [1]. According to researchers, the only ways to protect against or prevent coronavirus transmission are using sanitizer of at least 70% alcohol, using a face mask, and maintaining social distance. These steps are the only way by which we can at least slow the spread of this virus. No medicine can completely eradicate this problem, but vaccines exist (although they are not 100% effective). Prevention is better than cure, so following the guidelines provided by the government is a necessary step towards reducing transmission rates.
Figure 1. Deep learning structure
Figure 2. The classic structure of CNN

In this application, we are trying to build a real-time system (a face mask detector) that will detect whether a person is wearing a mask or not; it also checks if the mask is in the right position (properly covering the nose and mouth). Manual monitoring can be done, but technology can help in preventing infection. Deep Learning (DL) (Fig. 1), a part of Machine Learning, is used efficiently in many projects for detection, recognition, recommendations, and so on. It allows us to analyze massive data in an efficient way (fast and accurate). We decided to use this kind of learning in our face mask detection model. This real-time face mask detection model can be integrated with surveillance cameras [8] and can detect in real time without the need for any website [17]. This type of model can be implemented at public gatherings such as railway station gates, airport entrance checking gates, mall entrances, and so on. It can be further implemented in college and school auditoriums with the addition of a database and can be improved. We used a dataset from Kaggle (Prajna Bhandari). This model uses many libraries, such as OpenCV, Keras, and TensorFlow. It uses a Convolutional Neural Network (CNN) (Figure 2), MobileNetV2, and VGG16. These libraries and algorithms each have their importance and together can provide an efficient model.
2. Literature Review

Militante & Dionisio et al. [5] have presented their research paper on "Face Mask Detection," and a dataset
2023 © Soner et al. This is an open access article licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) (http://creativecommons.org/licenses/by-nc-nd/4.0/)
of around 25,000 images is used, with a pixel resolution of 224×224, achieving an accuracy of 96%. To replicate the stimulation of the human brain, an Artificial Neural Network (ANN) is used. In this work, a Raspberry Pi detects proper mask-wearing in public areas, and if someone enters without a mask, it sets off an alarm to create awareness. Das, Ansari & Basak et al. [4] have also presented a research paper on mask detection. In their research work, two datasets were compared on the basis of accuracy and loss. The OpenCV, TensorFlow, and Keras libraries were used to obtain favorable results. Das et al. defined their own datasets for detection: the first dataset contains a total of 1376 images, 690 with masks and 686 without masks. A second dataset was taken from Kaggle, which has a total of 853 images separated into two classes (with mask and without mask). They trained their model for 20 epochs with a split of 90% training and 10% validation data. Based on the mentioned datasets, the conclusion is an accuracy rate of 95.77% and 94.58% for dataset 1 and dataset 2, respectively. Wadii et al. [7] have presented their research on real-time face mask detection based on deep learning concepts. The proposed model works in an offline as well as an online approach. The offline approach works on Deep Learning (DL); with the help of DL, it is easier to detect the face mask and whether it is properly worn or not. The online approach deals with deploying the Deep Learning model so that it can detect masking in real time. MobileNetV2 technology is used for the research results. The authors made a comparison between several technologies, namely ResNet50, DenseNet, and VGG16, but the best result was assured by MobileNetV2 in terms of training time and accuracy. This proposed system detects a face mask with 99% accuracy. Mohammad Marufur Rahman et al. [2] based their research on automated systems, using a deep learning algorithm in order to detect masks on the face. The facial images which the researchers used are of two types (with mask and without mask). The research done by the authors has an accuracy rate of 98.7%. Nieto-Rodríguez et al. [3] have presented their research paper, "System for Medical Mask Detection in the Operating Room through Facial Attributes." They designed a method to detect surgical masks in an operating room. The researchers used a combination of two detectors, one for faces and another
for medical masks, which leads to an enhancement of the model's performance, achieving a 95% rate of accuracy in detecting faces with surgical masks. Its limitation is that the method does not work for faces more than 5 m away from the camera. Manoj et al.'s [19] deep-learning-based paper emphasizes images, especially chest X-rays and CT scans. Input is taken as pictures, and work on pattern and cluster structures confirms the actual status of the virus in the human body. The authors show how graphic design will give better and clearer results in understanding the problem. This image-learning-based model gives a reliable result for diagnosing the virus's actual position in the body. S. V. Kogilavani et al. [21] published a paper on how the SARS-CoV-2 virus causes Covid-19 and the resulting pandemic. In this paper the authors focus on how radiological techniques can support the system when RT-PCR kits are not readily available. Using the CT scan concept and applying deep learning algorithms (the CNN architectures VGG and DenseNet121) on the datasets, they found 97.68% and 97.53% accuracy, respectively.

Table 1. Comparison of different Covid models

Papers/Attribute | Accuracy | Face count | Voice recommendation | Algorithm
Mohammad Marufur Rahman et al. [2] | 98.7% | No | No | CNN
Nieto-Rodríguez et al. [6] | 95% | No | No | CNN
Das, Ansari & Basak et al. [4] | 95.77% | No | No | CNN
Militante & Dionisio et al. [5] | 96% | No | No | ANN
Wadii Boulila et al. [7] | 93.4% | No | No | R-CNN, YOLO
S. V. Kogilavani [21] | 97.68% | No | No | VGG16
3. Methodology

3.1. Phase 1: Training Dataset

In the dataset there are 1376 images overall: 690 with mask and 686 without mask [24]. The images in the dataset were collected from Kaggle; the proportion of mask and no-mask images is roughly the same, which makes the dataset balanced. To avoid overfitting, we split the dataset into three parts: the training dataset, the test dataset, and the validation dataset. On the training dataset we train the model, which observes and learns from the given data. On the validation dataset, the hyperparameters, such as the learning rate, are selected. If a model has a greater number of hyperparameters, then we need a larger validation dataset; a model with fewer hyperparameters requires a smaller validation dataset. This makes them easy to update and tune. We took 60% of the dataset as the training dataset, 20% as the validation dataset, and the remaining 20% as the test dataset. When our model performs well on the validation dataset, training can be stopped. Example images are shown in Figures 3 and 4.
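A minimal sketch of this 60/20/20 split is shown below, with placeholder arrays standing in for the loaded images; scikit-learn's train_test_split is one possible tool (an assumption, as the paper does not name its splitting utility):

```python
# Sketch: split 1376 images into 60% train, 20% validation, 20% test.
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays standing in for the loaded images and labels.
images = np.zeros((1376, 224, 224, 3), dtype="float32")
labels = np.array([1] * 690 + [0] * 686)    # 690 with mask, 686 without

# First carve out 40% for validation + test, then split that half/half.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    images, labels, test_size=0.40, stratify=labels, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=42)
print(len(X_train), len(X_val), len(X_test))
```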
Figure 3. Without mask
Figure 4. With mask
Training the model

Training the model is the most important step. We used about 4095 images as a dataset, which is further divided into masked and no-mask images; this is used for training the model. We used Keras and TensorFlow as the basic building blocks for the model, and the Convolutional Neural Network (CNN) algorithm was also used.

3.2. Phase 2: Deployment

In the deployment process, the first step of the procedure is to train the face mask detector; the second step is to load the face mask detector, followed by performing face detection; and the last step is to classify the face as with mask or without mask. Figure 5 clearly defines the steps of the process.
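A sketch of a MobileNetV2-based mask/no-mask classifier in Keras is given below; the head layers, learning rate, and epoch count are assumptions, since the paper does not list its exact hyperparameters:

```python
# Sketch: transfer learning with MobileNetV2 for binary mask detection.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                        # train only the new head

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(2, activation="softmax"),   # mask / no mask
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=20)
```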
3.3. Approach

Before moving ahead, we need to capture an image with the computer's or PC's webcam, and it requires pre-processing. To maintain the uniformity of the input images, we pre-process them by resizing and normalizing. After resizing, we normalize the image, setting the pixel range between 0 and 1. This normalization helps the learning algorithm: learning is faster and it captures the necessary features from the images.

Figure 5. Predictive approach model

Figure 6. Face mask detection flow diagram
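The pre-processing step described above might look as follows in OpenCV; the BGR-to-RGB conversion is an assumption about the capture pipeline:

```python
# Sketch: resize each frame to 224x224 and normalize pixels into [0, 1].
import cv2
import numpy as np

def preprocess(frame: np.ndarray) -> np.ndarray:
    face = cv2.resize(frame, (224, 224))           # uniform input size
    face = cv2.cvtColor(face, cv2.COLOR_BGR2RGB)   # OpenCV delivers BGR
    return face.astype("float32") / 255.0          # pixel range 0..1
```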
4. Inferences

We implemented our model on various live images, shown in the figures below, which contained faces with and without masks. Some screenshots of the implementation are shown below:
Figure 7. Interface of face mask detection
In our proposed system we used a Deep Learning algorithm (a convolutional neural network, CNN) with an accuracy of 99.99%. The system also has a sound alert and a face counting feature. In our proposed system we introduced an alert system: if a person with no mask is detected, then a beep alert sounds as a warning; otherwise, a voice message "you are allowed to go in" is played.
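The paper's own alert and counting code is shown in Figures 9 and 10; a minimal sketch of such logic could look like the following, where the Haar-cascade detector and the terminal-bell beep are illustrative substitutes for the actual implementation:

```python
# Sketch: count faces in a frame and beep when a no-mask face is seen.
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def process_frame(frame, mask_ok: bool) -> int:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if not mask_ok and len(faces) > 0:
        print("\a", end="")        # audible beep as a no-mask warning
    return len(faces)              # running face count for the stream
```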
Figure 8. Final output of proposed work
In our proposed system we have one more feature: the number of faces is counted. As a person comes into the range of the camera, they are added to the count of faces.

Mathematical Model

The mathematical representation of the detection problem comprises a depthwise filter followed by a pointwise (1×1) layer from which new features are evaluated: the "depthwise separable" convolution block splits the computation into a depthwise and a pointwise convolution. The result produced by this method is close to that of the standard convolution defined earlier, but with improved speed. The MobileNet architecture is built from such blocks: in the initial phase there is a standard 3×3 convolution in the first layer, followed by 13 iterations of this building block. The blocks fall into different categories, since their dimensions depend on the level of pooling between layers. To reduce the spatial resolution, a stride of 2 is introduced, which reduces the spatial dimensions of the output channels and the associated pointwise stage by a factor of 2. In this manner, the output response is a 7 × 7 × 1024 feature map in response to a 224 × 224 × 3 input image.
Figure 9. Code for sound alert
Figure 10. Code for face count

The convolution layer is optimized by batch normalization, which comes into force before the activation; MobileNet uses the ReLU6 activation. It is identical to the ReLU but protects against large activations by capping the mapping:

$$y = \min(\max(0, x), 6)$$
While comparing with the existing ReLU, ReLU6 was found advantageous under low-precision calculations.

MobileNetV2's building block is defined in terms of depthwise separable convolutions. In the depthwise convolution, the input is filtered channel by channel, producing a feature space as the output; in the pointwise step, the input is combined by a unit (1×1) convolution over the feature space to produce the output. With $X_{in}$ input channels, $X_{out}$ output channels (the standard notation for input and output in a convolution network), and a kernel of dimensions $K_w \times K_H$ (the kernel's width and height, respectively), the weight counts of the standard and depthwise separable convolutions are

$$W_{std} = X_{in} \cdot K_w \cdot K_H \cdot X_{out} \tag{1}$$

$$W_{dws} = K_w \cdot K_H \cdot X_{in} + X_{in} \cdot X_{out} \tag{2}$$

The computational burden of producing result feature maps of dimension $f_w \times f_H$ with the depthwise separable convolution is

$$C_{dws} = K_w \cdot K_H \cdot X_{in} \cdot f_w \cdot f_H + X_{in} \cdot X_{out} \cdot f_w \cdot f_H \tag{3}$$

This process decreases the number of weights and calculations over wide ranges of the input layers, so DWS can be used in place of standard convolution, decreasing the computation cost by a significant factor. Each input channel is filtered with a single kernel, and the following expression describes the depthwise convolution:

$$F(a, b, i) = \sum_{u=1}^{m} \sum_{v=1}^{m} M_{(u, v, i)} \cdot N_{(a + u - 1,\; b + v - 1,\; i)} \tag{4}$$

where $M$ denotes depthwise convolutional kernels of dimension $m \times m \times c_{in}$, i.e., $c_{in}$ kernels of size $m \times m$. By applying the $n$-th filter from $M$ to the $n$-th input channel of $N$, the $n$-th channel of the
filtered result feature vector $F$ is generated. The result of the depthwise convolution is then combined linearly in the pointwise convolution, which is used to generate new features. The following equation expresses this convolution at a single location:

$$P(a, b, j) = \sum_{i=1}^{c_{in}} F_{(a, b, i)} \cdot Q_{(i, j)} \tag{5}$$

An expression for the computation cost of DWS may then be written as:

$$C_{dws} = m^2 \cdot c_{in} \cdot h \cdot w + c_{in} \cdot c_{out} \cdot h \cdot w \tag{6}$$
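A quick numerical check of equations (1)–(3)/(6) is shown below; the layer sizes are illustrative (MobileNet-style channel widths are an assumption):

```python
# Worked check: parameter and cost counts for a standard vs. a
# depthwise separable 3x3 convolution on an assumed layer.
m, c_in, c_out, h, w = 3, 32, 64, 112, 112

w_std = c_in * m * m * c_out                 # eq. (1)
w_dws = m * m * c_in + c_in * c_out          # eq. (2)
c_std = m * m * c_in * c_out * h * w         # standard conv cost
c_dws = m * m * c_in * h * w + c_in * c_out * h * w   # eq. (3)/(6)

print(w_dws / w_std, c_dws / c_std)          # ~0.13 reduction factor
```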
This second specification of MobileNetV2 allows the neural network to run on small-screen (mobile) devices. The image classifier is combined with other classifiers to detect and improve results in tandem.

Evaluation Metric

The confusion matrix is calculated for the inputs of the learning task using the following table:

                     Actual No (0)    Actual Yes (1)
Predicted No (0)     TN               FN
Predicted Yes (1)    FP               TP
Here TN means True Negative, FN False Negative, FP False Positive, and TP True Positive.

Accuracy: Accuracy describes the overall performance of the system: the proportion of samples, across all classes, that are mapped to the correct output class. It summarizes in a single number how often the classifier produces the desired result.
Figure 11. Comparison of algorithms with accuracy

Table 2. Results (comparison of different algorithms)

Algorithm                      Accuracy   Precision   Recall   F1-score
Lenet-5                        0.886      0.89        0.96     0.93
AlexNet                        0.85       0.85        0.90     0.87
VGG-16                         0.91       0.88        0.90     0.88
Inception-V1                   0.92       0.94        0.90     0.92
Inception-V3                   0.94       0.92        0.90     0.90
Resnet-50                      0.95       0.94        0.96     0.94
Xception                       0.93       0.95        0.93     0.93
Inception-v4                   0.91       0.92        0.96     0.93
Inception-Resnet               0.97       0.94        0.96     0.94
MobileNetV2 (Proposed Work)    0.99       0.98        0.99     0.98
F1 Score: The F-score is a statistic defined to evaluate the overall performance of the computational model. It is the harmonic mean of the model's precision and recall.
Accuracy = (TP + TN)/(TP + TN + FP + FN)

F1 Score = 2 · (Precision · Recall)/(Precision + Recall)

Precision: Precision is the fraction of samples tagged as positive that are actually positive; it evaluates how trustworthy a positive prediction is.

Precision = TP/(TP + FP)

Recall: Recall is calculated by dividing the number of positive samples that were correctly identified as positive by the total number of positive samples. It tells how well a model can recognize positive data: the higher the recall, the more genuine positives are found.

Recall = TP/(TP + FN)
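For concreteness, the four metrics can be computed directly from the confusion-matrix counts; the counts below are illustrative only, not the paper's data:

```python
# The four metrics above computed from confusion-matrix counts.
def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Illustrative counts only:
print(metrics(tp=680, tn=682, fp=14, fn=7))
```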
Table 2 presents the comparison of the different algorithms. The results show that our work achieves better accuracy than the other algorithms.
5. Conclusion
In this paper, we have developed a real-time face mask detection application based on an efficient convolutional neural network (CNN) model, MobileNetV2. A dataset of about 1,376 images, from Kaggle and other sources, has been used. Our proposed system predicts whether a person is wearing a face mask; if a person with no mask is detected, an alert sound is played. Moreover, it keeps a count of the people present in the video stream. After training and testing, this real-time application achieved an accuracy of about 99.99%. The model finds its use in public gatherings such as airports, schools, and offices.
6. Future Scope
We are aware of how quickly the deep learning area is evolving and how intensively major organizations are conducting research. Researchers in the field of machine learning, and especially deep learning, are attempting groundbreaking work. As a result, more variants of this model may appear in the future, improving its efficiency in terms of implementation and performance. Real-time face mask detection in crowds has an immensely broad future scope owing to the technology used in it. It can be modified as and when required, as it is versatile in terms of extension and open to updating with the latest technology. Some aspects can be further improved: the face-recognition distance can be extended, a graphics processing unit (GPU) can be used for large databases and quick processing, data storage can be made server-based, and the system can be integrated with multiple cameras at the same time.

AUTHORS
Swapnil Soner∗ – Jaypee University of Engineering & Technology, Raghogarh, Guna (M.P.), India, e-mail: Swapnil.soner@gmail.com.
Ratnesh Litoriya – Medi-Caps University, Indore, India, e-mail: litoriya.ratnesh@gmail.com.
Ravi Khatri – I Nurture Education Solutions Pvt. Ltd. India, e-mail: ravikhatri1010@gmail.com.
Ali Asgar Hussain – Medi-Caps University, Indore, India, e-mail: ali.asgar@medicaps.ac.in.
Shreyas Pagrey – Chameli Devi Group of Institutions, Indore, India, e-mail: shreyas.pagare@gmail.com.
Sunil Kumar Kushwaha – Medi-Caps University, Indore, India, e-mail: sunilietkushwaha@gmail.com.
∗
Corresponding author
References
[1] Pranad Munjal, Vikas Rattan, Rajat Dua, and Varun Malik. "Real-Time Face Mask Detection using Deep Learning," Journal of Technology Management for Growing Economies, vol. 12, no. 1 (2021), pp. 25–31. DOI: 10.15415/jtmge.2021.121003.
[2] Mohammad Marufur Rahman, Saifuddin Mahmud, Md. Motaleb Hossen Manik, Jong-Hoon Kim, and Md. Milon Islam. "An Automated System to Limit COVID-19 Using Facial Mask Detection in Smart City Network," 2020 IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS). DOI: 10.1109/IEMTRONICS51293.2020.9216386.
[3] A. Nieto-Rodríguez, M. Mucientes, and V. M. Brea. "System for Medical Mask Detection in the Operating Room Through Facial Attributes," in Pattern Recognition and Image Analysis, Roberto Paredes, Jaime S. Cardoso, and Xosé M. Pardo
(Eds.), 2015, Springer International Publishing, Cham, pp. 138–145.
[4] A. Das, M. Ansari, and R. Basak. "Covid-19 Face Mask Detection Using TensorFlow, Keras and OpenCV," 2020 IEEE 17th India Council International Conference (INDICON), New Delhi, India, 2020. DOI: 10.1109/INDICON49873.2020.9342585.
[5] S. V. Militante and N. V. Dionisio. "Real-Time Face Mask Recognition with Alarm System using Deep Learning," 2020 11th IEEE Control and System Graduate Research Colloquium (ICSGRC), Shah Alam, Malaysia, 2020. DOI: 10.1109/ICSGRC49013.2020.9232610.
[6] F. M. Javed Mehedi Shamrat, Sovon Chakraborty, Md. Masum Billah, Md. Al Jubair, Md Saidul Islam, and Rumesh Ranjan. "Face Mask Detection using Convolutional Neural Network (CNN) to reduce the spread of Covid-19," 5th International Conference on Trends in Electronics and Informatics (ICOEI 2021), Tirunelveli, India, 3-5 June 2021. DOI: 10.1109/ICOEI51242.2021.9452836.
[7] Wadii Boulila, Adel Ammar, Bilel Benjdira, and Anis Koubaa. "Securing the Classification of COVID-19 in Chest X-ray Images: A Privacy-Preserving Deep Learning Approach," Image and Video Processing (eess.IV). DOI: 10.48550/arXiv.2203.07728.
[8] M. Loey, G. Manogaran, M. H. N. Taha, and N. E. M. Khalifa. "A Hybrid Deep Transfer Learning Model with Machine Learning Methods for Face Mask Detection in the Era of the COVID-19 Pandemic," Measurement, 167, 2021, 108288. DOI: 10.1016/j.
[9] S. Soner, R. Litoriya, and P. Pandey. "Exploring Blockchain and Smart Contract Technology for Reliable and Secure Land Registration and Record Management," Wireless Pers Commun 121, 2495–2509, 2021. DOI: 10.1007/s11277-021-08833-1.
[10] P. Pandiyan. "Social Distance Monitoring and Face Mask Detection Using Deep Neural Network," (2020, December 17). Retrieved from: https://www.researchgate.net/publication/347439579_Social_Distance_Monitoring_and_Face_Mask_Detection_Using_Deep_Neural_Network, DOI: 10.15415/jtmge.2021.121003.
[11] S. Soner, R. Litoriya, and P. Pandey. "Making Toll Charges Collection Efficient and Trustless: A Blockchain-Based Approach," 2021 3rd International Conference on Advances in Computing, Communication Control and Networking (ICAC3N), IEEE. DOI: 10.1109/ICAC3N53548.2021.9725447.
[12] H. Vagrecha, A. Tuteja, A. S. Mandloi, A. Dube, and S. Soner. "Coders Hub ML Edu. Platform Systems (May 24, 2021)," Proceedings of the International Conference on Smart Data Intelligence (ICSMDI 2021). Available at SSRN: https://ssrn.com/abstract=3851962 or DOI: 10.2139/ssrn.3851962.
[13] X. Wang, X. Le, and Q. Lu. "Analysis of China's Smart City Upgrade and Smart Logistics Development under the COVID-19 Epidemic," J. Phys. Conf. Ser., vol. 1570, p. 012066, 2020. DOI: 10.3390/s21113838.
[14] J. Won Sonn and J. K. Lee. "The smart city as time-space cartographer in COVID-19 control: the South Korean strategy and democratic control of surveillance technology," Eurasian Geogr. Econ., pp. 1-11, May 2020. DOI: 10.1016/j.scitotenv.2020.142391.
[15] P. Ghosh, F. M. Javed Mehedi Shamrat, S. Shultana, S. Afrin, A. A. Anjum, and A. A. Khan. "Optimization of Prediction Method of Chronic Kidney Disease Using Machine Learning Algorithm," 2020 15th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP), Bangkok, Thailand, 2020, pp. 1-6. DOI: 10.1109/iSAI-NLP51646.2020.9376787.
[16] Preeti Nagrath, Rachna Jain, Agam Madan, Rohan Arora, Piyush Kataria, and Jude Hemanth. "SSDMNV2: A real time DNN-based face mask detection system using single shot multibox detector and MobileNetV2," Sustainable Cities and Society 66 (2021), 102692. DOI: 10.1016/j.scs.2020.102692.
[17] Institute for Health Metrics and Evaluation. COVID-19 Projections, 2021. Retrieved April 27, 2021, page 1665. DOI: 10.1016/S0140-6736(21)02143-7.
[18] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. "You Only Look Once: Unified, Real-Time Object Detection," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779–788. DOI: 10.1109/CVPR.2016.91.
[19] M. V. Manorj Kumar et al. "Detection of Covid-19 Using Deep Learning Technique and Cost Effectiveness Evaluation: A Survey," Front. Artif. Intell., 27 May 2022, Sec. Medicine and Public Health. DOI: 10.3389/frai.2022.912022.
[20] Zaid Abdi Alkareem Alyasseri et al. "Review on COVID-19 diagnosis models based on machine learning and deep learning approaches," Expert Syst. 2022 Mar; 39(3): e12759. PMCID: PMC8420483, PMID: 34511689. DOI: 10.1111/exsy.12759.
[21] S. V. Kogilavani et al. "COVID-19 Detection Based on Lung CT Scan Using Deep Learning Techniques," Research Article, Open Access, Volume 2022, Article ID 7672196. DOI: 10.1155/2022/7672196.
[22] Mohamad Alkhodari and Ahsan H. Khandoker. "Detection of COVID-19 in smartphone-based breathing recordings: A pre-screening deep learning tool." Published: January 13, 2022. DOI: 10.1371/journal.pone.0262448.
[23] Muhammad Imad et al. "IoT Based Machine Learning and Deep Learning Platform for COVID-19 Prevention and Control: A Systematic Review," January 2022. In: AI and IoT for Sustainable Development in Emerging Countries, pp. 523-536. DOI: 10.1007/978-3-030-90618-4_26.
[24] https://www.researchgate.net/publication/344725412_Covid19_Face_Mask_Detection_Using_TensorFlow_Keras_and_OpenCV#pf2.
[25] S. Soner and J. Kukade. "Autonomous Anomaly Detection System for Crime Monitoring and Alert Generation," Journal of Automation, Mobile Robotics and Intelligent Systems 16(1), 62-71. DOI: 10.14313/JAMRIS/1-2022/7.
[26] S. Soner, R. Litoriya, and P. Pandey. "Making Toll Charges Collection Efficient and Trustless: A Blockchain-Based Approach," 2022 4th International Conference on Advances in Computing, Communication Control and Networking (ICAC3N), 16-17 Dec. 2022. DOI: 10.1109/ICAC3N56670.2022.
[27] S. Soner, A. Jain, A. Tripathi, and R. Litoriya. "A novel approach to calculate the severity and priority of bugs in software projects," 2010 2nd International Conference on Education Technology and Computer, Volume 2, pp. V2-50–V2-54, IEEE. DOI: 10.1109/ICETC.2010.5529438.
PEOPLE TRACKING IN VIDEO SURVEILLANCE SYSTEMS BASED ON ARTIFICIAL INTELLIGENCE
Submitted: 15th October 2022; accepted: 2nd January 2023
Abir Nasry, Abderrahmane Ezzahout, Fouzia Omary
DOI: 10.14313/JAMRIS/1-2023/8
Abstract: As security is one of the basic human needs, we need security systems that can prevent crimes from happening. In general, surveillance videos are used to observe the environment and human behavior in a given location. However, surveillance videos can only be used to record images or videos, without additional information. Therefore, more advanced cameras are needed to obtain other additional information such as the position and movement of people. This research extracted this information from surveillance video footage using a person tracking, detection, and identification algorithm. The framework for these is based on deep learning algorithms, a popular branch of artificial intelligence. In the field of video surveillance, person tracking is considered a challenging task. Many computer vision, machine learning, and deep learning techniques have been developed in recent years. The majority of these techniques are based on frontal view images or video sequences. In this work, we will compare some previous work related to the same topic.
Keywords: Person tracking, Person detection, Person identification, Video surveillance, Artificial intelligence.
Figure 1. Casablanca is giving itself the means to fight against insecurity
Figure 2. Video‐surveillance architecture: live viewing and a posteriori viewing
1. Introduction
Nowadays, video surveillance is expanding rapidly, both technologically and economically, and has become one of the essential links in the security policies of governments. This evolution responds to the security needs of every citizen, in line with the increase in delinquency and criminality. Video surveillance is now becoming increasingly necessary to monitor both public and private places. In this context, camera networks are installed in abundance in streets, shopping centers, public transportation, offices, airports, apartment buildings, etc. A video surveillance system consists essentially of monitoring multiple security camera feeds at the same time. However, the increase in the number of installed cameras makes it extremely difficult to process the data generated by these cameras manually. To help security monitoring personnel explore this data, it is necessary to simplify the video surveillance task by automating some of its functions. These include object detection, person detection, event and human action recognition, tracking of people, etc. Another application is to recognize people
who leave the field of view of one camera and reappear in another: the video surveillance system must then be able to re-identify the person and continue the tracking. Research on estimating the 3D movement of a person is an important field of computer vision because of its numerous possible applications: human-computer interfaces, animation, interaction with virtual environments, games, etc. Capturing 3D human motion in real time, with a single camera or multiple cameras and without markers, is difficult to achieve. This is due to the ambiguities resulting from the lack of depth information, the partial occlusion of human body parts, the high number of degrees of freedom, and the variation in the proportions of the human body, as well as in the color of the clothes of the different people present in the scene. For these reasons, the number of works dealing with people tracking continues to increase. In this paper, we will present the different approaches to pose estimation and the tracking of people's movements.
2023 © Nasry et al. This is an open access article licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) (http://creativecommons.org/licenses/by-nc-nd/4.0/)
2. Methods

2.1. Definition
Object tracking is an application of deep learning in which the program takes an initial set of object detections, develops a unique identification for each of the initial detections, and then tracks the detected objects as they move through the frames of a video. In other words, object tracking involves automatically identifying objects in a video and interpreting them as a set of trajectories with high accuracy. Often there is an indication around the tracked object, for example, a square that follows the object, showing the user where the object is on the screen.

Different types of object tracking
Object tracking is used in a variety of use cases involving different types of input images. Whether the intended input is an image or a video, and whether it is real-time or prerecorded video, impacts the algorithms used to create object-tracking applications. The kind of input also impacts the category, use cases, and applications of object tracking. Here, we briefly describe a few popular types of object tracking: video tracking, visual tracking, and image tracking.

Video tracking: Video tracking is the process of locating a moving object (or multiple objects) over time using a camera. It has many uses, including human-computer interaction, security and surveillance, video communication and compression, augmented reality, traffic control, medical imaging, and video editing. Video tracking can be a time-consuming process due to the amount of data contained in the video. Adding to the complexity is the potential need to use object recognition techniques for tracking, a difficult problem in itself.

Visual tracking: Visual tracking, or visual target tracking, is a research topic in computer vision that is applied in a wide range of everyday scenarios. The goal of visual tracking is to estimate the future position of a visual target that has been initialized without the availability of the rest of the video.

Image tracking: Image tracking is intended to detect two-dimensional images of interest in a given input. These images are then continuously tracked as they move through the scene. Image tracking is ideal for datasets with high-contrast images (e.g., black and white), asymmetry, few patterns, and multiple identifiable differences between the image of interest and other images in the set. Image tracking relies on computer vision to detect and augment images after the image targets have been predetermined.
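As a minimal sketch of such a tracking loop (assuming an OpenCV build that ships the contrib tracking module; the video path is a placeholder):

```python
# Minimal single-object video tracking sketch with OpenCV; "video.mp4"
# is a placeholder and the CSRT tracker requires opencv-contrib-python.
import cv2

cap = cv2.VideoCapture("video.mp4")
ok, frame = cap.read()

bbox = cv2.selectROI("init", frame)        # initialize the target by hand
tracker = cv2.legacy.TrackerCSRT_create()  # correlation-filter tracker
tracker.init(frame, bbox)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    found, bbox = tracker.update(frame)    # estimate the target's new position
    if found:
        x, y, w, h = map(int, bbox)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```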
Figure 3. Object tracking in deep learning

3. Literature Review
This section provides a short outline of various algorithms employed in the literature, categorized into traditional generic, machine learning, feature-based, and deep learning-based methods. Comprehensive surveys of the various tracking methods can be found in previous studies. The main purpose of tracking techniques is to detect objects in a video sequence and to keep track of them across successive images in order to find the trajectory of each detected object. Conventional techniques are generally based on motion and observation models: the motion model involves detecting and predicting the object's location, while the observation model captures the appearance of the object and its position in the image. Some researchers have used model-based methods for object tracking. Many researchers have used machine learning for object tracking, classifying the tracked object with methods such as boosting, random forest, Hough forest, structural learning, and support vector machines. Others have proposed feature-based tracking methods, such as Haar-like features, local binary patterns, histograms of oriented gradients, scale-invariant feature transform, discrete cosine transform, and shape features [4–7]. Other techniques use Kalman filters or the Hungarian algorithm. In order to improve the performance of tracking methods, different researchers have combined information from several cues and presented object-tracking methods that combine a feature-based detector with a probabilistic segmentation method. The majority of these methods are developed mainly for frontal-view datasets, which may suffer from occlusion problems.

3.1. Reviewing Some Related Work

Review on: Convolutional Neural Network-Based Person Tracking Using Overhead Views
This paper emphasizes overhead-view person tracking using a Faster region-based convolutional neural network (Faster-RCNN) in combination with the GOTURN architecture [2]. As its main contribution, a CNN model is used for top-view tracking of people in different indoor and outdoor environments; the use of the top view overcomes the various problems encountered in front-view datasets. The authors briefly explain the different tracking algorithms used in the literature, classifying them into traditional generic methods, machine learning methods, feature-based methods, and deep learning-based methods. The article explains Faster-RCNN person detection using an
overhead-view video-frames approach. Faster-RCNN has two main steps. The first step produces region anchors (regions with a probability of containing the object, i.e., a person) via a region proposal network (RPN). The next step classifies the object (person) using the detected regions and extracts the bounding-box information [2]. For tracking, the GOTURN tracker is used, which is based on a CNN layer architecture. The authors obtained the following results: the Faster-RCNN detection model achieved a true detection rate ranging from 90% to 93% with a minimum false detection rate as low as 0.5%, while the GOTURN tracking algorithm achieved similar results with a success rate ranging from 90% to 94%.

Review on: Long-Term Identity-Aware Multi-Person Tracking for Surveillance Video Summarization
Authors Shoou-I Yu, Yi Yang, Xuanchong Li, and Alexander G. Hauptmann elaborate a study about a multi-person tracking algorithm for very long-term (e.g., month-long) multi-camera surveillance scenarios. The proposed tracker propagates identity information to frames without recognized faces by uncovering the appearance and spatial manifold formed by person detections. The algorithm was tested on a 23-day, 15-camera dataset (4,935 hours in total). The authors reviewed work that follows the very popular tracking-by-detection paradigm and carefully explained its four main components: object localization, appearance modeling, motion modeling, and data association [2, 3]. The setting was to view tracking-by-detection-based multi-object tracking as a constrained clustering problem. A location hypothesis, that is, a person detection result, can be viewed as a point in the spatial-temporal space, and the goal is to group the points so that the points in the same cluster belong to a single trajectory. A trajectory should follow the mutual exclusion constraint and the spatial-locality constraint, defined as follows:
- Mutual exclusion constraint: a person detection result can belong to at most one trajectory.
- Spatial-locality constraint: two person detection results belonging to a single trajectory should be reachable with reasonable velocity; that is, a person cannot be in two places at the same time (see the sketch below).
The authors propose a tracking algorithm that can be summarized in a few main steps: computing the Laplacian matrices, the spatial-locality matrix, and the diagonal matrix. The algorithm was tested on four datasets for the experiments, including terrace1 [8], the 15-camera Caremedia 8h dataset (a newly annotated dataset in which 49 individuals perform [3]), and the 15-camera Caremedia 23d dataset.
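A hedged sketch of the spatial-locality constraint as a velocity-feasibility test (the speed bound is an assumed value, not one from the paper):

```python
# Two detections can lie on the same trajectory only if the implied
# velocity between them is physically reasonable.
import numpy as np

MAX_SPEED = 2.5  # m/s, an assumed walking-speed bound

def same_trajectory_feasible(p1, t1, p2, t2, max_speed=MAX_SPEED):
    """p1, p2: 2D ground-plane positions (m); t1, t2: timestamps (s)."""
    dt = abs(t2 - t1)
    if dt == 0:
        return np.allclose(p1, p2)  # a person cannot be in two places at once
    speed = np.linalg.norm(np.asarray(p2) - np.asarray(p1)) / dt
    return speed <= max_speed

print(same_trajectory_feasible((0, 0), 0.0, (1.0, 1.0), 1.0))   # True
print(same_trajectory_feasible((0, 0), 0.0, (30.0, 0.0), 1.0))  # False
```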
The Caremedia 23d dataset is a newly annotated data set consisting of nursing home recordings spanning 23 days [3]. The proposed method was compared with three identity-aware tracking baselines: multi-commodity network flow, Lagrangian relaxation, and non-negative discretization; other trackers that lack the ability to incorporate identity information were not compared [3]. The findings were that the method was able to localize a person 53.2% of the time with 69.8% precision. The authors further performed video summarization experiments based on their tracking output. Results on 116.25 hours of video showed that they were able to generate a reasonable visual diary for different people, thus potentially opening the door to automatic summarization of the vast amount of surveillance video generated every day.

Review on: Fast Online Object Tracking and Segmentation: A Unifying Approach
To allow online operability and fast speed, the authors adopt the fully convolutional SiamMask framework. Moreover, to illustrate that their approach is agnostic to the specific fully convolutional method used as a starting point, they consider the popular SiamFC and SiamRPN as two representative examples and adapt them to their own SiamMask solution. The fundamental building block of the tracking system is an offline-trained fully convolutional Siamese network. This compares an exemplar image z against a large search image x to obtain a dense response map; z and x are, respectively, a w by h crop centered on the target object and a larger crop centered on the last estimated position of the target. The two inputs are processed by the same CNN, yielding two feature maps that are cross-correlated. Each spatial element of the response map gφ is referred to as the response of a candidate window (RoW) [9]. For SiamFC, the goal is for the maximum value of the response map to correspond to the target location in the search area x. In SiamMask, however, the authors replace the simple cross-correlation with a depthwise cross-correlation and produce a multi-channel response map. SiamFC is trained offline on millions of video frames with a logistic loss referred to as Lsim. The performance of SiamFC was improved by relying on a region proposal network (RPN), which estimates the target location with a bounding box of variable aspect ratio; SiamRPN outputs box predictions in parallel with classification scores, with losses referred to as Lbox and Lscore. In SiamMask, the authors point out that besides similarity scores and bounding-box coordinates, the RoW of a fully convolutional Siamese network can also encode the information necessary to produce a pixel-wise binary mask. They predict w by h binary masks, one for each RoW, using a simple two-layer neural network hθ.
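The following NumPy/SciPy sketch illustrates the SiamFC-style dense response map: exemplar features are cross-correlated with search-region features and summed over channels. The shapes and random features are illustrative only.

```python
# Depth-summed cross-correlation between exemplar and search features,
# as in a SiamFC-style tracker; the response peak indicates the target.
import numpy as np
from scipy.signal import correlate2d

def response_map(x_feat, z_feat):
    """x_feat: (H, W, C) search features; z_feat: (h, w, C) exemplar features."""
    channels = [correlate2d(x_feat[:, :, c], z_feat[:, :, c], mode="valid")
                for c in range(x_feat.shape[2])]
    return np.sum(channels, axis=0)  # peak location ~ target location

x_feat = np.random.rand(22, 22, 8)
z_feat = np.random.rand(6, 6, 8)
g = response_map(x_feat, z_feat)
print(g.shape, np.unravel_index(g.argmax(), g.shape))  # (17, 17) + peak index
```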
The authors present two variants: one combines the mask branch with the SiamRPN box and score outputs, and the other combines the mask branch with the SiamFC similarity loss Lsim. In order to compare against the tracking benchmarks, a bounding box is required as the final representation of the target object. The authors showed that their minimum-bounding-rectangle (MBR) strategy for obtaining a rotated bounding box from a binary mask offers a significant advantage over popular strategies that simply report axis-aligned bounding boxes [10–18]. The authors explain that their method aims at the intersection between the tasks of visual tracking and video object segmentation to achieve high practical convenience [11]. In addition to tracking the target's bounding box, it also generates the mask and achieves state-of-the-art performance [12–22]. The performance measure used is the expected average overlap (EAO), which considers both the robustness and the accuracy of a tracker. As a result, SiamMask can be considered a strong baseline for online video object segmentation: first, it is almost two orders of magnitude faster than accurate approaches such as OnAVOS; second, it is competitive with recent Video Object Segmentation (VOS) methods that do not employ fine-tuning, while being four times more efficient than the fastest ones; and it does not need a mask for initialization.

Review on: A Comparison of Multicamera Person-Tracking Algorithms
Authors A. W. Senior, G. Potamianos, S. Chu, Z. Zhang, and A. Hampapur conducted a study comparing four tracking algorithms that have been applied to people in 3D or 2D with multiple cameras in indoor environments. The setting was to present four different approaches to tracking a person in an indoor scenario instrumented with multiple cameras with overlapping fields of view [1–19]. The first method was a background-subtraction tracker. The second tracker uses a radically different approach to tracking the speaker: a particle-filter tracker. The face-detection-based tracker was the third method, and the fourth method for tracking the speaker is an edge-based body tracker that uses a 3D model-based tracker the authors developed for articulated body tracking. The data on which they analyzed tracker performance was collected as part of the CHIL project, a consortium of European Union institutions. All initial data were collected by the University of Karlsruhe in its smart meeting room and consist of video from four calibrated static cameras mounted in the corners of the 5.9 m by 7.1 m room. The findings were that in the particle-filtering approach there is potential for extending the approach to track multiple targets, though occlusion is much more complex and the feature space is much larger, at two dimensions per candidate. The face-tracking system relies on face detection, which is not perfect and cannot be guaranteed with fewer than four cameras, but works well here and indeed leads to the best system hitherto reported on the CHIL data. Finally, the edge-alignment technique works very well once initialized but does not recover
from tracking failures. The authors suggested that a combination, using the particle-filtering approach for detection and initialization and the edge-alignment approach for tracking, may be feasible.

Review on: SimpleTrack: Understanding and Rethinking 3D Multi-Object Tracking
The authors present SimpleTrack, which analyzes 3D MOT (multiple object tracking) and proposes some very simple yet effective improvements, many of which have since been adopted by recent 3D MOT works [27]. Ziqi Pang, Zhichao Li, and Naiyan Wang make the following contributions. First, they summarize a tracking-by-detection framework. Second, they analyze some failure cases. Third, they propose some effective solutions. Finally, they also rethink some existing benchmarks so that researchers can compare fairly with each other [16]. The notion behind 3D MOT is to track objects coherently over time, which includes both localization and identification. In a general tracking-by-detection framework, the detection bounding boxes are associated with the existing track list on every frame: an association metric links every detection bounding box to a motion prediction of the tracks, and the matched boxes are then used to update the states of the tracks. The first failure case the authors notice relates to how the detection bounding boxes are preprocessed. The key insight is that there is a difference between object detection and multiple object tracking: because object detection wants to maximize mAP, detectors generally output many redundant bounding boxes just to improve recall. However, this confuses the trackers, so the authors propose removing the redundant bounding boxes with a more aggressive NMS. Compared with score filtering, this more aggressive NMS is even better because it keeps the spatial diversity of the bounding boxes. The second part focuses on better association. Previously, association used metrics such as IoU or L2 distance. However, IoU is not flexible enough if the frame rate is really low or the agent moves abruptly, in which case IoU may lose the target; and the L2 distance is not discriminative enough to be aware of, for example, the orientation [16]: in that case the tracker may associate a false-positive detection B with the motion prediction (the blue boxes) instead of the true positive A.
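A sketch of IoU and of the "more aggressive" NMS idea (a deliberately low IoU threshold); the boxes and threshold are illustrative assumptions:

```python
# IoU and a greedy NMS with a low (aggressive) threshold: redundant
# detector boxes are suppressed while spatially diverse boxes survive.
import numpy as np

def iou(a, b):
    """Boxes as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms(boxes, scores, iou_thresh=0.25):  # aggressive: low threshold
    order = np.argsort(scores)[::-1]
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (40, 40, 50, 50)]
print(nms(boxes, np.array([0.9, 0.8, 0.7])))  # [0, 2]: the overlapping box is dropped
```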
Figure 4. Life cycle management example
To overcome the disadvantages of the two association metrics, the authors propose using GIoU (generalized IoU), which combines the best of both worlds. The final part is about life-cycle management, that is, how to determine whether a track is alive or dead. Most works focus on better association because they believe this is the main source of ID switches. However, after some data analysis, the authors found that early termination causes more than 90% of the ID switches, which is really surprising. Early termination means that a track is initially assigned, then terminated too early, and the identity is switched to a new ID, producing one ID switch. To avoid this, a low-score detection bounding box can be used to indicate the existence of an object: if there is a low-score bounding box corresponding to a track, that box does not have to be output, but the track is kept alive. This is called two-stage association, and it really improves performance. Finally, on the Waymo Open Dataset and nuScenes the method is really competitive compared with related methods, which proves that the solutions are simple yet effective [16]. A brief note on the benchmark rethinking mentioned above: the first suggestion is to use higher frame rates, and the second is to output the motion-model predictions with their low scores; these two will give better tracking results and are also better for the MOT evaluation.

Review on: YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors
This new paper, YOLOv7, is all about the trainable bag of freebies that sets a new state of the art for real-time object detection. Before getting into the subject, it is worth recalling how the field arrived at this paper. Alexey Bochkovskiy took up the YOLO torch from the original author Joseph Redmon, who released the first three models of the YOLO series. When Redmon quit computer vision research for ethical reasons, Bochkovskiy maintained his work on YOLOv3 and also released YOLOv4. Chien-Yao Wang entered the computer vision research stage with cross-stage partial networks, which allowed YOLOv4 and v5 to build more efficient features; from that, Scaled-YOLOv4 was the first paper Bochkovskiy and Wang collaborated on, and in doing so they put a YOLOv5 PyTorch implementation over the line. Wang also released YOLOR, which introduced new methods using explicit and implicit knowledge in neural networks. Now they have joined again to release the YOLOv7 model. In this review, we discuss the abstract of the paper, how the algorithm works, which approaches were used and why, model comparisons, and finally why it is so remarkable.
The paper states that the model can efficiently serve video inputs ranging from 5 fps to 160 fps. YOLOv7 has the highest average precision, 56.8%, among real-time detectors, and outperforms both transformer-based and convolution-based object detectors. Some of the object detectors that YOLOv7 outperforms are YOLOR, YOLOX, YOLOv5, and others [17]. The abstract compares it with YOLOv4: because both models use a bag of freebies, the cost of running the model on the same dataset has been reduced by 50% thanks to its speed and accuracy, and the parameters in the hidden layers of the network are reduced by up to 40%. Model scaling has never been easy, yet they manage to maintain the original design and structure of a well-performing compound model. YOLOv7 achieves 1.5 times higher average precision than YOLOv4. This is a big deal because YOLOv7 has 75% fewer parameters and 36% less computation than YOLOv4.

How does it work? YOLO uses a single convolutional neural network to predict bounding boxes and class probabilities, considering the entire image in a single evaluation. For each unit, YOLO predicts multiple bounding boxes and the class probabilities for each box; predicting all bounding boxes across all classes in one step makes it a one-stage detection model. Unlike earlier object detection models, which localize objects in images by using only regions of the image with high probabilities of containing an object, YOLO considers the full image.

Now consider the architecture of YOLO. Image frames are passed through a backbone, whose features are combined and mixed in the neck, and then passed along to the head, where YOLO predicts the bounding boxes, the classes of the bounding boxes, and the objects within them. Let us consider each of these modules separately. First, the input layer is simply the image input you provide: a two-dimensional array with three channels (red, green, and blue), or a video input processed frame by frame. What is the backbone? It is a deep neural network composed mainly of convolutional layers whose main objective is to extract the essential features; the selection of the backbone is a key step, as it determines the performance of object detection. Often, pre-trained networks such as VGG-16 or ResNet-50, trained on ImageNet, are used for the backbone. For YOLOv7, the paper uses the following backbone designs: VoVNet, CSPVoVNet, and ELAN. Object detector models insert additional layers between the backbone and the head, which are referred to as the neck of the detector. The essential role of the neck is to collect feature maps from different stages; usually a neck is composed of several bottom-up paths and several top-down paths for feature enhancement, such as FPN and PAN. Detection happens in the head.
In one-stage detectors such as YOLO and SSD, the head is also called dense prediction: the detector makes the localization and classification predictions at the same time. Sparse prediction, by contrast, is used by two-stage detectors such as Faster R-CNN and R-FCN, which decouple the object localization and classification tasks. YOLO is one-stage; together, backbone, neck, and head form the YOLO architecture.

Let us dive deeper into the technical terms previously mentioned. The first term is bag of freebies, which refers to increasing model accuracy through improvements that do not increase the inference cost. Older versions such as YOLOv4 also used bag-of-freebies methods; here we summarize some of the trainable bag of freebies used in this particular paper. First, batch normalization in the conv-bn-activation topology: this connects the batch-normalization layer directly to the convolutional layer, with the purpose of integrating the mean and variance of batch normalization into the bias and weight of the convolutional layer at the inference stage [17]. Second, implicit knowledge in YOLOR is combined with the convolutional feature maps by addition and multiplication: implicit knowledge in YOLOR can be simplified to a vector by pre-computing it at the inference stage, and this vector can be combined with the bias and weight of the previous or subsequent convolutional layer. Third, the final EMA model: EMA is a technique used in Mean Teacher, and in this system the EMA model is used purely as the final inference model. The authors also use the lead head prediction to generate coarse-to-fine hierarchical labels, use extended efficient layer aggregation networks, and perform model scaling for concatenation-based models, with identity connections folded into a single convolutional layer [17].

Finally, the authors used Microsoft's COCO dataset to train YOLOv7 from scratch, without using any other datasets or pre-trained weights. During the research, it was found that the average precision was higher when the IoU threshold was increased. IoU (intersection over union) describes the extent of overlap of two boxes: the greater the region of overlap, the greater the IoU. We train a model to output a box that fits perfectly around an object. For example, in Figure 5 we have a green box representing the ground truth and a red box representing the prediction from our model; the aim is to keep improving the prediction until the red box and the green box perfectly overlap, that is, until the IoU between the two boxes equals one. Coming to layer aggregation networks: the efficiency of the convolutional layers in a YOLO network's backbone is essential to efficient inference speed, and the authors started down the path of maximum efficiency with cross-stage partial networks in YOLOv7.
Figure 5. Example model
Figure 6. Layer Aggregation Network
Figure 7. Model re-parameterizing

The authors built on and researched this topic, keeping in mind the amount of memory it takes to keep layers in memory along with the distance it takes a gradient to back-propagate through the layers: the shorter the gradient, the more powerfully the network can learn. For the final layer aggregation they chose E-ELAN, an extended version of the ELAN computational block. Scaling concatenation-based models changes the input width of some layers and the depth of those models; this greatly supports the model in increasing accuracy, as the model becomes capable of identifying both small and large objects. The scaling factors the model depends on are resolution, depth, stage, and width. For model re-parameterization, the authors use gradient flow propagation paths to analyze how re-parameterized convolutions should be combined with different networks. RepConv combines a 3×3 convolution, a 1×1 convolution, and an identity connection in one convolutional layer; RepConv without the identity connection (RepConvN) is used to design the architecture of the planned re-parameterized convolution [17, 26].
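A NumPy sketch of the re-parameterization arithmetic behind RepConv (batch-norm folding and biases omitted for brevity; this is the generic RepVGG-style merge, not the authors' exact code):

```python
# At inference time a 3x3 branch, a 1x1 branch, and an identity branch
# are merged into a single equivalent 3x3 kernel.
import numpy as np

def merge_repconv(k3, k1, identity=True):
    """k3: (C, C, 3, 3) kernel; k1: (C, C, 1, 1) kernel; per-channel identity."""
    C = k3.shape[0]
    merged = k3.copy()
    merged[:, :, 1, 1] += k1[:, :, 0, 0]   # pad the 1x1 kernel into the 3x3 center
    if identity:
        for c in range(C):
            merged[c, c, 1, 1] += 1.0      # identity = centered per-channel delta
    return merged

k3 = np.random.rand(4, 4, 3, 3)
k1 = np.random.rand(4, 4, 1, 1)
merged = merge_repconv(k3, k1)
print(merged.shape)  # (4, 4, 3, 3): three branches, one convolution
```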
The re-parameterization technique involves averaging a set of model weights to create a model that is more robust to the general patterns it is trying to model. In recent research there has been a decent focus on model re-parameterization, where a piece of the network has its own re-parameterization strategy. The YOLOv7 authors use gradient flow propagation paths to see which modules in the network should use re-parameterization strategies and which should not. Two kinds of ensembling are considered: model-level ensembling, where either the weights of a model at different iteration numbers are averaged, or multiple identical models trained with different data are averaged; and module-level ensembling, which is used in YOLOv7 for re-parameterization. A module is split into multiple identical or different module branches during training, and the multiple branch modules are integrated into a completely equivalent module during inference.

The next topic is the auxiliary head. The head responsible for the final output is called the lead head, and the head used to assist training is called the auxiliary head. The lead head prediction is used as guidance to generate coarse-to-fine hierarchical labels. The reason for this is that the lead head has a relatively strong learning capability, so the soft labels derived from it should be representative of the distribution of, and correlation between, the source data and the target. The fine label is the same as the soft label generated by the lead head guided label assigner, and the coarse label is generated by allowing more grids to be treated as positive targets through relaxing the constraints of the positive sample assignment process. Comparing the model against existing models, the algorithm is impressive and provides a lot of scope for improvement: it is able to predict bounding boxes properly with high confidence, and it predicts images and videos more accurately. The paper has done an impressive job. The placement problem of the re-parameterized model has been overcome by using gradient flow propagation paths to analyze how re-parameterized convolutions can be combined with different networks, combining the 3×3 convolution, the 1×1 convolution, and identity connections in one convolutional layer (RepConvN). The model overcame the problem of dynamic label assignment by using the coarse-to-fine lead head guided label assigner. This auxiliary head was really helpful in increasing the efficiency of the model. The paper also introduced extended efficient layer aggregation networks and compound scaling for model scaling [17, 23].
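A minimal sketch of the EMA-based model-level ensemble mentioned above, with an assumed decay value:

```python
# A shadow copy of the weights is updated each training step with an
# exponential moving average (EMA) and used as the inference model.
def ema_update(shadow, weights, decay=0.999):
    """shadow, weights: dicts of parameter name -> value."""
    for name, w in weights.items():
        shadow[name] = decay * shadow[name] + (1.0 - decay) * w
    return shadow

weights = {"conv1": 1.0}
shadow = dict(weights)
for step in range(3):
    weights["conv1"] += 0.1          # pretend a training step moved the weight
    shadow = ema_update(shadow, weights)
print(weights["conv1"], shadow["conv1"])  # the shadow lags smoothly behind
```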
Figure 8. Results

4. Conclusion
Video object tracking is the process of monitoring an object throughout a video: you want to localize the object and then predict its trajectory, so that you know where it is going to be in the next frame. There are different categories of video object tracking: multi-object or single-object trackers; online or offline, depending on whether the models are pre-trained or updated on the fly; and detection-based trackers. Video object tracking has many applications, from medical imaging and robotics to fields like sports analytics. A tracker has two main components. The detection component typically has an appearance model and leverages spatial features; detection models work frame by frame. The object motion model then ties those frames together, so that a detection localized in one frame can be predicted in the following frames. One of the main challenges in tracking is occlusion: if two tracked objects get very close to each other, or an object goes behind some other structure and the detections are lost, the track persists because the trajectory is still being predicted, but the object itself can no longer be seen. When it becomes visible again, the track carrying the predicted trajectory must be accurately paired with the detection that represents that specific object. A lot of tracking systems are also sensitive to appearance and scale changes, which is one of the main reasons deep learning is now used in the tracking field: it adds an extra layer that augments pre-existing tracking methods. However, deep learning models are very computationally expensive, and running them at a real-time frame rate, usually 30 or 60 frames per second, requires them to be fairly lightweight, especially on platforms such as self-driving cars or aircraft, where the weight of additional hardware is limited; that is definitely a challenge that needs to be mitigated. One of the largest applications in the commercial sector is self-driving cars. A clip from an Nvidia video shows some of this driverless-car technology: each tracked object is assigned a bounding box and a unique object identifier, which helps keep track, from frame to frame across the video sequence, of what that object is and stores information about it. One small difference between this video and some of the methods discussed above, such as the mask method, is that it is a segmentation method, so every single pixel in the frame has been
identified by the outputs of the system. Some of the newer detectors, like the YOLO algorithm, are tied to a classifier, so the object can be localized and classified at the same time. With the second component, the detection of the object of interest is used to instantiate a track. The track is where the trajectory of the object is estimated, and the two main ways to do that are measurement-dynamics models such as the Kalman filter [21, 22]. There are also deep learning methods that can be used to predict the trajectory of objects; typically, deep learning in computer vision sits more on the detector side, so this is a fairly novel advancement, and the specific neural network used in this context is an LSTM (long short-term memory) network. Once a detection and a track are available, they must be matched: an association algorithm computes the similarity between detections and tracks and pairs them, so that the track estimates can be updated with the information from the current detection frame. Assuming the track information is current up to the previous frame and the detection comes from the current frame, the track is updated and the predictions roll forward, and the output is a consistent identity label: car number one should remain car number one from frame one to the end of the video sequence. Then there is a track-maintenance step: if a tracked object yields no detections for a certain amount of time, the track may be deleted or its trustworthiness score downgraded. The classical state-of-the-art tracking method considered here is not deep learning, and it is worth concluding with it because it gives a good understanding of the components within a tracker onto which the deep learning methods build. It is called SORT, a simple online and real-time tracker. It is a multiple-object tracker, and it works in real time at the frame rate because it has no deep learning step; consequently, it is very sensitive to things like occlusions and scale changes. The good thing about this algorithm is that it is very lightweight. For the detection step, which is the first step, a CNN-based detection algorithm can be used; it still has to be one of the more lightweight ones, but it can be combined with the rest of the algorithm while still running at the frame rate. The most commonly used detection algorithm is YOLO. The reason it is preferred over some others lies in its name, "you only look once": other convolutional-network detectors slide a window over the image, which means passing over the image something like 2,000 times, while YOLO passes over it once, saving a lot of resource consumption. Once the detection is available and the track is instantiated from it, the track inherits the detection's box.
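A minimal constant-velocity Kalman filter sketch of this predict/update cycle (the noise covariances are assumed values):

```python
# Constant-velocity Kalman filter: predict where a detection will be in the
# next frame, then fuse in the new measurement.
# State = [x, y, vx, vy]; measurement = [x, y].
import numpy as np

dt = 1.0  # one frame
F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1.]])
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0.]])
Q = np.eye(4) * 1e-2   # process noise (assumed)
R = np.eye(2) * 1.0    # measurement noise (assumed)

def predict(x, P):
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    S = H @ P @ H.T + R              # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)   # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

x, P = np.zeros(4), np.eye(4)
x, P = predict(x, P)
x, P = update(x, P, z=np.array([2.0, 1.0]))
print(x[:2])  # position estimate pulled toward the measurement
```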
Then there is the estimation step with a Kalman filter, which takes in the position state. The velocity state undergoes dynamic-model updates and measurement updates, with different noise models included as well; this allows the position on the current frame to be estimated recursively, so after the tracks are updated, the detections are updated. Then there is the association step. The common method used is the Hungarian method, essentially a cost-minimization algorithm, with the Mahalanobis distance as the metric being minimized. The reason this distance is used is that everything coming out of the Kalman filter is a distribution: the Mahalanobis distance takes the distributions into account, as opposed to something like the Euclidean distance, which applies only to single point estimates. Then there is the track-maintenance step, where counters are kept on things like track age and association history. Information such as which detections a track has been associated with, for how many frames it has been seen, and for how many it has had no detections can be used cleverly to upgrade or downgrade tracks so as to keep the objects of interest. Images of a SORT tracker show that when targets cross over, a wedge forms that indicates the area being searched for detections; when targets cross or there is an occlusion, that information is lost and the detections become ambiguous. This algorithm is sensitive to that, and the way to fix that sensitivity is with deep learning: the first deep learning tracker, Deep SORT, is essentially the SORT algorithm augmented with a learned appearance model. The main challenge in this topic is to find a balance between computational efficiency and performance. All of these methods have characteristics and limitations under certain circumstances, defined as follows:

Lighting: Light differs in many circumstances; low light adds darkness to the image, while strong light adds shadow to the object.

Positioning: Template matching requires a uniform position; otherwise, it cannot detect the object, even if it is present in the image.

Rotation: The image can be rotated in any direction; in this case, some shapes cannot be identified if the shape-matching method is used.

Occlusion: An object behind another object is sometimes not completely visible, so it cannot be detected, and the useful part can be ignored [24, 25].
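A sketch of the association step described above, using SciPy's Hungarian solver over Mahalanobis distances; the track and detection positions are illustrative:

```python
# Hungarian (linear-sum) assignment over Mahalanobis distances between
# track predictions and new detections.
import numpy as np
from scipy.optimize import linear_sum_assignment

def mahalanobis(z, x_pred, S):
    d = z - x_pred
    return float(np.sqrt(d @ np.linalg.inv(S) @ d))

tracks = [np.array([0.0, 0.0]), np.array([10.0, 10.0])]   # predicted positions
S = np.eye(2)                                             # innovation covariance (assumed)
detections = [np.array([9.5, 10.2]), np.array([0.3, -0.1])]

cost = np.array([[mahalanobis(z, t, S) for z in detections] for t in tracks])
rows, cols = linear_sum_assignment(cost)   # minimizes total distance
print(list(zip(rows, cols)))               # [(0, 1), (1, 0)]
```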
Our goal is to detect and track all objects in a scene, and usually the scenes we look at contain objects of the same type, as in pedestrian tracking or car tracking, with many of them present, so there are many occlusions. There are problems with different viewpoints, and therefore different levels of occlusion, depending also on the viewpoint of the camera. We can also have moving cameras; these are all types of scenes that we want to deal with using a single algorithm.

AUTHORS
Abir Nasry∗ – Intelligent Processing and Security of System Team, Faculty of Science, Mohammed V University, Rabat, Morocco, e-mail: nasryabir@gmail.com.
Abderrahmane Ezzahout – Intelligent Processing and Security of System Team, Faculty of Science, Mohammed V University, Rabat, Morocco, e-mail: abderrahmane.ezzahout@um5.ac.ma.
Fouzia Omary – Intelligent Processing and Security of System Team, Faculty of Science, Mohammed V University, Rabat, Morocco, e-mail: omary@fsr.ac.ma.
∗
Corresponding author
References
[1] A. W. Senior, G. Potamianos, S. Chu, Z. Zhang, and A. Hampapur. "A Comparison of Multicamera Person-Tracking Algorithms." IBM T. J. Watson Research Center, PO Box 704, Yorktown Heights, NY 10598, USA.
[2] S. Yu, Y. Yang, X. Li, and A. G. Hauptmann. "Long-Term Identity-Aware Multi-Person Tracking for Surveillance Video Summarization," arXiv:1604.07468v2 [cs.CV], 11 Apr 2017.
[3] F. Fleuret, J. Berclaz, R. Lengagne, and P. Fua. "Multicamera People Tracking with a Probabilistic Occupancy Map," IEEE TPAMI, 2008. The work was supported in part by the Swiss Federal Office for Education and Science and in part by the Indo Swiss Joint Research Programme (ISJRP).
[4] T. Dobbert. Matchmoving: The Invisible Art of Camera Tracking. Sybex, Feb 2005. ISBN 0-7821-4403-9.
[5] L. Mihaylova, P. Brasnett, N. Canagarajah, and D. Bull. "Object Tracking by Particle Filtering Techniques in Video Sequences," in Advances and Challenges in Multisensor Data and Information, NATO Security Through Science Series, 8. Netherlands: IOS Press, 2007, pp. 260–268.
[6] K. Chandrasekaran. Thesis: Parametric & Non-Parametric Background Subtraction Model with Object Tracking for VENUS. Rochester Institute of Technology, 2010.
[7] L. Bao, B. Wu, and W. Liu. "CNN in MRF: Video Object Segmentation via Inference in a CNN-Based Higher-Order Spatiotemporal MRF," IEEE Conference on Computer Vision and Pattern Recognition, 2018. DOI: 10.1109/CVPR.2018.00626.
[8] C. Feichtenhofer, A. Pinz, and A. Zisserman. "Detect to Track and Track to Detect," IEEE International Conference on Computer Vision, 2017.
[9] B. Li, J. Yan, W. Wu, Z. Zhu, and X. Hu. "High Performance Visual Tracking with Siamese Region Proposal Network," IEEE Conference on Computer Vision and Pattern Recognition, 2018.
[10] M. Danelljan, G. Bhat, F. S. Khan, and M. Felsberg. "ECO: Efficient Convolution Operators for Tracking," IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[11] Q. Wang, L. Zhang, L. Bertinetto, W. Hu, and P. H. S. Torr. "Fast Online Object Tracking and Segmentation: A Unifying Approach," IEEE Conference on Computer Vision and Pattern Recognition, 2019, doi: 10.1109/CVPR.2019.00142.
[12] Z. Zhu, Q. Wang, B. Li, W. Wu, J. Yan, and W. Hu. "Distractor-Aware Siamese Networks for Visual Object Tracking," European Conference on Computer Vision, 2018.
[13] T. Yang and A. B. Chan. "Learning Dynamic Memory Networks for Object Tracking," European Conference on Computer Vision, 2018.
[14] "Basic Concept and Technical Terms," Ishikawa Watanabe Laboratory, University of Tokyo. Retrieved 12 February 2015. (Background subtraction is the process by which moving regions are segmented in image sequences.)
[15] P. Mountney, D. Stoyanov, and G.-Z. Yang. "Three-Dimensional Tissue Deformation Recovery and Tracking: Introducing Techniques Based on Laparoscopic or Endoscopic Images," IEEE Signal Processing Magazine, vol. 27, July 2010, pp. 14-24.
[16] Z. Pang, Z. Li, and N. Wang. "SimpleTrack: Understanding and Rethinking 3D Multi-Object Tracking," arXiv:2111.09621v1 [cs.CV], 18 Nov 2021.
[17] C. Wang, A. Bochkovskiy, and H. M. Liao. "YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors," arXiv:2207.02696v1 [cs.CV], 6 Jul 2022.
[18] P. Dai, R. Weng, W. Choi, C. Zhang, Z. He, and W. Ding. "Learning a Proposal Classifier for Multiple Object Tracking," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 2443-2452.
[19] J.-N. Zaech, A. Liniger, D. Dai, M. Danelljan, and L. Van Gool. "Learnable Online Graph Representations for 3D Multi-Object Tracking," IEEE Robotics and Automation Letters, 2022.
[20] L. Lin, H. Fan, Y. Xu, and H. Ling. "SwinTrack: A Simple and Strong Baseline for Transformer Tracking," arXiv preprint arXiv:2112.00995, 2021.
[21] J. Pang, L. Qiu, X. Li, H. Chen, Q. Li, T. Darrell, and F. Yu. "Quasi-Dense Similarity Learning for Multiple Object Tracking," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 164-173.
[22] F. Zeng, B. Dong, T. Wang, X. Zhang, and Y. Wei. "MOTR: End-to-End Multiple-Object Tracking with Transformer," arXiv preprint arXiv:2105.03247, 2021.
[23] J.-N. Zaech, A. Liniger, M. Danelljan, D. Dai, and L. Van Gool. "Adiabatic Quantum Computing for Multi-Object Tracking," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 8811-8822.
[24] X. Han, Q. You, C. Wang, Z. Zhang, P. Chu, H. Hu, J. Wang, and Z. Liu. "MMPTrack: Large-Scale Densely Annotated Multi-Camera Multiple People Tracking Benchmark," arXiv preprint arXiv:2111.15157, 2021.
[25] X. Zhang, X. Wang, and C. Gu. "Online Multi-Object Tracking with Pedestrian Re-Identification and Occlusion Processing," The Visual Computer, vol. 37, no. 5, 2021, pp. 1089-1099.
[26] K. Cho and D. Cho. "Autonomous Driving Assistance with Dynamic Objects Using Traffic Surveillance Cameras," Applied Sciences, vol. 12, no. 12, 2022, p. 6247.
[27] A. Cioppa, S. Giancola, A. Deliege, L. Kang, X. Zhou, Z. Cheng, B. Ghanem, and M. Van Droogenbroeck. "SoccerNet-Tracking: Multiple Object Tracking Dataset and Benchmark in Soccer Videos," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 3491-3502.
MODEL‐FREE SLIDING MODE CONTROL FOR A NONLINEAR TELEOPERATION SYSTEM WITH ACTUATOR DYNAMICS Submitted: 4th July 2022; accepted: 3rd January 2023
Henni Mansour Abdelwaheb, Kacimi Abderrahmane, Belaidi AEK
DOI: 10.14313/JAMRIS/1-2023/9
Abstract: Teleoperation robotic systems control, which enables humans to perform activities in remote situations, has become an extremely challenging field in recent decades. In this paper, a Model-Free Proportional-Derivative Sliding Mode Controller (MFPDSMC) is devoted to the synchronization problem of teleoperation systems subject to actuator dynamics, time-varying delay, model uncertainty, and input interaction forces. For the first time, the teleoperation model used in this study combines actuator dynamics and manipulator models into a single equation, which improves model accuracy and brings it closer to the actual system than in prior studies. Further, the proposed model-free control approach involves only simple measurements of inputs and outputs to enhance the system's performance, without relying on any knowledge from the mathematical model. In addition, our strategy includes a sliding mode term together with the MFPD term to increase system stability and attain excellent performance against external disturbances. Finally, using the Lyapunov function under specified conditions, asymptotic stability is established, and simulation results are compared and provided to demonstrate the efficacy of the proposed strategy.
Keywords: Model Free Sliding Mode Controller, Teleoperation robotic systems, Actuator dynamics, Time-varying delay, Model uncertainty
1. Introduction
Manipulation tasks in hazardous, inaccessible, and extreme environments pose significant challenges for modern industrial technology [1-5], including space engineering, drilling, robotic surgery, nuclear detection, undersea exploration, and even public health interventions such as physical distancing restrictions to limit the spread and transmission of the novel coronavirus [6]. Teleoperation systems are a secure solution for overcoming these challenges. Such systems are defined as the interconnection of five elements: a human operator exerting force on a local master manipulator coupled via a communication channel to a remote slave manipulator interacting with an environment, so as to obtain a sense of telepresence via force feedback. Stability, synchronization, and transparency are the three fundamental objectives for a teleoperation system operating under numerous nonlinearities, uncertainties, and
data transmission delays. Many practical approaches have been proposed, and they can be broadly classified into model-based methods and non-model methods. Designing a model-based controller necessitates a validated mathematical model that must incorporate all of the operating circumstances of the process, which poses specific challenges for practical model-based controllers. Thus, model-based control such as computed torque control [7], inverse dynamics control [8], model reference adaptive control [9], etc., faces many obstacles in achieving the desired performance due to the presence of nonlinearities, uncertainties, transmission delays, and non-passive external interaction forces. In contrast, a non-model controller does not require prior information about the system dynamics; it adjusts online to the manipulator's unknown dynamics. That is why non-model controllers, such as the Proportional-Integral-Derivative (PID) controller, are often suitable and more practical for industrial systems. However, such schemes are sensitive to parameter variations and external disturbances, and they can yield poor performance when the system presents a vast operating domain. As a result, a more sophisticated and durable controller is necessary to improve the performance of the nonlinear teleoperation system. Several nonlinear approaches have been developed and proposed in the literature to cope with the above-mentioned issues. Z. Chen et al. developed a least-squares adaptive algorithm for robust control to meet the passivity of teleoperation systems and to deal with the transparency-stability trade-off under time-varying delays [10]. A novel Barrier Lyapunov Function associated with an adaptive control algorithm is addressed for bilateral teleoperation systems to cope with output-constrained issues and to achieve fixed-time convergence performance under time delays, system uncertainties and external disturbances [11]. To guarantee the system's stability, Tong et al. [12] convert the power signals of the system using wave-variable transformation. However, the system transparency performance can be largely decreased due to the issue of wave reflection [13]. The adaptive law in [14] deals with dynamic and kinematic uncertainties and ensures the system's stability and efficient functionality, but only under constant time delay. In [15], a nonlinear adaptive fuzzy backstepping controller has been presented for master and slave robots to handle the nonlinearities and uncertainties. A new adaptive
neural synchronization control was presented in [16] for bilateral teleoperation systems with time-varying delay and unknown backlash-like hysteresis. The interaction force issue between a surgical manipulator and the patient's tissues has been tackled in [17] using a nonlinear disturbance observer (DOB) with a sliding mode controller; the experimental results show the effectiveness of such a method. Utilizing a velocity observer based on a novel nonsingular fast integral terminal sliding mode (NFITSM) surface, Yang et al. employed an NFITSM-based finite-time controller to cope with the synchronization problem of a teleoperation system in the presence of uncertainties [18]. Recently, a very interesting approach called model-free control (MFC), introduced by Fliess et al. [19], has provided good results for practical processes. Such an algorithm is based on an equivalent ultra-local model of the system, updated online using only input and output measurements, and normally contains proportional (P), proportional-integral (PI), proportional-derivative (PD), or proportional-integral-derivative (PID) controllers together with compensation terms for the estimation errors. The gains of the MFC controller can be tuned through the estimates of the uncertainties, which brings better performance compared with the classical controller. Many scholars have successfully applied the MFC scheme to deal with uncertainties and external disturbances in many areas, notably attitude control of a quadrotor [20], a two-wheeled inverted pendulum [21], a flapping-wing flying robot [22], a robotic exoskeleton [23], experimental greenhouses [24], glycemia regulation of type-1 diabetes [25], thermal processes [26], wheeled autonomous vehicles [27], a twin-rotor aerodynamic system [28], and more. To the best of our knowledge, there is no literature on model-free control for teleoperation systems. Furthermore, the majority of the literature cited above skipped the actuator dynamics and only considered the manipulator body dynamics. In practice, neglecting the dynamics of the actuators might result in a loss of system performance or stability. For all these needs, this paper investigates a novel model-free proportional-derivative controller based on the sliding mode approach for teleoperated robotic systems, including actuator dynamics, model uncertainties, and time-varying delay. The main features of this work are as follows. First, the proposed scheme is completely model-free; that is, it only requires the measurement of inputs and outputs to improve the system's performance without depending on any information from the mathematical model. Second, compared to previous studies, the teleoperation model presented in this paper performs better in terms of accuracy and reduction of unmodeled disturbances, as it incorporates more details of the actual teleoperation system by taking into account the dynamics of the actuators. Third, to overcome the implementation problem, our controller was presented as a PD controller, which is extremely useful
in practice, especially for nonlinear systems such as teleoperation robotics. Finally, the stability analysis of the closed-loop system is established using the Lyapunov function, and a simulation validation is presented to highlight the effectiveness of the proposed controller. The rest of the paper is structured as follows: In Section 2, which follows the introduction, the nonlinear teleoperation manipulators, including actuator dynamics, are modeled, together with the properties used throughout the paper. The proposed MFPDSMC controller with the stability analysis is explained in Section 3. In Section 4, simulation results are provided, followed by the conclusions in Section 5.
2. System Modeling and Problem Formulation
A master-slave teleoperation system is generally expressed in dynamic equations as follows:

$$M_m(q_m)\ddot{q}_m + C_m(q_m,\dot{q}_m)\dot{q}_m + G_m = \tau_m + \tau_h$$
$$M_s(q_s)\ddot{q}_s + C_s(q_s,\dot{q}_s)\dot{q}_s + G_s = \tau_e - \tau_s \quad (1)$$

where $i = m, s$ stands for the master/slave manipulator, respectively; $q_i$, $\dot{q}_i$ and $\ddot{q}_i$ are the position, velocity, and acceleration of the master and slave dynamic systems, respectively; $M_i$ is the positive-definite inertia matrix; $C_i$ is the matrix of centripetal and Coriolis torques; $G_i$ is the gravitational torque; $\tau_h$ and $\tau_e$ are the human-operator torque and the environment torque, respectively; $\tau_m$ and $\tau_s$ are the control inputs of the teleoperation manipulators.

Recently, there has been a lot of interesting research into the effect of actuator dynamics on the response of manipulator robots [29]. If we consider an armature DC motor as the actuator in each joint, it can be expressed as:

$$J_{ai}\ddot{\theta}_{ai} + B_{ai}\dot{\theta}_{ai} = \tau_{ai} - g_r\tau_i \quad (2)$$

The gear ratio $g_r$ relating the joint position $q_i$ and the motor shaft position $\theta_{ai}$ is described as:

$$g_r = \frac{\theta_{ai}}{q_i} \quad (3)$$

where $\tau_a$ is the motor torque; $J_{ai} = \mathrm{diag}(J_{a1}, J_{a2}, \ldots, J_{an})$ is the moment-of-inertia matrix of the motor combined with the gearbox inertia, and $B_{ai} = \mathrm{diag}(B_{a1}, B_{a2}, \ldots, B_{an})$ represents the viscous friction matrix of the motor shaft. Including the actuator dynamic expression (2), the model expression (1) is rewritten as:

$$M_{hm}\ddot{q}_m + C_{hm}\dot{q}_m + G_{hm} = \tau_{am} + g_r\tau_h$$
$$M_{hs}\ddot{q}_s + C_{hs}\dot{q}_s + G_{hs} = g_r\tau_e - \tau_{as} \quad (4)$$

where

$$M_{hm} = M_m g_r + J_{am} g_r^{-1}, \qquad M_{hs} = M_s g_r - J_{as} g_r^{-1}$$
$$C_{hm} = C_m g_r + B_{am} g_r^{-1}, \qquad C_{hs} = C_s g_r - B_{as} g_r^{-1}$$
$$G_{hi} = G_i g_r$$
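As a quick illustration of Eq. (4), the sketch below folds the motor terms of Eq. (2) into the link terms of Eq. (1). The function and argument names are ours, and diagonal gear-ratio matrices are assumed; this is a sketch, not the authors' implementation.

```python
import numpy as np

def combined_dynamics(M, C, G, Ja, Ba, gr, slave=False):
    """Combined manipulator-motor terms of Eq. (4).

    M, C: n x n link inertia and Coriolis matrices from Eq. (1);
    G: gravity torque vector (n,); Ja, Ba: diagonal motor inertia and
    viscous-friction matrices from Eq. (2); gr: diagonal gear-ratio
    matrix from Eq. (3). The master takes +Ja, +Ba; the slave -Ja, -Ba.
    """
    gr_inv = np.linalg.inv(gr)
    sign = -1.0 if slave else 1.0
    Mh = M @ gr + sign * Ja @ gr_inv   # M_h = M g_r +/- J_a g_r^-1
    Ch = C @ gr + sign * Ba @ gr_inv   # C_h = C g_r +/- B_a g_r^-1
    Gh = G @ gr                        # G_h = G g_r
    return Mh, Ch, Gh
```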
The manipulator dynamics and the combined manipulator‐motor dynamics have the following properties:
Property 1: The inertia matrix $M_h$ is bounded and positive definite, which means $M_h^T = M_h$ and

$$m_{\min}\|q_i\|^2 \le q_i^T M_h q_i \le m_{\max}\|q_i\|^2$$

Property 2: The robotic manipulator is a passive system, which means the matrix $\frac{1}{2}\dot{M}_h - C_h$ is skew symmetric, i.e.,

$$q_i^T\left(\frac{1}{2}\dot{M}_h - C_h\right)\dot{q}_i = 0 \quad (5)$$

where $q_i \in \mathbb{R}^{n\times 1}$ is any nonzero vector.

3. Controller Design
3.1. The Proposed MFPDSMC Controller
A model-free control is a nonlinear control in which the mathematical model of a system is replaced by an ultra-local model equation with a small number of parameters. Those parameters are updated using only the input-output information of the system. The expression of the ultra-local model is given by:

$$y^{(n)} = F + \alpha\tau_a \quad (6)$$

where
- $y$ is the output of the plant;
- $n$ is the order of time derivation of the output $y$ (generally $n$ is chosen equal to 1 or 2);
- $F$ is the unknown part of all exogenous perturbations and unmodeled dynamics, such as nonlinearities and uncertainties;
- $\tau_a$ is the control torque;
- $\alpha$ is an arbitrary constant parameter chosen such that $y^{(n)}$ and $\alpha\tau_a$ have the same size.

Taking $n = 2$, the estimate of $F$ is defined as follows:

$$\hat{F} = \hat{y}^{(2)} - \alpha\tau_a \quad (7)$$

where $\hat{y}$ is the estimate of $y$. For the estimation of $y$, various strategies based on algebraic methods are used [30]. To avoid algebraic-loop issues, we take a first-order derivative plus a low-pass filter to generate $\hat{y}$:

$$H = \left(\frac{K_1 s}{T_1 s + 1}\right)^2 \quad (8)$$

The MFPDSMC control law for the master and slave robots can be defined as follows:

$$\tau_{am} = \frac{1}{\alpha}\left(-\hat{F}_m + \hat{\ddot{q}}_s(t-T) - K_p e_m - K_d\dot{e}_m\right) - \left(KS_m + \mu\,\mathrm{sign}(S_m)\right)$$
$$\tau_{as} = \frac{1}{\alpha}\left(-\hat{F}_s + \hat{\ddot{q}}_m(t-T) - K_p e_s - K_d\dot{e}_s\right) - \left(KS_s + \mu\,\mathrm{sign}(S_s)\right) \quad (9)$$

where

$$e_m = q_m(t) - q_s(t-T) \quad (10)$$
$$e_s = q_s(t) - q_m(t-T) \quad (11)$$
$$S_m = \dot{e}_m + \lambda e_m, \qquad S_s = \dot{e}_s + \lambda e_s \quad (12)$$
$$\hat{\ddot{q}}_m(t-T) = \left(\frac{K_1 s}{T_1 s + 1}\right)^2 \ddot{q}_m(t-T) \quad (13)$$
$$\hat{\ddot{q}}_s(t-T) = \left(\frac{K_1 s}{T_1 s + 1}\right)^2 \ddot{q}_s(t-T) \quad (14)$$

Using Equation (6), the teleoperation system model can be rewritten as:

$$\ddot{q}_m(t) = F_m + \alpha\tau_{am}, \qquad \ddot{q}_s(t) = F_s + \alpha\tau_{as} \quad (15)$$

By making a good estimate of $F$, i.e., $\hat{F}_i \Rightarrow F_i$, combining Eqs. (9) and (15) yields:

$$\ddot{e}_m + K_p e_m + K_d\dot{e}_m + \left(KS_m + \mu\,\mathrm{sign}(S_m)\right) = 0$$
$$\ddot{e}_s + K_p e_s + K_d\dot{e}_s + \left(KS_s + \mu\,\mathrm{sign}(S_s)\right) = 0 \quad (16)$$

Note that we end up with a linear differential equation with constant coefficients. The estimated term of $F$, which manages the unknown part of all exogenous perturbations and unmodeled dynamics, makes it simple to tune $K_p$ and $K_d$ for satisfactory performance. This is a substantial advantage when contrasted with the classic PD controller.
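Read as a per-sample recipe, the law (9) with the estimate (7) needs only the delayed remote signal, the local filtered acceleration, and the previously applied torque. The sketch below is one possible discrete implementation under our own assumptions (per-joint scalar gains, and a boundary-layer saturation standing in for sign(S), mirroring Case 1 of the stability analysis); it is not the authors' code.

```python
import numpy as np

def mfpdsmc_step(e, e_dot, qdd_hat, qdd_remote_hat, tau_prev,
                 Kp, Kd, alpha, K, mu, lam, gamma):
    """One sample of the MFPDSMC law (9) for one joint.

    e, e_dot: position tracking error and its derivative (Eqs. 10-12);
    qdd_hat: filtered local acceleration obtained through H (Eq. 8);
    qdd_remote_hat: filtered, delayed remote acceleration (Eqs. 13-14);
    tau_prev: previously applied torque, used in the estimate (7);
    gamma: boundary-layer width smoothing sign(S) -- our addition.
    """
    F_hat = qdd_hat - alpha * tau_prev           # ultra-local estimate (7)
    S = e_dot + lam * e                          # sliding surface (12)
    sgn = np.clip(S / gamma, -1.0, 1.0)          # smoothed sign(S)
    tau = (-F_hat + qdd_remote_hat
           - Kp * e - Kd * e_dot) / alpha        # MFPD part of (9)
    tau -= K * S + mu * sgn                      # sliding-mode part of (9)
    return tau
```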
3.2. Stability Analysis
Define the Lyapunov function $V$ as:

$$V = \frac{1}{2}S_m^2 + \frac{1}{2}S_s^2 \quad (17)$$

The time derivative of $V$ is:

$$\dot{V} = S_m\dot{S}_m + S_s\dot{S}_s \quad (18)$$

Introducing the state variable errors

$$x_{m1} = e_m, \qquad x_{m2} = \dot{e}_m \quad (19)$$
$$x_{s1} = e_s, \qquad x_{s2} = \dot{e}_s \quad (20)$$

the sliding surfaces are rewritten using the new state variables:

$$S_m = x_{m2} + \lambda x_{m1}, \qquad S_s = x_{s2} + \lambda x_{s1} \quad (21)$$

The time derivatives of $S_m$ and $S_s$ are:

$$\dot{S}_m = \dot{x}_{m2} + \lambda x_{m2}, \qquad \dot{S}_s = \dot{x}_{s2} + \lambda x_{s2} \quad (22)$$

Since the same procedure is used to estimate $F_i$, $\ddot{q}_i$, $\ddot{q}_m(t-T)$ and $\ddot{q}_s(t-T)$, we can define the estimation errors $e_{est}$ as:

$$e_{est\_m} = F_m - \hat{F}_m = \ddot{q}_m - \hat{\ddot{q}}_m = \ddot{q}_s(t-T) - \hat{\ddot{q}}_s(t-T) \quad (23)$$
$$e_{est\_s} = F_s - \hat{F}_s = \ddot{q}_s - \hat{\ddot{q}}_s = \ddot{q}_m(t-T) - \hat{\ddot{q}}_m(t-T) \quad (24)$$
Introducing (9) into (15), we get:

$$\ddot{q}_m = F_m - \hat{F}_m + \hat{\ddot{q}}_s(t-T) - K_p e_m - K_d\dot{e}_m - \alpha\left(KS_m + \mu\,\mathrm{sign}(S_m)\right)$$
$$\ddot{q}_s = F_s - \hat{F}_s + \hat{\ddot{q}}_m(t-T) - K_p e_s - K_d\dot{e}_s - \alpha\left(KS_s + \mu\,\mathrm{sign}(S_s)\right) \quad (25)$$

The time derivatives of $x_{m2}$ and $x_{s2}$ are:

$$\dot{x}_{m2} = \ddot{q}_m(t) - \ddot{q}_s(t-T), \qquad \dot{x}_{s2} = \ddot{q}_s(t) - \ddot{q}_m(t-T) \quad (26)$$

Replacing (26) in (25), we get:

$$\dot{x}_{m2} = F_m - \hat{F}_m - \left(\ddot{q}_s(t-T) - \hat{\ddot{q}}_s(t-T)\right) - K_p e_m - K_d\dot{e}_m - \alpha\left(KS_m + \mu\,\mathrm{sign}(S_m)\right)$$
$$\dot{x}_{s2} = F_s - \hat{F}_s - \left(\ddot{q}_m(t-T) - \hat{\ddot{q}}_m(t-T)\right) - K_p e_s - K_d\dot{e}_s - \alpha\left(KS_s + \mu\,\mathrm{sign}(S_s)\right) \quad (27)$$

From the statements (19), (20), (23), and (24), expression (27) becomes:

$$\dot{x}_{m2} = -K_p x_{m1} - K_d x_{m2} - \alpha\left(KS_m + \mu\,\mathrm{sign}(S_m)\right)$$
$$\dot{x}_{s2} = -K_p x_{s1} - K_d x_{s2} - \alpha\left(KS_s + \mu\,\mathrm{sign}(S_s)\right) \quad (28)$$

Considering (28), (22) can be rewritten as:

$$\dot{S}_m = -K_p x_{m1} - K_d x_{m2} - \alpha\left(KS_m + \mu\,\mathrm{sign}(S_m)\right) + \lambda x_{m2}$$
$$\dot{S}_s = -K_p x_{s1} - K_d x_{s2} - \alpha\left(KS_s + \mu\,\mathrm{sign}(S_s)\right) + \lambda x_{s2} \quad (29)$$

Using (29), $\dot{V}$ can be expressed as:

$$\dot{V} = S_m\left[-K_p x_{m1} - K_d x_{m2} - \alpha\left(KS_m + \mu\,\mathrm{sign}(S_m)\right) + \lambda x_{m2}\right] + S_s\left[-K_p x_{s1} - K_d x_{s2} - \alpha\left(KS_s + \mu\,\mathrm{sign}(S_s)\right) + \lambda x_{s2}\right] \quad (30)$$

Two cases are considered at this stage.

Case 1: If $|S_i| \le \gamma_i$, with $\gamma_i > 0$ the boundary-layer thickness of $\mathrm{sign}(S_i)$, then:

$$\dot{V} \le S_m\left[-K_p x_{m1} - K_d x_{m2} - \alpha\left(KS_m + \mu\frac{S_m}{\gamma_m}\right) + \lambda x_{m2}\right] + S_s\left[-K_p x_{s1} - K_d x_{s2} - \alpha\left(KS_s + \mu\frac{S_s}{\gamma_s}\right) + \lambda x_{s2}\right] \quad (31)$$

Yet

$$\dot{V} \le -\alpha\left[\left(K + \frac{\mu}{\gamma_m}\right)S_m^2 + \left(K + \frac{\mu}{\gamma_s}\right)S_s^2\right] \quad (32)$$

with

$$-K_p x_{m1} - (K_d - \lambda)x_{m2} = 0, \qquad -K_p x_{s1} - (K_d - \lambda)x_{s2} = 0 \quad (33)$$

Equation (33) is verified if

$$x_{i1} = e^{-\left(\frac{K_p}{\lambda - K_d}\right)t} \quad (34)$$

To guarantee $x_{i1} \Rightarrow 0$, we must have:

$$\frac{K_p}{\lambda - K_d} > 0 \quad (35)$$

Consequently,

$$K_p > 0 \quad \text{and} \quad \lambda > K_d \quad (36)$$

For the first part of $\dot{V}$, $\dot{V} < 0$ if and only if

$$\alpha \ne 0 \quad \text{and} \quad K > -\frac{\mu}{\gamma_i} \quad (37)$$

Case 2: If $|S_i| > \gamma_i$, then:

$$\dot{V} \le S_m\left[-K_p x_{m1} - K_d x_{m2} - \alpha K S_m + \mu\alpha + \lambda x_{m2}\right] + S_s\left[-K_p x_{s1} - K_d x_{s2} - \alpha K S_s + \mu\alpha + \lambda x_{s2}\right] \quad (38)$$

Additionally,

$$\dot{V} \le -\alpha K S_m^2 - \alpha K S_s^2 \quad (39)$$

with

$$-K_p x_{m1} - K_d x_{m2} - \mu\alpha + \lambda x_{m2} = 0, \qquad -K_p x_{s1} - K_d x_{s2} - \mu\alpha + \lambda x_{s2} = 0 \quad (40)$$

To check Eq. (40), $x_{i2}$ must be equal to:

$$x_{i2} = \frac{K_p x_{i1} + \mu\alpha}{\lambda - K_d} \quad (41)$$

Then

$$K_p \ne \lambda; \quad \forall\mu, \alpha > 0; \quad x_{i2} > \frac{K_p x_{i1} + \mu\alpha}{\lambda - K_d} \quad (42)$$
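The tuning constraints collected in (35)-(37) and (42) are easy to check numerically before running a simulation. The helper below is our own convenience, not part of the paper, applied here to the joint-1 gains of Section 4 (the boundary-layer width gamma is not reported in the paper; 0.02 is a placeholder we assume).

```python
def check_gains(Kp, Kd, lam, K, mu, gamma, alpha):
    """Check the sufficient stability conditions derived above."""
    return {
        "Kp > 0 (36)": Kp > 0,
        "lambda > Kd (36)": lam > Kd,
        "alpha != 0 (37)": alpha != 0,
        "K > -mu/gamma (37)": K > -mu / gamma,
        "Kp != lambda (42)": Kp != lam,
    }

# Joint-1 gains used in Section 4: every condition holds.
print(check_gains(Kp=15, Kd=3, lam=4, K=40, mu=0.02, gamma=0.02, alpha=100))
```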
4. Simulation Results
In this section, simulation results are presented to validate the effectiveness of the proposed MFPDSMC controller for a teleoperation system consisting of a pair of 2-DOF haptic manipulators. The simulation was performed using Simulink and the fixed-step solver ode1 (Euler), with a sampling time of 0.001 seconds over 30 seconds. The parameters of the master-slave teleoperation system under actuator dynamics are chosen as follows:

$$M_i = \begin{pmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{pmatrix}, \qquad C_i = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}, \qquad G_i = \begin{pmatrix} G_1 \\ G_2 \end{pmatrix}$$

where

$$M_{11}(q) = m_1 l_{c1}^2 + m_2 l_{c2}^2 + m_2 l_1^2 + 2m_2 l_1 l_{c2}\cos(q_2)$$
$$M_{12}(q) = M_{21}(q) = m_2 l_{c2}^2 + m_2 l_1 l_{c2}\cos(q_2)$$
$$M_{22}(q) = m_2 l_{c2}^2$$
$$C_{11}(q,\dot{q}) = -m_2 l_1 l_{c2}\sin(q_2)\dot{q}_2$$
$$C_{12}(q,\dot{q}) = -m_2 l_1 l_{c2}\sin(q_2)(\dot{q}_1 + \dot{q}_2)$$
$$C_{21}(q,\dot{q}) = m_2 l_1 l_{c2}\sin(q_2)\dot{q}_1$$
$$C_{22}(q,\dot{q}) = 0$$
$$G_1 = m_2 g l_{c2}\cos(q_1 + q_2) + m_1 g l_{c1}\cos(q_1) + m_2 g l_1\cos(q_1)$$
$$G_2 = m_2 g l_{c2}\cos(q_1 + q_2)$$

with $m_1 = 3.55$ kg; $m_2 = 3.55$ kg; $l_1 = 205$ mm; $l_2 = 210$ mm; $l_{c1} = 154.8$ mm; $l_{c2} = 105$ mm; $g = 9.81$ m/s².

For the actuator dynamics parameters, we have:

$$g_{r1} = 60; \quad g_{r2} = 30; \quad B_{a1} = g_{r1}^2\times 2\times 10^{-5}; \quad B_{a2} = g_{r2}^2\times 1.3\times 10^{-5}$$
$$J_{a1} = g_{r1}^2\times 3.7\times 10^{-5}; \quad J_{a2} = g_{r2}^2\times 1.47\times 10^{-4}$$
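As a quick numeric check, the dynamics matrices above transcribe directly into code (lengths converted to meters; the function name is ours, and this is a sketch rather than the simulation model itself):

```python
import numpy as np

m1 = m2 = 3.55                        # kg
l1, lc1, lc2 = 0.205, 0.1548, 0.105   # m
l2 = 0.210                            # m (given; unused in M, C, G below)
g = 9.81                              # m/s^2

def dynamics(q, q_dot):
    """2-DOF manipulator terms M(q), C(q, q_dot), G(q) of Section 4."""
    c2, s2 = np.cos(q[1]), np.sin(q[1])
    M = np.array([
        [m1*lc1**2 + m2*lc2**2 + m2*l1**2 + 2*m2*l1*lc2*c2,
         m2*lc2**2 + m2*l1*lc2*c2],
        [m2*lc2**2 + m2*l1*lc2*c2, m2*lc2**2],
    ])
    C = np.array([
        [-m2*l1*lc2*s2*q_dot[1], -m2*l1*lc2*s2*(q_dot[0] + q_dot[1])],
        [ m2*l1*lc2*s2*q_dot[0], 0.0],
    ])
    G = np.array([
        m2*g*lc2*np.cos(q[0] + q[1])
        + m1*g*lc1*np.cos(q[0]) + m2*g*l1*np.cos(q[0]),
        m2*g*lc2*np.cos(q[0] + q[1]),
    ])
    return M, C, G
```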
The forward and backward time delays in the communication channel ($T_m$ and $T_s$, respectively) are chosen to be random variables with an upper bound equal to 0.4 seconds (Fig. 1).

Figure 1. Forward and backward time delays

To highlight the performance of the proposed MFPDSMC controller, comparisons with the classical Proportional-Derivative (PD) controller and the Model-Free Proportional-Derivative (MFPD) controller have been conducted. The PD controllers are:

$$\tau_m = K_{p1}e_m + K_{d1}\dot{e}_m, \qquad \tau_s = K_{p1}e_s + K_{d1}\dot{e}_s$$

where

$$K_{p1} = \begin{pmatrix} 14\times 10^{-2} & 0 \\ 0 & 1 \end{pmatrix}, \qquad K_{d1} = \begin{pmatrix} 14\times 10^{-2} & 0 \\ 0 & 2\times 10^{-2} \end{pmatrix}$$

The MFPD controllers are as listed below:

$$\tau_{am} = \frac{1}{\alpha}\left(-\hat{F}_m + \hat{\ddot{q}}_s(t-T) - K_p e_m - K_d\dot{e}_m\right)$$
$$\tau_{as} = \frac{1}{\alpha}\left(-\hat{F}_s + \hat{\ddot{q}}_m(t-T) - K_p e_s - K_d\dot{e}_s\right)$$

where

$$K_{p2} = \begin{pmatrix} 15 & 0 \\ 0 & 20 \end{pmatrix}, \qquad K_{d2} = \begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}, \qquad \alpha = \begin{pmatrix} 90 \\ 100 \end{pmatrix}$$
$$\hat{F}_m = \hat{\ddot{q}}_m - \alpha\hat{\tau}_{am}(t-1), \qquad \hat{F}_s = \hat{\ddot{q}}_s - \alpha\hat{\tau}_{as}(t-1)$$

The proposed MFPDSMC controllers are given as follows:

$$\tau_{am} = \frac{1}{\alpha}\left(-\hat{F}_m + \hat{\ddot{q}}_s(t-T) - K_p e_m - K_d\dot{e}_m\right) - \left(KS_m + \mu\,\mathrm{sign}(S_m)\right)$$
$$\tau_{as} = \frac{1}{\alpha}\left(-\hat{F}_s + \hat{\ddot{q}}_m(t-T) - K_p e_s - K_d\dot{e}_s\right) - \left(KS_s + \mu\,\mathrm{sign}(S_s)\right)$$

The parameters for the controller are set as:

$$K_p = \begin{pmatrix} 15 & 0 \\ 0 & 20 \end{pmatrix}, \quad K_d = \begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}, \quad \alpha = \begin{pmatrix} 100 \\ 100 \end{pmatrix}, \quad \mu = \begin{pmatrix} 0.02 \\ 0.02 \end{pmatrix}, \quad K = \begin{pmatrix} 40 \\ 4 \end{pmatrix}, \quad \lambda = \begin{pmatrix} 4 \\ 4 \end{pmatrix}$$

The three controllers were used in two different scenarios.

Scenario 1: In this instance, we consider the nominal parameters of the system model (4), the time-varying delay depicted in Fig. 1, and the interaction torques between the human and the master manipulator and between the remote environment and the slave manipulator as follows [31, 32] (Fig. 2):

$$\tau_h = -N_m - L_m q_m - D_m\dot{q}_m, \qquad \tau_e = N_s + L_s q_s + D_s\dot{q}_s$$

with

$$N_m = N_s = \begin{pmatrix} 1.5 \\ 1.5 \end{pmatrix}, \qquad L_m = L_s = \begin{pmatrix} 3 & 0 \\ 0 & 3 \end{pmatrix}, \qquad D_m = D_s = \begin{pmatrix} 6 & 0 \\ 0 & 6 \end{pmatrix}$$

Figure 2. The human and the environment interaction torques
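For reference, Scenario 1's operator and environment torques are affine in position and velocity, so they reduce to a few lines of code with the stated values (variable names are ours):

```python
import numpy as np

N = np.array([1.5, 1.5])     # N_m = N_s
L = np.diag([3.0, 3.0])      # L_m = L_s
D = np.diag([6.0, 6.0])      # D_m = D_s

def interaction_torques(qm, qm_dot, qs, qs_dot):
    """Human and environment torques of Scenario 1 (Fig. 2)."""
    tau_h = -N - L @ qm - D @ qm_dot   # torque exerted on the master
    tau_e = N + L @ qs + D @ qs_dot    # torque exerted on the slave
    return tau_h, tau_e
```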
Figures 3 through 5 depict the simulation findings for this scenario. These figures demonstrate that for all three controllers, the slave manipulator closely imitates the master's trajectories, with negligible tracking errors. Clearly, the teleoperation system controlled by the MFPDSMC controller works more accurately, with a tracking error swinging about zero and staying below 10⁻⁴ rad (Fig. 5.b), compared to the PD and MFPD responses in Figs. 3.b and 4.b, where the errors reach the value of 8 × 10⁻³ rad.

Figure 3. Simulation results with the PD controller for Scenario 1. (a) Position Tracking, (b) Position Tracking Error, (c) Control Torques and (d) Control Torques Errors

Figure 4. Simulation results with the MFPD controller for Scenario 1. (a) Position Tracking, (b) Position Tracking Error, (c) Control Torques and (d) Control Torques Errors

Figure 5. Simulation results with the MFPDSMC controller for Scenario 1. (a) Position Tracking, (b) Position Tracking Error, (c) Control Torques and (d) Control Torques Errors

In contrast, due to the estimation process of the function F, the transient response of the MFPDSMC is slightly slower (10 seconds to reach the steady state). Figures 5(c) and 5(d) show that the master and slave control signals are very close to each other, with a maximum difference of less than 0.2 Nm. This demonstrates that the MFPDSMC approach is effective for achieving high levels of transparency.

Scenario 2: To show how well the proposed MFPDSMC controller can deal with changes in parameters and disturbances, the same conditions as in the last scenario are considered, with the following uncertainties in the actuator parameters:

$$g_{r1} = 70; \quad g_{r2} = 40; \quad B_{a1} = g_{r1}^2\times 3\times 10^{-5}; \quad B_{a2} = g_{r2}^2\times 10^{-5}$$
$$J_{a1} = g_{r1}^2\times 2.7\times 10^{-5}; \quad J_{a2} = g_{r2}^2\times 3.47\times 10^{-4}$$

Figures 6(a) and 7(a) demonstrate the advantages of the MFPD controller over the classical PD in terms of robustness and performance, where the elimination of the uncertain part of the system by the function F provides straightforward tuning of the MFPD gains. Then, Figure 9 verifies the stability of the closed loop when the MFPDSMC surface values approach zero, i.e., Si = 0 at about t = 4.8 seconds, with i = m, s. In addition, Fig. 8.c shows the MFPDSMC control signals, which appear reasonable for the closed-loop system, and Fig. 8.d illustrates that the force feedback errors are fairly modest. Therefore, even though the model parameters are not close to their nominal values, the slave can closely track the master, and the human operator will receive precise force feedback. This result validates the effectiveness of our control design.

Figure 6. Simulation results with the PD controller for Scenario 2. (a) Position Tracking, (b) Position Tracking Error, (c) Control Torques and (d) Control Torques Errors

Figure 7. Simulation results with the MFPD controller for Scenario 2. (a) Position Tracking, (b) Position Tracking Error, (c) Control Torques and (d) Control Torques Errors

Figure 8. Simulation results with the MFPDSMC controller for Scenario 2. (a) Position Tracking, (b) Position Tracking Error, (c) Control Torques and (d) Control Torques Errors

Figure 9. The values of the sliding mode surface at the master and slave sides
N◦ 1
2023
external disturbances was guaranteed by the sliding mode term, which drives the system states towards the sliding surface and, eventually, to equilibrium. Furthermore, by combining the existing teleopera‐ tion model with the actuator dynamics, the teleop‐ eration model described in this work is more chal‐ lenging and performs better in terms of accuracy and reduction of unmodeled disturbance. Finally, the sim‐ ulation results demonstrate the effectiveness of the proposed controller in achieving stability and trans‐ parency simultaneously and they verify the theory behind the controller design. AUTHORS Henni Mansour Abdelwaheb∗ – Laboratory Automation andin Systems Analysis and (LAAS), theofproposed controller achieving stability National Polytechnic School of Oran Algeria, transparency and theystability verify and thee‐mail: the proposed simultaneously controller in achieving abdelwahebhenni@gmail.com. theory behind the controller design. transparency simultaneously and they verify the Kacimi Abderrahmane – Institute of Industrial Secu‐ theory behind the controller design. Acknowledgements rity Maintenance of Oran, Algeria, e‐mail: kdjou‐ Acknowledgements jou@yahoo.fr. This work is supported by the Laboratory of Belaidi AEK – Laboratory of Automation and Systems Automation andsupported Systems Analysis of the This work (LAAS), is byPolytechnic the (LAAS) Laboratory of Analysis National School of Oran National Polytechnic SchoolAnalysis of Oran (LAAS) Maurice of Audin, Automation and Systems the Algeria, e‐mail: belaidiaek@gmail.com. Algeria.
National Polytechnic School of Oran Maurice Audin, ∗ Corresponding author Algeria. AUTHORS Henni Mansour Abdelwaheb*, Laboratory of AUTHORS Automation and Systems Analysis (LAAS), National ACKNOWLEDGEMENTS Henni Mansour Abdelwaheb*, Laboratory of Polytechnic and School of Oran Algeria, Automation Systems Analysis (LAAS), National This work is supported by the Laboratory of Automa‐ abdelwahebhenni@gmail.com Polytechnic School of Oran Algeria, tion and Systems Analysis Institute (LAAS) ofoftheIndustrial National Poly‐ Kacimi Abderrahmane, abdelwahebhenni@gmail.com technic School of Oran Maurice Audin, Algeria. Security Maintenance ofInstitute Oran, Algeria. Kacimi Abderrahmane, of Industrial kdjoujou@yahoo.fr Security Maintenance of Oran, Algeria. kdjoujou@yahoo.fr References Belaidi AEK, Laboratory of Automation and [1] Analysis K. A. Manocha, N. Pernalete, R. V. Dubey. “Vari‐ Systems National Belaidi AEK, (LAAS), Laboratory of Polytechnic AutomationSchool and able position mapping‐based assistance in tele‐ of Oran Algeria, Systems Analysisbelaidiaek@gmail.com (LAAS), National Polytechnic School operation for nuclear clean up”, In: Proceedings of Oran Algeria, belaidiaek@gmail.com
of the 2001 ICRA IEEE International Confer‐ *Corresponding Author ence on Robotics *Corresponding Author and Automation, pp. 374–379, (2001). doi: 10.1109/ROBOT.2001.932580. References [2] Manocha, L. F. Penin, Matsumoto, and S."Variable Wakabayashi. [1] K.A. N. K. Pernalete, R.V. Dubey, References
Fig. 9. The values of sliding mode surface at master and slave9.sides Fig. The values of sliding modemode surface at master and Figure 9. The values of sliding surface at master slaveslave sidessides and
5. Conclusion 5. Conclusion
5. In Conclusion this paper, a model-free proportional-derivative
sliding mode controller has proportional-derivative been proposed for a In thisthis paper, a amodel-free In paper, model‐free proportional‐derivative nonlinear teleoperation robotics system considering sliding mode controller has been proposed a a sliding mode controller has been proposedforfor actuator dynamics, time-varying various nonlinear teleoperation robotics delays, system and considering nonlinear teleoperation robotics system considering uncertainties. The main feature of this work is that actuator dynamics, time-varying delays, and various actuator dynamics, time‐varying delays, and various the derivationThe of the control laws doeswork not require uncertainties. main feature of this is that uncertainties. The main feature of this work isthe that any derivation knowledgeof of system since the the the control laws model does not require the derivation of the control laws does not require parameter change was managed by automatically any knowledge of the system model since the updating thechange control laws. Also,model the high performance any knowledge of the system the param‐ parameter was managed bysince automatically against external disturbances by the eter change managed bywas automatically updating updating the was control laws. Also, theguaranteed high performance sliding mode term, which the system by states against external disturbances wasperformance guaranteed the the control laws. Also, the drives high against towards the sliding surface and, eventually, to sliding mode term, which drives the system states equilibrium. by combining the existing towards theFurthermore, sliding surface and, eventually, to teleoperationFurthermore, model with the actuator dynamics, the equilibrium. by combining the existing teleoperation model model with described in this dynamics, work is more teleoperation the actuator the challenging and performs better in terms of accuracy
position mapping-based assistance in “Force re lection for time‐delayed teleoperation [1] K.A. Manocha, N. Pernalete, R.V. Dubey, "Variable teleoperation for nuclear clean up", In: of space robots”, In: Proceedings ICRA’00 position mapping-based assistance of in Proceedings of the ICRA IEEE teleoperation for 2001 nuclear cleanInternational up", IEEE International Conference on In: Robotics Conference on Automation, pp.(2000). 374– doi: Proceedings of Robotics the 2001and ICRA IEEE International and Automation, pp. 3120–3125, 379, 10.1109/ROBOT.2000.845143 (2001).on DOI:10.1109/ROBOT.2001.932580 Conference Robotics and Automation, pp. 374– [2] 379, L. F. (2001). Penin, DOI:10.1109/ROBOT.2001.932580 K. Matsumoto, and S. Wakabayashi, [3]F. R. J. Anderson, M. W.and Spong. “Bilateral control "Force reflection for time-delayed teleoperation [2] L. Penin, K. Matsumoto, S. Wakabayashi, of space robots", In: ICRA'00 IEEE of reflection teleoperators with timeofdelay”. Proceedings "Force forProceedings time-delayed teleoperation International Conference on ofRobotics and of the 1988 International Conference on of space robots", In:IEEE Proceedings ICRA'00 IEEE Automation, pp.3120–3125, (2000). International Conference on Robotics and China, Systems, Man, and Cybernetics, Beijing, DOI:10.1109/ROBOT.2000.845143 Automation, (2000). pp. 131–138,pp.3120–3125, (1988). doi: 10.1109/ICSMC. [3] DOI:10.1109/ROBOT.2000.845143 R.J. Anderson, M.W. Spong, "Bilateral control of 1988.754257. teleoperators time delay". Proceedings of the [3] R.J. Anderson,with M.W. Spong, "Bilateral control of [4] P.IEEE F. Hokayem, M. delay". W. Spong. “Bilateral teleoper‐ 1988 International Conference on Systems, teleoperators with time Proceedings of the an historical survey”. Automatica, 42(12), Man,ation: and Beijing, China, pp. 1988 IEEECybernetics, International Conference on 131-138, Systems, (1988). 10.1109/ICSMC.1988.754257. 2035‐2057, (2006). doi:China, 10.1016/j.automatica. Man, andDoi: Cybernetics, Beijing, pp. 131-138, [4] (1988). P.F. 2006.06.027. Hokayem, M.W. Spong, "Bilateral Doi: 10.1109/ICSMC.1988.754257. teleoperation: an historical [4] P.F. Hokayem, M.W. survey". Spong, Automatica, "Bilateral [5] I. G. Polushin, P. X. Liu,survey". and C. H. Lung. “A force‐ 42(12), 2035-2057, (2006). teleoperation: an historical Automatica, re lection algorithm for improved transparency https://doi.org/10.1016/j.automatica.2006.06.02 42(12), 2035-2057, (2006). 7. in bilateral teleoperation with communication https://doi.org/10.1016/j.automatica.2006.06.02 [5] 7. I. G. Polushin, P. X. Liu, and C. H. Lung, "A forcealgorithm for improved in [5] I.reflection G. Polushin, P. X. Liu, and C. H. transparency Lung, "A force75 bilateral algorithm teleoperation with transparency communication reflection for improved in delay", IEEE/ASME Trans. Mechatronics 12: 3, pp. bilateral teleoperation with communication
[6] G. Yang, H. Lv, Z. Zhang, et al. "Keep healthcare workers safe: application of teleoperated robot in isolation ward for COVID-19 prevention and control," Chin. J. Mech. Eng., 33, 47, 2020, doi: 10.1186/s10033-020-00464-0.
[7] T. Abut and S. Soyguder. "Real-time control of bilateral teleoperation system with adaptive computed torque method," Industrial Robot, 44: 3, pp. 299-311, 2017, doi: 10.1108/IR-09-2016-0245.
[8] X. Liu and M. Tavakoli. "Inverse dynamics-based adaptive control of nonlinear bilateral teleoperation systems," 2011 IEEE International Conference on Robotics and Automation, Shanghai, China, 2011.
[9] K. Hosseini-Suny et al. "Model reference adaptive control design for a teleoperation system with output prediction," Journal of Intelligent & Robotic Systems, 59, 319-339, 2010.
[10] Z. Chen, Y. Pan, and J. Gu. "A novel adaptive robust control architecture for bilateral teleoperation systems under time-varying delays," International Journal of Robust and Nonlinear Control, 25: 17, pp. 3349-3366, 2015.
[11] Z. Wang, Y. Sun, and B. Lianga. "Synchronization control for bilateral teleoperation system with position error constraints: a fixed-time approach," ISA Transactions, 93, pp. 125-136, 2019.
[12] M. Tong, Y. Pan, Z. Li, and W. Lin. "Valid data based normalized cross-correlation (VDNCC) for topography identification," Neurocomputing, 308, pp. 184-193, 2018.
[13] X. Yang, C.-C. Hua, J. Yan, and X.-P. Guan. "A new master-slave torque design for teleoperation system by T-S fuzzy approach," IEEE Transactions on Control Systems Technology, 23(4), 1611-1619, 2014.
[14] Y.-C. Liu and M.-H. Khong. "Adaptive control for nonlinear teleoperators with uncertain kinematics and dynamics," IEEE/ASME Transactions on Mechatronics, 20: 5, pp. 2550-2562, 2015.
[15] Z. Chen, F. Huang, C. Yang, and B. Yao. "Adaptive fuzzy backstepping control for stable nonlinear bilateral teleoperation manipulators with enhanced transparency performance," IEEE Transactions on Industrial Electronics, 67: 1, pp. 746-756, 2020, doi: 10.1109/TIE.2019.2898587.
[16] H. Wang, P. X. Liu, and S. Liu. "Adaptive neural synchronization control for bilateral teleoperation systems with time delay and backlash-like hysteresis," IEEE Transactions on Cybernetics, 47: 10, pp. 3018-3026, 2017, doi: 10.1109/TCYB.2016.2644656.
[17] S. Hao, L. Hu, and P. X. Liu. "Sliding mode control for a surgical teleoperation system via a disturbance observer," IEEE Access, 7, pp. 43383-43393, 2019, doi: 10.1109/ACCESS.2019.2901899.
[18] Y. Yang, C. Hua, J. Li, and X. Guan. "Finite-time output-feedback synchronization control for bilateral teleoperation system via neural networks," Information Sciences, 406-407, 216-233, 2017, doi: 10.1016/j.ins.2017.04.034.
[19] M. Fliess, C. Join, M. Mboup, and H. Sira-Ramirez. "Vers une commande multivariable sans modele" [Towards model-free multivariable control], arXiv preprint math/0603155, 2006.
[20] H. Wang, X. Ye, Y. Tian, and N. Christov. "Attitude control of a quadrotor using model free based sliding model controller," Proc. 2015 20th International Conference on Control Systems and Science, Bucharest, Romania, pp. 149-154, 2015.
[21] C. Y. Yu and J. L. Wu. "Intelligent PID control for two-wheeled inverted pendulums," IEEE International Conference on System Science and Engineering, pp. 1-4, 2016.
[22] A. N. Chand, M. Kawanishi, and T. Narikiyo. "Non-linear model-free control of flapping wing flying robot using iPID," IEEE International Conference on Robotics and Automation, pp. 16-21, 2016.
[23] X. Wang, X. Li, J. Wang, X. Fang, and X. Zhu. "Data-driven model-free adaptive sliding mode control for the multi degree-of-freedom robotic exoskeleton," Information Sciences, 327, 246-257, 2016.
[24] F. Lafont, J. Balmat, N. Passel, and M. Fliess. "A model-free control strategy for an experimental greenhouse with an application to fault accommodation," Computers and Electronics in Agriculture, 110, 139-149, 2015.
[25] T. M. Ridha and C. H. Moog. "Model free control of type-1 diabetes: a fasting-phase study," IFAC-PapersOnLine, 48: 20, pp. 76-81, 2015.
[26] F. J. Carrillo and F. Rotella. "Some contributions to estimation for model-free control," Proceedings of the 17th IFAC Symposium on System Identification, Beijing, China, pp. 19-21, 2015.
[27] B. Andrea-Novel, L. Menhour, M. Fliess, and H. Mounier. "Some remarks on wheeled autonomous vehicles and the evolution of their control design," IFAC-PapersOnLine, 49(15), 199-204, 2016.
[28] R. C. Roman, M. B. Radac, R. E. Precup, and E. M. Petriu. "Data-driven optimal model-free control of twin rotor aerodynamic systems," IEEE International Conference on Industrial Technology (ICIT), Seville, Spain, pp. 161-166, 2015.
[29] N. Adhikary and C. Mahanta. "Sliding mode control of position commanded robot manipulators," Control Engineering Practice, 81, 183-198, 2018.
[30] T. Kara and A. H. Mary. "Adaptive PD-SMC for nonlinear robotic manipulator tracking control," Studies in Informatics and Control, 26(1), 49-58, 2017, doi: 10.24846/v26i1y201706.
[31] C. Hua, P. X. Liu, and H. Wang. "Convergence analysis of teleoperation systems with unsymmetric time-varying delays," IEEE International Workshop on Haptic Audio Visual Environments and Games, pp. 65-69, 2008.
[32] C. C. Hua and X. P. Liu. "Delay-dependent stability criteria of teleoperation systems with asymmetric time-varying delays," IEEE Transactions on Robotics, 26(5), 925-932, 2010.