WWW.JAMRIS.ORG
Indexed in SCOPUS
pISSN 1897-8649 (PRINT)/eISSN 2080-2145 (ONLINE)
VOLUME 17, N° 3, 2023
Journal of Automation, Mobile Robotics and Intelligent Systems
A peer-reviewed quarterly focusing on new achievements in the following fields: automation • systems and control • autonomous systems • multiagent systems • decision-making and decision support • robotics • mechatronics • data sciences • new computing paradigms
Editor-in-Chief
Typesetting
Janusz Kacprzyk (Polish Academy of Sciences, Łukasiewicz-PIAP, Poland)
SCIENDO, www.sciendo.com
Advisory Board
Webmaster
Dimitar Filev (Research & Advanced Engineering, Ford Motor Company, USA) Kaoru Hirota (Tokyo Institute of Technology, Japan) Witold Pedrycz (ECERF, University of Alberta, Canada)
TOMP, www.tomp.pl
Editorial Office
Co-Editors Roman Szewczyk (Łukasiewicz-PIAP, Warsaw University of Technology, Poland) Oscar Castillo (Tijuana Institute of Technology, Mexico) Marek Zaremba (University of Quebec, Canada)
Executive Editor Katarzyna Rzeplinska-Rykała, e-mail: office@jamris.org (Łukasiewicz-PIAP, Poland)
ŁUKASIEWICZ Research Network – Industrial Research Institute for Automation and Measurements PIAP
Al. Jerozolimskie 202, 02-486 Warsaw, Poland (www.jamris.org)
tel. +48-22-8740109, e-mail: office@jamris.org
The reference version of the journal is the e-version. Printed in 100 copies.
Articles are reviewed, excluding advertisements and descriptions of products.
Papers published currently are available for non-commercial use under the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) license. Details are available at: https://www.jamris.org/index.php/JAMRIS/LicenseToPublish
Associate Editor Piotr Skrzypczyński (Poznań University of Technology, Poland)
Statistical Editor Małgorzata Kaliczyńska (Łukasiewicz-PIAP, Poland)
Editorial Board: Chairman – Janusz Kacprzyk (Polish Academy of Sciences, Łukasiewicz-PIAP, Poland) Plamen Angelov (Lancaster University, UK) Adam Borkowski (Polish Academy of Sciences, Poland) Wolfgang Borutzky (Fachhochschule Bonn-Rhein-Sieg, Germany) Bice Cavallo (University of Naples Federico II, Italy) Chin Chen Chang (Feng Chia University, Taiwan) Jorge Manuel Miranda Dias (University of Coimbra, Portugal) Andries Engelbrecht ( University of Stellenbosch, Republic of South Africa) Pablo Estévez (University of Chile) Bogdan Gabrys (Bournemouth University, UK) Fernando Gomide (University of Campinas, Brazil) Aboul Ella Hassanien (Cairo University, Egypt) Joachim Hertzberg (Osnabrück University, Germany) Tadeusz Kaczorek (Białystok University of Technology, Poland) Nikola Kasabov (Auckland University of Technology, New Zealand) Marian P. Kaźmierkowski (Warsaw University of Technology, Poland) Laszlo T. Kóczy (Szechenyi Istvan University, Gyor and Budapest University of Technology and Economics, Hungary) Józef Korbicz (University of Zielona Góra, Poland) Eckart Kramer (Fachhochschule Eberswalde, Germany) Rudolf Kruse (Otto-von-Guericke-Universität, Germany) Ching-Teng Lin (National Chiao-Tung University, Taiwan) Piotr Kulczycki (AGH University of Science and Technology, Poland) Andrew Kusiak (University of Iowa, USA) Mark Last (Ben-Gurion University, Israel) Anthony Maciejewski (Colorado State University, USA)
Krzysztof Malinowski (Warsaw University of Technology, Poland) Andrzej Masłowski (Warsaw University of Technology, Poland) Patricia Melin (Tijuana Institute of Technology, Mexico) Fazel Naghdy (University of Wollongong, Australia) Zbigniew Nahorski (Polish Academy of Sciences, Poland) Nadia Nedjah (State University of Rio de Janeiro, Brazil) Dmitry A. Novikov (Institute of Control Sciences, Russian Academy of Sciences, Russia) Duc Truong Pham (Birmingham University, UK) Lech Polkowski (University of Warmia and Mazury, Poland) Alain Pruski (University of Metz, France) Rita Ribeiro (UNINOVA, Instituto de Desenvolvimento de Novas Tecnologias, Portugal) Imre Rudas (Óbuda University, Hungary) Leszek Rutkowski (Czestochowa University of Technology, Poland) Alessandro Saffiotti (Örebro University, Sweden) Klaus Schilling (Julius-Maximilians-University Wuerzburg, Germany) Vassil Sgurev (Bulgarian Academy of Sciences, Department of Intelligent Systems, Bulgaria) Helena Szczerbicka (Leibniz Universität, Germany) Ryszard Tadeusiewicz (AGH University of Science and Technology, Poland) Stanisław Tarasiewicz (University of Laval, Canada) Piotr Tatjewski (Warsaw University of Technology, Poland) Rene Wamkeue (University of Quebec, Canada) Janusz Zalewski (Florida Gulf Coast University, USA) Teresa Zielińska (Warsaw University of Technology, Poland)
Publisher:
Copyright © 2023 by Łukasiewicz Research Network - Industrial Research Institute for Automation and Measurements PIAP All rights reserved
Journal of Automation, Mobile Robotics and Intelligent Systems
VOLUME 17, N˚3, 2023 DOI: 10.14313/JAMRIS/3-2023
Contents
Skeleton‐Based Human Action/Interaction Classification in Sparse Image Sequences Włodzimierz Kasprzak, Paweł Piwowarski DOI: 10.14313/JAMRIS/3‐2023/18
Comparison of Curvilinear Parametrization Methods and Avoidance of Orthogonal Singularities in the Path Following Task Filip Dyba, Alicja Mazur DOI: 10.14313/JAMRIS/3-2023/22
Adaptive and Robust Following of 3D Paths by a Holonomic Manipulator Alicja Mazur, Mirela Kaczmarek DOI: 10.14313/JAMRIS/3‐2023/23
Multimodal Robot Programming Interface Based on RGB-D Perception and Neural Scene Understanding Modules Bartłomiej Kulecki DOI: 10.14313/JAMRIS/3‐2023/20
Parameter Identification of Space Manipulator’s Flexible Joint Mateusz Wojtunik, Fatina Liliana Basmadji, Grzegorz Granosik, Karol Seweryn DOI: 10.14313/JAMRIS/3‐2023/24
Singularity-Robust Inverse Kinematics for Serial Manipulators Ignacy Dulęba DOI: 10.14313/JAMRIS/3-2023/21
Improved Competitive Neural Network for Classification of Human Postures Based on Data from RGB-D Sensors Vibekananda Dutta, Jakub Cydejko, Teresa Zielińska DOI: 10.14313/JAMRIS/3‐2023/19
INTRODUCTION
Human-robot interaction and advanced robot control
Since 2011, once every two years, the Journal of Automation, Mobile Robotics and Intelligent Systems has published a regular issue devoted to a particular aspect of robotics research conducted in Poland. The papers composing each such issue have been selected by the Program Committee of the National Conference on Robotics, organized by the Chair of Cybernetics and Robotics, Faculty of Electronics, Photonics and Microsystems of the Wrocław University of Science and Technology. Because the 2020 conference was postponed by two years due to the COVID pandemic, the associated issue of JAMRIS has also been delayed by two years. The selection of papers for all of the mentioned issues, including this one, has been based on the significance of the research results conveyed in the papers published in the conference proceedings in Polish and presented and discussed during these conferences. These papers contain the most insightful achievements attained in the years preceding each of those conferences. It should be stressed that the papers gathered in those issues of JAMRIS are by no means simple translations of the conference papers into English. The reported results have been updated and described more comprehensively and in a wider context. The new papers have been subjected to the regular JAMRIS review procedure. Gratitude should be expressed to all of the reviewers, who provided in-depth comments enabling many clarifications and overall improvements of the papers.
The papers published in this issue of JAMRIS are the results of research presented at the 16th National Conference on Robotics, held in Trzebieszowice from the 31st of August till the 2nd of September 2022. As the title of this editorial suggests, the recent robotics research in Poland has mainly concentrated on human-robot interaction and robot control.
Human-Robot Interaction
The papers devoted to human-robot interaction concentrate on the anticipation of the actions of humans, so that robots can take that information into account while executing their tasks or during the process of their programming. All of the papers in that group start with visual perception of the human and employ neural networks to categorize the actions performed by the observed humans. This is the basis for robot decision making.
The paper "Skeleton-Based Human Action/Interaction Classification in Sparse Image Sequences", by Włodzimierz Kasprzak and Paweł Piwowarski, reports on the classification of human activities based on data obtained from video images. The presented algorithm estimates the locations of human skeletons in the video frames. Single-person actions and interactions between two persons are considered. The estimation is done by a neural network. The paper focuses on tracking skeleton motion. The tracking ability is significantly improved by processing selected features rather than using raw data. As not all video frames are used in the classification process, the resulting algorithm can be run on devices that do not provide high computational power.
The article "Improved Competitive Neural Network for Classification of Human Postures Based on Data from RGB-D Sensors", authored by Vibekananda Dutta, Jakub Cydejko and Teresa Zielińska, delves into the problem of recognising human postures. This research is motivated by the requirements of robots collaborating with humans. The robot has to anticipate human behaviour; thus, information about human posture is necessary. Usually such information is obtained by using cameras tracking markers. Here, however, the authors use marker-less motion capture systems and investigate whether they provide sufficient data to recognize human postures in a side view. To reduce cost, they later substitute the motion capture system with RGB-D cameras. Posture classification is based on neural network processing of images.
Another problem involving human-robot interaction is considered in "Multimodal Robot Programming Interface Based on RGB-D Perception and Neural Scene Understanding Modules", written by Bartłomiej Kulecki. Here RGB-D cameras and neural networks are employed to program manipulation tasks. The system detects objects and the operator's hand, inferring the intentions of the human. Voice commands supplement gesture recognition in programming the task at hand. This research falls into the category of programming by demonstration, which is the most sought-after form of robot programming, yet the most difficult to accomplish.
Robot Control
The batch of papers devoted to robot control concentrates on the solution of problems encountered while controlling manipulators in certain specific situations. Thus, avoidance of kinematic singularities, approaching a preset path and following it, as well as manipulator model parameter identification are discussed.
The paper "Singularity-Robust Inverse Kinematics for Serial Manipulators", authored by Ignacy Dulęba, deals with solving the inverse kinematics problem (IKP) near a singularity. It is shown that singularity avoidance can be performed in two different ways: either via a properly modified manipulability matrix or by prohibiting the decrease of the minimal singular value below a certain threshold. Solving the IKP is generally difficult, especially for serial manipulators, but dealing with a singularity is extremely difficult. The presented discussion is theoretical; however, it has a fundamental impact on the implementation of controllers, as the specification of trajectories in operational space is, for practical reasons, preferred to the expression of those trajectories in configuration space. Thus the solution of the IKP is a necessity, and avoidance of singularities is a prerequisite.
The article "Comparison of Curvilinear Parametrization Methods and Avoidance of Orthogonal Singularities in the Path Following Task", presented by Filip Dyba and Alicja Mazur, investigates the Serret–Frenet and Bishop curvilinear parametrizations of paths to be followed by a manipulator. In the path following task, both the orthogonal parameterisation of the path and the dynamics of the manipulator are taken into account. Both a fully known and a partially known dynamics model are considered. Theoretical considerations are validated by simulation experiments conducted using a holonomic stationary manipulator. In some cases the singularity of the orthogonal projection of the manipulator end-effector onto the given path is located directly on the path. The paper proposes a solution to this problem.
The paper "Adaptive and Robust Following of 3D Paths by a Holonomic Manipulator", by Alicja Mazur and Mirela Kaczmarek, tackles the problem of following 3D paths by a holonomic serial manipulator with parametric or structural uncertainty in the dynamics model. The path to be followed is parameterized orthogonally to the Serret–Frenet frame, which moves along the prescribed 3D curve. The theoretical considerations assume that the distance between the desired path and the one realised by the manipulator end-effector is non-zero. The theory has been validated by simulation.
The paper "Parameter Identification of Space Manipulator's Flexible Joint", written by Mateusz Wojtunik, Fatina Liliana Basmadji, Grzegorz Granosik and Karol Seweryn, focuses on debris removal from Earth orbits by a manipulator attached to a satellite. Taking into account the number of satellites that have been launched, some of which have disintegrated, this research is of great practical value. To this end, modelling and identification of satellite and manipulator dynamics is of utmost importance. Since the weight of space manipulators is crucial, flexible designs are preferred. In the considered case, a flexible-joint manipulator is analyzed. Both numerical simulations and microgravity experiments were used to validate the identification procedure.
All of the subjects of the papers composing this issue of JAMRIS, briefly characterized above, are the topics of current deliberations of the robotics community conducting research concerned with robot perception and control. Each of the papers gives a thorough insight into a particular problem, providing its formulation, background, and derivation of its solution. We hope that this selection of papers will be useful both to researchers and practitioners involved in diverse aspects of robotics.
Editors:
Cezary Zieliński, Warsaw University of Technology, Faculty of Electronics and Information Technology
Alicja Mazur, Wrocław University of Science and Technology, Faculty of Electronics, Photonics and Microsystems
SKELETON‐BASED HUMAN ACTION/INTERACTION CLASSIFICATION IN SPARSE IMAGE SEQUENCES Submitted: 14th February 2023; accepted: 16th June 2023
Włodzimierz Kasprzak, Paweł Piwowarski
DOI: 10.14313/JAMRIS/3-2023/18
Abstract: Research results on human activity classification in video are described, based on initial human skeleton estimation in selected video frames. Simple, homogeneous activities, limited to single-person actions and two-person interactions, are considered. The initial skeleton data is estimated in selected video frames by software tools like "OpenPose" or "HRNet". The main contributions of the presented work are the steps of "skeleton tracking and correcting" and "relational feature extraction". It is shown that this feature engineering step significantly increases the classification accuracy compared to the case of raw skeleton data processing. Regarding the final neural network encoder-classifier, two different architectures are designed and evaluated. The first solution is a lightweight multilayer perceptron (MLP) network, implementing the idea of a "mixture of pose experts". Several pose classifiers (experts) are trained on different time periods (snapshots) of visual actions/interactions, while the final classification is a time-related pooling of weighted expert classifications. All pose experts share a common deep encoding network. The second (middle-weight) solution is based on a "long short-term memory" (LSTM) network. Both solutions are trained and tested on the well-known NTU RGB+D dataset, although only 2D data are used. Our results show comparable performance with some of the best reported LSTM-, Graph Convolutional Network (GCN)-, and Convolutional Neural Network-based classifiers for this dataset. We conclude that, by reducing the noise of skeleton data, highly successful lightweight and midweight models for the recognition of brief activities in image sequences can be achieved.
Keywords: Action classification, Skeleton features, 2-person interactions, Mixture of experts, Video analysis
1. Introduction
Human activity recognition has recently caught the attention of the computer vision community, since it drives real-world applications that make our life better and safer, such as human-computer interaction in robotics and gaming, video surveillance, and social activity recognition [1]. For example, new robotic applications try to predict human activity patterns in order to let the robot infer early when a specific collaborative operation will be requested by the human [2, 3]. In video surveillance, human activity classification
can be integrated with probabilistic prediction models, in order to infer the ongoing activity [4]. Ambient assisted living technologies allow the recognition of a human's daily living activity in order to take care of dependent people [5].
The computer vision approach to human activity recognition in video clips or video streams is typically understood as the detection and classification of brief, homogeneous single-person actions and two-person interactions. A longer video may contain various actions that eventually are parts of more complex human activities. An action or interaction is decomposed in time into human poses, recognized in single frames.
In early solutions, hand-designed features like edges, contours, Scale-Invariant Feature Transform (SIFT), and Histogram of Oriented Gradients (HOG) have usually been used for the detection and localization of human body parts or key points in the image [6]. More recently, neural network-based solutions were successfully proposed, e.g., based on Deep Neural Networks (DNN) [7], especially Convolutional Neural Networks (CNN), LSTMs, and Graph CNNs [8], as they have the capability to automatically learn rich semantic and discriminative features. Furthermore, DNNs have the ability to learn both spatial and temporal information from signals and can effectively model scale-invariant features as well.
The approaches to vision-based human activity recognition can be divided into two main categories: activity recognition directly in video data [9] or skeleton-based methods [10], where the 2D or 3D human skeletons are detected first, sometimes already by specialized devices like the Microsoft Kinect. The skeleton-based methods compensate some of the drawbacks of vision-based methods, such as assuring the privacy of participants and reducing the scene's light sensitivity. Some popular solutions to human pose estimation (i.e., the detection and localization of humans in images) can be mentioned: OpenPose [11], DeepPose [12], and DeeperCut [13]. It must be noted that the term "pose estimation" is commonly used in the image and video analysis literature to refer to the result of semantic image segmentation models. Such models are trained to detect and classify objects, and to estimate by regression methods the image locations of object parts.
Although the human activity classification domain has gained noticeable improvements in recent years,
2023 © Kasprzak and Piwowarski. This is an open access article licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
it has still been facing many challenges in practice, e.g., occlusions, low resolutions, different viewpoints, non-rigid deformations, and intra-class variability in shape [7]. In this work, we deal with the analysis of brief video clips or streams that contain single-person actions and two-person interactions, assuming the existence of human skeleton data for selected video frames. After analysing the recent alternatives [14–16], we identified their main trends: (1) a smart processing of skeleton data for extracting meaningful information and cancellation of noisy data (e.g., relational networks); (2) designing light-weight and middle-weight solutions instead of heavy-weight networks by employing background knowledge, e.g., using graph CNNs instead of CNNs and LSTMs, or CNNs with 2-D kernels instead of 3-D CNNs.
The aim of our work was to create an efficient and practical tool for brief human action and interaction recognition in video data. Thus, we decided to propose light-weight deep network models which operate on strong relational features (extracted in an application-aware skeleton processing step). For a solid verification of our contribution, and a fair comparison with other known solutions, we decided to train and test the proposed solutions on a well-known annotated video dataset (the NTU RGB+D).
For the two considered problems (action and interaction classification), we propose two solutions – one lightweight model, based on feedforward MLPs, and one middle-weight model, based on an LSTM network. In the first case, we took the idea of extending pose classification in still images to activity classification in video by applying a new version of the well-known pattern classification approach called "mixture of experts" [17]. But instead of identifying subdomains or learning different weights for expert classifiers in subdomains, the experts are distributed along the time axis. Every expert is responsible for the classification of frames belonging to a different time period of an activity process, e.g., the start, initial, midterm, final or end period. The final classification is obtained by a weighted evaluation of the class likelihoods of all the expert pose classifiers. With the second proposed solution, an LSTM model, and with the application-aware feature engineering step [18], we try to compete with the best performing methods in this field, which use Graph CNNs or 3D CNNs.
There are four remaining sections of this work. Section 2 describes recent approaches to human action and interaction recognition. Our solutions – the feature engineering step and the two deep network models, one based on ANN pose experts and one on an LSTM network – are introduced in Section 3. Next, in Section 4, the results of experiments are described and conclusions are drawn. Finally, in Section 5, we summarize our contribution to the subject.
2. Related Work
Typically, human activity recognition in images and video requires first a detection of human body parts or key-points of a human skeleton. The
skeleton-based methods compensate some of the drawbacks of vision-based methods, such as assuring the privacy of persons and reducing the scene's light sensitivity. They also limit the sensitivity to clothes, hair, and other person-specific features.
2.1. Human Pose Estimation
Typically, a human's pose is represented by the localisation of image features, key points or body parts, expressed in camera coordinates. In the past, mainly hand-crafted features have been used, such as edges, contours, Scale-Invariant Feature Transform (SIFT) or Histogram of Oriented Gradients (HOG). However, these approaches have produced modest performance when it comes to accurately localizing human body parts [19]. With the development of Convolutional Neural Networks (CNNs), the performance in solving human pose estimation problems has improved constantly and has been significantly higher than that of the traditional methods [14].
There are three fundamental architectures, AlexNet [20], VGG [21], and ResNet [22], which have been employed as the backbone architecture for many human pose estimation studies [23]. Released in 2012, AlexNet has been considered one of the backbone architectures for many computer vision models. The DeepPose software employed AlexNet for estimating human poses [12]. Popular works in pose estimation, OpenPose [11], and human parts detection, Faster RCNN [24], have used VGG and achieved state-of-the-art performance in visual human estimation. After the release of ResNet, many works on human pose estimation have applied it as a backbone (e.g., [13, 25]).
2.2. Human Action and Interaction Recognition
In the machine learning literature it is often said that the skeleton data consists of "joints" and "limbs" (or "bones"). We must admit that the terms "joints" and "bones" have no direct correspondence to human body joints and bones. In order to clarify the terminology, we should rather call them "key points of the skeleton" and "skeletal segments", respectively.
The vast majority of research on human action and interaction recognition is based on the use of artificial neural networks. However, initially, classical approaches have also been tried, such as the SVM (e.g., [26, 27]). Yan et al. [28] used multiple features, like a "bag of interest points" and a "histogram of interest point locations", to represent human actions. They proposed a combination of classifiers in which AdaBoost and sparse representation (SR) are used as basic algorithms. In the work of Vemulapalli et al. [29], human actions are modeled as curves in a Lie group of Euclidean distances. The classification process is a combination of dynamic time warping, Fourier temporal pyramid representation, and a linear SVM.
Thanks to higher quality results, artificial neural networks are replacing other methods. Thus, the most recently conducted research in the area of human activity classification differs only by the proposed network architecture. Networks based on the LSTM
architecture, or a modification of this architecture (an ST-LSTM network with trust gates), were proposed by Liu et al. [30] and Shahroudy et al. [31]. They introduced so-called "Trust Gates" for controlling the content of an LSTM cell and designed an LSTM network capable of capturing spatial and temporal dependencies at the same time (denoted as ST-LSTM). The task performed by the gates is to assess the reliability of the obtained joint positions based on the temporal and spatial context. This context is based on the position of the examined joint in the previous moment (temporal context) and the position of the previously studied joint in the present moment (spatial context). This behavior is intended to help network memory cells assess which locations should not be remembered and which ones should be kept in memory. The authors also drew attention to the importance of capturing default spatial dependencies already in the skeleton data. They experimented with different mappings of the joint set to a sequence. Among them, they mapped the skeleton data into a tree representation, duplicating joints when necessary to keep the spatial neighborhood relation, and performed a tree traversal to get a sequence of joints. Such an enhancement of the input data allowed an increase of the classification accuracy by several percent.
The work [32] introduced the idea of applying convolutional filters to pseudo-images in the context of action classification. A pseudo-image is a map (a 2D matrix) of feature vectors from successive time points, aligned along the time axis. Thanks to these two dimensions, the convolutional filters find local relationships of a combined time-space nature. Liang et al. [33] extended this idea to a multi-stream network with three stages. They use 3 types of features extracted from the skeleton data: positions of joints, motions of joints, and orientations of line segments between joints. Every feature type is processed independently in its own stream, but after every stage the results are exchanged between streams.
Graph convolutional networks are currently considered a natural approach to the action (and interaction) recognition problem. They are able to achieve high quality results with only modest requirements of computational resources. "Spatial Temporal Graph Convolutional Networks" [34] and "Actional-Structural Graph Convolutional Networks" [35] are examples of such solutions.
Another recent development is the pre-processing of the skeleton data in order to extract different types of information (e.g., information on joints and bones, and their relations in space and time). Such data streams are first separately processed by so-called multi-stream neural networks and later fused into a final result. Examples of such solutions are the "Two-Stream Adaptive Graph Convolutional Network" (2S-AGCN) and the "Multistream Adaptive Graph Convolutional Network" (MAGCN), proposed by Shi et al. [36, 37].
One of the best performances on the NTU RGB+D interaction dataset is reported in the work of Perez
et al. [15]. Its main contribution is a powerful two-stream network with three stages, called the "Interaction Relational Network" (IRN). The network inputs are basic relations between the joints of two interacting persons, tracked over the length of the image sequence. An important step is the initial extraction of relations between pairs of joints – both the distances between joints and their motion are obtained. The neural network performs further encoding and decoding of these relations and a final classification. The first stream processes within-a-person relations, while the second one processes between-person relations. A high-quality version of the IRN network, called IRN-LSTM, uses a final LSTM with 256 units. It allows reasoning over the interactions during the whole video sequence – even all frames of the video clip are expected to be processed. In the basic IRN, a simple densely-connected classifier is used instead of the LSTM and a sparse sequence of frames is processed.
The currently best results for small networks are reported by Zhu et al. [16], where two new modules are proposed for a baseline 2S-AGCN network. The first module extends the idea of modelling relational links between two skeletons by a spatio-temporal graph to a "Relational Adjacency Matrix" (RAM). The second novelty is a processing module, called the "Dyadic Relational Graph Convolution Block", which combines the RAM with spatial graph convolution and temporal convolution to generate new spatial-temporal features.
Very recently, exceptionally high performance was reported when using networks with 3D convolutional layers, applied to data tensors that constitute skeleton "heatmaps" (i.e., preprocessed image data) [38]. The approach, called PoseConv3D, can even be topped when fused with the processing of an ordinary RGB data stream [39]. Obviously, this requires a heavy network and produces a high computational load.
2.3. Conclusion
From the analysis of the recent most successful solutions, we can draw three main conclusions:
1) using an analytic preprocessing of skeleton data to extract meaningful information and cancel noisy data, either by employing classic functions or learnable function approximations (e.g., relational networks);
2) preferring light-weight solutions by employing background (problem-specific) knowledge, i.e., using graph CNNs instead of CNNs, or CNNs with 2-D kernels instead of 3-D CNNs;
3) a video clip containing a specific human action or interaction can be processed alternatively as a sparse or a dense frame sequence, where a sparse sequence is chosen to achieve real-time processing under limited computational resources, while the processing of a dense sequence leads to better performance.
3. The Approach
3.1. Structure
The input data has the form of video clips containing a single-person action or a two-person interaction. The two proposed solutions have a common first part and differ by the feature extraction step and the neural network-based classifier, as shown in Figure 1. The common processing steps are: key frame selection, skeleton estimation, and skeleton tracking and correcting. The final steps, customized for action and interaction recognition, are: feature extraction and the deep neural network model.
Key frame selection The start and end frames of an activity in a video clip are detected first. The idea of this detection follows a typical approach to voice activity detection in speech signals, but a hysteresis of thresholds for motion "energy" instead of signal energy is applied. The detected activity window will be represented by a fixed number N of video frames (e.g., N = 32) from M subintervals (groups). Thus, N = M · m, where M is the number of consecutive subintervals (groups), while m is the number of frames in one group. In particular, let us fix M = 4 and denote the groups as follows: start, 1st intermediate, 2nd intermediate and final. For every group a separate expert (a weak action classifier) will be created.
Skeleton estimation In our implementation, we use the core block of OpenPose [44], the "body_25" model, to extract 25 human skeletal keypoints from an image. The result of OpenPose, as applied to a single key frame, is a set of skeletons, where a 25-element array represents a single skeleton, providing 2D (or 3D, if needed) image coordinates and a confidence score for every keypoint (called a "joint"). OpenPose provides the ability to obtain 2D or 3D information about the detected joints. If one selects the 3D option, views must be specified for the library to perform triangulation. The library returns the following data for every joint:
- (x, y, p) for 2D joints;
- (x, y, z, p) for 3D joints;
where (x, y) are the image coordinates of a pixel, z is the depth associated with the given pixel, and p is the certainty of detection, a number in the range [0, 1]. We have used the pretrained "OpenPose" model with default parameter settings and no person number limit. The received skeleton data was initially refined by deleting keypoints with certainty p below the threshold of 0.1. Later, during skeleton refinement, such missing joints and the remaining weak joints (with p < 0.3) will be approximated from the data of their well-detected neighbors (with p ≥ 0.3).
The OpenPose library offers the possibility to choose a model of the human figure. There are three models: "15 body", "18 body", and "25 body". The number in the name refers to the number of detectable joints. Table 1 lists the keypoints of the typically selected "25 body" model.
Table 1. List of keypoints (joints) in the "25 body" model
0 – the main point of the head
1 – neck base
2, 5 – shoulders
3, 6 – elbows
4, 7 – wrists
8 – the base of the spine
9, 12 – hips
10, 13 – knees
11, 14 – ankles
15, 16, 17, 18 – extra head points
19, 20, 21, 22, 23, 24 – extra foot points
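To make the key-frame sampling and the confidence-based filtering of OpenPose keypoints concrete, the following minimal Python sketch illustrates both steps. It assumes the OpenPose output for one skeleton is a (25, 3) array of (x, y, p) triples and uses the thresholds quoted above (p < 0.1: reject, p < 0.3: weak); the helper names are illustrative, not part of the original implementation.

```python
import numpy as np

def sample_key_frames(n_frames, m_groups=4, m_per_group=8):
    """Pick N = M * m frame indices: m roughly evenly spaced frames
    from each of M consecutive subintervals of the activity window."""
    bounds = np.linspace(0, n_frames, m_groups + 1, dtype=int)
    picked = []
    for g in range(m_groups):
        lo, hi = bounds[g], max(bounds[g] + 1, bounds[g + 1])
        picked.extend(np.linspace(lo, hi - 1, m_per_group, dtype=int).tolist())
    return picked                                  # len(picked) == m_groups * m_per_group

def filter_keypoints(skeleton, p_reject=0.1, p_weak=0.3):
    """skeleton: (25, 3) array of (x, y, p) for one person.
    Keypoints with p < 0.1 are treated as missing (NaN); keypoints with
    0.1 <= p < 0.3 are kept but flagged as weak for the later refinement step."""
    skeleton = np.asarray(skeleton, dtype=float).copy()
    p = skeleton[:, 2]
    skeleton[p < p_reject, :2] = np.nan
    weak = (p >= p_reject) & (p < p_weak)
    return skeleton[:, :2], weak
```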
Skeleton tracking and correcting If more than two skeletons are returned by OpenPose for an image, the two largest skeletons, S_a and S_b, are selected first and then tracked in the remaining frames. We focus on the first 15 joints of every skeleton, as the information about the remaining joints is very noisy (Fig. 2). In the case of many-person scenes, the sets of skeletons generated by OpenPose are not uniquely indexed over the frame sequence. There may also be falsely detected skeletons for objects in the background, or a large mirror can lead to a second skeleton of the same person.
The algorithm for tracking skeletons in many-skeleton frames can be seen as a multiple path search and can be solved in many ways. For example, beam search or some type of dynamic programming search could be used. Our algorithm initializes paths for up to two "foreground" skeletons detected in the first frame, and then runs a loop over the remaining frames, trying to extend every path by the nearest skeleton detected in the next frame that is sufficiently close to the path's last skeleton. New paths can also start in later frames when apparently new persons appear on the scene. The invariance of features with respect to the size of the skeleton in the image is obtained by normalizing the joint coordinates by the length of the segment between the neck (joint 1) and the center of the hips (joint 8).
The selected sequence of skeletons, representing one person, is sometimes deteriorated by missing joints of some skeleton or by missing an entire skeleton in a frame. These misses of joints or virtual replications of skeletons introduce noise and may also destabilize the feature normalization step when the locations of required joints are missing.
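The greedy path-extension idea can be sketched as follows; the pixel distance threshold and the use of the spine centre as the matching criterion are assumptions of this illustration, not the authors' exact design.

```python
import numpy as np

def track_two_skeletons(frames, max_dist=80.0):
    """frames: list of per-frame skeleton lists; each skeleton is a (15, 2) array.
    Greedily extends up to two paths with the nearest skeleton of the next frame,
    a simplification of the multiple-path search described above.
    max_dist is an assumed pixel threshold for 'sufficiently close'."""
    def spine_centre(s):                     # midpoint of neck (joint 1) and hip base (joint 8)
        return (s[1] + s[8]) / 2.0

    paths = [[s] for s in frames[0][:2]]     # paths for up to two foreground skeletons
    for skels in frames[1:]:
        used = set()
        for path in paths:
            last = spine_centre(path[-1])
            dists = [np.linalg.norm(spine_centre(s) - last) if i not in used else np.inf
                     for i, s in enumerate(skels)]
            if dists and min(dists) < max_dist:
                best = int(np.argmin(dists))
                path.append(skels[best])
                used.add(best)
            else:
                path.append(path[-1])        # no match: keep the last pose (refined later)
    return paths
```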
Figure 1. General structure of our approach
Figure 2. The 15 reliable keypoints (joints), indexed from 0 to 14, out of the 25 of the OpenPose "body_25" skeleton model

Let $v_i = [o_i^1, o_i^2, \ldots, o_i^N]$ be the series of $N$ positions of the $i$-th joint in time. The procedure for the refinement of joints can be summarized as follows:
1) IF a single position $o_i^t$ is missing, THEN take the average of its temporal neighbors $o_i^{t-1}$ and $o_i^{t+1}$;
2) in the middle of the frame sequence, a missing joint is set to a linear interpolation of the two closest known positions of this joint in time: IF $o_i^{(\cdot)}$ is missing for $k$ consecutive frames, i.e., from $t$ to $t+k-1$, THEN take interpolations of the values $o_i^{t-1}$ and $o_i^{t+k}$;
3) for the initial frames, the position of a missing joint is set to its first detection in the frame sequence: IF $o_i^{(\cdot)}$ is missing for the first $k$ frames, THEN set the first $k$ values to $o_i^t = o_i^{k+1}$, for $t = 1, \ldots, k$;
4) joints that are lost at the end of the frame sequence receive the position last seen: IF $o_i^{(\cdot)}$ is missing for the last $k$ frames, THEN set the last $k$ values to $o_i^t = o_i^{N-k}$, for $t = N-k+1, \ldots, N$;
5) IF $o_i^{(\cdot)}$ is missing in the entire sequence, THEN set the joint data according to its "sister" joint, i.e., obtain a symmetric mapping (w.r.t. the spine axis) of the visible sister joint.
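A compact way to realize rules 1-4 is temporal interpolation with edge extrapolation, as in the sketch below; rule 5 (the mirrored "sister" joint) is omitted here, and this is an illustrative reimplementation, not the original code.

```python
import numpy as np

def refine_joint_track(track):
    """track: (N, 2) array of one joint's positions over the key frames,
    with NaN rows where the joint was missing.  np.interp fills interior gaps
    by linear interpolation (rules 1-2) and holds the edge values at the
    beginning and end of the sequence (rules 3-4)."""
    track = np.asarray(track, dtype=float).copy()
    ok = ~np.isnan(track[:, 0])
    if not ok.any():                 # completely missing joint: rule 5 (sister joint) applies
        return track
    t = np.arange(len(track))
    for d in (0, 1):                 # x and y are interpolated independently
        track[~ok, d] = np.interp(t[~ok], t[ok], track[ok, d])
    return track
```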
3.2. Feature Extraction
The result of tracking (up to) two sets of skeleton joints in N frames can be represented as a 2D map of N × 15 × 2 vector entries, where 2 stands for the image coordinates (x, y). We call this structure the RAW features.
A representation of joint positions (the RAW features) has the disadvantage of not being invariant to basic transformations of the image space. It does not explicitly represent relationships within a skeleton and between two skeletons, like the motion of joints and the relative orientation of branches. Therefore, a relational representation of both skeletons was developed, which reduces the aforementioned disadvantages of the raw representation of joints. The new features consist of:
1) int-PS ("polar sparse" for interaction) features: with a local center S and a normalization vector u defined for a pair of skeletons, vectors are drawn between point S and every joint of the two skeletons – all these vectors are length- and orientation-normalized w.r.t. u;
2) int-PSM ("polar sparse with motion" for interaction) features: the int-PS features with additional motion vectors of all the joints.
For memory-less networks, like the MLP, in order to make them competitive with LSTM models, we add motion vectors for every joint of every skeleton. The int-PS features will be fed into the LSTM-based classifier, while the int-PSM features – into the ensemble of pose-based classifiers (the "mixture of experts" model).

Figure 3. Illustration of the PS features: the vectors between reference point S and every skeleton joint, normalized by vector u

int-PS Let us define the center point S of the vector u, which connects the central spine points of both skeletons (Fig. 3). Now 15 vectors are defined for every skeleton. Every vector connects the point S with a joint of skeleton 1 or 2. Every vector is represented in polar form by two features – a normalized magnitude $h_{a,j}$, $h_{b,j}$ and a relative orientation $r_{a,j}$, $r_{b,j}$ (both magnitude and orientation are normalized with respect to the magnitude and orientation of u). Thus, for every frame there are 60 features defined (= (15 + 15) · 2). The N · 60 features are split into two maps, $H_a^N$ and $H_b^N$, one for each skeleton:

$$H_a^N = \begin{bmatrix} \boldsymbol{h}_a^1 & \breve{\boldsymbol{r}}_a^1 \\ \boldsymbol{h}_a^2 & \breve{\boldsymbol{r}}_a^2 \\ \vdots & \vdots \\ \boldsymbol{h}_a^N & \breve{\boldsymbol{r}}_a^N \end{bmatrix} \tag{1}$$

$$H_b^N = \begin{bmatrix} \boldsymbol{h}_b^1 & \breve{\boldsymbol{r}}_b^1 \\ \boldsymbol{h}_b^2 & \breve{\boldsymbol{r}}_b^2 \\ \vdots & \vdots \\ \boldsymbol{h}_b^N & \breve{\boldsymbol{r}}_b^N \end{bmatrix} \tag{2}$$
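The per-frame int-PS computation can be summarized by the following sketch; taking the spine centre as the midpoint of joints 1 and 8 is an assumption of this illustration.

```python
import numpy as np

def int_ps_features(skel_a, skel_b):
    """skel_a, skel_b: (15, 2) arrays of joint coordinates for one frame.
    Returns two (15, 2) arrays of (normalized magnitude, relative orientation),
    one per skeleton, following Eqs. (1)-(2)."""
    def spine_centre(s):
        return (s[1] + s[8]) / 2.0            # midpoint of neck base (1) and spine base (8)

    u = spine_centre(skel_b) - spine_centre(skel_a)   # vector between the two spine centres
    s_point = spine_centre(skel_a) + 0.5 * u          # reference point S: centre of u
    u_len = np.linalg.norm(u) + 1e-8
    u_ang = np.arctan2(u[1], u[0])

    def polar(skel):
        v = skel - s_point                            # vectors from S to every joint
        mag = np.linalg.norm(v, axis=1) / u_len       # length normalized by |u|
        ang = np.arctan2(v[:, 1], v[:, 0]) - u_ang    # orientation relative to u
        ang = (ang + np.pi) % (2 * np.pi) - np.pi     # wrap to (-pi, pi]
        return np.stack([mag, ang], axis=1)

    return polar(skel_a), polar(skel_b)               # 2 x (15, 2) = 60 features per frame
```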
Figure 4. Structure of the "mixture-of-pose-experts" model

act-PS For single-person action classification, only one feature map $H_a^N$, with 30 features, is created. In this case, the reference point S is the center point of the spine segment $o_1 \frown o_8$, and the vector u is the spine segment $o_1 \frown o_8$ itself.
int-PSM In addition to the int-PS features, motion vectors $(\delta x_{a,j}^i, \delta y_{a,j}^i)$ are defined for every joint of the two skeletons. Thus, ((15 + 15) × 2) = 60 features are added, and the int-PSM vector consists of 120 features.
act-PSM For single-person action classification, the act-PS feature vector is extended by the motion vectors of the single skeleton's joints only. Thus, 30 features are added, and the act-PSM vector has 60 features in total.
3.3. Mixture of Pose Experts (MPE)
The first neural network model is called the "mixture of pose experts" (MPE for short) (Fig. 4). We can distinguish three parts of this network: one pose encoder/classifier, four pose-based activity classifiers ("pose experts"), and a final activity classifier (fusing the results of the pose experts).
There is a common pose encoding network, which is trained with 4 × K pose classes, i.e., every pose class represents one of the K actions (or interactions) in one of the four time periods. After the training is accomplished, the classification layers are omitted, and the encoding embedding vector is passed to the four weak pose classifiers (called "experts"). Thus, the output layer of the pose classifier is replaced by every expert network – a fully connected hidden layer with
K outputs each. Four alternative classifiers are trained on the PSM features obtained for samples from the time period corresponding to the given pose classifier.
The mixture of experts network (MPE) is implemented using Keras [45]. During training, the models are evaluated, and a search for optimal model parameters is performed – a RandomSearch algorithm from Keras is applied in this stage. The following options of the MLP parameters are evaluated: the number of hidden layers of the network can vary from 1 to 3, different activation functions (ReLU and/or sigmoid) may be chosen, and the number of neurons in hidden layers and the learning rate can vary.
The ensemble classifier consists of a fusion of expert results and an aggregation of class likelihoods over the entire frame sequence. The fusion layer is again a fully connected layer that weights the results of all weak classifiers. It takes the frame index t as an additional input. Activity (i.e., action or interaction) classification then consists of specifying the likelihoods of all activity classes by aggregating their corresponding pose likelihoods over the entire time sequence. The aggregation operation is mathematically a weighted sum of pose likelihoods for frames indexed from t = 1 to t = N. The ensemble classifier provides gain coefficients for the four experts depending on the frame index:

$$\boldsymbol{S} = \sum_{t=1}^{N} \left[ \boldsymbol{Pr}_{expert\_1}(t) \cdot w_1(t) + \boldsymbol{Pr}_{expert\_2}(t) \cdot w_2(t) + \boldsymbol{Pr}_{expert\_3}(t) \cdot w_3(t) + \boldsymbol{Pr}_{expert\_4}(t) \cdot w_4(t) \right] \tag{3}$$
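Equation (3) amounts to a frame-wise weighted sum of expert likelihoods; a plain NumPy restatement (not the trained Keras fusion layer itself) could look as follows.

```python
import numpy as np

def fuse_experts(expert_probs, gains):
    """expert_probs: (4, N, K) class likelihoods of the four pose experts for N key frames.
    gains: (N, 4) frame-dependent weights w_e(t) produced by the fusion layer.
    Implements Eq. (3); the winning activity class is the argmax of the accumulated score."""
    expert_probs = np.asarray(expert_probs, dtype=float)   # (4, N, K)
    gains = np.asarray(gains, dtype=float)                 # (N, 4)
    weighted = expert_probs * gains.T[:, :, None]          # broadcast w_e(t) over the K classes
    scores = weighted.sum(axis=(0, 1))                     # (K,) accumulated class scores S
    return scores, int(np.argmax(scores))
```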
Figure 5. Architecture of the SC-LSTM-PS network for N = 28 key frames. The versions of SC-LSTM differ only by the input layer size

3.4. LSTM Model
A "single channel" LSTM network (denoted further as SC-LSTM) is proposed. It consists of two LSTM layers, interleaved by two Dropout layers, and two dense layers (Fig. 5). Depending on the role of a given network model, we can distinguish between the action- and interaction-classification models (SC-LSTM-act, SC-LSTM-int). In turn, based on the type of input features (RAW or PS/PSM), every such model appears in two versions (e.g., SC-LSTM-act-RAW, SC-LSTM-int-PS). Recall that the two baseline feature versions, SC-LSTM-act-RAW and SC-LSTM-int-RAW, process the raw skeleton joints, as initially obtained by OpenPose and established by the skeleton tracking and refinement step. These versions differ by the input layer only, as different numbers of features are considered. For example, in the SC-LSTM-act-PS version, there are 3,350 k trainable parameters.
4. Results
4.1. Datasets
Action set The dataset "NTU RGB+D" [31] is the basic set used in this work. It was made by ROSE (the Rapid-Rich Object Search Lab), which is the result of a collaboration between Nanyang Technological University in Singapore and Peking University in China. Many works on human action recognition have already been validated on this dataset, and a website collects the achieved performance scores [14]. The set can be characterized as follows (Fig. 6):
- Contains RGB videos with a resolution of 1920 × 1080 (pixels).
- Includes depth and infrared maps with a resolution of 512 × 424 (pixels).
- Each behavior of the set is captured by three cameras.
- Behaviors were performed by people in two settings (showing activities from different viewpoints).
- It consists of 56,880 videos showing 60 classes of behavior.

Figure 6. Image samples of actions from the NTU RGB+D action dataset (from left to right and top to bottom): drink, eat, hand waving, jump up, put palms together, take off a hat

Among the classes of behavior, the most popular are the "everyday activities". They constitute a subset of 40 classes, as listed in Table 2. The collection contains 37,920 video clips, and associated depth maps and infrared frames.

Table 2. The "everyday activities" subset (40 classes) of the NTU-RGB+D dataset
A1: drink water, A2: eat meal, A3: brush teeth, A4: brush hair, A5: drop, A6: pick up, A7: throw, A8: sit down, A9: stand up, A10: clapping, A11: reading, A12: writing, A13: tear up paper, A14: put on jacket, A15: take off jacket, A16: put on a shoe, A17: take off a shoe, A18: put on glasses, A19: take off glasses, A20: put on a hat/cap, A21: take off a hat/cap, A22: cheer up, A23: hand waving, A24: kicking something, A25: reach into pocket, A26: hopping, A27: jump up, A28: phone call, A29: play with phone/tablet, A30: type on a keyboard, A31: point to something, A32: taking a selfie, A33: check time (from watch), A34: rub two hands, A35: nod head/bow, A36: shake head, A37: wipe face, A38: salute, A39: put palms together, A40: cross hands in front
Table 3. The 10 activity classes of the UTKinect-Action3D dataset
A1: walk, A2: sit down, A3: stand up, A4: pick up, A5: carry, A6: throw, A7: push, A8: pull, A9: wave hands, A10: clap hands
Figure 8. Image samples from the SBU Kinect interaction dataset – interaction classes (from left to right) “moving toward, moving apart, kicking, slapping”
Figure 7. Image samples from the NTU RGB+D interaction dataset – interaction classes "Punch/slap, kicking, shaking hands, touch pocket"

The NTU RGB+D dataset allows one to perform a cross-subject (CS) or a cross-view (CV) evaluation. In the cross-subject setting, the samples used for training show actions performed by half of the actors, while the test samples show actions of the remaining actors. In the cross-view setting, samples recorded by two cameras are used for training, while samples recorded by the remaining camera are used for testing. A major advantage of this dataset is an exact specification of which video clips should be used for training and which for testing.
The UTKinect-Action3D dataset [46] is the second set of people's activities used in this work. It is a secondary set, which means that only testing of the developed model will be performed on it. The UTKinect dataset can be described by the following:
- Includes RGB videos with 640 × 480 resolution (pixels).
- Includes depth maps with a resolution of 320 × 240 (pixels).
- Activities were performed by 10 people. Each person repeated the performed activity 2 times. There are 10 activity classes.
Table 3 lists the 10 activity classes of UTKinect-Action3D. The clips in this collection are organised as photo series in catalogs. To view a specific video, the images within each catalog must be collected. A single video contains a person performing a series of 10 actions. A video is labeled with the action class, and the start photo and end photo of every action.
Interaction set The best configuration of the pose experts and the final, time-accumulating network will be trained and tested on the interaction subset of the NTU RGB+D dataset (Fig. 7). It includes 11 two-person interactions of 40 actors: A50: punch/slap, A51: kicking, A52: pushing, A53: pat on back, A54: point finger, A55: hugging, A56: giving object, A57: touch pocket, A58: shaking hands, A59: walking towards, and A60: walking apart. In our experiments, the skeleton data provided with the NTU-RGB+D dataset is already considered. There are 10,347 video clips in total, of which 7,334 videos are in the training set and the remaining 3,013 videos are in the test set. No distinct validation subset is distinguished, as the idea is to run a sufficient number of training/testing iterations and to select the best test iteration.
The skeleton data may consist of 25 joints of 3D skeletons that apparently represent a single person. As our research objective is to analyse video data and to focus on reliably detected joints only, we use only the 2D information of the first 15 joints. From a video sample, a set of frames is chosen as follows: the video clip is uniformly split into M = 4 time intervals ("periods"), and from every period some number of frames m is selected. We tested m = 2, 4, 6, 8 and found that m = 8, giving N = 32, is the best setting.
A second interaction dataset, used here mainly for the initial parameter search, is the SBU Kinect Interaction dataset [47]. It contains two-person interactions of 8 types, recorded by a Microsoft Kinect v1: moving toward, moving apart, pushing, kicking, slapping, giving objects, hugging, and hand shaking (Fig. 8). There are 21 subsets of image sequences recorded with pairs of actors (7 actors in total), performing all 8 interactions – all in the same single environment. In total, there are around 300 video samples – images of resolution 640 × 480 – recorded with a time rate of 15 fps.
4.2. Interaction Classification
Pose model search For running the RandomSearch algorithm, an NNHyperModel is created, which implements the HyperModel class from the Keras Tuner. The hyper-parameters of the search space are declared in NNHyperModel as class parameters. We trained and tested various configurations of the Pose model and the mixture model on the SBU Interaction and UT-Action datasets. For a given number of hidden layers (1, 2 or 3), we searched for the best number of neurons, the best type of activation function, and the best value of the learning rate.
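A hyper-parameter search of this kind can be expressed with the Keras Tuner as in the following sketch; the class name NNHyperModel follows the text, while the input/output sizes and the concrete search ranges are illustrative assumptions.

```python
import keras_tuner as kt
from tensorflow import keras
from tensorflow.keras import layers

class NNHyperModel(kt.HyperModel):
    """Search space: 1-3 hidden layers, ReLU or sigmoid activations,
    variable layer widths and learning rate (as listed in the text)."""
    def __init__(self, n_features=120, n_classes=44):
        super().__init__()
        self.n_features = n_features          # e.g., int-PSM feature vector length (assumed)
        self.n_classes = n_classes            # e.g., 4 time periods x 11 interaction classes

    def build(self, hp):
        model = keras.Sequential([keras.Input(shape=(self.n_features,))])
        for i in range(hp.Int("hidden_layers", 1, 3)):
            model.add(layers.Dense(hp.Int(f"units_{i}", 100, 800, step=100),
                                   activation=hp.Choice(f"act_{i}", ["relu", "sigmoid"])))
        model.add(layers.Dense(self.n_classes, activation="softmax"))
        model.compile(
            optimizer=keras.optimizers.Adam(hp.Float("lr", 1e-4, 1e-2, sampling="log")),
            loss="categorical_crossentropy", metrics=["accuracy"])
        return model

# tuner = kt.RandomSearch(NNHyperModel(), objective="val_accuracy", max_trials=20)
# tuner.search(x_train, y_train, validation_data=(x_val, y_val), epochs=100)
```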
Table 4. The mean accuracy on the SBU Interaction dataset of three pose expert models (PE) with 1, 2 and 3 hidden layers
Hidden layers: 1 / 2 / 3
PE – training mAP: 95% / 96% / 99%
PE – test mAP: 82% / 84% / 82%
Table 5. The accuracy of pose experts (PE) and of the mixture of experts (MPE), verified on the NTU RGB+D interaction dataset in the CS (cross subject) mode
Classifier: Pose expert / MPE
Training: 88.4% / 94.6%
Test: 76.8% / 84.0%
Using the RandomSearch function during training, we identified three ANN configurations, each one performing best for a given number of hidden layers (1, 2 or 3). The performances of the three selected models after 100 epochs of training on the SBU Interaction dataset are shown in Table 4. The best mean test accuracy (i.e., the recall averaged over all classes) of 84% was achieved by the second model, whereas the other two showed an accuracy of 82%. Consequently, we have chosen an ANN configuration of 2 hidden layers with 700 and 500 neurons in the first and second layer, respectively. The activation functions are ReLU and sigmoid, respectively.
Mixture of ANN experts The finally chosen ANN pose experts follow the second version, with two hidden layers, as reported above. The final score of every interaction class is obtained by the fusion network with final accumulation over time. The class with the highest score is selected as the winner. A notable improvement is observed when fusing the results of the experts by the ensemble classifier. The mean accuracy of a pose expert (PE) was 88.4% (training) and 76.8% (testing), while the ensemble classifier MPE reached 94.6% and 84.0%, respectively (Table 5).
Processing times Experiments were conducted on a personal computer with an Intel® Core™ i7-7700HQ CPU @ 2.80 GHz, an Nvidia GeForce GTX 1060 Mobile GPU with the Nvidia CUDA driver, 16 GB of DDR4 2400 MHz RAM, and a PM961 NVMe Samsung 256 GB SSD + a Western Digital Blue 1 TB HDD. The average processing times are shown in Table 6.
Comparison with related methods Many methods for two-person interaction classification have been tested on the NTU RGB+D interaction dataset. We list some of the leading works in Table 7. The results can be characterized as follows:
(a) the mixture of pose experts MPE-int needs a low number of weights (456 K) to be trained but achieves good quality (84%);
(b) the single-channel LSTM-int needs a reasonable (midweight) number of weights (3.35 M) but
achieves high quality (91.2%), slightly lower than the current best approaches, based on graph CNNs [16] and 3D CNNs [39];
(c) solutions that process all or nearly all frames of the video clip demonstrate superior performance over solutions operating on sparse frame sequences.

Figure 9. Example of a confusion matrix for the NTU RGB+D interaction classes, obtained by the SC-LSTM model

Confusion matrix Confusion matrices allow for accurate analysis of the incorrect predictions of individual classes. Figure 9 shows an example of a confusion matrix obtained for the NTU RGB+D interaction classes. The number of test samples has been made virtually equal for all the classes; thus, 276 positive results mean a 100% accuracy for a given class. We show the results of an average-performing model, so that mistakes are more visible than in the case of better-performing models. It can be seen that the vast majority of class predictions are correct. As the dataset provides balanced sets for all classes, the simple accuracy measure is used: Acc = (TP/All) · 100%, where TP means "true positive results" and All means all test samples. "Punch" (A50) is misclassified with A51-A54, which all use hands to express an action. In turn, the "finger pointing" class (A54) is mainly confused with the "punch" class (A50). In both cases, a similar hand movement is made towards the other person. The class "pat on back" (A53) is confused with the class "touch pocket" (A57). Both movements involve touching another person on their back. The "giving object" class (A56) and the "shaking hands" class (A58) represent very similar interactions – both involve contact of the hands. The "walking towards" and "walking apart" classes are detected highly accurately.
4.3. Action Classification
Comparison with related methods Our two models designed for single-person action classification (MPE-act-PSM, SC-LSTM-act-PS) are compared with other top-performing solutions, when trained and tested on
Table 6. Processing times measured for a video clip of 3 s Step Activity detection Skeleton detection Feature engineering Pose classi ier Mixture classi ier Total time
Per frame 0.5 – 5 ms 67 ms 8 ms 1 ms 1 ms 78 – 87 ms
32 frames 16 – 160 ms 2134 ms 256 ms 32 ms 32 ms 2470 – 2614 ms
Remarks in 2 – 20 frames OpenPose
Table 7. Interaction classification accuracy of leading works evaluated on the NTU-RGB+D interaction set in the CS (cross subject) mode. Note: † – result according to [15], ‡ – result according to [16]

| Model | Year | Accuracy (CS) | Parameters | Sequence |
|---|---|---|---|---|
| ST-LSTM [30] | 2016 | 83.0% † | ∼2.1 M | 32 |
| ST-GCN [34] | 2018 | 83.3% † | 3.08 M | 32 |
| AS-GCN [35] | 2019 | 89.3% † | ∼9.5 M | 32 |
| IRN_inter+intra [15] | 2019 | 85.4% † | ∼9.0 M | 32 |
| MPE-int-RAW | 2022 | 76.1% | 0.456 M | 32 |
| SC-LSTM-int-RAW | 2022 | 80.2% | 3.35 M | 32 |
| MPE-int-PSM | 2022 | 84.0% | 0.456 M | 32 |
| SC-LSTM-int-PS | 2022 | 91.2% | 3.35 M | 32 |
| LSTM-IRN [15] | 2019 | 90.5% † | ∼9.08 M | max(all, 300) |
| 2S-AGCN [36] | 2019 | 93.4% ‡ | 3.0 M | max(all, 300) |
| DR-GCN [16] | 2021 | 93.6% ‡ | 3.18 M | max(all, 300) |
| 2S DR-AGCN [16] | 2021 | 94.6% ‡ | 3.57 M | max(all, 300) |
| PoseConv3D (J+L) [39] | 2022 | 97.0% ‡ | 6.9 M | max(all, 300) |

the NTU RGB+D dataset (i.e., the everyday activity subset). The results in the CV (cross-view verification) mode are shown in Table 8. Typically, the accuracy obtained in CV mode is higher by a significant margin than in CS mode. But the number of action classes (40) is much higher than the number of interaction classes (11) considered previously. There is a visible tradeoff between the complexity of a solution and classification accuracy – our lightweight MPE-act-PSM vs. complex solutions of three-stream CNNs (3S-CNN) or solutions using graph CNNs and 3D convolutions of skeleton heatmaps (PoseC3D). Exceptionally high performance was recently reported when fusing RGB data and skeleton data processing results [39]. However, the relatively early methods have performed worse than ours. Even the best among them, mentioned above as ST-LSTM, has reached a performance level barely comparable with our raw skeleton data (RAW) processing methods.
Confusion matrix. Figure 10 shows the main confusions detected for 40 classes of the NTU RGB+D dataset. The confused classes can be divided into three groups. Actions numbered 11 (read), 12 (write), 29 (play phone/tablet) and 30 (type on keyboard) form the first group. People performing these activities are usually inclined over one of several objects, i.e., a phone, a book, a notebook or a tablet. The skeletal system of these persons is quite similar. The second group of activities consists of numbers 10 (clapping) and 34 (rub hands). For them, the people's stature is quite similar. The last group consists of activities
numbered 31 and 32, i.e., pointing with a finger and taking a selfie. These actions may seem quite different. However, it should be noted that the simplified skeleton model does not have a representation of fingers. Thus, these two behaviors are observed as putting a hand in front of a person. The accuracy of the network with 40 classes was, in this particular case, equal to 89.6%. Let us combine the three subsets of "similar" classes into separate meta-classes (M1, M2, M3): {11, 12, 29, 30} → M1, {10, 34} → M2, {31, 32} → M3. Thus, we get a 35-class problem (32 "normal" classes and 3 meta-classes). The mean accuracy of such a classification problem would increase to 93.5%.
Figure 10. The main confusions between 40 classes of the NTU RGB+D action dataset: four main confusion cases (given by red numbers 1, 2, 3 and 4, where confusion no. 2 is a symmetric relation) in the 40 × 40 confusion matrix
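To make the meta-class computation concrete, the short sketch below shows one way to merge confusion-matrix rows and columns into such meta-classes and recompute the mean accuracy Acc = (TP/All) ⋅ 100%. The merge map follows the grouping above, but the function names and the toy confusion matrix are only illustrative assumptions, not the authors' code.

```python
import numpy as np

# Meta-class grouping from the text (class ids are 1-based).
META_CLASSES = {"M1": [11, 12, 29, 30], "M2": [10, 34], "M3": [31, 32]}

def merge_meta_classes(cm: np.ndarray, groups: dict[str, list[int]]) -> np.ndarray:
    """Sum the rows/columns of `cm` that belong to one meta-class."""
    keep = [i for i in range(cm.shape[0])
            if all(i + 1 not in g for g in groups.values())]
    blocks = [np.array(keep)] + [np.array(g) - 1 for g in groups.values()]
    merged = np.zeros((len(blocks), len(blocks)), dtype=cm.dtype)
    for r, rows in enumerate(blocks):
        for c, cols in enumerate(blocks):
            merged[r, c] = cm[np.ix_(rows, cols)].sum()
    return merged

def mean_accuracy(cm: np.ndarray) -> float:
    """Mean of per-class recall, i.e. TP/All averaged over classes (in %)."""
    return 100.0 * np.mean(np.diag(cm) / cm.sum(axis=1))

if __name__ == "__main__":
    cm40 = np.random.randint(0, 5, size=(40, 40))   # toy 40-class confusion matrix
    cm40[np.diag_indices(40)] += 50                 # mostly correct predictions
    cm35 = merge_meta_classes(cm40, META_CLASSES)   # 32 normal classes + 3 meta-classes
    print(mean_accuracy(cm40), mean_accuracy(cm35)) # merging within groups typically raises it
```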
Table 8. Comparison of our best solutions with related work for the NTU-RGB+D dataset (40 action classes, CV mode).

| Method (Ref. No., Year) | Test Accuracy (%) | Parameters | Sequence |
|---|---|---|---|
| ST-LSTM ([30], 2016) | 77.7 | >2 M | 8 |
| 3S-CNN ([33], 2019) | 93.7 | unknown | 32 |
| PoseC3D ([38], 2021) | 97.1 | 6.9 M | all |
| RGBPose-Conv3D ([39], 2022) | 99.6 | >10 M | all |
| MPE-act-PSM | 83.2 | 412 k | 32 |
| SC-LSTM-act-PS | 90.8 | 3.32 M | 32 |
| MPE-act-RAW | 75.4 | 412 k | 32 |
| SC-LSTM-act-RAW | 79.8 | 3.32 M | 32 |

Experiment with the UTKinect dataset. Finally, we also ran a cross-dataset experiment using an NTU RGB+D pre-trained SC-LSTM-act for testing on the UTKinect set. A problematic issue when using both collections is that the second set contains only 5 image sequences and 10 action types. For this reason, a modified model was pre-trained on the NTU RGB+D set, where the last layer has been changed from 40 classes of actions to 10 classes. The UTKinect set was divided into a training and test set in a ratio of 9:1. No validation subset was created. The training process was limited to 50 epochs. The number of network weights to be trained was 397 k. The averaged results of five tries with different training/test splits are as follows: training accuracy 98%, test accuracy 90%.
4.4. Discussion
The proposed solutions to human activity classification were experimentally validated on two video datasets. This use of popular datasets allowed a performance comparison of our approach with other methods described in the literature. The performance of our light-weight solutions (MPE), based on a mixture of pose classifiers, is slightly lower than the best reported results (by up to 10% for comparable 2D skeleton data), but our model is more than 10 times lighter. It still performs similarly to or better than relatively old heavy solutions. The performance of our mid-weight solutions (SC-LSTM) is comparable with the current best reported results of similar complexity. Only the top-most solutions, based on graph CNNs and exploring dense sequences (all video frames), outperform our results by up to 6%.
5. Summary
Two light-weight and mid-weight models were proposed for human activity classification in sparse image sequences (key frames of video clips). They use as a preliminary step a human skeleton estimation in single frames. Our main focus was to improve the quality of skeleton data and to define relational information for skeletons, which allows us to use simple encoding/classification networks and reach reasonable accuracy. It was experimentally shown that, using our relational features, an accuracy improvement of 8–10% has been achieved compared to the use of RAW skeleton data. Another benefit of our approach
is its lightness, which makes it easily applicable on mobile devices and on robotic platforms. A practical advantage is the assumed sparsity of video frames. By adjusting the number of key frames, real-time processing becomes possible even with moderate computational resources. The approach can easily be adapted to process true image sequences, like image galleries.
The limitations of this study are a focus on the actions of main body parts and the use of a single performance measure:
‐ As the feature vector is based on the subset of the 15 most reliably detected skeleton joints, human actions performed mainly by feet, hands and fingers, which are not included in this subset, cannot be properly distinguished from each other.
‐ The evaluation process of the proposed approach could include other popular measures, such as the precision-recall curve and AUC.
Our future work should focus on more extensive training and testing of various network architectures (e.g., on the NTU RGB+D 120 dataset) and on the extension of feature engineering to deal with partial skeleton information.
AUTHORS
Włodzimierz Kasprzak∗ – Warsaw University of Technology, Institute of Control and Computation Eng., ul. Nowowiejska 15/19, 00-665 Warsaw, Poland, e-mail: wlodzimerz.kasprzak@pw.edu.pl, Orcid: 0000-0002-4840-8860, www.ia.pw.edu.pl/~wkasprza.
Paweł Piwowarski – Warsaw University of Technology, Institute of Control and Computation Eng., ul. Nowowiejska 15/19, 00-665 Warsaw, Poland, e-mail: pawel@piwowarski.com.pl, Orcid: 0000-0002-4477-2534.
∗
Corresponding author
ACKNOWLEDGEMENTS
This work was conducted within the project APAKT, supported by "Narodowe Centrum Badań i Rozwoju", Warszawa, grant No. CYBERSECIDENT/455132/III/NCBR/2020.
References
[1] C. Coppola, S. Cosar, D. R. Faria, and N. Bellotto. "Automatic detection of human interactions from RGB-D data for social activity classification," 2017 26th IEEE International Symposium on Robot and Human Interactive Communication "RO-MAN", Lisbon, 2017, pp. 871–876; doi: 10.1109/ROMAN.2017.8172405.
[2] A. M. Zanchettin, A. Casalino, L. Piroddi, and P. Rocco. "Prediction of Human Activity Patterns for Human–Robot Collaborative Assembly Tasks," IEEE Transactions on Industrial Informatics, vol. 15 (2019), no. 7, pp. 3934–3942; doi: 10.1109/TII.2018.2882741.
[3] Z. Zhang, G. Peng, W. Wang, Y. Chen, Y. Jia, and S. Liu. "Prediction-Based Human-Robot Collaboration in Assembly Tasks Using a Learning from Demonstration Model," Sensors, 2022, no. 22(11):4279; doi: 10.3390/s22114279.
[4] M. S. Ryoo. "Human activity prediction: Early Recognition of Ongoing Activities from Streaming Videos," 2011 International Conference on Computer Vision, Barcelona, Spain, 2011, pp. 1036–1043; doi: 10.1109/ICCV.2011.6126349.
[5] K. Viard, M. P. Fanti, G. Faraut, and J.-J. Lesage. "Human Activity Discovery and Recognition using Probabilistic Finite-State Automata," IEEE Transactions on Automation Science and Engineering, vol. 17 (2020), no. 4, pp. 2085–2096; doi: 10.1109/TASE.2020.2989226.
[6] S. Zhang, Z. Wei, J. Nie, L. Huang, S. Wang, and Z. Li. "A review on human activity recognition using vision-based method," Journal of Healthcare Engineering, Hindawi, vol. 2017, Article ID 3090343; doi: 10.1155/2017/3090343.
[7] A. Stergiou and R. Poppe. "Analyzing human-human interactions: a survey," Computer Vision and Image Understanding, Elsevier, vol. 188 (2019), 102799; doi: 10.1016/j.cviu.2019.102799.
[8] A. Bevilacqua, K. MacDonald, A. Rangarej, V. Widjaya, B. Caulfield, and T. Kechadi. "Human Activity Recognition with Convolutional Neural Networks," Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2018), LNAI vol. 11053, Springer, Cham, Switzerland, 2019, pp. 541–552; doi: 10.1007/978-3-030-10997-4_33.
[9] M. Liu and J. Yuan. "Recognizing Human Actions as the Evolution of Pose Estimation Maps," 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA, June 18–22, 2018, pp. 1159–1168; doi: 10.1109/CVPR.2018.00127.
[10] E. Cippitelli, E. Gambi, S. Spinsante, and F. Florez-Revuelta. "Evaluation of a skeleton-based method for human activity recognition on a
large-scale RGB-D dataset," 2nd IET International Conference on Technologies for Active and Assisted Living (TechAAL 2016), London, UK, 2016; doi: 10.1049/IC.2016.0063.
[11] Z. Cao, G. Hidalgo, T. Simon, S.-E. Wei, and Y. Sheikh, "OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields," IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(1):172–186, 2021; doi: 10.1109/TPAMI.2019.2929257.
[12] A. Toshev and C. Szegedy. "DeepPose: Human Pose Estimation via Deep Neural Networks," 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 2014, pp. 1653–1660; doi: 10.1109/CVPR.2014.214.
[13] E. Insafutdinov, L. Pishchulin, B. Andres, M. Andriluka, and B. Schiele. "DeeperCut: a deeper, stronger, and faster multi-person pose estimation model," Computer Vision – ECCV 2016, LNCS vol. 9907, Springer, Cham, Switzerland, 2016, pp. 34–50; doi: 10.1007/978-3-319-46466-4_3.
[14] [Online]. NTU RGB+D 120 Dataset. Papers With Code. Available online: https://paperswithcode.com/dataset/ntu-rgb-d-120 (accessed on 30 June 2022).
[15] M. Perez, J. Liu, and A. C. Kot, "Interaction Relational Network for Mutual Action Recognition," arXiv:1910.04963 [cs.CV], 2019; https://arxiv.org/abs/1910.04963 (accessed on 15.07.2022).
[16] L.-P. Zhu, B. Wan, C.-Y. Li, G. Tian, Y. Hou, and K. Yuan. "Dyadic relational graph convolutional networks for skeleton-based human interaction recognition," Pattern Recognition, Elsevier, vol. 115, 2021, p. 107920; doi: 10.1016/j.patcog.2021.107920.
[17] R.-A. Jacobs, M.-I. Jordan, S.-J. Nowlan, and G.-E. Hinton. "Adaptive mixtures of local experts," Neural Comput., 3(1):79–87, 1991.
[18] S. Puchała, W. Kasprzak, and P. Piwowarski. "Feature engineering techniques for skeleton-based two-person interaction classification in video," 17th International Conference on Control, Automation, Robotics and Vision (ICARCV), Singapore, 2022, IEEE Xplore, pp. 66–71; doi: 10.1109/ICARCV57592.2022.10004329.
[19] P.-F. Felzenszwalb, R.-B. Girshick, D. McAllester, and D. Ramanan, "Object detection with discriminatively trained part-based models," IEEE Trans. Pattern Anal. Mach. Intell., 2010, vol. 32, no. 9, pp. 1627–1645; doi: 10.1109/TPAMI.2009.167.
[20] A. Krizhevsky, I. Sutskever, and G.-E. Hinton, "ImageNet classification with deep convolutional neural networks," Communications of the ACM, 2017, vol. 60(6), pp. 84–90; doi: 10.1145/3065386.
[21] K. Simonyan and A. Zisserman. "Very Deep Convolutional Networks for Large-Scale Image Recognition," arXiv, 2015, arXiv:1409.1556; https://arxiv.org/abs/1409.1556.
[22] K. He, X. Zhang, S. Ren, and J. Sun. "Deep Residual Learning for Image Recognition," Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016, pp. 770–778; doi: 10.1109/CVPR.2016.90.
[23] T.-L. Munea, Y.-Z. Jembre, H.-T. Weldegebriel, L. Chen, C. Huang, and C. Yang. "The Progress of Human Pose Estimation: A Survey and Taxonomy of Models Applied in 2D Human Pose Estimation," IEEE Access, 2020, vol. 8, pp. 133330–133348; doi: 10.1109/ACCESS.2020.3010248.
[24] K. Wei and X. Zhao. "Multiple-Branches Faster RCNN for Human Parts Detection and Pose Estimation," Computer Vision – ACCV 2016 Workshops, Lecture Notes in Computer Science, vol. 10118, Springer, Cham, 2017; doi: 10.1007/978-3-319-54526-4.
[25] Z. Su, M. Ye, G. Zhang, L. Dai, and J. Sheng. "Cascade feature aggregation for human pose estimation," arXiv, 2019, arXiv:1902.07837; https://arxiv.org/abs/1902.07837.
[26] H. Meng, M. Freeman, N. Pears, and C. Bailey. "Real-time human action recognition on an embedded, reconfigurable video processing architecture," J. Real-Time Image Proc., vol. 3, 2008, no. 3, pp. 163–176; doi: 10.1007/s11554-008-0073-1.
[27] K.-G. Manosha Chathuramali and R. Rodrigo. "Faster human activity recognition with SVM," International Conference on Advances in ICT for Emerging Regions (ICTer2012), Colombo, Sri Lanka, 12–15 December 2012, IEEE, 2012, pp. 197–203; doi: 10.1109/icter.2012.6421415.
[28] X. Yan and Y. Luo. "Recognizing human actions using a new descriptor based on spatial–temporal interest points and weighted-output classifier," Neurocomputing, Elsevier, vol. 87, 2012, pp. 51–61, 15 June 2012; doi: 10.1016/j.neucom.2012.02.002.
[29] R. Vemulapalli, F. Arrate, and R. Chellappa. "Human Action Recognition by Representing 3D Skeletons as Points in a Lie Group," 2014 IEEE Conference on Computer Vision and Pattern Recognition, 23–28 June 2014, Columbus, OH, USA, IEEE, pp. 588–595; doi: 10.1109/cvpr.2014.82.
[30] J. Liu, A. Shahroudy, D. Xu, and G. Wang, "Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition," Computer Vision – ECCV 2016, Lecture Notes in Computer Science, vol. 9907, Springer, Cham, Switzerland, 2016, pp. 816–833; doi: 10.1007/978-3-319-46487-9_50.
[31] A. Shahroudy, J. Liu, T.-T. Ng, and G. Wang. "NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis," arXiv:1604.02808 [cs.CV], 2016; https://arxiv.org/abs/1604.02808.
[32] C. Li, Q. Zhong, D. Xie, and S. Pu. "Skeleton-based Action Recognition with Convolutional Neural Networks," 2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), 10–14 July 2017, Hong Kong, pp. 597–600; doi: 10.1109/ICMEW.2017.8026285.
[33] D. Liang, G. Fan, G. Lin, W. Chen, X. Pan, and H. Zhu. "Three-Stream Convolutional Neural Network With Multi-Task and Ensemble Learning for 3D Action Recognition," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 16–17 June 2019, Long Beach, CA, USA, IEEE, pp. 934–940; doi: 10.1109/cvprw.2019.00123.
[34] S. Yan, Y. Xiong, and D. Lin. "Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition," arXiv:1801.07455 [cs.CV], 2018; https://arxiv.org/abs/1801.07455 (accessed on 15.07.2022).
[35] M. Li, S. Chen, X. Chen, Y. Zhang, Y. Wang, and Q. Tian, "Actional-Structural Graph Convolutional Networks for Skeleton-Based Action Recognition," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019, pp. 3590–3598; doi: 10.1109/CVPR.2019.00371.
[36] L. Shi, Y. Zhang, J. Cheng, and H.-Q. Lu. "Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition," arXiv:1805.07694v3 [cs.CV], 10 July 2019; doi: 10.48550/ARXIV.1805.07694.
[37] L. Shi, Y. Zhang, J. Cheng, and H.-Q. Lu. "Skeleton-based action recognition with multi-stream adaptive graph convolutional networks," IEEE Transactions on Image Processing, vol. 29, October 2020, pp. 9532–9545; doi: 10.1109/TIP.2020.3028207.
[38] H. Duan, Y. Zhao, K. Chen, D. Shao, D. Lin, and B. Dai. "Revisiting Skeleton-based Action Recognition," arXiv, 2021, arXiv:2104.13586; https://arxiv.org/abs/2104.13586.
[39] H. Duan, Y. Zhao, K. Chen, D. Lin, and B. Dai. "Revisiting Skeleton-based Action Recognition," arXiv:2104.13586v2 [cs.CV], 2 Apr 2022; https://arxiv.org/abs/2104.13586v2.
[40] J. Liu, G. Wang, P. Hu, L.-Y. Duan, and A. C. Kot. "Global Context-Aware Attention LSTM Networks for 3D Action Recognition," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017, pp. 3671–3680; doi: 10.1109/CVPR.2017.391.
[41] J. Liu, G. Wang, L.-Y. Duan, K. Abdiyeva, and A. C. Kot. "Skeleton-Based Human Action Recognition with Global Context-Aware Attention LSTM Networks," IEEE Transactions on Image Processing (TIP), 27(4):1586–1599, 2018; doi: 10.1109/TIP.2017.2785279.
[42] J. Liu, A. Shahroudy, G. Wang, L.-Y. Duan, and A. C. Kot. "Skeleton-Based Online Action Prediction Using Scale Selection Network," IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 42(6):1453–1467, 2019; doi: 10.1109/TPAMI.2019.2898954.
[43] T. Yu and H. Zhu. "Hyper-Parameter Optimization: A Review of Algorithms and Applications," arXiv:2003.05689 [cs, stat], 2020; https://arxiv.org/abs/2003.05689.
[44] [Online]. "openpose", CMU-Perceptual-Computing-Lab, 2021; https://github.com/CMU-Perceptual-Computing-Lab/openpose/.
[45] [Online]. "Keras: the Python deep learning API," https://keras.io/.
[46] [Online]. "UTKinect-3D Database," Available online: http://cvrc.ece.utexas.edu/KinectDatasets/HOJ3D.html (accessed on 30 June 2022).
[47] Kiwon Yun. "Two-person Interaction Detection Using Body-Pose Features and Multiple Instance Learning," https://www3.cs.stonybrook.edu/~kyun/research/kinect_interaction/index.html.
IMPROVED COMPETITIVE NEURAL NETWORK FOR CLASSIFICATION OF HUMAN POSTURES BASED ON DATA FROM RGB‐D SENSORS Submitted: 5th January 2023; accepted: 18th April 2023
Vibekananda Dutta, Jakub Cydejko, Teresa Zielińska DOI: 10.14313/JAMRIS/3‐2023/19 Abstract: The cognitive goal of this paper is to assess whether marker‐less motion capture systems provide sufficient data to recognize human postures in the side view. The research goal is to develop a new posture classification method that allows for analysing human activities using data recorded by RGB‐D sensors. The method is insensi‐ tive to recorded activity duration and gives satisfactory results for the sagittal plane. An improved competitive Neural Network (cNN) was used. The method of pre‐ processing the data is first discussed. Then, a method for classifying human postures is presented. Finally, classifi‐ cation quality using various distance metrics is assessed. The data sets covering the selection of human activities have been created. Postures typical for these activities have been identified using the classifying neural network. The classification quality obtained using the proposed cNN network and two other popular neural networks were compared. The results confirmed the advantage of cNN network. The developed method makes it possible to recognize human postures by observing movement in the sagittal plane. Keywords: Human motion, Posture classification, Human activity, Competitive neural network, Classifier
1. Introduction An activity (e.g., cooking, exploring, searching) includes a sequence of actions. Kinematic features extracted using human body models are used to identify human actions. Knowing that actions are their elementary components makes it possible to anticipate human activities. At the lowest abstraction level, human actions are recognised by identifying the sequence of the motion primitives that compose them, which can be addressed as the recognition of the posture types. The motion primitive is the small ‘uniform’ fragment of motion associated with the spe‐ ci ic posture. In humanoid robots (and in the case of a human), the movement is carried out at a certain posture, and this does not mean that the robot (or human) is ‘frozen’ in place. The posture is associated with the features of kinematic con iguration (e.g., elbow joint oriented ‘in’, elbow joint oriented ‘out’, sharp or open‐angle is kept in the knee joint, etc.). It does not mean that the angu‐ lar positions are constant; they undergo the changes to such an extent that the speci ic posture is kept.
Many different criteria are used to recognise different postures. The robots move to withhold some posture (please see the robotics textbooks and classic publi‐ cations, e.g., [24, 25]). With such interpretation, the activity, motion primitive, and posture are not very distinct. If the activity is realised with one posture or by more postures, it depends on the speci icity of the activity and on the criterion or resolution used to distinguish the postures. Research on postures recognition is needed to syn‐ thesize the movement of humanoid robots. Operat‐ ing in an unstructured and unfamiliar environment, these robots are expected to plan their actions using the knowledge of the sequence of postures observed in humans [29]. Posture recognition is also use‐ ful for recognizing and predicting human activities. Some researchers use the identi ication of posture sequences for this purpose [14]. In our other paper, we proposed a semantic database for inferring current activity from a sequence of actions [8, 9]. Presented research may also be helpful in the diagnostic analysis of human movement [22], including in detecting and studying movement disorders. 1.1. Research Challenges and Contribution One of the essential problems in the ield of robotics supporting human activities is an automated inference about what and how a human is doing. More‐ over, assisting robots should not stress a person with their appearance, unusual equipment [29], or strange posture. Hence, there is a need for registration and analysis of human movement [10]. The latest tech‐ nological advances, including higher‐resolution depth sensors and more ef icient methods of motion data processing, make it easier to track the movement of parts of the human body. One of the goals of this work is to contribute to the research on the motion planning of a humanoid robot using the concept of learning by observation. The robot is not expected to imitate the motion of a human, but it will learn a sequence of key postures taken by a human performing an activity. The robot will per‐ form these postures in its own way as the kinematics, dynamics, and features of the actuating system are different from a human. The activity will be created by transformations between the postures that make the sequence. We can assume greater ‘resolution’ of such sequences with many postures, but then converting them to robot postures requires considerable effort.
2023 © Dutta et al. This is an open access article licensed under the Creative Commons Attribution-Attribution 4.0 International (CC BY 4.0) (http://creativecommons.org/licenses/by-nc-nd/4.0/)
On the other hand, using a small number of postures will leave a lot of ‘freedom’ to the humanoid robot motion planning system, which can lead to strange (postures not typical for a human being) intermediate postures. Therefore, it is logical to identify the key postures with the appropriate resolution. It is suitable to choose those that are not too similar to each other. In this work, posture is associated with an action. This means that ongoing human activity can be anticipated through sequences of identi ied postures. Therefore, developing a method that can be used for such a pur‐ pose is the second goal of this paper. This article proposes a novel approach to infer‐ ring human postures using a simpli ied model of the human body and taking into account the data from the RGB‐D sensors. Human movement is mainly observed in the sagittal plane because, in this plane, the most visible changes in posture are made during basic activ‐ ities such as lifting, carrying, reaching, and collecting. Finally, machine learning methods are used, consid‐ ering the positions of relevant human body points to classify (to recognise) human postures [28]. The research contribution of this work is as follows: 1) a body posture classi ication procedure using data delivered from RGB‐D sensors was proposed, 2) following [13] this work introduces a modi ied method to approximate the missing data and reduce measurement errors. The normalization of body coordinates in the data pre‐processing stage was also done, 3) the most appropriate distance metrics were indi‐ cated for the shallow competitive neural network (cNN) classifying human postures, 4) the developed classi ication method (leading to posture recognition) was assessed, and compared with the results obtained using other classifying networks, and the advantages of this method were demonstrated, 5) the datasets with observations of various human activities in the sagittal view were created and shared. The remaining part of this paper presents the contribution in detail. In Section 2, we discuss the previous works in the area of recognising human actions/activities. Section 3 describes the data processing method; next, the proposed method is introduced and tested, considering different distance metrics. Section 4 presents the evaluation of classi ication quality. Finally, section 6 gives the conclusions and indicates the directions of future work.
2. Related Work The state‐of‐the‐art works on human postures identi ication (which also means human action/activ‐ ity recognition) focus on relations between body points position that de ine the body posture in each moment. Rogez et al. [19] proposed an end‐to‐end Localization‐Classi ication‐Regression Network (LCR 16
Net) for 2D and 3D pose estimation using video images. The probabilistic method to predict human movement trajectories was successfully investigated by Dutta et al. [9] considering the needs of robot helpmates. Various works [2, 16] have sought to generate 3D pose estimation from 2D keypoints. Hou et al. [12] presented a multi‐channel network based on graph convolutional networks for skeleton‐based action recognition. In the work [5], the method for pos‐ tures recognition in the frontal plane using rather time‐consuming coordinates transformation and fea‐ tures extraction was proposed. Recent advances in the imaging techniques offered by high‐quality depth sensors that perform well despite lighting and color variations [17, 20] enable the easy capture of human posture. Lately, the shift from conventional statistical meth‐ ods to methods based on machine learning has also been noticed. So, using machine learning tools makes it possible to capture complex relationships within recorded datasets [16]. Building the benchmark RGB‐ D data sets [11, 23], human body joints are often registered in the frontal view using the multi‐camera setup and arranged environments. One of the possible approaches for postures recognition is the method of applying the classifying neural networks. Apart from differences in scale and level of abstraction, one of the main dif iculties in using neural networks for recog‐ nising postures in a 3D space is the relatively limited number of training samples, especially concerning the data from the observations taken in the sagittal plane. These shortcomings and the need for human assisting robots were the motivations for undertaking the pre‐ sented work.
3. Materials and Methods
3.1. Recording Setup and Constitution of the New Data-set
The movement of actors performing complex activities was recorded in the sagittal plane and at an angle of 50°, as shown in Figure 1. The created data set (WUT-22) covers a diversity of recording environments, taking into account the cultural background. Our dataset was recorded in two different environments – the first 10 activities were performed at Warsaw University of Technology, Poland, and the next 10 activities were recorded at Waseda University, Japan, respectively. For data recording, two modern ZED RGB-D cameras with ZED Body Tracking SDK software or two Azure RGB-D cameras with Microsoft Azure Body Tracking SDK software were used [17]. The relationship between the data from the RGB-D image and the coordinates of the human body in three-dimensional space was established [27]. Twelve healthy people aged 24–35 years, with a height of 151–187 cm and weight of 65–85 kg, took part in the study with their consent. Twenty activities were performed in a random order. For each involved person performing a specific role, the activity was recorded 3 times, which, with a total of 4 people, gave a total
Table 1. List of recorded activities and their parameters

| Activity | Description | No. of objects | No. of postures∗ | No. of persons |
|---|---|---|---|---|
| 1 | Pick up a heavy object from the floor and place it on the table | 2 | 4 | 1 |
| 2 | Pick up a light object from the floor and place it on the table | 2 | 4 | 1 |
| 3 | Give the other person a bottle of water | 3 | 6 | 2 |
| 4 | Give the other person a cup of coffee | 2 | 6 | 2 |
| 5 | Take a heavy box from kitchen shelf and place it on the table | 1 | 4 | 1 |
| 6 | Clean the wall | 4 | 11 | 1 |
| 7 | Pick up the phone and answer the call | 3 | 5 | 1 |
| 8 | Take the book from trolley and place it on the table | 5 | 4 | 1 |
| 9 | Drag the book cart to another location | 3 | 3 | 1 |
| 10 | Push cart with objects near the table and organise them | 3 | 3 | 1 |
| 11 | Pick up boxes from the wardrobe and place them in order | 3 | 12 | 1 |
| 12 | Greet each other in traditional Japanese style | 1 | 9 | 2 |
| 13 | Making a cup of tea and sit on the chair | 4 | 10 | 1 |
| 14 | Bring the objects in a cart and arrange them in order | 4 | 11 | 1 |
| 15 | Imitate food preparation | 3 | 5 | 1 |
| 16 | Offer a cup of coffee to the guest | 4 | 12 | 2 |
| 17 | Greet the guest and prepare a cup of coffee for the guest | 2 | 21 | 2 |
| 18 | Organize objects with the help of a colleague | 5 | 25 | 2 |
| 19 | Have a meal together with a colleague | 3 | 41 | 2 |
| 20 | Offer a cup of coffee to the guest and drink together | 4 | 21 | 2 |

∗ indicated by cNN
of 12 records. The nine randomly selected series of recordings of each activity performed by each person participating in the experiment were selected for training and the remaining three for testing. Data for two different viewing angles were recorded simultaneously. Different camera arrangements were used (see Fig. 2(b)) to test the resistance of the trained neural network to scene variability. Activities included both short and long sequences of actions. Both marker-less motion recording sensors were set to a resolution of 1920 × 1080 pixels, with a recording speed of 15 frames per second and a 120° field of view. The minimum depth range was 20 cm. Besides the RGB-D camera, each system includes an accelerometer, gyroscope, barometer, magnetometer, and temperature sensor; therefore, it can be used for many purposes. Appropriate software has been
developed, enabling a simplified representation of the human body. Out of the 34 (ZED) or 32 (Microsoft Azure) human body points seen by the camera, 19 points were selected as essential for posture expression (see Fig. 2(a)). Following [26], the sensors were synchronized to collect motion data simultaneously. Finally, the human motion database (WUT-22) was created and used to elaborate the classification method.
Table 1 lists significant parameters involved during the data set creation, together with the number of postures assigned after the neural network training. Depending on the scenario, 1 to 5 objects were used (see third column). The last column lists the number of persons participating in the activity. The fourth column gives the overall number of postures involved in the activity, including those that are repeated several
times. The given number of postures is according to the decision of the neural network. In the final version, the cNN had 20 outputs, meaning 20 postures were expected. The column lists how often the postures appeared in the sequence, which means the repeated postures are counted. Therefore, if the recording is longer and consists of repetitive actions, the number of listed postures is greater.

Figure 1. Camera setup in the laboratory environment (top view)

3.2. Data Recording and Pre-processing Stage
The initial data processing stage was carried out using inspirations from the literature, e.g., [13]. The developed software provides the ability to obtain 2D or 3D data describing the position of the human body points. The software associated with the used RGB-D cameras has skeleton tracking systems that, based on RGB-D data, automatically provide coordinates of the so-called joints (points of the human body) following a generally accepted standard. These coordinates are the raw data later processed, i.e. filtered and filled with missing points, as described below.
The 3D information used the $(x, y, z, c)$ data. The $x, y, z$ coordinates were transformed from the world reference frame into the local reference frame attached to the waist. The axes of this frame keep an identical and constant orientation as the world reference frame. All positions were expressed in millimeters. Additionally, $c$ is the confidence score (natural number) ranging $\langle 0, 100\rangle$ for ZED and $\langle 0, 3\rangle$ for Kinect Azure. If $c$ falls below the given thresholds, which means 'no detection' or 'occlusion', the joint information is considered inaccurate and, therefore, approximated considering its neighbouring data. The following parameters were defined:
$K$ – the number of frames in the video (it is equal to the number of rows in the recorded CSV file containing the position data),
$I$ – the number of measurement points (they are commonly addressed as the joints) together with their data. For ZED, it is 34 points, where each point's data are: $x, y, z, c$. For Azure, it is 32 points with the $x, y, z, c$ data in each frame, which makes one row of the data file; as mentioned, $c$ is the confidence score, with $c \in \{1, 2, \dots, 100\}$ for the ZED sensor and $c \in \{0, 1, 2, 3\}$ for the Azure sensor,
$c_{thr}$ – the threshold, for ZED $c_{thr} = 80$, and for Azure $c_{thr} = 2$,
$b_f$ – before, $a_f$ – after: the indexes indicating, for each $i$-th joint, the closest frame with a valid reading. These frames are the nearest neighbours before ($k_{bf}$) and after ($k_{af}$) the $k$-th frame for which the reading is invalid.
Next, the joint filling process is applied. For clarity of notation, let us denote by ${}^{k}x_i, {}^{k}y_i, {}^{k}z_i, {}^{k}c_i$ the data for the $i$-th joint from the $k$-th frame. The procedure is described as Algorithm 1.

Algorithm 1. Filling the missing data
First frame: $k = 1$, $i = \{1, \dots, I\}$. For each $i$-th joint for which ${}^{k}c_i \le c_{thr}$, find $k_{af}$ for which ${}^{k_{af}}c_i \ge c_{thr}$ and substitute $({}^{1}x_i, {}^{1}y_i, {}^{1}z_i, {}^{1}c_i) = ({}^{k_{af}}x_i, {}^{k_{af}}y_i, {}^{k_{af}}z_i, {}^{k_{af}}c_i)$.
Internal frames:
do $k = k + 1$, $i = \{1, \dots, I\}$
For each $i$-th joint for which ${}^{k}c_i \le c_{thr}$, find: 1) $k_{bf}$ for which ${}^{k_{bf}}c_i \ge c_{thr}$ and 2) $k_{af} \le K$ for which ${}^{k_{af}}c_i \ge c_{thr}$.
if $k_{af} \ne Null$ then substitute
$$ {}^{k}a_i = \frac{{}^{k_{af}}a_i - {}^{k_{bf}}a_i}{k_{af} - k_{bf}}\,(k - k_{bf}), \quad a = x, y, z \text{ accordingly.} $$
end if
if $k_{af} = Null$ (there is no frame in the remaining part of the recording containing a valid reading) then substitute $({}^{k}x_i, {}^{k}y_i, {}^{k}z_i, {}^{k}c_i) = ({}^{k_{bf}}x_i, {}^{k_{bf}}y_i, {}^{k_{bf}}z_i, {}^{k_{bf}}c_i)$
end if
while ($k \le K$)
Last frame: $k = K$, $i = \{1, \dots, I\}$. For each $i$-th joint for which ${}^{k}c_i \le c_{thr}$, find $k_{bf}$ for which ${}^{k_{bf}}c_i \ge c_{thr}$ and substitute $({}^{k}x_i, {}^{k}y_i, {}^{k}z_i, {}^{k}c_i) = ({}^{k_{bf}}x_i, {}^{k_{bf}}y_i, {}^{k_{bf}}z_i, {}^{k_{bf}}c_i)$.

After filling the missing data, the motion trajectories of the body points were filtered using the Savitzky-Golay filter [6], which smoothes the data without distorting it. Because biological motion trajectories are naturally smooth, such filtering reduces the noise causing the abrupt character of changes in the recorded data. Before feeding them to the artificial neural network, the coordinates were normalized using the following relation:
$$ {}^{k}_{nrm}a_i = \frac{{}^{k}a_i - a_{i\,min}}{a_{i\,max} - a_{i\,min}} \qquad (1) $$
where $a_{i\,min}$ and $a_{i\,max}$ ($a = x, y, z$) denote the minimum and maximum value of the appropriate coordinate of the $i$-th data point in the whole training set, and ${}^{k}_{nrm}a_i \in \langle 0, 1\rangle$ is the normalized value.
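As an illustration of this preprocessing chain (confidence-based gap filling, Savitzky-Golay smoothing, and the min-max normalization (1)), a minimal sketch is given below. It assumes the joint data are already loaded as a NumPy array of shape (K, I, 4) holding (x, y, z, c); the interpolation helper, threshold value, window length, and polynomial order are illustrative placeholders and not necessarily those used by the authors.

```python
import numpy as np
from scipy.signal import savgol_filter

def fill_missing(joints: np.ndarray, c_thr: float) -> np.ndarray:
    """Interpolate x, y, z of joints whose confidence c falls below c_thr.

    `joints` has shape (K, I, 4) with the last axis holding (x, y, z, c).
    Leading/trailing invalid frames are copied from the nearest valid frame,
    in the spirit of Algorithm 1.
    """
    out = joints.copy()
    K, I, _ = joints.shape
    frames = np.arange(K)
    for i in range(I):
        valid = joints[:, i, 3] >= c_thr
        if not valid.any():
            continue                      # nothing reliable to interpolate from
        for a in range(3):                # a = x, y, z
            out[:, i, a] = np.interp(frames, frames[valid], joints[valid, i, a])
    return out

def smooth(joints: np.ndarray, window: int = 9, order: int = 3) -> np.ndarray:
    """Savitzky-Golay filtering of each coordinate trajectory along the time axis."""
    out = joints.copy()
    out[..., :3] = savgol_filter(joints[..., :3], window, order, axis=0)
    return out

def normalize(joints: np.ndarray, a_min: np.ndarray, a_max: np.ndarray) -> np.ndarray:
    """Min-max normalization (1); a_min/a_max have shape (I, 3), computed on the training set."""
    return (joints[..., :3] - a_min) / (a_max - a_min)
```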
(Figure 2(b) panel labels: Plane 1 – 0.6 m, Plane 2 – 1.0 m, Plane 3 – 2.5 m, cam 1 at 50°; "Different Geometrical Scenarios for Registering Human Movement Postures")
Figure 2. Illustration of data points and applied cNN: (a) view of available points (left) and considered points (right), (b) geometrical scenarios for registering human movement postures
Figure 3. Illustration of the proposed cNN architecture
3.3. Neural Network
This section describes the neural network implemented by the authors, incorporating an adaptation framework. Adaptation means the selection of the most suitable distance measure. The measure concerns the distances between the weight and input data vectors. Different distance metrics were considered. According to the literature [1], iterative selection of the most appropriate distance metric is the key to adaptive learning. In this process, the quality of learning is tested depending on different distance measures. The measure that corresponds to the best outcome is finally applied in the cNN.
The cNN used is a single-layer neural network in which each output neuron is connected with all inputs. For the purpose of the cNN description, let us assume that each input vector containing $I3 = I \cdot 3$ elements (which are normalized coordinates) is denoted by ${}^{k}\mathbf{f} = [{}^{k}f_1, {}^{k}f_2, \dots, {}^{k}f_i, \dots, {}^{k}f_{I3}]$, where $k = 1, \dots, K$ is the frame index. The number of output neurons equals $M$ – the maximum number of postures plus one. The additional output is left for unrecognised postures, i.e., postures which were not identified (and counted) by the human expert. The output neurons compete with each other to become activated after the input data vector is applied [1, 4]. The competition concerns the distance measure.
Let us denote by $\mathbf{w}_m$ the vector of weights $[w_{1,m}, w_{2,m}, \dots, w_{I3,m}]$ associated with the $m$-th output neuron. For each output neuron, the distance measure $d_m({}^{k}\mathbf{f}, \mathbf{w}_m)$ is calculated according to the formulas described below. Let us denote by $k_w$ the winning neuron; it is the neuron for which the distance measure $d_{k_w}(\cdot)$ between the input vector and the weights vector is the smallest:
$$ d_{k_w} = \min_{m=1,\dots,M} d_m({}^{k}\mathbf{f}, \mathbf{w}_m) \qquad (2) $$
During the learning phase, the weights vector of the winner is updated according to the learning rule:
$$ \mathbf{w}_{k_w}(n+1) = \mathbf{w}_{k_w}(n) + \alpha\,({}^{k}\mathbf{f} - \mathbf{w}_{k_w}(n)) \qquad (3) $$
$n$ is the learning iteration (epoch) number, $\alpha$ is the learning rate. The rule is called 'winner takes all'. Knowing that the performance of the cNN depends, among others, on the initialization method of the weights, different initialization options were tested (weights equal to random values, weights equal to the mean values of the training data, and weights initialized using the Gaussian distribution). It was observed that the weights initialized using the Gaussian distribution offer the best classification performance:
$$ w_{i,m} = \frac{1}{\alpha\sqrt{2\pi}} \exp\!\left(\frac{-(r_d - \mu_i)^2}{2\sigma_i^2}\right) \qquad (4) $$
where $r_d$ is a random number from the range $\langle 0, 1\rangle$, $\mu_i$ ($i = 1, \dots, I3$) is the mean value calculated taking into account all normalized training data applied at the $i$-th input ($\mu_i = (\sum_{k=1}^{K} {}^{k}f_i)/K$), and $\sigma_i$ is the standard deviation of these data. After training, each output neuron represents a cluster in the input dataset, meaning that its weight vector makes the center of that cluster [21].
Figure 3 shows the general architecture of a cNN. An input vector ${}^{k}\mathbf{f}$ of normalized data is supplied to each $m$-th output neuron. Each neuron has its weights vector $\mathbf{w}_m$ selected in the training process, as described. These weights are 'frozen' in the testing phase. The distance $d_m$ between the current input vector and the weights vector is calculated. This distance is determined using the adopted metric. The distances determined for all neurons are compared, and the neuron with the smallest distance wins. The transfer block (TF) generates outputs of 0 for all neurons except the winner, whose output is 1.
We will not give the formulas for all distance metrics tested in this work, as they are easily available. To focus the attention, we give the formula for the angular (Cosine) distance, which was selected as the most appropriate, as described later. The Cosine distance has often been used to classify high-dimensional data. It is based on the dot product of the vectors divided by the product of their lengths; thus the Cosine metric does not depend on the magnitudes of the vectors but only on the angle between them. The Cosine distance between ${}^{k}\mathbf{f}$ and $\mathbf{w}_m$ is defined as:
$$ {}^{k}d_{cos} = 1 - \frac{{}^{k}\mathbf{f} \cdot \mathbf{w}_m}{\|{}^{k}\mathbf{f}\|\, \|\mathbf{w}_m\|} \qquad (5) $$
where $\mathbf{w}_m$ denotes the weights vector of the $m$-th neuron of the competitive layer.
To avoid overlapping clusters (which means bad grouping of inputs [18]), a rule of repulsion was applied, 'pushing' the winner away from the rest of the neurons. Thus, during the training, after updating the weights of the winning neuron (neuron $k_w$), those weights $w_{i,m}$ are moved away from the winner $k_w$ for which the absolute difference between $w_{i,m}$ and the weight $w_{i,k_w}$ is less than the defined threshold $\Delta$:
$$ |w_{i,m} - w_{i,k_w}| < \Delta \qquad (6) $$
Such a modification prevents clusters from being nearer than the threshold $\Delta$, and the modification rule applies the repulsion rate $\beta$, also called the conscience bias learning rate. Denoting by $w_{i,r}$ the neuron weights which are within the $\Delta$ range from the winner $w_{i,k_w}$, the following formula is applied:
$$ w_{i,r}(n+1) = w_{i,r}(n) - \beta\,(w_{i,k_w}(n) - w_{i,r}(n)) \qquad (7) $$
In our research, the performance of the cNN with repulsion was compared with the conventional cNN. Both networks were trained using the same distance metric (in this case, Euclidean distance) for the same input data set. As seen in Figure 4, the repulsion strategy offered better placement of the central points of the clusters. These points (marked as green dots) are well separated (Fig. 4(b)).
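For readers who want to see rules (2), (3), (6), and (7) together, the following sketch shows one training iteration of such a competitive layer with the winner-takes-all update and the repulsion step. The distance function, array shapes, and the value of the threshold $\Delta$ are illustrative assumptions rather than the authors' implementation (the learning and repulsion rates are those reported in Section 3.5).

```python
import numpy as np

def cosine_distance(f: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Cosine distance (5) between input vector f (I3,) and every weight row of W (M, I3)."""
    return 1.0 - (W @ f) / (np.linalg.norm(W, axis=1) * np.linalg.norm(f) + 1e-12)

def train_step(W: np.ndarray, f: np.ndarray,
               alpha: float = 0.007, beta: float = 0.0001, delta: float = 0.05) -> int:
    """One 'winner takes all' update with repulsion; returns the index of the winner."""
    d = cosine_distance(f, W)              # distances d_m, cf. eq. (2)
    kw = int(np.argmin(d))                 # winning neuron
    W[kw] += alpha * (f - W[kw])           # learning rule, eq. (3)
    # Repulsion: weight components closer than delta to the winner (eq. 6) are pushed away (eq. 7).
    for m in range(W.shape[0]):
        if m == kw:
            continue
        close = np.abs(W[m] - W[kw]) < delta
        W[m, close] -= beta * (W[kw, close] - W[m, close])
    return kw

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    I3, M = 57, 20                          # 19 joints x 3 coordinates, 20 posture outputs
    W = rng.normal(0.5, 0.1, size=(M, I3))  # Gaussian-style initialization, cf. eq. (4)
    for f in rng.random((300, I3)):         # stand-in for normalized training vectors
        train_step(W, f)
```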
Figure 4. Distribution of the winning neurons for the traditional cNN (upper part) and cNN with repulsion (lower part); $k0, k1, k2, k3$ denote the centers of each class for Activity 1

3.4. Implementation Details
The collected data concerned 32 or 34 points of the human body, but the data from 19 points were selected as a set sufficient to describe the body posture. Given that each point has 3 coordinates, this gives $I3 = 57$ inputs to the neural network, normalized with (1). Thus, each camera provides the input vector ${}^{k}\mathbf{f} = [{}^{k}f_1, {}^{k}f_2, \dots, {}^{k}f_{57}]$ for each $k$-th sample. During the training, input vectors from both cameras were supplied to the neural network, but only data from the camera placed in the sagittal plane were used for the tests. This arrangement made it possible to observe the resilience of the neural network to limited data. According to the literature [18], the value of $\beta$, the repulsion rate, should be less than $\alpha$, the learning rate.
As already mentioned, the number of outputs of the neural network was one greater than the number of expected postures. The posture 'labelling' was done after the neural network grouped the data. This was performed in the learning phase, linking the time interval of input data recording and the
(a) Euclidean, (b) Cosine, (c) Manhattan, (d) Minkowski
Figure 5. The scatter plots showing the distribution of the winning neurons for different distance metrics for Activity 1
corresponding active output of the NN with the skeleton diagrams visible in the video. Thus, by observing what posture is maintained in this time range, a label (in other words, a semantic meaning) was assigned to this posture. In this way, it was detected which output indicated which posture, which allowed the recognition of postures during the testing phase. Whether an action is carried out with one or more postures depends on the specificity of the action and the resolution used to distinguish the postures. Before training, a human expert suggested the number of expected postures. In the case of all 20 activities, this number was 19, with 1 added bearing in mind an unidentified case. Therefore, the number of outputs of the neural network was set to 20.
3.5. Repulsion Rule and Selection of the Distance Metric
To illustrate the role of repulsion, only one activity was considered, namely Activity 1 (pick up a heavy object from the floor and place it on the table). In this case, a cNN with four outputs was implemented (for three different postures expected by the human expert). An additional output was defined to indicate one more or an incorrectly recognized posture. The cNN with Euclidean distance was trained. The value 0.007 was selected as the learning rate $\alpha$, and the repulsion rate $\beta$ was 0.0001. The number of iterations during the training phase was limited to
100 after observing that the minimum distance error remained essentially constant after 70–80 iterations. Figure 4 illustrates the advantage of using the rule of repulsion. The upper part of the figure shows the distribution of winning neurons obtained in the training phase for the neurons updated only with formula (3), and the lower part shows the result obtained using the rules of repulsion (6, 7). Each dot represents the winning neuron position obtained for each data frame. As can be seen, repulsion prevents class overlapping, and clusters are clearly visible. We focus on the $XY$ (sagittal) plane data in this and other figures. This is the plane that best shows changes in posture. However, the full set of coordinates (that means $x, y, z$) was used for training and testing.
After investigating the repulsion role, the most appropriate distance metric was chosen. The widely used Euclidean, Manhattan, Cosine, Minkowski, Hamming, and Mahalanobis distances were applied in the first stage. Based on the results (not described in this article), the four best-performing metrics were selected for the next stage of investigations, namely: (a) Euclidean distance, (b) Cosine distance, (c) Manhattan distance, and (d) Minkowski distance.
The results presented in Figure 5 in the form of scatter plots show the distribution of winning neurons for each training epoch. The data is based on training with Activity 1 only. Then, a set containing
(a) Euclidean, (b) Cosine, (c) Manhattan, (d) Minkowski
Figure 6. The scatter plots showing the distribution of the winning neurons for different distance metrics for Activity 12. Here 𝑝1 ‐ person 1, and 𝑝2 ‐ person 2, respectively
(a) Euclidean, (b) Cosine, (c) Manhattan, (d) Minkowski
Figure 7. The confusion matrix of the proposed cNN taking into account the distance metrics for Activity 1
Table 2. Performance evaluation: cNN, KNN, and uRF with different distance metrics/configurations

| Method | Distance / configuration | ACC | Pr | Re | F1-score |
|---|---|---|---|---|---|
| cNN | Euclidean | 89.6% | 91.3% | 89.6% | 90.0% |
| cNN | Cosine | 95.6% | 97.0% | 95.2% | 95.9% |
| cNN | Manhattan | 87.7% | 91.0% | 87.7% | 89.0% |
| cNN | Minkowski | 92.2% | 94.6% | 92.2% | 94.0% |
| KNN [7] | Euclidean | 73.9% | 75.3% | 73.1% | 72.0% |
| KNN [7] | Cosine | 93.8% | 94.1% | 93.3% | 93.5% |
| KNN [7] | Manhattan | 80.8% | 82.1% | 80.4% | 80.3% |
| KNN [7] | Minkowski | 82.2% | 83.6% | 81.2% | 82.0% |
| uRF [15] | tree-160, depth-8 | 78.8% | 79.3% | 78.5% | 78.6% |
| uRF [15] | tree-200, depth-10 | 78.2% | 78.9% | 78.2% | 79.1% |
| uRF [15] | tree-220, depth-8 | 80.06% | 82.5% | 81.9% | 80.1% |
| uRF [15] | tree-250, depth-8 | 80.9% | 81.01% | 80.65% | 80.9% |
Figure 8. Computational efficiency of the proposed cNN considering the distance metrics
the results of nine recording sessions for Activity 1 and Activity 12 was used for training. The number of outputs of the neural network was five. Scatterplots for this case are shown in Figure 6. In both examples, the separation of clusters is obvious, and the overlap of winning neurons is small, which justifies the validity of all considered metrics. Results are shown for the most significant $XY$ (sagittal) plane. In addition, Figure 7 shows the confusion matrix obtained when testing cNNs trained using only Activity 1. As seen, the results obtained for the Cosine and Minkowski distances are better than those for the Euclidean and Manhattan distances.
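A scatter plot of winning-neuron positions in the XY plane, like those in Figures 4-6, can be produced with a few lines of Matplotlib (which, as noted in the Discussion, was used for visualization). The sketch below assumes that the (x, y) components of each frame's winning neuron and its cluster index have already been collected; all names and the toy data are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_winning_neurons(xy: np.ndarray, labels: np.ndarray, title: str) -> None:
    """Scatter the winning-neuron positions in the XY (sagittal) plane, one color per cluster."""
    for k in np.unique(labels):
        pts = xy[labels == k]
        plt.scatter(pts[:, 0], pts[:, 1], s=8, label=f"k{k}")
    plt.xlabel("x (normalized)")
    plt.ylabel("y (normalized)")
    plt.title(title)
    plt.legend()
    plt.show()

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    labels = rng.integers(0, 4, size=500)                      # toy cluster assignments
    xy = rng.normal(loc=labels[:, None] * 0.2, scale=0.03, size=(500, 2))
    plot_winning_neurons(xy, labels, "Winning neurons (toy data)")
```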
4. Experimental Evaluation
The final evaluation included the comparison of the proposed cNN with the results provided by other commonly used networks with unsupervised training. For this evaluation, the data for all 20 activities were used. The number of cNN outputs was set to 20. The training was conducted using the data for nine
randomly selected recording sessions for each activity, and the remaining three were used for testing.
First, the performances of cNN, KNN, and uRF with four different distance metrics or configurations were compared. KNN is the Kohonen Neural Network [7], which belongs to a family of self-organizing feature maps that uses a competitive learning scheme with an additional parameter – the neighbourhood function. Therefore, the main parameters and structure of the KNN were similar to the proposed cNN (see the first paragraph of Section 3.4). uRF is the unsupervised Random Forest [15]. According to [15], the uRF construction process involves modifying the Random Forest algorithm using k-means clustering. When building the modified Random Forest, the following parameters were used: (a) number of trees (n-estimators: 160, 200, 220, 250), (b) maximum depth (max-depth: 8, 10), (c) min-samples-split (8), (d) min-samples-leaf (16), (e) max-leaf-nodes (16), (f) max-features (sqrt), (g) criterion (entropy), respectively.
Referring to [9], a standard selection of assessment measures was used, namely accuracy (ACC), precision (Pr), recall (Re), and the F1-score (F measure). A detailed description of the assessment measures can be found in [9]. Accuracy is most often used in classification problems. This measure, while intuitive, can be misleading in the case of a strongly unbalanced number of samples in classes [3]. Fortunately, this was not the case in the presented studies. Precision and recall are not very useful when considered separately, because precision does not distinguish between true positives and true negatives, and recall does not inform how many cases were incorrectly classified. Taking the above into account, the F1-score was additionally used. It is the weighted harmonic mean of precision and recall (F1-score = 2(Pr⋅Re)/(Pr+Re)). Testing results obtained using the above three methods and four different distance measures (if applicable) are given in Table 2. cNN with Cosine distance has a 95.6% accuracy.
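Since ACC, Pr, Re, and the F1-score drive the comparison in Table 2, the brief sketch below shows how they can be obtained with scikit-learn (used for evaluation, as stated in the Discussion). The label arrays are invented toy data, and the macro averaging is our assumption about how the per-class scores are combined.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Toy ground-truth and predicted posture labels (20 posture classes in the full setup).
y_true = [0, 1, 2, 2, 3, 1, 0, 2]
y_pred = [0, 1, 2, 1, 3, 1, 0, 2]

acc = accuracy_score(y_true, y_pred)
pr, re, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
print(f"ACC={acc:.3f}  Pr={pr:.3f}  Re={re:.3f}  F1={f1:.3f}")
# Per class, F1 is the harmonic mean of precision and recall: F1 = 2*Pr*Re / (Pr + Re).
```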
(a) frame 20, (b) frame 50, (c) frame 138, (d) frame 20, (e) frame 73, (f) frame 111
Figure 9. Sample image frames from both activity 1 and activity 12
(a) Classification for cNN_eucl, (b) Classification for cNN_cos, (c) Classification for cNN_man, (d) Classification for cNN_min
Figure 10. Stick diagrams of classification for Activity 1 (single person involved), taking into account four different distance metrics and configurations
A similar result concerns KNN with Cosine distance, for which the accuracy is 93.8%. The classification quality is strong. The precision, recall, and F1 scores are high for all the above networks. The maximum scores obtained for each method are denoted by boldface font in Table 2. These studies led to the indication of the most appropriate distance metric. As Table 2 shows, the highest set of scores was obtained for cNN with Cosine distance. The second highest result was obtained for KNN with Cosine distance.
(a) Classification for cNN_eucl for person 1, (b) cNN_eucl for person 2, (c) cNN_cos for person 1, (d) cNN_cos for person 2, (e) cNN_man for person 1, (f) cNN_man for person 2, (g) cNN_min for person 1, (h) cNN_min for person 2
Figure 11. Stick diagrams of classification for Activity 12 (2 people involved), taking into account four different distance metrics and configurations
Next, the computational efficiency was examined to illustrate the cNN features better. Figure 8 shows the computational performance obtained for different distance metrics during training and testing. As can be seen in Figure 8, the Manhattan distance gives the best results in terms of computational performance, followed by the Euclidean distance. The Cosine distance metric shows the lowest efficiency.
Finally, a qualitative assessment was made by displaying the classified postures using stick diagrams of the human body. In this paper, the results obtained only for one testing session and for selected activities are presented. However, the results obtained for the remaining two sessions are similar, and the results for the remaining activities do not show exceptional cases. Figure 9 shows example images for Activity 1 and Activity 12. The results illustrated by Figure 10 and Figure 11 confirm that the proposed cNN classifies the postures well. Each class is represented by one color in the stick diagrams. cNN with the Cosine and Minkowski distances correctly classified all postures for Activity 1. While testing using data for Activity 12, cNN with the Cosine and Minkowski distances correctly classified most postures. Only one posture was misclassified for person 1 when using the Cosine distance measure. Two misclassified plus one additional (unexpected by the expert) posture occurred in the case of the Minkowski distance. A single posture was misclassified for the Euclidean distance, and even more errors occurred for the Manhattan distance for Activity 1. On the other hand, for Activity 12, cNN with Euclidean distance had two misclassified postures for person 1, and even more for person 2. In the case of the Manhattan distance, for both Activity 1 and Activity 12, several postures were assigned to unexpected or incorrect classes compared to the human expert indications (Fig. 10(c)).
5. Discussion
It should be emphasized that the data collection and processing method has been developed and verified. Further data collection is in progress. The studies described do not include data relevant to recognising human-human and human-object relationships. Such studies are planned for the future.
The described cNN network has been fully implemented, and the remaining neural networks have been used as ready-to-use programs. The software was developed in Python. Its libraries were also used, including machine learning tools, e.g., scikit-learn was used to evaluate the results, and Matplotlib was used for visualization. The calculations were performed on a PC with an Intel Core i7 (2.8 GHz) processor and 64 GB of RAM.
The presented studies can be further extended. Only two RGB-D cameras collected the data, so using multi-camera data is a future research topic. Despite the use of many data sets, the effectiveness of the developed posture classification method was verified experimentally taking into account a still limited set of postures.
VOLUME 17,
N∘ 3
2023
6. Conclusion
An improved version of a conventional cNN was programmed and tested with an adaptive selection of the most appropriate distance metric. Evaluations of the proposed approach made using the test dataset confirmed the good classification accuracy of the developed cNN network (95.6%) and the KNN network (93.8%) with the Cosine distance measure. It was also shown that the use of RGB-D sensors for human posture recognition is a good option wherever expensive professional motion capture systems are not available.
The applied type of neural network is classic. The network is simple and allows a full understanding of how it works. It is easy to modify so that it effectively meets the requirements. It was programmed from scratch because the ready-made libraries have many default functions that are not always required and may delay the work.
The proposed method allows for recognizing postures in real time. The classification process was tested in a simulated online mode using recorded test data, and online classification was performed using a pre-trained neural network. The results were satisfactory, and the processing time was not noticeable to a human observer.
The indication that the angular measure is the most appropriate for the classification of postures is a valuable contribution. Changes in the angular positions of body parts cause changes in posture. In this sense, a larger number of output neurons means that only postures with smaller angular differences will be grouped together, and fewer neurons mean that more dissimilar postures are clustered together. By controlling the number of output neurons, it is easy to influence the 'resolution' of posture recognition.
AUTHORS Vibekananda Dutta∗ – Institute of Micromechanics and Photonics, Faculty of Mechatronics, Warsaw University of Technology, ul.Sw. Andrzeja Boboli 8, 02‐525 Warsaw, Poland, e‐mail: vibekananda.dutta@pw.edu.pl. Jakub Cydejko – Institute of Automatics and Robotics, Faculty of Mechatronics, Warsaw University of Tech‐ nology, ul.Sw. Andrzeja Boboli 8, 02‐525 Warsaw, Poland, e‐mail: jakub.cydejko2.stud@pw.edu.pl. Teresa Zielińska – Institute of Aeronautics and Applied Mechanics, Faculty of Power and Aeronautical Engineering, Warsaw University of Technology, ul.Nowowiejska 24, 00‐665 Warsaw, Poland, e‐mail: teresa.zielinska@pw.edu.pl, www: https://ztmir.meil.pw.edu.pl/web/eng/Pracownicy/ prof.‐Teresa‐Zielinska. ∗
Corresponding author
ACKNOWLEDGEMENTS
The research leading to the described results was carried out within the program POB-IDUB funded by Warsaw University of Technology within the Excellence Initiative Program – Research University (IDUB). The data used in this work are available from the corresponding author by request.
References
[1] A. R. Abas. "Adaptive competitive learning neural networks", Egyptian Informatics Journal, vol. 14, no. 3, 2013, 183–194.
[2] M. A. R. Ahad et al. "Action recognition using kinematics posture feature on 3d skeleton joint locations", Pattern Recognition Letters, vol. 145, 2021, 216–224.
[3] P. Branco, L. Torgo, and R. P. Ribeiro. "A survey of predictive modeling on imbalanced domains", ACM Computing Surveys (CSUR), vol. 49, no. 2, 2016, 1–50.
[4] G. Budura, C. Botoca, and N. Miclău. "Competitive learning algorithms for data clustering", Facta Universitatis – Series: Electronics and Energetics, vol. 19, no. 2, 2006, 261–269.
[5] B. Cao, S. Bi, J. Zheng, and D. Yang. "Human posture recognition using skeleton and depth information". In: 2018 WRC Symposium on Advanced Robotics and Automation (WRC SARA), vol. 1, 2018, 275–280, doi: 10.1109/WRC-SARA.2018.8584233.
[6] J. Chen, P. Jönsson, M. Tamura, Z. Gu, B. Matsushita, and L. Eklundh. "A simple method for reconstructing a high-quality NDVI time-series data set based on the Savitzky–Golay filter", Remote Sensing of Environment, vol. 91, no. 3-4, 2004, 332–344.
[7] K.-H. Chuang, M.-J. Chiu, C.-C. Lin, and J.-H. Chen. "Model-free functional MRI analysis using Kohonen clustering neural network and fuzzy c-means", IEEE Transactions on Medical Imaging, vol. 18, no. 12, 1999, 1117–1128.
[8] V. Dutta, and T. Zielinska. "An adversarial explainable artificial intelligence (XAI) based approach for action forecasting", Journal of Automation, Mobile Robotics and Intelligent Systems, vol. 14, 2020.
[9] V. Dutta, and T. Zielinska. "Prognosing human activity using actions forecast and structured database", IEEE Access, vol. 8, 2020, 6098–6116.
[10] V. Farrahi, M. Niemelä, M. Kangas, R. Korpelainen, and T. Jämsä. "Calibration and validation of accelerometer-based activity monitors: A systematic review of machine-learning approaches", Gait and Posture, vol. 68, 2019, 285–299.
[11] M. Firman. "RGBD datasets: Past, present and future". In: IEEE Proc., vol. 1, 2016, 661–673.
[12] R. Hou et al. "Multi-channel network: Constructing efficient GCN baselines for skeleton-based action recognition", Computers and Graphics, vol. 110, 2023, 111–117.
[13] W. Kasprzak, and B. Jankowski. "Light-weight classification of human actions in video with skeleton-based features", Electronics, vol. 11, no. 14, 2022, 2145.
[14] V. Kellokumpu, M. Pietikäinen, and J. Heikkilä. "Human activity recognition using sequences of postures". In: IAPR Conference on Machine Vision Applications, vol. 1, no. 1, 2022, 570–573.
[15] F. Kruber, J. Wurst, and M. Botsch. "An unsupervised random forest clustering technique for automatic traffic scenario categorization". In: 2018 21st International Conference on Intelligent Transportation Systems (ITSC), vol. 1, 2018, 2811–2818.
[16] J. Liu et al. "A graph attention spatio-temporal convolutional network for 3d human pose estimation in video". In: 2021 IEEE International Conference on Robotics and Automation (ICRA), vol. 1, 2021, 3374–3380.
[17] L. E. Ortiz, V. E. Cabrera, and L. M. Goncalves. "Depth data error modeling of the ZED 3d vision sensor from Stereolabs", ELCVIA: Electronic Letters on Computer Vision and Image Analysis, vol. 17, no. 1, 2018, 1–15.
[18] E. J. Palomo, E. Domínguez, R. M. Luque, and J. Munoz. "A competitive neural network for intrusion detection systems". In: International Conference on Modelling, Computation and Optimization in Information Systems and Management Sciences, vol. 1, 2008, 530–537.
[19] G. Rogez, P. Weinzaepfel, and C. Schmid. "LCR-Net++: Multi-person 2d and 3d pose detection in natural images", IEEE TPAMI, vol. 42, no. 5, 2019, 1146–1161.
[20] L. Romeo, R. Marani, M. Malosio, A. G. Perri, and T. D'Orazio. "Performance analysis of body tracking with the Microsoft Azure Kinect". In: 2021 29th Mediterranean Conference on Control and Automation (MED), vol. 1, 2021, 572–577, doi: 10.1109/MED51440.2021.9480177.
[21] D. E. Rumelhart, and D. Zipser. "Feature discovery by competitive learning". In: Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1: Foundations, 151–193, 1986.
[22] R. J. Saner, E. P. Washabaugh, and C. Krishnan. "Reliable sagittal plane kinematic gait assessments are feasible using low-cost webcam technology", Gait and Posture, vol. 56, 2017, 19–23.
[23] A. Shahroudy, J. Liu, T. Ng, and G. Wang. "NTU RGB+D: A large scale dataset for 3d human activity analysis". In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, Los Alamitos, CA, USA, 2016, 1010–1019, doi: 10.1109/CVPR.2016.115.
[24] S. L. Chiu. "Task compatibility of manipulator postures", International Journal of Robotics Research, vol. 7(5), 1988, 13–21, doi: 10.1177/027836498800700502.
[25] P. Tommasino, and D. Campolo. "An extended passive motion paradigm for human-like posture and movement planning in redundant manipulators", Frontiers in Neurorobotics, vol. 11, 2017, doi: 10.3389/fnbot.2017.00065.
[26] S. Vafadar, W. Skalli, A. Bonnet-Lebrun, M. Khalifé, M. Renaudin, A. Hamza, and L. Gajny. "A novel dataset and deep learning-based approach for marker-less motion capture during gait", Gait and Posture, vol. 86, 2021, 70–76.
[27] X. Wang, G. Liu, Y. Feng, W. Li, J. Niu, and Z. Gan. "Measurement method of human lower limb joint range of motion through human-machine interaction based on machine vision", Frontiers in Neurorobotics, vol. 15, 2021.
[28] L.‐F. Yeung, Z. Yang, K. C.‐C. Cheng, D. Du, and R. K.‐Y. Tong. “Effects of camera viewing angles on tracking kinematic gait patterns using azure kinect, kinect v2 and orbbec astra pro v2”, Gait and Posture, vol. 87, 2021, 19–26, doi: 10.1016/j.gaitpost.2021.04.005. [29] T. Zielinska, G. R. R. Coba, and W. Ge. “Variable inverted pendulum applied to humanoid motion design”, Robotica, vol. 39, no. 8, 2021, 1368– 1389.
MULTIMODAL ROBOT PROGRAMMING INTERFACE BASED ON RGB‐D PERCEPTION AND NEURAL SCENE UNDERSTANDING MODULES Submitted: 14th January 2023; accepted: 24th May 2023
Bartłomiej Kulecki
DOI: 10.14313/JAMRIS/3-2023/20
Abstract: In this paper, we propose a system for natural and intuitive interaction with the robot. Its purpose is to allow a person with no specialized knowledge or training in robot programming to program a robotic arm. We utilize data from the RGB-D camera to segment the scene and detect objects. We also estimate the configuration of the operator's hand and the position of the visual marker to determine the intentions of the operator and the actions of the robot. To this end, we utilize trained neural networks and operations on the input point clouds. Also, voice commands are used to define or trigger the execution of the motion. Finally, we performed a set of experiments to show the properties of the proposed system.
Keywords: Human-robot interface, Robot programming, 3D perception
Figure 1. Interaction of the operator with the robot using gestures: pointing at the object by the operator (a), grasping the object by the robot initiated by a voice command (b), passing to hand order (c), transferring an object to the operator’s hand (d)
1. Introduction
Cooperative robots are becoming increasingly popular in modern industry because they support the production process. Once programmed, they perform repetitive tasks for days or months. Reprogramming a robot is a process that requires robotics expertise. New methods of robot programming are constantly being developed to enable operators to conveniently program robots to perform complex tasks. This article addresses the problem of intuitive robot programming using natural language. The proposed interface does not rely on expensive devices for robot programming, such as operator panels (teach pendants) or virtual reality goggles. The main goal of the integrated system is to interpret the operator's intentions based on camera images and voice commands to program the robot's motions. Convolutional Neural Networks (CNN) enable the detection of objects in the scene and the interpretation of context related to the robot operator's intentions. To make robot programming more flexible and enable fast and natural programming of movements, in this article, we propose to use 3D perception and CNNs to interpret the operator's intentions. The goal of the designed interface is to mimic human-to-human communication, which is based on words and gestures. The operator uses his hands to point at an object to be
grasped or asks the robot to pass an object to his hand (Fig. 1) by giving an appropriate voice command.
2. Related Work
This article proposes to use various interfaces to define the robot's expected motion. The recognized user's commands can cause scheduling or direct execution of repetitive tasks like in [5, 11]. In this work, we combined hand gesture recognition, voice commands, and scene understanding to implement a new natural human-robot interface for programming the manipulator. Many human-robot interfaces use multiple channels of communication with the user. The popular approach is connecting gestures with voice commands as in [18]. Such a solution prevents many faulty actions caused by the user's mistakes in gestures or misunderstandings by a robot. One of the most natural ways to interact with a robot is through verbal communication. Xiaoling et al. [17] present a system that controls a mobile robot using voice commands. MFCCs (Mel Frequency Cepstral Coefficients) are extracted as audio signal features. The authors perform speech recognition based on a pattern matching algorithm – Dynamic Time Warping (DTW). A similar and common approach
2023 © Kulecki. This is an open access article licensed under the Creative Commons Attribution-Attribution 4.0 International (CC BY 4.0) (http://creativecommons.org/licenses/by-nc-nd/4.0/)
is the use of the Hidden Markov Model (HMM). A common solution to the speech recognition problem is to use off-the-shelf software for this purpose, such as Microsoft or Google Speech API [18, 22], CMU Sphinx [23], or the wit.ai environment. For the correct performance of the human-robot interface, the voice recognizer module has to convert speech to text, find a keyword in the command, connect it with the specific action, and send an order to the robot. Voice commands greatly simplify the communication between the operator and the robot [1].

Figure 2. Architecture of the proposed robot programming system – a user can interact with the robot through voice commands, hand gestures, and a pointer to indicate 3D positions and trigger desired actions

Many human-robot interfaces use vision as a source of information for planning the robot's motion. Gesture recognition is an important function of many user interfaces, as it is natural for people to complete a verbal message with a gesture. This method enables the user to select an object in the scene or to show the desired robot position. In [3], the authors implemented an interface to control a dual-arm robot. They used a Leap Motion sensor and estimated the position and orientation of the operator's hands using Kalman and particle filters. The method presented in [4] uses RGB-D images to detect hand gestures and track whole body parts, allowing the operator to select objects on the floor with his arm. Many methods use neural networks to detect hand (or body) positions and configurations and classify gestures [8, 19, 20, 24]. The recognized gestures are then translated into commands corresponding to the defined robot motions. In [6, 7] human action prediction in human-object interactions is performed using a probabilistic framework. Similar to our approach, RGB-D images are utilized. The operator can use gestures to specify an object for the robot to grasp. Other ways of selecting objects utilize visual markers. A popular approach is based on a laser pointer and detecting the laser spot in the RGB image. This solution is often used in robotic arms supporting partially paralyzed people [9]. A simple detection process and fast selection of objects from a long distance are the main advantages of using a laser pointer. However, this method can be problematic for detecting the laser spot on transparent or reflective objects.
The human-robot interface plays an important role, but only the visual understanding of the scene enables the robot to interact with the a priori unknown and ever-changing environment [1, 21]. Object recognition is another field that gives robots new advanced capabilities. Today's state-of-the-art methods are based on Convolutional Neural Networks. Examples include YOLO [2], SSD [16], and Mask R-CNN [10]. Knowledge about the categories of the objects on the scene often helps in grasp planning [14] and enables selecting an object by name. All robotic systems with user interfaces have to meet the imposed requirements regarding the correct understanding of commands, effectiveness, and time to complete the requested tasks. HRI verification scenarios include the execution of tasks like assembling [18] or object grasping [4]. One of the most common verification metrics is the task success rate and execution time [3]. For the system based on voice commands [18] the authors evaluated the time and accuracy of voice command understanding. In this research, we used the latest advances in machine learning for hand detection, object detection, and grasping to achieve an efficient and intuitive human-robot interface. The presented results are an extension of the approaches shown in previous papers [11–13]. Compared to our previous publications, this algorithm improves the pointer-less programming presented in [13] by enabling an indication of the target with the finger. Also, voice command communication is developed to work in a two-way asynchronous mode. In this work, we integrate the vision-based pointer [11] and gesture recognition [13] approaches and perform extended experimental verification of the resulting system. Our contribution with respect to the state of the art can be summarized as follows:
‐ an intuitive robot programming interface based on natural language,
‐ an object selection method based on a neural object detector, scene segmentation, and hand gestures,
‐ a complete architecture of the user interface that improves the robot-human interaction process.
Figure 3. Gestures recognized by the classifier: pointing and outreached
3. Natural Human-Robot Interaction System
This research was conducted using our mobile-manipulating platform Robot 4.0 [14]. During experiments, we do not use the lower part of the robot, but only the Universal Robots UR5 robotic arm presented in Figure 1. The robot is equipped with an Intel Realsense D435 camera mounted on its wrist, and a Microsoft Kinect Xbox One mounted on its head for 3D perception. For grasping, we use an industrial two-fingered OnRobot RG6 gripper. Because of force feedback, using such grippers increases the success rate of many grasp-planning algorithms. The Robot Operating System (ROS) coordinates the functionality of all modules. The robot workspace consists of a table with various objects on its surface.

3.1. System Architecture
The architecture of the proposed system is presented in Figure 2. The user communicates with the robot through voice commands, gestures, and the pointer. Hand, object, and pointer detection are performed based on RGB-D images. The scene segmentation module processes the point cloud from the camera. After recognizing proper voice commands, the estimated hand configuration and pointer position, as well as the information about the objects on the scene, determine the robot's motion.

3.2. Voice Commands
The proposed human-robot interface uses voice commands to confirm intentions expressed by gestures or to command the robot directly. For this purpose, a module that converts speech to text is needed. We decided to use an off-the-shelf solution for this purpose – the SpeechRecognition library and the Google Speech Recognition module, which is easy to integrate, highly effective, and offers a choice of many different languages. The beginning of the operator's speech is automatically detected (the module works asynchronously), and the voice command is recorded and sent to the recognizer server (an Internet connection is required). After successful speech processing, the user can ensure that the recognized command is valid, since the text is converted back to audio format and played. The set of available commands and associated actions is presented below. We defined each of them in English and Polish with a few equivalent versions which contain synonyms:
Figure 4. Example output from the object detector – an image with labeled objects

1) grasp (+ object name): the robot plans and executes the trajectory to grasp the selected object and closes the gripper,
2) give to hand: the robot passes the grasped object to the operator's hand,
3) put down: the robot places the grasped object in the indicated (or saved) position on the table,
4) go there: the robot arm (end effector) moves to the position of the detected pointer or hand,
5) move home: the robot arm moves to the predefined "home" configuration,
6) move place: the robot arm moves to the predefined "place" configuration,
7) go to initial configuration: the robot arm moves to the initial configuration,
8) look down/front: sets the end effector pitch angle,
9) save initial configuration: saves the current robot configuration as initial,
10) save robot configuration: saves the current robot configuration as a waypoint,
11) save put down position: saves the indicated position as a place position,
12) save trajectory: saves the current waypoint list as a trajectory,
13) start scanning: the robot starts scanning the environment (the table and objects),
14) open/close the gripper: the robot's gripper opens or closes,
15) clear/execute program: the motion program is cleared or executed,
16) play: enables the robotic arm external control program,
17) stop: stops the execution of the current trajectory.
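A minimal sketch of such a speech-to-command pipeline is given below, assuming the SpeechRecognition library and its Google recognizer backend mentioned above; the keyword-to-action mapping, the default language code, and the function name are illustrative placeholders, not the authors' code.

```python
import speech_recognition as sr

# Illustrative keyword -> action mapping (a small subset of the command list above)
ACTIONS = {
    "grasp": "GRASP_SELECTED_OBJECT",
    "give to hand": "PASS_TO_HAND",
    "put down": "PUT_DOWN",
    "move home": "MOVE_HOME",
    "stop": "STOP_TRAJECTORY",
}

def listen_for_command(language="pl-PL"):
    """Record one utterance, convert it to text, and match it against known keywords."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)  # speech onset is detected automatically
        audio = recognizer.listen(source)
    try:
        text = recognizer.recognize_google(audio, language=language).lower()
    except sr.UnknownValueError:
        return None, None                            # recognition failed, command must be repeated
    for keyword, action in ACTIONS.items():
        if keyword in text:
            return text, action
    return text, None

if __name__ == "__main__":
    spoken, action = listen_for_command()
    print("heard:", spoken, "-> action:", action)
```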
3.3. Hand Detection and Gesture Recognition
The hand detection in the RGB image is performed using a state-of-the-art system for hand detection and tracking – Mediapipe Hands [26]. The output from this module is a set of 21 hand landmark points consisting of 3 coordinates: x, y, and relative depth. The obtained data, which informs us about the hand pose (configuration of the hand and fingers), can be divided into categories representing gestures. We propose a method that recognizes gestures to allow effective interaction between the user and the robot. Two gestures shown in Figure 3, named pointing and outreached, are defined. The set of 63 coordinates obtained from the hand detector is used as a feature vector for classification. First, we collected a training data set consisting of 9700 samples (5500 for the pointing class and 4200 for the outreached class). Four different classifiers were trained and tested using the scikit-learn library: Random Forest, Support Vector Classifier, Ridge Classifier, and K-Nearest Neighbors. The final system utilizes the K-NN algorithm, which has 99% accuracy on the test dataset containing 2400 samples. The recognized gesture type defines the robot's behavior. If the predicted class is pointing, the program searches for the selected object and estimates the indicated point on the table. Conversely, when the result of the classifier is an outreached gesture, a target position for the gripper to pass an object is defined on the hand surface.
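The classifier described above can be prototyped along the following lines; this is a hedged sketch assuming scikit-learn's K-NN with randomly generated stand-in data (not the 9700-sample set collected by the author), and the MediaPipe-based feature extraction shown only as a comment.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data: 63 features per sample (21 landmarks x 3 coordinates), 2 gesture classes
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 63))
y = rng.integers(0, 2, size=2000)          # 0 = pointing, 1 = outreached (placeholder labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))

# At run time, the feature vector would be built from MediaPipe Hands landmarks, e.g.:
# landmarks = results.multi_hand_landmarks[0].landmark     # from mp.solutions.hands
# feat = np.array([[p.x, p.y, p.z] for p in landmarks]).ravel()
# gesture = clf.predict(feat.reshape(1, -1))[0]
```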
Figure 5. Example pointer detection on an RGB image (a) and a point cloud with object bounding boxes, where the red box refers to the selected object (b)

3.4. Pointer Detection
The main task of the pointer detection module is to find a pointer in the RGB image and estimate its position in 3D space. Later, the obtained 3D coordinates are used to define the target position of the robot end effector, select an object for grasping, or define the place on the table to put the gripped object down. First, the algorithm finds the pointer in the HSV image based on color and shape. A mask is created which extracts image pixels fitting defined ranges of HSV values corresponding to the color of the pointer. The resulting image is converted to grayscale, thresholded, and blurred using a Gaussian filter. We use the Hough transform to find the circle whose center is the position (x_px, y_px) of the pointer in the image, as illustrated in Figure 5a. The pointer 3D position is determined using the coordinates of the found circle center, the depth value at the corresponding point in the depth image, and the camera intrinsics.
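A color-and-shape pointer detector of this kind can be sketched with OpenCV as follows; the HSV range, the Hough parameters, and the function name are illustrative assumptions, not the parameters used in the described system.

```python
import cv2
import numpy as np

def detect_pointer(bgr, depth, fx, fy, cx, cy,
                   hsv_lo=(35, 80, 80), hsv_hi=(85, 255, 255)):
    """Find a colored circular pointer and back-project its center to 3D (illustrative parameters)."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(hsv_lo), np.array(hsv_hi))   # color segmentation
    mask = cv2.GaussianBlur(mask, (9, 9), 2)                      # smooth before the Hough transform
    circles = cv2.HoughCircles(mask, cv2.HOUGH_GRADIENT, dp=1.2, minDist=50,
                               param1=100, param2=20, minRadius=5, maxRadius=60)
    if circles is None:
        return None
    u, v, _ = circles[0, 0]                                       # circle center in pixels
    z = float(depth[int(v), int(u)])                              # depth at (u, v), in meters
    x = (u - cx) * z / fx                                         # pinhole back-projection
    y = (v - cy) * z / fy
    return np.array([x, y, z])
```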
3.5. Object Detection
Our system allows the user to specify a target object without using hands or a pointer. It is possible by including the object name in the voice command. To enable this feature, we implemented an object detection procedure using a Single Shot Detector (SSD) based on the Inception V2 neural network architecture [16]. The SSD was pre-trained on the MS COCO dataset [15]. To create a training dataset, we compiled a set of 10 objects including a sumo figurine, a probe, a dispenser, and a sponge. We collected and manually labeled 1500 images of these objects with bounding boxes and object classes. An example result of the neural network's inference is shown in Figure 4.

Figure 6. Stages of an example scene segmentation process: an input point cloud – from one view (a), estimated table plane (b), point clouds of segmented objects (c), oriented bounding boxes aligned with the input point cloud (d)

3.6. Scene Segmentation
In the proposed system, the 3D perception module performs scene segmentation by processing a point cloud. The output from this module is a set of cuboids that approximate detected objects. The found Oriented Bounding Boxes (OBB) represent the objects' poses and are later used for grasp planning. In the first stage of processing, we detect the table plane using the RANSAC algorithm – by matching the equation of the plane to the data in the point cloud. The resulting plane (Fig. 6b) is removed from the input data to extract points corresponding to the objects. Since the data obtained from a single perspective does not contain information about the complete three-dimensional shape of the object, we perform a scanning procedure to collect and assemble multiple views of the scene. Another solution used in the system is a 3D reconstruction [25], which completes the object point cloud based on the predicted view from its opposite side. The next step of processing is Euclidean segmentation (based on the Euclidean distance between points), which finds instances of objects in the complete point cloud (Fig. 6c). For each cluster representing an object, the position is estimated by calculating the centroid. Then we perform Oriented Bounding Box fitting to approximate each object. For this purpose, Principal Component Analysis (PCA) is utilized. It finds the three directions of the object point cloud with the highest variance, which determine the OBB orientation. Example bounding boxes matched to the objects' point cloud are shown in Figure 6d.
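The segmentation pipeline described above (RANSAC table removal, clustering, PCA-aligned boxes) can be approximated with Open3D as in the sketch below. The thresholds are illustrative, and DBSCAN clustering is used here only as a stand-in for the Euclidean clustering used in the system.

```python
import numpy as np
import open3d as o3d

def segment_scene(pcd, plane_thresh=0.01, cluster_eps=0.02, min_points=50):
    """RANSAC table removal, clustering, and PCA-based oriented bounding boxes."""
    # 1) Table plane via RANSAC
    _, inliers = pcd.segment_plane(distance_threshold=plane_thresh,
                                   ransac_n=3, num_iterations=1000)
    objects = pcd.select_by_index(inliers, invert=True)   # keep points above the table
    # 2) Clustering (DBSCAN here, standing in for Euclidean clustering)
    labels = np.array(objects.cluster_dbscan(eps=cluster_eps, min_points=min_points))
    boxes = []
    for k in range(labels.max() + 1):
        cluster = objects.select_by_index(np.where(labels == k)[0])
        # 3) PCA-aligned oriented bounding box of each cluster
        boxes.append(cluster.get_oriented_bounding_box())
    return boxes

# boxes = segment_scene(o3d.io.read_point_cloud("scene.pcd"))  # "scene.pcd" is a placeholder path
```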
3.7. Object Grasping
The grasp-planning method utilizes information about the obtained OBB. The centroid of the object is used as the gripper target position. The orientation of the end-effector depends on the box dimensions and alignment. If the height of the object's OBB is smaller than the width and length, the grip is performed vertically from the top. If the box height is the largest dimension, the robot tries to grasp the object horizontally from the side. The distance between the end-effector and the TCP depends on the gripper opening, which we estimate based on the OBB dimensions. While closing the gripper on the object, force feedback is used. The gripper stops the motion when the reaction force exceeds the threshold value. This strategy compensates for the inaccuracies of the perception system, stabilizes the grasp, and increases the success rate of the grasping method.

3.8. Object Selection
In the case of multiple objects on the scene, the user needs a method for specifying the item to grasp. Our system offers three options. The first method uses information about the current gesture and pose of the hand. To determine which object is pointed at by the user's finger, we utilize the position of two hand landmark points detected in the image [26], named INDEX_FINGER_MCP and INDEX_FINGER_TIP (the origin and tip of the index finger). Based on the depth image and the camera matrix, the position of these points in 3D space is obtained, which allows the determination of the pointing vector P. In the next step, the origin of the index finger (MCP point) is connected with the object centers to obtain a set of M vectors. As illustrated in Figure 7, to determine which object is selected by the operator, it is necessary to find the identifier id of the OBB that is at a minimum distance (less than the threshold value) from the pointing vector:

arg min_id ‖P × M(id)‖ / ‖P‖.    (1)

Figure 7. Identification of the pointed object. The smallest distance value d_i is associated with the selected object

Another way to choose an object for grasping is a pointer (Fig. 5). This method calculates the Euclidean distances between the 3D pointer position and the centroids of objects on the table. The selected object is chosen only if the smallest distance is below the required threshold value. Also, the user can specify the target object by voice command. When the output of the neural object detector includes an object with the given name, we find its bounding box in the image. The frame center is converted to 3D space and compared to every object centroid to find the closest (selected) one.

3.9. Calculation of the Goal Position
The set of possible user actions also includes the computation of the goal position of the object. We can interpret the goal as the target position for the end effector or the grabbed object – it depends on the user's voice command. To specify the desired position of the object, the operator has two options. The first method uses the pointer. In this case, the system utilizes the current pointer location to set the position on the table to place the object or the point in space to move the end effector. Another more intuitive way uses a set of gestures. As mentioned in Section 3.3, the robot can pass the grabbed object to the operator's hand when the recognized gesture is "outreached". Also, connected with another voice command, this gesture causes the end effector to move to the hand position. Apart from giving the item to the hand, it is also possible to indicate with a finger the place to put it on the table, as illustrated in Figure 8.

Figure 8. Pointing gesture identified on the RGB image (a) and point cloud visualization (b) with the indicated point on the table (green sphere – point estimated as the line-with-plane intersection)

For this purpose, we determine the intersection of the index finger's straight line, given by the point W and the vector P, with the plane of the table, given by the normal vector N and the point T. In general, the task is to solve a system of equations:

x = P_x · t + W_x,
y = P_y · t + W_y,
z = P_z · t + W_z,
A(x − x_0) + B(y − y_0) + C(z − z_0) = 0,    (2)
where t is a scalar parameter of the 3D line equations (parametric form, first three equations in Eq. (2)), and
P = [P_x P_y P_z]^T – a vector lying on the line,
W = [W_x W_y W_z]^T – a point belonging to the line,
N = [A B C]^T – the plane normal vector,
T = [x_0 y_0 z_0]^T – a point belonging to the plane,
[x y z]^T – the searched line-with-plane intersection point.
In our case, the task is simplified. The reference frame is placed on the robot's footprint. The table plane normal vector N is [0 0 1]^T, and a point belonging to the table plane T can be [0 0 h_t]^T, where h_t is the table height (we estimate it from the point cloud). The P vector is calculated as the difference between the position vectors of the finger TIP and MCP points. The W vector is the TIP or MCP point. The last equation simplifies to:

z = h_t,    (3)

so we can determine t as:

t = (h_t − W_z) / P_z,    (4)

and, substituting into the first two equations, we calculate x and y – the coordinates of the intersection point:

x = P_x · (h_t − W_z)/P_z + W_x,
y = P_y · (h_t − W_z)/P_z + W_y,
z = h_t.    (5)
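The object-selection rule (1) and the finger-to-table intersection (2)–(5) amount to a few lines of vector algebra; the sketch below uses plain NumPy and assumes that the MCP/TIP points and the object centroids are already expressed in the robot frame, with an illustrative distance threshold.

```python
import numpy as np

def pointed_object(mcp, tip, centroids, max_dist=0.08):
    """Eq. (1): pick the centroid closest to the pointing ray (W = MCP, P = TIP - MCP)."""
    P = tip - mcp
    d = [np.linalg.norm(np.cross(P, c - mcp)) / np.linalg.norm(P) for c in centroids]
    idx = int(np.argmin(d))
    return idx if d[idx] < max_dist else None      # None if no object is close enough to the ray

def pointed_table_spot(mcp, tip, h_t):
    """Eqs. (3)-(5): intersection of the finger line with the table plane z = h_t."""
    P, W = tip - mcp, mcp
    t = (h_t - W[2]) / P[2]
    return np.array([P[0] * t + W[0], P[1] * t + W[1], h_t])
```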
Figure 9. Programming the robot using a pointer: user selects a sponge with a pointer (a), the object is grasped and the user chooses the goal position using a pointer (b), the object is moved (c)

Figure 10. Programming the robot using a mixed approach: user selects a bottle with a gesture (a), the object is grasped and the user chooses the goal position using a pointer (b), the object is moved (c)

Figure 11. Example robot camera image (left column), point cloud visualization (middle column), and side view (right column) during the grasping experiment: user selects the object with his finger (a), the object is grasped (b), the object is passed to the detected hand (c)

3.10. Robot Program and Motion Planning
The goal of the previously described modules is to define the robot's motions. For motion planning, we use the MoveIt package. The robot executes joint-space or Cartesian-space trajectories depending on the task requirements; for example, the last motion before grasping has to be linear. The MoveIt package is also used to prevent the robot from self-collisions and from collisions with the table (we represent the obstacles with Octomap). During programming, the motions are executed on the fly and stored in memory. If any system component returns an error, the robot does not perform any action, and the command has to be repeated correctly. The user is supported by a visualization on the screen that allows him to validate the issued commands and gestures. After the operator says the execute program command, saved motions can be read and used in repetitive tasks. When the operator wants to program a new motion sequence, he can delete the current program with the clear program command. A defined robot program consists of a list of movements and high-level tasks, e.g., moving to the initial pose or grasping an object of a given category.
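The motion-execution step can be reproduced with the standard MoveIt Python interface; the following is a minimal sketch assuming a ROS/MoveIt setup in which the planning group is named "manipulator" (the group name, node name, and target pose are illustrative assumptions, not taken from the paper).

```python
import sys
import rospy
import moveit_commander
import geometry_msgs.msg

moveit_commander.roscpp_initialize(sys.argv)
rospy.init_node("program_executor")
group = moveit_commander.MoveGroupCommander("manipulator")   # assumed planning group name

# Joint-space motion to a stored waypoint (e.g. a "home"-like pose)
target = geometry_msgs.msg.Pose()
target.position.x, target.position.y, target.position.z = 0.4, 0.0, 0.3
target.orientation.w = 1.0
group.set_pose_target(target)
group.go(wait=True)
group.stop()
group.clear_pose_targets()

# Cartesian (linear) segment, e.g. the final approach before grasping
plan, fraction = group.compute_cartesian_path([target], 0.01, 0.0)  # eef_step, jump_threshold
if fraction > 0.9:
    group.execute(plan, wait=True)
```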
4. Results
The capabilities of the proposed robot programming interface were tested in experiments divided into two groups. The first group of experiments was designed to check the time of programming motion sequences of the robotic arm, in which the robot performs simple motions and gripper operations (voice command sequences 1–3). For these trajectories, we compared the proposed intuitive interface with the standard programming methods using the operator panel (teach pendant) and programming by moving the robot manually (teaching by doing – TbD). The goal of the experiments in the second group was to review the performance of the user interaction modules during programming of higher-level tasks. These tasks include picking and placing a selected object in the specified position on the work table, as well as
grasping and passing an item to the user's hand (voice command sequences 4–8). In this paper, we repeat the experiments performed in previous works [11–13] to test the robot programming interface efficiency after making changes in the speech recognition module. We also added another test case that validates indicating a place on the table with a finger. The following voice command sequences were used during the tests:
1–3. various length combinations of commands: save initial configuration, go there, open/close the gripper, save robot configuration, save trajectory, execute program
4. grasp (with pointer), put down (with pointer), go home
5. grasp + name (with object detection), put down, go home
6. grasp (with gesture), give to hand, go home
7. grasp (with gesture), put down (with gesture), go home
8. grasp (with gesture), put down (with pointer), go home

Figure 12. Example robot camera image (left column), point cloud visualization (middle column), and side view (right column) during the grasping experiment: user selects the object with his finger (a), the object is grasped (b), the object is released in the place indicated with a finger (c)

During the experiments, we measured the programming time of each trajectory for different programming techniques: the standard methods and voice commands in combination with a pointer, object detection, or gestures. The results are shown in Table 1. For safety reasons, the robot's speed during the tests was limited to 50%. Compared to our previous works [11–13], the time of programming in each of the proposed methods has decreased. The main reason is the recognition of voice commands in the user's native language. During the experiments, we got the following results: 355/500
(71%) correct conversions in English, and 479/500 (96%) in Polish (native). For successful speech-to-text processing, the orders have to be pronounced clearly and with the correct accent. Speaking in the native language significantly increases the effectiveness of speech recognition, reduces the need to repeat commands, and thus shortens the robot programming time. In the utilized module, the language is easily configurable and can be changed even when the system is running. The efficiency of verbal communication with the robot is also improved by audio feedback. After speech-to-text processing, the output text is converted back to audio and played. It ensures that the user's command has been correctly recognized and informs him when the order is accepted. However, speech recognition has a significant impact on extending programming time. The mean time of recording, converting to text, and playing back the command during our experiments is about 6 seconds. This time strongly depends on the length of the command, the recorded audio signal quality, and the internet connection. The efficiency of executing the programmed manipulation task heavily depends on the grasping algorithm. The point-cloud-based method used in our system [14] is not able to plan a successful grasp for small objects or those with complicated geometry. For such items (plastic shaft, ring lying down) the success rate is 7.5%. However, the algorithm performs well for many other objects (e.g., sponge, dispenser bottle, sumo figurine, tape, ring standing vertically, and I-shape part). For this set of objects, the success rate is 76%. Another important metric that characterizes our system is the accuracy of object recognition. The SSD object detector achieves 76.9% mAP [16]. As we can see in Table 1, programming simple trajectories with a pointer and voice commands can be faster than standard programming methods with the teaching pendant but is still not as fast as the teaching-by-doing method. The advantage of the proposed interface is the increased intuitiveness and the possibility to program higher-level tasks. The user can use the pointer to select the object to pick and indicate its place position, as illustrated in Figure 9. Example experiments verifying interaction with the robot using only gestures are shown in Figures 11 and 12. In this test case, the robot passes the selected object to the user's hand (command sequence no. 6) or places the chosen item in the indicated position (command sequence no. 7). Based on the results shown in Table 1, it can be seen that programming the robot using gestures is slightly more time-consuming than using the pointer. Detecting the hand and calculating its position takes an average of 56 ms (for the pointer it is 36 ms [11]). The main advantage of this type of interaction with the robot is that there is no need to use additional hardware (pointer, marker). When programming the robot using the visualization on the computer screen, the user receives feedback on the current interpretation of his gestures and voice commands, which offsets the inconvenience associated with the delay in the system.
Table 1. Comparison of programming time [s] of example motion sequences using six programming methods. Values are given as mean (standard deviation) calculated for 10 measurements; "–" means the sequence was not programmed with that method.

seq. no. (commands) | teach pendant | TbD          | pointer and voice cmd. [11] | object detection and voice cmd. [11] | gestures and voice cmd. [13] | pointer, object detection, gestures and voice cmd.
1 (10)              | 162.0 (16.7)  | 100.0 (12.3) | 133.2 (7.3)                 | –                                    | 143.2 (7.8)                  | 133.2 (7.3)
2 (12)              | 184.3 (15.3)  | 113.4 (9.9)  | 141.8 (8.0)                 | –                                    | 151.5 (8.9)                  | 141.8 (8.0)
3 (16)              | 235.3 (15.9)  | 155.4 (12.5) | 188.8 (8.8)                 | –                                    | 206.5 (9.3)                  | 188.8 (8.8)
4 (3)               | –             | –            | 59.3 (1.6)                  | –                                    | –                            | 59.3 (1.6)
5 (3)               | –             | –            | –                           | 68.4 (4.4)                           | –                            | 68.4 (4.4)
6 (3)               | –             | –            | –                           | –                                    | 62.0 (4.5)                   | 62.0 (4.5)
7 (3)               | –             | –            | –                           | –                                    | 63.8 (5.9)                   | 63.8 (5.9)
8 (3)               | –             | –            | –                           | –                                    | –                            | 73.6 (5.7)
During the tests, the performance of the interface integrating the different interaction modules was also checked. We performed an experiment illustrated in Figure 10, in which the user selects an object to grasp using a gesture and then uses a pointer to determine the place on the table where the object should be put down (command sequence no. 8). We observe that the system using multiple modules for human-robot interaction at the same time runs slower because it requires more computing resources. However, this type of solution offers the greatest flexibility. For example, it allows the selection of an object of any type, unlike a system with object detection, which only lets the user choose items that were in the training set. Only the integrated system presented in the article allows all defined tasks to be programmed.
5. Conclusion
This article proposes a system that uses voice commands, gestures, and 3D perception to program a robot to perform complex tasks. The advantage of the designed system is the capability to easily program tasks related to grasping and moving objects, which is not possible with standard methods (teach pendant, teaching by doing). In addition, an increase in the intuitiveness of the robot's operation was achieved through the use of gesture recognition combined with voice commands. The integrated robot interaction modules allow the robot to be programmed not only with gestures but also with a pointer. In the future, we plan to increase the number of recognized gestures to better understand the user's intentions and improve human-robot interaction.

AUTHOR
Bartłomiej Kulecki – Poznan University of Technology, ul. Piotrowo 3A, 60-965 Poznań, Poland, e-mail: bartlomiej.kulecki@put.poznan.pl, www.put.poznan.pl.
ACKNOWLEDGEMENTS
This work was supported by the National Centre for Research and Development (NCBR) through project LIDER/33/0176/L-8/16/NCBR/2017.
References
[1] R. Adamini, N. Antonini, A. Borboni, S. Medici, C. Nuzzi, R. Pagani, A. Pezzaioli, and C. Tonola. "User-friendly human-robot interaction based on voice commands and visual systems," 2021 24th International Conference on Mechatronics Technology (ICMT), 2021, pp. 1–5.
[2] A. Bochkovskiy, C. Wang, and H.M. Liao. "YOLOv4: Optimal speed and accuracy of object detection," CoRR, vol. abs/2004.10934, 2020.
[3] M. Chen, C. Liu, and G. Du. "A human–robot interface for mobile manipulator," Intelligent Service Robotics, vol. 11, no. 3, 2018, pp. 269–278; doi: 10.1007/s11370-018-0251-3.
[4] P. de la Puente, D. Fischinger, M. Bajones, D. Wolf, and M. Vincze. "Grasping objects from the floor in assistive robotics: Real world implications and lessons learned," IEEE Access, vol. 7, 2019, pp. 123725–123735.
[5] W. Dudek and T. Winiarski. "Scheduling of a robot's tasks with the TaskER framework," IEEE Access, vol. 8, 2020, pp. 161449–161471.
[6] V. Dutta, and T. Zielińska. "Predicting human actions taking into account object affordances," Journal of Intelligent & Robotic Systems, vol. 93, 2019, pp. 745–761.
[7] V. Dutta, and T. Zielińska. "Prognosing human activity using actions forecast and structured database," IEEE Access, vol. 8, 2020, pp. 6098–6116.
[8] Q. Gao, J. Liu, Z. Ju, and X. Zhang. "Dual-hand detection for human–robot interaction by a parallel network based on hand detection and body pose estimation," IEEE Transactions on Industrial Electronics, vol. 66, no. 12, 2019, pp. 9663–9672.
[9] M. Gualtieri, J. Kuczynski, A.M. Shultz, A. Ten Pas, R. Platt, and H. Yanco. "Open world assistive grasping using laser selection," 2017 IEEE International Conference on Robotics and Automation (ICRA), 2017, pp. 4052–4057.
[10] K. He, G. Gkioxari, P. Dollár, and R. Girshick. "Mask R-CNN," 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2980–2988.
[11] B. Kulecki. "Intuitive robot programming and interaction using RGB-D perception and CNN-based objects detection," Automation 2022: New Solutions and Technologies for Automation, Robotics and Measurement Techniques, R. Szewczyk, C. Zieliński, and M. Kaliczyńska, eds., Cham, 2022, pp. 233–243.
[12] B. Kulecki, and D. Belter. "Intuicyjny interfejs programowania robota z neuronowymi modułami do interpretacji sceny," Postępy robotyki. T.2, C. Zieliński and A. Mazur, eds., Warszawa, 2022, pp. 89–98.
[13] B. Kulecki, and D. Belter. "Robot programming interface with a neural scene understanding system," Proceedings of the 3rd Polish Conference on Artificial Intelligence, April 25–27, 2022, Gdynia, Poland, 2022, pp. 130–133.
[14] B. Kulecki, K. Młodzikowski, R. Staszak, and D. Belter. "Practical aspects of detection and grasping objects by a mobile manipulating robot," Industrial Robot, vol. 48, no. 5, 2021, pp. 688–699.
[15] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C.L. Zitnick. "Microsoft COCO: Common objects in context," Computer Vision – ECCV 2014, D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, eds., Cham, 2014, pp. 740–755.
[16] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A.C. Berg. "SSD: Single shot multibox detector," Computer Vision – ECCV 2016, B. Leibe, J. Matas, N. Sebe, and M. Welling, eds., Cham, 2016, pp. 21–37.
[17] X. Lv, M. Zhang, and H. Li. "Robot control based on voice command," 2008 IEEE International Conference on Automation and Logistics, 2008, pp. 2490–2494.
[18] I. Maurtua, I. Fernández, A. Tellaeche, J. Kildal, L. Susperregi, A. Ibarguren, and B. Sierra. "Natural multimodal communication for human–robot collaboration," International Journal of Advanced Robotic Systems, vol. 14, no. 4, 2017; doi: 10.1177/1729881417716043.
[19] O. Mazhar, S. Ramdani, B. Navarro, R. Passama, and A. Cherubini. "Towards real-time physical human-robot interaction using skeleton information and hand gestures," IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), 2018, pp. 1–6.
[20] C. Nuzzi, S. Pasinetti, M. Lancini, F. Docchio, and G. Sansoni. "Deep learning-based hand gesture recognition for collaborative robots," IEEE Instrumentation Measurement Magazine, vol. 22, no. 2, 2019, pp. 44–51.
[21] K.-B. Park, S. H. Choi, J. Y. Lee, Y. Ghasemi, M. Mohammed, and H. Jeong. "Hands-free human–robot interaction using multimodal gestures and deep learning in wearable mixed reality," IEEE Access, vol. 9, 2021, pp. 55448–55464.
[22] P. Putthapipat, C. Woralert, and P. Sirinimnuankul. "Speech recognition gateway for home automation on open platform," 2018 International Conference on Electronics, Information, and Communication (ICEIC), 2018, pp. 1–4.
[23] S. Sharan, T. Q. Nguyen, P. Nauth, and R. Araujo. "Implementation and testing of voice control in a mobile robot for navigation," 2019 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM), 2019, pp. 145–150.
[24] M. Simão, P. Neto, and O. Gibaru. "Natural control of an industrial robot using hand gesture recognition with neural networks," IECON 2016 – 42nd Annual Conference of the IEEE Industrial Electronics Society, 2016, pp. 5322–5327.
[25] R. Staszak, B. Kulecki, W. Sempruch, and D. Belter. "What's on the other side? A single-view 3D scene reconstruction," 2022 17th International Conference on Control, Automation, Robotics and Vision (ICARCV), 2022, pp. 173–180.
[26] F. Zhang, V. Bazarevsky, A. Vakunov, A. Tkachenka, G. Sung, C.-L. Chang, and M. Grundmann. "MediaPipe Hands: On-device real-time hand tracking," arXiv e-prints, Jun 2020; doi: 10.48550/arXiv.2006.10214.
SINGULARITY‐ROBUST INVERSE KINEMATICS FOR SERIAL MANIPULATORS Submitted: 4th December 2022; accepted: 21st April 2023
Ignacy Dulęba
DOI: 10.14313/JAMRIS/3-2023/21
Abstract: This paper is a practical guideline on how to analyze and evaluate the literature algorithms of singularity-robust inverse kinematics or to construct new ones. Additive, multiplicative, and Singular Value Decomposition (SVD)-based methods are examined to retrieve well-conditioning of a matrix to be inverted in the Newton algorithm of inverse kinematics. It is shown that singularity avoidance can be performed in two different, but equivalent, ways: either via a properly modified manipulability matrix or by not allowing the decrease of the minimal singular value below a given threshold. It is discussed which method can always be used and which can only be used when some pre-conditions are met. Selected methods are compared with respect to the efficiency of coping with singularities, based on a theoretical analysis as well as simulation results. Also, some questions important for mathematically and/or practically oriented roboticians are stated and answered.
Keywords: Serial manipulator, Forward kinematics, Jacobi matrix, Inverse kinematics, Singularities, Robustness, Evaluation
1. Introduction
Robustness in robotics has different meanings. Singularity-robust algorithms (methods) of inverse kinematics are aimed at coping with singular configurations without switching a model that works in a regular case into a much more complicated singular one. As problems with singularities appear only locally, a slightly modified regular algorithm is able to generate a trajectory passing through a singular configuration. However, an inevitable decrease of the desired end-effector path tracking quality occurs. Inverse kinematics is probably the most frequently solved task in robotics. The task arises when forward kinematics is defined for serial manipulators [13] or instantaneous kinematics for nonholonomic robots [12], and a configuration or a trajectory is to be found which corresponds to a given point/trajectory in a task-space. Only for very few simple robots can the task be solved analytically. Usually, it is solved using the numerical Newton algorithm [11], which guarantees obtaining a solution when no singular configurations are encountered while generating consecutive configurations. At a singular configuration the
algorithm becomes badly conditioned and, temporar‐ ily, special measures need to be taken to retrieve its well‐conditioning. The simplest, and therefore the most frequently used, method to cope with this prob‐ lem is to apply a robust version of the Newton algo‐ rithm. Around a current singular con iguration, the method temporarily modi ies a badly conditioned matrix as long as a neighborhood of the singularity is left. In the robotic literature there are many methods of motion planning through singularities. In [16] Var‐ gas et al. described a singularity approaching as a dynamic estimation of the inverse of the Jacobi matrix, arguing that their method requires less numbers of parameters to ix than other methods and avoids an explicit matrix inversion. Various methods of approx‐ imating inverse matrices without their explicit inver‐ sion are provided in [3]. Sun et al. [14] used a SVD‐ based method to optimize a cost function composed of tracking accuracy and velocity dumping terms with the goal to eliminate a terminal error after pass‐ ing a singularity. The channel algorithm proposed by Duleba in [4] tries to jump through a singular con ig‐ uration extrapolating behavior of a trajectory when the singular con iguration was approached. All the discussed methods are based on the Jacobi, matrix which is just a linear term in the Taylor expansion of kinematics around a current con iguration. Thus, the methods can be classi ied as irst‐order. Recently, Lloyd et al. [9] argued that second‐order methods based also on the Hessian matrix of kinematics also have some advantages from a numerical point of view as an increase of the computational complexity in a single iteration can be compensated by the smaller number of iterations to complete solving the inverse kinematic task. There are also more mathematically advanced methods of motion planning through sin‐ gularities. The normal form method [15] transforms original kinematics around a singular con iguration into its particularly simple, normal form, solves the planning task in the new coordinates and moves back the solution into original coordinates. Unfortunately, this method is computationally demanding. Neverthe‐ less, it allows one to trace a desired path with a high accuracy [15]. Depending on a manipulator redundancy, robust inverse kinematics methods modify either a Jacobi matrix (non‐redundant case) or a manipulability matrix (the redundant case). In some practical situa‐ tions any (sometimes random) modi ication of a badly conditioned matrix retrieves its well‐conditioning.
2023 © Dulęba. This is an open access article licensed under the Creative Commons Attribution-Attribution 4.0 International (CC BY 4.0) (http://creativecommons.org/licenses/by-nc-nd/4.0/)
However, only provably good solutions have a theoretical value, as they also work in the most demanding circumstances which can appear in practice. Retrieving the well-conditioning of an inverted matrix can be realized in two ways: either an additive one, via adding a designed matrix to the matrix, or a multiplicative one, via multiplying the Jacobi matrix by a properly designed matrix. The paper, being a substantially extended version of the conference paper [2], is organized as follows. In Section 2, preliminaries concerning the inverse kinematic task are recalled together with the Newton algorithm used to solve the task. In Section 3, a mathematical framework for designing robust additive or multiplicative matrix inversions is presented; the approach based on the Singular Value Decomposition (SVD) algorithm is examined, and a geometrical interpretation is given of how locally singularity-robust methods can impact possible directions of motion in a task-space and why new numerical problems can arise. Section 4 is devoted to the multi-criteria evaluation of the proposed techniques. In the simulation Section 5, based on the evaluation methodology developed, a standard (literature-based) method of singularity-robust inverse kinematics is compared to a method that increases minimal singular values of a Jacobi matrix. In Section 6, auxiliary issues are discussed related to singularity-robust inverse kinematics yet outside the main scope of the paper. Answers to two questions important for mathematically oriented roboticians are given, and two tips for practically oriented designers of algorithms are provided. Section 7 concludes the paper.
2. Inverse Kinematic Task and its Solution
Forward kinematics assigns to a configuration q = (q_1, …, q_n)^T from a configuration space Q a point k(q) in a task space X:

k : Q ∋ q → k(q) ∈ X,   dim Q = n, dim X = m.    (1)

A redundancy is described by the number r = n − m, and for redundant/non-redundant manipulators it takes the value of r > 0, r = 0, respectively. A basic (point) inverse kinematic task is defined as follows: for a given point x_f in a task-space, find such a configuration q* that k(q*) = x_f. An analytic solution of the inverse kinematic task is possible only for a very few, simple manipulators. Therefore, instead of solving the task in positional spaces (q, x), the search is moved into velocity spaces (q̇, ẋ) related with the Jacobi matrix

ẋ = (∂k/∂q) q̇ = J(q) q̇ = Σ_{z=1}^{n} J_{⋆,z}(q) q̇_z,    (2)

where J_{⋆,z}(q) is the z-th column of the Jacobi matrix and q̇_z := dq_z/dt. Then, the Newton algorithm is applied, given by the iterative scheme [11]

q_{i+1} = q_i + ξ_i · J(q_i)^$ (x_f − k(q_i)),    (3)

where q_i is the configuration in the i-th iteration and ξ_i is a positive coefficient that impacts the speed of convergence of the algorithm. An initial configuration q_0 is known and selected by a user. Depending on the redundancy, the matrix J^$ is expressed as

J^$ := J^{-1} for non-redundant,   J^$ := J^# = J^T M^{-1} for redundant    (4)

manipulators, where the Moore-Penrose pseudo-inverse matrix J^# is defined using a symmetric, positive semi-definite manipulability matrix [13]

M(q) := J(q) J(q)^T.    (5)

The algorithm (3) stops its run successfully when matrices (4) are of full rank in all iterations, until a goal point x_f is reached with a prescribed accuracy δ, i.e., ‖x_f − k(q_i)‖ < δ. The main point of interest in this paper is to analyze the Newton algorithm at and around singular configurations, where the Jacobi matrix J drops its maximal possible rank and J^$ becomes ill-conditioned. The rank decrease is described by a corank defined as follows:

corank(J(q)) := c(q) = m − rank(J(q)).    (6)
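A minimal numerical sketch of the iterative scheme (3)–(5) for a planar 2R arm is given below; the kinematic model, link lengths, and the small damping constant lam (added in the spirit of the additive disturbances discussed in Section 3) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def fk(q, l1=1.0, l2=0.8):
    """Forward kinematics k(q) of a planar 2R arm (illustrative model)."""
    return np.array([l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1]),
                     l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])])

def jacobian(q, l1=1.0, l2=0.8):
    """Jacobi matrix J(q) = dk/dq of the same arm."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [ l1 * c1 + l2 * c12,  l2 * c12]])

def newton_ik(x_f, q0, xi=0.5, delta=1e-4, lam=1e-6, max_iter=500):
    """Scheme (3): J^$ is realized as J^T (M + lam*I)^-1 with M = J J^T from (5)."""
    q = np.array(q0, dtype=float)
    for _ in range(max_iter):
        e = x_f - fk(q)
        if np.linalg.norm(e) < delta:
            break
        J = jacobian(q)
        M = J @ J.T
        q = q + xi * (J.T @ np.linalg.solve(M + lam * np.eye(2), e))
    return q

# q_star = newton_ik(np.array([1.2, 0.6]), q0=[0.3, 0.3])
```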
3. Robust Matrix Inversions
3.1. Additive Robust Inverse
Let A, M denote symmetric, square matrices. A is a designed positive-definite matrix, A > 0, while M is the positive semi-definite manipulability matrix (5), M ≥ 0, which at singular configurations is degenerated, i.e. rank(J) < m or, alternatively, det(M) = 0. The key inequality used in designing robust inverse matrices is the following:

det(A + M) > det(A) + det(M),    (7)

and it holds if only M ≠ 0.
Proof (based on Grossmann's idea [6]): to simplify notations, B := A^{-1/2} is defined. Using properties of determinants and symmetric matrices one gets

det(A + M) = det(A (I + A^{-1} M)) = det(A) det(B I B^{-1} + B (B M B) B^{-1}) = det(A) det(B (I + B M B) B^{-1}) = det(A) det(B) det(I + B M B) det(B^{-1}) = det(A) det(I + B M B) > det(A) (det(I) + det(B M B)) = det(A) + det(M) > 0.    (8)

In the chain of transformations (8), the only non-obvious transformation is the first strong inequality. The matrix B M B ≠ 0 is symmetric, positive semi-definite with real and non-negative eigenvalues (singular values) collected in the vector S = (σ_1, …, σ_m)^T, σ_i ≥ 0, i = 1, …, m, not all equal to zero
(at least $\sigma_1 > 0$). This matrix can be expressed in the form
$$B M B = U \operatorname{diag}(S)\, U^T, \tag{9}$$
with a non-singular rotation matrix $U \in \mathbb{SO}(m)$. The identity matrix can be expressed as $I = U U^T$, and the eigen-(singular) values of the positive semi-definite matrix $I + B M B$ are equal to $\operatorname{diag}(\mathbf{1}) + S$. Finally, using properties of determinants of similar matrices and of eigenvalues of matrices, one gets
$$\det(I + B M B) = \prod_{i=1}^{m}(\sigma_i + 1) = 1 + \sum_{i=1}^{m}\sigma_i + \sum_{\substack{i,j=1 \\ i>j}}^{m}\sigma_i\sigma_j + \sum_{\substack{i,j,k=1 \\ i>j>k}}^{m}\sigma_i\sigma_j\sigma_k + \dots + \prod_{i=1}^{m}\sigma_i > 1 + \prod_{i=1}^{m}\sigma_i = \det(I) + \det(B M B). \tag{10}$$
The inequality (7) guarantees positive-definiteness of the sum of matrices when only a positive-definite matrix is added to the manipulability matrix. This way, the generalized inverse matrix (4) can be applied. Apparently, the additive disturbance should be small so as not to cause inaccuracies in tracking a desired path towards the goal point. A very general method to generate the disturbing matrix with the desired properties is to take any $(m \times n)$ full-rank matrix $B$ and obtain the always positive-definite matrix $B B^T$. The simplest possible choice, frequently encountered in the robotic literature [11], takes the form
$$A := \lambda\, I_m, \tag{11}$$
with a small positive design parameter $\lambda$ and the $(m \times m)$ identity matrix $I_m$ [13]. However, the drawback of (11) is the identical handling of all coordinates and a weak relationship with the kinematics for which the disturbance is applied. Another natural disturbance definition, proposed below, does not suffer from the aforementioned drawback of (11):
$$A := \lambda \cdot \operatorname{diag}\bigl(\langle J_{1,\star}, J_{1,\star}\rangle, \dots, \langle J_{m,\star}, J_{m,\star}\rangle\bigr), \tag{12}$$
where $\langle\cdot,\cdot\rangle$ denotes a dot product and $J_{i,\star}$ is the $i$-th row of the Jacobi matrix. The expression (12) depends on the kinematics and potentially increases the items on the main diagonal of the manipulability matrix. Unfortunately, for an arbitrary model of kinematics there is no guarantee that all rows of the Jacobi matrix have a non-zero length. To avoid this extremely rare but still possible case, an always positive, kinematic-dependent disturbance function can be defined as
$$A := \lambda \cdot \operatorname{diag}\bigl(f_\epsilon(J_{1,\star}), \dots, f_\epsilon(J_{m,\star})\bigr), \tag{13}$$
where
$$f_\epsilon(J_{i,\star}) := \max\{\langle J_{i,\star}, J_{i,\star}\rangle,\ \epsilon\}, \tag{14}$$
with a small, positive design parameter $\epsilon$. The robust inversion (13) shares the advantages of (11) and (12) while avoiding their drawbacks.
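As an illustration of the additive regularizations (11)-(14), a short sketch follows. It is not the author's code; the helper names and the example values of λ and ε are assumptions introduced only for demonstration.

```python
import numpy as np

def disturbance_identity(J, lam=1e-4):
    """A := lambda * I_m, cf. (11)."""
    return lam * np.eye(J.shape[0])

def disturbance_kinematic(J, lam=1e-4, eps=1e-8):
    """A := lambda * diag(max(<J_i,J_i>, eps)), cf. (13)-(14)."""
    row_norms_sq = np.einsum("ij,ij->i", J, J)   # <J_{i,*}, J_{i,*}>
    return lam * np.diag(np.maximum(row_norms_sq, eps))

def robust_additive_step(J, e, lam=1e-4, eps=1e-8):
    """One Newton update direction using M + A instead of M in (4)."""
    M = J @ J.T                                  # manipulability matrix (5)
    A = disturbance_kinematic(J, lam, eps)       # well-conditioned by (7)
    return J.T @ np.linalg.solve(M + A, e)
```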
It is worth mentioning that Equation (7) offers more than what is necessary to retrieve the well-conditioning of the inverted matrix:
$$\det(A + M) > \det(A) + \det(M) \ge \det(A) > 0. \tag{15}$$
It appears that at singular configurations the badly conditioned matrix $M$ actively supports avoiding singularities, because from Equations (8), (10), and (15) it follows that
$$\det(A + M) - \det(A) = \det(A)\,R(\sigma) > 0, \tag{16}$$
where $R(\sigma) = \sum_{i=1}^{m}\sigma_i + \dots$ collects all the $\sigma$-dependent terms of the expansion (10).
3.2. Multiplicative Robust Inverse

In the multiplicative case, the robust inversion of the manipulability matrix is sought by disturbing factors of the matrix rather than by modifying the matrix itself. The first possibility is to modify the matrix $J$ by adding an $(m \times n)$ matrix $\Psi$ to get a well-conditioned manipulability matrix:
$$(J + \Psi)(J + \Psi)^T = J J^T + J\Psi^T + (J\Psi^T)^T + \Psi\Psi^T. \tag{17}$$
The general inequality (7) does not work directly for (17), as there are four terms instead of two. The first term is badly conditioned at singular configurations, while the last term seems to be a good candidate for a positive-definite matrix if only $\Psi$ is non-singular, i.e., $\operatorname{rank}(\Psi) = m$. So, the two mid-terms should be eliminated, either by zeroing their sum or by zeroing one of them. The second condition seems easier to satisfy, although it is stronger than the first one. The condition
$$J\,\Psi^T = 0_m \tag{18}$$
means that the rows of $\Psi$ are perpendicular to the rows of $J$, so they belong to the null space of the Jacobi matrix. The dimension of the null space at regular (non-singular) configurations is equal to $r = n - m$, and in typical singular configurations it is increased by $c \in \{1, 2\}$, cf. Equation (6). Thus, the rank of the matrix $\Psi$ is upper-bounded by $c + r$. On the other hand, it was assumed that $\operatorname{rank}(\Psi) = m$. In the vast majority of practical cases, the condition
$$r + c - m \ge 0 \tag{19}$$
cannot be satisfied (example: $n = 4$, $m = 3$, $c = 1$). Consequently, this multiplicative method of making the manipulability matrix robust seems to lack generality.

3.3. Robust Inverse Using SVD
Another version of multiplicative modification of the manipulability matrix extensively uses the Singular Value Decomposition algorithm [5], introduced into robotics by Maciejewski and Klein [10]. The SVD decomposes the Jacobi matrix into a product of three matrices:
$$J = U \cdot [D,\ 0_{m,r}] \cdot V^T, \tag{20}$$
where $0_{m,r}$ denotes an $(m \times r)$ matrix composed of zeros only, $U \in \mathbb{SO}(m)$ and $V \in \mathbb{SO}(n)$ are rotation matrices in the $m$- and $n$-dimensional spaces, respectively,
and an $(m \times m)$ diagonal matrix $D = \operatorname{diag}(d_i)$, $i = 1, \dots, m$, collects the non-negative singular values ordered in non-ascending order, $d_i \ge d_j$ for $i < j$. A clear advantage of the decomposition (20) is that the pseudo-inverse matrix at regular (i.e., non-singular) configurations can be obtained immediately as follows:
$$J^{\#} = V \cdot \begin{bmatrix} D^{-1} \\ 0_{r,m} \end{bmatrix} \cdot U^T, \tag{21}$$
where $D^{-1} = \operatorname{diag}(1/d_i)$, $i = 1, \dots, m$. From a numerical perspective, the detection of singularities amounts to checking whether any of the singular values drops below a given design parameter $d_{\min}$, describing a safety margin from singularities. Therefore, a method to avoid singularities should modify those singular values that do not exceed the threshold value. In this method, $D^{-1}$ in Equation (21) is replaced with
$$\tilde{D}^{-1} := \operatorname{diag}(1/\tilde{d}_i), \quad i = 1, \dots, m, \quad \text{where } \tilde{d}_i := \max(d_i, d_{\min}). \tag{22}$$
It is worth noticing that the method (21), (22) of robust matrix inversion is valid for non-redundant as well as for redundant manipulators. Moreover, it is not computationally demanding, as at regular configurations it is computed using (20), (21), while at singular configurations the modification (22) is minor. In the next section, it will be explained why all very small singular values should be replaced with the same value $d_{\min}$.
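A compact numerical illustration of (20)-(22) is given below. It is a sketch, not the paper's implementation: the function name `robust_pinv` and the default value of `d_min` are assumptions, and `numpy.linalg.svd` plays the role of the SVD algorithm [5].

```python
import numpy as np

def robust_pinv(J, d_min=1e-2):
    """Singularity-robust pseudo-inverse: clamp singular values to d_min, cf. (21)-(22)."""
    U, d, Vt = np.linalg.svd(J)            # J = U [D, 0_{m,r}] V^T, d non-ascending
    d_mod = np.maximum(d, d_min)           # d~_i := max(d_i, d_min)
    m, n = J.shape
    D_inv = np.zeros((n, m))
    D_inv[:m, :m] = np.diag(1.0 / d_mod)   # [D~^{-1}; 0_{r,m}] block of (21)
    return Vt.T @ D_inv @ U.T

# Usage (assumed names): dq = robust_pinv(J(q)) @ (x_goal - k(q)) replaces the
# exact pseudo-inverse in the Newton step; the clamping also bounds the joint
# velocity amplification by 1/d_min.
```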
3.4. Singularity-Robust Inverse Kinematics — A Geometrical Interpretation

The main idea behind almost all singularity-robust inverse kinematics algorithms is the same: to modify either the Jacobi matrix $J$ or the manipulability matrix $M$ so as to retrieve the invertibility of the manipulability matrix. However, the modification may disturb a motion towards a current sub-goal $x^f$ initialized at a current point in the task space $x^c = k(q^c)$, especially when the modification is not small. In Figure 1, possible motion scenarios are depicted. At regular configurations $q^c$, the motion generated with the Newton algorithm (3) (at least infinitesimally) shifts the point $x^c$ directly towards the goal (A1), and there are no problems with the convergence of the algorithm, even when the Jacobi matrix is not frequently updated as $q^c$ is iterated. When the modifications are relatively big (as happens at or around singular configurations), the possible directions of motion can be quite different (A2–A6). Some of them (A3–A6), even infinitesimally, do not guarantee a decrease of the tracking error (so an error is inevitable). Some others (A2–A3) allow the current point $x^c$ to move towards the goal, but with a carefully selected (optimized) value of the positive coefficient $\xi_i$, cf. Equation (3), so as not to move further than points (B2–B3). In this case, the number of iterations needed to reach a vicinity of the goal point $x^f$ with an assumed accuracy $\delta$ increases. Obviously, when $\delta$ is small and the absolute value of the angle between a current direction of motion and the vector $\overrightarrow{x^c x^f}$ is close to $90^\circ$, but does not exceed it, the number of iterations to converge increases even more.

Figure 1. Possible motions in a task-space resulting from modifying either the Jacobi or the manipulability matrix

4. Evaluation of Robust Matrix Inversion Methods

In order to compare methods of coping with singularities via robust matrix inversions, some criteria should be proposed. The four most important factors are listed below:
1) How big is the matrix modification?
2) How far is it allowed to move away from singularities?
3) What is the computational complexity of the method?
4) Can the aforementioned characteristics be computed analytically, or only numerically?

QN. 1) The answer to the first question seems obvious: a matrix distance between the original manipulability matrix and the modified one should be calculated using a selected matrix norm. However, the first method is based on modifying the manipulability matrix (5) by adding to it either the matrix (11) or (13), while the second method (22) modifies the singular values of the Jacobi matrix. The methods based on the modifications (11) and (22) are comparable because the decomposition (20) allows the manipulability matrix (5) to be expressed in terms of the singular values of $J$ as follows:
$$M = U\,[D,\ 0_{m,r}]\,V^T V \begin{bmatrix} D^T \\ 0_{r,m} \end{bmatrix} U^T = U D^2 U^T. \tag{23}$$
Consequently, the modification (11) can be expressed in terms of singular values,
$$M + A = M + \lambda I = U\,(D^2 + \lambda I)\,U^T, \tag{24}$$
and the mid-term matrix $D^2 = \operatorname{diag}(d_1^2, \dots, d_m^2)$ is increased by the value of
$$f_1 = m \cdot \lambda. \tag{25}$$
When the modification method based on (22) is used, the matrix $D^2$ is replaced with $\operatorname{diag}\bigl(d_1^2, \dots, d_{m-c}^2, \underbrace{d_{\min}^2, \dots, d_{\min}^2}_{c \text{ times}}\bigr)$; thus the increase is equal to
$$f_2 = \sum_{i=m-c+1}^{m}\bigl(d_{\min}^2 - d_i^2\bigr) = c \cdot d_{\min}^2 - \sum_{i=m-c+1}^{m} d_i^2, \tag{26}$$
where the corank $c$ was defined in (6) and counts the items satisfying the condition $d_i < d_{\min}$. The functions $f_1$, $f_2$, respectively, measure a distance between the original matrix $M$ and its modified version, obtained either directly (24) or via increasing the minimal singular values (22).

QN. 2) Usually, a "distance" to singular configurations is estimated by the value of the manipulability matrix determinant (or its square root, the manipulability index):
$$\det(M(q)) = \prod_{i=1}^{m} d_i^2(q). \tag{27}$$
The function (27) is non-negative, has an analytical gradient with respect to $q$, and attains the zero value at singular configurations. Therefore, it is useful in avoiding singularities by optimization within the null space of the Jacobi matrix for redundant manipulators [11]. Unfortunately, it does not show how far a current configuration is from a singular one. For this purpose, a better choice is to take the minimal value among the singular values, or its square, i.e.,
$$g_1(q) := d_m(q), \qquad g_2(q) := d_m^2(q), \tag{28}$$
as this value decides whether the manipulability matrix is well or badly conditioned. Similarly to (28), also in other robotic tasks a square of a positive value is used instead of the value itself, just to avoid the time-consuming square-root operation, as in the task of calculating the distance to obstacles. The functions (28) also have one drawback: they can be calculated only numerically at a given configuration. Apparently, the minimal singular value of the modified manipulability matrix should be as big as possible. This observation justifies the selection (22), which increases all the smallest singular values to the same level. Using the function $g_2$, the modifications (11), (24), or (22) increase the minimal singular value to
$$h_1(q) := d_m^2(q) + \lambda, \qquad h_2 := d_{\min}^2, \tag{29}$$
respectively. Now we are in a position to compare the two modifications, assuming that the manipulability matrix increase is the same, $f_1 = f_2$. The better modification should give a greater value of the resulting minimal singular value squared. It will be proven that $h_1 < h_2$, which means that the modification (22) is better than (11). At first, from $f_1 = f_2$ (cf. (25), (26)), $\lambda$ is calculated and substituted into (29). Then,
$$h_1 < h_2 \;\Longleftrightarrow\; c \cdot d_{\min}^2 - \sum_{i=m-c+1}^{m} d_i^2 < m\,\bigl(d_{\min}^2 - d_m^2\bigr). \tag{30}$$
Finally,
$$c \cdot d_{\min}^2 - \sum_{i=m-c+1}^{m} d_i^2 \;\le\; c \cdot d_{\min}^2 - c \cdot d_m^2 \;<\; m\,\bigl(d_{\min}^2 - d_m^2\bigr), \tag{31}$$
as $c < m$ and $d_i \le d_{\min}$ for $i = m-c+1, \dots, m$. From (31), it can be deduced that the difference between $h_2$ and $h_1$ increases when $c$ is small (in a typical case $c = 1$) and $m$ is big (usually $m \in \{3, \dots, 6\}$).

QN. 3) It is commonly agreed that kinematic-like transformations are much simpler than those related to dynamics. Nevertheless, the robust inverse kinematics method based on the SVD decomposition is computationally simpler than the other methods, as it couples the detection of singularities with passing through them. In this case, the computations performed at the detection stage are effectively used in calculating the robust inversion (21), (22). Moreover, the method works in the same way for redundant as well as for non-redundant manipulators (the only difference is that the sub-matrix $0_{m,r}$ in (20) disappears in the non-redundant case). The other robust methods do not share this feature, and both cases have to be considered and implemented separately.

QN. 4) All presented methods of robust inverse kinematics are numerical in nature. For some simple manipulators singular configurations can be calculated explicitly; for more complicated ones, only analytical conditions can be formulated that are satisfied at singular configurations. For the PUMA 560 robot [8], shoulder singularities are given by an expression relating components of a configuration vector, while elbow and wrist singularities can be calculated explicitly.

5. Simulation Results

Figure 2. The three degrees of freedom manipulator

Simulations were performed on the non-redundant manipulator of Figure 2, equipped with three rotational joints and described by the forward kinematics
$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} c_1 A \\ s_1 A \\ a_0 + a_2 s_2 + a_3 s_{23} \end{bmatrix}, \tag{32}$$
where
$$A = a_1 + a_2 c_2 + a_3 c_{23} \tag{33}$$
and $a_0$, $a_1$, $a_2$, $a_3$ are length parameters. In (32), (33) the standard robotic convention is used to denote trigonometric functions, i.e., $s_{23} = \sin(q_2 + q_3)$, $c_2 = \cos(q_2)$, etc. Singular configurations are derived from the Jacobi matrix
$$J(q) = \begin{bmatrix} -s_1 A & -c_1(a_2 s_2 + a_3 s_{23}) & -a_3 c_1 s_{23} \\ c_1 A & -s_1(a_2 s_2 + a_3 s_{23}) & -a_3 s_1 s_{23} \\ 0 & a_2 c_2 + a_3 c_{23} & a_3 c_{23} \end{bmatrix}, \tag{34}$$
and they satisfy the condition
$$\det(J(q)) = -a_2\, a_3\, A \cdot s_3 = 0. \tag{35}$$
It appears that singular configurations arise for $q_3 = 0$ at the boundary of the task space, but also along the $z$-axis when $A = 0$, if only the condition
$$a_2 + a_3 \ge a_1 \tag{36}$$
is met. The two following singularity-robust algorithms of motion planning (3) are compared:
Algorithm 1: to the manipulability matrix $M$ in (4), the term given in Equation (24) is added;
Algorithm 2: the SVD-based method (22) with equal minimal singular values.
In order to make the algorithms comparable, the total increase of the manipulability matrix was fixed and the $\lambda$ coefficient in Algorithm 1 was calculated from the condition $f_1 = f_2$, cf. (25), (26), based on the given $d_{\min}$ parameter. The other data for the simulation are the following:
- accuracy of reaching sub-goals along a given path: $\delta = 10^{-9}$,
- unit-less length parameters: $a_0 = 0$, $a_1 = 0.3$, $a_2 = 0.2$, $a_3 = 0.3$,
- detection of singularities: $d_m < 10^{-3}$,
- the minimal singular value after modification: $d_{\min} = 10^{-2}$.
Finally, the path to follow,
$$\begin{bmatrix} x(t) \\ y(t) \\ z(t) \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ (0.6 + 0.2\,t)\sqrt{(a_2 + a_3)^2 - a_1^2} \end{bmatrix}, \tag{37}$$
is given on the (time) interval $t \in [0, 2]$, and all its points are realized with singular configurations, as $A = 0$, so the path can be considered hard to follow. The range of admissible $z$-coordinates for $x = y = 0$ is $\bigl[a_0 - \sqrt{(a_2 + a_3)^2 - a_1^2},\ a_0 + \sqrt{(a_2 + a_3)^2 - a_1^2}\bigr]$.

Figure 3. Coordinates $q_2$ and $q_3$ corresponding to the path followed

Trajectories generated with both robust inverse algorithms were almost the same: $q_1 = 300^\circ$ took an almost constant value (up to numerical noise), while the remaining components of the configuration vector are depicted in Figure 3. The only difference was observed at $t = 2$, where the SVD-based Algorithm 2 reached the final path point with the prescribed accuracy, while Algorithm 1 could not and stopped at the distance $5.6 \cdot 10^{-6}$ from the target. All other points of the path were obtained with the prescribed accuracy $\delta$; thus both robust inverse kinematic algorithms worked quite well. The point for $t = 2$ is located at the boundary of the task space and is obtained in a corank-2 configuration (not only $A = 0$ but also $s_3 = 0$, cf. Equation (35)), and only one column of the matrix $J$ in (34) is independent.
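The simulated model can be reproduced from (32)-(35) with a few lines of code. The sketch below is illustrative only: the function names are assumptions, the sample configuration is arbitrary, and the last line merely cross-checks the numerical determinant against the closed form (35).

```python
import numpy as np

a0, a1, a2, a3 = 0.0, 0.3, 0.2, 0.3       # unit-less lengths from Section 5

def forward_kinematics(q):
    """Forward kinematics (32)-(33) of the 3-DOF manipulator."""
    q1, q2, q3 = q
    A = a1 + a2 * np.cos(q2) + a3 * np.cos(q2 + q3)
    return np.array([np.cos(q1) * A,
                     np.sin(q1) * A,
                     a0 + a2 * np.sin(q2) + a3 * np.sin(q2 + q3)])

def jacobian(q):
    """Jacobi matrix (34)."""
    q1, q2, q3 = q
    s1, c1 = np.sin(q1), np.cos(q1)
    s2, c2 = np.sin(q2), np.cos(q2)
    s23, c23 = np.sin(q2 + q3), np.cos(q2 + q3)
    A = a1 + a2 * c2 + a3 * c23
    return np.array([[-s1 * A, -c1 * (a2 * s2 + a3 * s23), -a3 * c1 * s23],
                     [ c1 * A, -s1 * (a2 * s2 + a3 * s23), -a3 * s1 * s23],
                     [ 0.0,     a2 * c2 + a3 * c23,         a3 * c23]])

# Singularities occur when sin(q3) = 0 or A = 0, cf. (35); an illustrative check:
q = np.array([np.deg2rad(300.0), np.deg2rad(80.0), np.deg2rad(40.0)])
A = a1 + a2 * np.cos(q[1]) + a3 * np.cos(q[1] + q[2])
print(np.linalg.det(jacobian(q)), -a2 * a3 * A * np.sin(q[2]))
```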
6. Auxiliary Issues

In this section, two mathematical problems are addressed and two practical tips are given. In Subsection 6.1 it is shown that the minimal singular value (the function $g_1$), used to estimate the distance of a current configuration (via the singular values of its Jacobi matrix) from a singularity, is not a norm in the mathematical sense. In Subsection 6.2, it is pointed out that for non-redundant manipulators reliable singularity-robust inverse kinematics cannot be performed directly on the Jacobi matrix (but it can still be done using its manipulability matrix or operations on its singular values). The next two subsections provide practically oriented hints. In Subsection 6.3, the advantages of a normalization of forward kinematics are discussed, while in Subsection 6.4 it is shown how to minimize the disadvantages of the existence of inadmissible motion directions within a task space.

6.1. Is the Minimal Singular Value a Norm for the Singularity Detection?

The function $g_1$ defined in (28) and equal to the minimal singular value defines a "distance" of a Jacobi matrix to singularity. However, it is not a distance in the strict mathematical sense, as it does not satisfy the triangle inequality. In order to show this, let us take two exemplary matrices:
$$A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}, \qquad A + B = I_2. \tag{38}$$
Apparently, $g_1(A) = g_1(B) = 0$ but $g_1(A + B) = 1$. It is known [7] that the maximal singular value can
serve as a norm of a matrix (the so-called spectral norm). However, this norm is not useful in coping with singularities, as it is the smallest singular value of a matrix that impacts singularities the most. To emphasize the importance of the minimal singular value in the vicinity of a singular configuration, let us assume that a small displacement $\Delta x$ ($\|\Delta x\| = \nu$) is to be executed there. Using (2) and rearranging (20), the following formula is obtained:
$$U^T \Delta x = [D,\ 0_{m,r}] \cdot V^T \Delta q. \tag{39}$$
In the worst possible case, when $U^T \Delta x = (0, \dots, 0, \nu)^T$,
$$\nu = \|U^T \Delta x\| = d_m\,\|V^T \Delta q\| = d_m\,\|\Delta q\|. \tag{40}$$
Equation (40) describes the well-known fact that around a singular configuration the velocities at the joints can attain very large (infinite) values. But it also prompts how to upper-bound the norm of the velocities, by replacing $d_m$ with the design parameter $d_{\min}$:
$$\|\Delta q\| \le \nu / d_{\min}. \tag{41}$$
6.2. Varying the Jacobi Matrix for Non-redundant Manipulators

The classical modification of the manipulability matrix (24) is provably good for redundant manipulators. But the question arises: is it possible to modify the Jacobi matrix itself and avoid the computationally more costly modification via the manipulability matrix (in this case a non-redundant manipulator is considered as a special case of a redundant one)? The answer is no. To prove this statement, let us analyze a hypothetical 2-dof manipulator with its Jacobi matrix expressed in a general form,
$$J = \begin{bmatrix} j_{11} & j_{12} \\ j_{21} & j_{22} \end{bmatrix}, \tag{42}$$
and considered at a singular configuration,
$$\det(J) = j_{11} j_{22} - j_{12} j_{21} = 0. \tag{43}$$
The modified Jacobi matrix $J + \lambda I_2$, taking into account (43), has its determinant equal to
$$\det(J + \lambda I_2) = \lambda(j_{11} + j_{22}) + \lambda^2 = (j_{11} + j_{22} + \lambda)\,\lambda, \tag{44}$$
which, in general, may take the value of zero for $\lambda \ne 0$. The same reasoning repeated for higher-dimensional configuration spaces reveals that the polynomial (44) will be of the $n$-th degree, with its coefficients depending on the items of $J$, and its non-zero value is not guaranteed.

6.3. Normalization of Kinematics

The Jacobi matrix (2), (20) transforms velocities from a configuration space into a task space. In practice, the coordinates of vectors from the configuration and task spaces have a physical meaning and units as well (angles, positions, and translations). Thus, in some applications there is a practical need to weight their contribution in the Newton algorithm (3).
In order to unify the different (in units and ranges) configuration coordinates, a weighted pseudo-inverse matrix [11] can be used:
$$J^{\#}_W = W^{-1} J^T \bigl(J W^{-1} J^T\bigr)^{-1}, \tag{45}$$
where $W$ is a symmetric, positive-definite weighting matrix. This (right) pseudo-inverse matrix results from minimizing $\dot q^T W \dot q$ subject to the constraint $\dot x = J \dot q$. While analyzing the decomposition (20), one can notice that the matrix $V^T$ rotates the velocities $\dot q$, then the rotated velocity is scaled and truncated into an $m$-dimensional object by the matrix $[D,\ 0_{m,r}]$, and finally this object is rotated in the $m$-dimensional space by the matrix $U$ to get $\dot x$. In the previous sections singular values were extensively used as a tool for detecting singularities and constructing robust inverses. Here a natural question can be posed about their units and ranges. Assuming that rotation matrices are unit-less, the components of $[D,\ 0_{m,r}]$ should have the same units as the Jacobi matrix $J$. Clearly, singular values should depend on the geometrical parameters of a manipulator. In order to make them less dependent on the parameters, it is advised to normalize the components of the forward kinematics (1) by dividing each component by its maximal allowable value. Unfortunately, as the components of kinematics depend also on the configuration, this optimization could be a serious numerical task. Fortunately, it is quite simple to get a reasonable approximation of the value by summing up upper bounds of the items contributing to the component of kinematics. For the exemplary expression
$$x = a_1 + a_2 \cos q_2 + a_3 \cos(q_2 + q_3), \tag{46}$$
the value is equal to $a_1 + a_2 + a_3$. The normalization of kinematics has one more potential advantage. Frequently, to complete the Newton algorithm (3), the vector from a current location to the goal one should drop below a given threshold, $\|x^f - k(q_i)\| < \delta$, with the Euclidean metric $\|\cdot\|$ used. When the normalized kinematics is used, the components of this vector are comparable.

6.4. Admissible and Inadmissible Directions of Motion at Singular Configurations

At singular configurations, a motion in a task space along vectors that belong to the $\operatorname{corank}(J)$-dimensional space spanned by vectors perpendicular to the independent columns $J_{\star,z}$, cf. (2), of the Jacobi matrix cannot be realized using any velocities at the joints. Generally, these directions can be calculated only with a numerical procedure (using the Gram–Schmidt orthogonalization algorithm [1]), but for the manipulator (32) it can be done analytically. At singular configurations localized along the $z$-axis, the first column in (34) is a zero vector. It can be checked by direct calculations that the vector $(s_1, -c_1, 0)$ is perpendicular to the two remaining columns, so along this direction an infinitesimal motion cannot be performed.
Using the SVD decomposition (20), it can be easily checked that the space of admissible motions in a task space is spanned by the first $(m - c)$ columns $(U_{\star,1}, \dots, U_{\star,m-c})$ of the matrix $U$, while the remaining $c$ columns span the space of inadmissible motions. Clearly, the higher the corank $c$ (6) is, the more restricted the admissible motions are. As already mentioned in the introductory section, singularity-robust inverse kinematics will cause inaccuracy in tracking a desired path. So, let us assume that at a singular configuration a desired motion is described by the vector $\Delta x$. The angle between $\Delta x$ and its projection onto the subspace spanned by $(U_{\star,1}, \dots, U_{\star,m-c})$ may serve as a qualitatively good estimator of the expected tracking error. For a designer of a singularity-robust algorithm this observation can be useful while planning a path to be followed very precisely.
7. Conclusion

In this paper, various singularity-robust methods of inverse kinematics were discussed, and their reliable operation was evidenced even in the most demanding circumstances. The theoretical background of the methods is aimed at helping to understand, analyze, and evaluate any particular method. Some tips are provided for robotics practitioners on how to design singularity-robust methods and/or to properly set their parameters. It has also been shown that the SVD method with a fixed modification of the sum of singular values, which sets all the smallest values to the same level, is the best within this class of methods, as it allows getting as far as possible from singularities and also reduces joint velocities. This method can be advised as the first-choice method for computational reasons, as computations performed at the singularity detection stage can be used in the regular as well as in the singular case. Finally, some questions related to facilitating the passage through singular configurations have been answered; they concern a transformation of kinematics before planning as well as preparing a path to be followed.
AUTHOR Ignacy Dulęba – Department of Cybernetics and Robotics, Wroclaw University of Science and Technology, Janiszewski St. 11/17, 50‐372 Wroclaw, Poland, e‐mail: ignacy.duleba@pwr.edu.pl, www.kcir.pwr.edu.pl/~iwd.
References
[1] W. Cheney and D. Kincaid, Linear Algebra: Theory and Applications, Jones & Bartlett Publ., 2009.
[2] I. Dulęba. "Robust inverse kinematics at singular configurations," A. Mazur and C. Zieliński, eds., Advances in Robotics, vol. 197 of Electronics, pp. 5–10. Publ. House of the Warsaw Univ. of Technology, 2022 (in Polish).
[3] I. Duleba. "A comparison of Jacobian-based methods of inverse kinematics for serial robot manipulators," Int. Journal of Applied Mathematics and Computer Science, vol. 23, no. 2, 2013, pp. 373–382.
[4] I. Duleba. "Channel algorithm of transversal passing through singularities for non-redundant robot manipulators," IEEE Int. Conf. on Robotics and Automation, vol. 2, 2000, pp. 1302–1307; doi: 10.1109/ROBOT.2000.844778.
[5] G. Golub and C. Reinsch. "Singular value decomposition and least squares solutions," Numerische Mathematik, vol. 14, no. 5, 1970, pp. 403–420.
[6] B. Grossmann. "The product of two symmetric, positive semidefinite matrices has non-negative eigenvalues". Mathematics Stack Exchange; https://math.stackexchange.com/q/982822 (version: 2014-10-21).
[7] R. Horn and C. Johnson, Matrix Analysis, Cambridge Univ. Press, 2012.
[8] C.-G. Kang. "Online trajectory planning for a PUMA robot," Int. Journal of Precision Engineering and Manufacturing, vol. 8, no. 4, 2007, pp. 16–21.
[9] S. Lloyd, R. A. Irani, and M. Ahmadi. "Fast and robust inverse kinematics of serial robots using Halley's method," IEEE Transactions on Robotics, vol. 38, no. 5, 2022, pp. 2768–2780; doi: 10.1109/TRO.2022.3162954.
[10] A. A. Maciejewski and C. Klein. "The singular value decomposition: Computation and applications to robotics," Int. Journal of Robotics Research, vol. 8, 1989, pp. 63–79.
[11] Y. Nakamura, Advanced Robotics: Redundancy and Optimization, Addison-Wesley, 1991.
[12] A. Ratajczak, J. Ratajczak, and K. Tchoń. "Task-priority motion planning of underactuated systems: an endogenous configuration space approach," Robotica, vol. 28, no. 6, 2010, pp. 885–892.
[13] M. Spong and M. Vidyasagar, Robot Dynamics and Control, MIT Press, 1989.
[14] J. Sun, Y. Liu, and C. Ji. "Improved singular robust inverse solutions of redundant serial manipulators," Int. Journal of Advanced Robotic Systems, vol. 17, no. 3, 2020, pp. 1–12; doi: 10.1177/1729881420932046.
[15] K. Tchoń and J. Ratajczak. "Singularities of holonomic and non-holonomic robotic systems: a normal form approach," Journal of the Franklin Institute, vol. 358, no. 15, 2021, pp. 7698–7713.
[16] L. V. Vargas, A. C. Leite, and R. R. Costa. "Overcoming kinematic singularities with the filtered inverse approach," IFAC Proceedings Volumes, vol. 47, no. 3, 2014, pp. 8496–8502; doi: 10.3182/20140824-6-ZA-1003.01841, 19th IFAC World Congress.
COMPARISON OF CURVILINEAR PARAMETRIZATION METHODS AND AVOIDANCE OF ORTHOGONAL SINGULARITIES IN THE PATH FOLLOWING TASK Submitted: 8th January 2023; accepted: 6th June 2023
Filip Dyba, Alicja Mazur DOI: 10.14313/JAMRIS/3‐2023/22
Abstract: In this paper, applications of curvilinear parametrizations (Serret–Frenet, Bishop) in the path following task have been considered. The parametrizations allow one to derive the manipulator's equations with respect to a path. The full mathematical model of the path following task involves two groups of equations, i.e., the dynamics of the manipulator and the equations obtained from the parametrization method, connected in a cascaded system. Based on those relations, two path following algorithms have been designed according to the backstepping integrator method (dedicated to cascaded systems). Depending on the chosen parametrization method, the algorithms differ in requirements and performance. In the paper, an in-depth analysis comparing the features of both considered methods has been presented. The parametric description of a path requires a projection of the robot on the path. In this article, the orthogonal projection has been taken into account. It introduces a singularity into the robot description. We have proposed a new form of the orthogonal projection constraint which allows a robot not only to approach the path, but also to move along it. This novel design is an important enhancement of the algorithms used so far. The problem of partially known dynamic parameters of a robot has also been addressed: we have shown how to apply an adaptive controller to the path following task. Theoretical considerations have been verified with a simulation study conducted for a holonomic stationary manipulator. The achieved results emphasize why it is strongly recommended to use the algorithm version with the orthogonal singularity outside the path. Moreover, the comparative analysis results may be used to select the best curvilinear parametrization method according to the considered task requirements.

Keywords: Path following, Serret–Frenet parametrization, Bishop parametrization, Orthogonal projection, Backstepping algorithm, Holonomic manipulator, Singularity

1. Introduction

The path following task is one of the basic robot control tasks [19]. This control problem has attracted a great deal of attention, as it plays a crucial role in many important technological challenges nowadays, e.g., the usage of autonomous vehicles [29]. In the literature, different solutions to the path following control problem have been discussed many times, considering different robots and path definitions. The approach using parametrizations in the path description was harnessed in various papers, for instance, for mobile robots [14, 22, 33], for holonomic fixed-base manipulators [9, 20], and for mobile manipulators [18, 21]. A similar control problem was defined also for more complex robotic objects, such as autonomous underwater vehicles [7] and flying robots [17]. However, the vast majority of the mentioned papers consider only the two-dimensional case, which is not easily scalable to the three-dimensional space.
Apart from the approach using parametrizations, a non-parametric description of a path has also been taken into account in the literature. One such method is, e.g., the level set method, which is a tool for the numerical analysis of surfaces and shapes. The advantage of the level set model is the possibility of performing numerical computations involving curves and surfaces without having to parametrize those objects. Exemplary solutions were provided in [5, 13, 24–26]. In the presented paper, only the parametric description of a path is taken into consideration. Let us define a path as a geometric curve which does not depend on time but is a purely geometrical description of a motion [32]. The curve is usually parametrized with the so-called curvilinear distance, which may be interpreted as the length of a string lying perfectly on the path [23]. This has some important implications. Firstly, geometrical planning of a given path is simpler than planning a time-dependent trajectory. Secondly, no time regimes are imposed on the controlled object due to the lack of time dependency [28]. This feature is particularly vital in applications taking into account constraints on control torques or forces. In the approach presented in the paper, it is assumed that the real robot is controlled with respect to a certain reference object, which is also called
a virtual one [33]. The virtual object motion is modelled with the curve geometry by harnessing the curvilinear parametrization formalism. The definition of this motion allows one to additionally reduce the values of the control generalized forces by a proper choice of the manoeuvre time. It may be arbitrarily extended due to the fact that paths are independent of time. Various methods of curvilinear parametrization may be applied to solving this problem. The most frequently chosen method is the Serret–Frenet parametrization [8, 31]. It has already been considered in many applications, e.g., for mobile platforms [22], mobile manipulators [18], and even, to some extent, for stationary holonomic manipulators [9]. However, it is not the only method which may be used for the considered problem. The Bishop parametrization [1] is an alternative possibility. Its application to robot control has not been extensively analyzed so far, although the relatively parallel transport frames, which are based on the Bishop idea, have been considered in some modifications of the known control algorithms [11]. Furthermore, the robot description with respect to a path depends on the projection method. Each method has its own advantages and disadvantages, but all of them allow deriving a description of the robot with respect to a moving reference frame. In this article, the orthogonal projection method is chosen, as it minimizes the dimensionality of the control problem. It is combined with the different methods of curvilinear parametrization, which are compared to each other. It is noteworthy that the problem is considered in the three-dimensional space, as many solutions have been reduced to the planar case so far. The possibility of their application to the path following task for a holonomic manipulator is verified with a simulation study.

The article is organized as follows. In Section 2, the equations of the considered parametrization methods are defined, and the dependencies between them are presented. Equations describing the holonomic manipulator in general, including its description with respect to the given path, are provided in Section 3. Different curvilinear parametrizations are used in order to achieve the description. In addition, an analysis of the singularity resulting from the orthogonal projection is presented. In Section 4, the control task is defined, and the deployed control algorithms are presented. Results of numerical simulations are shown in Section 5. The considerations are summarized in Section 6.

This paper is an extension of the conference paper [6], and the following aspects are novel. Another method of the Bishop frame initialization has been investigated. Also, the analysis of the local frames' behavior along the curve has been extended; in particular, their orientation changes have been taken into account. It is an essential part of the presented comparative analysis. Moreover, the problem of orthogonal parametrization singularities has been addressed: in the article, a modification of the kinematic controller has been proposed. It allows avoiding the singularities
and lets the robot move along the path. Finally, the application of an adaptive version of the dynamic controller to the designed control cascade has been verified. The achieved results have been compared with the non-adaptive controller, which was mainly considered in the previous work.
2. Curvilinear Parametrizations

The curvilinear parametrizations allow defining a local frame whose motion along a curve fully describes the curve geometry [27]. In the following sections, the equations of the different parametrization methods are presented.

2.1. Serret–Frenet Parametrization

The most common curvilinear parametrization method is the Serret–Frenet parametrization [8, 31]. The local frame consists of three vectors: tangential to the curve, $T$; normal to the curve, $N$; and binormal to the curve, $B$. They create an orthonormal basis in the $\mathbb{R}^3$ space and span the Frenet trihedron, which is presented in Figure 1 [34]. The given vectors are defined with the following equations [34]:
$$T(s) = \frac{\mathrm{d}r(s)}{\mathrm{d}s}, \tag{1a}$$
$$N(s) = \frac{\mathrm{d}T(s)/\mathrm{d}s}{\bigl\|\mathrm{d}T(s)/\mathrm{d}s\bigr\|}, \tag{1b}$$
$$B(s) = T(s) \times N(s), \tag{1c}$$
where $r(s)$ denotes the coordinates of a point on the curve expressed in the inertial reference frame; in other words, it is the analytical expression describing the curve. The evolution of the defined frame along the curve is expressed with the following equation:
$$\begin{bmatrix} \dfrac{\mathrm{d}T^T(s)}{\mathrm{d}s} \\[4pt] \dfrac{\mathrm{d}N^T(s)}{\mathrm{d}s} \\[4pt] \dfrac{\mathrm{d}B^T(s)}{\mathrm{d}s} \end{bmatrix} = \begin{bmatrix} 0 & \kappa(s) & 0 \\ -\kappa(s) & 0 & \tau(s) \\ 0 & -\tau(s) & 0 \end{bmatrix} \begin{bmatrix} T^T(s) \\ N^T(s) \\ B^T(s) \end{bmatrix}, \tag{2}$$
where the curvature $\kappa(s)$ defines the swerve of the curve from a straight line, whereas the torsion $\tau(s)$ expresses the swerve of the curve from a plane.

Figure 1. Frenet trihedron
The geometrical invariants of a curve may be formulated with the equations [20]
$$\kappa(s) = \left\|\frac{\mathrm{d}T(s)}{\mathrm{d}s}\right\|, \tag{3a}$$
$$\tau(s) = \frac{1}{\kappa^2(s)} \left\langle \frac{\mathrm{d}r(s)}{\mathrm{d}s},\ \frac{\mathrm{d}^2 r(s)}{\mathrm{d}s^2} \times \frac{\mathrm{d}^3 r(s)}{\mathrm{d}s^3} \right\rangle, \tag{3b}$$
where $\langle\cdot,\cdot\rangle$ denotes a scalar product of vectors.
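The quantities (1)-(3) can be evaluated numerically for any sufficiently smooth, arc-length-parametrized curve. The sketch below is an illustration under assumptions: central differences with an arbitrary step h stand in for the exact derivatives, and the unit-speed helix used as a test curve is not taken from the paper.

```python
import numpy as np

def frenet_frame(r, s, h=1e-4):
    """Frenet frame and invariants of an arc-length-parametrized curve r(s)."""
    dr   = (r(s + h) - r(s - h)) / (2 * h)                       # dr/ds
    ddr  = (r(s + h) - 2 * r(s) + r(s - h)) / h**2               # d^2 r/ds^2
    dddr = (r(s + 2*h) - 2*r(s + h) + 2*r(s - h) - r(s - 2*h)) / (2 * h**3)
    T = dr / np.linalg.norm(dr)                                  # (1a)
    kappa = np.linalg.norm(ddr)       # equals ||dT/ds|| for unit-speed curves, cf. (3a)
    N = ddr / kappa                                              # (1b)
    B = np.cross(T, N)                                           # (1c)
    tau = np.dot(dr, np.cross(ddr, dddr)) / kappa**2             # (3b)
    return T, N, B, kappa, tau

# Example (assumed test curve): unit-speed circular helix, kappa = a/c^2, tau = b/c^2.
a, b = 1.0, 0.5
c = np.hypot(a, b)
helix = lambda s: np.array([a*np.cos(s/c), a*np.sin(s/c), b*s/c])
print(frenet_frame(helix, 1.0))
```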
2.2. Bishop Parametrization

A different parametrization method was proposed by Bishop in [1]. In this approach, the local reference frame is also created with three orthogonal vectors. Again, one of them is the tangential vector $T$. However, the other two vectors, $N_1$ and $N_2$, do not have as intuitive a geometrical interpretation as the vectors creating the Serret–Frenet frame described in Section 2.1, although they are also orthogonal to the tangential vector $T$. They create relatively parallel vector fields. The Bishop frame defined in the initial state is transported along the curve, preserving its uniqueness [16]. The frame evolution along the curve may be expressed with the equations [1]
$$\begin{bmatrix} \dfrac{\mathrm{d}T^T(s)}{\mathrm{d}s} \\[4pt] \dfrac{\mathrm{d}N_1^T(s)}{\mathrm{d}s} \\[4pt] \dfrac{\mathrm{d}N_2^T(s)}{\mathrm{d}s} \end{bmatrix} = \begin{bmatrix} 0 & k_1(s) & k_2(s) \\ -k_1(s) & 0 & 0 \\ -k_2(s) & 0 & 0 \end{bmatrix} \begin{bmatrix} T^T(s) \\ N_1^T(s) \\ N_2^T(s) \end{bmatrix}, \tag{4}$$
where $k_1(s)$ and $k_2(s)$ are certain functions, analogous to the curvature and the torsion of a curve, which fully determine the curve geometry. Their values at the following points of the curve may be calculated with the usage of the relations [16]
$$k_1(s) = \left\langle \frac{\mathrm{d}T(s)}{\mathrm{d}s},\ N_1(s) \right\rangle, \tag{5a}$$
$$k_2(s) = \left\langle \frac{\mathrm{d}T(s)}{\mathrm{d}s},\ N_2(s) \right\rangle. \tag{5b}$$
2.3. Relationships Between Parametrizations

Due to some differences in the definitions of the base vectors of the local frames associated with a curve, the presented parametrization methods have different properties. It is crucial that the Serret–Frenet frame is undefined at all points where the curvature is equal to zero. In particular, this concerns all straight lines, which may be a part of many common path following tasks. It results from the singularity which appears in the normal vector definition (1b). The key advantage of the Bishop frame is the fact that there are no constraints resulting from its definition. However, the evolution of this frame is described with the more complex operation of relatively parallel transport of vectors along a curve [10]. Thus, the geometrical interpretation of the Bishop base vectors is less intuitive. It is also worth noticing that the Serret–Frenet frame requires that the parametrized curve be of at least $\mathcal{C}^3$ class. This results from the equation defining the curve torsion (3b). The Bishop equations are less restrictive, as they allow parametrizing curves of $\mathcal{C}^2$ class. Despite all the differences, there is a strong connection between the geometrical invariants of the Bishop parametrization, $k_1(s)$, $k_2(s)$, and the geometrical parameters, i.e., the curvature $\kappa(s)$ and the torsion $\tau(s)$. For a curve described with a pair $(k_1, k_2)$, polar coordinates may be defined; they are equal to $(\kappa, \int \tau(s)\,\mathrm{d}s)$ [1]. Hence, the direct relation which links the invariants of both parametrizations is defined with the following equations [16]:
$$k_1(s) = \kappa(s) \cos\theta(s), \tag{6a}$$
$$k_2(s) = \kappa(s) \sin\theta(s), \tag{6b}$$
where $\theta(s) = \int \tau(s)\,\mathrm{d}s$. Due to the integration constant, the Bishop frame may be defined in many ways in the initial state. However, once defined, the local frame always evolves along the curve identically, in the precisely described manner [30]. Furthermore, a transformation connecting the Serret–Frenet frame and the Bishop frame is defined. The tangential vector $T$ is the common part of both methods. The other vectors may be determined based on the following relations [1]:
$$N(s) = \cos\theta(s)\,N_1(s) + \sin\theta(s)\,N_2(s), \tag{7a}$$
$$B(s) = -\sin\theta(s)\,N_1(s) + \cos\theta(s)\,N_2(s). \tag{7b}$$
The aforementioned equations allow one to combine both parametrization methods in order to benefit from their advantages simultaneously. An example of such a solution is the beta frame presented in [4].
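Relations (6)-(7) suggest a simple way of obtaining a Bishop frame once a Serret–Frenet frame and the torsion are available: accumulate θ(s) = ∫τ ds and rotate {N, B}. The sketch below is only illustrative; the frame provider, the trapezoidal integration of τ, and the zero initial θ are assumptions, not the authors' procedure.

```python
import numpy as np

def bishop_from_frenet(r, s_grid, frenet_frame):
    """Bishop frame {T, N1, N2} and invariants (k1, k2) along a sampled curve."""
    theta, frames = 0.0, []
    prev_s, prev_tau = None, None
    for s in s_grid:
        T, N, B, kappa, tau = frenet_frame(r, s)
        if prev_s is not None:                        # theta(s) = integral of tau ds
            theta += 0.5 * (tau + prev_tau) * (s - prev_s)
        N1 = np.cos(theta) * N - np.sin(theta) * B    # inverse of (7a)-(7b)
        N2 = np.sin(theta) * N + np.cos(theta) * B
        k1, k2 = kappa * np.cos(theta), kappa * np.sin(theta)   # (6a)-(6b)
        frames.append((T, N1, N2, k1, k2))
        prev_s, prev_tau = s, tau
    return frames
```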
3. Mathematical Model of a Manipulator

3.1. Holonomic Manipulator
The considered controlled object is a non-redundant stationary holonomic manipulator. The forward kinematics task, which defines the end-effector position $p$ in the base frame (identified here with the inertial frame), is defined as
$$p = k(q). \tag{8}$$
Hence, the velocities in the Cartesian space depend on the velocities in the joint space according to the equation with the Jacobi matrix $J$ [34]:
$$\dot p = \frac{\partial k(q)}{\partial q}\,\dot q = J(q)\,\dot q. \tag{9}$$
The manipulator dynamics is derived with the usage of the Euler–Lagrange formalism and the stationary-action principle. Without loss of generality, we may omit the dissipative effects in the dynamics equations, as they are not crucial for the considerations presented in the paper. Moreover, direct-drive actuators are considered. Firstly, let us assume that the dynamics model is fully known. The structure of the manipulator dynamics is described with the equation [34]
$$M(q)\,\ddot q + C(q, \dot q)\,\dot q + D(q) = u, \tag{10}$$
Journal of Automation, Mobile Robotics and Intelligent Systems
where: ‐ 𝒒 ∈ ℝ𝑛 is the manipulator con iguration which consists of the joint positions;
- $D \in \mathbb{R}^n$ is the vector of the generalized forces resulting from the gravity effects; and
- $u \in \mathbb{R}^n$ is the vector of generalized control forces.

Secondly, let us consider the case when the dynamics is defined with parametric uncertainty; in other words, not all parameters standing before the functions occurring in the dynamical equations are known. According to the theory of adaptive systems, it has to be assumed that the unknown parameters are constant and that the model depends on them linearly [12]. If there are some unknown parameters, the model (10) may be expressed with the usage of the regression matrix $Y$:
$$\begin{aligned} u &= M(q, a)\,\ddot q + C(q, \dot q, a)\,\dot q + D(q, a) \\ &= [M_0(q) + M_a(q)]\,\ddot q + [C_0(q, \dot q) + C_a(q, \dot q)]\,\dot q + D_0(q) + D_a(q) \\ &= M_0(q)\,\ddot q + C_0(q, \dot q)\,\dot q + D_0(q) + Y(\ddot q, \dot q, \dot q, q)\,a, \end{aligned} \tag{11}$$
𝑵
𝑩 .
(12)
For the Bishop frame equation (12) takes the form 𝑺(𝑠) = 𝑻
𝑵1
𝑵2 .
(13)
The position of the end‐effector may be then de ined in the local frame associated with the given curve 𝒅 = 𝑑1
𝑑2
𝑑3
𝑇
= 𝑺𝑇 (𝒑 − 𝒓),
(14)
where 𝒓 is a point on a curve expressed in the inertial frame and 𝒑 is the end‐effector position in the inertial
2023
𝑑3 𝑑2 𝑃
‐ 𝑴 ∈ ℝ𝑛×𝑛 is the inertia matrix; ‐ 𝑪 ∈ ℝ𝑛×𝑛 is the matrix of Coriolis and centrifugal forces;
N∘ 3
𝑴1
𝑴2
𝑻
𝒅
𝑟3 𝒑
𝑌0 𝒓
𝑟2
𝑟1
𝑋0 Figure 2. Visualization of the orthogonal projection of a robot guidance point on a curve frame, which will be referred to as the robot guidance point. In Figure 2, a schematic view of the projection of a robot guidance point on a curve is visualized. This scheme is independent of the parametrization method as the normal versors are denoted as 𝑴1 and 𝑴2 . The pair {𝑴1 , 𝑴2 } should be understood as {𝑵, 𝑩} for the Serret–Frenet parametrization and {𝑵1 , 𝑵2 } for the Bishop parametrization. The dynamics of the end‐effector position with respect to the moving local frame may be derived by differentiating equation (14) ̇ + 𝑺̇ 𝑇 (𝒑 − 𝒓). 𝒅̇ = 𝑺𝑇 (𝒑̇ − 𝒓)
(15)
Taking into account relations describing evolution of the local frame, i.e., equation (2) for the Serret–Frenet frame or equation (4) for the Bishop frame, the follow‐ ing form may be derived ̇ − 𝑠𝑾𝒅, 𝒅̇ = 𝑺𝑇 (𝒑̇ − 𝒓) ̇
(16)
where 𝑾 is a skew‐symmetric matrix. Its elements are proper parameters describing the curve geometry. Hence, its de inition results directly from equation (2) or equation (4). The aspect of the utmost importance, which signif‐ icantly in luences the form of the constraints enforcing motion along the given path, is the method of pro‐ jection of a robot guidance point on a curve. There are two projection methods which should be dis‐ tinguished: the orthogonal projection and the non‐ orthogonal projection. The latter one does not impose any additional constraints and the reference object associated with the local frame may move freely along the curve. However, position errors in every dimen‐ sion of the local frame need to be followed. Such an approach was considered in some papers, i.e., in [2, 33]. In turn, for the orthogonal projection it is assumed 49
that the manipulator end-effector is always located at the minimal distance from the path. Hence, the position of the robot in the tangent vector direction is always equal to zero. The vectors $T$ and $(p - r)$ must be orthogonal, which leads to the relation
$$\langle T,\ p - r\rangle = 0. \tag{17}$$
VOLUME 17,
𝝃̇ =
Serret–Frenet parametrization Considering the Serret–Frenet de inition of the local frame (2), equation (18) is transformed to the form 𝑠̇ = −
⟨𝑻, 𝒑̇ − 𝒓⟩̇ 𝑻𝑇 =− 𝒑̇ + 𝜅 ⟨𝑵, 𝒑 − 𝒓⟩ 𝜅⟨𝑵, 𝒑 − 𝒓⟩ ⟨𝑻, 𝒓⟩̇ + = 𝑷1 𝒑̇ + 𝑅1 . 𝜅⟨𝑵, 𝒑 − 𝒓⟩
+ 𝑠𝜏⟨𝑩, ̇ 𝒑 − 𝒓⟩,
(19)
(20a) (20b)
𝑑̇ 3 = ⟨𝑩, 𝒑̇ − 𝒓⟩̇ − 𝑠𝜏⟨𝑵, ̇ 𝒑 − 𝒓⟩.
(20c)
Taking into account equation (19), equation (20) has the form 𝑑̇1 = 0, 𝑑̇ 2 = 𝑵 − − 𝑵−
(21a) 𝜏 ⟨𝑩, 𝒑 − 𝒓⟩ 𝑻 𝜅 ⟨𝑵, 𝒑 − 𝒓⟩
𝑇
𝒑̇ +
𝜏 ⟨𝑩, 𝒑 − 𝒓⟩ 𝑻, 𝒓̇ = 𝑷2 𝒑̇ + 𝑅2 , 𝜅 ⟨𝑵, 𝒑 − 𝒓⟩
𝜏 𝜏 𝑇 𝑻 𝒑̇ − 𝑩 + 𝑻, 𝒓̇ = 𝜅 𝜅 = 𝑷3 𝒑̇ + 𝑅3 .
(21b)
𝑑̇ 3 = 𝑩 +
50
𝑑2
𝑑3
𝑇
.
𝑠̇ = −
= 𝑷𝒑̇ + 𝑹.
⟨𝑻, 𝒑̇ − 𝒓⟩̇ , 𝑘1 ⟨𝑵1 , 𝒑 − 𝒓⟩ + 𝑘2 ⟨𝑵2 , 𝒑 − 𝒓⟩
(23)
(24)
which may be extended to the form 𝑠̇ =
−𝑻𝑇 𝒑̇ + 𝑘1 ⟨𝑵1 , 𝒑 − 𝒓⟩ + 𝑘2 ⟨𝑵2 , 𝒑 − 𝒓⟩ ⟨𝑻, 𝒓⟩̇ = + 𝑘1 ⟨𝑵1 , 𝒑 − 𝒓⟩ + 𝑘2 ⟨𝑵2 , 𝒑 − 𝒓⟩ = 𝑷1 𝒑̇ + 𝑅1 .
(25)
Furthermore, equations de ining the end‐effector velocities in the Bishop frame are expressed with the following relations:
(21c)
(22)
(26a)
+ 𝑠𝑘 ̇ 2 ⟨𝑵2 , 𝒑 − 𝒓⟩, 𝑑̇ 2 = ⟨𝑵1 , 𝒑̇ − 𝒓⟩̇ − 𝑠𝑘 ̇ 1 ⟨𝑻, 𝒑 − 𝒓⟩, ̇ 𝑑3 = ⟨𝑵2 , 𝒑̇ − 𝒓⟩̇ − 𝑠𝑘 ̇ 2 ⟨𝑻, 𝒑 − 𝒓⟩.
(26b) (26c)
Considering equation (25) results in 𝑑̇1 = 0, 𝑑̇ 2 = 𝑵1𝑇 𝒑̇ − ⟨𝑵1 , 𝒓⟩̇ = 𝑷2 𝒑̇ + 𝑅2 ,
(27b)
𝑑̇ 3 = 𝑵𝑇2 𝒑̇ − ⟨𝑵2 , 𝒓⟩̇ = 𝑷3 𝒑̇ + 𝑅3 .
(27c)
(27a)
Similarly, as for the Serret–Frenet parametrization approach, the end‐effector tip is located in the nor‐ mal plane, spanned by the vectors {𝑵1 , 𝑵2 } for the Bishop frame, if the constraint (25) is satis ied. It may be concluded directly from equation (27a). It is noteworthy that also the state in the Bishop frame may be described with equation (22). Hence, equa‐ tions (25), (27b), and (27c) express the evolution of the robot in the Bishop frame 𝝃̇ =
Thus, it may be concluded from equation (21a) that if the constraint (19) is preserved, the end‐effector tip is always located in the normal plane spanned by the Frenet trihedron. As a result, the state of the manipulator with respect to the Serret–Frenet frame is de ined as 𝝃= 𝑠
𝑅1 𝑅2 𝑅3
𝑑̇1 = ⟨𝑻, 𝒑̇ − 𝒓⟩̇ + 𝑠𝑘 ̇ 1 ⟨𝑵1 , 𝒑 − 𝒓⟩ +
Furthermore, velocities of a robot guidance point in the Serret–Frenet frame are equal to 𝑑̇1 = ⟨𝑻, 𝒑̇ − 𝒓⟩̇ + 𝑠𝜅⟨𝑵, ̇ 𝒑 − 𝒓⟩, 𝑑̇ 2 = ⟨𝑵, 𝒑̇ − 𝒓⟩̇ − 𝑠𝜅⟨𝑻, ̇ 𝒑 − 𝒓⟩ +
𝑷1 𝑷2 𝒑̇ + 𝑷3
Bishop parametrization A similar procedure leads to the relations de ining the robot guidance point veloc‐ ities with respect to the Bishop frame. Taking into account equation (4), equation (18) may be rewritten as
d𝑠
Preservation of this condition allows one to decrease the dimensionality of the problem. However, as a con‐ sequence the virtual object associated with the local frame cannot move freely along the given path, but according to a strictly de ined function. Moreover, equation (18) is a source of potential singularities due to the form of the denominator. It also causes the object description to be valid only locally.
2023
Evolution of the state (22) is de ined by equa‐ tions (19), (21b) and (21c). It takes the following form
(17)
Based on equation (17) the expression describing the curvilinear velocity of the local frame may be derived [20] ⟨𝑻, 𝒑̇ − 𝒓⟩̇ . (18) 𝑠̇ = − d𝑻 ,𝒑 − 𝒓
N∘ 3
𝑷1 𝑷2 𝒑̇ + 𝑷3
𝑅1 𝑅2 𝑅3
= 𝑷𝒑̇ + 𝑹.
(28)
Application of the Bishop parametrization undoubtedly allows achieving the same structure of the equations, whereas expressions de ining dynamics of values of 𝑑2 – (27b) – and 𝑑3 – (27c) – are much simpler. However, it needs to be emphasized that the denominator form in equation (24) is more complex than in equation (19). Nonetheless, for both cases it is impossible to move precisely
Journal of Automation, Mobile Robotics and Intelligent Systems
on the path as then 𝒑 − 𝒓 = 0 holds. As a result, the orthogonal parametrization in the presented form allows only asymptotic path following—the given curve cannot be reached. Furthermore, the parametrized curve cannot be straight, because for straight lines 𝜅(𝑠) = 𝑘1 (𝑠) = 𝑘2 (𝑠) = 0. For both curvilinear parametrization methods those cases lead to a singularity resulting from the orthogonal projection assumptions. 3.3. Orthogonal Singularity Avoidance
$$\langle T, \dot r\rangle = \left\langle T,\ \frac{\mathrm{d}r}{\mathrm{d}s}\,\dot s\right\rangle = \langle T, T\rangle\,\dot s = \dot s. \tag{29}$$
Introducing equation (29) into (18) leads to the equation
$$\dot s = -\frac{\langle T,\ \dot p\rangle}{\left\langle \dfrac{\mathrm{d}T}{\mathrm{d}s},\ p - r \right\rangle - 1}. \tag{30}$$
It needs to be emphasized that this reformulation of the equation allows us to reach the path using the orthogonal parametrization, as now it is only required that $\bigl\langle \frac{\mathrm{d}T}{\mathrm{d}s},\ p - r \bigr\rangle \ne 1$. In particular, this assumption is met when $\bigl\langle \frac{\mathrm{d}T}{\mathrm{d}s},\ p - r \bigr\rangle = 0$. It means that the controlled
robot may move along the desired path. However, the problem is still local due to the form of the denomina‐ tor of expression (30). It is noteworthy, though, that it might be assured that d𝑻 , 𝒑 − 𝒓 < 1, d𝑠
N∘ 3
2023
frame, given by equation (14), leads to the following equations 𝑠̇ = −
̇ ⟨𝑻, 𝒑⟩ 𝑻𝑇 =− 𝒑̇ = 𝑷1 𝒑,̇ (32) 𝜅 ⟨𝑵, 𝒑 − 𝒓⟩ − 1 𝜅𝑑2 − 1
𝑑̇ 2 = ⟨𝑵, 𝒑̇ − 𝒓⟩̇ − 𝑠𝜅⟨𝑻, ̇ 𝒑 − 𝒓⟩ + 𝑠𝜏⟨𝑩, ̇ 𝒑 − 𝒓⟩ = ̇ ⟨𝑻, 𝒑⟩ = ⟨𝑵, 𝒑̇ − 𝒓⟩̇ − 𝑠𝜅𝑑 ̇ 1− 𝜏𝑑 = 𝜅𝑑2 − 1 3 =0
As it has been shown in the previous subsection, the singularity of the orthogonal projection of a robot on the given path is located directly on the path. Due to increasing uncertainty of the problem, the previous solution may be considered only for approaching the path. Thus, a reformulation of equation (18) may be necessary. Based on the de inition of the tangential vector of the local frame associated with the curve, given by equation (1a), the following relation may be observed ⟨𝑻, 𝒓⟩̇ = 𝑻,
VOLUME 17,
= 𝑵−𝜏
𝑻𝑑3 = 𝑵−𝜏 𝜅𝑑2 − 1
Serret–Frenet parametrization Let us consider equa‐ tion (30) in the de inition of the time‐derivatives of the vector 𝝃 elements. Taking into account de inition of the robot position with respect to the Serret–Frenet local
𝑇
𝒑̇ − ⟨𝑵, 𝑻⟩ 𝑠̇ = 𝑷2 𝒑,̇ (33) =0
𝑑̇ 3 = ⟨𝑩, 𝒑̇ − 𝒓⟩̇ − 𝑠𝜏⟨𝑵, ̇ 𝒑 − 𝒓⟩ = ̇ ⟨𝑻, 𝒑⟩ = ⟨𝑩, 𝒑̇ − 𝒓⟩̇ + 𝜏𝑑 = 𝜅𝑑2 − 1 2 = 𝑩+𝜏 = 𝑩+𝜏
𝑻𝑑2 , 𝒑̇ − ⟨𝑩, 𝒓⟩̇ = 𝜅𝑑2 − 1
𝑻𝑑2 𝜅𝑑2 − 1
𝑇
𝒑̇ − ⟨𝑩, 𝑻⟩ 𝑠̇ = 𝑷3 𝒑.̇ (34) =0
Equations (32), (33), and (34) correspond to equa‐ tions (19), (21b), and (21c), respectively. It may be observed that after the translation of the orthogonal singularity the aforementioned equations can also be written in the concise form 𝝃̇ =
𝑷1 𝑷2 𝒑̇ + 𝑷3
0 0 0
= 𝑷𝒑̇ + 𝑹,
(35)
where the vector 𝑹 is equal to zero. It is worth noticing that for the Serret–Frenet parametrization the con‐ straint (31) takes the form 𝑑2 <
(31)
i.e., the manipulator is close enough to the path during the whole motion, so the orthogonal singularity is always avoided. Although equation (31) is quite conservative and may be violated by some large oscillations, especially in the transient state, it significantly broadens the usability of the orthogonal projection in comparison with the previous approach presented in Section 3.2. Hence, the equations describing the robot evolution in the local frame, defined by equations (23) and (28) for the Serret–Frenet parametrization and the Bishop parametrization, respectively, may be properly rewritten.
𝑻𝑑3 , 𝒑̇ − ⟨𝑵, 𝒓⟩̇ = 𝜅𝑑2 − 1
1 , 𝜅
(36)
which indicates that the robot should always be located close enough to the curve in the normal vec‐ tor direction. In particular, it may be conservatively assumed that the robot guidance point is close enough to the path along the whole curve by introducing the maximal value of the curvature in equation (36). As a result, the singularity avoidance is always guar‐ anteed. Bishop parametrization A similar approach may be applied for the Bishop parametrization. The analo‐ gous procedure leads us to the reformulation of equa‐ tion (28) 𝑻𝑇 𝒑̇ = 𝑷1 𝒑,̇ 𝑘1 𝑑2 + 𝑘2 𝑑3 − 1 𝑑̇ 2 = 𝑵1𝑇 𝒑̇ = 𝑷2 𝒑,̇ 𝑑̇ 3 = 𝑵𝑇2 𝒑̇ = 𝑷3 𝒑,̇ 𝑠̇ = −
(37a) (37b) (37c) 51
Journal of Automation, Mobile Robotics and Intelligent Systems
which in the concise form is expressed as 𝝃̇ =
𝑷1 𝑷2 𝒑̇ + 𝑷3
0 0 0
= 𝑷𝒑̇ + 𝑹.
(38)
It may be observed that equations (37) correspond to equations (25), (27b), and (27c). Furthermore, it is worth noticing that again the interpretation of the constraint (31) for the Bishop parametrization, 𝑘1 𝑑2 + 𝑘2 𝑑3 < 1,
(39)
is much more complicated, as it is a linear combination of the robot positions in both normal vectors' directions. However, the new form of the denominator in expression (30) allows not only moving precisely along the given path, but also following zero-curvature paths. Thus, using this approach one may benefit from all the Bishop parametrization advantages. It is noteworthy that for both parametrization methods the vector $R$ is reduced to the zero vector. However, it is involved in equations (35) and (38) in order to use the same general form of the kinematic controller as earlier. Moreover, it may be observed that the matrix $P$ tends to the transposed matrix $S$ while the robot guidance point tends to the local frame associated with the curve. This results from the fact that the positions in both normal vectors' directions tend to zero. Hence, it may be formulated that
𝑷 −−−−−−→ 𝑺𝑇 .
N∘ 3
2023
means that the path is de ined only geometrically and the desired motion of a local frame associated with the path may be chosen arbitrarily. Although the evolution of the local frame is time‐dependent, the choice of its velocity pro ile is the secondary sub‐problem of the path following task. In general, the path can be still understood as a pure geometrical object. The equations describing robot motion following the path have a structure which is similar to non‐ holonomic constraints. Thus, the considered system has a cascade structure and the control algorithms may be designed based on the backstepping integrator method [12]. It means that the control law has to consist of two controllers working in parallel: (i) kinematic controller 𝒒̇ 𝑟𝑒𝑓 —represents a vector of embedded control inputs, which ensure realization of the task for the geometric path tracking problem if the dynamics were not present. Such a controller generates a “velocity pro ile” which can be executed in practice to follow the desired curve in ℝ3 . (ii) dynamic controller 𝒖—as a consequence of the cas‐ caded structure of the robot model, the system’s velocities cannot be commanded directly, as it is assumed in the design of kinematic controller, and instead they must be realized as the output of the dynamics driven by 𝒖. The schematic view of the control system cascade is presented in Figure 3. 4.1. Control Law – Non‐adaptive Case
(40)
Based on relation (40), it may be concluded that the matrix 𝑷 tends to be a rotation matrix transforming the end‐effector velocities expressed in the inertial frame to the reference frame associated with the path.
4. Control Problem Statement Let us consider a control task which is following the path in the three‐dimensional space by the station‐ ary holonomic manipulator. The following assumptions have to be ful illed: 1) The desired path is smooth enough, i.e., of 𝒞 2 class (for the Bishop parametrization) or of 𝒞 3 class (for the Serret–Frenet parametrization) and is located in the workspace of the manipulator, 2) The mathematical model of the manipulator following the path is de ined by two groups of equations, namely by the dynamics (10) and the kinematics (35) or (38) (i.e., description of the robot relative to the path; the form depends on the chosen parametrization method), 3) The kinematics is known precisely and parameters of the manipulator dynamics may be fully (non‐ adaptive case) or partially known (adaptive case). Only position control is taken into account. According to [11], the path following task is de ined as a problem of enforcing a robot to converge to and follow the spatial curve while a desired veloc‐ ity pro ile along the path is asymptotically tracked. It 52
VOLUME 17,
The first stage of the cascade is the kinematic controller. It should generate velocity profiles 𝒒̇_ref in order to follow the given curve. Taking into account the structure of the equations describing the controlled object motion along the curve, (35) or (38), and also relation (9), the following kinematic controller is proposed based on [20]

𝒒̇_ref = 𝑱⁻¹𝑷⁻¹( 𝝃̇_d − 𝑲_k 𝒆_𝝃 − 𝑹 ),    (41)
where 𝒆_𝝃 = 𝝃 − 𝝃_d is the path following error, 𝝃_d is the vector of the desired state of the real object with respect to the path, and 𝑲_k is the positive-definite matrix of control coefficients. In equation (41), it is assumed that only non-singular configurations of a non-redundant manipulator are taken into account. Thus, the inverse of the Jacobi matrix 𝑱 is considered.
The second stage of the cascade is the dynamic controller, which guarantees following the velocity profile 𝒒̇_ref by generating proper control generalized forces 𝒖. The control algorithm is designed based on the fully known model of the manipulator dynamics (10). The following dynamic controller is proposed

𝒖 = 𝑴(𝒒)𝒒̈_ref + 𝑪(𝒒, 𝒒̇)𝒒̇_ref + 𝑫(𝒒) − 𝑲_d 𝒆̇_𝒒,    (42)

where 𝒆̇_𝒒 = 𝒒̇ − 𝒒̇_ref is the vector of the velocity profile following errors, and 𝑲_d is the positive-definite coefficient matrix. Such a designed system guarantees asymptotic convergence of the defined errors (𝒆_𝝃, 𝒆̇_𝒒) to zero.
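To make the cascade concrete, the following minimal sketch evaluates one step of the kinematic law (41) and the dynamic law (42) in Python. It is only an illustration written for this text: the Jacobian 𝑱, the path-description matrices 𝑷 and 𝑹, and the dynamic terms 𝑴, 𝑪, 𝑫 are assumed to be supplied by the user from the manipulator model and the chosen parametrization, and all function names are hypothetical.

```python
import numpy as np

def kinematic_controller(J, P, R, xi, xi_d, xi_d_dot, K_k):
    """Velocity profile from equation (41): q_ref_dot = J^-1 P^-1 (xi_d_dot - K_k e_xi - R)."""
    e_xi = xi - xi_d                          # path following error
    v = xi_d_dot - K_k @ e_xi - R             # desired rate of the path coordinates
    return np.linalg.solve(J, np.linalg.solve(P, v))

def dynamic_controller(M, C, D, q_dot, q_ref_dot, q_ref_ddot, K_d):
    """Generalized forces from equation (42)."""
    e_q_dot = q_dot - q_ref_dot               # velocity profile following error
    return M @ q_ref_ddot + C @ q_ref_dot + D - K_d @ e_q_dot
```

In a simulation loop the reference acceleration 𝒒̈_ref would typically be obtained by differentiating the kinematic controller output between consecutive steps.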
Figure 3. Control system structure (cascade: desired path generator supplying 𝒓(𝑠) and the velocity profile 𝑠̇_d along the path, kinematic controller generating 𝒒̇_ref, dynamic controller generating 𝒖, control object returning 𝒒, 𝒒̇)

Proof of the non-adaptive case. Based on the backstepping integrator approach [12], the stability of the kinematic and the dynamic control laws may be considered separately with the usage of the Lyapunov theory. Thus, the kinematic level of the cascade is taken into account first. The system (35) or (38), depending on the chosen parametrization method, in the closed feedback loop with the control law (41) may be expressed with the equation

𝒆̇_𝝃 + 𝑲_k 𝒆_𝝃 = 0.    (43)
Let us define the Lyapunov-like function

𝑉1(𝒆_𝝃) = (1/2) 𝒆_𝝃ᵀ 𝒆_𝝃.    (44)
Thus, the time derivative of 𝑉1 is equal to

𝑉̇1(𝒆_𝝃) = 𝒆_𝝃ᵀ 𝒆̇_𝝃 = 𝒆_𝝃ᵀ(−𝑲_k 𝒆_𝝃) = −𝒆_𝝃ᵀ 𝑲_k 𝒆_𝝃 = −𝑊1(𝒆_𝝃) ≤ 0.    (45)
It is clear that the path following errors 𝒆_𝝃 converge asymptotically to zero based on the LaSalle invariance principle [3]. It confirms that the subsystem on the kinematic level of the cascade is asymptotically stable with zero equilibrium point for the positive-definite matrix 𝑲_k.
Secondly, the dynamic level needs to be considered. It is noteworthy that the results from the previous cascade stage have to be taken into account. Thus, the functional form of the dynamic level errors is strictly defined and depends on the reference velocity profiles given by equation (41). Let us define the Lyapunov-like function for the second level of the control system cascade. It depends on both the path following errors 𝒆_𝝃 and the velocity profile following errors 𝒆̇_𝒒

𝑉2(𝒆_𝝃, 𝒆̇_𝒒) = 𝑉1(𝒆_𝝃) + (1/2) 𝒆̇_𝒒ᵀ 𝑴(𝒒) 𝒆̇_𝒒.    (46)
The function 𝑉2 is non-negative due to the positive definiteness of the inertia matrix 𝑴 and the definition of the function 𝑉1 given by equation (44). The time derivative of 𝑉2 is equal to

𝑉̇2(𝒆_𝝃, 𝒆̇_𝒒) = 𝑉̇1(𝒆_𝝃) + 𝒆̇_𝒒ᵀ 𝑴 𝒆̈_𝒒 + (1/2) 𝒆̇_𝒒ᵀ 𝑴̇ 𝒆̇_𝒒.    (47)
Considering the system (10) in the closed‐feedback loop with the control law (42) 𝑴𝒆̈ 𝒒 + 𝑪𝒆̇ 𝒒 + 𝑲𝑑 𝒆̇ 𝒒 = 0,
(48)
and the skew‐symmetry between the inertia matrix 𝑴 and the Coriolis and centrifugal forces matrix 𝑪 [34] 𝑴̇ = 𝑪 + 𝑪𝑇 ,
(49)
equation (47) may be rewritten as

𝑉̇2(𝒆_𝝃, 𝒆̇_𝒒) = 𝑉̇1(𝒆_𝝃) − 𝒆̇_𝒒ᵀ 𝑲_d 𝒆̇_𝒒 = −𝑊2(𝒆_𝝃, 𝒆̇_𝒒) ≤ 0.    (50)
Once again, the LaSalle invariance principle [3] may be harnessed. Based on that theorem it is proven that the errors (𝒆_𝝃, 𝒆̇_𝒒) asymptotically converge to zero for the positive-definite matrix 𝑲_d. Thus, the system (48) has an asymptotically stable zero equilibrium point. It is noteworthy that the stability analysis for the dynamic level includes the trajectories of the errors achieved on the previous level. Hence, the proposed Lyapunov-like functions confirm that the designed control law guarantees the correct path following and the control system cascade is stable on all levels.

4.2. Control Law – Adaptive Case
In the previous subsection, full knowledge of the manipulator's dynamics has been assumed. Now the case with partial knowledge of the dynamics will be considered. For the partially known model of the manipulator's dynamics (11), the following dynamic adaptive control law is proposed

𝒖 = 𝑴0(𝒒)𝒒̈_ref + 𝑪0(𝒒, 𝒒̇)𝒒̇_ref + 𝑫0(𝒒) + 𝒀(𝒒̈_ref, 𝒒̇_ref, 𝒒̇, 𝒒)𝒂̂(𝑡) − 𝑲_d 𝒆̇_𝒒,    (51)

where 𝒂̂(𝑡) is the current estimate of the vector of unknown parameters, generated by the adaptation law defined below

𝒂̂̇(𝑡) = 𝒂̃̇(𝑡) = −𝜞 𝒀ᵀ(𝒒̈_ref, 𝒒̇_ref, 𝒒̇, 𝒒) 𝒆̇_𝒒.    (52)

The symbol 𝒂̃(𝑡) = 𝒂̂(𝑡) − 𝒂 denotes the difference between the estimated values and the unknown constant real values of the parameters, and 𝜞 = diag{𝛾} is a positive-definite matrix of adaptation gains.
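The adaptive part of the controller reduces, in implementation terms, to the additional regressor term in (51) and to integrating the estimate according to (52). The sketch below shows one such update step with explicit Euler integration; the regressor matrix 𝒀 and the gain matrix 𝜞 are assumed to be provided by the user, and the names are illustrative rather than taken from the paper.

```python
import numpy as np

def adaptive_dynamic_controller(M0, C0, D0, Y, a_hat, q_dot, q_ref_dot, q_ref_ddot, K_d):
    """Control law (51): known part of the model plus the regressor times the current estimate."""
    e_q_dot = q_dot - q_ref_dot
    return M0 @ q_ref_ddot + C0 @ q_ref_dot + D0 + Y @ a_hat - K_d @ e_q_dot

def adaptation_step(a_hat, Y, q_dot, q_ref_dot, Gamma, dt):
    """One explicit Euler step of the adaptation law (52): a_hat_dot = -Gamma Y^T e_q_dot."""
    e_q_dot = q_dot - q_ref_dot
    return a_hat - dt * (Gamma @ Y.T @ e_q_dot)
```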
Proof of the adaptive case. The adaptive case applies only to the equations of the dynamics, i.e., the second stage of the cascade. Therefore, the form of the kinematic controller and the proof of convergence for the first stage of the cascade remain unchanged. The equations of the system (11) with the closed loop of the control law (51) and the adaptation law (52) can be calculated as follows (for the transparency of the notation the matrix arguments are neglected):

𝑴𝒒̈ + 𝑪𝒒̇ + 𝑫 = 𝑴0 𝒒̈_ref + 𝑪0 𝒒̇_ref + 𝑫0 + 𝒀_r 𝒂̂ − 𝑲_d 𝒆̇_𝒒
             = 𝑴0 𝒒̈_ref + 𝑪0 𝒒̇_ref + 𝑫0 + 𝒀_r 𝒂̂ + 𝒀_r 𝒂 − 𝒀_r 𝒂 − 𝑲_d 𝒆̇_𝒒
             = 𝑴𝒒̈_ref + 𝑪𝒒̇_ref + 𝑫 + 𝒀_r 𝒂̃ − 𝑲_d 𝒆̇_𝒒.
Figure 4. Structure of the RTR manipulator (joint coordinates 𝑞1, 𝑞2, 𝑞3, link lengths 𝑙2, 𝑙3, base frame 𝑋0𝑌0𝑍0)
After rewriting the above equation, one gets the form of the closed-loop system

𝑴(𝒒̈ − 𝒒̈_ref) + 𝑪(𝒒̇ − 𝒒̇_ref) − 𝒀_r 𝒂̃ + 𝑲_d 𝒆̇_𝒒 = 𝑴𝒆̈_𝒒 + 𝑪𝒆̇_𝒒 − 𝒀_r 𝒂̃ + 𝑲_d 𝒆̇_𝒒 = 0.    (53)

For the second stage—the dynamics level—the Lyapunov-like function for the adaptive case has the form

𝑉2𝑎(𝒆_𝝃, 𝒆̇_𝒒, 𝒂̃(𝑡)) = 𝑉1(𝒆_𝝃) + (1/2) 𝒆̇_𝒒ᵀ 𝑴(𝒒) 𝒆̇_𝒒 + (1/2) 𝒂̃ᵀ(𝑡) 𝜞⁻¹ 𝒂̃(𝑡).    (54)
The time derivative of the 𝑉2𝑎 function is equal to

𝑉̇2𝑎 = 𝑉̇1 + 𝒆̇_𝒒ᵀ 𝑴(𝒒) 𝒆̈_𝒒 + (1/2) 𝒆̇_𝒒ᵀ 𝑴̇(𝒒) 𝒆̇_𝒒 + 𝒂̃ᵀ 𝜞⁻¹ 𝒂̃̇.    (55)
Putting the closed-loop system's dynamics (53), the time derivative of 𝑉1 (45), and the adaptation law (52) into equation (55), we obtain

𝑉̇2𝑎(𝒆_𝝃, 𝒆̇_𝒒) = 𝑉̇1(𝒆_𝝃) − 𝒆̇_𝒒ᵀ 𝑲_d 𝒆̇_𝒒 = −𝑊2𝑎(𝒆_𝝃, 𝒆̇_𝒒) ≤ 0.    (56)
From the LaSalle theorem we can conclude that the invariant set is defined by 𝑊2𝑎(𝒆_𝝃, 𝒆̇_𝒒) = 0. As a consequence, the errors (𝒆_𝝃, 𝒆̇_𝒒) asymptotically converge to zero and the point (𝒆_𝝃, 𝒆̇_𝒒) = (0, 0) is the asymptotically stable equilibrium point of the closed-loop system. It is worth mentioning that the proof dedicated to the adaptive case of the dynamic control only shows that the estimation errors 𝒂̃ remain bounded and do not necessarily tend to zero. This ends the proof.
5. Simulation Study
Numerical simulations were conducted in order to verify the performance of the presented algorithms. It was assumed that the robot was a non-redundant holonomic stationary manipulator of three degrees of freedom (rotational, translational, and rotational). Its schematic view is presented in Figure 4 [20]. The manipulator parameters were chosen as follows: the manipulator links weighed 20 kg and their lengths were equal to 𝑙2 = 1.5 m and 𝑙3 = 1 m.

Figure 5. Plots of path following error 𝑒_𝜉1 = 𝑠 − 𝑠_d ((a) comparison of different parametrization methods, (b) zoomed view of the graph): 1 – Serret–Frenet parametrization, 2 – Bishop parametrization based on the Serret–Frenet frame, 3 – Bishop parametrization initialized as the Serret–Frenet frame, 4 – Bishop parametrization initialized arbitrarily
Figure 6. Plots of path following error 𝑒_𝜉2 = 𝑑2 − 𝑑2𝑑 ((a) comparison of different parametrization methods, (b) zoomed view of the graph): 1 – Serret–Frenet parametrization, 2 – Bishop parametrization based on the Serret–Frenet frame, 3 – Bishop parametrization initialized as the Serret–Frenet frame, 4 – Bishop parametrization initialized arbitrarily

Figure 7. Plots of path following error 𝑒_𝜉3 = 𝑑3 − 𝑑3𝑑 ((a) comparison of different parametrization methods, (b) zoomed view of the graph): 1 – Serret–Frenet parametrization, 2 – Bishop parametrization based on the Serret–Frenet frame, 3 – Bishop parametrization initialized as the Serret–Frenet frame, 4 – Bishop parametrization initialized arbitrarily
The considered control task was following the cylindrical helix defined with the equation [27]

𝒓(𝑠) = [ 𝑎 cos(𝑠/𝑐),  𝑎 sin(𝑠/𝑐),  𝑏𝑠/𝑐 ]ᵀ,    (57)

where 𝑐 = √(𝑎² + 𝑏²). The following values of the helix parameters were assumed: 𝑎 = 1, 𝑏 = 0.1. Different definitions of the local frame associated with the given curve were taken into account: the Serret–Frenet frame, the Bishop frame based on the definition (4), and the Bishop frame derived from the Serret–Frenet frame according to the transformation (7). Furthermore, the Bishop frame was initialized in two configurations. One of them was equal to the Serret–Frenet frame in the initial state. For the second case the frame was initialized with an arbitrary choice of the normal vectors, different from the Serret–Frenet frame. For the transformation (7), the value 𝜃(0) = 0 was assumed. As a result, for this case the Serret–Frenet frame and the Bishop frame created based on it were also equal to each other in the initial state. Moreover, the following coefficient matrices were taken into account: 𝑲_k = diag_{3×3}{0.5}, 𝑲_d = diag_{3×3}{100}. Simulations lasted 𝑡 = 30 s. The robot was expected to reach the desired path and move along it. The velocity profile along the curve, defined for the local reference frame resulting from the chosen parametrization method, was constant during the whole motion and equal to

𝑠̇_d = 1/5.    (58)
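For the helix (57) the Serret–Frenet quantities are available in closed form (curvature 𝜅 = 𝑎/𝑐², torsion 𝜏 = 𝑏/𝑐²), which makes the simulation setup easy to cross-check. The short sketch below, written independently of the authors' simulation code, evaluates the frame and the invariants at a sample arc length for the parameters used above.

```python
import numpy as np

a, b = 1.0, 0.1
c = np.hypot(a, b)                 # c = sqrt(a^2 + b^2), so s is an arc-length parameter

def r(s):
    return np.array([a * np.cos(s / c), a * np.sin(s / c), b * s / c])

def frenet_frame(s):
    # derivatives of r with respect to arc length s (closed form for the helix)
    dr  = np.array([-a / c * np.sin(s / c),       a / c * np.cos(s / c),       b / c])
    ddr = np.array([-a / c**2 * np.cos(s / c),   -a / c**2 * np.sin(s / c),    0.0])
    T = dr                                    # |dr/ds| = 1 by construction
    kappa = np.linalg.norm(ddr)               # curvature, equals a / c**2
    N = ddr / kappa
    B = np.cross(T, N)
    return T, N, B, kappa

T, N, B, kappa = frenet_frame(2.0)
print(kappa, a / c**2)                        # both approx. 0.990
print(b / c**2)                               # torsion of the helix, approx. 0.099
```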
5.1. Singularity on the Path
Firstly, the simulation case when the orthogonal singularity is located on the path was analyzed. In Figures 5–7 the path following errors are presented for different parametrization methods. For every graph a magnification of the view is also provided. The presented graphs confirm that the robot approaches the path correctly, as the end-effector position errors in the local frame associated with the given curve converge toward zero. It is noteworthy that the control system may behave differently depending on the chosen parametrization method. In particular, it may be observed for the Bishop frames defined for various initial conditions. However, their evolution is always the same and described with equation (4). It may be observed in Figures 6 and 7 that, due to different initial conditions of the Bishop frame, there are different trajectories of path following errors in the normal vectors directions. However, for every case the errors converge to zero. In contrast, the trajectories for the 𝑒_𝜉1 error are the same. It means that the local frame moves along the curve in the same way in spite of different evolution definitions. It results from the fact that all considered parametrization methods share the definition of the tangential vector 𝑻 and the same velocity profile along the path is taken into account. The differences may be observed in the frame rotations with respect to the curve, which are visualized in Figure 18.
Furthermore, Figure 8 shows the control generalized forces for every joint which allowed the robot to achieve the presented performance. It is noteworthy that the differences between parametrization methods are not significant. However, it might be worrying that at a certain time moment the trajectories presented in Figure 5b begin to move away from the zero value. It means that the constraint of the orthogonal projection, i.e., that the end-effector and the virtual object are in the closest proximity, is violated. Thus, it may be questioned whether the assumption that the manipulator tip is located in the normal plane of the local frame is satisfied. This phenomenon may be interpreted as a difficulty of the local frame associated with the curve in maintaining the pace of the end-effector motion. Such behavior results from the fact that the system gets close to the orthogonal singularity defined with equation (18). The values of the denominator of this expression for different parametrization methods are shown in Figure 9. It is worth noticing that the value of this expression converges to zero while the manipulator end-effector gets closer to the path. It confirms that the presented solution may be applied only for asymptotically approaching the path. The closer to the singularity the system is, the higher inaccuracy may be observed in the trajectories associated with error 𝑒_𝜉1. However, if the algorithm is robust to getting close to the singularity, the presented approach may be applied to reach any positions outside the given path.
The geometrical interpretation of the denominator form for the Serret–Frenet parametrization, which is given in equation (19), is quite clear. It is strictly connected with the motion in the normal vector 𝑵 direction. Thus, taking into account proper limitations for the motion in this direction may guarantee that the singularity is not reached. However, in the Bishop frame version of the denominator, which is presented in equation (24), the combined motion in both normal vectors' directions is considered. Hence, it may be more difficult to define the manipulator motion avoiding the orthogonal singularity for this method.

Figure 8. Plots of control torques and forces 𝒖 ((a) control torques 𝑢1, (b) control forces 𝑢2, (c) control torques 𝑢3): 1 – Serret–Frenet parametrization, 2 – Bishop parametrization based on the Serret–Frenet frame, 3 – Bishop parametrization initialized as the Serret–Frenet frame, 4 – Bishop parametrization initialized arbitrarily

Figure 9. Value of the denominator of the expression (18) defining curvilinear velocity 𝑠̇ ((a) comparison of different parametrization methods, (b) zoomed view of the graph): 1 – Serret–Frenet parametrization, 2 – Bishop parametrization based on the Serret–Frenet frame, 3 – Bishop parametrization initialized as the Serret–Frenet frame, 4 – Bishop parametrization initialized arbitrarily

5.2. Orthogonal Singularity Avoidance
Another way to avoid the orthogonal singularity is to translate it away from the path. Such an approach has been described in Section 3.3. In Figures 10 and 11 the path following errors for this case are shown. It is noteworthy that the proposed reformulation does not affect the errors in the normal vectors directions. However, significant changes may be observed for the error 𝑒_𝜉1. This error preserves its monotonicity and converges to zero. It is the result of the singularity translation. The denominator of the expression (30) tends to −1 as the robot approaches the path. The value of the expression ⟨d𝑻/d𝑠, 𝒑 − 𝒓⟩ is presented in Figure 12. That value is lower than zero during the whole motion. As a result, the local constraints given by equations (36) and (39) are satisfied. It shows that it is quite easy to meet those assumptions for the cylindrical helix, even though the form for the Bishop parametrization is more complicated. Moreover, in Figure 13 the values of the expression ⟨d𝑻/d𝑠, 𝒑 − 𝒓⟩ for different initial configurations of the RTR manipulator are presented. They are the same for both parametrization methods. It may be observed that the highest values are located in the middle of the helix, but they are lower than 1. It means that for the considered path there is no singularity resulting from the orthogonal projection.
In Figure 14 the generalized control forces are presented. They allowed the manipulator to perform the path following task. It is noteworthy that they are the same as in the base solution case (see Fig. 8), although the form of the description of the robot with respect to the path has changed.

Figure 10. Plots of path following error 𝑒_𝜉1 = 𝑠 − 𝑠_d for the avoided singularity ((a) comparison of different parametrization methods, (b) zoomed view of the graph): 1 – Serret–Frenet parametrization, 2 – Bishop parametrization based on the Serret–Frenet frame, 3 – Bishop parametrization initialized as the Serret–Frenet frame, 4 – Bishop parametrization initialized arbitrarily

Figure 11. Plots of path following errors in the normal vectors directions for the singularity avoided ((a) error 𝑒_𝜉2 = 𝑑2 − 𝑑2𝑑, (b) error 𝑒_𝜉3 = 𝑑3 − 𝑑3𝑑): 1 – Serret–Frenet parametrization, 2 – Bishop parametrization based on the Serret–Frenet frame, 3 – Bishop parametrization initialized as the Serret–Frenet frame, 4 – Bishop parametrization initialized arbitrarily

Figure 12. Value of the expression ⟨d𝑻/d𝑠, 𝒑 − 𝒓⟩ for the singularity avoided: 1 – Serret–Frenet parametrization, 2 – Bishop parametrization based on the Serret–Frenet frame, 3 – Bishop parametrization initialized as the Serret–Frenet frame, 4 – Bishop parametrization initialized arbitrarily

Figure 13. Value of the expression ⟨d𝑻/d𝑠, 𝒑 − 𝒓⟩ for different initial configurations of the RTR manipulator

Figure 14. Plots of control torques and forces 𝒖 for the singularity avoided ((a) control torques 𝑢1, (b) control forces 𝑢2, (c) control torques 𝑢3): 1 – Serret–Frenet parametrization, 2 – Bishop parametrization based on the Serret–Frenet frame, 3 – Bishop parametrization initialized as the Serret–Frenet frame, 4 – Bishop parametrization initialized arbitrarily

5.3. Adaptive Controller
In the last simulation case study, partial knowledge of the dynamic parameters of the manipulator was assumed. This section proves that the proposed control algorithms may be applied even if the dynamic model is not fully known. At the second stage of the control cascade the control law given with equation (51) was considered. It was assumed that there are two parameters whose values are unknown: 𝑎1 = 𝑚3 𝑙3, 𝑎2 = 𝑚3 𝑙3 𝑙2. The gain matrix of the adaptation law (52) was equal to 𝜞 = diag_{2×2}{50} and the estimated values 𝒂̂ were initialized as zeros, i.e., 𝒂̂(0) = [0 0]ᵀ.
In Figure 15 the path following errors are presented. The performance of the adaptive controller was compared with the performance of the non-adaptive one applied to the model with the fully known dynamics. Results for both the Serret–Frenet parametrization and the Bishop parametrization are shown. It is noteworthy that for the adaptive case the pace of convergence is slightly slower. However, the errors converge to zero and the orthogonal singularity is avoided. Furthermore, the choice of the parametrization method is not so crucial for the manipulator performance. The achieved results are comparable.
In Figure 16 the control signals are compared. The adaptive controller hardly changes the generated values. The initial peak, resulting from errors in the initial state, is quite similar and the control generalized forces tend to the same values as for the non-adaptive case. Finally, in Figure 17, the estimation errors of the unknown parameters are presented. It is worth emphasizing that the estimated values do not converge to the real ones. As mentioned in Section 4.2, the estimated values remain bounded, but the proposed control law does not guarantee the correctness of the estimation. Despite that fact, the path is approached and followed properly. Thus, the adaptive controller may be applied when the manipulator dynamic parameters are only partially known or even completely unknown. Moreover, the choice of the parametrization method on the kinematic level does not have any impact on the estimation performed by the adaptive controller on the dynamic level of the control system cascade.

Figure 15. Plots of path following errors 𝒆_𝝃 for the adaptive and non-adaptive controllers ((a) 𝑒_𝜉1, (b) 𝑒_𝜉2, (c) 𝑒_𝜉3): 1 – Serret–Frenet parametrization (non-adaptive case), 2 – Serret–Frenet parametrization (adaptive case), 3 – Bishop parametrization (non-adaptive case), 4 – Bishop parametrization (adaptive case)

Figure 16. Plots of control torques and forces 𝒖 for the adaptive and non-adaptive controllers ((a) control torques 𝑢1, (b) control forces 𝑢2, (c) control torques 𝑢3): 1 – Serret–Frenet parametrization (non-adaptive case), 2 – Serret–Frenet parametrization (adaptive case), 3 – Bishop parametrization (non-adaptive case), 4 – Bishop parametrization (adaptive case)

Figure 17. Plots of estimation errors 𝒂̃ ((a) 𝑎̃1, (b) 𝑎̃2)

5.4. Local Frame Orientation Analysis
Verification of the frames' behavior along the path is also a crucial aspect of the comparative analysis. The evolution along the path of all considered local frames is presented in Figure 18. The tangent vector 𝑻 is always denoted with a red colour, while the normal vectors of the local reference frame are denoted with blue and green. For the Serret–Frenet parametrization, they refer to the normal vector and the binormal vector, respectively.
For the Bishop frame, the blue vector corresponds with the vector 𝑵1 and the green one with the vector 𝑵2. It is noteworthy that the Serret–Frenet frame preserves its normal vector directed to the center of the helix. It results from the fact that the frame comes back to the same orientation every time the curve goes through the same axis of the coordinate frame [10]. The Bishop frame cannot be characterized with this feature. The non-zero torsion of the helix results in frame rotations with a constant speed. However, its evolution guarantees reduction of the performed rotations around the curve during the whole motion [30]. Figure 18 shows that the same evolution may be achieved by transforming the Serret–Frenet frame to the Bishop frame (Fig. 18b) or by defining the Bishop frame directly from the definition based on the relatively parallel transport of vectors (Fig. 18c). Different initial conditions of the Bishop frame are visualized in Figure 18d. The frame in the initial state is rotated in comparison with the frames presented in Figures 18b and 18c. However, it may be observed that it evolves along the curve in exactly the same manner.
As mentioned earlier, the Bishop parametrization minimizes rotations of the local frame along the curve. It may be observed in the unit quaternion space. In Figure 19, the trajectories of the frames' rotation evolution are presented. In the figure the trajectories are placed in the unit sphere. Moreover, the value of the real part of the quaternion is neglected. It is worth noticing that the trajectory of the Serret–Frenet frame rotation is a closed curve. It corresponds with the fact that the normal vector is directed to the center of the curvature all the time. Furthermore, the initial point of the Bishop frame initialized in the Serret–Frenet configuration (no. 2 and 3) is identical to the Serret–Frenet initial point. However, the evolution is completely different. In addition, the characteristics of the motion of the Bishop frame are similarly independent of the initial configuration.
Finally, in Figure 20 the values of the quaternions for the respective local frames are presented. It may be noticed that the real part value is quite high, in the sense of the absolute value, for the trajectory parts that are far away from the unit sphere surface. This may be observed particularly for the Serret–Frenet frame. The real part value of the Bishop frame is significantly lower, as the trajectories are often placed on the sphere surface. The lengths of these trajectories confirm the fact that the Bishop frame minimizes rotations along the curve. The length of the trajectory of the Serret–Frenet frame is equal to 14.9226, whereas for the Bishop frame it is equal to 14.8539, independently of its initial condition. The difference may be even more significant for other curves or curve parameters.

Figure 18. Evolution of the local frame along the given curve depending on the parametrization method ((a) Serret–Frenet parametrization, (b) Bishop parametrization derived from the Serret–Frenet parametrization, (c) Bishop parametrization initialized as the Serret–Frenet frame, (d) Bishop parametrization initialized arbitrarily)

Figure 19. Evolution of the local frame orientation presented in the unit quaternion space (real part of the quaternion neglected): 1 – Serret–Frenet parametrization, 2 – Bishop parametrization based on the Serret–Frenet frame, 3 – Bishop parametrization initialized as the Serret–Frenet frame, 4 – Bishop parametrization initialized arbitrarily

Figure 20. Orientation of the local frame presented in the unit quaternion space ((a) orientation of the Serret–Frenet frame, (b) orientation of the Bishop frame initialized as the Serret–Frenet frame, (c) orientation of the Bishop frame initialized arbitrarily)

Table 1. Comparison of the orthogonal parametrization methods
Feature | Serret–Frenet parametrization | Bishop parametrization
Minimal class of a curve | 𝒞³ | 𝒞²
Defined for zero-curvature points | No | Yes
Definition in the initial state | Determined | Arbitrary
Rotations minimization | Around the normal vector 𝑵 | Around the tangent vector 𝑻 (along a curve)
Geometrical invariants | 𝜅, 𝜏 | 𝑘1, 𝑘2
Orthogonal projection constraint | 𝜅𝑑2 ≠ 1 | 𝑘1𝑑2 + 𝑘2𝑑3 ≠ 1
Robot description with respect to a curve | 𝝃̇ = 𝑷𝒑̇ + 𝑹 | 𝝃̇ = 𝑷𝒑̇ + 𝑹
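The rotation-minimizing behavior discussed in Section 5.4 can be checked independently with a short script: sample the helix, build the Serret–Frenet frame analytically and a Bishop frame by discrete parallel transport of its normal, and sum the rotation angles between consecutive frames. The sketch below uses scipy's Rotation utilities and an arc-length range corresponding to the 30 s simulation at 𝑠̇_d = 1/5; it is not the authors' code and its length measure differs from the one reported for Figure 19, so the exact numbers will not match, but the ordering (Bishop accumulating less rotation than Serret–Frenet) should be reproduced.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

a, b = 1.0, 0.1
c = np.hypot(a, b)
s = np.linspace(0.0, 6.0, 2001)              # sampled arc length (30 s at s_dot = 0.2)

def frenet(si):
    T = np.array([-a / c * np.sin(si / c), a / c * np.cos(si / c), b / c])
    N = np.array([-np.cos(si / c), -np.sin(si / c), 0.0])   # unit normal of the helix
    return T, N, np.cross(T, N)

def rot_length(frames):
    """Sum of rotation angles between consecutive frames (total rotation along the curve)."""
    qs = R.from_matrix(frames)
    rel = qs[:-1].inv() * qs[1:]
    return np.sum(rel.magnitude())

# Serret-Frenet frames (rotation matrices with columns T, N, B)
sf = np.array([np.column_stack(frenet(si)) for si in s])

# Bishop frame: parallel-transport the initial normal so that it only turns with T
bishop = [sf[0]]
for i in range(1, len(s)):
    T, _, _ = frenet(s[i])
    N1 = bishop[-1][:, 1]
    N1 = N1 - np.dot(N1, T) * T              # project the previous normal onto the new normal plane
    N1 /= np.linalg.norm(N1)
    bishop.append(np.column_stack((T, N1, np.cross(T, N1))))
bishop = np.array(bishop)

print(rot_length(sf), rot_length(bishop))    # the Bishop total comes out smaller
```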
6. Conclusion
In this paper, a solution to the path following problem for stationary manipulators has been addressed. As a tool for following a curve in three-dimensional space, two different orthogonal parametrizations have been investigated, namely the Serret–Frenet parametrization and the Bishop parametrization. The orthogonal parametrization methods allow one to reduce the dimensionality of the control problem, as the position in the tangent vector direction is fixed. However, they are correct only locally due to some constraints resulting from the orthogonal projection assumption.
There are some important differences between the parametrization methods. They may play a crucial role when considering paths more sophisticated than the cylindrical helix presented in the simulations. Despite the more intuitive geometrical interpretation of the Serret–Frenet frame, it imposes more constraints on a parametrized curve. First of all, the Serret–Frenet parametrization is undefined at zero-curvature points, which means that many paths, including straight lines, cannot be parametrized. The Bishop frame is a solution to this problem. Moreover, the Bishop frame orientation may be chosen arbitrarily in the initial state. Although the presented behavior of the Serret–Frenet frame, which always points its normal vector to the center of the helix, may be desirable in various applications, the usage of the Bishop frame, which minimizes rotations around the curve, may appear much more beneficial for the direct control of orientation in the ℝ³ space. A brief comparison of the analyzed orthogonal parametrization methods is presented in Table 1.
It is very important to remember that orthogonal parametrizations are valid only locally due to singularities occurring in the description of the methods. In the previous approach the singularity was located on the path. Due to that fact, the proposed control algorithms allowed the robot only to approach the path asymptotically, not to move along the path. In this paper, it has been shown that the orthogonal parametrization may be transformed, i.e., that the singularity can be translated to a different location. Due to that fact, a path may be reached and a robot may move precisely along it. The presented translation of the orthogonal singularity solves the problem of the system excitation on the path. Not only can a robot move along the path, but following zero-curvature paths with the Bishop frame is also possible. It is strongly recommended to use the version of the robot description with the orthogonal singularity outside the path for control applications.
The simulation study presented in the article allowed us to verify the orthogonal path following algorithms based on different parametrization methods (Serret–Frenet, Bishop). For the cases with the orthogonal projection singularity outside the path, the asymptotic convergence of the path following errors to zero was confirmed. Thus, the path following task was fulfilled correctly. The proposed control algorithms allowed the robot to approach the path and follow it while the desired velocity profile along the path was asymptotically tracked. In fact, the manipulator performance was independent of the chosen parametrization method, as only the end-effector position was taken into account in the control task. It may be noticed that the generated control signals were similar for all simulation cases. Moreover, the form of the general description of the robot with respect to the local frame associated with the path (the first stage of the cascade) is the same for both methods.
In this article, the case with partially known dynamics was also considered. Although the pace of error convergence was slower for the adaptive controller, it allowed the robot to follow the desired path correctly. It confirmed that despite the lack of knowledge of the dynamic parameters, the proposed control algorithms allow a robot to follow the desired path. It is worth noticing that the assumption of fully known dynamics of the manipulator is not crucial for the presented algorithms. In general, it shows that any dynamic controller which guarantees the asymptotic convergence of the velocity profile errors to zero lets us succeed in the path following task. The object description with respect to the local frame associated with the given path is not affected by the change of the dynamic controller. Indeed, the kinematic controller designed for the first stage of the cascade is responsible for generating velocity profiles that guarantee motion along the given curve. The second stage of the control system cascade may be chosen arbitrarily according to the other requirements, e.g., the knowledge of the dynamic parameters.
In further research, it may be important to investigate control algorithms which allow controlling the robot orientation with respect to the local frame associated with the path. However, the considered robot will have to have more degrees of freedom than the RTR manipulator. It will be a verification of the presented features of the rotation changes of the considered frames. One may benefit from the Bishop frame properties in order to choose the local frame behavior appropriate for the considered application.
Furthermore, the impact of measurement noise on the robot behavior should be verified. On the first stage of the cascade, the kinematic level, the system may be quite close to the orthogonal singularity. Thus, some uncertainty may play a vital role in the manipulator performance. Even if the singularity is translated, the system may reach the singularity when approaching the path. Due to that fact, some research may be conducted according to [15].
Finally, some global algorithms should be taken into consideration. Hence, algorithms based on the
non‐orthogonal parametrization, in particular the Bishop parametrization, may be worth developing as they do not introduce any additional constraints on robots, although the task dimensionality increases.
AUTHORS
Filip Dyba∗ – Department of Cybernetics and Robotics, Faculty of Electronics, Photonics and Microsystems, Wrocław University of Science and Technology, Janiszewskiego Street 11/17, Wrocław, 50-372, Poland, e-mail: filip.dyba@pwr.edu.pl.
Alicja Mazur – Department of Cybernetics and Robotics, Faculty of Electronics, Photonics and Microsystems, Wrocław University of Science and Technology, Janiszewskiego Street 11/17, Wrocław, 50-372, Poland, e-mail: alicja.mazur@pwr.edu.pl.
∗Corresponding author
References
[1] R. L. Bishop. “There is more than one way to frame a curve,” The American Mathematical Monthly, vol. 82, no. 3, 1975, pp. 246–251.
[2] M. Breivik, and T. Fossen. “Principles of guidance-based path following in 2D and 3D,” Proceedings of the 44th IEEE Conference on Decision and Control, and the European Control Conference 2005, Seville, Spain, 2005, pp. 627–634; doi: 10.1109/CDC.2005.1582226.
[3] C. Canudas de Wit, G. Bastin, and B. Siciliano, Theory of Robot Control, Springer-Verlag: London, 1996.
[4] D. Carroll, E. Köse, and I. Sterling. “Improving Frenet’s Frame Using Bishop’s Frame,” Journal of Mathematics Research, vol. 5, 2013, pp. 97–106; doi: 10.5539/jmr.v5n4p97.
[5] L. Consolini, M. Maggiore, C. Nielsen, and M. Tosques. “Path following for the PVTOL aircraft,” Automatica, vol. 46, no. 8, 2010, pp. 1284–1296; doi: 10.1016/j.automatica.2010.05.014.
[6] F. Dyba, and A. Mazur. “Comparison of Curvilinear Methods in Path Following Task for a Holonomic Manipulator,” A. Mazur and C. Zieliński, eds., Advances in Robotics, vol. 1, Warsaw University of Technology Publishing House, 2022, pp. 33–44, (in Polish).
[7] P. Encarnacao, and A. Pascoal. “3D path following for autonomous underwater vehicle,” Proceedings of the 39th IEEE Conference on Decision and Control, Sydney, 2000, pp. 2977–2982; doi: 10.1109/CDC.2000.914272.
[8] F. Frenet. “Sur les courbes à double courbure,” Journal de Mathématiques Pures et Appliquées, 1852, pp. 437–447.
[9] M. Galicki. “Adaptive Control of Kinematically Redundant Manipulator along a Prescribed Geometric Path,” K. Kozłowski, ed., Robot Motion and Control. Lecture Notes in Control and Information Sciences, vol. 335, Springer, 2006, pp. 129–139.
[10] A. J. Hanson, and H. Ma, Parallel Transport Approach to Curve Framing. Technical report, Indiana University, 1995.
[11] N. Hung et al. “A review of path following control strategies for autonomous robotic vehicles: Theory, simulations, and experiments,” Journal of Field Robotics, vol. 40, no. 3, 2023, pp. 747–779; doi: 10.1002/rob.22142.
[12] M. Krstić, I. Kanellakopoulos, and P. V. Kokotovic, Nonlinear and Adaptive Control Design, John Wiley & Sons, Inc., 1995.
[13] X. Li, G. Zhao, and B. Li. “Generating optimal path by level set approach for a mobile robot moving in static/dynamic environments,” Applied Mathematical Modelling, vol. 85, 2020, pp. 210–230.
[14] Y.-L. Liao, M.-J. Zhang, and L. Wan. “Serret−Frenet frame based on path following control for underactuated unmanned surface vehicles with dynamic uncertainties,” Journal of Central South University, vol. 22, 2015, pp. 214–223.
[15] U. Libal, and J. Płaskonka. “Noise sensitivity of selected kinematic path following controllers for a unicycle,” Bulletin of the Polish Academy of Sciences: Technical Sciences, vol. 62, no. 1, 2014, pp. 3–13; doi: 10.2478/bpasts-2014-0001.
[16] H. Liu, and D. Pei. “Singularities of a space curve according to the relatively parallel adapted frame and its visualization,” Mathematical Problems in Engineering, 2013; doi: 10.1155/2013/512020.
[17] I. Lugo-Cárdenas, S. Salazar, and R. Lozano. “Lyapunov Based 3D Path Following Kinematic Controller for a Fixed Wing UAV,” IFAC-PapersOnLine, vol. 50, no. 1, 2017, pp. 15946–15951; doi: 10.1016/j.ifacol.2017.08.1747, 20th IFAC World Congress.
[18] A. Mazur. “Hybrid adaptive control laws solving a path following problem for non-holonomic mobile manipulators,” International Journal of Control, vol. 77, no. 15, 2004, pp. 1297–1306; doi: 10.1080/0020717042000297162.
[19] A. Mazur. Model-based control for non-holonomic mobile manipulators, Publishing House of Wroclaw University of Science and Technology, 2009, (in Polish).
[20] A. Mazur, J. Płaskonka, and M. Kaczmarek. “Following 3D paths by a manipulator,” Archives of Control Sciences, vol. 25, no. 1, 2015, pp. 117–133; doi: 10.1515/acsc-2015-0008.
[21] A. Mazur, and D. Szakiel. “On path following control of nonholonomic mobile manipulators,” International Journal of Applied Mathematics
and Computer Science, vol. 19, no. 4, 2009, pp. 561–574.
[22] A. Micaelli, and C. Samson. “Trajectory tracking for unicycle-type and two-steering-wheels mobile robots,” Technical Report No. 2097, Sophia-Antipolis, 1993.
[23] M. M. Michałek, and D. Pazderski, Mobile robots control. Laboratory, Publishing House of Poznan University of Technology, 2012, (in Polish).
[24] M. M. Michałek. “A highly scalable path-following controller for N-trailers with off-axle hitching,” Control Engineering Practice, vol. 29, 2014, pp. 61–73; doi: 10.1016/j.conengprac.2014.04.001.
[25] M. M. Michałek, and T. Gawron. “VFO path following control with guarantees of positionally constrained transients for unicycle-like robots with constrained control input,” Journal of Intelligent and Robotic Systems: Theory and Applications, vol. 89, no. 1-2, 2018, pp. 191–210; doi: 10.1007/s10846-017-0482-0.
[26] A. Morro, A. Sgorbissa, and R. Zaccaria. “Path following for unicycle robots with an arbitrary path curvature,” IEEE Transactions on Robotics, vol. 27, no. 5, 2011, pp. 1016–1023; doi: 10.1109/TRO.2011.2148250.
[27] J. Oprea. Differential Geometry and Its Applications, Prentice Hall, 2007.
[28] J. Płaskonka. “Different kinematic path following controllers for a wheeled mobile robot of (2,0) type,” Journal of Intelligent & Robotic Systems, vol. 77, 2013, pp. 481–498; doi: 10.1007/s10846-013-9879-6.
[29] M. Rokonuzzaman, N. Mohajer, S. Nahavandi, and S. Mohamed. “Review and performance evaluation of path tracking controllers of autonomous vehicles,” IET Intelligent Transport Systems, vol. 15, no. 5, 2021, pp. 646–670; doi: 10.1049/itr2.12051.
[30] J. M. Selig, and Y. Wu. “Interpolated rigid-body motions and robotics,” 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE, 2006, pp. 1086–1091; doi: 10.1109/IROS.2006.281815.
[31] J.-A. Serret. “Sur quelques formules relatives à la théorie des courbes à double courbure,” Journal de Mathématiques Pures et Appliquées, 1851, pp. 193–207.
[32] B. Siciliano, L. Sciavicco, L. Villani, and G. Oriolo, Robotics: Modelling, Planning and Control, Springer, 2008.
[33] D. Soetanto, L. Lapierre, and A. Pascoal. “Adaptive, non-singular path-following control of dynamic wheeled robots,” Proceedings of the IEEE Conference on Decision and Control, IEEE, 2003, pp. 1765–1770; doi: 10.1109/CDC.2003.1272868.
[34] K. Tchoń, A. Mazur, I. Dulęba, R. Hossa, and R. Muszyński, Manipulators and Mobile Robots: Models, Motion Planning, Control, Academic Publishing House PLJ, 2000, (in Polish).
ADAPTIVE AND ROBUST FOLLOWING OF 3D PATHS BY A HOLONOMIC MANIPULATOR Submitted: 28th March 2023; accepted: 31st July 2023
Alicja Mazur, Mirela Kaczmarek
DOI: 10.14313/JAMRIS/3-2023/23
Abstract: This paper addresses the problem of following a three-dimensional path by a holonomic manipulator with parametric or structural uncertainty in the dynamics. A description of the manipulator relative to a desired three-dimensional path was presented. The path is parameterized orthogonally to the Serret-Frenet frame, which is moving along the curve. Adaptive and robust control laws for a stationary manipulator which ensure realization of the task were specified. Theoretical considerations are supported by the results of computer simulations conducted for an RTR manipulator.
Keywords: Path following, Serret–Frenet parametrization, Orthogonal projection, Backstepping algorithm, Holonomic manipulator
1. Introduction
Over the years, the use of robots has been increasing, especially in industry. Manipulators are able to realize different tasks, including welding, painting, assembly, and palletizing, among others. During those tasks, high endurance, speed, and precision are required. From the control point of view, three types of tasks for industrial manipulators can be defined: point stabilization; trajectory tracking, when the robot has to follow a desired curve which is time-parametrized; and path following, when the robot has to follow a curve parametrized by a curvilinear distance from a fixed point. During a trajectory tracking task, the robot's particular position at a prespecified time is required. On the contrary, a path following task requires the robot to converge to a geometric curve with any feasible speed profile. In this paper, only following a desired path, i.e., a curve parameterized by curvilinear distance, has been considered.
Recently, path-following tasks have been discussed many times for different robots: usually for mobile platforms [9–11], but also for fixed-base manipulators and mobile manipulators [2, 3, 6, 8]. Most of the papers deal only with two-dimensional paths. Moreover, algorithms presented in the literature are devoted to robots with fully known dynamic models.
When the full model of manipulator dynamics is unknown, there are two possibilities – structural uncertainty, when mathematical expressions of some
forces in a dynamical model are unknown, and parametric uncertainty, when we do not know a certain number of model parameters. In this article, both types are considered.
Research presented in the article [7] is continued in this paper. In this work, a general solution to the tracking of three-dimensional curves for manipulators with different levels of dynamics knowledge has been proposed. In Section 2, models of dynamics with different types of uncertainties have been presented. General equations of the robot's motion describing its position relative to the path have been established in Section 3. The control problem formulated in the paper has been presented in Section 4. Since the system equations have a cascade structure, the control will consist of a kinematic (Section 5) and a dynamic controller (Sections 6–8). The main result is a class of dynamic controllers dedicated to different levels of fixed-base manipulator dynamics knowledge. In Section 6, a dynamic control algorithm for a fully known manipulator has been presented, Section 7 contains a dynamic control law for the case of parametric uncertainty in the model, and Section 8 presents a robust version of the dynamic algorithm for parametric and structural uncertainty in the dynamics. All considerations are illustrated with simulations for an RTR manipulator, presented in Section 9. Section 10 contains a brief summary of the results.
This paper is an extension of the conference paper [5]. Other methods of the time-dependent description of the curvilinear distance measured along the path have been investigated. Moreover, the problem of parametric and structural uncertainties occurring in the dynamics has been considered. Finally, adaptive and robust versions of the dynamic controller have been introduced. The proofs of asymptotic stability of the proposed algorithms have been shown.
2. Dynamics of a Holonomic Manipulator
Typically, it is assumed that the dynamics of the manipulator are fully known. This approach is the starting point for designing subsequent control algorithms that require less and less knowledge about the dynamics of the object. In practice, such a situation is rare, because it requires fully identified dynamics of the object (an identification process conducted before regulation gives us all parameters of the dynamics) and, moreover, that the robot does not carry
the payload during operation (mass and moment of inertia of the unknown payload are added to the parameters of the last link, which contradicts the full knowledge of dynamics).
A typical situation that we face during control is uncertainty about the dynamics model. When the full model of the manipulator is unknown, there are two possibilities – structural uncertainty, when the forms of the functions describing some elements of the dynamical model are unknown (or some "impacts" – forces or torques – are not included in the model), and parametric uncertainty, when we do not know a certain number of model parameters. The control for each of these cases requires a different dynamic algorithm.
For this reason, three different control algorithms will be presented: for full knowledge of the model (a non-adaptive algorithm), for parametric uncertainty (an adaptive algorithm with full or partial parameterization of the model), and for structural uncertainty (a robust algorithm using a sliding mode approach [13], simultaneously solving the problem of parametric and structural uncertainty). For each of the mentioned types of algorithms, the dynamics model should be presented in a slightly different way. This will be shown in this section.

2.1. Model of the Fully Known Manipulator
Suppose that the model of the holonomic manipulator with 𝑛 degrees of freedom is fully known. Then, the robot dynamics are described in joint coordinates 𝜃 = (𝜃1, ..., 𝜃𝑛) in the following form

𝑀(𝜃)𝜃̈ + 𝐶(𝜃, 𝜃̇)𝜃̇ + 𝐷(𝜃) = 𝑢,    (1)
where the left-hand side of the expression describes the dynamics of the manipulator with the elements:
- 𝜃, 𝜃̇, 𝜃̈ ∈ 𝑅ⁿ – vectors of joint positions, velocities and accelerations,
- 𝑀(𝜃) – symmetrical, positive definite inertia matrix,
- 𝐶(𝜃, 𝜃̇) – matrix of Coriolis and centripetal forces,
- 𝐷(𝜃) – vector of gravitational forces.
The right-hand side of (1) includes the control vector 𝑢. It is clear that the above model describes only serial chain manipulators with neglected dissipative interactions, such as friction forces. This approach is not very restrictive because friction forces can be included in the model in a form linearly dependent on unknown parameters, and in the article they have been omitted only for the sake of simplicity.

2.2. Model of the Manipulator with Parametric Uncertainty
When the full model of the manipulator is unknown, there are two possibilities – structural uncertainty, when some forces acting on the manipulator are not included in the mathematical model of dynamics, and parametric uncertainty, when we know the functions describing the real forces but a certain number of model parameters standing before the functions expressing those forces are unknown.
At the beginning, parametric uncertainty is considered. From the control point of view, it is crucial to present the dynamics model as linearly dependent on the unknown parameters as follows

𝑀(𝜃, 𝑎)𝜃̈ + 𝐶(𝜃, 𝜃̇, 𝑎)𝜃̇ + 𝐷(𝜃, 𝑎) = 𝑌(𝜃, 𝜃̇, 𝜃̇, 𝜃̈)𝑎 = 𝑢,    (2)

where 𝑎 ∈ 𝑅ᵖ is a vector of unknown parameters and the matrix 𝑌(𝜃, 𝜃̇, 𝜃̇, 𝜃̈) is a so-called regression matrix. The first argument of the regression matrix 𝑌 defines the trajectory along which the model is described; the second component defines the velocity occurring in the Coriolis matrix; the third component gives the vector by which the Coriolis matrix is multiplied; and the last component gives the vector by which the inertia matrix is multiplied.
When all parameters 𝑎 are unknown, the model (2) is called fully parametrized. However, when only some parameters 𝑎 are unknown, the parts of the dynamic model can be represented as

𝑀(𝜃, 𝑎) = 𝑀0(𝜃) + 𝑀1(𝜃, 𝑎),
𝐶(𝜃, 𝜃̇, 𝑎) = 𝐶0(𝜃, 𝜃̇) + 𝐶1(𝜃, 𝜃̇, 𝑎),
𝐷(𝜃, 𝑎) = 𝐷0(𝜃) + 𝐷1(𝜃, 𝑎),

where 𝑀0(𝜃), 𝐶0(𝜃, 𝜃̇), and 𝐷0(𝜃) represent the known parts of the model, while 𝑀1(𝜃, 𝑎), 𝐶1(𝜃, 𝜃̇, 𝑎), and 𝐷1(𝜃, 𝑎) include the unknown parameters in vector 𝑎. Then, for partial knowledge of the model, the dynamic equation has the following form

𝑀0(𝜃)𝜃̈ + 𝐶0(𝜃, 𝜃̇)𝜃̇ + 𝐷0(𝜃) + 𝑌1(𝜃, 𝜃̇, 𝜃̇, 𝜃̈)𝑎 = 𝑢,    (3)

where 𝑌1(𝜃, 𝜃̇, 𝜃̇, 𝜃̈)𝑎 = 𝑀1(𝜃, 𝑎)𝜃̈ + 𝐶1(𝜃, 𝜃̇, 𝑎)𝜃̇ + 𝐷1(𝜃, 𝑎). In further considerations, we will refer to the model (3) as dynamics with partial parametrization.

2.3. Model of the Manipulator with Structural and Parametric Uncertainty
In the case under consideration, a model with both parametric and structural uncertainty will be presented. The idea is to get the dynamics corresponding to all possible situations that can happen in practice. Suppose that the dynamics have parametric and structural uncertainties, i.e., the model has the form

𝑀(𝜃, 𝑎)𝜃̈ + 𝐶(𝜃, 𝜃̇, 𝑎)𝜃̇ + 𝐷(𝜃, 𝑎) + 𝛿 = 𝑀0(𝜃)𝜃̈ + 𝐶0(𝜃, 𝜃̇)𝜃̇ + 𝐷0(𝜃) + 𝑌2(𝜃, 𝜃̇, 𝜃̇, 𝜃̈)𝑎 + 𝛿 = 𝑢.    (4)
The symbol 𝛿 represents the unknown part of the model, e.g., forces not included in the dynamics equations. For further considerations, the following assumption has to be made.

Assumption 1. The unknown part of the dynamics 𝛿 is bounded for physical reasons, and an estimate of this bound for the structural uncertainty is known, i.e.,

‖𝛿‖ ≤ 𝐴 ∈ 𝑅₊.    (5)
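Before moving on, the linear-in-parameters form (2) can be illustrated on the simplest possible example: a single pendulum (chosen here purely for illustration, not a model used in this paper), for which 𝑚𝑙²𝜃̈ + 𝑚𝑔𝑙 sin 𝜃 = 𝑢, the Coriolis term vanishes, and 𝑌(𝜃, 𝜃̈) = (𝜃̈, sin 𝜃) with 𝑎 = (𝑚𝑙², 𝑚𝑔𝑙)ᵀ. The sketch below simply checks this identity numerically.

```python
import numpy as np

m, l, g = 2.0, 0.5, 9.81
a = np.array([m * l**2, m * g * l])        # unknown parameter vector of the pendulum

def regressor(theta, theta_ddot):
    """Y(theta, theta_ddot) such that Y @ a reproduces the pendulum dynamics (no Coriolis term for 1 DOF)."""
    return np.array([theta_ddot, np.sin(theta)])

theta, theta_ddot = 0.3, 1.2
u_model = m * l**2 * theta_ddot + m * g * l * np.sin(theta)   # left-hand side of the dynamics
u_regressor = regressor(theta, theta_ddot) @ a
print(np.isclose(u_model, u_regressor))                        # True
```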
3. Equation of Robot Motion Relative to a Path
A path-following task requires describing the robot's motion relative to an object moving along a curve. To obtain such a description, we will use the Serret-Frenet orthogonal parameterization. Using this parameterization, we get the equations that must be satisfied for the system to correctly follow the desired path.

3.1. Serret-Frenet Parametrization for 3D Curve
Let us consider the manipulator's movement along a given curve

𝑟(𝑠) = (𝑟1(𝑠), 𝑟2(𝑠), 𝑟3(𝑠))ᵀ    (6)

in three-dimensional space, as in Figure 1. Point 𝑀 describes the position of the manipulator's end-effector and can be defined by Cartesian coordinates 𝑝 = (𝑥, 𝑦, 𝑧)ᵀ expressed relative to the base body-fixed frame 𝑋0𝑌0𝑍0. At some distance 𝑠 calculated along the path, the Serret-Frenet frame 𝑄(𝑠) should be located. Parameter 𝑠 is the so-called curvilinear distance, which may be interpreted as the length of a string lying perfectly on the path. The Serret-Frenet frame is an orthonormal basis of 3-vectors: 𝑇(𝑠) – the unit tangent, 𝑁(𝑠) – the unit normal, and 𝐵(𝑠) – the unit binormal, defined as follows

𝑇 = d𝑟/d𝑠,    𝑁 = (d𝑇/d𝑠) / ‖d𝑇/d𝑠‖,    𝐵 = 𝑇 × 𝑁.    (7)

Vectors 𝑇(𝑠), 𝑁(𝑠), 𝐵(𝑠) are completely determined by the curvature 𝑐(𝑠) and torsion 𝜏(𝑠) of the three-dimensional curve as functions of 𝑠. The curvature of a plane curve at some point is equal to the inverse of the radius of the circle tangent to the curve at that point, and can be calculated from the definition as follows

𝑐(𝑠) = ‖ d/d𝑠 (d𝑟/d𝑠) ‖ = ‖ d²𝑟/d𝑠² ‖ = √( (d²𝑟1/d𝑠²)² + (d²𝑟2/d𝑠²)² + (d²𝑟3/d𝑠²)² ).    (8)

In turn, the torsion defines how much the curve swerves from the plane and is defined in the following way

𝜏(𝑠) = ‖d𝐵/d𝑠‖ = (1/𝑐²(𝑠)) ( d𝑟/d𝑠 × d²𝑟/d𝑠², d³𝑟/d𝑠³ ),    (9)

where (⋅, ⋅) denotes the dot product of two vectors. According to the definitions of curvature and torsion, the motion of the Serret-Frenet frame 𝑄(𝑠) = [𝑇(𝑠), 𝑁(𝑠), 𝐵(𝑠)] defined along a given path can be expressed by the Serret-Frenet matrix 𝐾(𝑠) equations (using the curvilinear distance 𝑠) as follows

d𝑄/d𝑠 = 𝑄(𝑠)𝐾(𝑠),    (10)

which can be rewritten in matrix form

( d𝑇/d𝑠; d𝑁/d𝑠; d𝐵/d𝑠 ) = [ 0, 𝑐(𝑠), 0;  −𝑐(𝑠), 0, 𝜏(𝑠);  0, −𝜏(𝑠), 0 ] ⋅ ( 𝑇; 𝑁; 𝐵 ) = 𝐾(𝑠) ⋅ ( 𝑇; 𝑁; 𝐵 ).    (11)

Figure 1. Illustration of the path tracking problem using the three-dimensional Serret-Frenet frame with orthogonal projection on a path (the end-effector point 𝑀 with position 𝑝 in the base frame 𝑋0𝑌0𝑍0, its orthogonal projection 𝑀′ on the path 𝑟, and the frame 𝑇, 𝑁, 𝐵 with the coordinates 𝑞2, 𝑞3 in the normal plane)

3.2. Equation of Robot Motion Relative to Path
Considering the manipulator's movement along a given curve, as presented in Figure 1, we can observe that the coordinates of point 𝑀 relative to the Serret-Frenet frame are equal to 𝑞 = (𝑞1, 𝑞2, 𝑞3)ᵀ, whereas in the normal plane (spanned by the 𝑁 and 𝐵 unit vectors) the position of the same point is defined by the coordinates (𝑞2, 𝑞3)ᵀ. To describe the robot's motion relative to the moving Serret-Frenet frame it is necessary to obtain the (𝑠, 𝑞2, 𝑞3)ᵀ coordinates. According to the above assumption, to locate point 𝑀 in the normal plane of the path, the following condition has to be satisfied

𝑝 − 𝑟 ⊥ 𝑇 ⟹ (𝑇, 𝑝 − 𝑟) = 0.    (12)

After making some more transformations, which are presented in detail in [7], the following equations describing the robot position relative to the moving Serret-Frenet frame were obtained

𝑠̇ = − (𝑇, 𝑝̇ − 𝑟̇) / (𝑐(𝑁, 𝑝 − 𝑟)),    (13)
𝑞̇2 = ( 𝑁 − (𝜏/𝑐)·((𝐵, 𝑝 − 𝑟)/(𝑁, 𝑝 − 𝑟))·𝑇, 𝑝̇ − 𝑟̇ ),    (14)
𝑞̇3 = ( 𝐵 + (𝜏/𝑐)𝑇, 𝑝̇ − 𝑟̇ ),    (15)
𝑇̇ = − ((𝑇, 𝑝̇ − 𝑟̇)/(𝑁, 𝑝 − 𝑟))·𝑁,    (16)
𝑁̇ = ((𝑇, 𝑝̇ − 𝑟̇)/(𝑁, 𝑝 − 𝑟))·( 𝑇 − (𝜏/𝑐)𝐵 ),    (17)
𝐵̇ = (𝜏/𝑐)·((𝑇, 𝑝̇ − 𝑟̇)/(𝑁, 𝑝 − 𝑟))·𝑁.    (18)
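The orthogonal projection condition (12) also gives a direct way to compute the path coordinates (𝑠, 𝑞2, 𝑞3) of a given end-effector position 𝑝: find the arc length at which (𝑇(𝑠), 𝑝 − 𝑟(𝑠)) = 0 and read 𝑞2, 𝑞3 off in the normal plane. The sketch below does this for an example circular helix with a coarse search followed by Newton iteration on 𝑔(𝑠) = (𝑇(𝑠), 𝑝 − 𝑟(𝑠)), whose derivative works out to 𝑐(𝑁, 𝑝 − 𝑟) − 1; the path, its parameters, and the function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

a_h, b_h = 1.0, 0.2
c_h = np.hypot(a_h, b_h)
curv = a_h / c_h**2                      # constant curvature c(s) of the example helix

def r(s):                                # example path: circular helix parametrized by arc length
    return np.array([a_h * np.cos(s / c_h), a_h * np.sin(s / c_h), b_h * s / c_h])

def frame(s):
    T = np.array([-a_h / c_h * np.sin(s / c_h), a_h / c_h * np.cos(s / c_h), b_h / c_h])
    N = np.array([-np.cos(s / c_h), -np.sin(s / c_h), 0.0])
    return T, N, np.cross(T, N)

def project(p, s_grid=np.linspace(0.0, 2 * np.pi * c_h, 400), tol=1e-10, max_iter=30):
    # coarse initialization at the closest sample, then Newton on g(s) = (T(s), p - r(s))
    s = s_grid[np.argmin([np.linalg.norm(p - r(si)) for si in s_grid])]
    for _ in range(max_iter):
        T, N, B = frame(s)
        d = p - r(s)
        g = T @ d
        if abs(g) < tol:
            break
        s -= g / (curv * (N @ d) - 1.0)  # g'(s) = c (N, p - r) - 1, nonzero close to the path
    T, N, B = frame(s)
    d = p - r(s)
    return s, N @ d, B @ d               # path coordinates (s, q2, q3)

p = r(1.3) + 0.05 * frame(1.3)[1] - 0.02 * frame(1.3)[2]   # a point offset in the normal plane
print(project(p))                                           # approx. (1.3, 0.05, -0.02)
```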
The above expressions are the point of departure to design control algorithms for three‐dimensional path tracking. The crucial point to note here is that vector 𝑝 = (𝑥, 𝑦, 𝑧)𝑇 describes the robot’s Cartesian position relative to base body‐ ixed frame 𝑋0 𝑌0 𝑍0 , vec‐ tor 𝑟 = (𝑟1 , 𝑟2 , 𝑟3 )𝑇 describes the given path in 𝑅3 rela‐ tive to the same base body‐ ixed frame and (𝑠, 𝑞2 , 𝑞3 )𝑇 are coordinates of the robot relative to the path. 3.3. Description of Manipulator Moving Along the Curve The Equations (13,18) can be rewritten as follows 𝑠̇
=
𝑞̇ 2
= =
𝑞̇ 3
=
(𝑇, 𝑝̇ − 𝑟)̇ = 𝑃1 𝑝̇ + 𝑅1 , 𝑐(𝑁, 𝑝 − 𝑟) 𝜏 (𝐵, 𝑝 − 𝑟) 𝑁− 𝑇, 𝑝̇ − 𝑟̇ 𝑐 (𝑁, 𝑝 − 𝑟) 𝑃2 𝑝̇ + 𝑅2 , 𝜏 𝐵 + 𝑇, 𝑝̇ − 𝑟̇ = 𝑃3 𝑝̇ + 𝑅3 , 𝑐
−
(19)
(20) (21)
with the elements equal to

𝑃1 = −𝑇^𝑇 / (𝑐(𝑁, 𝑝 − 𝑟)),
𝑃2 = (𝑁 − (𝜏(𝐵, 𝑝 − 𝑟))/(𝑐(𝑁, 𝑝 − 𝑟)) 𝑇)^𝑇,
𝑃3 = (𝐵 + (𝜏/𝑐) 𝑇)^𝑇,
𝑅1 = (𝑇, ṙ)/(𝑐(𝑁, 𝑝 − 𝑟)),
𝑅2 = −(𝑁 − (𝜏(𝐵, 𝑝 − 𝑟))/(𝑐(𝑁, 𝑝 − 𝑟)) 𝑇, ṙ),
𝑅3 = −(𝐵 + (𝜏/𝑐) 𝑇, ṙ).

Let us introduce the following notation

𝜉 = (𝑠, 𝑞2, 𝑞3)^𝑇,  𝑃 = [𝑃1; 𝑃2; 𝑃3] (the matrix with rows 𝑃1, 𝑃2, 𝑃3),  𝑅 = (𝑅1, 𝑅2, 𝑅3)^𝑇.

Then, equations (19)–(21) can be expressed in the matrix form as below

𝜉̇ = 𝑃ṗ + 𝑅. (22)

It is easy to see that the Cartesian coordinates 𝑝 of the end‐effector are functions of the joint variables, given by the manipulator's kinematics

𝑝 = 𝑘(𝜃), (23)

so ṗ depends on the joint velocities in the following manner

ṗ = (∂𝑘/∂𝜃) θ̇ = 𝐽(𝜃)θ̇, (24)

where 𝐽(𝜃) is the Jacobi matrix for the position coordinates. Substituting (24) into (22) we obtain the expression

𝜉̇ = 𝑃𝐽θ̇ + 𝑅, (25)

where the signal θ̇ plays the role of a control input.
4. Control Problem Statement
As was mentioned in the introduction, this paper addresses the following control problem: a fixed‐base manipulator with parametric or structural uncertainty in the dynamics should follow a desired smooth path defined in 𝑅³ space. The essential issue to note here is that the complete mathematical description of the manipulator relative to the desired curve in 𝑅³ has a cascaded structure consisting of two groups of equations:
‐ kinematics (25) – the description of the robot motion relative to the path (playing the role of constraints), and
‐ dynamics (1).
Because the model structure is a cascade, it is necessary to use a control method intended for cascaded systems, i.e., the integrator backstepping method [4]. Therefore, the structure of the path‐following controller is divided into two parts due to a backstepping‐like procedure, as in Figure 2:
‐ Kinematic controller θ̇𝑟𝑒𝑓 – represents a vector of embedded control inputs which would ensure the realization of the geometric path tracking task if the dynamics were not present. It generates a 'velocity profile', executable in practice, to follow the desired curve in 𝑅³.
‐ Dynamic controller – as a consequence of the cascaded structure of the system model, the system's velocities cannot be commanded directly, as is assumed in the design of the kinematic controller; instead, they must be realized as the output of the dynamics driven by 𝑢.
Using these two controllers working simultaneously, it is possible to solve the presented control problem for the manipulator.

Figure 2. Structure of the proposed control algorithm

5. Kinematic Control Algorithm
To ensure that the Jacobi matrix 𝐽(𝜃) is invertible, we assume that the manipulator is non‐redundant and that the desired path does not pass through singular configurations of the manipulator. The matrix 𝑃 is invertible if the following condition is satisfied

det 𝑃 = −1/(𝑐(𝑠)(𝑁, 𝑝 − 𝑟)) ≠ 0. (26)
Cases where det 𝑃 = 0 are called singularities of the Serret‐Frenet orthogonal parametrization, e.g.,
‐ 𝑐(𝑠) = 0 – the curvature of the curve is equal to 0, i.e., the curve is in fact a straight line,
‐ 𝑝 − 𝑟 = 0 – at the end of the regulation process the gripper is located strictly on the path.
The weak side of the Serret‐Frenet orthogonal parametrization is the fact that it does not allow straight lines to be traced. This problem can be solved in another way, namely by moving the singularity to another place, not necessarily lying on the path. The singularity shifting procedure for orthogonal parametrizations has been presented in the paper [1].
According to the above remark, the following kinematic control algorithm can be proposed

θ̇𝑟𝑒𝑓 = 𝐽⁻¹𝑃⁻¹(𝜉̇𝑑 − 𝐾𝑝𝑒𝜉 − 𝑅),  𝑒𝜉 = 𝜉 − 𝜉𝑑, (27)

with a positive definite regulation matrix 𝐾𝑝 > 0. The vector 𝜉𝑑 = (𝑠𝑑(𝑡), 𝑞2𝑑, 𝑞3𝑑)^𝑇, where usually 𝑞2𝑑 = 0, 𝑞3𝑑 = 0, and 𝑠𝑑(𝑡) – the desired path parametrization (dependency on time) – can be an arbitrary function, depending on the designer's choice. The signal θ̇𝑟𝑒𝑓 is the proposed velocity of the robot's joints, i.e., the 'velocity profile' coming from the kinematic controller – the motion planning subsystem. Such a velocity has to be realized on the dynamic level.

5.1. Proof of the Convergence
After substituting the kinematic control law (27) into the constraint equations (25), we get the kinematic closed‐loop system in the form

𝜉̇ = 𝑃𝐽𝐽⁻¹𝑃⁻¹(𝜉̇𝑑 − 𝐾𝑝𝑒𝜉 − 𝑅) + 𝑅 = 𝜉̇𝑑 − 𝐾𝑝𝑒𝜉, (28)

or equivalently, after moving everything to one side,

𝜉̇ − 𝜉̇𝑑 + 𝐾𝑝𝑒𝜉 = 0,  𝑒̇𝜉 + 𝐾𝑝𝑒𝜉 = 0. (29)

Let us propose the following Lyapunov‐like function

𝑉1(𝑒𝜉) = ½ 𝑒𝜉^𝑇𝑒𝜉. (30)

The time derivative of this function calculated along the trajectories of the closed‐loop system (29) is equal to

𝑉̇1 = 𝑒𝜉^𝑇𝑒̇𝜉 = 𝑒𝜉^𝑇(−𝐾𝑝𝑒𝜉) = −𝑒𝜉^𝑇𝐾𝑝𝑒𝜉 ≤ 0. (31)

From LaSalle's invariance principle [4], we know that the equilibrium point of the closed‐loop system (29) is 𝑒𝜉 = 0. This ends the proof of asymptotic stability of the kinematic controller.

6. Dynamic Controller – Full Knowledge of the Manipulator's Dynamics
When the manipulator's dynamics are fully known, the following non‐adaptive version of the dynamic control law is proposed:

𝑢 = 𝑀(𝜃)θ̈𝑟𝑒𝑓 + 𝐶(𝜃, θ̇)θ̇𝑟𝑒𝑓 + 𝐷(𝜃) − 𝐾𝑑𝑒̇𝜃. (32)

In the above equation θ̇𝑟𝑒𝑓 is the control signal (27) coming from the kinematic controller, 𝐾𝑑 is a positive definite matrix of regulation parameters, and 𝑒̇𝜃 = θ̇ − θ̇𝑟𝑒𝑓.

6.1. Proof of the Convergence
After substituting the non‐adaptive control law (32) into the model of the fully known dynamics (1), we obtain the closed‐loop system

𝑀(𝜃)𝑒̈𝜃 + 𝐶(𝜃, θ̇)𝑒̇𝜃 + 𝐾𝑑𝑒̇𝜃 = 0. (33)

Let us propose the following Lyapunov‐like function

𝑉2(𝑒𝜉, 𝑒̇𝜃) = 𝑉1(𝑒𝜉) + ½ 𝑒̇𝜃^𝑇𝑀(𝜃)𝑒̇𝜃. (34)

The time derivative of this function calculated along the trajectories of the closed‐loop systems (33) and (29) is equal to

𝑉̇2 = 𝑉̇1 + 𝑒̇𝜃^𝑇𝑀(𝜃)𝑒̈𝜃 + ½ 𝑒̇𝜃^𝑇𝑀̇(𝜃)𝑒̇𝜃 = 𝑉̇1 + 𝑒̇𝜃^𝑇(−𝐶(𝜃, θ̇)𝑒̇𝜃 − 𝐾𝑑𝑒̇𝜃) + ½ 𝑒̇𝜃^𝑇𝑀̇(𝜃)𝑒̇𝜃.

Using the skew‐symmetry between the inertia and Coriolis matrices, we get

𝑉̇2 = −𝑒𝜉^𝑇𝐾𝑝𝑒𝜉 − 𝑒̇𝜃^𝑇𝐾𝑑𝑒̇𝜃 ≤ 0. (35)

Again, from LaSalle's invariance principle, it can be concluded that the equilibrium point of the control algorithm is (𝑒𝜉, 𝑒̇𝜃) = (0, 0). This ends the proof of asymptotic stability of the non‐adaptive control algorithm for the cascaded system (25)–(1).
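A minimal sketch of the two cascade stages, the kinematic law (27) and the non‐adaptive dynamic law (32), is given below (Python/NumPy, an assumption of the illustration). The matrices 𝑀, 𝐶, 𝐷, 𝐽, 𝑃, 𝑅 and the reference acceleration θ̈𝑟𝑒𝑓 (obtained in practice by differentiating (27)) are assumed to be supplied externally; invertibility of 𝐽 and 𝑃 is assumed as in the text.

```python
import numpy as np

def kinematic_controller(J, P, R, xi, xi_d, xi_d_dot, Kp):
    """Velocity profile of Eq. (27): theta_dot_ref = J^-1 P^-1 (xi_d_dot - Kp e_xi - R)."""
    e_xi = xi - xi_d
    return np.linalg.solve(J, np.linalg.solve(P, xi_d_dot - Kp @ e_xi - R))

def dynamic_controller(M, C, D, theta_dot, theta_dot_ref, theta_ddot_ref, Kd):
    """Non-adaptive dynamic control law of Eq. (32)."""
    e_theta_dot = theta_dot - theta_dot_ref
    return M @ theta_ddot_ref + C @ theta_dot_ref + D - Kd @ e_theta_dot
```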
7. Adaptive Dynamic Controller – Parametric Uncertainty of the Manipulator's Dynamics
The parametric uncertainty applies only to the dynamics model, and the unknown parameters are most often related to the payload being carried. In order to design an adaptive control algorithm that performs the path‐following task, it is necessary to [12]:
‐ assume that an estimate â(𝑡) of the unknown parameters is available at any time,
‐ find a control algorithm 𝑢 that uses the current estimates of the unknown parameters,
‐ find an algorithm for estimating the unknown parameters, and
‐ prove the asymptotic stability of the proposed adaptive control system consisting of a control subsystem and a subsystem for estimating the unknown parameters.
Since the lack of knowledge of the dynamics parameters of the object does not affect the kinematic level, the considerations presented in the previous sections regarding the control structure and the kinematic controller remain valid. The kinematic controller generating the velocity profile is therefore still given by equation (27). Due to the presence of the unknown parameters 𝑎, it is necessary to use an adaptive version of the algorithm (32), preferably in the version (3) for partial parametrization of the model. When the full model of the manipulator is unknown, it is crucial to design a proper dynamic controller. As mentioned before, in this article the parametric uncertainty, where a certain number of model parameters is unknown, is considered. Parametric uncertainty applies only to the dynamics model and does not affect the kinematic controller.
For the realization of the motion 𝜉𝑑(𝑡) (path tracking with a time regime), the following dynamic adaptive control algorithm can be proposed

𝑢 = 𝑀0(𝜃)θ̈𝑟𝑒𝑓 + 𝐶0(𝜃, θ̇)θ̇𝑟𝑒𝑓 + 𝐷0(𝜃) + 𝑌1(𝜃, θ̇, θ̇𝑟𝑒𝑓, θ̈𝑟𝑒𝑓)â(𝑡) − 𝐾𝑑𝑒̇𝜃, (36)

where 𝑀0, 𝐶0, 𝐷0 represent the known parts of the model, 𝑌1 ⋅ â(𝑡) is the unknown one, and 𝐾𝑑 is a positive definite regulation matrix. 𝑌1 is the so‐called regression matrix, and â(𝑡) is a vector of time‐varying estimates of the unknown coefficients of the robot model, generated by the so‐called adaptation law. These estimates are calculated as follows

â̇(𝑡) = ã̇(𝑡) = −Γ𝑌1^𝑇(𝜃, θ̇, θ̇𝑟𝑒𝑓, θ̈𝑟𝑒𝑓)𝑒̇𝜃, (37)

where Γ = Γ^𝑇 > 0 is a positive definite matrix of adaptation gains and ã(𝑡) = â(𝑡) − 𝑎 is the vector of parameter errors. The vector 𝑎 represents the unknown but constant real parameters of the dynamics.
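The adaptive law (36) together with the adaptation rule (37) can be sketched as follows (Python/NumPy, illustrative only). The regression matrix 𝑌1 is problem‐specific and is assumed to be supplied by the user, and the explicit‐Euler discretization of (37) is an assumption of this sketch.

```python
import numpy as np

def adaptive_dynamic_controller(M0, C0, D0, Y1, a_hat, theta_dot, theta_dot_ref,
                                theta_ddot_ref, Kd):
    """Adaptive control law of Eq. (36); Y1 is the regression matrix."""
    e_theta_dot = theta_dot - theta_dot_ref
    u = M0 @ theta_ddot_ref + C0 @ theta_dot_ref + D0 + Y1 @ a_hat - Kd @ e_theta_dot
    return u, e_theta_dot

def adaptation_step(a_hat, Y1, e_theta_dot, Gamma, dt):
    """One explicit-Euler step of the adaptation law (37): a_hat_dot = -Gamma Y1^T e_theta_dot."""
    return a_hat - dt * (Gamma @ Y1.T @ e_theta_dot)
```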
7.1. Proof of the Convergence
In the case of partial parametrization of the model, the dynamics equation of the manipulator takes the form

𝑀0(𝜃)θ̈ + 𝐶0(𝜃, θ̇)θ̇ + 𝐷0(𝜃) + 𝑌1(𝜃, θ̇, θ̇, θ̈)𝑎 = 𝑢, (38)

where 𝑀0, 𝐶0, 𝐷0 represent the known parts of the model and 𝑌1(𝜃, θ̇, θ̇, θ̈)𝑎 = 𝑀1(𝜃, 𝑎)θ̈ + 𝐶1(𝜃, θ̇, 𝑎)θ̇ + 𝐷1(𝜃, 𝑎) represents the unknown parts. The equations of the system (38) with the feedback loop (36) closed can be expressed as

𝑀0(θ̈ − θ̈𝑟𝑒𝑓) + 𝐶0(θ̇ − θ̇𝑟𝑒𝑓) + 𝑌1(𝜃, θ̇, θ̇, θ̈)𝑎 − 𝑌1(𝜃, θ̇, θ̇𝑟𝑒𝑓, θ̈𝑟𝑒𝑓)â + 𝐾𝑑𝑒̇𝜃 = 0 (39)

(the known term 𝐷0(𝜃) cancels out), which after transformation gives the following expression

𝑀0𝑒̈𝜃 + 𝐶0𝑒̇𝜃 + (𝑌1𝑎 − 𝑌1𝑟𝑎) + (𝑌1𝑟𝑎 − 𝑌1𝑟â) + 𝐾𝑑𝑒̇𝜃 = 0. (40)

Then, considering that 𝑌1𝑎 = 𝑀1(𝜃, 𝑎)θ̈ + 𝐶1(𝜃, θ̇, 𝑎)θ̇ + 𝐷1(𝜃, 𝑎) and 𝑌1𝑟𝑎 = 𝑀1(𝜃, 𝑎)θ̈𝑟𝑒𝑓 + 𝐶1(𝜃, θ̇, 𝑎)θ̇𝑟𝑒𝑓 + 𝐷1(𝜃, 𝑎), the equation (40) can be converted to the form

𝑀0𝑒̈𝜃 + 𝐶0𝑒̇𝜃 + 𝑀1(𝜃, 𝑎)θ̈ + 𝐶1(𝜃, θ̇, 𝑎)θ̇ + 𝐷1(𝜃, 𝑎) − (𝑀1(𝜃, 𝑎)θ̈𝑟𝑒𝑓 + 𝐶1(𝜃, θ̇, 𝑎)θ̇𝑟𝑒𝑓 + 𝐷1(𝜃, 𝑎)) + 𝑌1𝑟(𝑎 − â) + 𝐾𝑑𝑒̇𝜃 = 0. (41)

Assuming that ã = â − 𝑎 is the error of parameter estimation, and making the necessary transformations, the equation (41) can be written as

(𝑀0 + 𝑀1)𝑒̈𝜃 + (𝐶0 + 𝐶1)𝑒̇𝜃 + 𝐾𝑑𝑒̇𝜃 = 𝑌1𝑟ã, (42)

which, taking into account the dependence 𝑀0 + 𝑀1 = 𝑀(𝜃, 𝑎) = 𝑀(𝜃) and 𝐶0 + 𝐶1 = 𝐶(𝜃, θ̇, 𝑎) = 𝐶(𝜃, θ̇), leads to the equation

𝑀(𝜃)𝑒̈𝜃 + 𝐶(𝜃, θ̇)𝑒̇𝜃 + 𝐾𝑑𝑒̇𝜃 = 𝑌1𝑟ã. (43)

For the system with the closed feedback loop (29), (43), the Lyapunov‐like function of the form

𝑉3(𝑒𝜉, 𝑒̇𝜃, ã) = ½ 𝑒̇𝜃^𝑇𝑀(𝜃)𝑒̇𝜃 + ½ ã^𝑇Γ⁻¹ã + 𝑉1(𝑒𝜉) (44)

was proposed, where the first term is the same as in the case of the non‐adaptive version of the algorithm (34), and 𝑉1(𝑒𝜉) given by the equation (30) is the Lyapunov function for the kinematic controller (the first stage of the cascade), performing the path following task. The time derivative of the proposed Lyapunov function calculated along the trajectories of the system (43) is equal to

𝑉̇3 = 𝑉̇1 + 𝑒̇𝜃^𝑇𝑀(𝜃)𝑒̈𝜃 + ½ 𝑒̇𝜃^𝑇𝑀̇(𝜃)𝑒̇𝜃 + ã^𝑇Γ⁻¹ã̇ = 𝑉̇1 + 𝑒̇𝜃^𝑇(𝑌1𝑟ã − 𝐶(𝜃, θ̇)𝑒̇𝜃 − 𝐾𝑑𝑒̇𝜃) + ½ 𝑒̇𝜃^𝑇𝑀̇(𝜃)𝑒̇𝜃 + ã^𝑇Γ⁻¹ã̇. (45)

The derivative of the estimation error ã is ã̇ = â̇ − ȧ, which, assuming that 𝑎 is constant in time, leads to the equality ã̇ = â̇. The estimated parameter â is calculated according to the adaptation law given by the equation (37). The derivative (45) of the Lyapunov function, using the skew‐symmetry between the inertia and Coriolis matrices, can then be rearranged to the following form

𝑉̇3 = 𝑉̇1 + 𝑒̇𝜃^𝑇(𝑌1𝑟ã − 𝐶(𝜃, θ̇)𝑒̇𝜃 − 𝐾𝑑𝑒̇𝜃) + ½ 𝑒̇𝜃^𝑇𝑀̇(𝜃)𝑒̇𝜃 + ã^𝑇Γ⁻¹(−Γ𝑌1𝑟^𝑇𝑒̇𝜃) = 𝑉̇1 + 𝑒̇𝜃^𝑇𝑌1𝑟ã − 𝑒̇𝜃^𝑇𝐾𝑑𝑒̇𝜃 − ã^𝑇𝑌1𝑟^𝑇𝑒̇𝜃. (46)

The fourth term is a scalar, so it can be transposed with the same result:

𝑉̇3 = 𝑉̇1 + 𝑒̇𝜃^𝑇𝑌1𝑟ã − 𝑒̇𝜃^𝑇𝐾𝑑𝑒̇𝜃 − ã^𝑇𝑌1𝑟^𝑇𝑒̇𝜃 = 𝑉̇1 + 𝑒̇𝜃^𝑇𝑌1𝑟ã − 𝑒̇𝜃^𝑇𝐾𝑑𝑒̇𝜃 − 𝑒̇𝜃^𝑇𝑌1𝑟ã = −𝑒̇𝜃^𝑇𝐾𝑑𝑒̇𝜃 − 𝑒𝜉^𝑇𝐾𝑝𝑒𝜉 ≤ 0. (47)
As can be seen, the derivative of the Lyapunov‐like function is negative semidefinite. From LaSalle's invariance principle it follows that (𝑒̇𝜃, 𝑒𝜉) = (0, 0) is an invariant set to which the trajectories of the system with the closed feedback loop converge asymptotically. This ends the proof of the convergence of the adaptive integrator backstepping algorithm for both stages of the cascade.
It is worth mentioning that if all parameters are unknown (full parametrization of the model), it is enough to assume 𝑀0(𝜃) = 𝐶0(𝜃, θ̇) = 𝐷0(𝜃) = 0 in the adaptive dynamic control algorithm (36).
8. Robust Dynamic Controller – Structural and Parametric Uncertainty of the Manipulator's Dynamics
As mentioned earlier in the article, in addition to parametric uncertainty, structural uncertainty may also occur in the dynamics model. This situation may arise when some interaction has not been taken into account in the equations of dynamics. In practice, this means that the dynamic model does not capture some interaction that occurs in the real system. Often, structural uncertainty also appears in a system that has not been fully identified. This corresponds to the dynamics model (4), which includes not only structural but also parametric uncertainty

𝑀(𝜃, 𝑎)θ̈ + 𝐶(𝜃, θ̇, 𝑎)θ̇ + 𝐷(𝜃, 𝑎) + 𝛿 = 𝑀0(𝜃)θ̈ + 𝐶0(𝜃, θ̇)θ̇ + 𝐷0(𝜃) + 𝑌2(𝜃, θ̇, θ̇, θ̈)𝑎 + 𝛿 = 𝑢. (48)

For the above model of the dynamics, and for Assumption 1 fulfilled, let us propose the robust dynamic control law as follows

𝑢 = 𝑀0(𝜃)θ̈𝑟𝑒𝑓 + 𝐶0(𝜃, θ̇)θ̇𝑟𝑒𝑓 + 𝐷0(𝜃) + 𝑌2(𝜃, θ̇, θ̇𝑟𝑒𝑓, θ̈𝑟𝑒𝑓)â(𝑡) − 𝐾𝑑𝑒̇𝜃 − 𝐾𝑠 sign 𝑒̇𝜃, (49)

where 𝐾𝑠 > 0, 𝐾𝑠 ∈ 𝑅+, is the regulation parameter of the additional switching regulator. If there is also parametric uncertainty in the model (𝑌2 ≠ 0), then the law for estimating the unknown parameters is given by the same formula as in the adaptive case, i.e., (37).

8.1. Proof of the Convergence
In the case of structural and parametric uncertainty (partial parametrization of the model), the dynamics equation of the manipulator takes the form

𝑀0(𝜃)θ̈ + 𝐶0(𝜃, θ̇)θ̇ + 𝐷0(𝜃) + 𝑌2(𝜃, θ̇, θ̇, θ̈)𝑎 + 𝛿 = 𝑢, (50)

where 𝑀0, 𝐶0, 𝐷0 represent the known parts of the model, 𝑌2(𝜃, θ̇, θ̇, θ̈)𝑎 = 𝑀1(𝜃, 𝑎)θ̈ + 𝐶1(𝜃, θ̇, 𝑎)θ̇ + 𝐷1(𝜃, 𝑎) represents the parts of the model with parametric uncertainty, and 𝛿 ∈ 𝑅ⁿ represents the forces which occur in real conditions but are not included in the model. It is worth mentioning that the regression matrix 𝑌2 – although it depends on the unknown parameters 𝑎 – differs from the matrix 𝑌1, because 𝑌1 should include parameters derived from all forces, while 𝑌2 contains parameters derived only from the forces included in the model, without 𝛿.
After transforming the equations similarly to Section 7.1, the equations of the system (50) with the closed feedback loop (49) are obtained as below

𝑀(𝜃)𝑒̈𝜃 + 𝐶(𝜃, θ̇)𝑒̇𝜃 + 𝐾𝑑𝑒̇𝜃 + 𝐾𝑠 sign 𝑒̇𝜃 + 𝛿 = 𝑌2𝑟ã. (51)

For the system with the closed feedback loop (29), (51), the Lyapunov‐like function of the form

𝑉4(𝑒𝜉, 𝑒̇𝜃, ã) = ½ 𝑒̇𝜃^𝑇𝑀(𝜃)𝑒̇𝜃 + ½ ã^𝑇Γ⁻¹ã + 𝑉1(𝑒𝜉) (52)

was proposed, where the first term is the same as in the case of the non‐adaptive version of the algorithm (34), and 𝑉1(𝑒𝜉) given by the equation (30) is the Lyapunov‐like function for the kinematic controller following the path. The time derivative of 𝑉4 calculated along the trajectories of the closed‐loop system (29), (37), (51) is equal to

𝑉̇4 = 𝑉̇1 + 𝑒̇𝜃^𝑇𝑀(𝜃)𝑒̈𝜃 + ½ 𝑒̇𝜃^𝑇𝑀̇(𝜃)𝑒̇𝜃 + ã^𝑇Γ⁻¹ã̇
  = 𝑉̇1 + 𝑒̇𝜃^𝑇(𝑌2𝑟ã − 𝐶(𝜃, θ̇)𝑒̇𝜃 − 𝐾𝑑𝑒̇𝜃 − 𝐾𝑠 sign 𝑒̇𝜃 − 𝛿) + ½ 𝑒̇𝜃^𝑇𝑀̇(𝜃)𝑒̇𝜃 + ã^𝑇Γ⁻¹(−Γ𝑌2𝑟^𝑇𝑒̇𝜃). (53)

Using the skew‐symmetry between the inertia matrix and the Coriolis matrix, i.e., 𝑀̇(𝜃) = 𝐶(𝜃, θ̇) + 𝐶^𝑇(𝜃, θ̇), the time derivative of the Lyapunov‐like function can be simplified as follows

𝑉̇4 = 𝑉̇1 + 𝑒̇𝜃^𝑇𝑌2𝑟ã − 𝑒̇𝜃^𝑇𝐾𝑑𝑒̇𝜃 − 𝑒̇𝜃^𝑇𝛿 − 𝑒̇𝜃^𝑇𝐾𝑠 sign 𝑒̇𝜃 − ã^𝑇𝑌2𝑟^𝑇𝑒̇𝜃. (54)

The second and the last terms on the right‐hand side of the above equation cancel; therefore, we can write

𝑉̇4 = 𝑉̇1 − 𝑒̇𝜃^𝑇𝐾𝑑𝑒̇𝜃 − 𝑒̇𝜃^𝑇𝛿 − 𝑒̇𝜃^𝑇𝐾𝑠 sign 𝑒̇𝜃 = 𝑉̇1 − 𝑒̇𝜃^𝑇𝐾𝑑𝑒̇𝜃 − 𝑒̇𝜃^𝑇 sign 𝑒̇𝜃 (𝐾𝑠 + 𝛿 sign 𝑒̇𝜃) = 𝑉̇1 − 𝑒̇𝜃^𝑇𝐾𝑑𝑒̇𝜃 − |𝑒̇𝜃|(𝐾𝑠 + 𝛿 sign 𝑒̇𝜃).

If 𝐾𝑠 ≥ ‖𝐴‖ + 𝜀, 𝜀 ∈ 𝑅+, where ‖𝐴‖ is the upper bound of the norm of the unknown force acting on the dynamical object (the structural uncertainty of the dynamics), due to Assumption 1, then the following estimate holds

𝑉̇4 ≤ 𝑉̇1 − 𝑒̇𝜃^𝑇𝐾𝑑𝑒̇𝜃 − 𝜀|𝑒̇𝜃| = −𝑒𝜉^𝑇𝐾𝑝𝑒𝜉 − 𝑒̇𝜃^𝑇𝐾𝑑𝑒̇𝜃 − 𝜀|𝑒̇𝜃| = −𝑊(𝑒𝜉, 𝑒̇𝜃) ≤ 0. (55)

By LaSalle's invariance principle, it can be concluded that 𝑊(𝑒𝜉, 𝑒̇𝜃) = 0 defines an invariant set to which the trajectories of the closed‐loop system converge asymptotically. It means that the invariant set is equal to the asymptotic equilibrium point (𝑒𝜉, 𝑒̇𝜃) = (0, 0). This ends the proof.
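The robust law (49) differs from the adaptive law (36) only by the additional switching term; a minimal sketch (Python/NumPy, illustrative) is given below. The scalar gain 𝐾𝑠 is assumed to dominate the disturbance bound, as required by Assumption 1 and the estimate (55).

```python
import numpy as np

def robust_dynamic_controller(M0, C0, D0, Y2, a_hat, theta_dot, theta_dot_ref,
                              theta_ddot_ref, Kd, Ks):
    """Robust control law of Eq. (49): adaptive feedforward plus a switching term."""
    e_theta_dot = theta_dot - theta_dot_ref
    u = (M0 @ theta_ddot_ref + C0 @ theta_dot_ref + D0 + Y2 @ a_hat
         - Kd @ e_theta_dot - Ks * np.sign(e_theta_dot))
    return u
```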
9. Simulation Study
Simulations were run with the MATLAB package and the SIMULINK toolbox. The object of the simulations was the RTR manipulator with three degrees of freedom, presented in Figure 3. The links of the RTR manipulator have been modelled as homogeneous sticks with lengths equal to 𝑙2 = 0.9 m and 𝑙3 = 1 m and masses 𝑚2 = 20 kg and 𝑚3 = 20 kg. The dynamics of the RTR manipulator are given by (1) with the elements equal to:
‐ inertia matrix

𝑀(𝜃) = [[𝑀11, 0, 0], [0, 𝑀22, 𝑀23], [0, 𝑀23, 𝑀33]], (56)

𝑀11 = ⅓𝑚2𝑙2² + 𝑚3(𝑙2² + ⅓𝑙3² cos²𝜃3 + 𝑙2𝑙3 cos 𝜃3),
𝑀22 = 𝑚2 + 𝑚3,
𝑀23 = ½𝑚3𝑙2𝑙3 cos 𝜃3,
𝑀33 = ⅓𝑚3𝑙3²,

‐ matrix of Coriolis and centrifugal forces

𝐶(𝜃, θ̇) = [[𝐶11, 0, 𝐶13], [0, 0, 𝐶23], [𝐶31, 0, 0]], (57)

𝐶11 = θ̇3(−½𝑚3𝑙2𝑙3 sin 𝜃3 − ⅓𝑚3𝑙3² sin 𝜃3 cos 𝜃3),
𝐶13 = −θ̇1(½𝑚3𝑙2𝑙3 sin 𝜃3 + ⅓𝑚3𝑙3² sin 𝜃3 cos 𝜃3),
𝐶23 = −½θ̇3𝑚3𝑙2𝑙3 sin 𝜃3,
𝐶31 = θ̇1(½𝑚3𝑙2𝑙3 sin 𝜃3 + ⅓𝑚3𝑙3² sin 𝜃3 cos 𝜃3),

‐ gravity vector

𝐷(𝜃) = (0, (𝑚2 + 𝑚3)𝑔, ½𝑔𝑚3𝑙3 cos 𝜃3)^𝑇. (58)

The Cartesian position of the end‐effector for the given manipulator can be expressed as

𝑝 = (𝑥, 𝑦, 𝑧)^𝑇 = (cos 𝜃1(𝑙3 cos 𝜃3 + 𝑙2), sin 𝜃1(𝑙3 cos 𝜃3 + 𝑙2), 𝑙3 sin 𝜃3 + 𝜃2)^𝑇,

then the Jacobi matrix has the form

𝐽(𝜃) = [[−sin 𝜃1(𝑙3 cos 𝜃3 + 𝑙2), 0, −cos 𝜃1 sin 𝜃3 𝑙3], [cos 𝜃1(𝑙3 cos 𝜃3 + 𝑙2), 0, −sin 𝜃1 sin 𝜃3 𝑙3], [0, 1, cos 𝜃3 𝑙3]]. (59)

The goal of the simulations was to investigate the behaviour of this rigid fixed‐base manipulator with parametric uncertainty using the controllers (27) and (37) proposed in the paper. The simulation was conducted for linear and square path parametrizations 𝑠𝑑 under the assumption that one or two parameters of the dynamics are unknown. A screw curve has been chosen as the desired path:

𝑟(𝑠) = (𝑟1(𝑠), 𝑟2(𝑠), 𝑟3(𝑠))^𝑇 = (cos(𝑠/√2), sin(𝑠/√2), 𝑠/√2)^𝑇. (60)

The vectors 𝑇, 𝑁, and 𝐵 have been selected as below

𝑇(𝑠) = (1/√2)(−sin(𝑠/√2), cos(𝑠/√2), 1)^𝑇,
𝑁(𝑠) = (−cos(𝑠/√2), −sin(𝑠/√2), 0)^𝑇,
𝐵(𝑠) = (1/√2)(sin(𝑠/√2), −cos(𝑠/√2), 1)^𝑇,

and the path parameters as 𝑐(𝑠) = 1/2, 𝜏(𝑠) = 1/2.

Figure 3. Manipulator RTR – the object of simulation [6]

9.1. Linear Time Parametrization
For the path following, the linear time dependency 𝑠𝑑(𝑡) was chosen

𝜉𝑑(𝑡) = (𝑠𝑑, 𝑞2𝑑, 𝑞3𝑑)^𝑇(𝑡) = (𝑡/10, 0, 0)^𝑇. (61)

Due to the fact that, for the control law (27), the matrix 𝐽(𝜃)⁻¹ is required, we assume that it is possible to avoid all singularities in the robot's joint space if the manipulator can realize the motion from the initial configuration to the desired task without the necessity of passing through a singular configuration. Such a case is presented in this simulation study. Simulations have been done for cases where one or two parameters of the manipulator are unknown. As a consequence, it is necessary to rewrite the model of dynamics as mentioned in Section 2.2.
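For reference, the model terms (56)–(58) and the kinematics (59) of the RTR manipulator can be coded as in the sketch below (Python/NumPy, illustrative; the value 𝑔 = 9.81 m/s² is an assumption, as the paper does not state it explicitly).

```python
import numpy as np

def rtr_model(theta, theta_dot, m2=20.0, m3=20.0, l2=0.9, l3=1.0, g=9.81):
    """Inertia matrix (56), Coriolis matrix (57), gravity vector (58),
    end-effector position and Jacobian (59) of the RTR manipulator."""
    t1, t2, t3 = theta
    dt1, _, dt3 = theta_dot
    c3, s3 = np.cos(t3), np.sin(t3)

    M = np.array([
        [m2*l2**2/3 + m3*(l2**2 + l3**2*c3**2/3 + l2*l3*c3), 0.0, 0.0],
        [0.0, m2 + m3, m3*l2*l3*c3/2],
        [0.0, m3*l2*l3*c3/2, m3*l3**2/3],
    ])
    h = m3*l2*l3*s3/2 + m3*l3**2*s3*c3/3          # common factor in C11, C13, C31
    C = np.array([
        [-dt3*h, 0.0, -dt1*h],
        [0.0, 0.0, -dt3*m3*l2*l3*s3/2],
        [dt1*h, 0.0, 0.0],
    ])
    D = np.array([0.0, (m2 + m3)*g, g*m3*l3*c3/2])

    p = np.array([np.cos(t1)*(l3*c3 + l2), np.sin(t1)*(l3*c3 + l2), l3*s3 + t2])
    J = np.array([
        [-np.sin(t1)*(l3*c3 + l2), 0.0, -np.cos(t1)*s3*l3],
        [ np.cos(t1)*(l3*c3 + l2), 0.0, -np.sin(t1)*s3*l3],
        [0.0, 1.0, c3*l3],
    ])
    return M, C, D, p, J
```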
One unknown parameter
Assuming that among the parameters of the RTR manipulator the unknown parameter is

𝑎5 = ½𝑔𝑚3𝑙3

from the gravity vector 𝐷, the dynamics of the manipulator can be represented as

𝑀0(𝜃)θ̈ + 𝐶0(𝜃, θ̇)θ̇ + 𝐷0(𝜃) + 𝐷1(𝑎5) = 𝑢, (62)

where the known part 𝐷0(𝜃) and the vector with the unknown parameter 𝐷1(𝑎5) are

𝐷0(𝜃) = (0, (𝑚2 + 𝑚3)𝑔, 0)^𝑇,  𝐷1(𝑎5) = (0, 0, 𝑎5 cos 𝜃3)^𝑇.

It is easy to see that 𝑀0(𝜃) and 𝐶0(𝜃, θ̇) contain only known parameters, so 𝑀0(𝜃) = 𝑀(𝜃) and 𝐶0(𝜃, θ̇) = 𝐶(𝜃, θ̇), where 𝑀(𝜃) and 𝐶(𝜃, θ̇) are given by (56) and (57). The dynamic model can be rewritten as follows

𝑀(𝜃)θ̈ + 𝐶(𝜃, θ̇)θ̇ + 𝐷0(𝜃) + 𝑌1(𝜃)𝑎5 = 𝑢, (63)

where

𝑌1(𝜃) = (0, 0, cos 𝜃3)^𝑇 (64)

is the regression matrix. The adaptive version of the algorithm has the form

𝑢 = 𝑀0(𝜃)θ̈𝑟𝑒𝑓 + 𝐶0(𝜃, θ̇)θ̇𝑟𝑒𝑓 + 𝐷0(𝜃) + 𝑌1(𝜃)â5(𝑡) − 𝐾𝑑𝑒̇𝜃, (65)

and from (37) the estimated parameter value can be calculated from the adaptation law

â̇5(𝑡) = ã̇5(𝑡) = −Γ𝑌1^𝑇𝑒̇𝜃 = −Γ cos 𝜃3 𝑒̇3𝜃. (66)

The definition of the path with linear time dependency is given by (61). During the simulation, the following control parameters have been chosen: 𝐾𝑝 = 0.05, 𝐾𝑑 = 100, and the adaptation gain Γ = 800.
Tracking of the desired path by the RTR manipulator for the linear time parametrization is presented in Figure 4a. The error of the curvilinear distance 𝑒𝑠 = 𝑠 − 𝑠𝑑 is plotted in Figure 4b. In turn, the tracking errors of the Cartesian coordinates in the normal plane are presented in Figures 4c, d, while in Figure 5 the vector of parameter errors ã(𝑡) = â(𝑡) − 𝑎 is presented.
From the plots in Figure 4 it can be concluded that the path tracking with linear time parametrization is realized properly and the tracking errors converge to zero. Moreover, the real curvilinear parametrization 𝑠(𝑡) tends to the desired function 𝑠𝑑(𝑡). Furthermore, the distance tracking errors 𝑒2 and 𝑒3 take only positive values. It means that the distance (𝑝 − 𝑟) is positive and does not change sign during the regulation process. In other words, the matrix 𝑃 is non‐singular, and the path parametrization using the orthogonal projection on the curve is valid.

Figure 4. Linear time parametrization for one unknown parameter: (a) – the trajectory of the manipulator, (b) – curvilinear error 𝑒𝑠, (c) – distance error 𝑒2 = 𝑞2 − 𝑞2𝑑, (d) – distance error 𝑒3 = 𝑞3 − 𝑞3𝑑
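The general adaptation step sketched earlier specializes to the scalar case (64)–(66) as follows; Γ = 800 matches the simulation settings above, while the sampling time dt is an illustrative assumption of this sketch.

```python
import numpy as np

def adapt_a5(a5_hat, theta3, e_theta_dot, Gamma=800.0, dt=1e-3):
    """One Euler step of the scalar adaptation law (66) for a5 = g*m3*l3/2."""
    Y1 = np.array([0.0, 0.0, np.cos(theta3)])   # regression matrix (64)
    return a5_hat - dt * Gamma * (Y1 @ e_theta_dot)
```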
The plot in Figure 5 shows that the estimate error of the unknown parameter converges to 0 during the regulation process. However, the convergence of the estimation error to zero is not necessary to ensure correct operation of the adaptive algorithm.

Figure 5. Linear time parametrization – estimate error ã5 = â5(𝑡) − 𝑎5

Two unknown parameters
Let us assume that among the parameters of the RTR manipulator the unknown parameters are 𝑎4 and 𝑎5, selected as follows

𝑎 = (𝑎4, 𝑎5)^𝑇 = (𝑚2 + 𝑚3, ½𝑔𝑚3𝑙3)^𝑇,  𝑎 ∈ 𝑅². (67)

Parameter 𝑎5 occurs only in the gravity vector, while parameter 𝑎4 appears in the gravity vector and in the inertia matrix. Then, the manipulator dynamics model can be presented in the following way

(𝑀0(𝜃) + 𝑀1(𝜃))θ̈ + 𝐶0(𝜃, θ̇)θ̇ + 𝐷0(𝜃) + 𝐷1(𝜃, 𝑎) = 𝑢, (68)

where 𝑀1 and 𝐷1 include the unknown parameters, while 𝐷0(𝜃) = 0. The dynamic model of the manipulator can be rewritten as follows

𝑀0(𝜃)θ̈ + 𝐶0(𝜃, θ̇)θ̇ + 𝑌1(𝜃, θ̈)𝑎 = 𝑢, (69)

where 𝑌1(𝜃, θ̈) is the regression matrix of the unknown parameters 𝑎4 and 𝑎5

𝑌1 = [[0, 0], [θ̈2 + 𝑔, 0], [0, cos 𝜃3]]. (70)

The same control parameters as for one unknown parameter have been used. Tracking of the desired path by the RTR manipulator for the linear time parametrization given by (61) is presented in Figure 6a. The error of the curvilinear distance 𝑒𝑠 = 𝑠 − 𝑠𝑑 is plotted in Figure 6b. The tracking errors of the Cartesian coordinates in the normal plane are presented in Figures 6c, d. In Figures 7a, b, the two components of the vector of parameter errors ã(𝑡) = â(𝑡) − 𝑎 are presented.
From the plots in Figure 6 it can be concluded that the path tracking with linear time parametrization for two unknown parameters of the dynamic model is realized properly and the tracking errors converge to zero. Moreover, the real curvilinear parametrization 𝑠(𝑡) tends to the desired function 𝑠𝑑(𝑡). As for one unknown parameter, the distance tracking errors 𝑒2 and 𝑒3 take only positive values.

Figure 6. Linear time parametrization for two unknown parameters: (a) – the trajectory of the manipulator, (b) – curvilinear error 𝑒𝑠, (c) – distance error 𝑒2 = 𝑞2 − 𝑞2𝑑, (d) – distance error 𝑒3 = 𝑞3 − 𝑞3𝑑
Figure 7. Linear time parametrization: (a) – estimate error ã4 = â4 − 𝑎4, (b) – estimate error ã5 = â5 − 𝑎5

Plots 7a, b show that the estimate errors of the unknown parameters converge to zero only for one of the unknown parameters. However, even if the estimated value of a parameter is not equal to its real value, the adaptive algorithm works correctly.

9.2. Square Time Parametrization
In the case of the two unknown parameters in the dynamics and the square‐parametrized path

𝜉𝑑(𝑡) = (𝑠𝑑, 𝑞2𝑑, 𝑞3𝑑)^𝑇(𝑡) = (0.1𝑡 − 0.0001𝑡², 0, 0)^𝑇, (71)

simulation tests were carried out while maintaining the values of the control parameters. The obtained results are shown in Figures 8–10. As can be seen in Figure 10, again the estimation errors of the unknown parameters, and in particular ã4, did not converge to zero. This means that during the path following process the realized trajectory did not meet the condition of persistent excitation.

Figure 8. Square time parametrization: (a) – the trajectory of the manipulator, (b) – curvilinear error 𝑒𝑠

Figure 9. Square time parametrization: (a) – distance error 𝑒2 = 𝑞2 − 𝑞2𝑑, (b) – distance error 𝑒3 = 𝑞3 − 𝑞3𝑑

Figure 10. Square time parametrization: (a) – estimate error ã4 = â4 − 𝑎4, (b) – estimate error ã5 = â5 − 𝑎5

10. Conclusion
In the paper, the general solution to the path tracking problem in three‐dimensional space for a manipulator has been presented. To achieve the robot's description relative to the curve, the Serret‐Frenet parametrization with orthogonal projection on a given path has been used. The given equations are valid only if the distance between the object and the path, i.e., 𝑝 − 𝑟, does not equal zero.
Since the manipulator has been described by two groups of equations – expressions describing the manipulator moving along the curve and dynamics equations – a cascaded control scheme has been proposed. The control scheme consists of two stages of the cascade working simultaneously: the kinematic controller and the dynamic controller. The first one is responsible for solving the geometric problem of path tracking, and the second one makes it possible to realize the velocities designed in the kinematic controller on the dynamic level.
The dynamic controllers – non‐adaptive, adaptive, and robust algorithms, suitable for the manipulator with or without parametric and structural uncertainty – were proposed. In the simulation studies, we presented only the case of adaptive control with partial parametrization of the model. Comparing Figures 4 and 6, it can be observed that the errors of the distance from the path in the normal plane, i.e., 𝑒2 and 𝑒3, practically do not differ, even though only one parameter was unknown at first, and then two parameters. What is significantly different for both cases are the parameter estimation errors, see Figures 5 and 7. In Figure 5, the parameter estimation error tends to 0, while in Figure 7 only one error tends to 0, and the other is very large. This agrees with the theory of adaptive control, which guarantees convergence to 0 for the tracking errors. On the other hand, the parameter estimation errors would all converge to 0 only if the condition of persistent excitation was met. The simulations have confirmed the proper action of the control algorithms introduced in the paper. Future works will be focused on path following with other parametrizations, which are not limited by the assumption that the distance to the path has to remain different from zero.

AUTHORS
Alicja Mazur∗ – Faculty of Electronics, Photonics and Microsystems, Wrocław University of Science and Technology, Janiszewskiego St. 11/17, 50‐372 Wrocław, Poland, e‐mail: alicja.mazur@pwr.edu.pl.
Mirela Kaczmarek – Department of Physics and Astronomy, Institute of Experimental Physics, University of Wrocław, Max Born Square 9, 50‐204 Wrocław, Poland, e‐mail: mirela.kaczmarek@uwr.edu.pl.
∗Corresponding author

References
[1] F. Dyba, and A. Mazur. "Comparison of curvilinear parametrizations methods and avoidance of orthogonal singularities in the path following task," Journal of Automation, Mobile Robotics and Intelligent Systems, 2023, accepted for publication.
[2] M. Galicki. "Adaptive control of kinematically redundant manipulator along a prescribed geometric path," Robot Motion and Control, Lecture Notes in Control and Information Sciences, vol. 335, Springer, London, 2006, pp. 129–139.
[3] N. Hung, F. Rego, J. Quintas, J. Cruz, M. Jacinto, D. Souto, A. Potes, L. Sebastiao, and A. Pascoal. "A review of path following control strategies for autonomous robotic vehicles: Theory, simulations, and experiments," Journal of Field Robotics, vol. 40(3), 2023, pp. 747–779.
[4] M. Krstić, I. Kanellakopoulos, and P. V. Kokotovic. Nonlinear and Adaptive Control Design, John Wiley & Sons, Inc.: New York, USA, 1995.
[5] A. Mazur, and M. Kaczmarek. "Following 3D paths by a manipulator – adaptive case," A. Mazur and C. Zieliński, editors, Advances in Robotics (in Polish), vol. 1, 2022, pp. 45–58.
[6] A. Mazur. "Hybrid adaptive control laws solving a path following problem for nonholonomic mobile manipulators," Int. Journal of Control, vol. 77(15), 2004, pp. 1297–1306.
[7] A. Mazur, J. Płaskonka, and M. Kaczmarek. "Following 3D paths by a manipulator," Archives of Control Sciences, vol. 25(1), 2015, pp. 117–133; doi: 10.1515/acsc‐2015‐0008.
[8] A. Mazur, and D. Szakiel. "On path following control of nonholonomic mobile manipulators," International Journal of Applied Mathematics and Computer Science, vol. 19(4), 2009, pp. 561–574.
[9] A. Micaelli, and C. Samson. "Trajectory tracking for unicycle‐type and two‐steering‐wheels mobile robots," Technical Report No. 2097, Sophia‐Antipolis, 1993.
[10] J. Płaskonka. "Different kinematic path following controllers for a wheeled mobile robot of (2,0) type," Journal of Intelligent & Robotic Systems, vol. 77, 2013, pp. 481–498; doi: 10.1007/s10846‐013‐9879‐6.
[11] D. Soetanto, L. Lapierre, and A. Pascoal. "Adaptive, non‐singular path‐following control of dynamic wheeled robots," Proceedings of the IEEE Conference on Decision and Control, Maui, Hawaii, USA, 2003, pp. 1765–1770; doi: 10.1109/CDC.2003.1272868.
[12] K. Tchoń, A. Mazur, I. Dulęba, R. Hossa, and R. Muszyński. Manipulators and Mobile Robots: Models, Motion Planning, Control (in Polish), Academic Publishing House PLJ: Warsaw, 2000.
[13] V. Utkin, J. Guldner, and J. Shi. Sliding Mode Control in Electro-mechanical Systems, CRC Press, 1999.
PARAMETER IDENTIFICATION OF SPACE MANIPULATOR’S FLEXIBLE JOINT Submitted: 5th January 2023; accepted: 16th March 2023
Mateusz Wojtunik, Fatina Liliana Basmadji, Grzegorz Granosik, Karol Seweryn DOI: 10.14313/JAMRIS/3‐2023/24 Abstract: A manipulator mounted on a satellite is often used to per‐ form active debris removal missions. The space manipu‐ lator control system needs to take the dynamic model of the satellite‐manipulator system into account because of the influence of the manipulator motion on the position and attitude of the satellite. Therefore, precise modeling of the space manipulator dynamics as well as parameter identification are needed to improve the credibility of the simulation tools. In this paper, we presented the iden‐ tification of the flexible‐joint space manipulator model based on dynamic equations of motion. Experiments were performed in an emulated microgravity environ‐ ment using planar air bearings. The arbitrarily selected joint‐space trajectory was performed by the manipula‐ tor’s control system. The experiments were repeated mul‐ tiple times in order to analyze the identification method sensitivity. The identification is based on the Simulink SimMechanics model. Thus, the procedure can be used for any space manipulator without the need to obtain analytical relations for dynamic equations each time. Including joint flexibility and spring viscous damping in the dynamic model allowed it to reflect the experimental measurements better than the reference model could. Identified parameters of the flexible joint have values of the same magnitude as corresponding real system parameters. Keywords: Orbital robotics, Mathematical modeling, Parameter identification, Flexible‐joint space manipula‐ tor, Microgravity simulator
1. Introduction The increase of space debris in low Earth orbit (LEO) led to the development of ideas for on‐orbit servicing (OOS) and active debris removal (ADR) mis‐ sions [1]. One of the considered methods for active removal of malfunctioning satellites from Earth’s orbit is to utilize a robotic manipulator mounted on a servicing satellite. Several OOS and ADR mis‐ sions have already been proposed, e.g., European Space Agency (ESA) mission e.Deorbit [2]. The con‐ tact between the debris and the satellite‐manipulator system has a signi icant in luence on the system dynamics. The dynamics characteristics of impact and contact between the target and the manipulator’s end‐effector are derived in [3]. The stages (approach, grasping, de‐orbit) of the debris removal using the 78
space manipulator are discussed in [4]. After the cap‐ ture of the debris, the mass and inertia of the target have a signi icant impact on the system dynamics. Thus, proper control for the detumbling of the system must be proposed [5, 6]. One of the advantages of space manipulators is their high Technology Readiness Level (TRL) in com‐ parison to other methods such as space nets or har‐ poons [7]. However, designing a control system for the manipulator mounted on the satellite is dif icult due to the in luence of the manipulator’s motion on the position and attitude of the servicing satellite. In close proximity to the target object, we encounter one of the following cases: the system is free‐ loating, the satellite is fully controlled, or the satellite control is used to give nonzero velocity before the manip‐ ulator’s operations (tangent capture case) [8]. Most commonly, it is assumed that the Attitude and Orbit Control System (AOCS) for the satellite is turned off because the external forces and torques induced by satellite thrusters can cause undesirable changes of position and orientation of the end‐effector, which might lead to a collision with the target object. As a consequence of uncontrollable satellites, the system has conserved momentum and angular momentum; thus, it is nonholonomic [9, 10]. Such systems will be referred to as free‐ loating (the irst case mentioned above). Due to the in luence of the manipulator motion on the state of the servicing satellite, it is necessary to design dedicated control laws that depend on the system’s dynamic parameters [11]. One of the main concepts for control of the free‐ loating manipulator is the so‐called Dynamic Jacobian, which depends on the state of both the manipulator and the satellite as well as on the mass and inertia parameters of the system [12]. In addition, there are multiple con‐ trol methods proposed for the satellite‐manipulator system, e.g., predictive control [13] and impedance control [14]. Since the control system of the space manipulator takes the dynamic model into account, it is extremely important to design a precise mathematical model of the space manipulator’s joint. Dynamic effects that occur in the joint have a signi icant in luence on the system’s dynamics. Therefore, including them in the mathematical model allows us to improve the credibility of the simulation tool. The most popular aspects considered in modeling the dynamics of the free‐ loating manipulator are joint lexibility, friction,
link lexibility, gear kinematics, and gear dynamics. The lexibility of the joint has a large impact on the control system’s stability because it might generate undesirable oscillations that can lead to worse posi‐ tioning of the end‐effector or even to the instability of the control system [15]. Modeling of the lexible‐ joint ixed‐base space manipulator was presented in [16, 17]. The dynamic model of the lexible‐joint and lexible‐link free‐ loating manipulator was derived in [18], while joint friction was additionally included in [19]. The in luence of the link and joint lexibility on the vibrations of the manipulator’s end‐effector is analyzed in [20], while controlled vibration sup‐ pression of a lexible‐base lexible‐joint space manip‐ ulator is discussed in [21]. The lexibility effects of the target capture during the debris removal proce‐ dure are discussed in [22]. Target grasping maneu‐ vers of a lexible‐link space manipulator are discussed in terms of collision avoidance, aerodynamic effects, and gravity gradient in [23]. The utilization of Kane’s method in deriving the dynamic model of a lexible‐ link space manipulator is presented in [24]. The in lu‐ ence of the nonlinear friction joint torque on the lexible space manipulator’s control system was ana‐ lyzed in [25]. The general expressions for the system’s total momentum and angular momentum as well as dynamic equations of motion of lexible‐joint free‐ loating manipulators were presented in [26]. The analysis of the impact of gear kinematics on the free‐ loating manipulator dynamics was discussed in [27]. The fact that the vast majority of control laws designed for free‐ loating manipulators depend on the system’s dynamic parameters exposes the neces‐ sity of experimental parameter identi ication. There are two main approaches for the identi ication of a space manipulator’s parameters. The irst of them takes advantage of the linear parametrization of the dynamic equations of motion [28], while the other one is based on equations for the conservation of total angular momentum of the system [29]. However, the former approach usually requires noise‐sensitive measurements of the angular acceleration of the satel‐ lite as well as manipulator joint accelerations. On the other hand, some parameters have to be known a pri‐ ori while using the latter approach (angular momen‐ tum equations). A novel method that allows for the full identi ication of a space manipulator’s dynamic parameters with the assumption of perfectly rigid joints is presented in [30], while parameter estimation of lexible space manipulators is discussed in [31]. In this paper, we present the identi ication of the space manipulator’s lexible joint parameters based on several experimental results. The mathematical model of the planar free‐ loating space manipulator with lexible joints and viscous joint damping is pre‐ sented. One of the main novelties of the paper is that the experiments were performed with the use of a real lexible‐joint manipulator operated on the air‐bearing microgravity test‐bed. The identi ication procedure is based on optimization techniques used to match the response of the Simulink SimMechanics model of the system with experimental results. The main goal of
the identification is to improve the credibility of the designed system model, while obtaining real values of the flexible‐joint parameters is only a secondary goal of the presented research.
The paper is organized as follows. The dynamic model of the system is described in Section 2. The air‐bearing microgravity test‐bed is presented in Section 3 along with the description of performed experiments. The identification results are presented in Section 4 and discussed in Section 5. The paper is briefly summarized in Section 6.
2. Dynamics of Planar Flexible‐Joint Space Manipulator
Let us consider a planar manipulator mounted on a free‐floating satellite. The manipulator has 𝑛 rotational flexible joints. The schematic view of the system is presented in Figure 1. A stationary inertial frame of reference is denoted as Π𝑖𝑛𝑒. It is fixed in an arbitrarily selected point in the Cartesian space.

Figure 1. Schematic view of planar n‐DoF space manipulator

The idea for modeling the flexible joint is schematically depicted in Figure 2. The model is based on connecting a motor (rigid part of the joint) and a link with a torsional spring and a damper (flexible part of the joint). In comparison to the reference model (with perfectly rigid joints), three additional parameters describe the joint: motor inertia, spring stiffness, and viscous damping coefficient.

Figure 2. Schematic view of flexible joint with viscous damping

In order to derive the dynamic equations of motion, we follow the Lagrange formalism. The vector of generalized coordinates for the analyzed planar system is given by:

x = (𝑥0, 𝑦0, 𝑞0, q^𝑇, q𝑚^𝑇)^𝑇 (1)

where 𝑥0 ∈ ℝ¹ and 𝑦0 ∈ ℝ¹ denote the X and Y components of the satellite Center of Mass (CoM) position vector, respectively, 𝑞0 ∈ ℝ¹ denotes the attitude of the satellite, q = (𝑞1, …, 𝑞𝑛)^𝑇 ∈ ℝⁿ denotes the vector of manipulator joint angular positions, while q𝑚 = (𝑞𝑚1, …, 𝑞𝑚𝑛)^𝑇 ∈ ℝⁿ denotes the vector of motor angular positions. It is noteworthy that q and q𝑚 are measured with respect to the same frame of reference (see Figure 1). The second‐order Euler‐Lagrange equation is given by [32]:

d/d𝑡 (∂ℒ/∂ẋ) − ∂ℒ/∂x + ∂ℱ/∂ẋ = Q (2)
𝑇𝑟 = 𝑖=1
80
1 1 2 𝑚 |v | + 𝐼𝑖 𝜔𝑖2 2 𝑖 𝑖 2
(4)
(5)
VOLUME 17,
𝑛
(6)
1 2 𝑘 (𝑞 − 𝑞𝑖 ) 2 𝑖 𝑚𝑖
(7)
𝑖=1
𝑉= 𝑖=1
2023
1 𝐼 𝜔 2 2 𝑚𝑖 𝑚𝑖
𝑇𝑚 = 𝑛
N∘ 3
where 𝑚0 ∈ ℝ1 and 𝐼0 ∈ ℝ1 denote mass and inertia of the satellite, respectively, 𝑚𝑖 ∈ ℝ1 and 𝐼𝑖 ∈ ℝ1 denote mass and inertia of the 𝑖‐th manipulator link, v𝑖 ∈ ℝ2 denotes the linear velocity of the 𝑖‐ th manipulator link’s CoM with respect to Π𝑖𝑛𝑒 , | ⋅ | denotes Euclidean norm, 𝜔𝑖 ∈ ℝ1 denotes the angular velocity of the 𝑖‐th manipulator link with respect to Π𝑖𝑛𝑒 , 𝐼𝑚 𝑖 ∈ ℝ1 denotes inertia of the 𝑖‐th motor, 𝜔𝑚 𝑖 ∈ ℝ1 denotes the angular velocity of the 𝑖‐th motor with respect to Π𝑖𝑛𝑒 , while 𝑘𝑖 ∈ ℝ1 is the stiffness of 𝑖‐th manipulator joint (spring). It is assumed that the mass of each motor is included in the mass of the respective manipulator link; thus, 𝑇𝑚 consists only of rotational kinetic energy. The Rayleigh’s dissipation function for the ana‐ lyzed system is given by: 𝑛
ℱ= 𝑖=1
1 2 𝑏 (𝑞̇ − 𝑞̇ 𝑖 ) 2 𝑖 𝑚𝑖
(8)
where 𝑏𝑖 ∈ ℝ1 denotes viscous damping coef icient of the 𝑖‐th manipulator joint. After evaluating v𝑖 , 𝜔𝑖 , and 𝜔𝑚𝑖 as functions of generalized coordinates and their derivatives, the Lagrange function and Rayleigh’s dissipation function can be directly applied to (2) which yields the dynamic equations of motion of the system: (M∗ + J∗ )ẍ + (C∗ + B∗ )ẋ + K∗ x = Q
(9)
Relations for the matrices that appear in (9) are presented below:

M* = [[M, 0_{(3+𝑛)x𝑛}], [0_{𝑛x(3+𝑛)}, 0_{𝑛x𝑛}]] (10)
J* = [[0_{2x2}, 0_{2x(1+2𝑛)}], [0_{(1+2𝑛)x2}, J]] (11)
C* = [[C, 0_{(3+𝑛)x𝑛}], [0_{𝑛x(3+𝑛)}, 0_{𝑛x𝑛}]] (12)
B* = [[0_{3x3}, 0_{3x𝑛}, 0_{3x𝑛}], [0_{𝑛x3}, B, −B], [0_{𝑛x3}, −B, B]] (13)
K* = [[0_{3x3}, 0_{3x𝑛}, 0_{3x𝑛}], [0_{𝑛x3}, K, −K], [0_{𝑛x3}, −K, K]] (14)

Matrices M ∈ ℝ^{(3+𝑛)x(3+𝑛)} and C ∈ ℝ^{(3+𝑛)x(3+𝑛)} are the mass and Coriolis matrices for the reference model (perfectly rigid manipulator joints), respectively. These matrices depend on the satellite and manipulator parameters, and explicit relations for them can be found in [35]. Matrices M* ∈ ℝ^{(3+2𝑛)x(3+2𝑛)} and C* ∈ ℝ^{(3+2𝑛)x(3+2𝑛)} are filled with zeros, which is a result of expressing the motor angular positions with respect to the same frame of reference as the flexible joint angular positions. Matrices B ∈ ℝ^{𝑛x𝑛} and K ∈ ℝ^{𝑛x𝑛} denote the damping and stiffness matrices, respectively, given by:

K = diag(𝑘1, 𝑘2, …, 𝑘𝑛) (15)
B = diag(𝑏1, 𝑏2, …, 𝑏𝑛) (16)

where diag denotes an operator that forms a diagonal matrix from the given parameters – such a matrix has nonzero components only on its diagonal. Matrix J ∈ ℝ^{(1+2𝑛)x(1+2𝑛)} is the inertia matrix of the manipulator motors, given by:

J = [[J𝐴, 0_{𝑛x1}, J𝐵], [0_{1x𝑛}, 0, 0_{1x𝑛}], [J𝐵^𝑇, 0_{𝑛x1}, J𝐶]] (17)
where each submatrix is derived as follows:

J𝐴 = [
[∑_{i=1}^{n} 𝐼𝑚𝑖, ∑_{i=2}^{n} 𝐼𝑚𝑖, ∑_{i=3}^{n} 𝐼𝑚𝑖, ⋯, ∑_{i=n}^{n} 𝐼𝑚𝑖],
[∑_{i=2}^{n} 𝐼𝑚𝑖, ∑_{i=2}^{n} 𝐼𝑚𝑖, ∑_{i=3}^{n} 𝐼𝑚𝑖, ⋯, ∑_{i=n}^{n} 𝐼𝑚𝑖],
[∑_{i=3}^{n} 𝐼𝑚𝑖, ∑_{i=3}^{n} 𝐼𝑚𝑖, ∑_{i=3}^{n} 𝐼𝑚𝑖, ⋯, ∑_{i=n}^{n} 𝐼𝑚𝑖],
[⋮, ⋮, ⋮, ⋱, ⋮],
[∑_{i=n}^{n} 𝐼𝑚𝑖, ∑_{i=n}^{n} 𝐼𝑚𝑖, ∑_{i=n}^{n} 𝐼𝑚𝑖, ⋯, ∑_{i=n}^{n} 𝐼𝑚𝑖]
], i.e., (J𝐴)_{jk} = ∑_{i=max(j,k)}^{n} 𝐼𝑚𝑖, (18)

J𝐵 = [
[𝐼𝑚1, 𝐼𝑚2, 𝐼𝑚3, ⋯, 𝐼𝑚𝑛],
[0, 𝐼𝑚2, 𝐼𝑚3, ⋯, 𝐼𝑚𝑛],
[0, 0, 𝐼𝑚3, ⋯, 𝐼𝑚𝑛],
[⋮, ⋮, ⋮, ⋱, ⋮],
[0, 0, 0, ⋯, 𝐼𝑚𝑛]
], (19)

J𝐶 = diag(𝐼𝑚1, 𝐼𝑚2, …, 𝐼𝑚𝑛). (20)

It is noteworthy that the dynamic model considers a slowly rotating motor, because gear kinematics are not included. In the real hardware, we expect a quickly rotating motor that is connected with the rigid part of the joint (output shaft) via a gearbox. Then, the output shaft is connected with the manipulator link via a spring and a damper (as seen in Figure 2). Therefore, in order to include the inertia of the quickly rotating motor in 𝐼𝑚𝑖, its moment of inertia must be multiplied by the square of the gear reduction ratio. In addition, the model presented above can also be used if the manipulator has both rigid and flexible joints. In such situations, the respective sizes of each matrix can vary depending on the number of flexible joints.
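A short sketch of how the structural matrices (15)–(20) can be assembled numerically is given below (Python/NumPy, illustrative; the entries of I_m are assumed to already include the gear-ratio-squared scaling described above).

```python
import numpy as np

def joint_matrices(k, b, I_m):
    """Build K, B of Eqs. (15)-(16) and the motor inertia matrix J of Eqs. (17)-(20)."""
    k, b, I_m = map(np.asarray, (k, b, I_m))
    n = len(I_m)
    K = np.diag(k)                                   # Eq. (15)
    B = np.diag(b)                                   # Eq. (16)

    # (J_A)_{jk} = sum_{i=max(j,k)}^{n} I_m_i,  Eq. (18)
    JA = np.array([[I_m[max(j, l):].sum() for l in range(n)] for j in range(n)])
    # (J_B)_{jk} = I_m_k for k >= j, 0 otherwise,  Eq. (19)
    JB = np.triu(np.tile(I_m, (n, 1)))
    JC = np.diag(I_m)                                # Eq. (20)

    J = np.zeros((1 + 2 * n, 1 + 2 * n))             # Eq. (17)
    J[:n, :n] = JA
    J[:n, n + 1:] = JB
    J[n + 1:, :n] = JB.T
    J[n + 1:, n + 1:] = JC
    return K, B, J
```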
3. Description of the Test‐Bed and Performed Experiments
We present the identification of the flexible joint of the space manipulator based on experiments performed in an emulated microgravity environment using a dedicated test‐bed operated by the Space Research Centre of the Polish Academy of Sciences (CBK PAN) [36, 37].

3.1. Microgravity Air‐Bearing Test‐Bed
The test‐bed consists of a precisely leveled granite table, a planar manipulator mounted on the servicing satellite mock‐up, and the operator's computer. The granite table is 3 m x 2 m. Planar air bearings located under both the manipulator and the satellite mock‐ups allow us to emulate the microgravity environment by producing a thin film of pressurized air between the surface of the table and the air bearings. As a result, the manipulator's motion becomes frictionless in the horizontal plane. The pressurized air is provided by an air tank located on the satellite mock‐up. The manipulator has three rotational joints, the first of which is flexible, and the other two are rigid. As a result, q = (𝑞1, 𝑞2, 𝑞3)^𝑇 and q𝑚 = (𝑞𝑚1). The flexibility of the first joint is achieved by introducing the compliant component. CAD models of the flexible joint and the compliant component are presented in Figure 3 [38].

Figure 3. a – CAD model of the flexible joint; b – CAD model of the compliant component; 1 – gear; 2 – sleeve; 3 – compliant component; 4 – connector

The control system of the manipulator is realized in the joint space and is based on the ATSAMA5D36 microcontroller located on the satellite mock‐up. The manipulator joint angular positions are measured via encoders. In addition, an external vision system allows the measurement of the position and orientation (with respect to Π𝑖𝑛𝑒) of dedicated markers mounted on both the satellite mock‐up and the manipulator's end‐effector. Each joint has a harmonic gear with a gear reduction ratio of 100 : 1.
The parameters of the manipulator are listed in Table 1. The mass of the satellite mock‐up is equal to 58.69 kg, while its moment of inertia is equal to 2.42 kgm². The position vector of the manipulator mounting point expressed with respect to Π0 and rotated to Π0 is equal to p𝑚^(Π0) = (0.337 m, −0.001 m)^𝑇. Expressing this vector as rotated to Π0 is convenient, as the entries of the vector are then constant.

Table 1. Parameters of the manipulator
Parameter | Link 1 | Link 2 | Link 3
Mass [kg] | 2.81 | 2.82 | 4.64
Moment of inertia [kgm²] | 0.0637 | 0.0635 | 0.0515
Length [m] | 0.449 | 0.449 | 0.310
Position vector of CoM [m] | (0.1362, −0.0017) | (0.1340, −0.0005) | (0.1511, 0.0004)

3.2. Description of the Test Campaign
A set of experiments was performed on the microgravity air‐bearing simulator. In each test, the manipulator control system tracks a trajectory defined in the joint space. It is noteworthy that the encoder in the first joint is located before the compliant component. Therefore, the measurements of the encoder relate to the position of the motor 𝑞𝑚1 (the rigid part of the joint). Hence, in each test the trajectory of the motor angular position is realized. The discussion concerning obtaining the angular position of the manipulator's flexible joint 𝑞1 will be brought up in the next section.
The initial position and attitude of the satellite are equal to 0 at the beginning of every experiment. Both the manipulator and the satellite have zero initial velocity. The experiment was repeated 5 times. The test scenario was chosen arbitrarily. The desired motion of the manipulator is represented by a joint‐space trajectory of each joint accelerating with a constant acceleration. The initial condition for the manipulator is: 𝑞𝑚1(𝑡0) = 7.5 deg, 𝑞2(𝑡0) = −15 deg, 𝑞3(𝑡0) = −22.5 deg. The trajectory lasts 10 s.

3.3. Calculation of the Manipulator's Flexible Joint Angular Position
As stated in the previous section, the encoder in the first joint of the manipulator allows the measurement of the angular position of the motor (the rigid part of the joint). In order to obtain the angular position of the flexible joint of the manipulator, the external vision system measurements of the position and attitude of the satellite mock‐up and the position and orientation of the end‐effector are used. However, it is noteworthy that such measurements will not be available in a real mission on orbit.
In order to obtain the angular position of the flexible joint of the manipulator, we use the direct kinematics equation written for the satellite‐manipulator system:

(𝑥𝑒, 𝑦𝑒)^𝑇 = (𝑥0, 𝑦0)^𝑇 + R0 p𝑚^(Π0) + R1 (𝑙1, 0)^𝑇 + R2 (𝑙2, 0)^𝑇 + R3 (𝑙3, 0)^𝑇 (21)
where 𝑥𝑒 ∈ ℝ¹ and 𝑦𝑒 ∈ ℝ¹ denote the X and Y components of the end‐effector position vector, respectively, 𝑙𝑖 ∈ ℝ¹ denotes the length of the 𝑖‐th manipulator link, while R0 ∈ ℝ^{2x2} denotes the rotation matrix of the satellite with respect to Π𝑖𝑛𝑒:

R0 = [[cos(𝑞0), −sin(𝑞0)], [sin(𝑞0), cos(𝑞0)]] (22)

The variable R𝑖 ∈ ℝ^{2x2} denotes the rotation matrix of the 𝑖‐th manipulator link with respect to Π𝑖𝑛𝑒 (with 𝑖 = 1, 2, 3):

R𝑖 = [[cos(𝑞0 + ∑_{j=1}^{i} 𝑞𝑗), −sin(𝑞0 + ∑_{j=1}^{i} 𝑞𝑗)], [sin(𝑞0 + ∑_{j=1}^{i} 𝑞𝑗), cos(𝑞0 + ∑_{j=1}^{i} 𝑞𝑗)]] (23)

Equation (21) needs to be solved for 𝑞1, which appears in R1, R2, and R3. The position components of the end‐effector 𝑥𝑒, 𝑦𝑒 as well as the position and attitude of the satellite 𝑥0, 𝑦0, 𝑞0 are measured via the external vision system. The angular positions of the second and third joints of the manipulator, 𝑞2, 𝑞3, are measured via encoders. For each performed experiment, Equation (21) is used in order to obtain the angular position of the flexible joint of the manipulator.
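A possible way of solving (21) for 𝑞1 is sketched below (Python/SciPy, illustrative). The paper does not specify how the two scalar components of (21) are reconciled; here they are combined in the least‐squares sense, which is one robust choice under measurement noise. The default link lengths are taken from Table 1, and the function names are assumptions of this sketch.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def rot(a):
    """Planar rotation matrix, cf. Eqs. (22)-(23)."""
    return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

def solve_q1(pe, p0, q0, q2, q3, pm, lengths=(0.449, 0.449, 0.310)):
    """Recover the flexible-joint angle q1 from Eq. (21).

    pe - measured end-effector position (vision system),
    p0 - measured satellite CoM position, q0 - satellite attitude,
    q2, q3 - encoder readings of the rigid joints,
    pm - manipulator mounting point expressed in the satellite frame.
    """
    def residual(q1):
        angles = q0 + np.cumsum([q1, q2, q3])     # arguments of R1, R2, R3 in Eq. (23)
        pred = np.asarray(p0) + rot(q0) @ np.asarray(pm)
        for li, ai in zip(lengths, angles):
            pred = pred + rot(ai) @ np.array([li, 0.0])
        return float(np.sum((pred - np.asarray(pe)) ** 2))
    return minimize_scalar(residual, bounds=(-np.pi, np.pi), method="bounded").x
```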
4. Identification of Flexible Joint Parameters In order to match the model with the behavior of the manipulator on the air‐bearing test‐bed, three parameters will be identi ied: motor inertia 𝐼𝑚1 , joint stiffness 𝑘1 , and viscous damping coef icient 𝑏1 . In addition, it was decided that the initial spring offset 𝜆1 (𝑡0 = 0) = 𝑞1 (𝑡0 = 0) − 𝑞𝑚1 (𝑡0 = 0) will be identi ied for each experiment. The initial spring offset cannot be controlled on the test‐bed. At the beginning of the test, the control system is able to maintain constant angular positions of the motor 𝑞𝑚1 , the second joint 𝑞2 , and the third joint 𝑞3 . However, the angular position of the lexible joint 𝑞1 oscillates around the equilibrium point 𝑞1 = 𝑞𝑚1 as a result of spring potential energy. Then, the desired trajectory is loaded to the manipulator’s control system, and the spring offset at that current time will lead to different dynamics of the system during trajectory realization. The initial spring offset in luences the amplitude of the lexible joint angular position oscillations. The implemented identi ication method is based on the optimization technique. The dynamic Equa‐ tions (9) are linearly parametrized with respect to three above‐mentioned lexible joint parameters. Therefore, it is intuitive to use a standard technique such as least‐square method with State‐Variable Fil‐ ter (SVF) [39]. However, the initial spring offset appears in the dynamic equations of motion non‐ linearly under trigonometric functions. Second, in the case of free‐ loating manipulators, the analyti‐ cal derivation of the dynamic equations of motion
is very complicated. Therefore, the goal of the pre‐ sented work was to implement a procedure that would be based on Simulink SimMechanics model of the system [40]. Designing a Simulink model of the satellite‐manipulator system is much easier than deriving analytical relations. The SimMechanics model was validated with the use of analytical expres‐ sions for the dynamic model. In the future, the deriva‐ tion of the dynamic equations of motion will not be needed again, as it is cross‐checked that the SimMe‐ chanics model gives valid results. The SimMechanics model can be actuated via two methods: by applying prede ined motion of the body or by applying generalized forces. In the presented approach we actuate the motion of the second and third joint of the manipulator with the use of encoder measurements obtained in experiments. The satellite is not actuated in order to comply with the nonholo‐ nomic nature of the system. Finally, the motion of the motor (rigid part of the irst joint) is actuated with driving torques obtained with the use of inverse dynamics approach for reference (rigid joints) model. The motion of the manipulator’s lexible joint is then caused by the dynamic coupling of the spring‐damper system. By changing parameters 𝐼𝑚1 , 𝑘1 , 𝑏1 , 𝜆1 we are able to obtain different motion of the lexible joint. In order to utilize the SimMechanics model in the identi ication procedure, we choose the Nelder‐ Mead method (also called the downhill simplex method) [41]. The cost function for the optimization procedure is chosen as:
𝑓𝑐𝑜𝑠𝑡(Î𝑚1, k̂1, b̂1, 𝜆1) = ∑_{i=1}^{p} (𝑞1𝑚𝑜𝑑(𝑡𝑖, Î𝑚1, k̂1, b̂1, 𝜆1) − 𝑞1𝑒𝑥(𝑡𝑖))² (24)

where ∧ denotes the estimated value used in each iteration of the optimization procedure, the subscript 𝑚𝑜𝑑 refers to the SimMechanics model response, the subscript 𝑒𝑥 refers to the performed experiment (see Section 3.3), while 𝑝 denotes the number of measurements.
The minimum of the cost function means that for the chosen parameters we obtained the best matching between the model response and the experimental results. The identification procedure changes the identified parameters and solves the SimMechanics model in each iteration in order to find the best solution. The lower and upper bounds for each parameter are:

0.001 kgm² < Î𝑚1 < 100 kgm²,
0.01 Nm/rad < k̂1 < 100 Nm/rad,
0 < b̂1 < 10 kgm/s,
−0.1 rad < 𝜆1 < 0.1 rad.

The identification procedure was performed for each experiment. The angular position of the manipulator's flexible joint for the experiment as well as the simulation of the reference model and the flexible‐joint model with identified parameters are presented in Figures 4–8 for each experiment, respectively. The measurements from the vision system and from the encoders contain noise [37]. Therefore, the identified values are rounded to one decimal place so that the identification accuracy is adequate to the measurement uncertainty. The obtained values of the flexible joint parameters are listed in Table 2. The identified initial spring offsets for each experiment are presented in Table 3. The frames obtained from the vision system during one of the tests are presented in Figure 9.

Figure 4. Flexible joint angular position – comparison between model and experimental results for test #1

Figure 5. Flexible joint angular position – comparison between model and experimental results for test #2

Figure 6. Flexible joint angular position – comparison between model and experimental results for test #3
Figure 7. Flexible joint angular position – comparison between model and experimental results for test #4
Figure 8. Flexible joint angular position – comparison between model and experimental results for test #5

Table 2. Flexible joint parameter identification results
Parameter | Motor inertia 𝐼𝑚1 | Joint stiffness 𝑘1 | Damping coefficient 𝑏1
Unit | [kgm²] | [Nm/rad] | [kgm/s]
Test #1 | 6.9 | 12.8 | 1.3
Test #2 | 3.3 | 5.1 | 0.4
Test #3 | 5.6 | 9.2 | 0.5
Test #4 | 7.6 | 5.1 | 0.0
Test #5 | 4.5 | 8.3 | 0.1
Average | 5.6 | 8.1 | 0.5
Standard deviation | 1.8 | 3.2 | 0.5

Table 3. Values of the identified initial spring offset 𝜆1 [deg]
Test #1 | −1.3
Test #2 | 2.8
Test #3 | −0.2
Test #4 | −0.4
Test #5 | −1.1
5. Discussion

The identification procedure gave satisfactory results. By analyzing each figure, it can be concluded that in each case the flexible-joint model response reflects the experimental results better than the reference model response.
Figure 9. Frames captured by the vision system camera during Test #2
The best matching between the test results and the SimMechanics results was obtained for Test #2 (see Figure 5). The frequency and amplitude of the oscillations caused by the flexibility of the joint are well reproduced. In the other experiments, it is observed that the oscillations caused by the joint flexibility are not as significant. This might be caused by a lower initial spring offset, which is discussed in detail below. In addition, for each case of the flexible-joint model simulation, it is observed that the motion of the flexible joint in the latter part of the test is closer to the experimental results.

The numerical results for the motor inertia range from 3.3 kgm² in Test #2 to 7.6 kgm² in Test #4. The average value obtained in the identification is equal to 5.6 kgm² ± 1.8 kgm². Although the standard deviation is quite high, the identified values had the same order of magnitude in each case. As explained in Section 2, the motor inertia 𝐼𝑚1 in the analyzed model is expected to be very high (the identified parameter is one order of magnitude higher than the inertia of each manipulator link) because the inertia of the quickly rotating motor must be scaled with the square of the harmonic gear reduction ratio. The motor inertia influences two main aspects of the model response. First, high 𝐼𝑚1 values increase the amplitude of the spring oscillations. Second, this parameter influences the d'Alembert (inertial) reaction forces induced within the system.

The identified value of the joint stiffness is equal to 8.1 Nm/rad ± 3.2 Nm/rad. The design value of the compliant component can be obtained from the CAD model and is equal to 5.47 Nm/rad [38]. This value serves as the only reference, despite the fact that the real stiffness of the compliant component might be slightly different from the one obtained using the CAD model. Although the average identified value has the proper order of magnitude, the exact value differs from the design value. However, it is observed that the identification for Test #2 (which was previously concluded to be the best result of all experiments) and for Test #4 led to a stiffness value of 5.1 Nm/rad, which is very similar to the corresponding CAD value. The joint stiffness used in the flexible-joint model allows us to shape the frequency and the amplitude of the spring oscillations. For instance, it can be clearly seen in Figure 4 that the
frequency of the oscillations is not identical to that of the experiment. The identified stiffness for this case is very high. However, the latter parts of the simulation response are very close to the experimental results. The procedure may be improved by introducing a scaling factor so that the cost function weights the first part of the experiment higher than the latter parts, because the flexibility of the joint is already damped out at the end of the experiment (a possible weighting is sketched at the end of this section). It must be repeated here (as previously stated in Section 1) that the main purpose of the identification was to improve the credibility of the simulation tool, not to identify the real system parameters. On the other hand, the obtained standard deviation is high, which might lead to the conclusion that the compliant component's stiffness is not linear and that the proposed spring model might be imprecise.

The results for the damping coefficient were unsatisfactory, e.g., the procedure identified no damping for Test #4 and very high damping for Test #1. This leads to the conclusion that the friction of the spring is not linear; hence, the proposed model for the oscillation damping is not correct. Future work might include identification of the friction forces induced within both the motor (the rigid part of the joint) and the compliant component.

Finally, the initial spring offset was different in each experiment. As explained before, this value cannot be controlled in any way during the tests. In Test #2 (identified as the best example of the performed experiments), the initial spring offset was the highest and equal to 2.8 deg. On the other hand, Figure 6 shows that in Test #3 almost no oscillations were obtained in the first part of the test. The initial spring offset for this test was much lower and equal to −0.2 deg. As a result, the joint stiffness identified for this case differs considerably from the reference value (CAD model). It is intuitive that the identification procedure is very sensitive to the initial spring offset. If the offset is high, we achieve greater excitation of the identified parameters, which is very important in parameter identification. Consequently, the results are more credible if the parameter excitation is large enough, and we desire as high an initial spring potential energy as possible in order to obtain good identification results.
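A minimal sketch of the weighting idea mentioned above (an illustrative assumption, not part of the presented procedure) could multiply each squared residual in Eq. (24) by a factor that decays with time, for example

$$
f_{cost} = \sum_{i=1}^{p} w_i \left( q_1^{mod}(t_i) - q_1^{ex}(t_i) \right)^2, \qquad w_i = e^{-t_i/T},
$$

where the assumed time constant T would be tuned so that the oscillatory first part of each test dominates the fit.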
6. Conclusion

In this paper, we have proposed a parameter identification of the space manipulator's flexible joint based on experiments performed in an emulated microgravity environment. The experiments were performed using a test-bed based on air bearings. The manipulator has a flexible first joint, which allows testing the motion of the flexible-joint manipulator and comparing it with the simulation results.

The identification results are satisfactory, although various problems were observed in the proposed approach. The identified motor inertia seems high, but the results are correct, as the value should account for the quickly rotating motor inertia scaled with the square of the gear reduction
ratio. The identified joint stiffness value has the same order of magnitude as the value obtained from the CAD model. The quality of the identification highly depends on the initial spring offset – higher offsets led to better identification results. A high initial offset allows greater excitation of the identified parameters, which is very important in parameter identification. Finally, the obtained damping coefficients are very different in each experiment, which leads to the conclusion that the spring friction is nonlinear and the proposed damping model is inadequate.

Future work might include utilizing a nonlinear friction model in the identification procedure and modification of the test-bed so that the initial spring offset can be controlled.

AUTHORS

Mateusz Wojtunik∗ – The Space Research Centre of the Polish Academy of Sciences (CBK PAN), ul. Bartycka 18a, 00-716 Warsaw, Poland, e-mail: mwojtunik@cbk.waw.pl.

Fatina Liliana Basmadji – The Space Research Centre of the Polish Academy of Sciences (CBK PAN), ul. Bartycka 18a, 00-716 Warsaw, Poland, e-mail: basmadji@cbk.waw.pl.

Grzegorz Granosik – Lodz University of Technology, Institute of Automatic Control, Poland, e-mail: grzegorz.granosik@p.lodz.pl.

Karol Seweryn – The Space Research Centre of the Polish Academy of Sciences (CBK PAN), ul. Bartycka 18a, 00-716 Warsaw, Poland, e-mail: kseweryn@cbk.waw.pl.

∗
Corresponding author
ACKNOWLEDGEMENTS

This work was partially supported by the Polish National Centre for Research and Development, project no. LIDER/19/0119/L-10/18/NCBR/2019. The authors would like to thank Dr. Tomasz Rybus from Centrum Badań Kosmicznych Polskiej Akademii Nauk (CBK PAN) for his helpful suggestions.
References

[1] C. Bonnal, J. M. Ruault, and M. C. Desjean. "Active debris removal: Recent progress and current trends," Acta Astronautica, vol. 50, 2013, pp. 71–96; doi: 10.1016/j.actaastro.2012.11.009.

[2] S. Estable, et al. "Capturing and deorbiting Envisat with an Airbus Spacetug. Results from the ESA e.deorbit Consolidation Phase study," Journal of Space Safety Engineering, vol. 7, no. 1, 2020, pp. 52–66; doi: 10.1016/j.jsse.2020.01.003.

[3] P. Huang, Y. Xu, and B. Liang. "Contact and impact dynamics of space manipulator and free-flying target," Proceedings of the 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, Edmonton, AB, Canada, 2005; doi: 10.1109/IROS.2005.1545260.
[4] L. Felicetti, P. Gasbarri, A. Pisculli, M. Sabatini, and G. B. Palmerini. "Design of robotic manipulators for orbit removal of spent launchers' stages," Acta Astronautica, vol. 119, 2016, pp. 118–130; doi: 10.1016/j.actaastro.2015.11.012.

[5] F. Aghili. "Optimal control of a space manipulator for detumbling of a target satellite," Proceedings of the 2009 IEEE International Conference on Robotics and Automation, Kobe, Japan, 2009; doi: 10.1109/ROBOT.2009.5152235.

[6] B. Zhan, M. Jin, G. Yang, and C. Zhang. "A novel strategy for space manipulator detumbling a non-cooperative target with collision avoidance," Advances in Space Research, vol. 66(4), 2020, pp. 785–799; doi: 10.1016/j.asr.2020.05.045.

[7] M. Shan, J. Guo, and E. Gill. "Review and comparison of active space debris capturing and removal," Progress in Aerospace Sciences, vol. 80, 2016, pp. 18–32; doi: 10.1016/j.paerosci.2015.11.001.

[8] K. Seweryn, F. L. Basmadji, and T. Rybus. "Space robot performance during tangent capture of an uncontrolled target satellite," The Journal of the Astronautical Sciences, vol. 69, 2022, pp. 1017–1047; doi: 10.1007/s40295-022-00330-2.

[9] I. Dulęba. "Impact of control representations on efficiency of local nonholonomic motion planning," Bulletin of the Polish Academy of Sciences: Technical Sciences, vol. 59, no. 2, 2011, pp. 213–218; doi: 10.2478/v10175-011-0026-x.

[10] J. Ratajczak, and K. Tchoń. "Normal forms and singularities of non-holonomic robotic systems: a study of free-floating space robots," Systems & Control Letters, vol. 138, 2020, 104661; doi: 10.1016/j.sysconle.2020.104661.

[11] A. Ellery. "Tutorial Review on Space Manipulators for Space Debris Mitigation," Robotics, vol. 8, no. 2, 2019; doi: 10.3390/robotics8020034.
[12] K. Yoshida, and Y. Umetani. "Control of a space free-flying robot," Proceedings of the 29th IEEE Conference on Decision and Control, Honolulu, USA, 1990; doi: 10.1109/CDC.1990.203553.

[13] T. Rybus, K. Seweryn, and J. Z. Sąsiadek. "Application of predictive control for manipulator mounted on a satellite," Archives of Control Sciences, vol. 28, no. 1, 2018, pp. 105–118; doi: 10.24425/119079.

[14] P. Palma, K. Seweryn, and T. Rybus. "Impedance control using selected compliant prismatic joint in a free-floating space manipulator," Aerospace, vol. 9, no. 8, 2022, p. 406; doi: 10.3390/aerospace9080406.

[15] J. Z. Sąsiadek. "Space robotics and its challenges," Aerospace Robotics, Springer, 2013, pp. 1–8; doi: 10.1007/978-3-642-34020-8_1.

[16] J. Qingxuan, Z. Xiaodong, S. Hanxu, and C. Ming. "Active control of space flexible-joint/flexible-link manipulator," Proceedings of the 2008 IEEE Conference on Robotics, Automation and Mechatronics, Chengdu, China, 2008, pp. 812–818; doi: 10.1109/RAMECH.2008.4681344.

[17] S. Ulrich, J. Z. Sąsiadek, and I. Barkana. "Modeling and direct adaptive control of a flexible-joint manipulator," Journal of Guidance, Control, and Dynamics, vol. 35, no. 1, 2012, pp. 25–39; doi: 10.2514/1.54083.

[18] X.-Y. Yu. "Augmented robust control of a free-floating flexible space robot," Journal of Aerospace Engineering, vol. 229, no. 5, 2015, pp. 947–957; doi: 10.1177/0954410014541632.

[19] D. Meng, Y. She, W. Xu, W. Lu, and B. Liang. "Dynamic modeling and vibration characteristics analysis of flexible-link and flexible-joint space manipulator," Multibody System Dynamics, vol. 43, 2018, pp. 321–347; doi: 10.1007/s11044-017-9611-6.

[20] X. Liu, H. Li, J. Wang, and G. Cai. "Dynamics analysis of flexible space robot with joint friction," Aerospace Science and Technology, vol. 47, 2015, pp. 164–176; doi: 10.1016/j.ast.2015.09.030.

[21] Z. Chen, Y. Zhang, and Z. Li. "Hybrid Control Scheme Consisting of Adaptive and Optimal Controllers for Flexible-Base Flexible-Joint Space Manipulator with Uncertain Parameters," Proceedings of the 2017 9th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC), Hangzhou, China, 2017; doi: 10.1109/IHMSC.2017.84.

[22] A. Stolfi, P. Gasbarri, and M. Sabatini. "A parametric analysis of a controlled deployable space manipulator for capturing a non-cooperative flexible satellite," Acta Astronautica, vol. 148, 2018, pp. 317–326; doi: 10.1016/j.actaastro.2018.04.028.

[23] C. Toglia, M. Sabatini, P. Gasbarri, and G. B. Palmerini. "Optimal target grasping of a flexible space manipulator for a class of objectives," Acta Astronautica, vol. 68(7-8), 2011, pp. 1031–1041; doi: 10.1016/j.actaastro.2010.09.013.

[24] R. Masoudi, and M. Mahzoon. "Maneuvering and Vibrations Control of a Free-Floating Space Robot with Flexible Arms," Journal of Dynamic Systems, Measurement and Control, vol. 133(5), 2011, 051001; doi: 10.1115/1.4004042.

[25] D. Shang, X. Li, M. Yin, and F. Li. "Tracking control strategy for space flexible manipulator considering nonlinear friction torque based on adaptive fuzzy compensation sliding mode controller," Advances in Space Research, In Press, 2020; doi: 10.1016/j.asr.2022.04.042.

[26] K. Nanos, and E. Papadopoulos. "On the dynamics and control of flexible joint space manipulator," Control Engineering Practice, vol. 45, 2015, pp. 230–243; doi: 10.1016/j.conengprac.2015.06.009.
[27] M. Wojtunik, and K. Seweryn. "The influence of the gear reduction ratio on the free-floating space manipulator's dynamics," Proceedings of the 18th International Conference on Informatics in Control, Automation and Robotics (ICINCO 2021), 2021, pp. 282–289; doi: 10.5220/0010556502820289.

[28] H. Wang, and Y. Xie. "Prediction error based adaptive Jacobian tracking for free-floating space manipulators," IEEE Transactions on Aerospace and Electronic Systems, vol. 48, no. 4, 2012, pp. 3207–3221; doi: 10.1109/TAES.2012.6324694.

[29] O. Ma, H. Dang, and K. Pham. "On-orbit identification of inertia properties of spacecraft using a robotic arm," Journal of Guidance, Control, and Dynamics, vol. 31, no. 6, 2008, pp. 1761–1771; doi: 10.2514/1.35188.

[30] O.-O. Christidi-Loumpasefski, C. Ntinos, and E. Papadopoulos. "Analytical and experimental parameter estimation for free-floating space manipulator systems," Proceedings of the 14th Symposium on Advanced Space Technologies in Robotics and Automation (ASTRA '17), Leiden, The Netherlands, 2017.

[31] O.-O. Christidi-Loumpasefski, C. Ntinos, and E. Papadopoulos. "On parameter estimation of flexible space manipulator systems," Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 2020; doi: 10.1109/IROS45743.2020.9340768.

[32] H. Goldstein, C. Poole, and J. Safko. Classical Mechanics, Third Edition, Pearson: London, 2001.

[33] F. Cavenago, A. M. Giordano, and M. Massari. "Contact force observer for space robots," Proceedings of the 58th Conference on Decision and Control, Nice, France, 2019; doi: 10.1109/CDC40024.2019.9029285.

[34] H. Schaub, and J. L. Junkins. Analytical Mechanics of Aerospace Systems, AIAA: Reston, VA, 2002.
VOLUME 17,
N∘ 3
2023
[35] T. Rybus, M. Wojtunik, and F. L. Basmadji. "Optimal collision-free path planning of a free-floating space robot using spline-based trajectories," Acta Astronautica, vol. 190, 2022, pp. 395–408; doi: 10.1016/j.actaastro.2021.10.012.

[36] J. Oleś, J. Kindracki, T. Rybus, Ł. Mężyk, P. Paszkiewicz, R. Moczydłowski, T. Barciński, K. Seweryn, and P. Wolański. "A 2D microgravity test-bed for the validation of space robot control algorithms," Journal of Automation, Mobile Robotics & Intelligent Systems, vol. 11, no. 2, 2017, pp. 95–104; doi: 10.14313/JAMRIS_2-2017/21.

[37] F. L. Basmadji, G. Chmaj, T. Rybus, and K. Seweryn. "Microgravity testbed for the development of space robot control systems and the demonstration of orbital maneuvers," Proceedings of SPIE: Photonics Applications in Astronomy, Communications, Industry and High-Energy Physics Experiments, Wilga, Poland, 2019; doi: 10.1117/12.2537981.

[38] R. Moczydłowski. "Design of elastic element dedicated for space manipulator joint based on FEM topological optimization," master's thesis (in Polish: "Projekt elementu podatnego pary kinematycznej manipulatora satelitarnego bazujący na optymalizacji topologicznej z wykorzystaniem oprogramowania MES"), Warsaw University of Technology, 2017.

[39] H. Garnier, and L. Wang. Advances in Industrial Control: Identification of Continuous-Time Models from Sampled Data, Springer, London, 2003.

[40] G. Wood, and D. Kennedy. "Simulating Mechanical Systems in Simulink with SimMechanics," Technical report, The MathWorks, Inc., Natick, USA, 2003.

[41] J. C. Lagarias, J. A. Reeds, M. H. Wright, and P. E. Wright. "Convergence properties of the Nelder-Mead simplex method in low dimensions," SIAM Journal on Optimization, vol. 9, no. 1, 1998, pp. 112–147; doi: 10.1137/S1052623496303470.