From Film to Computer Vision: Imaging in the Age of AI
By Bruno Artacho, PhD Candidate at RIT, and Dr. Andreas Savakis, RIT Professor and Center Director
The City of Rochester has a long tradition of contributing significantly to engineering innovation. The momentum created by Rochester's engineers, scientists and business leaders placed the city in a prominent position during the 20th Century. The city is the birthplace of leading technologies in industries ranging from imaging to education and healthcare.

Continuing on this path of engineering innovation, the 21st Century brought the advent of Artificial Intelligence (AI), fueled by advances in Deep Learning. Deep Learning is a data-driven approach to AI in which a computer learns to perform advanced tasks with high competency by learning from large datasets. Deep Learning deploys neural networks with many layers, i.e. deep architectures composed of brain-inspired artificial neurons, that learn from examples how to perform tasks such as classification of images, text, or sounds. These techniques are increasingly used in smartphones, voice assistants, and autonomous vehicles, achieving results that often match or surpass human performance.

Once again, Rochester's spirit of innovation helps position the region at the forefront of the development of deep learning methods. Rochester Institute of Technology (RIT) has been an important contributor by fueling innovation, educating young engineers, and fostering growth through collaboration with local industry. One research lab at RIT is the Vision and Image Processing Laboratory (VIP-lab), directed by Professor Andreas Savakis, which focuses on developing adaptable, robust and efficient computer vision methods with state-of-the-art performance.
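To make the idea of deep learning concrete, the short sketch below shows a small image classifier of the kind described above: stacked layers of artificial neurons that learn, from labeled example images, to output a score for each category. It is a minimal illustration written in PyTorch (an assumed tool for this example, not one named in the article), and it is not the VIP-lab's actual code.

import torch
import torch.nn as nn

# A tiny "deep" network: stacked convolutional layers of artificial neurons,
# followed by a linear layer that outputs one score per image category.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),           # 10 hypothetical image categories
)

images = torch.randn(4, 3, 32, 32)        # a batch of 4 small RGB images (random stand-ins)
labels = torch.tensor([0, 1, 2, 3])       # their (made-up) category labels

# One learning step: compare the network's predictions to the labels and adjust the weights.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
optimizer.zero_grad()
loss = nn.CrossEntropyLoss()(model(images), labels)
loss.backward()
optimizer.step()
print(f"training loss after one step: {loss.item():.3f}")

Repeating such steps over a large dataset is what allows the network to learn the task from examples rather than from hand-written rules.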
Bruno Artacho is a doctoral candidate in RIT's Electrical and Computer Engineering Ph.D. program who conducts his research in the VIP-lab.

The field of computer vision deals with the development of algorithms and deep learning models that extract information from images in order to perform various tasks of interest, such as object detection and tracking, face recognition, and scene analysis. The VIP-lab team of graduate students, under the direction of Prof. Savakis, has been working on several computer vision tasks, including: human pose estimation (human-computer interfaces and health monitoring), visual object tracking (autonomous navigation and traffic monitoring), analyzing changes in satellite imagery (e.g. pre and post natural disasters), and segmenting the outlines of objects in an image (self-driving vehicles). This promising work has attracted funding from, and partnerships with, the Air Force Research Laboratories, the National Science Foundation and various industrial partners.

Two tasks in the computer vision field on which Bruno Artacho has focused during his doctoral research are semantic segmentation and human pose estimation. Semantic segmentation aims to extract meaningful information from each location in an image by labelling every pixel with a known semantic category, e.g. person, car, stop sign, etc. Applications of semantic segmentation include self-driving vehicles, automatic focus, and foreground/background detection for video calls. The task of human pose estimation focuses on detecting the human body joints in images, extracting human body postures under diverse conditions and enabling a multitude of applications including sports training and health monitoring.
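As a rough illustration of what semantic segmentation produces, the sketch below runs a pretrained DeepLabV3 network from torchvision and assigns each pixel the category with the highest score. The library, model and pretrained weights are assumptions made for this example; they are not the architectures developed in Artacho's research.

import torch
from torchvision.models.segmentation import deeplabv3_resnet50

# A DeepLabV3 segmentation network pretrained on the 21 Pascal VOC categories
# (person, car, etc.); the weights="DEFAULT" API assumes torchvision >= 0.13.
model = deeplabv3_resnet50(weights="DEFAULT").eval()

image = torch.rand(1, 3, 480, 640)        # stand-in for one normalized RGB image
with torch.no_grad():
    scores = model(image)["out"]          # shape: (1, 21, 480, 640), per-pixel class scores

# Every pixel is labelled with its highest-scoring category.
label_map = scores.argmax(dim=1)          # shape: (1, 480, 640), one integer label per pixel
print(label_map.shape, label_map.unique())

Human pose estimation can be sketched in a similar way: torchvision's pretrained keypointrcnn_resnet50_fpn model, for example, returns the image coordinates of 17 body joints for each person it detects.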