eBook
Applied Artificial Intelligence & Machine Learning
The life cycle of an AI & ML project
1 | Business Problem The first step is to accurately define the problem and the project's objectives. Before collecting the right data, we must properly understand the problem under study. As part of this analysis, we must also carry out a financial analysis, using, for example, the previous year as a reference. These financial KPIs are later used to assess the improvement introduced by the developed AI solution. In this analysis, all costs associated with the business problem must be accounted for, including the operational integration phase.
2 | Data Collection The second step is to identify and collect the available data. Depending on the volume of data and the problem itself, business professionals often turn to their IT teams to provide access to these data sources and transfer them to the analytical platform used by their Data Science teams (e.g. a Data Warehouse in the Cloud).
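For illustration, the sketch below (in Python) shows how a table might be extracted from the cloud Data Warehouse into the analytical environment; the connection string, schema, and column names are hypothetical placeholders, not tied to any specific platform.

```python
# Minimal sketch: extract a curated table from a cloud Data Warehouse
# into the analytical environment. Connection details, schema, and
# column names are hypothetical placeholders.
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection to the Data Warehouse (driver and credentials vary by vendor)
engine = create_engine("postgresql://user:password@dwh.example.com:5432/analytics")

# Pull only the columns needed for the problem under study
query = """
    SELECT customer_id, purchase_date, product_id, amount
    FROM sales.transactions
    WHERE purchase_date >= '2020-01-01'
"""
transactions = pd.read_sql(query, engine)
print(transactions.shape)
```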
3 | Data Preparation and Exploratory Data Analysis The next stage is data preparation and exploratory analysis. Data Science teams typically begin data preparation and analysis on an AI platform (e.g. Databricks or SageMaker) that has access to the data on the analytical platform (e.g. a Data Warehouse in the Cloud). At this point the Data Scientist starts creating a 'data pipeline', gathering all the necessary 'tables' to create a final structure for the Machine Learning algorithms in question. During this process, the Data Scientist has to clean the data (removing or handling missing observations, measurement errors, and outliers) and transform and classify alphanumeric or non-numeric data into numeric data (continuous, nominal, binary, and ordinal). The Data Scientist also formats the data according to the desired structure, removing unnecessary variables, columns, etc. Once the data is clean, structured, and organized, the Analyst can proceed to exploratory data analysis through statistical analysis (univariate and multivariate) and the identification of patterns and anomalies in the data. This phase is very important and can consume about 70 to 90% of the total development time of the Machine Learning solution, excluding the integration with the operational process.
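As a minimal sketch of this stage, the Python example below cleans a hypothetical transactions extract, encodes a nominal variable as numeric data, and produces a first univariate and multivariate summary; the file path and column names are assumptions made for illustration.

```python
# Minimal sketch of data preparation and exploratory analysis with pandas.
# The file path and column names are hypothetical.
import pandas as pd

# Hypothetical extract produced in the Data Collection step
df = pd.read_parquet("data/transactions.parquet")

# 1. Cleaning: drop duplicates, handle missing values, trim extreme outliers
df = df.drop_duplicates()
df["amount"] = df["amount"].fillna(df["amount"].median())
df = df[df["amount"] <= df["amount"].quantile(0.99)]

# 2. Transform non-numeric data: one-hot encode a nominal variable
df = pd.get_dummies(df, columns=["product_id"], prefix="product")

# 3. Exploratory analysis: univariate summary and a simple correlation matrix
print(df.describe())
print(df.select_dtypes("number").corr())
```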
4 | Modelling This phase contains 3 major sub-phases:
Development This sub-phase consists of choosing the most appropriate model for the data from among the various Machine Learning algorithms (supervised, unsupervised, semi-supervised, or reinforcement learning), depending on the type of problem under study (regression, classification, or clustering). After choosing the models, we can adjust their hyperparameters to achieve the desired performance. Finally, we evaluate the model in terms of its accuracy and its relevance in describing the problem under study.
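As a hedged sketch of the Development sub-phase, the example below uses scikit-learn to split a hypothetical prepared dataset, tune hyperparameters with cross-validated grid search, and evaluate the selected model on held-out data; the dataset, feature names, and target are assumptions for illustration only.

```python
# Minimal sketch of the Development sub-phase with scikit-learn.
# Dataset path, feature names, and the binary target 'churned' are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split

df = pd.read_parquet("data/prepared_dataset.parquet")
X = df.drop(columns=["churned"])
y = df["churned"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Adjust hyperparameters via cross-validated grid search
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [100, 300], "max_depth": [5, 10, None]},
    cv=5,
    scoring="f1",
)
search.fit(X_train, y_train)

# Evaluate the selected model on held-out data
print(search.best_params_)
print(classification_report(y_test, search.predict(X_test)))
```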
Communication In a very interactive way, Communication allows business stakeholders to follow the evolution of the solution during its development, making it possible to understand the potential results and suggest improvements in both data and models until an initial version with acceptable performance for operationalization is reached.
Deployment This is the sub-phase in which the entire process is automated, from accessing and transforming the data, through creating and selecting models and saving them in a model library, to the scheduling of this end-to-end process. This stage results in a 're-training' process that allows the Machine Learning models to be updated with little human supervision, as long as the performance of the new models is within certain parameters.
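The following sketch illustrates one possible form of such a re-training step: a scheduled script re-fits the model on fresh data and promotes it to the model library only if its performance stays above an acceptance threshold; the threshold, file paths, and hyperparameters are hypothetical.

```python
# Minimal sketch of an automated re-training step: the new model is promoted
# only if its performance stays within acceptable parameters.
# Threshold, file paths, and hyperparameters are hypothetical.
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

ACCEPTANCE_THRESHOLD = 0.80  # minimum F1 required to promote the re-trained model

# Hypothetical prepared dataset, refreshed on a schedule (e.g. a nightly job)
df = pd.read_parquet("data/prepared_dataset.parquet")
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns=["churned"]), df["churned"], test_size=0.2, random_state=42
)

# Re-train with the previously selected hyperparameters
new_model = RandomForestClassifier(n_estimators=300, max_depth=10, random_state=42)
new_model.fit(X_train, y_train)

score = f1_score(y_test, new_model.predict(X_test))
if score >= ACCEPTANCE_THRESHOLD:
    joblib.dump(new_model, "models/churn_model_latest.joblib")  # save to the model library
    print(f"Model promoted (F1 = {score:.3f})")
else:
    print(f"Model rejected (F1 = {score:.3f}); keeping the previous version")
```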
5 | Business Process Integration AI solutions only deliver business value when they are integrated into an operational process that solves the initial business problem. This is the phase where that integration happens, and it consists of two sub-components:
Scoring This process connects the operational process in question with the models deployed in the previous phase. For example, imagine a real-time recommendation system that suggests a product as soon as a customer logs into a portal. With the necessary data about the consumer, this sub-component creates a real-time stream and obtains a forecast, from the latest model deployed in phase 4, of which products to recommend to that specific customer.
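A minimal sketch of this Scoring sub-component might look as follows, assuming a multiclass model (whose classes are product IDs) saved in phase 4; the model path and customer features are hypothetical.

```python
# Minimal sketch of real-time scoring: when a customer logs in, their
# features are assembled and the latest model returns product recommendations.
# Model path and feature names are hypothetical.
import joblib
import pandas as pd

# Latest model saved to the model library in phase 4
model = joblib.load("models/recommender_latest.joblib")

def score_customer(customer_features: dict, top_n: int = 5) -> list:
    """Return the top-N products predicted for one customer in real time."""
    X = pd.DataFrame([customer_features])
    scores = model.predict_proba(X)[0]  # per-product scores (classes are product IDs)
    ranked = sorted(zip(model.classes_, scores), key=lambda p: p[1], reverse=True)
    return [product for product, _ in ranked[:top_n]]

# Example call when a customer logs into the portal (feature names are hypothetical)
print(score_customer({"age": 34, "visits_last_30d": 6, "avg_basket": 42.5}))
```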
Decisioning This sub-phase aims to make the most of the predictions and use them to make decisions that optimize the process. Considering the previous example, imagine that the model's forecast for this consumer was five products, but three of them had already been shown to the customer in the last two weeks. By applying business rules on top of the Machine Learning forecasts, we can show only the recommendations that have not yet been presented to the customer, maximizing their success. This phase can be the most time-consuming in a Machine Learning project and depends greatly on the architecture and systems associated with the problem we are solving. At a minimum, we should allow six to eight weeks for integration with a given business process, followed by the necessary training for stakeholders to get the most out of the implemented solution.
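Continuing the example, the small sketch below applies such a business rule on top of the model's forecast; the product IDs and function name are hypothetical.

```python
# Minimal sketch of the Decisioning sub-component: apply a business rule on
# top of the model's recommendations, e.g. drop products already shown to
# this customer in the last two weeks. Product IDs are hypothetical.
def apply_business_rules(recommended: list, recently_shown: set, max_items: int = 5) -> list:
    """Keep only recommendations not yet presented to the customer."""
    return [p for p in recommended if p not in recently_shown][:max_items]

# The model forecast five products, but three were shown in the last two weeks
recommended = ["p1", "p2", "p3", "p4", "p5"]
recently_shown = {"p1", "p3", "p5"}
print(apply_business_rules(recommended, recently_shown))  # ['p2', 'p4']
```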
About the Authors Jorge Caiado Jorge Caiado holds a Ph.D. in Applied Mathematics for Economics and Management. He is a Professor of Data Analysis in Finance, Applied Statistics, Econometrics and Forecasting Methods at the Lisbon School of Economics and Management (ISEG) and a Researcher at the Centre for Applied Mathematics and Economics. He was a visiting researcher in the Department of Statistics at University Carlos III in Madrid (Spain) and Invited Assistant Professor at ISEGI Nova Lisbon School. His research in econometrics, finance, time series analysis, forecasting methods and statistical software has led to numerous publications in scientific journals and books. He serves as an econometric and statistical consultant and trainer for numerous companies and organizations, including central banks, commercial and investment banks, bureaus of statistics, bureaus of economic analysis, transportation and logistics companies, health companies and insurance companies. He is also a co-founder and partner of GlobalSolver – a deep tech company that works with AI, Machine Learning & Big Data.
Eliano Marques Eliano Marques holds an MSc in Applied Econometrics and Forecasting from ISEG Lisbon School of Economics and Management at the University of Lisbon and is currently the Executive Vice President of Data & AI at Protegrity – a company that specializes in Data Privacy and Security. Eliano joined Protegrity from Emirates, where he was VP of Data Science, having previously held various global AI roles at Think Big Analytics, Teradata and Deloitte UK. He is a Data Science executive with experience in establishing AI, Data & Analytics capabilities across enterprises in many industries and geographies around the globe. He remains a technology leader, able to build AI solutions from scratch side by side with Data Scientists and Engineers. He has designed and built large-scale real-time AI solutions with Machine/Deep Learning at their core, from the lab to production in all major clouds.
Starting September 17th, 2021 A great opportunity to understand the theory and practice of how machine learning can address the new challenges of business transformation.
With the coordination of
Jorge Caiado
Eliano Marques
Programme Advisor
Marta Vieira
marta.vieira@isegexecutive.education (+351) 962 682 202
www.isegexecutive.education