EDUFINANCE Algorithmic Scoring Tool (EduFAST)
CONTENTS
Introduction
Financial Institution Profitability
What is the impact from PD?
Traditional Credit Scoring
A New Way to Think about Scorecards: Credit Algorithms
Benefits of Machine Learning Scoring
Application to Education Finance
Brief on How to Use the Opportunity EduFinance Algorithm
Planning the Implementation in New Markets
Why Opportunity EduFinance
Opportunity EduFinance Impact to Date
Appendix 1: The Machine Learning Model
Introduction to Terminology and Concepts
Appendix 2: Overview of Web Application & Design
Deployment
Appendix 3: Features of EduFinance Web Application
Work Flow
Appendix 4: Using the EduFinance Web Application
References
Introduction
Affordable private schools are currently attended by 228 million children across Africa, South Asia and Latin America. The sector is expanding rapidly in these markets, and 48 million additional children are expected to enrol by 2023. Opportunity EduFinance estimates the current potential market to be at least $24 billion in outstanding loan value (EduFinance, 2018). This growth will continue to generate significant demand both for small and medium-sized enterprise (SME) financing of infrastructure projects and for small school fee loans to parents with irregular incomes, who have traditionally been excluded from financial services. For financial institutions (FIs), this growth in demand is a significant market opportunity for those willing to invest in the sector, but the perception of risk leads many FIs to limit the share of their portfolio allocated to education finance products. Accurate and efficient assessment of borrower risk can improve FI sustainability while mitigating risk, enabling greater access to financial services at more affordable interest rates.
Figure 1 Enrolment in Private Education
Opportunity EduFinance has developed a machine learning credit algorithm that aims to improve the accuracy and efficiency of loan assessment for three main products: School Fee Loans (group lending), School Fee Loans (individual assessment) and School Improvement (SME) Loans. The algorithm is relevant for FIs lending to low-cost private schools and to the parents of students attending these schools. Its first iteration was implemented with five FIs in Uganda, supported by funding from the Bill and Melinda Gates Foundation. Opportunity EduFinance intends to make the platform available to financial institutions that want to implement the algorithm in their own markets, using the same platform but calibrating the models to each market. This paper describes the overall process, benefits and requirements for a financial institution implementing the machine learning credit algorithm, with the aim of helping FIs serve their education clients more accurately and efficiently. The final sections of the document (the appendices) detail the features of the web-based application that is used to access the models.
Financial Institution Profitability
FIs generate profits through financial intermediation, maturity transformation and taking credit risk. The ability to make better, faster credit decisions is a major lever for improving profitability: an FI that makes better credit decisions more efficiently can increase productivity while reducing risk, which means higher revenue and lower losses.
Figure 2 FI Profitability
Revenues are generated from disbursed loans through both fees and interest. Greater loan volumes result in higher interest and fee income. FIs expect to incur losses on some of the loans they disburse, when customers fail to repay. Eliminating all loan losses is usually not desirable, because an overly cautious approach limits revenue generation. Instead, FIs take calculated risks that some customers will be unwilling or unable to repay their loans. Risk managers express this as Expected Loss (EL), the product of Probability of Default (PD), Loss Given Default (LGD) and Exposure at Default (EAD): EL = PD × LGD × EAD. LGD and EAD are mitigated by the quality of collateral and the management of disbursements. PD is driven by how well the financial institution decides whether or not to extend credit to a customer. The primary focus of Opportunity EduFinance's credit algorithm is minimising PD while maximising revenues.
Figure 3 Expected Loss Calculation
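The expected-loss relationship can be sketched as a single product, EL = PD × LGD × EAD. The loan values below are hypothetical and only illustrate how each component scales EL:

```python
def expected_loss(prob_default: float, lgd: float, ead: float) -> float:
    """Expected loss = PD x LGD x EAD.

    prob_default: probability that the borrower defaults over the horizon (0-1)
    lgd: share of the exposure lost if default occurs, after recoveries (0-1)
    ead: amount outstanding at the moment of default
    """
    for name, value in (("prob_default", prob_default), ("lgd", lgd)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    if ead < 0:
        raise ValueError("ead must be non-negative")
    return prob_default * lgd * ead


# Hypothetical school fee loan: 8% PD, 60% of the exposure lost on default, 1,000 outstanding.
el = expected_loss(prob_default=0.08, lgd=0.60, ead=1_000)
print(f"Expected loss: {el:.2f}")  # Expected loss: 48.00

# Halving PD halves expected loss when LGD and EAD are unchanged.
print(f"With PD at 4%: {expected_loss(0.04, 0.60, 1_000):.2f}")  # With PD at 4%: 24.00
```

Because EL scales linearly with PD when LGD and EAD are held fixed, a better default prediction feeds directly into lower expected losses.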
What is the impact from PD?
Take the example of a $100m loan portfolio with an EL of $5m annually. A 2-percentage-point reduction in the portfolio at risk (PAR), from 5% to 3%, would result in savings of nearly $2m. Assuming the bank holds a 20% equity capital base ($20m), the $2m in savings is equivalent to a 10% increase in equity (i.e. the bank would move from a 20% capital ratio to 22%).
Figure 4 Improved Profitability from Lower Loan Losses
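The arithmetic in this example can be reproduced as follows. This is a simplified sketch: it treats the whole fall in the loss rate as savings that are retained as equity, ignoring taxes, dividends and portfolio growth, which is why it returns exactly $2m rather than "nearly" $2m. The function name is hypothetical:

```python
def loss_reduction_impact(portfolio: float, loss_rate_before: float,
                          loss_rate_after: float, capital_ratio: float):
    """Return (annual savings, fractional increase in equity, new capital ratio).

    Simplification: the whole fall in the loss rate is treated as savings that are
    retained as equity (no tax, dividends or portfolio growth).
    """
    savings = portfolio * (loss_rate_before - loss_rate_after)
    equity = portfolio * capital_ratio
    equity_increase = savings / equity
    new_capital_ratio = (equity + savings) / portfolio
    return savings, equity_increase, new_capital_ratio


savings, equity_up, new_ratio = loss_reduction_impact(
    portfolio=100_000_000,   # $100m loan portfolio
    loss_rate_before=0.05,   # losses at 5% of the portfolio
    loss_rate_after=0.03,    # 3% after a 2-percentage-point improvement
    capital_ratio=0.20,      # 20% equity capital base
)
print(f"Savings: ${savings:,.0f}")            # Savings: $2,000,000
print(f"Equity increase: {equity_up:.0%}")    # Equity increase: 10%
print(f"New capital ratio: {new_ratio:.0%}")  # New capital ratio: 22%
```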
Traditional Credit Scoring
Traditionally, banks and other FIs have used credit scorecards to reduce loan losses and probability of default and to improve productivity. Scorecards help them grow portfolios by lowering the cost of serving clients, increasing the quality of services provided and increasing customer satisfaction. A historically strong focus on cash flow and affordability has helped mitigate risk for both the client and the FI. A credit scoring model is a risk management tool that assesses a potential borrower's creditworthiness. It calculates a predicted probability of default using a variety of data points about the potential client and their history. Scorecards have the benefit of being intuitive and easy to use, and they are endorsed by top-tier financial institutions worldwide. For example, scorecards feature in the curriculum of the Global Association of Risk Professionals' (GARP) Financial Risk Manager (FRM) qualification examinations (Crouhy, 2014). Credit scoring models can take into account both subjective variables and quantitatively estimated variables. They typically assign a specific value or weight to each variable and summarise the client's overall creditworthiness in a single number or score. The figure below shows an extract of one scoring model.
Figure 5 Typical Credit Scorecard
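To make the idea of static weights concrete, here is a minimal points-based scorecard sketch. The characteristics, points and cut-off are hypothetical and are not taken from Figure 5:

```python
from typing import Dict

# Hypothetical scorecard: each characteristic maps every attribute to a fixed number of points.
SCORECARD: Dict[str, Dict[str, int]] = {
    "years_as_customer": {"<1": 5, "1-3": 15, ">3": 25},
    "repayment_history": {"arrears": 0, "none": 10, "clean": 30},
    "income_band": {"low": 5, "middle": 15, "high": 25},
    "collateral": {"none": 0, "partial": 10, "full": 20},
}
APPROVAL_CUTOFF = 60  # hypothetical minimum total score for approval


def score_applicant(applicant: Dict[str, str]) -> int:
    """Sum the fixed points for each characteristic (an unknown attribute raises KeyError)."""
    return sum(SCORECARD[field][applicant[field]] for field in SCORECARD)


applicant = {
    "years_as_customer": "1-3",
    "repayment_history": "clean",
    "income_band": "middle",
    "collateral": "none",
}
total = score_applicant(applicant)  # 15 + 30 + 15 + 0 = 60
print(total, "approve" if total >= APPROVAL_CUTOFF else "decline")  # 60 approve
```

Because every weight is fixed and visible, anyone who can read the table can see exactly how each answer moves the total.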
Despite the comfort that scorecards give in risk mitigation, they have drawbacks. First, they rely on expert opinion and are therefore partly driven by instinct and by perceptions of what has worked in the past. Second, although many scorecards consider additional factors such as education level or marital status, they assign a static weighting to each factor. That static weighting becomes obvious to users, and it can be subjective, even though most scorecards try to be as objective as possible. As a result, savvy loan officers can game a scorecard once they can see how each variable affects the score.
A New Way to Think about Scorecards: Credit Algorithms
In recent years, firms' ability to collect and store large amounts of data has continued to grow exponentially. A 2017 report by IBM stated that 90% of the world's data had been created in the previous two years (IBM, 2017). Rather than relying only on expert opinion and experience, banks can now supplement their expertise with the data they hold on customer behaviour. By leveraging data that banks already collect, machine learning techniques can be incorporated into the lending process. Machine learning, described in detail in Appendix 1, uses statistical and mathematical models that are trained to predict an outcome from historical data. These models work without explicit instructions, relying on patterns and inference instead. A good model requires sufficient data to get started. This includes internal data, both client and transactional. Typically, several hundred cases, including as many defaulted or non-performing loans as possible, make for a high-quality model. Where possible, a good model also uses external data, such as credit bureau information and credit scores if they are available in the market. Such data is especially helpful when evaluating new clients.
An algorithm can be described as a defined sequence of computational steps that takes an input value or a series of input values and transforms them into an output. Algorithms vary in complexity, from those used to solve simple, well-defined problems to those used to solve complex, ill-defined problems.
Machine learning refers to a set of methods that can automatically detect patterns in data and use those patterns to predict future data or to support other decisions made under uncertainty. Advances in computing technology have made it possible for machine learning tools to complete thousands of iterations in a relatively short time, enabling rapid decisions that take large pools of data into account.
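As an illustration of a model learning from historical loans, the sketch below fits a logistic regression, a common baseline in credit scoring, by batch gradient descent. It is a stand-in chosen for clarity, not necessarily the model described in Appendix 1; the two features, the data and the function names are hypothetical:

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_pd(weights: Sequence[float], x: Sequence[float]) -> float:
    """Predicted probability of default; weights[0] is the intercept."""
    z = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return sigmoid(z)


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Fit a logistic regression by batch gradient descent on the mean log-loss."""
    weights = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for xi, yi in zip(X, y):
            error = predict_pd(weights, xi) - yi  # gradient of log-loss w.r.t. the linear score
            grad[0] += error
            for j, xij in enumerate(xi):
                grad[j + 1] += error * xij
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


# Hypothetical, pre-scaled historical loans: [arrears_on_last_loan, savings_to_loan_ratio]
X = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.4, 0.6],
     [0.6, 0.2], [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = defaulted, 0 = repaid

weights = train_logistic(X, y)
print(f"High-arrears applicant PD: {predict_pd(weights, [0.85, 0.15]):.2f}")
print(f"Low-arrears applicant PD:  {predict_pd(weights, [0.15, 0.85]):.2f}")
```

No rule such as "arrears add risk" is written into the code; the direction and size of each weight are inferred from the labelled examples, which is the sense in which the model relies on patterns rather than explicit instructions.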
Benefits of Machine Learning Scoring
A machine learning scoring tool offers significant benefits for FIs; some come immediately, while others take time to achieve as data quality improves. Compared with a subjective scoring tool, a machine learning algorithm builds up a unique database of customer behaviour. Elements of the subjective scoring model can be retained, while credit analysts and managers are freed to focus their time on difficult accounts or decisions. The algorithm has the potential to deliver benefits across the FI that uses it, although realising them depends largely on institutionalising the algorithm as the standard for all EduFinance loan applications. These benefits fall into five categories: reduced risk, increased efficiency, increased sales, increased speed and increased consistency. Over time, as more data is loaded into the system, the scores will become more reliable.
Figure 6 Benefits from Automation
Reduce Risk
• Reduce bad debt
• Reduced exposure to high-risk accounts
Increase Efficiency
• Analysts focus on difficult accounts
• Increased volume of accounts with the same staff
Increase Consistency
• Consistent, objective decisions
• Equal, objective treatment of applicants
Increase Speed
• Quickly handle obvious approvals/declines
• Less data required to make accurate decisions
Increase Sales
• Target creditworthy customers
• Increase approval rates
Source: Dun & Bradstreet
Application to Education Finance
School Improvement Loans (SIL) are SME loans to low-cost private schools. They increase enrolment and improve educational outcomes by helping proprietors access credit to add classrooms, teachers, WASH facilities, transportation and computer labs. School Fee Loans (SFL) are loans to low-income parents and caregivers. They help children attend school and prevent them from dropping out when household cash flow does not line up with the timing of school fee payments, as in farming communities. Applying machine learning techniques to these education finance products allows financial institutions to develop larger, more sustainable portfolios that can drive social impact, and to allocate more capital to these portfolios as they become a greater driver of bank profitability. The first step towards rapid expansion of an EduFinance portfolio is to systematise credit models for education finance. Opportunity EduFinance has experience building credit models for SME School Improvement Loans and parent-focused School Fee Loans. The platform and initial models were first developed for the Ugandan market, where a significant amount of Opportunity's education finance lending data was available. These models have since been validated, and the platform is ready for expansion to other markets. However, each new market will require additional data, because market-specific information must be captured to develop an effective algorithm.
Brief on How to Use the Opportunity EduFinance Algorithm
The credit algorithm is available for three EduFinance products:
• School Fee Loans (Group Lending)
• School Fee Loans (Individual Assessment)
• School Improvement Loans (SME Lending)
The algorithm is hosted on a secure web-based application that is accessible over the internet only to the FI's own users and to Opportunity EduFinance. It can be used on tablets or smartphones and is responsive (the layout adjusts to the device being used). A standard SFL application form takes between two and five minutes to complete, provided all documentation is readily available. An SIL application takes longer, up to 15 minutes, because it has more fields; SILs are typically lower in volume, longer in duration and greater in value. Once all of the required data (see Results) has been submitted in the application, the algorithm returns a score between 0 and 1 for that loan application. Over time, the FI should set predetermined score thresholds at which an application is automatically declined or referred for review.
Figure 7 Machine Learning Thresholds
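The paper leaves threshold values for each FI to set. A minimal sketch of two-threshold routing, assuming (hypothetically) that a higher score means a higher predicted risk of default; the threshold values 0.4 and 0.7 and all names are placeholders, not recommendations:

```python
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed with standard assessment"
    REVIEW = "refer for credit review"
    DECLINE = "automatic decline"


def route_application(score: float, review_threshold: float = 0.4,
                      decline_threshold: float = 0.7) -> Decision:
    """Map a 0-1 risk score to a routing decision (assumes higher score = riskier)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if review_threshold > decline_threshold:
        raise ValueError("review_threshold must not exceed decline_threshold")
    if score >= decline_threshold:
        return Decision.DECLINE
    if score >= review_threshold:
        return Decision.REVIEW
    return Decision.PROCEED


for s in (0.15, 0.55, 0.82):
    print(s, route_application(s).value)
```

During a pilot, the returned decision would be advisory only, recorded alongside the existing credit decision rather than replacing it.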
For an FI with existing historical data on School Improvement and/or School Fee Loans, all lending processes can initially remain unchanged when it first begins using the algorithm. During this pilot phase, the score should be used as an additional data point in the evaluation of a loan. Users with Admin credentials can access historical data and export it to a .csv file for analysis. Alternatively, the FI may just be beginning to make EduFinance School Improvement or School Fee Loans. In this case, the data for each loan application can be logged, stored in the database and tracked over time. Once enough loans have been made and several defaults have been observed, the algorithm can be calibrated to the FI's own historical data. Whether or not an FI has an existing product, new data and recalibration will over time enable the algorithm to give increasingly realistic indications of risk, resulting in better, faster credit decisions. Collecting data through a full economic cycle allows the model to reflect macroeconomic conditions. The overall process can take years, but the earlier the algorithm is embedded into bank lending processes, the better the scoring will be.
Figure 8 How Financial Institution Benefits from the EduFinance Credit Algorithm
Immediate Benefits
• Can be used as a risk indicator (in combination with other risk indicators defined by the bank) or as a credit score (automated credit decision). Who benefits most: Credit.
• Secure system on which to manage and track key loan application data and other metrics, including loan approval rates, disbursement rates and customer behaviour. Who benefits most: Managers.
• Performance management: know how many loan applications relationship officers have completed/prospected, monitor the quality of individual loan officers and spot-check data consistency. Who benefits most: Managers.
• Fairer outcomes for clients: potential to offer better rates to less risky clients, and less risky clients are more certain to get a loan. Who benefits most: Clients.
Long Term Benefits
• Streamlined lending processes: loan officers know exactly what data they need to collect for paper applications, or can digitise them if desired. Who benefits most: Loan officers, Credit team.
• Faster approval and disbursement: build transaction volumes and processes, improve turnaround time and gain market share. Who benefits most: Loan officers, Credit team, Whole bank.
• Build a unique database of customer behaviour that, in the long run, feeds into developing other, higher-return business segments.
Evaluating the Model
When evaluating a model, it is important to understand the proportion of correct decisions it makes. There are two types of prediction error, and a better model has a lower incidence of both. A Type I error, also called a false positive, occurs when the model predicts high performance but the loan in fact defaults. A Type II error, also called a false negative, occurs when the model predicts that a loan will be high risk/low quality but it would in fact have been high quality. A confusion matrix is used to evaluate the model by comparing predictions with actual outcomes.
Figure 10 Confusion matrix highlights the trade-off of predictions
Example Confusion Matrix (rows: predicted performance; columns: actual performance)
                                  Actual Low Credit Quality          Actual High Credit Quality
Predicted Low Credit Quality      True negative (correct) 46%        False negative (incorrect) 6%
Predicted High Credit Quality     False positive (incorrect) 4%      True positive (correct) 44%
Figure 9 Best ROC curves reach 1 on Y axis quickly. Area under the curve (ROC): SIL = 0.83 ("Good"); SFL = 0.71 ("Fair").
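The headline rates come straight from the confusion matrix. A minimal sketch using the example percentages (46% true negative, 6% false negative, 4% false positive, 44% true positive), treating a prediction of high credit quality as the positive case, as the matrix does; the class and property names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: float  # predicted high quality, actually high quality
    fp: float  # predicted high quality, actually defaulted (Type I error)
    fn: float  # predicted low quality, actually high quality (Type II error)
    tn: float  # predicted low quality, actually low quality

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    @property
    def true_positive_rate(self) -> float:
        # Share of actually good loans that the model predicted as good.
        return self.tp / (self.tp + self.fn)

    @property
    def false_positive_rate(self) -> float:
        # Share of actually bad loans that the model predicted as good.
        return self.fp / (self.fp + self.tn)

    @property
    def precision(self) -> float:
        # Share of loans predicted good that really were good.
        return self.tp / (self.tp + self.fp)


cm = ConfusionMatrix(tp=44, fp=4, fn=6, tn=46)
print(f"accuracy={cm.accuracy:.2f} TPR={cm.true_positive_rate:.2f} "
      f"FPR={cm.false_positive_rate:.2f} precision={cm.precision:.3f}")
# accuracy=0.90 TPR=0.88 FPR=0.08 precision=0.917
```

A single matrix reflects one score cut-off; moving the cut-off trades false positives against false negatives.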
Graphically, the trade-offs in the confusion matrix can be represented by a Receiver Operating Characteristic (ROC) curve (Figure 9), which traces the matrix across every possible score cut-off. It plots the True Positive Rate (the percentage of bad loans rejected) against the False Positive Rate (the percentage of good loans rejected). This treats a rejected bad loan as the positive case, the reverse of the labelling in the confusion matrix, but the area under the curve (AUC) is the same under either convention. The closer the AUC gets to 1.0, the better the model. The model's performance can also be represented by the distribution of actual versus predicted results, as in Figure 11. In this case, the ideal model shows the least possible overlap between the predicted outcomes for actual good and actual bad loans.
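The area under the ROC curve equals the probability that a randomly chosen good loan receives a better score than a randomly chosen bad loan, with ties counted as half. A minimal sketch of that pairwise calculation, of the kind that yields figures such as SIL = 0.83 and SFL = 0.71, assuming hypothetical scores where higher means more likely to perform (a model that outputs a probability of default could use 1 - PD); the scores, labels and function name are illustrative, not EduFinance data:

```python
from itertools import product
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC via pairwise comparison (Mann-Whitney U divided by n_good * n_bad).

    labels: 1 = actual high credit quality, 0 = actual low credit quality.
    scores: higher = model believes the loan is more likely to perform.
    """
    good = [s for s, lab in zip(scores, labels) if lab == 1]
    bad = [s for s, lab in zip(scores, labels) if lab == 0]
    if not good or not bad:
        raise ValueError("need at least one good and one bad loan")
    wins = sum(1.0 if g > b else 0.5 if g == b else 0.0 for g, b in product(good, bad))
    return wins / (len(good) * len(bad))


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
print(roc_auc(scores, labels))  # 0.875: 14 of 16 good/bad pairs are ranked correctly
```

A perfect ranking gives 1.0 and a random one about 0.5. The pairwise loop is O(n_good × n_bad), which is fine for a few thousand loans; a sort-based rank calculation scales better for larger portfolios.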
Figure 11 Actual vs Predicted defaults (example distribution of outcomes vs. prediction: number of loans for Actual Good and Actual Bad, plotted against % predicted default from 0% to 100%)
Planning the Implementation in New Markets
The first step in getting started is data collection. How long this takes for a given FI depends on the data readily available in its core banking system. After data collection, the EduFinance team calibrates a model, with appropriate modifications for the new market. The model is then made available to the FI on a secure platform, accessible only to the FI and EduFinance, for testing and, if desired, for connection to the core banking system (which requires some resources and expertise from the FI's technology department). Finally, the product is launched, and EduFinance will help the FI monitor performance if required; otherwise, the FI is free to manage and monitor performance on its own.
Figure 12 Implementation Overview
Step 1: Data Collection
• What: collect application raw data (paper and banking system)
Step 2: Initial Testing
• What: calibrate the initial model; launch on the bank-specific platform; ensure that data matches the defined naming conventions; EduFinance works with the team to run initial testing
Step 3: Train Staff
• What: convene staff for training (or train the trainer); review data requirements; practise submission
Step 4: Use the Algorithm
• What: staff use the algorithm for credit scoring; the algorithm provides an extra data point for credit approval
Who: the financial institution (bank staff, head office staff and branch staff; for SFL, the branch loan officer; for SIL, the credit team and SME officer), together with the EduFinance team
When: indicative durations given are 1-3 months, 1 month and 1 week
The following activities are recommended for monitoring and tracking performance, covering loans made both with and without the algorithm. Generally, the analysis compares the scores of loans that were disbursed with those of loans that were not.
Figure 13 Performance Measurement Activities to be completed
Metrics for Monitoring
1. Registration of users: EduFinance can train branch staff directly or train trainers, and introduce the web application.
2. Collect scores on new business: run all new EduFinance loan applications through the algorithm (recording rejected applications is important, as is tracking loans that go bad).
3. Evaluation of results: is the score correctly identifying quality loan applications?
4. Monitoring of applications: is the data consistent with the paper applications being submitted? Loan officers should know that this will be monitored, to prevent score engineering.
5. Recalibrate the model (6 months to 1 year later): additional data will enable EduFinance to refine the algorithm and continually improve it.
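For the evaluation step, a common check is whether the share of loans that go bad rises with the score. A minimal sketch, assuming the score is a predicted probability of default (higher = riskier) and that each disbursed loan has been flagged as having gone bad or not under the FI's own definition; the function name, band count and data are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def bad_rate_by_band(loans: Iterable[Tuple[float, bool]],
                     n_bands: int = 5) -> Dict[str, Tuple[int, float]]:
    """Group (score, went_bad) pairs into equal-width score bands.

    Returns {band label: (number of loans, share that went bad)} for non-empty bands;
    labels are rounded to one decimal place.
    """
    counts: Dict[int, int] = defaultdict(int)
    bads: Dict[int, int] = defaultdict(int)
    for score, went_bad in loans:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
        band = min(int(score * n_bands), n_bands - 1)  # a score of exactly 1.0 joins the top band
        counts[band] += 1
        bads[band] += int(went_bad)
    return {
        f"{b / n_bands:.1f}-{(b + 1) / n_bands:.1f}": (counts[b], bads[b] / counts[b])
        for b in sorted(counts)
    }


loans = [(0.05, False), (0.12, False), (0.18, False), (0.33, False), (0.38, False),
         (0.45, True), (0.52, False), (0.71, True), (0.77, True), (0.91, True)]
for band, (n, rate) in bad_rate_by_band(loans).items():
    print(f"{band}: {n} loans, bad rate {rate:.0%}")
```

A bad rate that rises steadily across bands suggests the score is ranking risk correctly; a flat or erratic pattern is a signal to check data quality or recalibrate.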
Why Opportunity EduFinance
Opportunity EduFinance's goal is to serve as an industry catalyst for increasing access to affordable education by offering financial institutions a package of capital and expertise that generates both financial and social returns. Throughout Opportunity International's 48 years of providing financial and non-financial services to people living in poverty, and consistently listening to what they seek to accomplish, one goal has never changed: parents work hard to give their children an education. Across continents, parents name seeing their children finish school as their biggest dream. They know that investing in a child's education is a decisive way to loosen the grip of poverty. This is why Opportunity International developed Education Finance products to address the financial barriers that families face. Data supports what our clients inherently believe:
• An extra year of primary school boosts a girl's eventual wages by 10-20%.1
• Tertiary education boosts earnings by 18% on average.2
• A single year of formal education for a child can add 0.6 years to life expectancy; on that basis, a 12-year education is estimated to increase a child's life expectancy by 7.2 years.3
• Studies have also established links between limited educational opportunities, low-quality education and child marriage.4
Educating a child not only benefits her but also has far-reaching intergenerational effects. Children of women who finish primary school are 40% less likely to die before age five.5 Better-educated women are more likely to have fewer children and to send their children to school.6 Young people with an education acquire the knowledge and skills to catalyse sustainable development in impoverished areas, to the benefit of future generations. Unfortunately, despite the high value and benefits of education, 263 million children worldwide of primary and lower secondary school age are out of school.7 In 2008, Opportunity began offering innovative financial products and services, coupled with training, designed to increase access to better education. Opportunity EduFinance's goal is to get more children into school and keep them there, and to help expand and improve the accessibility and quality of local affordable schools.
1 George Psacharopoulos and Harry Anthony Patrinos, "Returns to Investment in Education: A Further Update," Policy Research Working Paper 2881 [Washington, D.C.: World Bank, 2002].
2 http://www.educationcounts.govt.nz/indicators/main/education-and-learning-outcomes/1919
3 The National Bureau of Economic Research, The Effects of Education on Health (2015). http://www.nber.org/digest/mar07/w12352.html
4 Nguyen and Wodon (2012). Child Marriage and Education: A Major Challenge.
5 UNESCO (2009) Education for All Global Monitoring Report.
Opportunity EduFinance Impact to Date
In total, Opportunity EduFinance has positively impacted the education of more than 4.1 million children and young people by providing technical assistance to over 40 financial institutions across 20 countries in the Global South. Partner financial institutions have made over 11,000 School Improvement Loans, 304,000 School Fee Loans and 12,000 tertiary loans, totalling more than $240 million.
Figure 14 EduFinance Impact to Date
This experience has given Opportunity EduFinance a database of market, product design and credit quality information that can be mined to construct effective credit models. Through these models, the aim is to reduce risk, increase the success and growth of these financial institutions and, ultimately and most importantly, increase educational access in low-resource communities. For further information on Opportunity EduFinance, please visit https://edufinance.org.
6 Majgaard and Mingat (2012).
7 UNESCO Institute for Statistics Database.
Figure 21 Variable Names – School Fee Loans
Variable name          Variable description
modtrm                 Whether the loan term (duration) was modified
resident               Whether the customer is a resident of the country
intfreq                Interest frequency
collateral             Value of collateral offered to secure the loan
accruedint             Accrued interest on the loan
instalmentamt          Amount payable at each instalment of the loan
approvedamt            Approved loan amount (the maximum that can be borrowed for this loan)
home_national          Whether the customer is a national of the country
id_type                Type of identification document provided, e.g. passport
savings                Total amount of savings recorded for the customer
disbursedamt           Total disbursed amount of the loan
lddate                 Date of data collection (likely related to the year of data collection)
ldamount               Loan amount
urban                  Whether the customer resides in an urban area
stakeholder            Whether the customer is a bank stakeholder
dob                    Date of birth
gender                 Male, female or other
matdate                Date on which the loan matures
relationship_status    Married, divorced, widowed, single, etc.
employment_status      Waged employee, self-employed, unpaid family worker, etc.
industry               Industry of employment
sector                 Sector of employment
accommodation_type     Type of dwelling, e.g. house, apartment, farm
valdate                Disbursement date of the loan
totaloverdueamt        Total overdue amount of the loan to date
income                 Income band of the customer
branch                 Bank branch ID (likely related to location)
Table displaying variable names from the SFL dataset along with their descriptions.
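Raw fields like these are usually turned into model-ready features. A minimal sketch deriving a few illustrative features from variables in the table (dob, valdate, matdate, approvedamt, disbursedamt, instalmentamt). The record type, the derived feature names and the example values are hypothetical, and the paper does not state which derived features the EduFinance models actually use:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SflRecord:
    """Hypothetical subset of the SFL variables; field names follow the table."""
    dob: date              # date of birth
    valdate: date          # disbursement date of the loan
    matdate: date          # date on which the loan matures
    approvedamt: float     # approved loan amount
    disbursedamt: float    # total disbursed amount
    instalmentamt: float   # amount payable at each instalment


def derived_features(r: SflRecord) -> dict:
    """Compute a few illustrative (hypothetical) features from one record."""
    # Age in whole years on the disbursement date (subtract 1 if the birthday has not yet passed).
    birthday_pending = (r.valdate.month, r.valdate.day) < (r.dob.month, r.dob.day)
    age = r.valdate.year - r.dob.year - int(birthday_pending)
    return {
        "age_at_disbursement": age,
        "term_days": (r.matdate - r.valdate).days,
        "disbursed_share": r.disbursedamt / r.approvedamt if r.approvedamt else 0.0,
        "instalment_share": r.instalmentamt / r.disbursedamt if r.disbursedamt else 0.0,
    }


record = SflRecord(dob=date(1985, 6, 15), valdate=date(2019, 1, 10),
                   matdate=date(2019, 12, 10), approvedamt=1_000_000,
                   disbursedamt=800_000, instalmentamt=80_000)
print(derived_features(record))
# {'age_at_disbursement': 33, 'term_days': 334, 'disbursed_share': 0.8, 'instalment_share': 0.1}
```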
References
Crouhy, Michel, Dan Galai, and Robert Mark. 2016. "The Essentials of Risk Management." 2nd Edition. New York, NY: McGraw-Hill.
Dun & Bradstreet. 2019. "Understanding Credit Scorecards." https://www.dnb.co.uk/content/dam/english/business-trends/business-credit-scorecard-ebook-uk.pdf
Fernandez Vidal, Maria, and Fernando Barbon. 2019. "Credit Scoring in Financial Inclusion." Technical Guide. Washington, D.C.: CGAP.
International Business Machines (IBM). 2016. "What Is Big Data?" https://public.dhe.ibm.com/common/ssi/ecm/wr/en/wrl12345usen/watson-customer-engagement-watson-marketing-wr-other-papers-and-reports-wrl12345usen-20170719.pdf
Moffatt, Peter G. 2005. "Hurdle models of loan default." Journal of the Operational Research Society 56(9), pp. 1063–1071.
Myers, Leann, and Maria J. Sirois. 2014. "Spearman correlation coefficients, differences between." Encyclopedia of Statistical Sciences 12.
Opportunity EduFinance. 2018. "The $24 billion Opportunity." https://edufinance.org/latest/publications/research-and-learning/the-24-billion-opportunity/
Schreiner, Mark. 2003. "Scoring: The Next Breakthrough in Microcredit?" Occasional Paper 7. Washington, D.C.: CGAP. https://www.cgap.org/research/publication/scoring-next-breakthrough-microcredit
Shen, Hua, and Adrian Ziderman. 2009. "Student loans repayment and recovery: international comparisons." Higher Education 57(3), pp. 315–333.
Viola, Paul, and William M. Wells III. 1997. "Alignment by maximization of mutual information." International Journal of Computer Vision 24(2), pp. 137–154.
UNESCO Institute for Statistics. 2018. "One in Five Children, Adolescents and Youth is Out of School." Fact Sheet No. 48. http://uis.unesco.org/sites/default/files/documents/fs48-one-five-children-adolescents-youth-out-school-2018-en.pdf