Columbia Economic Review
Vol. IX No. II Spring 2018
Columbia Economics Review (CER) aims to promote discourse and research at the intersection of economics, business, politics, and society by publishing a rigorous selection of student essays, opinions, and research papers. CER also holds the Columbia Economics Forum, a speaker series established to promote dialogue and encourage deeper insights into economic issues.
2017-2018 Editorial Board
EDITOR-IN-CHIEF
Alan Lin
JOURNAL SENIOR EDITORS
PUBLISHER
Ben Titlebaum
MANAGING EDITORS
Alex Whitman DESIGN DIRECTOR
Jessica Lu
Jessica Chu Bai Michael Shambhavi Tiwari Neel Puri Mitchell Zhang Spencer Papay Manuel Perez Archila LAYOUT EDITOR
Elise Gout
STAFF EDITORS
Daniel Jack JunYeol HannahKim Douglas Brian DeLong Ho Kelly Butler Elizabeth Li Michael Rajiv Allen Ram Chu Neel Puri William Sammartino Rishi Shah Kirk Wu Spencer Papay
CONTRIBUTING ARTISTS
Stacy Tao Amanda Ba Sirena Khanna Lipa Schmelczer
ONLINE WEBMASTER
EXECUTIVE EDITOR
Frank Zhu
Mathieu Sabbagh
CONTRIBUTORS
Emmanuel Akinbobola Lance Jubel Hallie Gruder
Yi Jun Lim Elise Parrish Adam Mann
Roxanne Farhad Yifan Shi Philip Jang
HEC MEMBERS
OPERATIONS EXECUTIVE DIRECTOR
Makenzie Nohr TREASURER
Michelle Yan
SENIOR ADVISERS
CEC MEMBERS
EPC MEMBERS
Bryan Li Randy Zhong Zoey Chopra
Alex Shek Zachary Rotman Maria Thompson
Yael Cohen Blaine Helloloid
For a complete list of papers cited by our authors and a full version of all editorials, please visit our website at columbiaeconreview.com Columbia Economic Review would like to thank the Columbia Economics Department for their generous support of the publication.
Jenna Karp Derek Zhang
Opinions expressed herein do not necessarily reflect the views of Columbia University or Columbia Economic Review, its staff, sponsors, or affiliates.
We welcome your comments. To send a letter to the editor, please email: econreview@columbia.edu We reserve the right to edit and condense all letters.
Columbia Economics | Program for Economic Research
Licensed under Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License
Printed with generous support from the Columbia University Program for Economic Research
Contents
Letter from the Editors
Submission Guidelines
The Effects of Democracy on Development during Structural Adjustment
Does Collusion among Full-Service Airlines Affect the Entry Decision of Low-Cost Carriers?
The Impact of an Urban Neighborhood’s Demographic Characteristics on Renters’ Housing Unaffordability
The Link Between Immigration and Trade: Evidence from South Korea
Challenges for Decentralization and Federalism: Scaling up an Educational Policy for 184 municipalities in Brazil
The Wage Gap Starts at the Kitchen Table
Letter from the Editors
Dear Readers,
As we approach the end of the 2017-18 academic year, we are pleased to present our Spring 2018 issue. We are halfway through the year 2018 and popular issues continue to revolve around economic and social unrest in the face of ineffective policy missions. We have seen a number of trends develop, ranging from increased consolidation in the commercial sphere to gentrification and transformations in housing options. These trends are indicative of the broad range of socioeconomic changes that are currently affecting the world. Equipped with a vast array of economic and mathematical tools, we, as economists, are capable of guiding our society toward the policy objectives that maximize welfare for the largest possible portion of the population. In this day and age, citizens of the world must focus on how to optimize outcomes, whether through democracy, capitalism, or some combination thereof. In seeking the truth about markets and observable outcomes, we are better able to understand how to make our societies function at the highest level. And, in doing so, we are able to bring education, fair competition, and equitable employment to all subsets of the population.
The Columbia Economics Review has continually received a plethora of submissions from throughout the United States and the entire world. This semester was no exception and we are excited to feature five papers and one web editorial that help us economists address the most pertinent issue in our world: how to make policy work for society. These pieces approach this issue from a variety of angles, but all are marked by thorough and deliberate attention to economic analysis. They address the most widely publicized socioeconomic issues in the world, including market organization, migration, and education.
In publishing this journal issue, we hope to demonstrate that, though inefficiencies may be widespread and frustrating, there is strong reason to believe that the next generation of economic thinkers will bring economic efficiency to the real world. Thus far in 2018, people all over the world have called into question a number of issues with enormous economic ramifications. Liam Masterson (pp. 5-13) calls into question the very basis of American economic theory, which generally assumes that democracy, coupled with capitalism, will produce the most efficient outcomes. Masterson finds that democratic regimes in developing countries are more likely to experience increased living standards after receiving aid. Luigi Caloi also takes a developmental focus while seeking to address the problem of scaling in Brazilian educational development programs. Through his analysis of the “Literacy in the Right Age Program” across 184 Brazilian municipalities, Caloi provides insight into the factors that truly make such developmental programs work for the people. With a stable government and widespread education, we believe that societies can develop effective policy initiatives to maximize socioeconomic welfare.
The additional papers in this issue seek to address inefficiencies in a range of more developed markets. Tianyi Li analyzes (pp. 14-23) one of the most historically inefficient markets in the world: airlines. By applying a 2002 framework to contemporary data, Li effectively addresses the impact of airline collusion on consumers and, more specifically, on smaller firms and their entry options. In keeping with the trend of market inequity, Celene Chen analyzes (pp. 24-29) a problem very close to our experience in New York City: housing unaffordability. Chen provides readers with an understanding of the trends associated with increasing housing costs and legislators’ subsequent policy decisions.
In an age of continued urbanization and increased gentrification, Chen’s research is incredibly relevant. May Lyn Cheah uses a South Korean case study to address the confluence of immigration and lateral trade flows. She finds that, amongst other effects, immigrants from low-income countries have a more significant economic effect on the receiving country than immigrants from high-income countries. Furthermore, Cheah explains that, in the case of South Korea, immigrants have a greater impact on non-OECD than OECD countries. In light of recent American political sentiments, Cheah’s research is both concerting and enlightening.
As we approach summer recess, we at The Columbia Economic Review are optimistic that continued research in the undergraduate economic sciences will shed light on the inequities of the world. The research in our Spring 2018 issue shows us that, though inefficiency is severe and widespread, effective policy initiatives are within grasp. We believe that our dedication to economic analysis will continue to connect undergraduate researchers, readers, and policy makers in a united front to address economic inefficiency and socioeconomic inequality.
Cheers,
Michael Crapotta CC’19 | Managing Editor
Alex Whitman CC'19 | Managing Editor
Alan Lin CC'18 | Editor-in-Chief
Benjamin Titlebaum CC'19 | Publisher
Submission Guidelines
General Information
The Columbia Economic Review is the biannually circulated undergraduate economics journal of the Department of Economics at Columbia University. We are looking for seminar papers, research papers, theses, etc., on all fields related to economics. Our goal is to give undergraduates at Columbia and other universities throughout the world the opportunity to publish their research in a premier undergraduate academic journal. Papers should be emailed to econreview@columbia.edu. They can also be submitted online here: columbiaeconreview.com/submit/. Please read our submission guidelines thoroughly before submitting a paper. We will not review your paper if requirements are not met. You will be contacted to re-submit under the requirements.
Eligibility & Guidelines
Paper submissions must be from current undergraduate students or recent graduates. The submissions should meet the following requirements:
1. The content of the paper (not including the bibliography and extra data tables) must not exceed 40 pages. It is the author’s responsibility to trim down their work prior to submitting it.
2. Include the author’s name and university, acknowledgements (if relevant), and image files of all graphics or tables used in the paper. All images should be included in the paper and separate files (e.g., JPEG) should be submitted alongside the paper.
3. Any spreadsheets used should also have relevant data linked.
4. The paper must not have already been published in other journals. By submitting, you give CER the sole right to publish the paper and make any edits that we see fit. Please do not submit to other academic journals.
5. All manuscripts should be submitted in PDF format with 1.5 line spacing. We strongly recommend manuscripts not exceed 40 pages (not including the bibliography and extra data tables). The suggested length includes reference lists, figures, and tables. Submit the .tex file if LaTeX was used. It is the author’s responsibility to condense the thesis prior to submitting the documents.
6. Please use 12-point Times New Roman or similar font. Margins should be 1.5 inches on the top, bottom, and sides.
7. Include an abstract of 100 or fewer words.
Deadlines
Submissions are reviewed on a rolling basis. We encourage you to submit your work as soon as you have turned in the final version to your department.
Policies
Any paper submitted to The Columbia Economic Review must NOT be under consideration for publication by any other journal. All submitted papers must represent original work and give proper credit to the work of other authors, as it pertains to the submitted paper. Plagiarism is unacceptable. If, upon review, the submitted paper is deemed to have an unacceptable amount of plagiarized material, the paper will no longer be eligible for consideration. Thanks to the generous support of the Department of Economics at Columbia University, there is no submission fee for papers. We reserve the right to review or reject manuscripts that have been previously rejected by CER. Furthermore, CER reserves the right to reject papers without review.
Selection Process and Publication
CER has a team of senior editors and editors who are selected for their academic accomplishments, superior writing skills and research experience. All submissions will be reviewed rigorously and we will select the papers best exemplifying strong analytic thought and originality. Please visit our website for past publications: www.columbiaeconreview.com. We will notify selected authors after the final decisions have been made. We appreciate all contributions. If you have further questions, please contact us at econreview@columbia.edu.
The Effects of Democracy on Development during Structural Adjustment
Masterson evaluates how the nature of a nation’s political system, specifically with regard to the level of democratization, influences the efficacy of structural adjustment programs (SAPs). His study is unique in that it measures correlation with social and economic factors over time, accounting for not only short-term, but also medium- and long-term patterns. He suggests that democratic governments are better equipped than authoritarian regimes to restructure their economies due to the alignment of leaders’ interests with those of their citizens. Given criticisms of the effectiveness of the World Bank and IMF, Masterson’s paper explores how large a role the type of government plays in bringing about the changes that SAPs are meant to produce. -E.L
By Liam Masterson Ithaca College ‘18
Introduction
The formation of the International Monetary Fund and International Bank for Reconstruction and Development (later the World Bank Group) following World War II marked the beginning of the modern development ecosystem as the two institutions helped post-war Europe rebuild and later expanded to worldwide development. In an effort to imitate the success of Europe’s rebuilding, the IMF and World Bank created programs for developing nations to adjust their economies and put them on the same growth track. The most notable of these programs are structural adjustment loans (SALs), later known as structural adjustment programs (SAPs), which lend money to developing nations in exchange for macroeconomic reforms. These programs were adopted in scores of countries around the world and have had lasting impacts on the economies that they affected, not all for the best. In fact, many economists and development specialists have declared SAPs on the whole to be outright failures. Recipients of SAPs are often saddled with debt and have to make reforms that threaten the stability of an economy or political regime. Although the World Bank (WB) and International Monetary Fund (IMF) have often been criticized as forces of neocolonialism that create more problems than they solve, they also seem to be willing to give this economic medicine to any type of regime, regardless of democratic nature.
Evoking the theoretical framework established in The Dictator’s Handbook, this paper explores how different political systems create different incentives that affect the outcomes of development initiatives. In practice, this paper empirically tests whether dictatorships or democracies are better at using SAPs to improve the lives of their citizens. Using a two-fold analytical method, adapted from previous structural adjustment research, this paper analyses correlations between pre-adjustment regime status and multiple social and economic variables in a cross-country study that includes nearly all SAP recipient countries over the short, medium, and long term. The analysis concludes that, with minor exceptions, democratic regimes are more likely to see increased standards of living across multiple social and economic factors after the introduction of an SAP.
Background
SAPs were envisioned to provide financing, in the form of a loan, over a period of several years, in return for reforms in trade protection, price incentives, inflation stabilization, and efficient resource use. World Bank SAPs focused on three main areas: macroeconomic stabilization, public sector management, and private sector development. Macroeconomic stabilization includes exchange rate adjustment (usually a devaluation for more favorable trade conditions) and public finance improvement through reduction of spending deficits, public expenditure limits, and increase in government receipts. Public sector management cultivated reforms in civil service to reduce corruption and increase efficiency. It also encouraged governments to divest and restructure parastatal corporations and enhance privatization efforts. Private sector development reforms included revamping financial institutions, primarily banks; revisiting trade policies, including reducing tariffs and duties; reforming pricing policies for agriculture and other commodities; and making the regulatory environment more business friendly (Noorbakhsh et al., 2005; Easterly, 2005). Finally, multimillion dollar loans were attached to the programs to implement these reforms. These plans looked promising; however, during the 1980s and 90s, nations with SAPs not only performed poorly but also often struggled to fulfill their debt payments. Social unrest, economic failure, and accusations of neo-colonialism followed. The World Bank and International Monetary Fund became targets of criticism from structural adjustment loan recipients, industrialized financers, and the development community-at-large. Though reforms were made to the WB and IMF in future programs, problems persisted. In 2002, the IMF reprioritized and abandoned non-emergency structural adjustment as a developmental tool, deciding to focus efforts on assisting with the adjustment of macroeconomic factors during short-term financial crises, thus leaving the World Bank as the sole provider of future development-focused SAPs (Kapur and Naím, 2005).
The reforms recommended to SAP recipients must traverse the political system of each nation in order to be implemented, which is where SAPs can run into trouble. Governments have been known to implement the intended reforms to receive loan money only to later nullify or otherwise contradict the reform legislation rather than enforce it. Compliance with the reforms has been shown by Noorbakhsh et al. to improve the human development index score of the implementing country over the short and medium term, improving quality of life for all citizens, but the analysis does not consider how the type of regime affects development outcomes.
Literature Review
Development research studying the effects of structural adjustment is plentiful and thorough. The controversial nature of the programs has led many researchers in the past 30 years to ask how and why so many nations experienced unrest and economic turmoil following the introduction of an SAP. Looking at democracy’s effect on the outcomes of an SAP is not a new idea, though it has only been lightly researched. By 1989, even the World Bank acknowledged that SAP outcomes are affected directly by the political environment in which they are implemented. Lindenberg and Devarajan found that the world had overall become more democratic during the 1960s through the 1980s and that new democracies performed worse than older, more established democracies (Lindenberg and Devarajan, 1993). However, studies have shown that compliance with the recommended reforms improved the short-term performance of the human development of the nation, which continued into the medium term. Those who comply and effectively implement
policy reforms have better Human Development Index performance (Noorbakhsh et al., 2005), and papers such as Shandra, Shandra, and London (2012) have concluded that increased borrowing through an SAP leads to lower infant mortality rates. Lindenberg and Devarajan (1993) also concluded that democracies may accrue debt as quickly as dictatorships but grow faster with regard to GDP per capita and trade as a percent of GDP. Additionally, while a cross-country study found that regimes that did not participate in SAPs had higher rates of inflation, dictatorial regimes were also found to have higher rates of inflation than their democratic counterparts, whether they had participated in an SAP or not. Educational performance is also positively affected by democratic regimes. Dahlum and Knutsen (2017) used cross-country and panel data analysis to conclude that democracies consistently create conditions that lead to an increased “share of young citizens that attend school and the number of years they stay in school” (Dahlum and Knutsen, 2017). There is no evidence that democracies provide better education for students, but that is irrelevant to this study. These findings justify including education enrollment as a social variable in this study.
Methodology
In researching structural adjustment, it is challenging to find a control. It is not feasible to form a group of peer nations that have not taken part in SAPs, because most poor countries have attempted SAPs. Indeed, over 45% of nations have participated in SAPs; almost all that haven’t are post-industrial economies, industrialized countries, or nations so new that the scarcity of their records makes them incompatible with this thesis. There are very few economically and socially comparable countries; thus, a true regression analysis with control and experimental groups is inappropriate for the data and present development situation. Instead, this study will use a temporal, standard correlation analysis where the baseline period is a short-term period before introducing structural adjustment. Statistical significance is assessed at the 10%, 5%, and 1% levels, using the p-value for the null hypothesis that the correlation coefficient equals zero. By nature, countries that receive SAPs have structural economic problems, which causes unavoidable selection bias (Easterly, 2005) as countries self-select into the group of nations that have had SAPs. This thesis uses a counterfactual approach (Noorbakhsh et al., 2005) by analyzing the pre- and post-structural adjustment years for each nation. This method assumes that the introduction of structural adjustment is the only difference between the pre- and post-structural adjustment economies. To mitigate the influence of any acute political or economic crisis, the analysis averages the performance over five-year periods,
focusing on broader changes in performance due to introducing an SAP. Given that the effects of acute economic incidents have been mitigated in this way, the above assumption is justified as the effects of all other events have been diluted across the five-year period. However, one should note that this method does not account for acute political events in a recipient nation, such as regime change, as each data point, regardless of period, is always correlated with the average democracy score of the pre-adjustment period. The five-year periods are placeholders for the respective short-term, medium-term, and long-term periods, with the mean values of the development indicators for each country aligned in five-year periods: 0-5 years before introducing structural adjustment (SA) (Period 1), 0-5 years with SA (Period 2), 5-10 years with SA (Period 3), and 10-15 years with SA (Period 4). This study uses a temporal analysis comparing the average of the five years before introducing the SAP with the averages of the first, second, and third five-year periods after its introduction.
To analyze the data, this thesis uses a twofold method. First, the whole sample is compiled and examined in a correlation analysis in order to approximate the relationship between democracy and each selected variable. If the correlation is statistically significant, indicating a relationship that affects development at all levels of democratization, it will be commented on. If there is no statistically significant relationship, the implications of this result will be expounded as well. Second, adapting a method previously used in papers by Noorbakhsh et al., the countries in the sample are ordered from best to worst in Period 1 with respect to their democracy score (DEMO) values and stratified into terciles: top democracies, middle democracies, and bottom democracies. The terciles represent the top, middle, and bottom democracies before the start date of their first SAP. Within each tercile, a correlation analysis is performed to approximate the relationship between democracy and improvement in the selected development indicators. Note that for any particular metric and period there are gaps in the data; countries with missing values are dropped from the sample before the countries are ranked and split into terciles. This stratification method endeavors to discover any correlations that may have escaped identification in the correlation analysis of the whole sample. There is a possibility that there are distinct benefits or inefficacies in achieving development once a nation reaches a certain level of democratization that one would not see in a linear correlation across the whole sample; democracies that have reached the upper end of democratization might see nonlinear improvements in development. Where there is a statistically significant correlation in a tercile, the results will be interpreted and comments will be made on their implications. The methodology in this study aims to counter the problems with studying SAPs and mitigate any unintentional bias as well as accommodate competing theories and leave room for new interpretations.
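The period averaging and whole-sample correlation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's code: the function names, the sample numbers, and the choice to count the SAP start year as the first year "with SA" are all hypothetical, and the correlation is taken between pre-adjustment DEMO and the change from Period 1, which is one reading of the text.

```python
from math import sqrt
from statistics import mean


def period_means(series, sap_year):
    """Average a {year: value} series over the four five-year periods."""
    periods = {
        1: range(sap_year - 5, sap_year),        # five years before SA
        2: range(sap_year, sap_year + 5),        # years 0-5 with SA (assumed alignment)
        3: range(sap_year + 5, sap_year + 10),   # years 5-10 with SA
        4: range(sap_year + 10, sap_year + 15),  # years 10-15 with SA
    }
    result = {}
    for period, years in periods.items():
        values = [series[y] for y in years if y in series]
        result[period] = mean(values) if values else None  # None marks a data gap
    return result


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for the null hypothesis r = 0, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))


# Hypothetical indicator series for one country whose first SAP began in 1985.
series = {year: 10.0 for year in range(1980, 1985)}
series.update({year: 12.0 for year in range(1985, 1990)})
series.update({year: 15.0 for year in range(1990, 1995)})
means = period_means(series, sap_year=1985)
change_p2 = means[2] - means[1]  # change from the Period 1 baseline to Period 2

# Hypothetical cross-section: pre-adjustment DEMO and Period 2 change per country.
demo = [-14, -11, -8, -5, -2]
change = [0.0, 1.0, 1.0, 3.0, 4.0]
r = pearson_r(demo, change)
t = t_statistic(r, len(demo))
print(f"r = {r:.3f}, t = {t:.2f} on {len(demo) - 2} degrees of freedom")
```

The p-value used for the 10%, 5%, and 1% thresholds would come from comparing the t statistic with a Student t distribution on n - 2 degrees of freedom; a statistics library such as SciPy (`scipy.stats.pearsonr`) reports it directly.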
Developed from the work of Noorbakhsh et al., the method this study uses, as discussed in the section below, is different from other cross-country analyses that have studied the effect of democracy with relation to development in three important ways. First, while previous analyses have examined all developing countries against those with SAPs, creating a selection bias (Easterly, 2005; Shandra, Shandra, and London 2012; Lindenberg and Devarajan, 1993), this study only examines outcomes within the set of countries with SAPs, eliminating the potential for selection bias. Second, this study examines each country before and after the start of its individual structural adjustment programs, negating the problems ignored by other studies which failed to consider the effects of multiple SAPs, which many countries have had (Easterly, 2005). This study does not need to consider the effects of multiple SAPs because it compares a country’s adjustment results to its previous performance. Third, while other studies have analyzed outcomes across whole samples of SAP countries, drawing linear relationships, this methodology allows for both linear and nonlinear relationships.
The potential outcomes for this analysis are varied and have far-reaching implications. If democracies are better at leveraging SAPs to improve their constituents’ quality of life, we should see better outcomes with a better democracy score and the best democracies should have the best outcomes. Conversely, if dictatorships can use SAPs more effectively, we can expect to see worse outcomes with a better democracy score and the worst democracies should have the best outcomes. If stability of regime is what determines performance of development outcomes, we should see better outcomes in the top and bottom democracies and worse performance in middle tier democracies. This may mean that the World Bank (WB) should only lend to politically stable governments. Furthermore, if the top democracies do significantly better than both middle and bottom democracies, we may be able to interpret this as a dictatorship trap, meaning anything that isn’t a full democracy will have negligible performance and should not be considered for further SAPs until an improvement in regime occurs. Additionally, if there is no correlation between democracy and development outcomes, an interpretation of this result may be for the WB to lend to all qualified nations regardless of regime type.
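The tercile-level correlations whose possible patterns are discussed above can be sketched in the same spirit. The country labels, DEMO values, and indicator changes below are invented for illustration, and the handling of a sample size that does not divide evenly by three is an assumption, since the text does not specify it.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def split_terciles(rows):
    """Drop rows with gaps, rank by DEMO (best first), and cut into three groups."""
    complete = [row for row in rows if None not in row]
    ranked = sorted(complete, key=lambda row: row[1], reverse=True)
    n = len(ranked)
    # Assumption: any remainder from uneven division goes to the lower groups.
    first_cut, second_cut = n // 3, (2 * n) // 3
    return {
        "top": ranked[:first_cut],
        "middle": ranked[first_cut:second_cut],
        "bottom": ranked[second_cut:],
    }


# Hypothetical rows: (country label, Period 1 DEMO, change in an indicator).
rows = [
    ("A", -2, 4.0), ("B", -3, 3.0), ("C", -4, 3.5),
    ("D", -7, 1.0), ("E", -8, 1.5), ("F", -9, 0.5),
    ("G", -12, 0.2), ("H", -13, 0.4), ("I", -14, 0.1),
    ("J", -6, None),  # missing indicator value, dropped before ranking
]

terciles = split_terciles(rows)
tercile_r = {
    name: pearson_r([row[1] for row in group], [row[2] for row in group])
    for name, group in terciles.items()
}
for name, value in tercile_r.items():
    print(f"{name}: r = {value:.3f} (n = {len(terciles[name])})")
```

Comparing the three coefficients (and their significance) is what distinguishes the scenarios in the text, for example a strong relationship only in the top tercile versus similar relationships in the top and bottom terciles.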
Data
This thesis is interested in understanding how the level of democracy in an SAP participant country affects development outcomes and wellbeing for its citizens. While 88 countries have received SAPs, only 85 have data relevant to this analysis in light of political circumstances in the former Yugoslavia. SAPs are short-term lending agreements meant to last 3 to 5 years. However, throughout the complicated history of structural adjustment, many countries have taken multiple loans over decade-long periods, and most research fails to consider the effects of multiple loan packages (Easterly, 2005). This analysis negates this problem by looking at the effect of the first SAP a nation has experienced. Given the methodology used for this study, determining the first time a nation has participated in an SAP is critical to the validity of this thesis. The Projects and Operations of the World Bank Group database and the IMF’s Lending Arrangements database are the primary sources for determining the first structural adjustment program that a country has participated in. This data is cross-referenced with academic literature that mentions the first SAP arrangements for a nation. Special structural adjustment loans are excluded from this analysis as they are rare, much more targeted in nature, and less controversial. Similarly, sector structural adjustment loans are excluded as they do not affect the macroeconomy in the same way that traditional structural adjustment programs do. As a proxy for development, 5 economic variables have been selected that the literature has deemed appropriate: GDP per capita in constant 2010 US$ (GDPC), net inflows of foreign direct investment as a percentage of GDP (NDFI), trade, measured as imports plus exports as a percentage of GDP (TRAD), annual inflation calculated by the GDP deflator (INFGD), and annual percentage growth of agriculture value added (AVAD). As a proxy for well-being, 4 social and educational variables have been selected that the literature has deemed appropriate: government expenditure on education as a percentage of GDP
(EXED), infant mortality rate per 1,000 live births (IMR), gross secondary school enrollment as a percentage of population (SECEN), and gross tertiary school enrollment as a percentage of population (TEREN). The data for these metrics were collected from the database of the International Bank for Reconstruction and Development, part of the World Bank Group. The DEMO variable is the primary explanatory variable for determining whether democratic variation affects development. Freedom House publishes its methodology, and while its shortcomings are well documented elsewhere, it is the authoritative source on regime status. The democracy scores are taken from Freedom House’s database on civil liberties (CL) and political rights (PR). The CL and PR scores are added together and multiplied by negative one to create a composite score (DEMO) where the best democracies have a score of -2 and the worst have a score of -14. This makes the analysis more intuitive: better democracies have higher scores, so a positive correlation means that better democracies are associated with better outcomes. The CL and PR components of the score are both vital to this analysis since they make the research comparable across development and democracy research. As discussed in the theory section, the components that are evaluated to make the CL and PR scores are echoed in other democracy research. Civil liberty score components include freedom of expression and belief, associational and organizational rights, rule of law, personal autonomy, and individual rights. The political rights score components include free and fair elections, right to organize, political choices free from domination, political rights for minority groups, level of corruption, and bureaucratic transparency. All the factors that make up the CL and PR scores can be used to make governments accountable to their citizens; therefore the CL and PR components are appropriate for this study.
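The construction of the composite score can be written out in a few lines. This is a sketch: the function name is hypothetical, and it relies on Freedom House's CL and PR ratings each running from 1 (most free) to 7 (least free), which matches the -2 to -14 range stated above.

```python
def demo_score(civil_liberties, political_rights):
    """Composite DEMO score: the negated sum of the CL and PR ratings."""
    for rating in (civil_liberties, political_rights):
        if not 1 <= rating <= 7:
            raise ValueError("Freedom House ratings run from 1 to 7")
    return -(civil_liberties + political_rights)


best = demo_score(1, 1)   # most free on both ratings
worst = demo_score(7, 7)  # least free on both ratings
print(best, worst)
```

Negating the sum only flips the sign of any correlation, so that a positive coefficient reads as "more democratic, better outcome"; it does not change the strength of the relationship.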
The economic factor GDPC is used throughout development literature to measure the level of economic prosperity for the typical person in a given economy. While its limitations are well documented in other papers, the author believes it to be appropriate in measuring economic well-being. TRAD and INFGD were selected due to the goals of structural adjustment programs. Increased trade and lower levels of inflation are reflective of market liberalization and economic stability, respectively. NDFI and AVAD were
chosen following their use in other structural adjustment literature (Noorbakhsh et al., 2005; Lindenberg and Devarajan, 1993) and taking into account the dual goals of SAPs: to improve growth in largely agricultural economies and to help make balance of payments adjustments for the recipient country (Easterly, 2005). The social metrics of EXED, IMR, and SECEN have all been used in other SAP-related studies as appropriate measures for well-being and social improvement in recipient countries (Noorbakhsh et al., 2005; Shandra et al., 2012; Agarwal and Samanta, 2006). Political scientists have suggested that primary and secondary school enrollment are inappropriate measures of development when comparing dictatorships to democracies because both types of regimes have an incentive to educate their citizens with basic mathematical and scientific literacy so that they may be productive in low- and medium-skill jobs. Instead, they suggest that tertiary education shows the educational difference between regimes. A democratic leader’s power is strengthened by a well-educated, productive, voting populace. A dictator’s rule is threatened by those who can understand the systems of oppression; thus a dictator is discouraged from sending more of the population through higher levels of education (Mesquita and Smith, 2011). For this reason, TEREN is included in the analysis as a social improvement variable, but SECEN is retained due to its prevalence in the literature. The average democracy score for countries in the 5-year period before starting an SAP is 9.5 out of 14. This score would be considered partially free, leaning towards dictatorship, by Freedom House, but one should note that the scores in the sample run the gamut from 2 to 14. For example, the best democracy in the sample is Costa Rica, which has had a fully functioning democracy since independence with free press, protection for minorities, and free, fair, competitive elections with respected positions for the opposition.
Costa Rica averages a democracy score of 2 in its respective Period 1 (1980-1984). In contrast, one of the most dictatorial regimes in the sample is the Democratic Republic of Congo (also known as Zaire or Congo-Kinshasa), which earned scores between 13 and 14 for its respective Period 1 (1982-1986), when dictator Mobutu Sese Seko ran a politically repressive kleptocracy. A military dictator, Mobutu used arbitrary violence to suppress all political opposition, used the central bank as his personal slush fund, seized domestic and foreign-owned assets to distribute to his cronies, and created a cult of personality that rivals those of other rulers of his kind. Agricultural growth is modest on average for the sample; however, it is interesting to note that the max and min represent dramatic changes to those economies. The sample’s average expenditure on education, 3.875%, falls below the high-income country average of about 4.75% during similar periods (World Bank, 2017), but one should note that the max in the sample is much higher than that of high-income countries. The GDP per capita values are unsurprising and conform to the idea that these are all developing countries; even the max and min create a range of low-income to middle-income countries. Similarly, it is unsurprising that the infant mortality rate is relatively high in these developing nations, especially given that most of the observations were in the late 1970s and early 1980s. However, the min of infant mortality rates stands out as being particularly low for a developing nation, similar to infant mortality rates in the United States during the late 1990s. Average inflation is unsurprising, as it is positive, which most economies experience. Secondary school enrollment is predictable as schooling is variable across the developing world, but one should note that the max is at 100% enrollment, which means that subsequent changes in enrollment can only be negative or zero for those data points, making the variable less analytically powerful. Tertiary school enrollment, in contrast, is very low across the sample, with notable exceptions, and can be as low as 0.027% of the population enrolled. Finally, trade levels seem typical of developing nations with the extreme exception of the max value, which says that the average sum of
imports and exports for that country during the 5 years preceding structural adjustment is 227.86% of GDP. On average, infant mortality and inflation have decreased. We should note that IMR has not decreased for all countries in the sample; the max is a positive change for infant mortality rates. On average, the education metrics, GDP per capita, trade, agricultural value added, and foreign investment have all improved from Period 1 to Period 2, as we would expect in the first 5 years of structural adjustment. However, one should again note that this is the average change, and every variable contains cases of regressive change at the extremes of the sample. On average, agricultural value added, GDP per capita, infant mortality, foreign investment, trade, and enrollment rates have all continued and extended their trends in Period 3, improving the performance of each metric. But, once again, one should note that for many of the variables, the extremes of the sample have regressed in their development. Additionally, expenditure on education has reversed its expansionary trend in Period 3, and the typical country in the period 5-10 years after the introduction of an SAP spends proportionally less on education than it did before the implementation of the program. Finally, on average, GDP per capita, infant mortality, foreign investment, trade, and enrollment rates have all continued their trends further in Period 4. Similarly, government expenditure on education has, on average, continued its downward trend from Period 3 and remains below its Period 1 level. Additionally, agricultural growth has slowed from Period 3 to Period 4 but is still better than in Period 1. Finally, the range of inflation values has become smaller in Period 4 compared to all other periods. When looking at the trends across all the periods, we can see that after the introduction of an SAP, most observed variables improve on average over the next 15 years.
While this is not proof of causation, nor does this paper claim that SAPs will always lead to improved development outcomes, on average, the introduction of an SAP coincides with improvement in many development metrics and only ever coincides with the worsening of standards of living in specific cases. This should be remembered as the results of this study are analyzed in the next section.
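The comparisons used in the next section relate a country's Period 1 democracy score to the change in a metric between Period 1 and a later period. The sketch below shows one way such a correlation could be computed. It assumes a plain Pearson correlation and 5-year period means; the function names `pearson` and `period_change` are hypothetical, and every number in `sample` is invented for illustration rather than drawn from the paper's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def period_change(period_means, later_period):
    """Change in a metric from Period 1 to a later period (2, 3, or 4)."""
    return period_means[later_period] - period_means[1]


# Hypothetical sample: (DEMO in Period 1, {period: 5-year mean of a metric}).
# These numbers are invented for illustration and are not the paper's data.
sample = [
    (-2, {1: 10.0, 2: 14.0, 3: 19.0, 4: 25.0}),
    (-6, {1: 6.0, 2: 8.0, 3: 11.0, 4: 14.0}),
    (-10, {1: 3.0, 2: 4.0, 3: 5.0, 4: 6.5}),
    (-14, {1: 1.0, 2: 1.5, 3: 1.0, 4: 2.0}),
]

demo = [d for d, _ in sample]
for later in (2, 3, 4):
    changes = [period_change(means, later) for _, means in sample]
    print(f"delta 1-{later}: r = {pearson(demo, changes):.3f}")
```

A positive coefficient here would read as "countries that were more democratic before adjustment saw larger gains in the metric", which is a statement about association only.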
Table 1
Analysis
The significant results for the whole sample are summarized in Table 1. The most significant results are the change in GDP per capita and tertiary school enrollment. The positive correlation for TEREN across all periods can be interpreted to mean that the more democratic a regime is, the higher the gains in tertiary education enrollment will be in all periods observed after an SAP is introduced. As predicted by Mesquita and Smith, the significant positive correlation for TEREN is consistent with the idea that the more democratic a regime, the more it relies on a highly educated population to stay in power. Additionally, the correlation between initial democracy levels and tertiary enrollment gets stronger and more significant over time, meaning that the introduction of an SAP can be the impetus for long-term expansion of tertiary education in a democracy. If one compares these results to the baseline established previously, we see that, on average, in all periods, tertiary education improved for the typical country in the sample, but negative change from Period 1 is always present for some countries in the sample. Knowing this, we can use these correlations to conclude that the more democratic a nation is when receiving an SAP, the more likely it will be to see positive growth in tertiary education enrollment across the following 15 years. Similarly, while less significant and smaller than the correlation between democracy and tertiary education enrollment, secondary education enrollment (SECEN) is positively correlated with better democracy scores in Period 1. Interestingly, while it follows the same pattern as tertiary education enrollment by increasing in magnitude and significance over time, it is less significantly correlated with democracy scores in Period 1, which may support the idea that while secondary education is important to democracies, it is only conditionally important to dictatorships. Dictators that cannot control the wealth of their nations through resource extraction need a marginally more educated and thus more productive population from which to extract wealth (Mesquita and Smith, 2011). Since secondary and tertiary education are important to democracies but tertiary education is threatening to dictators, we should expect that tertiary education variables are strongly correlated with better democracy scores. However, secondary education is conditionally important to dictatorships, so democracy should only be weakly correlated with secondary education variables, and this result is borne out by the data. Additionally, secondary education enrollment is not significant in the change between Period 1 and Period 2, but becomes significant later on; in contrast, tertiary education has a significant correlation in all periods. This could be because an influx
of tertiary educated graduates leads to more secondary education instructors, thus leading to better education and higher incentive to enroll in secondary education. Additionally, tertiary education often needs financing that is typically unavailable at lower income levels. The increase in enrollment could result from
improvements in GDP per capita and therefore income. However, neither of the above explanations was causally tested in this study, and neither should be considered a conclusion of this paper. Better democracy scores in Period 1 are also correlated with a positive change in GDP per capita in all periods compared to Period 1. This result can be interpreted to mean that the more democratic a regime is, the higher the gains in GDP per capita in all periods observed after an SAP is introduced. This is also consistent with the idea that democracies are better at enhancing their citizens’ lives because they are accountable to their constituents and must give them what they want in order to stay in power. The correlation’s significance and absolute value both peak in the change between Period 1 and Period 3, then drop slightly in the following change period. This can be explained as a delayed response to change. The institutional and structural changes that have been implemented in the recipient nation to improve economic functions may have the greatest payoff 5 to 10 years after they are implemented, then dilute afterwards. If one compares these results to the baseline established previously, we see that, on average, in all periods, GDP per capita improved for countries in the sample, but in every period, negative change from Period 1 is within 1 standard deviation. These correlations imply that the more democratic a nation is when receiving an SAP, the more likely it will be to see positive growth in GDP per capita across the following 15 years, peaking between 5 and 10 years after the introduction. Finally, there is a weakly significant negative correlation across the whole sample for the agricultural value-added growth rate. The more authoritarian regimes seem to improve their AVAD more than the more democratic regimes after the introduction of an SAP; however, once again, this correlation is weak.
This may seem like evidence in favor of dictatorships’ management of SAPs, but one must remember that this significance is only found in the final observed period and, as will be expanded upon in later paragraphs, this correlation may be evidence that, while developing, democracies move away from agricultural growth in favor of other outcomes, while dictatorships can still benefit, agriculturally, from democratization. One should note that only 2 variables showed no significant correlations in the
Table 2
whole sample analysis or the stratified sample analysis: IMR and EXED. As explained in the next paragraph, it was inappropriate to analyze EXED in the stratified sample analysis; therefore it would be inappropriate to conclude that there is no relationship between government expenditure on education (EXED) and democracy after the introduction of an SAP, as it has not been fully analyzed. The most that this paper can say is that further research is needed in this case. However, this paper does conclude that infant mortality is unaffected by a regime’s democratic nature after the introduction of an SAP, as there are no significant correlations in either of the analyses. This could be because advances in early-life care, metaphorically, lift all boats and are unaffected by regime type. Table 2 summarizes the significant results for the stratified sample. Note that results in the stratified sample are less significant across the board because creating terciles reduces the size of each subsample by approximately 67% relative to the whole sample; thus the statistical power of the analysis is reduced. Due to this reduction in sample size, it is necessary to drop EXED (government expenditure on education) from the analysis, as it would have under 15 observations per tercile. Starting with the most significant results, net direct foreign investment inflows (NDFI) are most strongly negatively correlated with democracy in the bottom tercile sample, in the difference between Period 1 and the period directly following the implementation of a structural adjustment program (Period 2). The significance and magnitude of the correlation drop in the differences with Period 3 and Period 4, respectively. These are the exact results one would expect if the major gains in NDFI were made in the first period after the implementation of trade liberalization in the worst dictatorships. These markets are
usually cut off from the world with regard to trade, so investors would likely flock to these newly opened markets in the first 5 years of adjustment, and given that the liberalization does not continue once the program has ended, they are unlikely to significantly increase that investment after the initial period of adjustment. If we assume that these least democratic regimes are able to retain these foreign investors after the initial period of adjustment through corruption, collusion, or other means, one should see the big gains made during Period 2 plateau and settle at an equilibrium level of investment, given that trade liberalization does not continue after the program ends, and this is what is observed. Additionally, one should note that the top and middle democracies show weakly correlated negative relationships in NDFI for ∆1-4 and ∆1-3, respectively. However, it may seem surprising that the above negative relationships appear in all tiers of the stratified sample analysis but do not appear significantly in the whole sample analysis. The relationships never align across all levels of democracy within the same comparison period, and therefore fail to
lead to significant whole sample correlations. These results in the NDFI variable vindicate the use of the stratified sample analysis method. Without this methodology, NDFI relationships would never have been found in a whole sample correlation analysis, and this paper would have to conclude that there is no relationship between democratic regime nature and direct foreign investment inflows. Looking at the inflation variable (INFGD), we see that within the top democracy tercile, better democracies are positively correlated with higher rates of inflation. Superficially, this seems to confirm the idea that democracies are specifically bad at controlling inflation (Skidmore, 1989). However, when placed in context with the result found in the whole sample, that better democracies are positively correlated with increased GDP per capita, one sees that these increased levels of inflation did not affect the ability of democracies to make gains in growth. One would expect higher levels of inflation in an expanding economy, especially in an economy that had implemented the reforms to create growth recommended by a structural adjustment program. While the literature may be correct in arguing that democracies often have trouble controlling inflation, this may be the cost to democratic regimes when developing and creating better standards of living for their citizens. Next, the trade variable (TRAD) is positively correlated with better democracies in observation periods ∆1-2 and ∆1-3 in the top democracy tercile. This result is completely expected if it is assumed that the people of a nation want more economic opportunities to sell their products on the international market as well as to purchase goods that aren’t available domestically. Oftentimes, the greatest improvements in a person’s life may be facilitated by goods from other nations
such as electronics, medications, and other consumer goods. Additionally, the liberalization of trade due to the implementation of an SAP would provide domestic economic opportunities in the export sector. If one assumes that the people of a nation desire these benefits and that the top democracies are the most responsive to their people’s needs, then the top democracies will use the policy recommendations of an SAP to expand their economy’s trade conditions. The significance and magnitude of the correlations decrease in further observation periods, which implies that the benefits of these reforms diminish over time. The other significant result in the TRAD variable is in the bottom democracies’ ∆1-4 observation period. This negative correlation implies that at the bottom levels of democracy, the less democratic a nation’s government is, the better its trade performance will be. However, this correlation only appears in the last observational period, so it would be inappropriate to draw a conclusion about this relationship without further investigation. It is unlikely that policies implemented with an SAP would affect annual trade averages 10-15 years after their introduction, so it is likely that something about the nature of dictatorships is influencing this result rather than their policies. Moreover, the secondary education enrollment variable (SECEN) is positively correlated with better democracies in the middle tercile of democracies. This means that if a nation is closer to being in the top tercile than the bottom tercile of democracies, then it is more likely to see better improvement in secondary school enrollment than those closer to the bottom tercile. This result follows the Mesquita and Smith model since being a partially free regime means that the rulers must rely on a more productive and thus more educated workforce in order to stay in power.
One would also expect that this relationship would appear in the top tercile of democracies since they rely heavily on an educated population to stay in power; however, such a relationship fails to appear. This could be due to the already high enrollment rates in the top democracies, providing little room to grow. This suggestion seems to be the case; according to the author’s calculations, the average top democracy sample country has 33% higher secondary enrollment rates compared to the average middle democracy sample country and 71% higher secondary enrollment rates compared to the
average bottom democracy sample country. This idea is further supported by the whole sample analysis, which shows that better democracies are correlated with having higher gains in SECEN across the board. Finally, the results of the agricultural value added variable (AVAD) tell a coherent story. AVAD is positively correlated with better democracies in the bottom democracy tercile during the first two periods following adjustment. This result follows the Mesquita and Smith model, as the more democratic a nation’s regime is, the more rulers will have to rely on a more productive workforce to stay in power, which is aided by greater food security. As most dictatorships are non-industrialized, agrarian societies, the dictators at the top of the bottom tercile will have to use the reforms of an SAP to provide their populations with better food security than their counterparts at the bottom of the tercile. Similarly, among the top democracies, better democratic principles are associated with less agricultural growth in the long term. Since, as discussed earlier, better democracies are correlated with greater growth in GDP per capita and trade, this phenomenon could be explained by considering that, in the long term, the reforms of the SAP may have led to increased diversification of the economy and more importation of foodstuffs, leading to less dependence on domestic agricultural growth as an economic priority.
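The stratified analysis discussed in this section depends on first dividing the sample into bottom, middle, and top democracy terciles. The sketch below shows one simple way to do that split. It assumes equal-sized groups formed by sorting on the Period 1 DEMO score; the paper's exact cut-off rule is not stated here, and the function name and the country labels are hypothetical.

```python
def split_into_terciles(records, key):
    """Sort records by `key` and split them into bottom, middle, and top thirds."""
    ordered = sorted(records, key=key)
    n = len(ordered)
    cut_low, cut_high = n // 3, (2 * n) // 3
    return {
        "bottom": ordered[:cut_low],
        "middle": ordered[cut_low:cut_high],
        "top": ordered[cut_high:],
    }


# Hypothetical (label, DEMO) pairs, invented for illustration only.
countries = [
    ("A", -2), ("B", -3), ("C", -5),
    ("D", -7), ("E", -8), ("F", -9),
    ("G", -11), ("H", -13), ("I", -14),
]

terciles = split_into_terciles(countries, key=lambda rec: rec[1])
for name, group in terciles.items():
    print(name, [label for label, _ in group])
```

Each tercile holds roughly one third of the observations, which is why correlations computed within a tercile have less statistical power than those computed on the whole sample.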
Conclusion
In this paper, the author utilized a methodology from previous research to create a robust analysis of the development performance across different regime types after structural adjustment is initiated. While the results of this study are correlations and cannot prove causation, the results consistently conform to theoretical political-economy models. Specifically, the model outlined by Mesquita and Smith (2011), where democracies are better custodians of their countries’ development because the incentives of democratic leaders align with the benefit of the most people, fits the results consistently. This study has shown that, after the introduction of an SAP, better democracies are correlated with higher improvements in GDP per capita, trade, and non-primary school enrollment rates. These results are corroborated by previous research that saw better performance in these variables with
democracies than dictatorships. Additionally, while this study shows that democracies are more likely to experience higher rates of positive inflation post-adjustment, this may be a consequence of their increased ability to develop. Finally, this paper found evidence consistent with dictators using an economic or political event, such as an SAP, to improve their regime’s ability to feed their people, but only when strictly necessary, while democracies instead used the impetus of an SAP to move away from an agrarian society and diversify their economy through trade and education expansion. The implications of this paper are far-reaching and applicable to many decisions in future development programs and literature. While these results are in no way conclusive, and additional research is needed in the field, this paper suggests that democracy can be a linchpin factor in the success or failure of a macroeconomic adjustment program. As such, the recipient regime’s reputation for the maintenance of human rights and democratic principles should be given primary consideration when implementing such a program. Future research should consider the results of this study when evaluating individual case studies of the failures or successes of a structural adjustment program. Future research in this area should consider that not all relationships between democracy and development will appear in samples of all regime types; some will only appear above a certain level of democratization. The stratification technique used in this study can be used as a template to tease out the subtler non-linear relationships that may not appear in more regime-diverse samples. Additionally, the lessons elaborated by Easterly (2005) and others regarding selection bias should be retained, and that bias mitigated, in future studies of SAPs.
Furthermore, future studies in this area should consider the effects of multiple loans over time as well as the magnitude of the loan accepted, as these were not studied here. Future research should also attempt to find causal links for these correlations and generalize the results to describe development strategies for developing nations. Finally, discovering and empirically describing the mechanisms that lead to such correlations should be a top priority for researchers in development and democracy.
Does Collusion among Full-Service Airlines Affect the Entry Decision of Low-Cost Carriers?
Li provides a contemporary look into the anti-competitive behavior that pervades the full-service airline market. After an ample literature review, Li revisits past models, improving them in an effort to interpret more recently compiled data. Li finds that incumbent collusion by established full-service airlines reduces the odds of new airline entry, especially for low-cost carriers. Since the full-service airline industry represents a prime example of an imperfectly competitive market, these results shed light on how we should approach the concept of market entry in the age of startups. We are excited to explore other applications of Li’s research across other markets plagued by similar inefficiencies. -M.V.C.
By Tianyi Li
Cornell University ‘18
Introduction
When a low-cost carrier (LCC) decides whether to enter a new flight route, it takes into account several factors, among which the existing behavior and expected reactions of the full-service airlines (FSAs) are of great importance. FSAs use varied methods to deter LCCs from entering the market: code-sharing, forming alliances with partner airlines, and price predation. Such anti-competitive behavior may greatly diminish the expected profit of a prospective LCC entrant, thereby lowering the probability of entry. However, it is possible that full-service airlines’ efforts to prevent an LCC’s entry might have very little or no bearing on its entry decision, because FSAs and LCCs target different groups of customers. LCCs tailor their products to price-sensitive customers like students and budget travelers, who make purchasing decisions mainly based on prices. In contrast, customers who buy tickets from FSAs care more about service quality and on-time performance; business flyers and wealthy travelers are usually willing to pay a premium for a comfortable flight with fewer delays and hassles. Although consumers are free to purchase tickets from any airline, and overlaps exist between the price-conscious and quality-sensitive groups, under this view the collusive behavior of FSAs (code-sharing, forming alliances, and predatory pricing) will not, in general, significantly harm the core customer base
of the LCCs. In other words, LCCs are offering a new combination of price and service to the aviation market, and this product differentiation effect allows LCCs to enter a route without suffering fierce competition from incumbent FSAs. This paper is dedicated to studying which of the two arguments above is substantiated by the data. In addition to regression analyses, I apply the concept of “entry threshold ratio” proposed in Bresnahan and Reiss (1991) and use the obtained ratios to infer whether anticompetitive behavior exists among airlines operating on a route. Mazzeo (2002) develops a model to assess the impact of product differentiation on market configuration and firm profitability. Following Mazzeo’s methodology, I devise a framework to evaluate the impact of collusion and the differential effects of heterogeneous incumbents on an LCC’s entry decision.
Background and Literature Review
A previous study by Goetz and Shapiro (2012) scrutinizes the code-sharing conduct among incumbent airlines as a response to the threat of entry by a low-cost competitor.
Using a fixed-effect, lagged-time linear probability model to study the threat’s effect on incumbents’ code-sharing decisions, they find that the likelihood of code-sharing with partner airlines increases significantly when there is a possibility of LCC entrance. Nevertheless, their analysis uses a linear estimation model in order to remove the fixed effects, so predicted probabilities can fall outside the interval [0, 1]. Moreover, the authors focus on the code-share decision-making of the incumbent airlines, and it would be beneficial to complement their research by evaluating the entry decision of the low-cost carriers. To avoid reverse causality, I treat the airline entry pattern as a two-stage sequential game, assuming the incumbent FSAs have entered in stage 1 and concentrating on the second-stage entry contest by the LCCs, given the presence and behaviors of the FSAs. Boguslaski, Ito, and Lee (2004) focus on the dynamics of Southwest Airlines’ entry strategies in the 1990s. They estimate the size of the unfavorable effect of a piece of federal legislation on Southwest’s expansion scheme, and discuss the impact of certain exogenous factors which affect Southwest’s entry pattern. Building upon their study, my research explores the competitive impact of FSAs’ actual behaviors (allying, code-sharing, and presence); by directly characterizing the endogenous actions of the FSAs, a more concrete and convincing argument can be made regarding what factors affect the route-expansion decision of an LCC. Bresnahan and Reiss (1991) develop the concept of entry threshold to study the effects of entry on incumbents’ conduct in markets where one cannot directly observe prices and costs. By examining the relationship between market size (S) and the number of firms supported by this market (N), they find that incumbents’ profits and the price level fall sharply with the entry of the second and third firms, and that later entrants have little effect on prices and profits.
Using data on population and number of airlines operating in a market, I will estimate the market size of different routes and construct entry threshold ratios for the US aviation market, and infer from the ratios whether the incumbent airlines are guilty of deterring entry. Mazzeo (2002) designs an empirical framework to predict the entry and product supply decisions by heterogeneous firms in a market equilibrium. Because the model allows different types of firms to have distinct effects on firm profitability, it can be applied in the
context of airline entry to explore whether the presence of FSAs and LCCs in a route has disparate effect on the profits of new LCC entrants. If empirical evidence shows that an existing LCC has stronger influence on the entry decision of an incoming LCC than an extant FSA does, then it can be reasoned that the entrant competes by offering a differentiated product from the FSA, but not from other LCCs. Mazzeo’s model enables quantitative analysis of how product differentiation lessens competition between airlines of different types, and in my paper, I adapt it to investigate the effect of product differentiation and collusion on the profitability of different kinds of airlines. Furthermore, the sequential game presumption can be elaborated by allowing incumbent airlines in stage one to anticipate the behaviors of subsequent entrants, thereby fitting the entry game into a classical Stackelberg model.
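The entry threshold idea from Bresnahan and Reiss (1991) mentioned above can be illustrated numerically. In their framework, if S_N is the market size needed to support N firms, the per-firm threshold is s_N = S_N / N, and the entry threshold ratio compares s_{N+1} with s_N. The sketch below computes these ratios from a list of thresholds. The function names are hypothetical, and the threshold values are invented for illustration; how thresholds are estimated from route data is not shown here.

```python
def per_firm_thresholds(market_size_thresholds):
    """Per-firm entry thresholds s_N = S_N / N.

    market_size_thresholds[k] is S_N for N = k + 1, i.e. the market
    size needed to support N firms.
    """
    return [s / (k + 1) for k, s in enumerate(market_size_thresholds)]


def entry_threshold_ratios(market_size_thresholds):
    """Successive ratios s_{N+1} / s_N for N = 1, 2, ..."""
    per_firm = per_firm_thresholds(market_size_thresholds)
    return [per_firm[k + 1] / per_firm[k] for k in range(len(per_firm) - 1)]


# Hypothetical thresholds S_1..S_4 (e.g. millions of people), illustration only.
thresholds = [1.0, 4.0, 9.0, 12.0]
print(per_firm_thresholds(thresholds))
print(entry_threshold_ratios(thresholds))
```

A ratio well above 1 means each firm needs a larger customer base once another firm enters, which suggests that entry has intensified competition; ratios close to 1 suggest that an additional entrant changes competitive conduct very little.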
Data, Route and Airline Selections, and Definitions
One aim of this paper is to empirically test the impact of full-service airlines’ collusive behaviors on an LCC’s decision to enter a given flight segment, so it is necessary to find data on FSAs’ alliance affiliations, code-sharing arrangements, and mergers. Variables include the ticketing and operating carriers for each route, the number of passengers on the same itinerary, the fare paid, and other route-level characteristics. The raw dataset extends across several dimensions (time, flight segments, airline carriers, etc.), and I use only the flight segment dimension to conduct a cross-sectional study. I define each domestic flight route as a potential market for entry and observe, in each quarter, how many routes a low-cost carrier operates. I limit the sample to the 45 most popular airports and consider all the nonstop segments among these airports to be potential routes for an LCC to enter. This practice generates 1703 airport-pairs, or potential markets for entry. For all these routes, I consider the entry decisions of three major US low-cost carriers: JetBlue, Frontier, and Spirit Airlines. To proxy for the market size of different routes, I construct the Pop variable as the sum of the populations of the metropolitan areas served by the two endpoint airports. The US Census Bureau provides statistics on metropolitan population estimates and changes, and in this essay, data on 2016
is utilized.
Theoretical Entry Model and Regression
The LCC entry contest can be modeled in a sequential or a simultaneous game setting. For simplicity, sections 4 and 5 assume that full-service airlines have entered a flight market before a low-cost carrier has, and the analysis focuses on the second stage of a sequential entry game, in which the LCC makes an entry decision and the incumbents do nothing. Moreover, under a multi-time-horizon dynamic game setting, a firm will enter a market if the net present value (NPV) of its expected stream of profits is greater than zero. Since there is not enough information to calculate the NPV of expected profits, I look at only one period of entry decisions by different LCCs. Because the dependent variable in the regression is binary (Entry=1 and Non-entry=0), I adopt a nonlinear estimation model (probit) to assess the regressors’ impact on each LCC’s decision to enter. An airline i will enter a flight segment j if the expected profit from serving this market is positive, that is, if the expected variable profit is greater than the entry cost: $E(\pi_{ij}) - C_{ij} > 0$. The reduced-form expression for the expected profit is $E(\pi_{ij}) - C_{ij} = X_{ij}\beta + \varepsilon_{ij}$, where $X_{ij}$ are the explanatory variables and $\varepsilon_{ij}$ is an i.i.d. error term. Although neither the expected profit nor the entry cost is directly observable, the actual entry decision of an LCC is observable: $e_{ij} = 1$ if $E(\pi_{ij}) - C_{ij} \geq 0$ and the firm chooses to enter, and $e_{ij} = 0$ if $E(\pi_{ij}) - C_{ij} < 0$ and the firm chooses not to enter. Therefore I estimate the probability of entry by an LCC on a flight segment using a probit
model based on the equation: $\mathrm{Prob}(e_{ij} = 1 \mid X_{ij}) = \mathrm{Prob}(\varepsilon_{ij} > -X_{ij}\beta)$. The $X_{ij}$ matrix contains all the explanatory variables, a detailed description of which can be found in the appendix. Here I summarize how the key variables are constructed. Specifically, the regressors are categorized into three types: route characteristics (population at endpoint cities and nonstop flight distance), features of the entrants and the incumbents (the hub dummies), and the incumbent behavior variables. The first two types serve as controls, and the last category includes the variables of real interest. The incumbent behavior variables consist of a characteristic of the airlines already operating in a flight segment (i.e. the total number of rival operators on a route) and their potentially anti-competitive actions, captured by the ally dummy. The ally dummy stands for conduct such as code-sharing and joining an airline alliance. Since neither participating in an alliance nor code-sharing with a partner airline happens frequently, I use the following rule to construct the ally dummy: if the full-service airline operating on a flight segment joins an alliance, codeshares with partner airlines, or merges with a partner airline, then the ally variable takes a value of 1; if none of the three actions takes place, then the ally dummy equals 0. In addition, although price predation is a viable option for incumbent FSAs to deter entry, prices depend upon several different factors (time of purchase, purchasing site, bulk fares, award travel, etc.), and it is difficult for FSAs to use price to prey on the entrants. Hence the price variable is not included in this empirical model. The regression model ends up looking like this:
$P(e_{ij} = 1) = \beta_1 Pop + \beta_2 Dist + \beta_3 Hub + \beta_4 NumRival + \beta_5 Ally + \varepsilon_{ij}$
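The ally rule and the probit entry probability described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the probability is computed as the standard normal CDF of the linear index (the usual probit form), and every coefficient except the ally coefficient is a made-up placeholder. The value -0.352 is the JetBlue ally estimate reported in this paper.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ally_dummy(joins_alliance, codeshares, merges):
    """Return 1 if the incumbent FSA takes any of the three actions, else 0."""
    return 1 if (joins_alliance or codeshares or merges) else 0


def entry_probability(regressors, coefficients, intercept=0.0):
    """Probit probability of entry: Phi(intercept + sum of beta_k * x_k)."""
    index = intercept + sum(coefficients[k] * regressors[k] for k in coefficients)
    return normal_cdf(index)


# Hypothetical coefficients for illustration; only "ally" comes from the text.
coefficients = {"pop": 0.10, "dist": -0.05, "hub": 0.80, "numrival": -0.02, "ally": -0.352}

route = {"pop": 1.2, "dist": 0.5, "hub": 1, "numrival": 3}
p_without_ally = entry_probability({**route, "ally": ally_dummy(False, False, False)}, coefficients)
p_with_ally = entry_probability({**route, "ally": ally_dummy(False, True, False)}, coefficients)
print(p_without_ally, p_with_ally)
```

With a negative ally coefficient, the predicted entry probability is lower on a route where the incumbent allies than on an otherwise identical route where it does not.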
Results of the Regression and Interpretation
The regression results for JetBlue, Frontier, and Spirit Airlines show that the ally variable has a statistically significant, negative impact on the entry decisions of JetBlue ($\beta_5 = -0.352$, z = -2.36) and Spirit ($\beta_5 = -0.241$, z = -2.06), but not on those of Frontier Airlines. In comparison, the number of incumbent operators (numrival) has a weakly significant, negative impact on Frontier’s entry decision ($\beta_4 = -0.106$, z = -1.645), but not on those of JetBlue and Spirit. Although all three airlines are low-cost carriers, they are distinct in terms of company characteristics, and each company might respond differently to the strategies pursued by rival airlines. Nevertheless, the statistical evidence shows that in general, the features and actions of incumbent airlines do impact the entry decisions of low-cost carriers. The empirical outcomes demonstrate that LCCs are vulnerable to incumbent FSAs’ anticompetitive behaviors, even though the LCCs nominally offer a service differentiated from that of their FSA rivals. One feasible explanation for this phenomenon is that the FSAs are diversifying their product offerings and providing no-frills services similar to those of the LCCs. Since 2016, the major airlines in the United States (DL, AA, and UA) have successively introduced their own “basic economy” products, which offer fewer benefits than ordinary economy class service, a move by the FSAs to match and compete with the LCCs in the market and protect their own market shares. In other words, FSAs diversify their service offerings to compete with LCCs and maintain their dominance in the domestic market. The hub dummies in the regression capture an important feature of all the airlines operating domestically in the U.S.: since most U.S. airlines use the hub-and-spoke paradigm to arrange their air traffic, I anticipate that having a hub of its own at one of the two endpoint airports will enhance the LCC’s odds of entry, while the presence of a rival’s hub will diminish the chance of entry. The results show that having a hub significantly boosts all three low-cost carriers’ probability of entry, but instead of having a negative impact, the influence of rivals’ hubs on LCCs’ entry probability appears to be positive, whether significant or not. A possible interpretation of this result is that several of JetBlue, Spirit, and Frontier’s hubs overlap with those of the three major airlines, and consequently, the estimated effect of hubs on entry decisions might be biased.
Nevertheless, the effect of hubs on entry is not the focus of this paper, as Sinclair (1995) has shown that hub-and-spoke networks are significant determinants of route entry decisions. Further examination of the routes served by the three LCCs reveals that they are not intentionally avoiding the flight segments which are already served by FSAs. Although the anti-competitive behaviors of incumbent airlines do have a negative impact on LCCs’ entry decisions, LCCs are still inclined to enter markets that FSAs currently serve. On the one hand, most U.S. carriers adopt the hub-and-spoke model in arranging their flights to achieve economies of scale and greater route efficiency. Since LCCs and FSAs sometimes share the same airport as a hub, it is possible that both an LCC and an FSA operate flights out of the same hub to an identical, popular destination. On the other hand, a route already served by FSAs might be a mature market with stable demand or promising growth prospects, and the entering LCC is attempting to supply a product that is further differentiated from what the incumbents already offer.
Application of Bresnahan and Reiss (1991) in analyzing incumbent conduct and predicting market equilibria
Firm Conduct Identification using Entry Threshold Ratios
Bresnahan and Reiss (BR) develop the idea of the entry threshold by first proposing the concept of a “zero-profit equilibrium level of demand.” By assuming that profit is zero in market equilibrium, BR rewrite the profit function to express market size S:
$S_N = \frac{F_N}{V_N d}$
where $S_N$ is the market size that supports exactly N firms, $V_N$ stands for the Nth entrant’s variable profits (price minus average variable costs), $F_N$ is the Nth firm’s fixed costs, and d represents individual demand. The ratio of two successive market sizes can be written as
$\frac{S_{N+1}}{S_N} = \frac{F_{N+1}}{F_N} \cdot \frac{V_N}{V_{N+1}}$
and as N increases, the successive entry threshold ratios will fluctuate.
The entry ratio concept can be applied to deduce the conduct of incumbent airlines in a market. Traditional oligopoly theory states that higher market demand (S) attracts entrants (N), that fierce competition brings down variable profits as entry occurs, and that entry ratios decrease to 1. By observing the value to which the sequence of ratios converges, one can draw inferences about incumbents’ conduct. For instance, if the sequence converges to 1 and fixed cost is assumed to be constant for all firms, then it could be argued that additional entries do not affect the level of variable profit per customer, and the firms in the market might be either colluding (i.e. sustaining a cartel) or engaging in perfect competition. If the sequence converges to a level greater than 1, then either $V_N / V_{N+1} > 1$ or $F_{N+1} / F_N > 1$. In the former case, the market is not yet saturated, and additional entry will further bring down variable profits; the latter case might result from entrants using inefficient production technologies, or from incumbent firms deterring competition by creating entry barriers, thereby raising $F_{N+1}$ above $F_N$. Since the U.S. aviation market is constantly evolving, it is difficult to conclude whether a route is saturated or not. Of the 1703 observations in my dataset, the number of airlines operating in a market (N) ranges from 0 to 6. To model market size, the population of metropolitan areas is used. The average size of markets with N firms is denoted as $s_N$. The entry ratios do not exhibit monotonicity. One possible reason is that different airlines operate different frequencies of flights each day in a given market, and we cannot assume that the number of airlines strictly increases with market size. In other words, since airlines are rarely constrained by capacity, one company could potentially monopolize a huge market by providing many flights a day.
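The calculation of average market sizes and successive entry ratios described above can be sketched as follows. The function names and the four toy markets are hypothetical and are not drawn from the paper's dataset; the sketch simply averages population over markets with the same number of airlines and divides adjacent averages.

```python
from collections import defaultdict


def average_market_size(markets):
    """Average population of markets grouped by number of operating airlines.

    `markets` is an iterable of (number_of_airlines, population) pairs.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for n_airlines, population in markets:
        totals[n_airlines][0] += population
        totals[n_airlines][1] += 1
    return {n: total / count for n, (total, count) in totals.items()}


def entry_ratios(avg_size):
    """Successive ratios s_{N+1} / s_N for every adjacent pair of N values."""
    ratios = {}
    for n in sorted(avg_size):
        if n + 1 in avg_size and avg_size[n] > 0:
            ratios[(n + 1, n)] = avg_size[n + 1] / avg_size[n]
    return ratios


# Toy data (population in millions), purely illustrative.
toy_markets = [(1, 2.0), (1, 4.0), (2, 6.0), (3, 7.5)]
sizes = average_market_size(toy_markets)
print(entry_ratios(sizes))
```

In the toy data the ratio falls from 2.0 to 1.25 as N rises, which is the kind of pattern one would inspect for convergence toward or above 1.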
This situation differs from the case discussed in Bresnahan and Reiss (1991), where a single doctor or plumber can meet the demand of only a certain number of people. In the new arrangement, the entry ratios are generally monotonic, and the minor increase from $s_3/s_2$ to $s_4/s_3$ can be regarded as potential evidence that $F_4$ is higher than $F_3$ and entry barriers are high. In addition, the sequence gradually converges to a value above
1.1, indicating that either the variable profit is still above the competitive level, or the entry cost is higher for new entrants. Both scenarios suggest that the U.S. aviation market is not highly competitive: either profit margins still exist or newcomers face entry hindrance from the incumbents. Furthermore, the sharp decrease from $s_1/s_0$ to $s_3/s_2$ is consistent with Bresnahan and Reiss’s conclusion that post-entry competition intensifies at a rate that drops with the number of entrants; the small change from $s_4/s_3$ to $s_6/s_5$ implies that later entrants have a relatively small influence on incumbent conduct.
Market Equilibria and Entry Thresholds Prediction
In addition to inferring incumbent conduct, the available data enable a likelihood estimation that predicts the equilibrium number of airlines (N) in a market with particular demand conditions. For a market to sustain N firms in equilibrium, the (N+1)th firm must earn negative profit upon entry. Suppose the profit function of an airline can be written as $\pi(N) = V_N \times S - F_N + \varepsilon_m = \bar{\pi}(N) + \varepsilon_m$, where S represents market size (approximated by the pop variable), $V_N$ and $F_N$ respectively denote the per-capita variable profit and fixed cost of the Nth firm, and the unobservable error term $\varepsilon_m$ is normally distributed, homoskedastic, and independent of the explanatory variables. The assumption of normally distributed errors justifies the use of probit functions to estimate the coefficients. The probability of observing no airlines operating in a market is
$\Pr(N = 0) = \Pr(\pi(1) < 0) = 1 - \Phi[\bar{\pi}(1)]$ (1)
where $\Phi(\cdot)$ stands for the cumulative normal distribution function (normal CDF).
I expect airline profits to decline with the entry of an additional operator (regardless of its type), and therefore the chance of observing N airlines in an equilibrium market is
$\Pr(N) = \Pr(\pi(N) > 0 \text{ and } \pi(N+1) < 0) = \Phi[\bar{\pi}(N)] - \Phi[\bar{\pi}(N+1)]$ (2)
Expressions (1) and (2) collectively define the probability density functions in the likelihood function (assuming markets are mutually independent). The variable profit function for the Nth airline is assumed to be linear and contains additively separable components:
(3), where the X matrix contains the demand shifters, the coefficient $\alpha_i$ measures the change in per-capita variable profits when the ith airline enters the market, $d_i$ is a dummy variable that equals 1 when the ith airline enters, and $V_1$ equals the per-capita variable profits of a monopolist. The X matrix comprises dist and ally, and all $\alpha_i$’s are expected to be positive. Similarly, the fixed cost function for the Nth firm is characterized as
(4), where the Z matrix holds the cost shifters, the coefficients $\gamma_i$ quantify the rise in fixed cost with the addition of another airline, and $F_1$ equals the fixed cost of a monopolist. The Z matrix consists of hub information, and all $\gamma_i$’s are expected to be positive. Additionally, the variables pop and dist are transformed using the following formulas to produce reasonable estimates for the coefficients: $popst = \ln(pop / \overline{pop})$ (5) and $distst = \ln(dist / \overline{dist})$ (6), where the bar denotes the sample mean.
Due to the transformation, variable values above the mean become positive and those below the mean become negative, and a value equal to the mean is converted to zero. This practice ensures that the estimated coefficients for market size (captured by pop) and distance are not too small. Substituting in (3) and (4), the profit function becomes
or equivalently:
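As a sketch of these expressions, reconstructed only from the verbal definitions above: the symbols $\beta$ and $\delta$ for the coefficients on the demand shifters X and cost shifters Z are placeholders introduced here, and treating $d_i$ as equal to 1 for every airline up to the Nth is an assumption.

```latex
% Sketch only: \beta and \delta are placeholder symbols, and d_i = 1 for i <= N is assumed.
\begin{align}
V_N &= V_1 + X\beta - \sum_{i=2}^{N} \alpha_i d_i \tag{3} \\
F_N &= F_1 + Z\delta + \sum_{i=2}^{N} \gamma_i d_i \tag{4} \\
\bar{\pi}(N) &= S\Big(V_1 + X\beta - \sum_{i=2}^{N} \alpha_i d_i\Big)
              - \Big(F_1 + Z\delta + \sum_{i=2}^{N} \gamma_i d_i\Big)
\end{align}
```

Under this form, positive $\alpha_i$ lower per-capita variable profit and positive $\gamma_i$ raise fixed cost as more airlines enter, which matches the sign expectations stated in the text.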
Table A (on the following page) displays the parameter estimates of the profit function. Consistent with expectation, all the alpha and gamma estimates have positive signs, indicating that additional entrants bring down per capita variable profit and face higher fixed costs. However, contrary to Bresnahan and Reiss’s finding, the later entrants have greater impact on variable profits (as visualized by graph 1). Since I have
concluded that the U.S. aviation market is not perfectly competitive, the incremental effects of new entrants on variable profit can be explained using cartel theory: when there are few competitors in a market, it is easy for the incumbents to collude and form a cartel to maximize profit. As new airlines enter the market, the cartel becomes increasingly hard to sustain, and when it breaks down, the drop in variable profit is striking. The estimated parameters enable the prediction of profits under different market conditions. By imposing the two conditions π(N) > 0 and π(N+1) < 0, the equilibrium number of firms operating in a market (N) can be identified. The entry threshold, or the market size that can support a certain number of airlines, can then be estimated by $\hat{S}_N = \hat{F}_N / \hat{V}_N$, where $\hat{F}_N$ and $\hat{V}_N$ are obtained using the estimated parameters and variable values in the dataset. Note that the analysis here cannot account for the intrinsic difference between low-cost carriers and full-service airlines. As the data’s dependent variable represents only the total number of airlines flying nonstop in a market, the differential effect of heterogeneous competitors on an airline’s payoff cannot be computed.
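The two steps just described, finding the N that satisfies π(N) > 0 and π(N+1) < 0 and computing the threshold as fixed cost over per-capita variable profit, can be sketched as below. The per-capita variable profits and fixed costs are hypothetical numbers chosen for illustration, not the Table A estimates, and the error term is ignored.

```python
def deterministic_profit(n, market_size, var_profit, fixed_cost):
    """pi_bar(N) = V_N * S - F_N, ignoring the error term."""
    return var_profit[n] * market_size - fixed_cost[n]


def equilibrium_firm_count(market_size, var_profit, fixed_cost):
    """Return N such that pi_bar(N) > 0 and pi_bar(N+1) < 0 (0 if even a monopolist loses).

    Assumes the dictionaries are keyed by consecutive firm counts starting at 1.
    Returns None if no N within the supplied range satisfies both conditions.
    """
    counts = sorted(var_profit)
    if deterministic_profit(counts[0], market_size, var_profit, fixed_cost) < 0:
        return 0
    for n in counts[:-1]:
        stays = deterministic_profit(n, market_size, var_profit, fixed_cost) > 0
        next_loses = deterministic_profit(n + 1, market_size, var_profit, fixed_cost) < 0
        if stays and next_loses:
            return n
    return None


def entry_threshold(n, var_profit, fixed_cost):
    """S_hat(N) = F_hat(N) / V_hat(N)."""
    return fixed_cost[n] / var_profit[n]


# Hypothetical values: V_N falls and F_N rises with N.
V = {1: 0.5, 2: 0.4, 3: 0.3, 4: 0.2}
F = {1: 1.0, 2: 1.2, 3: 1.5, 4: 2.0}

print(equilibrium_firm_count(4.0, V, F))
print(entry_threshold(2, V, F), entry_threshold(3, V, F))
```

In this toy case a market of size 4.0 lies between the thresholds for two and three firms, so two firms are sustained.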
Adaptation of the Empirical Framework in Mazzeo (2002)
To study the effect of product differentiation on firm profits and market structure, a more comprehensive mathematical framework must be utilized. So far, the analysis has presumed that airline entry is a two-stage sequential game, in which the FSAs have established themselves in stage 1 and LCCs make entry decisions in stage 2, given incumbents’ behavior in the previous period. The assumption is made to avoid simultaneous moves in stage 2 (i.e. incumbent FSAs collude as a response to entry, while LCCs’ entry decisions hinge on the anti-competitive behavior of the FSAs), because no pure-strategy Nash equilibrium can be found unless restrictive symmetries are imposed. As a consequence, only the code-sharing decisions and mergers right before an entry takes place were considered. The sequential game presumption can be further clarified by permitting incumbent firms in stage 1 to predict the actions of later entrants when making their own profit-maximizing decisions. Drawing upon Mazzeo (2002), I develop an empirical framework to assess the effects of product differentiation and incumbent collusion on the profitability of different types of airlines. Mazzeo argues that two decisions are endogenous to a firm: the decision to enter and what type of product to offer. In addition, his model is based on the following premises: a market will have N firms when π(N) > 0 and π(N+1) < 0; companies offering different kinds of product have separate payoff functions; profits are non-increasing in the number of competing firms; and firms can make their entry and product-offering decisions either in the same period (the Stackelberg-style specification) or in two different periods (the two sub-stage specification).
Payoff
In the airline entry context, suppose that FSAs and LCCs provide two kinds of product: low-level service (L) by the low-cost carriers, and high-level service (H) by the full-service airlines. FSAs and LCCs have distinct payoff functions that depend on market demand features (represented by an X matrix), the number of firms offering the same kind of product, the number of firms offering a different type of product, and the anti-competitive behavior of FSAs (the ally variable introduced in section 4, here denoted as $d_m$). Therefore, the payoff function is specified as $\pi_{Tm} = X_m\beta_T + f(N; \gamma_T, d_m) + \varepsilon_{Tm}$, where m indexes the market and T is the type of firm (H or L). The f function captures the effect on payoff of homogeneous and heterogeneous competitors: the 1×2 vector N stands for the number of LCC rivals ($N_1$) and FSA rivals ($N_2$); the gamma vector contains parameters that represent the incremental effect of a particular type of competitor (H or L) on payoffs. For example, $\gamma_{LH3}$ is the effect of the third H-type airline on average L-type payoff, and $\gamma_{LL2}$ describes the effect of the second L-type rival on average L-type payoff. Note that the specification does not permit heterogeneity within the same type (e.g. JetBlue’s effect on FSA profitability cannot be distinguished from that of Spirit Airlines), in order to keep the number of parameters to estimate at a reasonably low level. The dummy variable $d_m$ captures whether FSA collusion (i.e. code-sharing, joining an alliance, or merging) exists in market m, and the effects of collusion on payoffs are represented as $\delta_L$ for LCCs and $\delta_H$ for FSAs. Typically, FSAs collude to enhance their profits, and I expect $\delta_H$ to be positive. Collusion by FSAs might hurt the profitability of LCCs, so I expect $\delta_L$ to be negative. The unobservable in the payoff function, $\varepsilon_{Tm}$, is presumed to be independent
of all the observables, additively separable, and different for each type of firm in a given market. For instance, in the Denver to Los Angeles segment, 2 LCCs and 4 FSAs are operating nonstop in 2016Q1, so the observed market configuration is (L, H) = (2, 4). Since vector N captures only the rival carriers, N = (1, 4) for each LCC and N = (2, 3) for an FSA. We can parameterize the average payoff of an LCC as:
$\pi_{Lm} = X_m\beta_L + \gamma_{LL1} + \gamma_{LH1} + \gamma_{LH2} + \gamma_{LH3} + \gamma_{LH4} + \delta_L d_m + \varepsilon_{Lm}$
and the average payoff of an FSA as:
$\pi_{Hm} = X_m\beta_H + \gamma_{HL1} + \gamma_{HL2} + \gamma_{HH1} + \gamma_{HH2} + \gamma_{HH3} + \delta_H d_m + \varepsilon_{Hm}$
Equilibrium Identification
An equilibrium market can be depicted by the following inequalities and conditions:
1. $\pi_L(X, L, H, d_m) > 0$ and $\pi_L(X, L+1, H, d_m) < 0$;
2. $\pi_H(X, L, H, d_m) > 0$ and $\pi_H(X, L, H+1, d_m) < 0$; and
3. Define $h_T(X, L, H, d_m) = X_m\beta_T + f(N; \gamma_T, d_m)$, and denote $\pi_{Tm} = h_T(X, L, H, d_m) + \varepsilon_{Tm}$.
Hence, conditions 1 and 2 can be represented by:
4. $h_L(X, L+1, H, d_m) < -\varepsilon_L < h_L(X, L, H, d_m)$, and
5. $h_H(X, L, H+1, d_m) < -\varepsilon_H < h_H(X, L, H, d_m)$.
Conditions 4 and 5 jointly define the region in which the (L, H) outcome in market m can be realized. In addition, if product differentiation does soften competition, the profit margin will change less drastically with the entry of a heterogeneous firm than with a homogeneous one. Mathematically:
$\pi_L(X, L, H, d_m) - \pi_L(X, L+1, H, d_m) > \pi_L(X, L, H, d_m) - \pi_L(X, L, H+1, d_m)$ and
$\pi_H(X, L, H, d_m) - \pi_H(X, L, H+1, d_m) > \pi_H(X, L, H, d_m) - \pi_H(X, L+1, H, d_m)$,
which are equivalent to:
6. $\pi_L(X, L, H+1, d_m) > \pi_L(X, L+1, H, d_m)$, and
7. $\pi_H(X, L+1, H, d_m) > \pi_H(X, L, H+1, d_m)$.
From these we can further derive $\pi_L(X, L, H+1, d_m) > \pi_H(X, L, H+1, d_m)$ and $\pi_H(X, L+1, H, d_m) > \pi_L(X, L+1, H, d_m)$. These two conditions, together with inequalities 4 and 5, characterize the equilibrium in each market.
Entry Game Specification
In contrast to the regression model in section 4, which treated incumbents’ entry decisions and collusive behavior as exogenously given, here the decisions on entry and collusion by all firms (FSA and LCC) are endogenous.
Since I observe the equilibrium market configuration (L, H) in each route, the rules of the entry game do not significantly alter the likelihood estimation.
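The rival-count bookkeeping from the Denver to Los Angeles example and equilibrium conditions 1 and 2 can be sketched as follows. The function names and the two linear payoff functions are hypothetical stand-ins chosen so that (L, H) = (2, 4) is an equilibrium; they are not the estimated payoff functions, and the error terms are omitted.

```python
def rival_vector(own_type, n_low, n_high):
    """Rivals faced by one carrier of `own_type` in an (L, H) = (n_low, n_high) market."""
    if own_type == "L":
        return (n_low - 1, n_high)
    if own_type == "H":
        return (n_low, n_high - 1)
    raise ValueError("own_type must be 'L' or 'H'")


def is_equilibrium(payoff_low, payoff_high, n_low, n_high):
    """Conditions 1 and 2: operating carriers profit, one more of either type would not.

    Intended for configurations with at least one carrier of each type.
    """
    low_ok = payoff_low(n_low, n_high) > 0 and payoff_low(n_low + 1, n_high) < 0
    high_ok = payoff_high(n_low, n_high) > 0 and payoff_high(n_low, n_high + 1) < 0
    return low_ok and high_ok


# Hypothetical deterministic payoffs as functions of the market configuration.
def toy_payoff_low(n_low, n_high):
    return 1.1 - 0.4 * n_low - 0.05 * n_high


def toy_payoff_high(n_low, n_high):
    return 1.5 - 0.05 * n_low - 0.3 * n_high


print(rival_vector("L", 2, 4), rival_vector("H", 2, 4))
print(is_equilibrium(toy_payoff_low, toy_payoff_high, 2, 4))
```

The toy payoffs penalize same-type rivals more heavily than other-type rivals, which is the pattern that inequalities 6 and 7 describe.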
Estimation
The parameters of the payoff functions can be estimated using maximum likelihood, which selects the parameter estimates that maximize the probability of the observed market structures in the constructed dataset. For the 1703 observations, the likelihood function takes the form of a product over markets, $\mathcal{L} = \prod_{m=1}^{1703} \Pr[(L, H)_m]$. Using conditions 1 to 7, any realization of (L, H) can be represented by a realization of $(\varepsilon_L, \varepsilon_H)$. The error terms are assumed to be bivariate normal, so the joint PDF for $(\varepsilon_L, \varepsilon_H)$ takes the bivariate normal form. In the dataset, L ranges from 0 to 2 and H from 0 to 5. A nested expression for the payoff of a type T airline operating in market m is
$\pi_{Tm} = X_m\beta_T + \sum_{i} \gamma_{TLi} d_{TLi} + \sum_{j} \gamma_{THj} d_{THj} + \delta_T d_m + \varepsilon_{Tm}$
where $d_{TLi}$ is a dummy that equals 1 if the i-th low-type competitor is present, and $d_{THj}$ is defined analogously for high-type competitors. Since a market will have N entrants when π(N) > 0 and π(N+1) < 0, the probability of observing 0 L-type firms is $1 - \Phi[h_L(X, 1, H, d_m)]$, of observing 1 L-type firm is $\Phi[h_L(X, 1, H, d_m)] - \Phi[h_L(X, 2, H, d_m)]$, and of observing 2 L-type firms is $\Phi[h_L(X, 2, H, d_m)]$. Similarly, for H-type firms, the probability of observing 0 H-type firms is $1 - \Phi[h_H(X, L, 1, d_m)]$, of observing 1 H-type firm is $\Phi[h_H(X, L, 1, d_m)] - \Phi[h_H(X, L, 2, d_m)]$, and so on up to 5 H-type firms, which has probability $\Phi[h_H(X, L, 5, d_m)]$. The likelihood function resembles the product of these probabilities across markets or, in logarithm form, the sum of their logarithms.
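A minimal sketch of the type-specific probabilities and their logarithm form follows. It treats one carrier type in isolation, which is a simplification: the estimation described here works with the joint bivariate normal distribution of both error terms. The function names and the h values are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def count_probabilities(h_values):
    """Probabilities of observing 0, 1, ..., K firms of one type.

    h_values[k - 1] is the deterministic payoff h_T evaluated with k firms of
    that type, assumed non-increasing in k.
    """
    cdf = [normal_cdf(h) for h in h_values]
    probs = [1.0 - cdf[0]]                      # no firm of this type
    for k in range(len(cdf) - 1):
        probs.append(cdf[k] - cdf[k + 1])       # exactly k + 1 firms
    probs.append(cdf[-1])                       # the maximum observed count
    return probs


def log_likelihood(observations):
    """Sum of log probabilities over markets; each item is (observed_count, h_values)."""
    return sum(math.log(count_probabilities(h)[n]) for n, h in observations)


# Hypothetical h_L values for one market with at most 2 L-type carriers.
probs = count_probabilities([0.5, -0.3])
print(probs, sum(probs))
print(log_likelihood([(1, [0.5, -0.3]), (0, [-0.2, -0.9])]))
```

Because the terms telescope, the probabilities for any non-increasing sequence of h values sum to one.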
In addition, pop and dist are transformed to popst and distst, using formulas (5) and (6) in section 6.2.
MLE Results
The estimation assumes that the 1703 data points are independently drawn from the bivariate normal distribution, which determines the kernel of the log-likelihood function for a single observation. Table B exhibits the coefficient estimates of the two payoff functions (πL and πH). Using the estimated parameters, one can predict the relative payoffs of operating as an LCC or an FSA under various market conditions (X and ally) and in different product-type configurations. For instance, the estimated intercepts indicate that, in markets with similar demand (X) conditions, monopolizing a market as an LCC is on average less profitable than as an FSA (_cons_LCC=0.5516 vs. _cons_FSA=0.7888).
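The intercept comparison can be extended to markets of different sizes with a short calculation. The sketch below uses the intercepts above together with the popst coefficients quoted from Table B (0.1536 for LCCs and 0.2196 for FSAs), holds distst at its mean so that it drops out, and assumes that popst is the natural log of population relative to the sample mean; that assumption is inferred from the values 0.6931 and -2.9957 used in the text for twice and 5% of the mean. Function names are hypothetical.

```python
import math

# Intercepts and popst coefficients as quoted from Table B in the text.
ESTIMATES = {
    "LCC": {"intercept": 0.5516, "popst": 0.1536},
    "FSA": {"intercept": 0.7888, "popst": 0.2196},
}


def monopoly_payoff(carrier_type, pop_ratio_to_mean):
    """Predicted monopoly payoff, assuming popst = ln(pop / mean pop) and distst = 0."""
    est = ESTIMATES[carrier_type]
    return est["intercept"] + est["popst"] * math.log(pop_ratio_to_mean)


def crossover_ratio():
    """Population (as a share of the mean) at which LCC and FSA payoffs are equal."""
    lcc, fsa = ESTIMATES["LCC"], ESTIMATES["FSA"]
    popst_star = (fsa["intercept"] - lcc["intercept"]) / (lcc["popst"] - fsa["popst"])
    return math.exp(popst_star)


for ratio in (2.0, 0.05):
    print(ratio, monopoly_payoff("FSA", ratio), monopoly_payoff("LCC", ratio))
print(crossover_ratio())
```

The crossover falls where both predicted payoffs are already slightly below zero, so in this calculation the ranking only flips in markets too small to sustain either type.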
Market demand condition (popst) has a positive and significant effect on the payoffs of both LCCs and FSAs, and the difference in the relative size of the coefficients (popst_lcc=0.1536 vs. popst_fsa=0.2196) suggests that FSAs might favor markets with population above the mean (their payoffs grow faster with increasing market demand), while LCCs might prefer markets with below-mean population. In reality, the advantage of the FSA remains when different values of popst are taken into account. For instance, holding distst at the sample mean level and assuming the population is twice the sample mean in market i, a monopolizing FSA will in general acquire more profits (πH=0.7888+0.2196*0.6931=0.9410) than a monopolizing LCC (πL=0.5516+0.1536*0.6931=0.6581). If, holding other factors constant, the population is only 5% of the sample mean in market j, then a monopolizing FSA still earns more (πH=0.7888+0.2196*(-2.9957)=0.1309) than a monopolizing LCC (πL=0.5516+0.1536*(-2.9957)=0.0915). The relationship will only be reversed when market size shrinks to approximately 2.7% of the sample mean and payoffs for both types (H and L) are negative, indicating that the market is too small to sustain either type of airline. In sum, the baseline preference for offering high-level service is too strong for the demand condition (popst) to alter it. Both estimations report positive estimates for the effect of collusion (ally) on airline payoff, suggesting that collusion among FSAs will increase the payoffs of both FSAs and LCCs. The first conclusion is compatible with traditional oligopoly theory, while the second is quite counter-intuitive. One plausible explanation is that demand is fairly inelastic in certain markets, and consequently, the colluding FSAs raise prices to maximize payoffs, leading some customers to switch from flying with them to flying with low-cost carriers. However, the effects of rivals on airline payoffs are quite unanticipated. Instead of having negative signs, almost all estimates (except for d_lh1 and d_hl1) are positive in value, although the magnitude of the impact gradually declines (abs(d_lh_i) > abs(d_lh_i+1); abs(d_hh_i) > abs(d_hh_i+1)). The negative signs of d_lh1 and d_hl1 show that the first heterogeneous carrier significantly reduces the payoff of an airline, but later heterogeneous entrants and homogeneous firms improve the profitability of an airline, regardless of its type. The results differ from the conclusion in Mazzeo (2002), who argues that homogeneous competitors have a stronger negative impact on payoffs than heterogeneous competitors, and that firms are eager to differentiate.
Here, I propose an alternative specification for the payoff functions that involves fewer parameters to estimate. The current MLE analysis requires estimation of 18 coefficients and 2 intercepts from πL and πH, and the relatively small sample size (N=1703) is insufficient to produce accurate estimates for the 20 parameters. Borrowing from Mazzeo (2002), I estimate the “average competition effect” of additional heterogeneous competitors when there is more than one heterogeneous rival. For example, in a market with 2 LCCs (L-type) and 4 FSAs (H-type), I estimate the impact of the first H-type rival on L-type payoff, and the average impact of the second, third, and fourth H-type rivals on πL. This parameterization of the payoff functions helps avoid potential perfect collinearity among the dummy variables, and significantly reduces the number of coefficients to estimate. As a consequence, the payoff function for LCCs is transformed into
$\pi_{Lm} = X_m\beta_L + \gamma_{LL1} D_{LL1} + \gamma_{L0H1} D_{L0H1} + \gamma_{L0HX} NUM_{L0HX} + \delta_L d_m + \varepsilon_{Lm}$ (7)
where $D_{LL1}$ represents the presence of the first L-type competitor, $D_{L0H1}$ stands for the presence of the first H-type competitor when there are no L-type rivals, and $NUM_{L0HX}$ captures the number of additional H-type competitors when there are no other L-types. In a similar vein, the FSA’s payoff function can be parameterized by
$\pi_{Hm} = X_m\beta_H + \gamma_{HH1} D_{HH1} + \gamma_{HH2} D_{HH2} + \gamma_{H0L1} D_{H0L1} + \gamma_{H0LX} NUM_{H0LX} + \delta_H d_m + \varepsilon_{Hm}$ (8)
where $D_{HH1}$ denotes the presence of the first H-type competitor, $D_{HH2}$ is the dummy for the second H-type competitor, $D_{H0L1}$ represents the presence of the first L-type competitor when there are no H-type rivals, and $NUM_{H0LX}$ counts the number of additional L-type competitors when there are no other H-types. In short, the only difference in the alternative specifications (7) and (8) lies in the f function in $\pi_{Tm} = X_m\beta_T + f(N; \gamma_T, d_m) + \varepsilon_{Tm}$, while the X matrix and error term stay the same. The estimation results are not much different from those in Table B. A feasible interpretation of the outcome is that some necessary control variables might be missing from my model, and improvements can be made by finding and adding eligible controls that are exogenously predetermined or immutable.
Conclusion and Potential Ways to Improve
Evidence from the sequence of entry ratios shows that the U.S. aviation market is not perfectly competitive, and entrants face barriers to entry. Empirical results from the regression further demonstrate that the anticompetitive behavior of incumbent airlines does affect the entry decisions of low-cost carriers. Although the results are significant, the empirical part of this paper does not take into account incumbents’ reactions in stage 2 of the game. In addition, since entry into a new market does not happen frequently, the approach in sections 4 and 5 treats the entry decision problem in a static manner by looking at cross-sections of markets and analyzing the impact of existing market structures on entry decisions. The dynamic aspect of entry decisions, in which LCCs choose to enter different routes across time, is not examined.
A feasible strategy to refine the current paper is to use a panel data method and choose a starting point (e.g. 2013Q1) as a baseline. Only the routes opened by an LCC after the cutoff time are regarded as valid observations of entry, and collusion that takes place in one period only has anti-competitive impacts from the next period onwards. Additional factors that might affect entry decisions can be added to the regression equation. For instance, if opening up a new route leads to greater connectivity in an airline’s network (i.e. by adding a new route to the airline’s network, more one-stop routes are created), then the firm might be more willing to enter the market, and the number of new one-stop itineraries that would be realized can be used as an explanatory variable that captures the network-improving effect. However, simply adding regressors to the equation will dilute the effect of the target variable that this paper has been focusing on. This new variable, which would capture network-improving effects, might be highly correlated with the d dummies in the MLE, making it a bad control. As a result, additional variables should be introduced sparingly.
The Impact of an Urban Neighborhood’s Demographic Characteristics on Renters’ Housing Unaffordability
Chen’s paper examines the impact of various demographic factors, including the percentage of college students in a community and residents’ usage of public transit, on housing affordability for renters. Her focus on renters, a group typically underrepresented in the literature on housing markets, has key implications for policymakers in better determining how their decisions impact all constituents of a community. She ultimately finds that college students decrease housing affordability for renters, an important result for urban planners to take into account when designing community partnerships with universities. - B.H.
By Celene Chen
Northwestern University ‘18
Housing unaffordability is a pressing issue for households not only because housing costs are “the single largest expenditure item in the budgets of most families and individuals” (Quigley and Raphael 2004, p.129), but also because home equity is the “largest store of savings for most households entering retirement” (Sass 2017). Housing affordability determines how much income is available to a household after housing costs and how feasible buying a house is for a renting household, which is crucial for financial stability and retirement. This study analyzes key demographic factors of neighborhoods and their correlation with housing unaffordability. By studying the neighborhoods of Boston, San Francisco, Austin, Philadelphia, Chicago, Portland, New Orleans, Charlotte, Washington D.C., Milwaukee, Detroit, and Los Angeles from 2005-2015, this study indicates to local policymakers which factors may be correlated with increasing housing burdens. It measures the impact of demographic characteristics, including universities, racial makeup, and professions, along with factors previously identified as correlated with housing unaffordability for renters (Quigley and Raphael 2004, Bowes and Ihlanfeldt 2001). The factors studied include: college student renters in the neighborhood, housing unit quality, white renters as a portion of total neighborhood population, renters commuting by means of public transportation, and renters working in typically high-paying industries, which are specified in Table 1.
After-housing income also influences a household’s expenditure on child investment. Children’s cognitive achievement follows an inverted U-shape in the share of income spent on housing, with its turning point at roughly 30% (Newman and Holupka 2014). The negative externalities associated with housing unaffordability thus extend to people whose educations and salaries are hindered by a childhood in a family facing housing stress. Moreover, homeless populations require government assistance for medical care and necessities; in 2015, the federal government spent 5.141 billion dollars on homeless assistance programs (US Interagency Council on Homelessness 2016). In 2016, African Americans made up 35.2% of the homeless population but only 13.3% of the total population (US HUD 2016, Census Bureau 2016). With the government spending billions on homelessness, and given housing policy’s racially charged history of segregation, housing affordability intersects with race and class. An analysis of housing affordability with demographic variables accounts for the social context of housing, which is often not the primary focus in economic papers. Understanding which trends are associated with increases in housing unaffordability at the neighborhood level, as opposed to the city level, will inform local legislators’ decisions on relationships with universities, housing, and diversity initiatives. To study these trends, this study uses the Integrated Public Use Microdata Series (IPUMS) to source American Community Survey (ACS) data for Ordinary Least Squares regressions.
Literature Review
Quigley and Raphael’s authoritative 2004 study on housing affordability created a model to quantify affordability for renters and concluded that reductions in affordability mostly resulted from increasing rents as opposed to decreasing incomes. While Quigley and Raphael (2004) studied housing unaffordability and rent increases as a function of better quality housing units and zoning regulations, they did not account for demographic shifts because their study was at the national level as opposed to the neighborhood level. By applying their model with the addition of demographic variables, this study performs a more detailed analysis of housing affordability that could yield more specific findings for legislators and add to the existing body of housing research.
Bowes and Ihlanfeldt (2001) showed that property values increased with proximity to rail stations as a result of decreased commuting time and increased retail in the neighborhood. However, the increase in property values was moderated by the stations’ distance from the central city area and the income distributions of the neighborhoods in which the stations were located. Building on the notion that proximity to rail transit increases property values, this study looks at the percentage of a neighborhood’s population commuting by public transit to determine if popular usage of public transit drives up rents and housing unaffordability.
Cortes (2004) was the first to research the impact of university campuses on the housing markets of their local neighborhoods. In his overview, he noted that university neighborhoods are markedly different from non-university neighborhoods: they have higher rates of unemployment and house more minorities and households living in poverty. The author found a positive relationship between proximity to universities, both public and private, and housing valuations, but not between proximity and rents. However, his dataset covered 1980-1990 and focused on the cities of Chicago, Cleveland, Detroit, Milwaukee, and Philadelphia. In studying college-age residents in a neighborhood, this study seeks to continue research into how universities impact their neighborhoods, with a specific focus on how the population of college-age students may impact neighborhoods’ rents.
Previous studies have examined how colleges impact the economic environment surrounding them (Harris 1997, Turner 1997, Steinacker 2005). However, few have analyzed the specific impacts of college campuses on the nearby housing market. Existing economic literature regarding university effects on housing unaffordability is predominantly limited to house values as opposed to rental costs. A study on the University of Wisconsin-Whitewater found that houses closer to the university sold at higher prices (Kashian and Rockwell 2013). A case study on the University of Pennsylvania found similar results: house prices near campus increased by 160% from 2000-2010, as opposed to 54% for the rest of Philadelphia (Ehlenz 2016). Studies focusing on the appreciation of house prices near universities neglect the increases in housing unaffordability for renters. Kashian and Rockwell referred to increases in housing prices as positive externalities, but that is from the perspective of the homeowner as opposed to the renter.
Data
To uncover how an urban neighborhood’s demographic characteristics may impact housing affordability, this study uses neighborhood-level data drawn from IPUMS. Neighborhood data is used because wealth disparities exist within cities, as evidenced by the range of unemployment rates and housing unaffordability seen in Table 2. Furthermore, identifying determinants of housing unaffordability for renters at the neighborhood level will help local urban policymakers, whose day-to-day decisions are made at that level. Because demographics differ, city neighborhoods may have causes of housing unaffordability that have not been captured at the national level.
IPUMS is housed at the University of Minnesota’s Population Center and supplies data from US censuses in the form of microdata. A strength of this dataset is the neighborhood granularity of the ACS, which surveys individuals. IPUMS is used to extract data from the ACS from 2005-15. The ACS is administered by the U.S. Census Bureau and is a national survey sampling about 3.4 million addresses annually. Specific sampling rates are chosen for blocks: for the 2000-10 ACS, blocks with fewer than 800 housing units were sampled at a 1-in-2 rate whereas those with between 1,200 and 2,000 were sampled at a 1-in-6 rate. The data is collected by mail, telephone, and computer-assisted telephone interviewing.
This paper follows the methodology crafted by Quigley and Raphael, who, in their influential 2004 study on housing affordability, utilized IPUMS to extract ACS data for the purpose of analyzing housing affordability on a national level from 1980-2000. In this study, IPUMS is used to extract ACS data to analyze neighborhoods in cities intended to represent urban life in the US. Quigley and Raphael’s model compared the income of renters to the median rent. If the median neighborhood rent exceeded 30% of the renter’s income, the housing cost was deemed unaffordable for the renter. This study uses the same methodology, which is detailed in Appendix Table 1 along with the variables used and their calculations. Rent in the IPUMS dataset is recorded as an individual’s monthly expenditure on rent, while income is an individual’s yearly income. For this study, the monthly rent is converted to a yearly value to measure housing unaffordability.
The summary statistics are presented in Table 2. Of note is the diversity of neighborhoods sampled, which is exhibited by the wide ranges of unemployment (1.2% to 50.7%), percentage of residents commuting via public transport (0 to 90%), and percentage of white residents (0 to 91.8%). Two neighborhoods, one in Chicago and one in Detroit, had 0% white residents. Especially noteworthy is the mean of housing unaffordability: on average, 20% of renters in a neighborhood faced housing unaffordability. Figures 1 and 2 show positive relationships between housing unaffordability and, respectively, college students and renters with high-paying jobs. These figures illustrate how demographic explanatory variables may be correlated with housing unaffordability. While it may be counterintuitive that a higher percentage of renters working in high-paying professions goes with higher housing unaffordability in the neighborhood, this relationship aligns with common gentrification concepts. The idea is that as the demographics of a neighborhood change to include younger renters and those who work in higher-paying industries, housing becomes more unaffordable for everyone. The strong positive correlations of housing unaffordability with college students (.5954) and with high-paying industry renters (.5902) further support the gentrification account.
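The affordability rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the study’s actual code: the function names and the sample rent and incomes are hypothetical, and only the 30% threshold and the monthly-to-yearly rent conversion come from the text.

```python
def is_unaffordable(monthly_rent, yearly_income, threshold=0.30):
    """Return True when annualized rent exceeds `threshold` of yearly income."""
    yearly_rent = 12 * monthly_rent  # rent is recorded monthly, income yearly
    return yearly_rent > threshold * yearly_income


def share_unaffordable(median_monthly_rent, renter_incomes, threshold=0.30):
    """Share of renters for whom the neighborhood's median rent is unaffordable."""
    flags = [
        is_unaffordable(median_monthly_rent, income, threshold)
        for income in renter_incomes
    ]
    return sum(flags) / len(flags)


if __name__ == "__main__":
    # Hypothetical neighborhood: median rent of 1,000 per month, four renters.
    # The renter with zero income mimics a college student with no earnings.
    incomes = [30_000, 45_000, 60_000, 0]
    share = share_unaffordable(1_000, incomes)
    print(f"Share of renters facing unaffordability: {share:.0%}")
```

Note that a renter with no recorded income is always flagged, which is the mechanism behind the student-income concern the study raises.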
Hypothesis
My hypothesis is that “college students” increase “housing unaffordability” for renters in urban neighborhoods. Theoretically, an influx of college renters, all else constant, will increase renters’ housing unaffordability, as college renters will generate greater demand for rental units, which, in turn, increases rents. This study focuses on factors that may cause increasing rents to impact housing unaffordability, while controlling for those that decrease income. In addition to examining “college students”, this study includes the percentage of renters who are white, work in high-paying industries, are employed, and commute to work via public transportation. An additional control, mean rooms, is a proxy for housing unit quality, which Quigley and Raphael (2004) determined to be a source of housing unaffordability. The expectation is that there will be positive correlations for all aforementioned variables except for the percentage of white renters.
Since a percentage relationship is of interest in this analysis, a logarithmic transformation of the number of college students is used in the model. This way, one can determine how a 1% change in the college student population may impact housing unaffordability. However, since college students often earn little to no income, this study may superficially indicate that housing is unaffordable at their income level when compared to the median rent, even if they are financially supported by a wealthy family. This bias would inflate the relationship between housing unaffordability and college students. However, the mean percentage of student renters in a neighborhood is 2.51% in this study, and the mean percentage of college students among residents facing housing unaffordability is 4.03%, so the bias should be limited.
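As a reading aid, the standard interpretation of a regressor entered in logs can be written out. This is a generic sketch of a level-log specification, with y standing for the housing unaffordability measure and x for the number of college students; the symbols are assumptions for illustration, not notation taken from the study.

```latex
\begin{align*}
y &= \beta_0 + \beta_1 \log(x) + \dots + \varepsilon \\
\frac{\partial y}{\partial x} &= \frac{\beta_1}{x}
\quad\Longrightarrow\quad
\Delta y \approx \beta_1 \,\frac{\Delta x}{x} \\
\frac{\Delta x}{x} = 0.01 &\quad\Longrightarrow\quad \Delta y \approx \frac{\beta_1}{100}
\end{align*}
```

Under this reading, a 1% increase in x is associated with a change of about β1/100 in y, expressed in whatever units y is recorded in. Whether that change is best described in percent or in percentage points depends on how the dependent variable is defined.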
Econometric Model and Empirical Findings
The initial Ordinary Least Squares regression was of a linear form, with the estimated regression shown in Figure 3. To correct for potential heteroskedasticity, since housing unaffordability estimates may vary greatly between neighborhoods of different sizes, the model was run with heteroskedasticity-robust standard errors. Although the linear model did not show the tell-tale signs of multicollinearity (high standard errors and low t-scores), a VIF test (Appendix: Table 5) was performed. As no VIF value was equal to or greater than 5, and the mean VIF value was 2.01, multicollinearity is not considered a serious issue.
However, because the dataset is a panel featuring a dozen cities over eleven years, a fixed effects model is used with heteroskedasticity-robust standard errors (Figure 4). Nearly every model omits some variable, and omitted variables are typically correlated with the included explanatory variables, so bias is likely without a fixed effects model. With this model, many omitted variables are implicitly accounted for. Dummy variables are included for each city and year so that each has a different intercept (see Appendix Table 1 for dependent and independent variable definitions). While this is not a true fixed effects model, because the entities specified are at the city level as opposed to the neighborhood level, the city dummies serve the purpose of absorbing potential omitted variables. As this dataset has 1,025 observations, the loss of degrees of freedom from the added fixed effects is not consequential. Multicollinearity in this model is expected for the fixed effects, and in a VIF test (Appendix: Table 6), all variables with a VIF greater than 5 were fixed effect variables. While the VIF values for the fixed effects model are markedly higher than those of the linear model, the mean VIF, 3.44, is still below 5, and none of the explanatory variables has a VIF greater than 5.
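The VIF check described above can be illustrated with a small standard-library sketch. It relies on a standard identity: the VIF of regressor j equals the j-th diagonal entry of the inverse of the regressors’ correlation matrix, which is the same as 1 / (1 - R²) from regressing that regressor on the others. The data values are made up for illustration, the variable names are borrowed from the text only as labels, and the cutoff of 5 is the one the study uses; this is not the study’s own procedure or output.

```python
from math import sqrt


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j]))
            / (norms[i] * norms[j])
            for j in range(k)
        ]
        for i in range(k)
    ]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    k = len(matrix)
    aug = [
        list(row) + [1.0 if i == j else 0.0 for j in range(k)]
        for i, row in enumerate(matrix)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("regressors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(columns):
    """VIF of each regressor: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


if __name__ == "__main__":
    # Made-up neighborhood values, for illustration only.
    pctpub = [5.0, 12.0, 30.0, 8.0, 22.0, 40.0]
    pctwhite = [70.0, 55.0, 20.0, 64.0, 35.0, 15.0]
    meanrooms = [4.1, 3.8, 3.9, 4.4, 3.2, 3.5]
    names = ["pctpub", "pctwhite", "meanrooms"]
    for name, value in zip(names, vif([pctpub, pctwhite, meanrooms])):
        print(f"{name}: VIF = {value:.2f}, at or above 5: {value >= 5}")
```

A VIF of 1 means a regressor is uncorrelated with the others; larger values indicate that more of its variation is explained by the other regressors.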
Additionally, the consequences of multicollinearity, higher standard errors and lower t-scores, do not manifest significantly in the fixed effects regression results. Table 7 presents the regression results of both the linear and fixed effects models. All explanatory variables are statistically significant at the 1% level. The fixed effects model is used in this discussion of findings because of its reduction of bias and better fit (a higher R squared and adjusted R squared). As expected, an increase in the number of college students in the neighborhood is correlated with an increase in housing unaffordability in the neighborhood. A 1% increase in college students is correlated with a 2.05% increase in housing unaffordability, all else held constant. Also as expected, a 1% increase in professional renters is correlated with a .49% increase in housing unaffordability. The coefficient for unemployment aligns with expectations that neighborhoods with more unemployed households will face greater housing unaffordability. A 1% increase in the unemployment rate is correlated with a .29% increase in housing unaffordability. The common view that white neighborhoods in urban areas face less housing unaffordability is borne out in the regression, where a 1% increase in the white population in a neighborhood is correlated with a .15% decrease in housing unaffordability.
The variables chosen as a result of findings from existing literature, pctpub (the percentage of renters who use public transportation to commute to work) and meanrooms (the mean number of rooms in the neighborhood’s rental units), do not align with expectations as closely as the previous variables. Bowes and Ihlanfeldt (2001) found a positive relationship between house values and proximity to rail stations. Following that logic, the more renters in a neighborhood who commute via public transportation, the higher the rents and housing unaffordability should be. While the coefficient on pctpub is positive and statistically significant, a 1% increase in renters using public transit to commute is correlated with just a .06% increase in housing unaffordability. Quigley and Raphael (2004) identified unit quality as a driver of housing unaffordability. Using meanrooms to represent better quality rental units resulted in a sign opposite to what was expected: neighborhoods with a larger mean number of rooms per unit were more affordable to their residents. Using meanrooms as a proxy for rental unit quality is a rudimentary measure. In this study, it may be capturing how rent in units with more rooms can be split between more people, which makes housing cheaper for the individual renter, who is the unit observed in this study. Furthermore, Quigley and Raphael (2004) identified zoning laws as a factor in housing unaffordability, but this data was not available at the neighborhood level. While the coefficients may be biased upward by the omission of zoning laws, the fixed effects model should limit this impact.
Conclusion with Policy Implications
This study allows local policymakers representing predominantly renters in urban areas to make decisions based on a renter-specific study, a demographic not highly represented in the current economic literature. With a 1% increase in college student population correlating with a 2.05% increase in housing unaffordability, it is imperative for policymakers to consider how universities’ expansion may impact the housing affordability of their renting constituents. The findings of this study affirm the qualitative and anecdotal evidence that has supported those campaigning against gentrification of urban neighborhoods. Given that college students and professional renters, who are often the drivers of gentrification, have a statistically significant correlation with increases in housing unaffordability, the results suggest that gentrification has negative impacts on neighborhoods’ renters. While increasing rents may benefit landlords, for renters, increased housing stress correlates with worse health outcomes and greater reliance on government assistance. Policymakers may seek to reconsider community partnerships with universities to mitigate negative impacts universities may have on housing affordability. Urban planners may also reconsider viewing gentrification positively in light of its impacts on housing affordability. Instead of focusing on gentrifying an area, which may have negative consequences for housing affordability, policymakers could focus on more direct measures of benefiting the community, such as investment in community infrastructure.
While this paper serves to fill the relative dearth of research on renters’ housing affordability, it is not a comprehensive analysis of demographic trends’ impacts on housing affordability. As mentioned, there are econometric issues with counting college students among the neighborhood residents facing unaffordable housing. More research focusing on specific demographic drivers of increasing housing unaffordability for renters (as opposed to homeowners) is highly recommended.
Table 3: Correlation Table
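Figure 3: Linear Model for Ordinary Least Squares Regression
$$\text{housingunaffordability}_{it} = \beta_0 + \beta_1 \log(\text{scollunder}_{it}) + \beta_2\,\text{pctpub}_{it} + \beta_3\,\text{erate}_{it} + \beta_4\,\text{pctwhite}_{it} + \beta_5\,\text{pctprofess}_{it} + \beta_6\,\text{meanrooms}_{it} + \varepsilon_{it}$$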
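Figure 4: Fixed Effects Model for Ordinary Least Squares Regression
$$\text{housingunaffordability}_{it} = \beta_0 + \beta_1 \log(\text{scollunder}_{it}) + \beta_2\,\text{pctpub}_{it} + \beta_3\,\text{erate}_{it} + \beta_4\,\text{pctwhite}_{it} + \beta_5\,\text{pctprofess}_{it} + \beta_6\,\text{meanrooms}_{it} + \text{citydummies} + \text{yearfixedeffects} + \varepsilon_{it}$$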
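The role of the city dummies in the fixed effects model can be illustrated with a small sketch. For a single set of group dummies, including the dummies in OLS gives the same slope as first subtracting each group’s mean from every variable (the within transformation). The example below uses one regressor, one grouping dimension, and made-up data in which two hypothetical cities share a slope of 2 but have very different intercepts; it does not reproduce the study’s model, which also includes year dummies and several regressors.

```python
from collections import defaultdict


def ols_slope(y, x):
    """Slope from a simple pooled OLS regression of y on x with an intercept."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def demean_by_group(values, groups):
    """Subtract each observation's group mean from its value."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, group in zip(values, groups):
        sums[group] += value
        counts[group] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def within_slope(y, x, groups):
    """Slope of y on x after removing group means (same as adding group dummies)."""
    y_w = demean_by_group(y, groups)
    x_w = demean_by_group(x, groups)
    return sum(a * b for a, b in zip(x_w, y_w)) / sum(a * a for a in x_w)


if __name__ == "__main__":
    # Made-up data: city "A" follows y = 10 + 2x, city "B" follows y = -20 + 2x.
    city = ["A", "A", "A", "B", "B", "B"]
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    y = [12.0, 14.0, 16.0, -12.0, -10.0, -8.0]
    print(f"Pooled slope (ignores city differences): {ols_slope(y, x):.3f}")
    print(f"Within-city slope (city fixed effects):  {within_slope(y, x, city):.3f}")
```

In this constructed example the pooled slope is negative even though the relationship within each city is positive, which is the kind of omitted-variable distortion that city-level intercepts are meant to absorb.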
The Link Between Immigration and Trade: Evidence from South Korea
Cheah’s analysis seeks to elucidate the link between immigration and trade in South Korea. Using rigorous statistical analyses, Cheah
With immigration and trade at the forefront of our nation’s current political debates, Cheah’s research illuminates new avenues of thought for future policy design. -K.W.
By May Lyn Cheah
UC Berkeley ‘18
Introduction
Immigration to South Korea has increased rapidly since the government loosened strict immigration controls in response to concerns about its aging population. The share of elderly South Koreans (people who are 65 years or older) has increased 3.7-fold since 1970, while the OECD average has increased just 1.5-fold. In fact, South Korea has the fastest pace of aging and the lowest fertility rate among OECD countries. South Korea has shifted from being a labor-abundant country to being a country that hires foreign workers to work in its manufacturing and farming industries. Research by the South Korea Economic Research Institute indicates that South Korea might need approximately 15 million immigrants by 2060 to prevent population decline and to curb labor shortages. The stock of foreign-born population in South Korea increased from 210,249 in 2000 to 1,091,531 in 2014 (see Figure 1). The increase is partly attributed to the implementation of the Employment Permit System (2004), a non-seasonal temporary labor migration program aimed at curbing labor shortages in the “3D” industries (difficult, dangerous, dirty).
Figure 1: Total Immigration Stock in South Korea (2000-2014)
The recent developments in South Korea’s immigration policies, which led to a higher percentage of foreign-born population, have given rise to many research papers on the impact of foreign immigration on political and social development in South Korea. However, the literature on the impact of foreign immigration in South Korea on bilateral trade is relatively scarce. Thus, this research paper aims to fill this gap by investigating the impact of foreign immigration, specifically in South Korea, on bilateral trade flows between South Korea and its immigrants’ countries of origin. Trade plays a vital role in promoting a country’s economic growth, and immigrants’ networks increase bilateral trade flows between countries of origin and destination countries. For instance, the ground-breaking paper by Gould demonstrates that increased immigration to the United States has a positive impact on export and import flows (Gould 1994). The dramatic rise in the aging population in South Korea will possibly result in a high age dependency ratio and a shrinking labor force, leading to slower economic growth. Thus, immigration policy is expected to be gradually relaxed over the next few years, and the number of low-skilled immigrant workers is projected to increase gradually. This research aims to deepen the understanding of immigrants’ role in promoting bilateral trade flows, which might promote economic growth in both countries of origin and destination countries. The main contribution of the paper is to elucidate the differences in the impact of immigrant stock on bilateral trade flows, depending on the OECD classification of immigrants’ countries of origin and the countries’ income levels.
Gould proposes two mechanisms that can explain the relationship between immigration and bilateral trade flows (Gould 1994). Firstly, immigrants tend to prefer to buy goods from their home countries; secondly, immigrants are more informed about market opportunities and usually have connections with businesses in their home countries, thereby reducing transaction costs of bilateral trade (Gould 1994). Gould suggests that if the first hypothesis is right, then imports from immigrants’ countries of origin will increase as immigration rises, and if the second hypothesis is right, exports to immigrants’ home countries will increase. He concludes that if both exports and imports increase, both hypotheses are supported.
This research aims to address the following questions:
1) How does a large increase in immigration stock impact overall trade volume? What are the magnitudes and signs of the changes in export and import flows due to an increase in immigrant stock in South Korea?
2) How do immigrants from OECD and non-OECD countries compare in impacting export and import volumes?
3) Do immigrants from OECD or non-OECD countries have a larger positive effect on overall trade?
4) How do the immigration elasticities of exports and imports of South Korea compare to those of other countries?
The hypotheses to be tested are as follows. Firstly, immigration exerts an overall positive impact on both South Korean exports to and imports from immigrants’ countries of origin, due to the reduction in trade costs and immigrants’ preference for their home countries’ goods. Secondly, immigrants from OECD countries have a larger positive effect on imports from their home countries than immigrants from non-OECD countries. One reason may be that immigrants from OECD countries tend to be highly skilled and thus possess higher purchasing power. Thirdly, immigrants from non-OECD countries have a larger positive effect on exports to their home countries than immigrants from OECD countries. This can be attributed to the higher value added by these immigrants, since the information that they have about their countries’ underdeveloped market institutions was previously almost inaccessible to South Korea. Lastly, immigrants from OECD countries exert an overall larger positive effect on bilateral trade flows between their countries of origin and South Korea than immigrants from non-OECD countries.
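The “immigration elasticity” in the fourth research question can be made concrete with a short definition. The notation below is an assumption for illustration (X for South Korean exports to a partner country, M for imports from it, and IMM for the stock of immigrants from that country); the source does not define these symbols here.

```latex
\begin{align*}
\eta_X &= \frac{\partial \ln X}{\partial \ln \mathit{IMM}}, &
\eta_M &= \frac{\partial \ln M}{\partial \ln \mathit{IMM}} \\
\%\Delta X &\approx \eta_X \cdot \%\Delta \mathit{IMM}, &
\%\Delta M &\approx \eta_M \cdot \%\Delta \mathit{IMM}
\end{align*}
```

For example, with a purely hypothetical elasticity of η_X = 0.2, a 10% rise in the immigrant stock would go with roughly a 2% rise in exports. In Gould’s framing, a positive η_M is consistent with the preference for home goods, and a positive η_X with the information and business ties channel.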
Literature Review Previous literature demonstrates that immigration impacts bilateral trade through two main channels, i.e. immigrants’ preference for home goods and immigrant ties effect (Gould 1994). Firstly, it is commonly believed that immigrants prefer to consume their home countries’ products, which are often not available in the host countries. Moreover, substitutes of their home country products can hardly be found in host countries. Thus, the scarcity of home country products in host country and the immigrants’ demand for their origin countries’ products results in an increase of imports from those home countries. Secondly, immigrants tend to possess extensive knowledge of their origin country markets, business connections, business practices as well as language proficiency, enabling them to help lower the transaction costs of bilateral trade. Gould’s groundbreaking paper (Gould 1994) demonstrates that there is a positive impact on bilateral trade flows between host and home countries due to increase in immigration. Previously, traditional factor endowment theorem such as the Rybczynski theorem, which lies within the context of Heckscher–Ohlin model, suggests the comColumbia Economics Review
plementarity of immigration and trade flows only if the host country is labor abundant. Gould paper is unprecedented because it is the first paper which proposes a model that estimates the relationship between immigration and bilateral trade flows after accounting for differences in factor endowments across countries. He argues that immigrant ties to their countries of origin play a pivotal role in promoting bilateral trade flows between host country and countries of origin due to lower transaction costs which are incurred in the process of securing foreign market information and fostering trade relations. Following Gould’s pioneering research, several other papers have also found strong causal relationship between immigrant stocks in host country and bilateral trade flows. Head and Ries (1998) found that a given percentage increase in inflow of immigrants result in a larger percentage increase in Canadian imports from the immigrants’ home country than exports from home country. This finding deviates from Gould’s findings which suggest that the pro-trade effect of immigrants on
“The dramatic rise in aging population in South Korea will possibly result in a high age dependency ratio and a shrinking labor force, leading to slower economic growth.” exports is larger. The paper also attempts to find the impact of different subgroups of immigrants in promoting trade and it finds that independent immigrants influence trade the most while refugees influence trade the least. The mixed results from Gould’s and Head and Ries’ studies have motivated me to investigate whether there is a larger change in exports or imports due to increase in immigrant stocks in South Korea. My paper seeks to further investigate the prevalence of a non-individual, specific mechanism by segregating the dataset into subgroups, according to OECD classification and the countries of origin income levels. Immigrants from non-OECD countries are
Spring 2018 Table 1: Descriptive Statistics
hypothesized to have a statistically significant impact on trade because of the knowledge they have about their home countries, which are more dissimilar to South Korea in terms of social and market institutions. I will employ a similar approach with my dataset. However, I will use different specifications: I interact immigrant stock with a dummy variable for the OECD membership of immigrants’ home countries to find out whether immigrants from OECD and non-OECD countries differ in their impact on trade. The coefficient on the interaction variable will provide deeper insight into the heterogeneity of the immigrant-trade link. My research paper thus employs an alternative approach, dividing immigrants into subgroups by the OECD membership status and income levels of their home countries. Kim and Lim (2016) used South Korean immigration data to study heterogeneity in network effects. They classified immigrants into two groups, skilled and unskilled, based on their visa type. The comprehensive nature of my dataset, which includes 191 countries over 2000 to 2014, should increase the efficiency of my econometric estimates and the accuracy of the econometric model used, owing to the larger number of countries and the longer period observed. In addition, my regression analysis
would include a free trade agreement dummy variable to control for the increase in trade due to trade pacts. There is a substantial literature on the immigrant-trade link for European and North American countries, as well as for Australia and New Zealand. However, very few papers study the immigration-trade link in Asian countries; this scarcity provides an opportunity to delve deeper into the immigration-trade relationship in the Asian context. A recent United Nations dataset reveals that between 2000 and 2015 the number of international migrants increased more in Asia than in any other major region. Intraregional migration is the most common form; in 2015, 82 percent of migrants in Asia originated from another country within the same region. The drastic increase in immigration in Asia reinforces the idea that more research should be done on the region to inform future policymaking. South Korea is a unique Asian country to study because it has only recently turned from a net migrant-sending country into a net migrant-receiving country: foreign workers only started to migrate to South Korea in the early 1990s, in response to a shortage of low-skilled labor (Dong et al. 2012). Most of the literature has focused primarily on countries with an extended immigration history; South Korea’s relatively short history of net inward migration therefore allows us to isolate the impact of recent immigration on trade from that of “long-established” bilateral relations (Peri and Requena 2009). Moreover, the loosening of immigration policy associated with the implementation of the Employment Permit System in 2004 is an exogenous shock that resulted in a drastic increase in immigrant workers. The relaxation of immigration policy was undertaken by the South Korean government to curb the labor shortage in “3D” (difficult, dangerous, and dirty) industries. The substantial increase in immigrant workers over a short period of time allows us to better identify the effects of new immigrants on trade.
Description of Data & Regression Specifications
Description of Data
For the purpose of this research, I employ a panel dataset consisting of 913 observations. The dataset contains country-level variables, includes 181 South Korean trading partners and spans a period of 15 years, from 2000 to 2014. My dependent variables are the natural logs of exports, imports and total trade; the export and import values are measured in current USD and the data are obtained from OECD SITC Revision 3. The natural log of immigrant stock by country of origin is the independent variable of interest in this study, and the immigration
data is obtained from the OECD International Migration Database. The control variables in this research are the natural logs of GDP, GDP per capita and distance, as well as the presence of trade agreements. Both GDP and GDP per capita data are obtained from the World Bank database and are measured in constant 2010 USD. Bilateral distances between South Korea and its immigrants’ countries of origin, measured in kilometres, are obtained from the CEPII GeoDist Database. CEPII calculates the bilateral distances using the great circle formula, based on the longitudes and latitudes of each country’s most important cities in terms of population size. The free trade agreement dummy variable is constructed using data from the World Trade Organization (WTO), which lists South Korea’s existing free trade agreements with other countries (see Table 1). The databases used to construct the panel dataset are reliable, as they come from established organizations such as the OECD and the World Bank. Detailed data on the number of illegal immigrants is hard to obtain, and I believe that its absence will not have a significant effect on my regression results due to the small
proportion of illegal immigrants compared to the whole immigrant population.
Regression Specifications
I will be using an augmented version of the standard gravity equation for the purpose of this paper. The augmented version of the equation was first introduced by Linnemann (1966), who augmented Tinbergen’s standard gravity equation for explaining trade flows (Tinbergen 1962) by including population as one of its explanatory variables. The standard gravity equation originally states that the total
sum of exports and imports between countries i and j, $T_{ij}$, during a particular year increases with the product of both countries’ nominal gross domestic products, $Y_i Y_j$, and decreases with the distance between the two countries’ economic centers, $D_{ij}$. I will be using the fixed effects model, and my estimated equation is as follows:

$$\ln \text{Trade}_{ij} = \beta_0 + \beta_1 \ln \text{ImmigrantStock}_j + \beta_2 \ln \text{GDP}_j + \beta_3 \ln \text{GDPpercapita}_j + \beta_4 \ln \text{Distance}_{ij} + \beta_5\, \text{TradeAgreement} + e_{ij} \tag{1}$$
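To make the link between the standard gravity model and the log-linear equation (1) explicit, the usual log-linearization can be sketched as follows. The constant $A$ and the unit exponents are illustrative assumptions for this sketch and are not taken from the paper's own specification.

```latex
T_{ij} = A \, \frac{Y_i \, Y_j}{D_{ij}}
\quad \Longrightarrow \quad
\ln T_{ij} = \ln A + \ln Y_i + \ln Y_j - \ln D_{ij}
```

Estimated versions such as equation (1) replace the unit exponents with free coefficients and add further regressors, here the immigrant stock and a trade agreement dummy.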
Empirical Strategy
Methods
Firstly, I will run an ordinary least squares (OLS) regression with year fixed effects, using the outcome variable and explanatory variables listed in the estimated equation above. Country fixed effects are not included because of the presence of time-invariant regressors in my original equation, i.e. ln(distance) and the OECD dummy variable. Secondly, I will separate my sample into two groups according to the OECD classification of immigrants’ countries of origin. Three different OLS regressions will be performed on each subgroup, using three different outcome variables while keeping the same explanatory variables. The three outcome variables are ln(tradeij), ln(exportsij) and ln(importsij). These OLS regressions will enable me to compare the magnitudes of the effect of immigrant stock on total trade, on South Korean exports to immigrants’ countries of origin, and on imports from them, respectively. Lastly, I will divide my sample into four subgroups according to the income levels of immigrants’ countries of origin, i.e. high income, upper middle income, lower middle income and low income countries, and perform OLS regressions on each subgroup. This will enable me to make group comparisons and find out whether the coefficients for ln(immigrant stock) differ across groups.
Empirical results
Regression on total exports, imports and trade (sum of exports and imports)
Firstly, I run OLS regressions with year fixed effects on the logarithms of exports, imports and total trade volume respectively, using the following equation:
$$\ln \text{Trade}_{ij} = \beta_0 + \beta_1 \ln \text{ImmigrantStock}_j + \beta_2 \ln \text{GDP}_j + \beta_3 \ln \text{GDPpercapita}_j + \beta_4 \ln \text{Distance}_{ij} + \beta_5\, \text{TradeAgreement} + e_{ij}$$
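As a rough illustration of what "OLS with year fixed effects" does, the sketch below removes year means from both the outcome and a single regressor (the within transformation) and then computes the OLS slope. It is a minimal sketch on made-up numbers, not the paper's estimation code: the function names, the synthetic data and the restriction to one regressor are all assumptions made for brevity.

```python
from collections import defaultdict


def demean_by_group(values, groups):
    """Subtract each group's mean from its members (within transformation)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, group in zip(values, groups):
        sums[group] += value
        counts[group] += 1
    return [value - sums[group] / counts[group]
            for value, group in zip(values, groups)]


def year_fe_slope(y, x, years):
    """OLS slope of y on x after absorbing year fixed effects."""
    y_tilde = demean_by_group(y, years)
    x_tilde = demean_by_group(x, years)
    sxy = sum(a * b for a, b in zip(x_tilde, y_tilde))
    sxx = sum(a * a for a in x_tilde)
    return sxy / sxx


# Hypothetical data: three partner countries observed in two years.
years = [2000, 2000, 2000, 2001, 2001, 2001]
ln_imstock = [1.0, 2.0, 4.0, 1.5, 2.5, 3.0]
year_effect = {2000: 0.3, 2001: 0.9}   # assumed common yearly shocks
true_beta = 0.2                        # assumed elasticity for the demo

# Noise-free outcome so the estimator should recover true_beta.
ln_trade = [year_effect[t] + true_beta * x
            for t, x in zip(years, ln_imstock)]

beta_hat = year_fe_slope(ln_trade, ln_imstock, years)
print(beta_hat)
```

With several regressors, as in the paper's equation, the same idea applies, but the slope is obtained from a multiple regression on the demeaned variables (or, equivalently, by including a dummy for each year).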
The coefficients on the logarithms of South Korean GDP and GDP per capita (lnGDP and lnGDPpcpt) are all positive and statistically significant at the 1% level in all the regressions. The coefficients on the logarithm of bilateral distance (lndist) are all negative and statistically significant. These signs are as predicted by the standard gravity model and are consistent with previous empirical results. The coefficients of lnimstock, the variable of interest, are all positive and statistically significant for the regressions on lnexports, lnimports and lntrade, as shown in Table 2. A 1 percent increase in immigrant stock in South Korea will, on average, increase export volume by 0.221 percent, import volume by 0.199 percent and bilateral trade volume by 0.175 percent. This supports my first research hypothesis that immigration has an overall positive impact on bilateral trade flows between South Korea and immigrants’ countries of origin. These results are consistent with previous empirical research, which found that immigrants increase bilateral trade flows by reducing trade costs and through their preference for their home countries’ goods. The reduction in transaction costs of bilateral trade can be attributed to the fact that immigrants are more informed about market opportunities and tend to have more connections with businesses in their home countries (Gould 1994). Also, immigrants tend to prefer to buy goods from their home countries; those goods are generally unavailable in the host country (in this case, South Korea). Their demand for their home countries’ goods drives the demand for imported goods into South Korea.
Table 2: OLS Regressions on lnexports, lnimports, and lntrade
Table 3: Regressions on lnexports, lnimports and lntrade for subgroups OECD and Non-OECD countries of origin
Secondly, I divide my sample into two groups, OECD and non-OECD countries of origin, according to the OECD member status of immigrants’ home countries. OLS regressions on the logarithms of exports, imports and total trade volume (lnexports, lnimports and lntrade) are performed on both subgroups. The coefficients of lnGDP and lnGDP percapita are all positive and statistically significant across both subgroups. This finding is consistent with the standard gravity model’s theoretical prediction. However, the coefficients on the logarithm of bilateral distance, lndist, are all positive and statistically significant for the regressions across both subgroups, with one exception: the regression on lnimports for the non-OECD subgroup. This finding contradicts the theoretical prediction of the standard gravity model, which posits an inverse relationship between trade and the bilateral distance between countries. This anomaly could be attributed to the fact that the distances between South Korea and its major trading partners
(United States, Germany, Australia, Mexico) are greater than the distances to other countries. The coefficients on the logarithm of immigrant stock (lnimstock) are all positive and significant across both subgroups. For the OECD subgroup, the coefficient of lnimstock is much higher in the regression on lnimports than in the regression on lnexports. Table 3 shows that, in the regression on lnimports, the coefficient of lnimstock is 0.278 for the OECD subgroup, which is higher than the 0.182 for the non-OECD subgroup. For OECD countries, a 1 percent increase in immigrant stock will increase import volume by 0.278% on average, while a similar increase will raise import volume by 0.182% on average for non-OECD countries. This result supports my second hypothesis that immigrants from OECD countries have a relatively larger positive effect on imports from their home countries than on exports to their home countries, as compared to immigrants from non-OECD countries. This could be because immigrants from OECD countries tend to be highly skilled with high-paying jobs; their higher purchasing power enables them to demand a substantial amount of goods from their home countries and drives the demand for imported goods into South Korea. For the non-OECD subgroup, the coefficient of lnimstock is higher in the regression on lnexports than in the regression on lnimports. For the regression on lnexports, the coefficient of lnimstock is
higher for the non-OECD subgroup than for the OECD subgroup (0.235 for non-OECD vs. 0.086 for OECD). This means that a 1 percent increase in immigrant stock will, on average, increase export volume by 0.235% for non-OECD countries and by 0.086% for OECD countries. This result supports my third hypothesis that immigrants from non-OECD countries have a larger positive effect on exports to their home countries than immigrants from OECD countries. One plausible reason is that immigrants from non-OECD countries have substantial information about their countries’ underdeveloped market institutions, information that was previously almost inaccessible to South Korea. Thus, these immigrants play a more influential role in increasing exports than their OECD counterparts, whose countries have developed market institutions that provide open access to vast amounts of information. Immigrants from OECD countries are also found to exert a smaller positive effect on bilateral trade flows between their countries of origin and South Korea. This is indicated by the coefficients of lnimstock in the regression on lntrade: the coefficient for OECD countries is 0.155, which is smaller than the 0.166 for non-OECD countries. For OECD countries, a 1 percent increase in immigrant stock will increase bilateral trade volume by 0.155% on average, while a similar increase will raise bilateral trade volume by 0.166% on average for non-OECD countries.
In order to make group comparisons and find out whether the coefficients for ln immigrant stock differ across countries of origin with different income levels, I divided my sample into four subgroups according to the income levels of immigrants’ countries of origin, i.e. high income, upper middle income, lower middle income and low income countries. According to the World Bank Atlas method, for the current 2017 fiscal year, low-income countries are countries with GNI per capita of $1,025 or less in 2015, lower middle-income countries are those with GNI per capita between $1,026 and $4,035, upper middle-income countries are those with GNI per capita between $4,036 and $12,475, and high-income countries are those with GNI per capita of $12,476 or more. OLS regression is then performed on lnexports, lnimports and lntrade for each subgroup. Table 4 shows the results of the OLS regression on lnexports for all four subgroups, i.e. countries of origin classified as low income, lower middle income, upper middle income and high income. The coefficients of lnimstock are statistically significant for all groups except the high income category. Table 5 and Table 6 show the results of the OLS regressions on lnimports and lntrade respectively for all four subgroups. The coefficients of lnimstock are statistically significant for the low and upper middle income groups and statistically insignificant for the lower middle and high income groups. Immigrants from low income countries have a statistically significant effect in increasing exports, imports and trade, while the effect of immigrants from high income countries on trade is statistically insignificant. This finding is consistent with White’s study, which shows that immigration from comparably low income countries is the key driver of the U.S. immigration-trade linkage (White 2007). Heterogeneity in the immigrant-trade relationship according to origin countries’ classification has been studied in previous literature; my empirical results support the presence of such heterogeneity, as they indicate that immigrants from non-OECD and low income countries increase trade substantially more than other immigrants.
Table 4: Regressions on lnexports for low, lower middle, upper middle and high income countries
Table 5: Regressions on lnimports for low, lower middle, upper middle and high income countries
Table 6: Regressions on lntrade for low, lower middle, upper middle and high income countries
One of the limitations of this study is the lack of immigrant data disaggregated by income level. Further research could investigate the heterogeneity of immigrant network effects among immigrants of different income levels. Furthermore, previous literature has mainly used immigration data from developed countries such as the United States, the United Kingdom and Australia. The lack of available immigration data on developing countries makes research on those countries infeasible. Further research using data on developing countries would provide new insights into the role of immigrants in their host countries, as well as into heterogeneity in the immigration-trade link.
Conclusion
Using South Korea’s data, this research finds evidence that supports the hypothesis that immigration has a positive effect on bilateral trade flows between immigrants’ countries of origin and the host country. This paper finds that a 10 percent increase in immigrant stock will increase South Korea’s trade volume by approximately 1.75%. The findings also indicate that immigrants have a larger positive impact on exports than on imports. This suggests that immigrants’ business network effects play a significant role in promoting trade flows. One of the main findings is that immigrants from non-OECD countries increase overall trade more than immigrants from OECD countries. Similarly, immigrants from low-income countries are found to have a statistically significant effect in increasing overall trade, exports and imports, while the effect of immigrants from high-income countries on trade flows is statistically insignificant. These findings support the idea of heterogeneous effects in the immigrant-trade link.
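The 10 percent figure above follows from reading the log-log coefficient as an elasticity. The short sketch below compares the usual linear approximation (coefficient times percent change) with the exact change implied by a log-log model. The function name is hypothetical, and the coefficient 0.175 is simply the total trade estimate reported in the text.

```python
def pct_change_in_trade(elasticity, pct_change_in_stock):
    """Exact percent change in trade implied by a log-log coefficient."""
    growth_factor = 1.0 + pct_change_in_stock / 100.0
    return (growth_factor ** elasticity - 1.0) * 100.0


elasticity_total_trade = 0.175   # coefficient on lnimstock reported in the text

# Linear approximation: elasticity times the percent change in immigrant stock.
approx = elasticity_total_trade * 10.0

# Exact change implied by the log-log functional form.
exact = pct_change_in_trade(elasticity_total_trade, 10.0)

print(f"approximate: {approx:.2f}%  exact: {exact:.2f}%")
```

For a 10 percent increase the exact figure is roughly 1.68 percent, slightly below the 1.75 percent approximation; the two converge as the change in immigrant stock becomes small.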
It is predicted that there will be gradual loosening of immigration policies over the next couple of years due to the staggeringly low birth rate in South Korea. As such, this research aims to contribute to future policy-making decisions by enhancing the understanding of immigrants’ role in promoting bilateral trade flows. The classification of immigrants in South Korea according to two subgroups based on their home countries’ OECD membership will provide fresh insights into immigrant-trade relations. The extensiveness of my dataset increases the efficiency of my estimated regression parameters, thereby resulting in more conclusive evidence of heterogeneity effect of immigrant networks in the context of immigrant-trade linkage. In sum, this paper will hopefully provide a deeper insight into immigrant-trade linkage in South Korea, which has not been researched extensively in current literature.
Challenges for Decentralization and Federalism: Scaling up an Educational Policy for 184 municipalities in Brazil
takes into account a range of important variables and thus allows him to pin down what makes such a policy successful. Moreover, the reader can infer important factors that may help them shape their views with regards to their home country.
By Luigi Caloi New York University ‘18
Introduction
The municipality of Sobral has in recent years been the role model for improvement in public education in Brazil. The municipality is located in the northeast of Brazil, the poorest of the country’s regions. Yet it has dramatically improved its score on the Brazilian Basic Education Quality Index (IDEB) for fifth grade students over the last two decades. In 2000, after a diagnostic test found that 48% of 7-year-olds were still illiterate, unable even to form words, Sobral’s mayor set the ambitious goal of making every student literate by the age of eight. The resulting program, the Programa pela Alfabetização na Idade Certa (Program for Literacy in the Right Age, PAIC), had five main pillars: (i) training for teachers and principals; (ii) courseware for students and teachers; (iii) creation of goals and external evaluation; (iv) bonuses for the best-performing schools (and financial help for the worst-performing ones); (v) “shielding” of schools from politics (e.g. removing the mayor’s power to select school principals).
Inspired by Sobral’s success, education officials in the state of Ceará, where Sobral is located, decided in 2007 to replicate the most acclaimed of Sobral’s policies, the Literacy in the Right Age Program (PAIC). However, Ceará faced three challenges. (1) It had to provide the same program to 184 different municipalities, each with different teachers and with students from different backgrounds, which I call the demographic context effect. (2) The quality of the trainers decreased: while Sobral was able to create a specialized school for teacher training, Ceará relied on each municipality to hire non-specialized trainers (government workers) to implement the program. (3) Ceará had to face the political challenge of integrating state and municipal level governments. Despite these challenges, PAIC had impressive results, increasing Ceará fifth graders’ test scores by approximately 0.10 standard deviations (SDs) in Portuguese and 0.18 SDs in Mathematics between 2007 and 2011 (Carnoy and Costa, 2015). In this article, I will first replicate Carnoy and Costa’s work to find the impact of PAIC on Ceará students’ test scores and then analyze the variation in PAIC’s impact across municipalities to understand the challenges that scale imposed on the program. This study contributes by providing evidence on which factors explain the variation in PAIC’s impact across different contexts (e.g. richer and poorer schools). On the practical side, I illustrate the applicability of my findings by simulating the decision of a policy maker choosing whether or not to adopt a version of PAIC in their state or municipality. The challenges of scaling up PAIC to 184 different municipalities are especially important because in 2012, relying on Ceará’s success, the federal government decided to create its own national version of PAIC, the PNAIC. The program has the identical goal of eradicating illiteracy for every child by the age of eight, and it rests on four very similar main pillars.
Literature Review: Empirical Evidence on Similar Programs
In this section, I summarize the evidence from the literature on similar literacy programs to PAIC, and the findings from the literature on the challenges of scaling up a program.
Comparable Literacy Programs
Early grade literacy interventions have become widely regarded as effective for improving educational systems, and especially literacy scores. Many teacher training programs have been implemented throughout the world, but the one most closely related to PAIC was No Child Left Behind (NCLB) in the U.S. NCLB was applied from kindergarten through third grade in school districts in a number of U.S. states in 2003 and was funded by the national government. State officials could use the funds for teacher coaching, as well as for some other activities. The goal of NCLB, similarly to PAIC, was to have all children read at or above grade level by the end of third grade. Evaluations of NCLB show mixed results. The national evaluation of Reading First (Gamse et al., 2008), using a regression discontinuity method, found that at the national
level the program had a large impact on teacher practices, but that this did not translate into a significant impact on average student reading comprehension scores in Grades 1, 2, or 3. Baker et al. (2011), on the other hand, later found that in Oregon, where implementation was stronger than in other states, NCLB had positive impacts on students’ scores. Overall, while we have overwhelming evidence that early grade interventions can have substantial long-term effects, the evidence is unclear for the closest program to PAIC, No Child Left Behind in the U.S.
Scale up
Scaling up a successful small-scale program involves changes to the program. In some cases, this means a change in the providers of the program. A shift in providers reduces the average quality of the program provider, both by decreasing the ability of the provider (e.g. moving from specialists to generalist civil servants) and, sometimes, by hiring less motivated providers. Cameron and Shah (2017) call this change in providers the implementation agent effect. Using a randomized control trial, they evaluate the impact of a sanitation intervention, Community Led Total Sanitation (CLTS), in rural East Java in Indonesia, whose goal is to increase demand for sanitation by sending CLTS facilitators to villages to discuss the negative consequences of a lack of sanitation. They then analyzed the scale up of the program in some villages where the local government took over the implementation from professional resource agencies, and found that all of the positive impact came from villages where professional resource agencies implemented the program. Why did the shift in providers nullify
the benefits of the program? They argue that the demographic context of the scaled-up version of the program was unchanged, and that the smaller and scaled-up versions were implemented simultaneously, which indicates that the decrease in impact was caused by the shift in provider from the professional resource agencies’ workers to the local government’s providers.
In other cases, a budget constraint can lead to reduced inputs (e.g. fewer textbooks in a school intervention), which may decrease the overall quality of the program. Kerwin and Thornton used a randomized experiment to compare two versions of an early primary literacy program in Northern Uganda: one was the full version, while the second was a reduced-cost version, which used government employees and fewer materials to simulate a scale up. The program provided a revised curriculum with teaching materials, support and training for teachers, and parent engagement. When they analyzed the effects on the overall reading score index, the lower-cost version had no statistically significant effect, while the full version produced a significant average increase of 0.63 SD. Here, a shift in providers and a reduction in inputs (school materials) seemed to eliminate all of the program’s impact.
Scaling up a program also brings it to a new context, with students and teachers with different characteristics and backgrounds. Stein et al., in a study funded by the U.S. Department of Education, analyzed the scale up of Kindergarten Peer Assisted Learning Strategies (K-PALS), which had the goal of changing class structure for Grades 2 through 6 in the U.S. The program required teachers to pair weak readers with strong readers to work through structured activities for approximately 35 minutes, three times a week, while teachers supervised them. They use several randomized field trials in schools in Nashville, Minnesota and South Texas to test for teacher fidelity of implementation (defined as how close a program’s implementation is to the idealized policy) and the impact on students’ performance. According to them, many teachers may go back to their previous mode of teaching immediately after the program is over. After creating fidelity scores for each teacher in the program, Stein et al. find that, on the one hand, the variation in the effects of the K-PALS program is largely explained by teacher fidelity, but, on the other hand, the only teacher characteristic that affected teacher fidelity was a teacher’s belief that students could succeed, independent of students’ attitudes and habits. As I argue later on, PAIC also faced the challenge of convincing teachers to adopt the new teaching strategies taught in PAIC.
Literature review: Conceptual Framework for Scaling up Policies
In the previous section, we discussed the evidence presented in the literature on when scaling up a project can fail. We have seen that a shift in the provider of a service, a reduction in inputs and materials, or a new population receiving the program can decrease its impact. I now discuss the theoretical framework that explains why scaled-up versions of successful policies fail. Davis et al. (2017) provide an economic and theoretical framework for understanding the challenges of scale-up, dividing the effects into “supply” and “demand” effects. Supply effects represent changes in the way the program is provided at scale, while demand effects represent changes in the type of receiver to whom the program is provided.
Supply side: How heterogeneous quality of workers leads to less effective scaled results
Still according to Davis et al. (2017), the effectiveness of a program must decrease with scale because governments attempt to hire the best workers for the small-scale version, which leaves workers of lower average quality for the scaled version. More formally, suppose a program is set to serve n participants at a cost of m dollars per participant. Then the program’s total impact is ∆ = p * F(H(L; w), K), where F(·) is a constant returns to scale production function which takes workers’ human capital, H(L; w), and capital, K, as inputs, and p is the market price or social value of the program’s output. The average impact is ∆/n, and the benefit-cost ratio is p * F(H(L; w), K)/(wL + rK). If we assume that human capital is inelastic, meaning that for a 1% change in L production will increase by less than 1%, then the benefit-cost ratio will decrease with scale and the government will have to increase wages to achieve the same average impact, ∆/n (Davis et al., 2017). In other words, because the average quality of the workers must decrease with scale, the average output will decrease holding average costs constant. Another important conclusion that can be drawn from this
model is that if we assume that the government does not do a good job of selecting the best workers for the small-scale program, then human capital might not be inelastic and the scaled-up version of the program can perform similarly. For the purpose of analyzing PAIC, this means that if the teacher trainers hired by the municipality of Sobral for the small-scale version of the program are more qualified than the ones hired by the other municipalities within PAIC, then we would expect the scaled-up version of the program (PAIC) to be less effective on average. Davis et al. (2017) provide the ideal way to isolate the demand effect from the supply effect of scaling. Suppose that the small-scale program hires n workers, while the scaled version hires S*n workers. The key to their method is that the government forms an implicit ranking of all applicants. If a government can create this list explicitly, then we can simulate the effects of the scaled-up version of the program (S*n participants) even with only n participants. All we would have to do is run a test program with a random sample of n workers from the pool of workers that the government would choose at scale S*n. This allows us to simulate changes on the supply side at scale S*n (changes in the composition of workers) while making no demand-side changes (we do not need to change the type of receiver of the program, because no scaling up is necessary). While the state of Ceará did not run such an experiment, it is still a good benchmark for understanding the weaknesses of my analysis.
Demand side: why different population characteristics can lead to different results
Studies have shown over and over that students’ characteristics and family background have a huge impact on school performance. In one of the most influential of these studies for the Brazilian context, using 2003 data from the System of Evaluation of Primary School (a predecessor of Prova Brasil), Naercio Filho (2007) finds that only between 10 and 30% of the variation in students’ scores is due to variation between schools, while the rest of the variation occurs within schools, among students. He finds that the best predictors of a student’s grade in Brazil are the mother’s education level, race, previous grade repetition, number of books at home and adolescent labor. Previous research has also shown that teacher experience and self-perceived efficacy can affect teacher fidelity of implementation, and therefore the impact of the program. Why?
Gersten, Chard, and Baker (2000) argue that new and inexperienced teachers are more likely to continue using a program than experienced teachers, because teachers in the early stages of their careers are characterized by a “survival” orientation toward instructional methods, which can lead to greater openness to trying new approaches. Another characteristic linked to better adoption by teachers is teachers' self-perceived efficacy (Dusenbury et al., 2003; Ruiz-Primo, 2005). PAIC therefore faces the challenge of scaling a teaching program to all teachers and students of the state of Ceará, which includes underprivileged students who have historically performed worse, as shown by Filho (2007), and teachers who could vary in how much they apply the lessons learned from PAIC.

Why politics can influence the impact at scale

Besides the demand and supply effects, scaling up could also cause a political effect due to a change in the political environment or structure in the scaled-up version of the program. In PAIC, scaling up meant involving both the state of Ceará and 184 municipalities. Here, I assume that a mayor tries to maximize his or her chances of being re-elected, which I take to be a function of how the population perceives the improvement in social services and of how the mayor can distribute patronage resources. In PAIC, mayors can distribute R$765.00 (around US$250.00) to each government worker who leads the training program and R$1,200.00 (around US$400.00) to each coordinator; they can either choose the most efficient workers for these positions or use the positions for patronage. If we suppose that a mayor from an opposition party can get less credit for the improvements in education led by PAIC, then we should conclude that she has a smaller incentive to commit the resources from the state government to productive investments for the program, and PAIC should have a worse performance. Before moving on to the next section, where I analyze the differences between Sobral's small-scale program and Ceará's PAIC
large scale program and predict why PAIC could fail in certain municipalities, it will prove useful to clarify my organization of the different literatures: although in practice scaling up and replicating a policy might be very different, we can think about both problems in almost the same conceptual manner. Policy makers who want to either replicate or scale up a policy ask whether that policy will (i) work with different populations and (ii) work with different human capital (e.g. nonspecialized workers implementing the program). The only challenge unique to scaling up a policy will therefore be that of political integration between federal entities.

Differences in Sobral's and PAIC's program that could lead to a failure in scale-up

PAIC's ambitious goal is to bring the literacy rate for 8-year-old students to 100%. To achieve that, it has 4 channels of action: (i) teacher training; (ii) provision of books and material to students and teachers; (iii) external learning evaluation and (iv) collaboration between state and municipal governments. The program is celebrated as a pact between the state government and the 184 municipalities to cooperate and share the responsibilities for education. The state of Ceará created 20 Regional Organizations for Municipal Organization (CREDE), which are responsible for supervising the implementation by each municipality, reporting the results of the external evaluation and even visiting each municipality every two months. CREDEs are also responsible for hiring specialized employees to train 800 municipal implementation agents (64 hours per year), who will later be responsible for leading the training sessions for the 20,242 math and Portuguese teachers in Ceará (also 64 hours per year). The mayor and the municipal secretary of education are free to choose the team that will work locally on PAIC, which includes administrative staff and the 800 implementation agents.
Challenges of coordinating the state government and 184 municipal governments

Following the framework in “Why politics can influence the impact at scale,” I predict that if a mayor from a party outside the governor's coalition can get less credit for the improvements in education led by PAIC, he or she should be less committed to PAIC and therefore its effect should be smaller. Mayors can select both the
team that supervises the implementation of the program and the implementation agents, who are responsible for training the teachers. Both can be a good patronage resource: implementation agents earn R$765.00 and coordinators earn R$1,200.00. When visiting Ceará's state Department of Education, I asked whether the party of the mayor affected the mayors' commitment to PAIC, and the team responded that fortunately it did not. They argued that PAIC creates a direct incentive for every mayor to commit because the state of Ceará linked 18% of the Circulation Tax of Goods and Services (ICMS), which is transferred from the state government to the municipal governments, to the average performance of each municipality on the Brazil Exam (Prova Brasil). They also argued that the team made a big effort to visit every municipality and sign a commitment contract with each mayor, irrespective of the mayor's party. Nonetheless, the team from the Department of Education of the state of Ceará acknowledged that non-committed mayors were (i) not providing the necessary support for the program and (ii) engaging in corrupt practices when hiring the implementation agents (responsible for leading the training program for the teachers), such as hiring family members. One of the leaders of the program during its initial phase mentioned trying to implement an objective hiring process, but her proposal was rejected by the mayors. They were not sure, however, whether these practices were related to the mayor's party. Municipal officials, on the other hand, seemed to blame the mayor's party for a lack of commitment. During a visit to the Board of Education of Fortaleza (Ceará's capital), the team responsible for implementing PAIC complained that the Board only started giving them strong and real support after the 2008 election, when a mayor from the party of Ceará's governor won the election, replacing the previous mayor from an opposition party.
According to them, “politics” was blocking the necessary support. In other words, if we assume that a mayor from the opposition party can get less credit for the improvements in education led by PAIC, we should conclude that she has a bigger incentive to use the resources from the state government as patronage resources and PAIC should have a worse performance. On the other hand, PAIC’s officials have created financial incentives through the Circulation
Tax of Goods and Services (ICMS), which could lead to every mayor being incentivized to commit to PAIC.
Challenges due to different characteristics of target populations when operating at scale

Different target populations could lead to different outcomes for PAIC. By reaching out to 184 municipalities, PAIC faces the challenge of providing the same program for teachers and students from very different backgrounds. Following Stein et al., I predict that new teachers (5 years or less of experience) will adopt the program's teaching strategies with more fidelity and will therefore have a higher impact. Moreover, teachers with college degrees should be better prepared and gain less from the program. I also predict that teachers who have greater confidence in their students' performance will have a bigger impact. Finally, I test whether the program helps to decrease the gap in grades between students from privileged and underprivileged backgrounds (I test for parents' education, race and students' previous scores).
Challenges due to lower quality implementation workers

As argued in the supply-side discussion of how heterogeneous worker quality leads to less effective results at scale, the average outcome should decrease, holding average costs constant, because human capital is inelastic and the average quality of the workers must decrease with scale. This is especially true in PAIC, where the structure and the scale of the program limit the ability of Ceará to hire the best workers. While Sobral has its own School for Permanent Continuing Education, Ceará hires specialized teachers to train 800 municipal implementation agents, who are selected by the municipal governments to implement the program. PAIC also invests less per teacher trainer: while the average salary for the trainers is R$1,200.00 (around US$400.00) in Sobral and teachers receive 96 hours of training per year, the average salary is R$765.00 (around US$250.00) in PAIC and teachers receive 64 hours of training per year.
Data

The data for this study are drawn from the National Assessment of Educational Achievement, which is publicly known as Prova Brasil. The Prova Brasil is a biennial school-level assessment of fifth and ninth graders, created in 2005 to test all students (not a sample of schools) in public schools with a minimum of 20 students per class, although student-level microdata are available only for 2007, 2009, and 2011. This rich data set provides not only students' mathematics and reading (Portuguese) exam scores, but also information about the socio-economic background of students, teachers, and principals. The exam was structured so that we can compare scores over time and across grades. We are able to follow the performance of states, municipalities, schools and grade cohorts, but we cannot follow the performance of specific students because we do not have matchable student identifiers across years. For 2011, the data set contains information for 5,201,730 students from 55,924 schools in 27 states, and for 2007 we have data for 4,109,265 students in 48,704 schools. For the data cleaning process, I followed the same methods that Carnoy and Costa (2015) used. To keep most schools that were affected by PAIC in the data set, I keep only the urban municipal schools with morning and afternoon shifts (I exclude rural schools, state
and federal administered schools and schools that have night shifts). I also restricted the sample to schools that participated in both the 2007 and 2011 exams and in which we could identify the teacher's subject (Portuguese or Mathematics). The final school-level panel contained data for 275,072 students from 1,002 schools in five different states: Ceará, Piauí, Pernambuco, Paraíba and Rio Grande do Norte. Table 2 illustrates the means of outcome variables, important covariates in this sample and important demographic variables. My dependent variables are the math and Portuguese scores of individual students on the Prova Brasil. The scale of this variable ranges from 0 to 500 in the fifth and ninth grades. For the analysis done later on, I normalized the grades to have mean 0 and standard deviation 1 (z-scores), but in Table 1 I still show the data on the 0 to 500 scale. Following Carnoy's choice, I compared the grade improvements of Ceará's students to the improvements of students in the four bordering states because of their geographic proximity, socio-economic similarity and historically similar education results. In contrast, the southern and southeastern states are much richer and have consistently performed better on Prova Brasil and other tests. Table 1 presents the averages for important variables on student characteristics, teacher characteristics and grades, for Ceará and the four bordering states.
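The normalization described above can be illustrated with a minimal Python sketch. It assumes a single pooled standardization using the population standard deviation; the text does not say whether scores were standardized by year, grade or subject, so the function name `z_scores`, the pooling choice and the sample values are all hypothetical.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Rescale raw scores so that they have mean 0 and standard deviation 1."""
    mu = mean(scores)
    # Assumption: population standard deviation; the article does not
    # specify which definition was used.
    sigma = pstdev(scores)
    if sigma == 0:
        raise ValueError("all scores are identical; z-scores are undefined")
    return [(x - mu) / sigma for x in scores]


# Made-up raw scores on the 0 to 500 Prova Brasil scale.
raw = [150.0, 200.0, 250.0]
standardized = z_scores(raw)
```

Working in standard-deviation units is what allows the later estimates to be read as effects in SDs rather than in raw score points.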
Methodology

Impact analysis

Considering that the Prova Brasil is given every two years to fifth and ninth graders and that PAIC was implemented in many schools in 2007 and reached all schools in 2008, the 2011 test was the first one to measure students who had been exposed to the policy for
two or three years. To measure the impact of PAIC on the grades of fifth graders in Ceará, I first replicate Carnoy's methodology. Carnoy takes advantage of the fact that the policy affected neither students from bordering states nor students from older cohorts to construct a Difference-in-Difference-in-Difference (DDD) model, in which we compare (i) the test scores of Ceará's fifth graders (who were affected by PAIC) to the test scores of bordering states' fifth graders and (ii) the test scores of Ceará's fifth graders to the test scores of Ceará's ninth graders (who were not exposed to the policy). A regular Difference-in-Difference (DD) model compares the improvement in scores of a treatment group (here the fifth graders in Ceará) and a control group (here the fifth graders in bordering states). The main assumption for inferring causality is that the change in grades for bordering states' fifth graders between 2007 and 2011 is a good proxy for what the change in grades for Ceará's fifth graders would have been if Ceará had not created PAIC. The DDD model relaxes this assumption by adding one more control group that was also not exposed to the policy (here the ninth graders in Ceará) and netting out their improvement as well, which controls for policies and improvements in Ceará that are common to both fifth and ninth grades. The DDD model is the difference between the two, with the following conditional expectation function (CEF):

$$
E[y_{istg} \mid s, t, g, D, X_{istg}] = \alpha_s + \lambda_t + \gamma_g + \mu_{st} + \theta_{gt} + \rho_{sg} + \delta_{stg} + X_{istg}'B \tag{1}
$$

where $\alpha_s$ represents state-level fixed effects, $\lambda_t$ is a time trend, $\gamma_g$ is a dummy for the grade effect, $\mu_{st}$ represents state-specific time effects that are common across grade groups, $\theta_{gt}$ is time-varying grade effects, $\rho_{sg}$ is state-specific grade effects, and $\delta_{stg}$ is our coefficient of interest: the effect of PAIC on the achievement of students who attended fifth grade in 2011.
For that, we'll use data from the Prova Brasil, which has abundant data for controls on students and teachers and is administered to all students (not a sample of schools) in public schools with a minimum of 20 students per class. The coefficient of interest, $\delta_{stg}$, is defined as follows:

$$
DDD = DD_1 - DD_2 = (\Delta y_{ce,5g} - \Delta y_{bo,5g}) - (\Delta y_{ce,9g} - \Delta y_{bo,9g}) = \delta_{stg} \tag{2}
$$
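As a rough illustration of equation (2), the sketch below computes the triple difference directly from cell means. It is only a sketch under stated assumptions: the function `triple_difference`, the dictionary layout and every number in `cell_means` are hypothetical and are not taken from the Prova Brasil data.

```python
def triple_difference(means):
    """Compute DDD = DD1 - DD2 from a dict of cell means.

    `means` maps (state_group, grade, year) to an average score, where
    state_group is "ceara" or "border" (hypothetical labels).
    """
    def change(state_group, grade):
        # Change over time within one state group and grade (the Delta y terms).
        return means[(state_group, grade, 2011)] - means[(state_group, grade, 2007)]

    dd_fifth = change("ceara", 5) - change("border", 5)   # DD1: treated grade
    dd_ninth = change("ceara", 9) - change("border", 9)   # DD2: untreated grade
    return dd_fifth - dd_ninth


# Made-up cell means in standard-deviation units, for illustration only.
cell_means = {
    ("ceara", 5, 2007): -0.10, ("ceara", 5, 2011): 0.30,
    ("border", 5, 2007): -0.05, ("border", 5, 2011): 0.15,
    ("ceara", 9, 2007): -0.08, ("ceara", 9, 2011): 0.02,
    ("border", 9, 2007): -0.04, ("border", 9, 2011): 0.01,
}
ddd = triple_difference(cell_means)
```

Without covariates, this cell-mean calculation gives the same number as the triple-interaction coefficient in a fully saturated regression; the regression form is mainly useful because it allows controls and standard errors.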
While DD1 nets out the improvement over time that is common to fifth graders in both Ceará and the bordering states, DD2 nets out the corresponding improvement for ninth graders. Equation 2 allows us to identify the effect of PAIC on the grades of fifth grade students in Ceará by taking the difference between the change in the grades of fifth grade students in Ceará (∆yce,5g) and the change across bordering states (∆ybo,5g), and subtracting the difference between the change in the grades of ninth grade students in Ceará (∆yce,9g) and the change in the grades of ninth grade students in bordering states (∆ybo,9g). The ∆ represents a difference over time, so we are already taking into account the time fixed effect of each cohort both in Ceará and in the bordering states. This can be estimated using the following regression framework:

$$
\begin{aligned}
y_{it} = {} & \alpha_s + \beta_1\,\text{2011}_{it} + \beta_2\,\text{5thgrade}_{it} + \beta_3\,\text{Ceará}_{it} \cdot \text{2011}_{it} + \beta_4\,\text{5thgrade}_{it} \cdot \text{2011}_{it} \\
& + \beta_5\,\text{Ceará}_{it} \cdot \text{5thgrade}_{it} + \delta\,\text{Ceará}_{it} \cdot \text{5thgrade}_{it} \cdot \text{2011}_{it} + \varepsilon_{it}
\end{aligned}
\tag{3}
$$

where $\alpha_s$ is the state-level fixed effect; 2011 is a dummy variable for the 2011 edition of the Prova Brasil; 5thgrade is a dummy variable for the fifth-grade students in 2007 and 2011 (within-state comparison); and Ceará is a dummy that distinguishes students in Ceará from students in bordering states (cross-state comparison).

Step 2: Analyze the variability

After replicating Carnoy's impact methodology, I analyze whether the four factors mentioned in section 4 affect the variation in PAIC's outcome across municipalities.

Hypothesis 1: Mayors from the governor's coalition will be more committed to PAIC and therefore have a positive impact on the program's outcome.

To test the first hypothesis, I add a further difference to analyze the interaction of the PAIC intervention with a dummy variable, named coalition, that indicates whether the party of a municipality's mayor belongs to the coalition of the state governor's party.
For this, I will need to change the DDD model to allow for a new difference, taken at the municipal level. Here we are trying to estimate the difference between the impact of PAIC on grades in coalition municipalities and in opposition municipalities. I model this difference as

$$
\Delta^2_{COALITION}\,y = \Delta_{COALITION}\,y_t - \Delta_{COALITION}\,y_{t-1} = (y_{COALITION,t} - y_{noCOALITION,t}) - (y_{COALITION,t-1} - y_{noCOALITION,t-1})
$$

In other words, we are looking at the change in the gap between the average grade of coalition municipalities and that of opposition municipalities. The model is the following:
$$
4D = DD_1 - DD_2 = (\Delta^2_{COALITION}\,y_{CE,5g} - \Delta^2_{COALITION}\,y_{BO,5g}) - (\Delta^2_{COALITION}\,y_{CE,9g} - \Delta^2_{COALITION}\,y_{BO,9g}) = \pi_{tsgc} \tag{4}
$$

As usual, y represents the scores of the students on the Prova Brasil, where CE,5g indicates students in the 5th grade in the state of Ceará and BO,5g indicates students in the 5th grade in the bordering states. Similarly, CE,9g represents students from Ceará in 9th grade, and BO,9g indicates students from bordering states in 9th grade. Besides netting out the constant effects across states and across grades like the previous model, this model also allows us to estimate $\pi_{tsgc}$, which is the interaction of being in a “coalition” municipality and attending the fifth grade in Ceará in 2011 (coalition and treatment variable). This can be estimated by the following regression:

$$
\begin{aligned}
y_{it} = {} & \alpha_s + \beta_1\,\text{2011}_{it} + \beta_2\,\text{5thgrade}_{it} + \beta_3\,\text{Ceará}_{it} \cdot \text{2011}_{it} \cdot \text{Coalition}_m + \beta_4\,\text{5thgrade}_{it} \cdot \text{2011}_{it} \cdot \text{Coalition}_m \\
& + \beta_5\,\text{Ceará}_{it} \cdot \text{5thgrade}_{it} \cdot \text{Coalition}_m + \beta_6\,\text{Ceará}_{it} \cdot \text{5thgrade}_{it} \cdot \text{2011}_{it} \\
& + \pi\,\text{Ceará}_{it} \cdot \text{2011}_{it} \cdot \text{5thgrade}_{it} \cdot \text{Coalition}_m + \varepsilon_{it}
\end{aligned}
\tag{5}
$$

where $\text{Coalition}_m$ is a dummy for whether the mayor's party is part of the governor's coalition, and the subscript m shows that this variable is at the municipality level. Our coefficient of interest is $\pi$, which we interpret as the difference between the impact of PAIC in coalition and opposition municipalities.

Hypothesis 2: School demographic context: PAIC had a larger effect on schools that (i) had a higher percentage of white students; (ii) were already performing worse and (iii) had a higher percentage of parents with a university degree.

Here I am going to run the same 4D regression, but this time the treatment effect (2011it · Cearáit · 5thgradeit) will interact with a set of school demographic variables: a dummy for whether a school's percentage of white students is above the median, a dummy that indicates whether a school's performance is above the median of its state, and finally a dummy for whether the school's percentage of students that have at least one parent with a university degree is above the median.

$$
\begin{aligned}
y_{it} = {} & \alpha_s + \beta_1\,\text{2011}_{it} + \beta_2\,\text{5thgrade}_{it} + \beta_3\,\text{Ceará}_{it} \cdot \text{2011}_{it} + \beta_4\,\text{5thgrade}_{it} \cdot \text{2011}_{it} \cdot \text{SchoolDemographic}_j \\
& + \beta_5\,\text{Ceará}_{it} \cdot \text{5thgrade}_{it} \cdot \text{SchoolDemographic}_j + \beta_6\,\text{Ceará}_{it} \cdot \text{5thgrade}_{it} \cdot \text{2011}_{it} \\
& + \pi\,\text{2011}_{it} \cdot \text{Ceará}_{it} \cdot \text{5thgrade}_{it} \cdot \text{SchoolDemographic}_j + \varepsilon_{it}
\end{aligned}
\tag{6}
$$

Hypothesis 3: Student demographic context: PAIC had a larger effect on (i) non-white students; (ii) students that were performing worse before the program and (iii) students that did not have parents with a university degree.

Here I test for average differences in the impact of PAIC on students with different characteristics. Again, we run our 4D model, but this time the treatment effect (2011it · Cearáit · 5thgradeit) will interact with a set of student demographic variables: a dummy that indicates if a student is white (whitei), a dummy that indicates if a student had grades above the median of his or her state (besti) and a dummy that indicates if at least one of the student's parents has a college degree (parent_univi). Notice the difference between hypotheses two and three. While in the former we are testing for differences in PAIC's effects across school environments (e.g. the percentage of white students in a school), in the latter we are testing for differences in PAIC's effects across students, independent of their school (e.g. the impact for white students compared to the impact for non-white students).
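The logic of the 4D comparisons can be sketched as the difference between two triple differences, one computed for each value of the interacting dummy. The Python sketch below uses the coalition dummy as the example. It assumes the dummy is defined for municipalities in every state and that the comparison is made on raw cell means without covariates; the function names and all numbers are hypothetical.

```python
def change(means, group, state, grade):
    """Change in the average score from 2007 to 2011 for one cell."""
    return means[(group, state, grade, 2011)] - means[(group, state, grade, 2007)]


def triple_difference(means, group):
    """DDD within one group (e.g. coalition municipalities only)."""
    dd_fifth = change(means, group, "ceara", 5) - change(means, group, "border", 5)
    dd_ninth = change(means, group, "ceara", 9) - change(means, group, "border", 9)
    return dd_fifth - dd_ninth


def quadruple_difference(means, group_a, group_b):
    """4D: how much larger the DDD is in group_a than in group_b."""
    return triple_difference(means, group_a) - triple_difference(means, group_b)


# Made-up 2007-2011 gains in SD units; baselines are set to zero for brevity.
gains = {
    ("coalition", "ceara", 5): 0.42, ("coalition", "border", 5): 0.20,
    ("coalition", "ceara", 9): 0.10, ("coalition", "border", 9): 0.05,
    ("opposition", "ceara", 5): 0.40, ("opposition", "border", 5): 0.20,
    ("opposition", "ceara", 9): 0.10, ("opposition", "border", 9): 0.05,
}
means = {}
for (group, state, grade), gain in gains.items():
    means[(group, state, grade, 2007)] = 0.0
    means[(group, state, grade, 2011)] = gain

four_d = quadruple_difference(means, "coalition", "opposition")
```

The same function applies to the school, student and teacher dummies by swapping the group labels. Note that this equivalence with the quadruple-interaction coefficient holds for a fully saturated specification, which may differ from the exact set of terms written in the regressions here.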
The model for hypothesis 3 is the following:

$$
\begin{aligned}
y_{it} = {} & \alpha_s + \beta_1\,\text{2011}_{it} + \beta_2\,\text{5thgrade}_{it} + \beta_3\,\text{Ceará}_{it} \cdot \text{2011}_{it} + \beta_4\,\text{5thgrade}_{it} \cdot \text{2011}_{it} \cdot \text{StudentDemographic}_i \\
& + \beta_5\,\text{Ceará}_{it} \cdot \text{5thgrade}_{it} \cdot \text{StudentDemographic}_i + \beta_6\,\text{Ceará}_{it} \cdot \text{5thgrade}_{it} \cdot \text{2011}_{it} \\
& + \pi\,\text{2011}_{it} \cdot \text{Ceará}_{it} \cdot \text{5thgrade}_{it} \cdot \text{StudentDemographic}_i + \varepsilon_{it}
\end{aligned}
\tag{7}
$$

where, again, the student demographics represent the dummies for students' (i) race; (ii) previously better performance and (iii) parents' university degree.

Hypothesis 4: Teacher demographic context: PAIC had a higher impact on students with teachers that (i) had five years of experience or less; (ii) had high confidence in their students' performance and (iii) did not have a university degree.

Prova Brasil reports each teacher's years of experience, so we use it to construct a dummy for teachers that have five years of experience or less. Prova Brasil also asks teachers what proportion of their students they think will finish primary school. I create a dummy for all teachers that answered “most students” or “all students” as a proxy for confidence in students' performance. I also create a dummy for teachers that had a university degree. I then create a 4D model, similar to the previous ones, that tests the difference in the impact of PAIC for these three dummies:

$$
\begin{aligned}
y_{it} = {} & \alpha_s + \beta_1\,\text{2011}_{it} + \beta_2\,\text{5thgrade}_{it} + \beta_3\,\text{Ceará}_{it} \cdot \text{2011}_{it} + \beta_4\,\text{5thgrade}_{it} \cdot \text{2011}_{it} \cdot \text{TeacherDemographic}_{it} \\
& + \beta_5\,\text{Ceará}_{it} \cdot \text{5thgrade}_{it} \cdot \text{TeacherDemographic}_{it} + \beta_6\,\text{Ceará}_{it} \cdot \text{5thgrade}_{it} \cdot \text{2011}_{it} \\
& + \pi\,\text{2011}_{it} \cdot \text{Ceará}_{it} \cdot \text{5thgrade}_{it} \cdot \text{TeacherDemographic}_{it} + \varepsilon_{it}
\end{aligned}
\tag{8}
$$

where, again, the teacher demographics represent the dummies for teachers' (i) five years of experience or less; (ii) confidence in students and (iii) university degree.

Hypothesis 5: The quality of implementation workers: the impact of PAIC will be positively correlated with teachers' satisfaction with the program.

Here, we are trying to test whether the quality of the workers implementing the teacher training affected the students' grades, as predicted in section 4.3. Unfortunately, we do not have data on the implementation workers for either the small-scale version in Sobral or the large-scale version in Ceará. However, we do have data on teachers' satisfaction with PAIC, which I use as a proxy for the quality of the teacher training program. We cannot run a 4D regression like in the previous cases, because teachers in bordering states and teachers of older cohorts did not evaluate PAIC, so we do not have a control group. Thus, I simply regress the improvement in the average score of schools in Ceará from 2007 to 2011 on the average teacher satisfaction with PAIC per school, controlling for other school and teacher variables. I run this regression at the school level and not at the student level, like the previous ones, because we do not have student identifiers, so we cannot follow students' improvements in scores over the years.

$$
(y_{s,2011} - y_{s,2007}) = \beta_0 + \beta_1\,\text{2011}_{st} \cdot \text{Satisfaction}_{st} + \beta_2 X_{st} + \varepsilon_{st} \tag{9}
$$

where 2011 is a dummy for the year, Satisfaction is an ordinal variable for the teachers' satisfaction with PAIC in school s and time t, and $X_{st}$ stands for the other school and teacher controls. We should not interpret regression (9) as causal: it simply describes whether the improvement in grades for schools in Ceará was higher for schools where teachers were more satisfied with PAIC, controlling for other important variables. We should be careful when interpreting all the regressions, but especially this one.
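To make the descriptive nature of this school-level regression concrete, here is a minimal Python sketch of a one-regressor least-squares fit of score changes on satisfaction. It deliberately omits the control variables, so it is a simplification of the regression described above rather than a replication; the function name and the four data points are hypothetical.

```python
from statistics import mean


def ols_slope_intercept(x, y):
    """Least-squares fit of y = intercept + slope * x for one regressor."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation; the slope is undefined")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Made-up school-level data: average teacher satisfaction (ordinal scale)
# and the 2007-2011 change in the school's average score, in SD units.
satisfaction = [1, 2, 3, 4]
score_change = [0.1, 0.2, 0.3, 0.4]
slope, intercept = ols_slope_intercept(satisfaction, score_change)
```

A positive slope here only says that schools with more satisfied teachers improved more; without a control group it cannot separate the quality of training from anything else that varies with satisfaction.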
Findings

In the introduction, I argued that the improvement in Ceará students' average scores on the Brazilian Basic Education Quality Index (IDEB) for fifth grade students led politicians to conclude that PAIC was successful at improving students' performance. Using the DDD methodology described in the previous section to estimate a causal relationship between PAIC and the improvement in average scores, I found that PAIC improved average student math scores by 0.16 standard deviations and Portuguese scores by 0.12 standard deviations (SD). My results are very similar to Carnoy's. Table 2 shows the result for the DDD model, which includes state fixed effects, student covariates and teacher covariates. PAIC's effect is estimated by the interaction of the dummies Ceará, fifth grade and 2011 in the last row. Columns 3 and 6 control for student and teacher variables, but the effect stays fairly constant (0.10 to 0.11 SDs for Portuguese and 0.15 to 0.16 SDs for Mathematics). The estimated impact is also statistically significant at the 99% confidence level for all regressions. One could also argue that this is a lower bound for the estimate, considering that some of the schools were not exposed to the full 3 years of the program.

I now describe the findings for the second step, where I analyzed the variability of PAIC's impact among the 184 municipalities of Ceará. I have hypothesized that, assuming that a mayor from a party opposed to the governor's can get less credit for the improvements in education led by PAIC, he or she should be less committed to the program and therefore PAIC should have a smaller effect. PAIC officials, on the other hand, argued that the financial incentive created by the Circulation Tax of Goods and Services (ICMS) would incentivize every mayor to commit to PAIC. My results indicate that the incentive created by the ICMS was indeed enough to incentivize mayors from both the coalition and the opposition. Table 3 shows the result for the 4D model, where the treatment group (2011 × Ceará × Fifth grade) interacts with the coalition dummy variable in the last row. The effect was not statistically significant for either Portuguese or Mathematics.

Table 4 shows the results for the 4D regression that interacts the treatment group (2011 × Ceará × Fifth grade) with dummies for teacher characteristics: (i) university degree; (ii) expectation of students and (iii) years of experience. The results reinforce the literature's argument that inexperienced teachers should have a higher impact because they are more open to new teaching strategies: the impact was 0.065 SDs higher in Mathematics and 0.107 SDs higher in Portuguese for inexperienced teachers (5 years or less) compared to experienced teachers. The estimates already control for student and teacher characteristics and can be read in columns 3 and 6 in the last row. On the other hand, despite the literature's prediction that the impact should be bigger for teachers with higher expectations for their students, I found that PAIC's impact was 0.162 SDs lower in Mathematics and 0.155 SDs lower in Portuguese for teachers with high expectations for their students (columns 2 and 5). As to university degrees, the results indicate that the effect was 0.132 SDs lower in Mathematics and 0.092 SDs lower in Portuguese for teachers that had a university degree (columns 1 and 4). There are several possible interpretations of this result. The first is that PAIC is a means to correct the weak formal preparation that teachers receive in Brazilian universities and therefore has a higher causal effect for teachers that do not have a university degree. However, the absence of a university degree could also be correlated with other factors that led to the improvement of grades in Ceará, which would bias the estimate.

One of the concerns in the literature is that schools in an environment less conducive to learning may gain less from a policy. Yet one of PAIC's goals was to reduce inequality between schools, as it gave special attention to worse performing schools and
constantly visited municipalities that were performing worse. I now analyze the variability of PAIC's impact across school characteristics. Table 5 shows the result for the 4D model where the treatment group (2011 × Ceará × Fifth grade) interacts with school demographic dummy variables. Schools that were performing better in 2007 had a much lower impact from PAIC than lower performing schools in both Portuguese and Mathematics: a difference of 0.427 SDs in both subjects (columns 2 and 5). The impact was also much lower in both subjects for schools where the percentage of parents with a university degree was higher than the median: 0.253 SDs lower for Portuguese and 0.273 SDs lower for Mathematics (columns 3 and 6). This seems to indicate that PAIC had the benefit of reducing the inequality among schools, besides improving average scores. Surprisingly, the estimate for PAIC's impact
was higher for schools where the percentage of white students was above the median for both Portuguese (0.016 SDs) and Mathematics (0.060 SDs), but it is only statistically significant for Mathematics. As discussed in the demand-side section on why different population characteristics can lead to different results, students' background and characteristics largely affect students' performance and therefore create inequality from an early point. On the other hand, investments in early-age education programs are seen as effective for generating long-term improvement for students, and could therefore help to reduce the inequality among them. I now analyze PAIC's impact for students with different characteristics. Table 6 shows the result for the 4D model where the treatment group (2011 × Ceará × Fifth grade) interacts with student demographic dummy variables. First, the estimate of PAIC's impact for students that were performing above the median in 2007 (2011 × Ceará × Fifth grade × Best) was much smaller than for the ones that were performing worse: 0.956 SDs lower in Portuguese and 1.224 SDs lower in Mathematics (columns 2 and 5). Both results were statistically significant at the 0.1% level. The difference was less clear between white and non-white students, with the effect being lower in Portuguese for white students and the difference not statistically significant for Mathematics. The comparison of the impact between students that had parents with a university degree and students that did not reinforces PAIC's ability to reduce the inequality between advantaged and disadvantaged students. I estimate that PAIC's impact was 0.081 SDs lower in Portuguese and 0.123 SDs lower in Mathematics for students whose parents had a university degree. Last, I regressed the change in average grades for schools in Ceará on the teachers' satisfaction with PAIC, controlling for other important student and teacher covariates. The estimate was 0.101 SDs and statistically significant at the 0.1% level.
I have omitted the output table due to the simplicity of the regression. We should not read this result as a causal link, but this positive relationship reinforces that where the implementation workers leading the teacher training sessions were more skilled and prepared, PAIC’s effect was larger.
Conclusion

With the decentralization of the Brazilian education system, which left the administration of primary schools to the 5570 municipalities, policy makers hope that innovative and successful education policies in one municipality can be replicated or scaled to many others. However, previous evidence shows that the same policy can have smaller effects when scaled or implemented at new sites, for three main theoretical reasons: (i) the same program is provided to populations with different characteristics (school context, teachers and students); (ii) the quality of the implementation workers (who lead teacher training sessions) should decrease with scale and (iii) integrating municipal and state-level entities could lead mayors from parties opposed to the governor to commit less to PAIC and therefore decrease PAIC's impact. Nonetheless, PAIC was able to improve students' scores by 0.11 SDs in Portuguese and 0.16 SDs in Mathematics on average. Compared to the findings for other programs in the literature, the effects are larger than the ones found for programs that are solely focused on improving teaching (Carrasco, 2014). This is especially so if we compare it to the closest program in the U.S., No Child Left Behind (NCLB), for which no impact was found at the national level (Gamse et al., 2008). Against this backdrop and assuming that the DDD model gives us the impact of PAIC, I have analyzed how PAIC's impact varied across the municipalities and schools of Ceará to understand which of the three factors explained the difference in effects. First, I found that there was no statistically significant difference between the impact in municipalities where the mayor was from the governor's coalition parties and in municipalities where she was from an opposition party. This indicates that PAIC's financial incentives, along with the state government officials' attempt to create equal incentives for every mayor to commit to
the program, were successful. Will PNAIC (the federal scale-up of PAIC) create the same incentives for every mayor in the 5570 municipalities of Brazil? Perhaps. However, the federal government has not created a financial incentive for municipalities as the state government of Ceara did with PAIC, and federal officials cannot give municipalities as much attention as PAIC’s officials did by constantly visiting those that were underperforming.

Second, regarding the challenge of providing the same program to populations with different characteristics, I found that PAIC had a higher impact for (i) inexperienced teachers; (ii) teachers with low expectations for their students; and (iii) teachers without a university degree. These findings are important because they indicate that PAIC could be a good alternative for training teachers who were not formally trained. Moreover, they reinforce the view in the literature that a teacher training program can be important for supporting teachers who are feeling
discouraged by their school environments.

For the school context, I found that PAIC’s impact was larger for schools that were performing below the median in 2007 and for schools where the percentage of parents with university degrees was below the median. This indicates that PAIC officials’ attempts to reduce inequality among schools have worked. I found very similar results when analyzing the impact on students, independent of their schools: the impact was higher for students performing below the median in 2007 and for students whose parents did not have a university degree.

For the quality of the implementation workers leading the teacher training program, I found that the improvement in grades in the state of Ceara was correlated with teacher satisfaction with PAIC, which suggests that municipalities that hired better implementation workers to lead the teacher training sessions saw a higher impact on grades.

Finally, these findings bear directly on what we can expect of PNAIC’s results (the scale-up of PAIC to all other 5384 municipalities of Brazil). For instance, we can extrapolate from PAIC’s findings and expect that the effect will be larger in municipalities that have a smaller percentage of teachers with a university degree. This breakdown of PAIC’s impact across different student, teacher, and school populations is especially important in Brazil, where the administration of the primary school system is mainly left to the municipal governments. In other words, while an impact analysis of average scores can help us think about the impact on a national scale, policy makers are interested in how PAIC and other policies will affect their own education levels, given the social context of their own schools, teachers, and students.

Of course, most of my breakdown of the impact is descriptive, and unfortunately we can’t draw strong causal links between the social context variables and the impact of PAIC because of how the program was run. However, I hope that it has become clear throughout the text that if we want to predict the impact of scaling up or replicating an education policy, we ought to go beyond an impact analysis of average scores and understand how the impact varies depending on the context in which the policy is applied.
The Wage Gap Starts at the Kitchen Table

This semester, we chose to highlight another piece from Columbia Economic Review's website, which features climate. – M.S.
—
By Hallie Gruder
Columbia University ‘20
Almost everyone is aware of the gender wage gap: the average American woman earns 80% of the hourly wage of the average American man.[1] Most debates on the wage gap split into two sides: either the gap is merely a result of women’s preferences for lower-paying work, or it is due entirely to workplace discrimination against women. According to the former explanation, nothing needs to be done about the wage gap at all; according to the latter, the only possible solution is a revolution in social attitudes towards women. But what if this debate is overlooking causes that are neither direct discrimination nor women’s freely made choices? Harvard labor economist Claudia Goldin suggests one such overlooked cause: a penalty that she argues jobs with long, inflexible hours uniquely place on women.[2]

Why would jobs with long, inflexible hours result in women being paid a lower wage? Part of the answer lies in the household, rather than in the job market. Women are still pressured to be their children’s default caretakers, and raising children takes up a lot of time. According to the Bureau of Labor Statistics, the average woman spends 0.6 hours a day on childcare while the average man spends only 0.3.[3] This imbalance is not limited to families with children. Women spend an average of 2.3 hours a day on non-childcare household duties while men spend an average of 1.4.[4] Overall, women spend almost twice as much of their time on at-home labor as men, time that can therefore no longer be used for paid labor.[5]
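As a quick check on the "almost twice" comparison, the figures quoted above can be added up directly. This is a minimal sketch: the variable names are illustrative, and it assumes all four figures are average hours per day that can simply be summed.

```python
# Average hours as quoted in the text (American Time Use Survey figures).
# Assumption: all four figures are hours per day and are additive.
childcare = {"women": 0.6, "men": 0.3}
other_household = {"women": 2.3, "men": 1.4}

# Total at-home labor per group.
total = {group: childcare[group] + other_household[group] for group in childcare}

# How many times more at-home labor women do than men.
ratio = total["women"] / total["men"]

print(f"women: {total['women']:.1f} h, men: {total['men']:.1f} h, ratio: {ratio:.2f}")
```

On these figures the ratio works out to about 1.7, somewhat short of a full doubling.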
Obviously, since women have less time available for paid work, all else being equal, women’s yearly salaries will be lower than men’s. But the gender wage gap is more than just a difference in men’s and women’s yearly wages; it’s a difference in their hourly wages, too. Women are not poorer than men merely because we work fewer hours of (paid) labor a year, but also because we earn less each hour. So how do women’s shorter work hours lead to lower per-hour pay? This is the key question Goldin’s theory aims to address.

What Goldin hypothesizes is that employers value employees who are willing to work rigid and long hours. Therefore, people who are able to work these hours are more likely to get raises and promotions. Men disproportionately benefit from these raises and promotions because they have more time available to work long hours.

Is there empirical evidence for this claim? Two findings from Goldin’s 2014 address to the American Economic Association might help. The first finding made use of the fact that different jobs have different gender wage gaps. There was a strong positive association between fields with rigid and long hours, such as law and business, and larger gender wage gaps. In other words, empirically, jobs known to require long hours have more unequal hourly pay.[6]

The second bit of evidence makes use of the fact that certain careers reward people for working long hours. In these fields, workers who put in long hours get both higher yearly incomes and higher hourly wages. So if one were to graph incomes against hours worked, there would be a very steep slope, as increasing hours by a certain percent leads to a larger percent increase in income. Goldin plotted these slopes against gender wage gaps for different fields. The result was that fields with steeper slopes with respect to hours worked also had higher gender wage gaps.[7] In other words, the more employers rewarded long hours, the less women flourished.

Goldin’s findings suggest a contributor to the wage gap other than the usual suspects of direct
hiring discrimination or a female preference for lower-paying jobs. Quite simply, the way work is structured favors men, and this privileging of men is unrelated to any employer preference to hire men. Why does the structure of work favor men? Because women disproportionately perform unpaid labor in the household, which prevents them from performing paid labor in the workplace.

One might object that the wage gap is not a problem because, as Goldin’s findings suggest, it is mostly explained by the different ways men and women balance work and family. These choices were not made at gunpoint, and may reflect preferences that on average differ across genders. But women, and men, make career choices in the presence of social pressures. And there is evidence that, as a result, their choices are not optimal. For instance, according to a 2013 Pew Research poll, 46% of men say they spend too little time with their children, twice the share of women. There is also evidence that men face more pressure to work long hours than women. For example, a 2003 study on lawyers’ pay found a higher penalty for men than for women for cutting back hours or taking a year off.

So the wage gap may not solely be a measure of hiring discrimination. It nonetheless represents struggles playing out across millions of American
families’ kitchen tables to balance family obligations and work obligations fairly, in the face of social pressures that affect women and men.

[1] Carpenter, Julia. “Why men need to believe in the wage gap.” CNN Online, 20 February 2018. Accessed 19 March 2018 from http://money.cnn.com/2018/02/20/pf/men-wage-gap/index.html
[2] Goldin, Claudia. “A Grand Gender Convergence: Its Last Chapter.”
[3] “American Time Use Survey.” United States Bureau of Labor Statistics, 20 December 2016.
[4] Ibid.
[5] Ibid.
[6] Goldin, Claudia. “A Grand Gender Convergence: Its Last Chapter.”
[7] Goldin, Claudia. “A Grand Gender Convergence: Its Last Chapter.”