CHAPTER 4
THE LONG SHADOW OF INFORMALITY
The main variable of interest is the cumulative change in output (or employment) informality during the sampling windows. Employment informality is proxied by self-employment in percent of total employment, whereas output informality is measured by DGE-based estimates of informal output in percent of official GDP. Additional control variables include initial poverty/inequality levels, measured at the start of the sampling windows, to capture persistence in poverty/inequality outcomes; initial levels of informality; squared initial informality, to control for a possible nonlinear relationship between informality and poverty; cumulative GDP per capita growth during the sampling windows; a constant; and country and time fixed effects.
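As a reading aid, the specification described above can be written out as a single equation. This is a sketch under stated assumptions, not the authors' own notation: all symbols are introduced here for illustration, and the dependent variable is assumed to be the cumulative change in the poverty or inequality measure over the same sampling window, which this passage does not state explicitly.

$$
\Delta Y_{c,t} = \alpha + \beta\,\Delta INF_{c,t} + \gamma\,Y^{0}_{c,t} + \delta_1\,INF^{0}_{c,t} + \delta_2\,\bigl(INF^{0}_{c,t}\bigr)^{2} + \theta\,g_{c,t} + \mu_c + \tau_t + \varepsilon_{c,t}
$$

Here $c$ indexes countries and $t$ sampling windows; $\Delta INF_{c,t}$ is the cumulative change in output (or employment) informality; $Y^{0}_{c,t}$ and $INF^{0}_{c,t}$ are the initial levels of poverty/inequality and informality; $g_{c,t}$ is cumulative GDP per capita growth; $\alpha$ is the constant; and $\mu_c$ and $\tau_t$ are country and time fixed effects. Under this reading, $\beta$ is the coefficient of interest.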
ANNEX 4C Bayesian model averaging approach

Model uncertainty is a common issue in regressions that investigate the correlates of informality. Past theoretical models and empirical studies have identified many potential drivers and implications of informality, ranging from social and economic factors underlying underdevelopment to institutional conditions (Schneider, Buehn, and Montenegro 2010; World Bank 2019). The Bayesian model averaging (BMA) approach addresses model uncertainty formally, by recognizing that the identity of the true model is unknown and that it may be preferable to combine evidence from many different models. Here, BMA is used to identify the potential correlates of output informality in EMDEs. A hyper-g prior is used for each coefficient, following Feldkircher and Zeugner (2012), which may achieve greater robustness than the priors used in the earlier literature. Priors on the inclusion probabilities are discussed below.

Grouping variables. Multiple variables can represent the same broad concept. For example, both the share of the population with primary schooling and above and the share of the population with secondary schooling and above can proxy for the quality of human capital in a country. BMA approaches should be designed to take this into account (Durlauf, Kourtellos, and Tan 2008; Ghosh and Ghattas 2015). In the analysis underlying this chapter, variables that represent common concepts are grouped together following Dieppe (2020) and Durlauf, Kourtellos, and Tan (2008). As in their work, a group is deemed relevant if the posterior probability of including at least one variable from the group exceeds the prior inclusion probability. To account for the dependency within groups, the prior inclusion probability of each variable is defined as follows:

$$m_j^i = 1 - (1 - p_j)^{1/k_j}$$
where $m_j^i$, $p_j$, and $k_j$ are the prior inclusion probability of variable $i$ in group $j$, the probability of including at least one variable from group $j$, and the number of variables in group $j$, respectively. $m_j^i$ is set so that the prior probability of including at least one variable out of the $k_j$ variables in the group is equal to $p_j$. For instance, a group with two variables and $p_j = 0.5$ implies $m_j^i = 1 - 0.5^{1/2} \approx 0.29$ for each variable. The quantity $p_j$ is set to 0.5 for all $j$, so there is no specific prior knowledge on the probability of a group’s