MET Special Edition βETA 2013

Modelling individual and collective choice behaviour in social networks | Portfolio Rebalancing with Transaction Costs | Modeling Credit Risk: An Asymmetric Jump-Diffusion Approach | An intermodal container network model with flexible due times and the possibility of using subcontracted transport | Multivariate Linear Mixed Modeling to Predict BMR

βETA Special 2013
Contents

Modelling individual and collective choice behaviour in social networks: An approach combining a nested conditional logit model with latent classes and an Agent Based Model
Susanne Koster

Portfolio Rebalancing with Transaction Costs
Pauline Vermeer

Modeling Credit Risk: An Asymmetric Jump-Diffusion Approach
Pascal L.J. Wissink

An intermodal container network model with flexible due times and the possibility of using subcontracted transport
Bart van Riessen

Multivariate Linear Mixed Modeling to Predict BMR
Els Kinable
Index of Advertisers: Blue Sky Group (3), Study Store (33), Econometrie.com (39), Veneficus (Back Cover)
Subscription
Higher year members of the Econometrisch Dispuut and its contributors are automatically subscribed to the Medium Econometrische Toepassingen. A stand-alone subscription costs €12 per year for students, €12 for private persons and €24 for companies and institutions. The subscription is automatically continued until cancellation. Unsubscribe by sending an email to the address of the editorial board.

Colophon
Medium Econometrische Toepassingen (MET) is the scientific journal of the Econometrisch Dispuut (ED), a faculty association for students of the Erasmus University Rotterdam. Website: www.ectrie.nl/met
Editorial Board: Matthijs Aantjes, Marijn Waltman and Joris Blokland | Design: Haveka, de grafische partner, +31 - (0)78 - 691 23 23 | Printer: Nuance Print, +31 - (0)10 - 592 33 62 | Circulation: 600 copies
ISSN: 1389-9244

©2014 - No portion of the content may be directly or indirectly copied, published, reproduced, modified, displayed, sold, transmitted, rewritten for publication or redistributed in any medium without permission of the editorial board.

Address Editorial Board | Erasmus University Rotterdam | Medium Econometrische Toepassingen | Room H11-02 | P.O. Box 1738 | 3000 DR Rotterdam | The Netherlands | met@ectrie.nl | Acquisition: Max Schotsman | +31 - (0)10 - 408 14 39
Special Edition
Dear reader,
Since the first edition of the Best Econometric Thesis Award (βETA) last year was such a success, the Econometrisch Dispuut decided to continue this prestigious event. For this year's second edition, we have again released a MET publication written especially for the βETA 2013.

In this MET you will find adjusted versions of the theses written by the nominees. Are you a master or bachelor student who would like to know what a good master thesis looks like? Read these theses, because before you know it, you will have to write one yourself! The winner and the nominees will not only have a publication in our academic magazine MET, but will also be awarded a monetary prize during the ceremony.

The βETA is an award organised for our master students in Econometrics & Management Science, who have worked incredibly hard on their theses. We want to reward the five students with the best master theses. After all, your master thesis is the icing on the cake, and that is why we are honoured to hand out this award. For a chance to win this award, a student applies for the βETA with the master thesis coordinator of their specialisation at the Econometric Institute of the Erasmus University Rotterdam by handing in the thesis, a summary of the thesis and a motivation letter explaining why he or she should win the award. As soon as each master thesis coordinator has received all theses submitted by the students, they nominate the best master thesis of their specialisation, judged on the following criteria: innovation, reproducibility, added scientific contribution, potential for publication and entrepreneurship potential. I proudly present the nominees of the βETA 2013:
Els Kinable – Econometrics
Bart van Riessen – Operations Research and Quantitative Logistics
Susanne Koster – Quantitative Marketing
Pauline Vermeer – Quantitative Finance
Pascal Wissink – Quantitative Finance
Eventually our exclusive jury will announce the winner among the five nominees during the award ceremony. This jury consists of top members of the Econometric Institute together with a representative of our sponsor Veneficus. Veneficus specialises in transforming complex data analyses into clear, visual output. They obtain the very best from your numbers and furthermore provide an improved integration of IT, finance and marketing processes.
Interested in the βETA? Don't hesitate to join the ceremony, which will take place on Friday the 17th of January. The βETA ceremony is going to be a great experience: speeches will be given by the nominees, prizes will be handed out and there will be a celebratory drink afterwards. If you are a master student, you can also think about registering for next year, and maybe you will be the winner of the βETA 2014! Please enjoy reading this special edition of the MET, and I hope to see you at the ceremony,

Isabelle Indeweij Gerlings
Educational Officer

Special thanks to:

The master thesis coordinators of the Econometric Institute:
Prof. dr. Richard Paap - Econometrics
Dr. Wilco van den Heuvel - Operations Research and Quantitative Logistics
Dr. Erik Kole - Quantitative Finance
Prof. dr. Dennis Fok - Quantitative Marketing

The jury:
Prof. dr. Dick van Dijk
Dr. Christiaan Heij
Prof. dr. Albert Wagelmans
Mr. Joost van der Zon

Veneficus:
Mr. Robbert Bos
Mr. Joost van der Zon
There is growing attention for the funding ratios of pension funds. Blue Sky Group is known as a pension provider where the balance-sheet management of pension funds and the ALM studies are of a high standard. This is thanks to the econometricians of our Investments department and their close cooperation with the actuaries of the Pensions department. Blue Sky Group is the partner of choice for pension fund boards. We specialise in all aspects of supporting a board: pension administration, governance processes, fiduciary asset management and communication. Seeing, according to Blue Sky Group, means understanding what changes mean, anticipating the challenges of the future and already coming up with suitable solutions today. Do you want to know what it is like to work for Blue Sky Group? Then go to www.blueskygroup.nl.
Modelling individual and collective choice behaviour in social networks: An approach combining a nested conditional logit model with latent classes and an Agent Based Model
Susanne Koster
Erasmus University Rotterdam
By examining online social network and microblogging sites such as Twitter, we obtain an understanding of collective opinion formation and its diffusion processes. We use Twitter data on three Dutch talent shows: The Voice Kids, The Ultimate Dance Battle, and Holland's Got Talent. The network of Twitter followers and followees represents the social network; the choices of individuals to tweet about one of the candidates of the talent show represent the individual choice behaviour. First, we are interested in the factors that influence individual choice behaviour in a social network. Second, we are interested in how we may change the collective choice behaviour in the social network. We implement a nested conditional logit model with latent classes and an Agent Based Model to run social network simulations.
Introduction
The internet age has increasingly become a social network age. Social networks and microblogs are now an important tool for information dissemination, sharing, interpersonal communication and networking. The choices we make in our daily lives are increasingly influenced by these social networks. This opens a field to examine how human behaviour is influenced by the online world in general and online social networks in particular. By examining online social network and microblogging sites such as Twitter, we obtain an understanding of collective opinion formation and its diffusion processes. In addition, the amount, availability, and accessibility of social network data allow researchers to examine this field in its full complexity with ever increasing detail. This research addresses the notion of individual choice behaviour and the collective dynamics evolving from individual behaviour in social networks. We use Twitter data on three Dutch talent shows as case studies. The network of Twitter followers and followees represents the social network; the choices of the individuals to tweet about one of the candidates of the talent show represent the individual choice behaviour. We are interested in the factors that influence individual choice behaviour in a social network. We review literature on social networks (Watts and Dodds (2007), Katona et al. (2011), Cha et al. (2010), Weng et al. (2012)), on new product diffusion (Bass (1969), Langley et al. (2012), Goldenberg et al. (2010, 2001)) and on social influence (Cialdini (2001), Leibenstein (1950), McPherson et al. (2001), Rafaat et al. (2009)). We distinguish social network factors (the choice behaviour of others in the social network), cognitive factors (the individual's own choice behaviour), and external factors (influence of the television show). The first research question is: To what extent do social network, cognitive and external factors influence individual choice behaviour in a social network? Second, we are interested in how we may change the collective choice behaviour in the social network. For our case studies this means that we try to make another candidate in the talent show more popular in terms of the number of tweets. We vary external factors (i.e. changing the sequence and frequency of the performances of the candidates), and we alter the choice behaviour of influentials (i.e. making influentials with many followers tweet about a
certain candidate). We believe that managers may influence these variables. The second research question is: How may collective choice behaviour be influenced by varying external factors and by altering the choice behaviour of influentials?

Data
Twitter data is collected on three Dutch talent shows: The Voice Kids (February and March 2012), The Ultimate Dance Battle (March to May 2012), and Holland's Got Talent (March to June 2012)¹. In these television shows, the public votes for the best singing child, the best dancing group, and the best stage act, respectively. During the show, viewers are encouraged to tweet using the show's hashtag (i.e. #thevoicekids, #tudb, #hgt). We use the Twitter API to stream messages about the program using this hashtag. Table 1 shows the descriptives of the three case studies.

Table 1: Case study descriptives

Figure 1: Collective choice behaviour: the number of tweets per candidate (a) and the cumulative number of tweets per candidate (b) for the talent show The Voice Kids
Figure 1 shows the number of tweets per candidate (a) and the cumulative number of tweets per candidate during the final show of The Voice Kids (b). Below the graph, the construct Product Display is shown, indicating when a candidate performs. The black lines in the graphs indicate the time at which the voting lines of the talent show closed and the time at which the voting results were known. The Twitter dataset consists of a list of tweets for which the content, the time stamp, and the sender are known. The followers of the sender are also known. This makes it possible to reconstruct the social network and to see in which time period individuals receive a tweet and when they send a tweet themselves. We are also able to see the content of the tweets. Based on the available data, we operationalize the constructs, as shown in Table 2. The Product Display construct uses time periods of one minute, as the performances of the candidates are three to six minutes long. The one-minute time period causes a lot of "zeros" in the dataset, i.e. many observations concern individuals that do not tweet at time t. For instance, for The Voice Kids case study this means that we would have 20,822 individuals times 225 minutes, which equals 4.68 million observations, of which less than 0.5% contain a tweet observation. An efficient data-analysis strategy is to sample disproportionally more from the smaller "Tweet" group than from the larger "NoTweet" group in order to gain computational efficiency. In the Method section, we explain how we correct the model parameters for this selection bias. For the datasets of the three case studies, 99.5% of the observations in the "NoTweet" group are randomly left out. This leaves 37,831 observations for The Voice Kids case, 21,216 observations for The Ultimate Dance Battle case, and 86,668 observations for the Holland's Got Talent case.

Table 2: Operationalization of the constructs

Method
We implement a nested conditional logit model with latent classes. This model results in estimated parameters that indicate the relationship between individual choice behaviour and its determinants. The parameters are latent class specific to allow for unobserved heterogeneity among individuals. The Agent Based Model uses the estimated parameters and simulates interacting heterogeneous agents in a social network. The nested conditional logit model naturally clusters interrelated choices into nests. We distinguish the nest of sending a tweet about one of the candidates or sending a general tweet, and the nest of sending no tweet. The probability to tweet about one of the candidates is defined as:
(1)
(for all candidates and the General Tweet
option) where, • is a vector containing all standardized explanatory variables described in Table 2. • is the individual and candidate specific intercept parameter. • is a vector of coefficients. • , means that user tweets about candidate at time . If , a user sends a general tweet at time . • . When , user tweets about the talent show on time . When , user does not tweet about the talent show on time . The probability to send a tweet and to send no tweet is defined as:
(2)
where,
• is the inclusive value.
• is a vector containing all standardized explanatory variables. The variables are the logarithm of the number of friends and the logarithm of the number of followers. It is assumed that an individual is more active on Twitter when this person has many friends and followers.
• is a vector of coefficients.
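The nested structure above can be made concrete with a small numerical sketch in R. The names below (utilities V_tweet for the candidate and general-tweet options, a nest-level utility W_send, and an inclusive-value coefficient lambda) are illustrative assumptions rather than the thesis' notation, which is not reproduced here; the snippet only shows how choice probabilities in a two-nest conditional logit of this kind are computed.

```r
# Illustrative two-nest conditional logit, mirroring the structure described above.
nested_logit_probs <- function(V_tweet, W_send, lambda) {
  # V_tweet: systematic utilities of the tweet options (one per candidate + the general tweet)
  # W_send : systematic utility of sending any tweet (e.g. based on log #friends, log #followers)
  # lambda : coefficient on the inclusive value of the tweet nest
  iv <- log(sum(exp(V_tweet)))                  # inclusive value of the tweet nest
  p_send <- exp(W_send + lambda * iv) /
            (1 + exp(W_send + lambda * iv))     # P(send a tweet); no-tweet utility normalised to 0
  p_within <- exp(V_tweet) / sum(exp(V_tweet))  # P(option | a tweet is sent)
  list(p_no_tweet = 1 - p_send,
       p_options  = p_send * p_within)          # unconditional probabilities of each tweet option
}

# Example: three candidates plus a general-tweet option
probs <- nested_logit_probs(V_tweet = c(0.4, -0.1, 0.2, 0.0), W_send = -3, lambda = 0.8)
probs$p_no_tweet + sum(probs$p_options)         # probabilities sum to one
```

In the estimated model these utilities are linear in the standardized explanatory variables of Table 2 and carry latent-class-specific coefficients.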
Latent classes
We implement latent classes to limit the number of parameters while capturing heterogeneity. We assume that there are latent classes in the population. The overall probability that an individual belongs to class equals with . The individual-specific intercepts will be replaced by a latent class specific intercept , the parameters by , by , and by . Let be the set of parameters for class . The likelihood contribution of individual belonging to latent class is given by (adapted from Franses and Paap (2010), pp. 108):
(3)
Write and , the vector of the parameters for all latent classes combined. Now, the likelihood function of the nested conditional logit model is given by:
(4)
We use the EM algorithm to obtain parameter estimates. This algorithm is an iterative parameter estimation routine, which provides a (local) maximum of the log-likelihood function using a two-step procedure: the E (Expectation) step and the M (Maximization) step. One iterates over the two steps until convergence is achieved. The EM algorithm is computationally attractive, and convergence is assured (Wedel and DeSarbo, 1995). The EM algorithm as suggested by Wedel and DeSarbo (1995) is programmed in R and consists of the following steps:
• Step 1: Initialization: The parameters are initialized to homogeneous starting values that result from the nested conditional logit model without latent classes. We perturb the starting values by adding a small random vector in order to be able to execute the E-step.
• Step 2: The E-step: In the E-step, we estimate the expected value of , given the current estimates of and . Using Bayes' rule, we compute:
(5)
(6)
• Step 3: The M-step: The EM algorithm makes use of a complete data log-likelihood function, which assigns each individual to a class s. In the M-step, the following function is maximized with respect to the parameters and :
(7)
This maximization function is solved numerically, using the Nelder-Mead optimisation algorithm.
• Step 4: Repeat steps 2 and 3: The full EM algorithm is iterative. Steps 2 and 3 are repeated until the value of the maximization function in formula 7 converges. If the difference between the maximization value of the current iteration and that of the previous iteration is less than the tolerance, the maximization step is stopped.
We use the likelihood function of formula 4 to compute standard errors of the parameters and , using the Richardson method to compute the Hessian matrix. The standard errors equal the diagonal of the expected value of the inverse Hessian (Heij et al., pp. 228). The number of latent classes is determined by minimizing the Bayesian Information Criterion (BIC).
To correct for selective sampling, we define a weight for "NoTweet" observations based on the sampled proportion of "NoTweet" observations out of the total number of "NoTweet" observations, and a weight for "Tweet" observations based on the sampled proportion of "Tweet" observations out of the total number of "Tweet" observations. Scott and Wild (1997) show that by Bayes' theorem the following holds:
(8)
In our case studies, we sample 0.5% of the "NoTweet" observations and 100% of the "Tweet" observations.
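For intuition, the two EM steps can be sketched in R. The function below is a generic latent-class EM skeleton written under stated assumptions: loglik_i stands in for the per-individual log-likelihood of the nested conditional logit (formulas 1-4), theta_init for the homogeneous starting values, and S for the number of classes; none of these names come from the thesis, and the selective-sampling weights, log-sum-exp safeguards and multiple restarts used in practice are omitted.

```r
# Generic latent-class EM skeleton (intuition only; the component likelihood is a placeholder).
em_latent_class <- function(loglik_i, theta_init, S, max_iter = 200, tol = 1e-6) {
  theta <- lapply(1:S, function(s) theta_init + rnorm(length(theta_init), sd = 0.01))  # perturbed start
  pi_s  <- rep(1 / S, S)                                   # class membership probabilities
  obj_old <- -Inf
  for (iter in 1:max_iter) {
    ll <- sapply(1:S, function(s) loglik_i(theta[[s]]))    # N x S matrix of log-likelihood contributions
    # E-step: posterior class probabilities via Bayes' rule
    num <- sweep(exp(ll), 2, pi_s, "*")
    z   <- num / rowSums(num)
    # M-step: update class shares and maximise the weighted log-likelihood per class (Nelder-Mead)
    pi_s <- colMeans(z)
    for (s in 1:S) {
      theta[[s]] <- optim(theta[[s]],
                          function(th) -sum(z[, s] * loglik_i(th)),
                          method = "Nelder-Mead")$par
    }
    obj <- sum(z * sweep(ll, 2, log(pi_s), "+"))           # expected complete-data log-likelihood
    if (abs(obj - obj_old) < tol) break                    # stop when the objective has converged
    obj_old <- obj
  }
  list(theta = theta, pi = pi_s, posterior = z)
}
```

In practice the loop is restarted from several perturbed starting points to guard against local optima, and the number of classes S is chosen by comparing BIC values, as described in the text.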
Table 3: Parameter estimates of the three case studies; n.s. indicates coefficients that are not significant

Figure 2: Agent Based Model simulation results: the number of tweets per candidate (a) and the cumulative number of tweets per candidate (b) for the talent show The Voice Kids, with a cooling-down period of 6 minutes
To build a rigorous Agent Based Model, we use the guidelines for rigor of Rand and Rust (2011). The agents follow the decision rules specified by the nested conditional logit model with latent classes. The estimated parameters are transformed into parameters that match the unstandardized dependent variables. The model is programmed in Repast.

Results
The minimum BIC value for The Voice Kids case study is obtained when 4 latent classes are implemented (McFadden R2 is 0.247). For The Ultimate Dance Battle case, 3 is the optimal number of latent classes (McFadden R2 is 0.263). For the Holland's Got Talent case, 2 is the optimal number of latent
classes (McFadden R2 is 0.394). The McFadden R2 is larger than 0.2 for all cases. Values of 0.2 to 0.4 are considered highly satisfactory (McFadden, 1974, pp. 121), and we therefore consider the fit of the three models highly satisfactory. Table 3 shows the parameter estimates of the three case studies. We find support that social network and cognitive factors positively influence individual choice behaviour in a social network. The Global Social Network effect, i.e. the effect of the public opinion of the complete social network, explains most of the variance in individual choice behaviour in The Ultimate Dance Battle and the Holland's Got Talent cases, and the second most in The Voice Kids case. The cognitive factor Consistency, i.e. the ten-
dency to be consistent with previous choices, explains most of the variance in The Voice Kids case. The Local Social Network effect, i.e. the effect of the choice behaviour of direct connections, explains a smaller, but significant part of the variance in individual choice behaviour. For the influence of external factors, i.e. the moments at which the product is displayed to the individuals, we find mixed results. For the Holland's Got Talent case study, we find full support for a positive and significant effect, and for the other two case studies we find partial support.
During the simulations we notice the occurrence of so-called cascades, i.e. the number of tweets for one or more of the tweet options rapidly increases over time to a total that is approximately 50-100 times larger than the average level of the total number of tweets. In The Voice Kids case, it appeared that after approximately 150 time periods approximately 50 agents start tweeting about one of the candidates and cannot stop tweeting about this same candidate. Note that it is not the case that every agent starts tweeting. The cascades may be the consequence of non-stationary components in the nested conditional logit model. We decide to build in a cooling-down period of 6 minutes for The Voice Kids case, which ensures that an agent is not able to send a tweet again immediately after his previous tweet.
Next, we compare the simulation output in terms of the simulated total number of tweets per choice option over time (Figure 2) to the empirical total number of tweets per choice option over time (Figure 1). In Figure 2(a) and (b) we see the number of tweets over time for The Voice Kids simulation with a cooling-down period of 6 minutes. In The Voice Kids case the television show starts at time step 100. The peaks in Figure 2(a) show the moments when one of the candidates performs. Figure 2(b) shows that the ranking of the candidates in terms of the number of tweets is the same as in Figure 1(b). Also, the total number of tweets per candidate closely approximates the empirical number of tweets.

Table 4: The effect of influence measures for the Agent Based Model for The Voice Kids case

We simulate the what-if scenarios for The Voice Kids case. Table 4 summarizes the findings of the what-if scenarios for the Agent Based Model for The Voice Kids case. The simulations show that increasing the number of performances of the candidate (Product Display) influences the ranking of the candidates. The other two influence measures have no effect.
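As an illustration of the cooling-down mechanism described above, the sketch below runs a toy agent loop in R. Only the 6-minute window comes from the text; the agent count, horizon and the constant tweet probability are made-up stand-ins, and the real simulation draws choices from the estimated nested logit model and runs in Repast.

```r
# Toy agent loop illustrating the cooling-down rule.
simulate_agents <- function(n_agents, n_steps, choice_prob, cooldown = 6) {
  last_tweet <- rep(-Inf, n_agents)         # time of each agent's most recent tweet
  tweets <- matrix(0L, n_steps, n_agents)   # 1 if agent i tweets at minute t
  for (t in 1:n_steps) {
    for (i in 1:n_agents) {
      if (t - last_tweet[i] <= cooldown) next        # still cooling down: tweeting not allowed
      if (runif(1) < choice_prob(i, t, tweets)) {    # tweet with the model-implied probability
        tweets[t, i] <- 1L
        last_tweet[i] <- t
      }
    }
  }
  tweets
}

out <- simulate_agents(n_agents = 50, n_steps = 225,
                       choice_prob = function(i, t, history) 0.01)  # flat 1% per minute
colSums(out)  # simulated tweets per agent over the show
```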
Discussion and managerial implications
The EM algorithm has the advantage of being computationally attractive, easy to program, and of having ensured convergence. A disadvantage of the algorithm is that its convergence rate is slow: the computational time per number of latent classes per case is between two and five days. Another disadvantage is that the algorithm is burdened with convergence to local optima, which we try to address by taking different starting values. The Agent Based Model proved to be useful in the assessment of what-if scenarios. We show that the simulation model is able to produce output similar to real-world output, based on empirical input parameters. Further research is needed to solve issues such as the limitation in the number of agents and the cascades due to agents who cannot stop sending tweets. The cascades may be the consequence of non-stationary components in the nested conditional logit model. Note that the simulated dependent variables are extrapolations of the empirical dependent variables, and may therefore take extreme values. A solution may be to model logarithms of the dependent variables, which may temper extreme values of the simulated dependent variables. The nested conditional logit model shows the underlying determinants of individual choice behaviour.
We apply the model to three talent shows. Managers may use the model for other applications where individuals choose among a fixed set of products or candidates, for instance candidates in other television shows or in political elections. The latent classes of the nested conditional logit model take unobserved heterogeneity into account. For instance, this research showed that some individuals are more likely than others to be influenced when a product is displayed to them. For marketing managers a model with latent classes is useful in order to classify consumers or voters based on their actual behaviour. Marketing managers may use this information for market segmentation and for the design of effective segment-specific marketing campaigns. We show how managers may use the parameters of the nested conditional logit model in the Agent Based Model. The simulation model has empirically determined input parameters, and therefore a high input validity. Managers are able to test scenarios in a risk-free way, and still have trust in the validity of the results.

References
Bass, F.M., 1969, A new product growth for model consumer durables, Management Science, 15, pp. 215-227.
Cameron, C., Trivedi, P.K., 2005, Microeconometrics: methods and applications, New York, Cambridge University Press.
Cha, M., Haddadi, H., Benevenuto, F., Gummadi, K.P., 2010, Measuring user influence in Twitter: The million follower fallacy, ICWSM '10: Proceedings of the International AAAI Conference on Weblogs and Social Media.
Cialdini, R.B., 2001, Influence: science and practice, Boston, Allyn & Bacon.
Franses, P.H., Paap, R., 2010, Quantitative models in marketing research, New York, Cambridge University Press.
Goldenberg, J., Libai, B., Muller, E., 2001, Talk of the network: a complex systems look at the underlying process of word-of-mouth, Marketing Letters, 12, 3, pp. 211-223.
Goldenberg, J., Libai, B., Muller, E., 2010, The chilling effects of network externalities, International Journal of Research in Marketing, 27, pp. 4-15.
Heij, C., de Boer, P., Franses, P.H., Kloek, T., van Dijk, H.K., 2004, Econometric methods with applications in business and economics, New York, Oxford University Press.
Katona, Z., Zubcsek, P., Sarvary, M., 2011, Network effects and personal influences: Diffusion of an online social network, Journal of Marketing Research, 48, 3, pp. 425-554.
Langley, D.J., Bijmolt, T.H.A., Ortt, J.R., Pals, N., 2012, Determinants of social contagion during new product adoption, Journal of Product Innovation Management, 29, 4, pp. 623-638.
Leibenstein, H., 1950, Bandwagon, snob, and Veblen effects in the theory of consumers' demand, Quarterly Journal of Economics, 64, pp. 183-207.
McFadden, D., 1979, Quantitative methods for analysing travel behaviour of individuals: some recent developments, in: Hensher, D.A. and Stopher, P.R. (eds), Behavioural travel modelling, London: Croom Helm, pp. 279-318.
McPherson, M., Smith-Lovin, L., Cook, J.M., 2001, Birds of a feather: Homophily in social networks, Annual Review of Sociology, 27, 1, pp. 415-444.
Rafaat, R.M., Chater, N., Frith, C., 2009, Herding in humans, Trends in Cognitive Sciences, 13, 10, pp. 420-428.
Rand, W., Rust, R.T., 2011, Agent-based modelling in marketing: guidelines for rigor, International Journal of Research in Marketing, 28, 3, pp. 181-193.
Scott, A.J., Wild, C.J., 1997, Fitting regression models to case-control data by maximum likelihood, Biometrika, 84, 1, pp. 57-71.
Watts, D.J., Dodds, P.S., 2007, Influentials, networks, and public opinion formation, Journal of Consumer Research, 34, December 2007, pp. 441-458.
Wedel, M., DeSarbo, W.S., 1995, A mixture likelihood approach for generalized linear models, Journal of Classification, 12, pp. 21-55.
Weng, L., Flammini, A., Menczer, F., 2012, Competition among memes in a world with limited attention, Scientific Reports, 2, 335, DOI: 10.1038/srep00335.

Notes
[1] Data were collected during the master thesis project at TNO.
Portfolio Rebalancing with Transaction Costs
Pauline Vermeer
Erasmus University Rotterdam
Portfolio rebalancing is attracting more attention nowadays. Many investors found themselves substantially underweighted in equities relative to their strategic asset allocation during the financial crisis, since equity markets crashed. As a result, investors had to revise their portfolios in order to bring the portfolios' asset allocations back in line with their aims and preferences. This is called portfolio rebalancing. Over time, many rebalancing strategies have been developed. The main benefit of rebalancing is the risk control relative to the target asset allocation. Therefore, recent literature derives a rebalancing strategy that minimizes the tracking error at the lowest possible costs; this is called the efficient rebalancing strategy. However, it remains unexplored how this efficient rebalancing strategy performs against a wide range of heuristic alternatives. In this thesis, we review the efficient strategy and evaluate the traditional rebalancing strategies under some best practices. We find that rebalancing to the edge of the no-trade region when the allocation deviates 3% from its target, monitoring every week, approaches the effectiveness of the efficient rebalancing strategy. The results are robust across different data sources, a Monte Carlo simulation and a block bootstrap simulation method.
Introduction
Over time, any asset portfolio is going to drift from its initial target allocations as some asset classes outperform others. As a result, investors have to revise their portfolios in order to bring the portfolios' asset allocations back in line with their aims and preferences. This is called portfolio rebalancing. A rebalancing strategy addresses the risk that the risk-and-return characteristics are inconsistent with the investor's goals and preferences. It controls for risk relative to the target asset allocation, i.e. tracking error. Many rebalancing strategies have been developed over time. In general, many institutions rebalance using fixed-band approaches such as the calendar-based strategy (e.g. rebalance each year, quarter, month or week) or the tolerance-range strategy (rebalance when asset weights deviate a certain percentage from target ratios). Any rebalancing strategy must attempt to balance the competing desires of keeping both transaction costs and tracking error low. Therefore, recent literature derives a rebalancing strategy that minimizes the tracking error at the lowest possible costs. This is called the efficient rebalancing strategy. The performance of the efficient rebalancing strategy against naïve alternatives is only investigated by Donohue and Yip (2003), who compare the efficient rebalancing strategy with heuristic approaches that rebalance back to the target allocations. However, it remains unexplored how this efficient rebalancing strategy performs against the best possible fixed-band strategy, by extracting the key features associated with rebalancing. We therefore compare the efficient rebalancing strategy with a wide range of fixed-band rebalancing strategies. There can be considerable variation between the fixed-band rebalancing strategies. In order to find the best possible fixed-band rebalancing strategy, we shed light on several rebalancing matters. We first investigate whether rebalancing should be done on a scheduled basis or if a no-trade region needs to be defined around the target asset allocations, where trading does not occur when the current asset allocations lie in the no-trade region. Second, when a no-trade region is built around the target weights, the portfolio can be monitored at different time intervals. This comes down to rebalancing on a scheduled basis only if the portfolio's asset allocations have drifted beyond the threshold. Afterwards, we explore how far back to the target allocations one should rebalance the portfolio. The
allocations can either be rebalanced back to the target ratio, to the bandwidth or to the midpoint between target and edge. Once we have evaluated the traditional rebalancing strategies under some best practices, we review the efficient rebalancing strategy. We specifically address two main questions in this research: ‘Is there a best possible fixedband rebalancing strategy, and if there is, does it approach the effectiveness of the efficient rebalancing strategy?’ Data The portfolio that is investigated in this research consists of 45% stocks, 40% bonds and 15% cash.1 Ten years of weekly returns are generated and 10,000 iterations are used. A starting wealth of EUR 100 is assumed. In this report we only describe the Monte Carlo simulation results. However, the full report shows that our results are robust for different data sources. All rebalancing strategies are tested using the same set of returns over the same horizon at each iteration. General Methodology for Rebalancing A wide range of rebalancing strategies are compared in this thesis. This section explains the general methodology for all rebalancing strategies. Let the vector denote the return of the financial assets from time till , where , with (equities, bonds and cash). Dividend payments are reinvested in equities and interest payments are reinvested in bonds. Then, the gross return, , of asset is calculated by: (1) where is the change in the asset price of asset at time to time . Let be the weight at period of asset and let the weights start with the initial target weights at time . Based upon the initial target weights , and the gross return we can decide on the weights in period . Two restrictions on the weights are imposed: 1) weights have to sum up to one, and 2) short positions or borrow positions are not allowed, . Then, the new weights due to the price movements are calculated by:
(2)
The cumulative portfolio value is the sum of the asset values:
(3)
The values of the initial target weights, , are 45% stocks, 40% bonds and 15% cash. Rebalancing involves buying or selling assets in the portfolio to reach some desired weight. The new weight depends on the chosen rebalancing policy. Let us define the rebalancing policy by . The value of can be positive as well as negative, as you can buy and sell assets in your portfolio. Due to the price movement, the next period's weight with rebalancing becomes:
(4)
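To see how equations (2)-(4) fit together, here is a small accounting sketch in R. The function name and the example numbers are illustrative (not from the thesis), and the exact ordering and normalisation of the thesis' equations are not reproduced; the point is only that weights drift with gross returns and are then shifted by a rebalancing policy that keeps them summing to one with no short positions.

```r
# Weight accounting: drift with gross returns, then apply a rebalancing policy.
update_weights <- function(w, gross_return, policy = rep(0, length(w))) {
  v <- w * gross_return           # asset values after one period, per unit of starting wealth
  w_drift <- v / sum(v)           # drifted weights
  w_new <- w_drift + policy       # apply the rebalancing policy
  stopifnot(abs(sum(w_new) - 1) < 1e-8, all(w_new >= 0))  # weights sum to one, no short positions
  w_new
}

w0 <- c(stocks = 0.45, bonds = 0.40, cash = 0.15)
w1 <- update_weights(w0, gross_return = c(1.03, 1.00, 1.001))   # drift only (buy-and-hold step)
update_weights(w0, c(1.03, 1.00, 1.001), policy = w0 - w1)      # rebalance fully back to target
```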
Rebalancing should occur whenever its benefits outweigh its costs. Rebalancing is beneficial because it enables the actual portfolio to track more closely the returns of the target mix. Tracking error is an evaluation criterion that refers to the correspondence between the returns of an investor’s portfolio and the returns of the target portfolio. It is calculated as the standard deviation of the active returns, taking the correlations into account:
(5) where is a vector , is a vector and is the covariance matrix of the returns. This is in correspondence with the definition of tracking error defined by Leland (1999), Donohue and Yip (2003), Clark (1999), Pliska and Suzuki (2006) and Holden and Holden (2007). Costs are associated with adjusting asset weights. We start our analysis by considering proportional transaction costs: (6) where is the proportional cost of rebalancing asset : , and . Afterwards, we consider fixed costs: (7) where is an indicator function. It is equal to 1 if , i.e. the rebalancing policy is not equal to zero for asset at time , and it is equal to 0 otherwise. denotes the fixed costs and is equal to 0.5% for all assets. equals
zero if there is no rebalancing event at time . Methodology Fixed-band Rebalancing Strategies First of all, we calculate the buy-and-hold strategy, where the investor does not rebalance. This has the following implications for the rebalancing policy, : • The BH: for all assets at all time intervals , and is calculated by equation 2. A widely known rebalancing strategy is the calendar-based rebalancing strategy. This strategy rebalances the current asset allocations back at determined time intervals (e.g. Yearly, Quarterly, Monthly or Weekly). For the different calendar-based rebalancing strategies, this indicates the following for the rebalancing policy, : • CR: for all at all unless (yearly), (quarterly), (monthly) or the portfolio is continuously rebalanced (weekly), as weekly returns are used in this paper. Furthermore, we consider the % tolerance-band rebalancing strategies with an equal band size for all assets, with %, 3%, 5% and 10%. The tolerance-band rebalancing strategies are calculated as follows: • The TR1 : if keep such that for all assets . Otherwise rebalance and calculate the weight by equation 4. Methodology Efficient Rebalancing Strategy An efficient strategy should maintain a portfolio that tracks the target portfolio as closely as possible while minimizing the transaction costs. The decision to rebalance the portfolio should be based on the consideration of three costs: 1) the tracking error costs associated with any deviation in our portfolio from the actual strategic asset allocation, 2) the trading costs associated with buying or selling assets during rebalancing, and 3) the expected future cost depending on the actions taken. Clearly, future rebalancing decisions are dependent on the decisions made earlier. Rebalancing today reduces expected future transaction and tracking error costs but it increases current transaction costs. Rebalancing back to the edge of the portfolio gives lower trading costs today compared to rebalancing it all the way back to the target ratio. At each time an efficient decision has to be made, having in mind that future optimal decisions will relate to this decision. This idea is described by Bellman (1957). The efficient
rebalancing strategy chooses a rebalancing policy, such that these total costs are minimized: (8) where, is specified as: again, subject to the constraints: , and equation 4. where is a vector of current weights, , a vector of rebalancing policies, , the expected future cost from onwards given all future decisions, also known as the value function, and is a vector of the new weights . represents the risk aversion. Equation 8 is also known as the Bellman equation and it is the basis of the discretized portfolio rebalancing problem. There is a recursive relation between the value function at time , , and the expectation of the value function at time , .2 Results - Fixed-Band Rebalancing Strategies In this section we compare different fixed-band rebalancing strategies, assuming proportional transaction costs. First, we investigate the different rebalancing matters such as how frequently a portfolio should be monitored, how wide the no-trade region should be and whether rebalancing should bring an asset allocation back to its target or to some intermediate point. The impact of the rebalancing strategies is analysed on both relative and absolute performance. However, we find that the risk-adjusted returns are not meaningfully different between the different rebalancing strategies. Therefore, we will only discuss the relative risk in this paper.3 Monitoring Frequency and the No-Trade Region The best rebalancing strategy will achieve the desired degree of tracking error at the lowest possible costs. Figure 1 reports the average results of all 10,000 simulations for the buy-and-hold strategy (not rebalancing) along with several calendar-based and tolerance-band strategies that rebalance back to the target asset allocations. It gives a graphical representation of the annual tracking error versus the annual trading costs for the different strategies. All frontiers demonstrate a convex shape. The slope of the transaction costs/tracking error line shows a sharp decrease in annual transaction costs for lower levels of tracking error and vice versa. For wide bandwidths, where
the tracking error is large, a small increase in annual transaction costs can substantially reduce tracking error costs. On the other hand, for low bandwidths, where the tracking error is low, transaction costs can be reduced by incurring only modest increases in annual tracking error. This is observed when tightening bands from 10% to 5% versus tightening bands from 3% to 1%. This indicates that it might not be cost-effective to reduce tracking error or transaction costs to very low levels, since it requires substantial increases in the transaction costs or tracking error. Furthermore, we find that the tolerance-band strategies can achieve lower costs than the calendar-based strategies while maintaining the same level of tracking error. This is consistent with the research of Leland (1999) and Donohue and Yip (2003), who find that fixed-band rebalancing is more efficient than a periodic approach where all asset classes are rebalanced back to the target asset allocations at fixed calendar intervals. Trading decisions of calendar-based strategies are independent of market behaviour. As a result, even if the portfolio is nearly optimal, rebalancing might still occur. This gives inefficient results compared to tolerance-band rebalancing, as the costs of rebalancing are higher than the benefits achieved by rebalancing back to the target ratio. Figure 1b shows that the tolerance-band strategies also differ among the monitoring frequencies. A reduction in terms of tracking error and transaction costs can be achieved with respect to yearly monitoring when the portfolio is monitored at least every quarter. On the other hand, the differences between quarterly, monthly and weekly monitoring are not very large.

Figure 1: Tracking error versus proportional transaction costs for different rebalancing strategies: (a) calendar-based versus tolerance-band strategies that monitor on a monthly basis; (b) tolerance-band rebalancing strategies (TR) with different monitoring dates
Figure 2: Tracking error versus proportional transaction costs with monthly monitoring
Figure 3: Tracking error versus fixed and proportional costs, where c = [0.5%, 0.5%, 0%] and c0 = [0.5%]
Figure 4: Tracking error versus proportional transaction costs, c = [0.5%, 0.5%, 0%], for different strategies and the efficient rebalancing strategy
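To make the comparison concrete, the sketch below backtests one tolerance-band rule in R on simulated returns. The 45/40/15 target mix, the 3% band, the rebalance-to-edge rule and the 0.5% proportional cost are taken from the text; the return-generating parameters and all other details are illustrative assumptions, not the thesis' Monte Carlo set-up.

```r
# Backtest sketch of a 3% tolerance-band rule that rebalances to the edge of the band.
set.seed(1)
target <- c(stocks = 0.45, bonds = 0.40, cash = 0.15)
cost   <- c(0.005, 0.005, 0)                     # proportional transaction costs per asset
band   <- 0.03                                   # no-trade region of +/- 3 percentage points
n      <- 52 * 10                                # ten years of weekly returns
R      <- 1 + cbind(rnorm(n, 0.0015, 0.0200),    # illustrative weekly returns: stocks
                    rnorm(n, 0.0008, 0.0060),    # bonds
                    rnorm(n, 0.0004, 0.0005))    # cash

w <- target; value <- 100; total_cost <- 0; max_drift <- 0
for (t in 1:n) {
  v_assets <- value * w * R[t, ]                 # assets grow with their gross returns
  value    <- sum(v_assets)
  w        <- v_assets / value                   # drifted weights
  breach   <- abs(w - target) > band
  if (any(breach)) {                             # rebalance breaching assets to the band edge
    w_new         <- w
    w_new[breach] <- target[breach] + sign(w[breach] - target[breach]) * band
    w_new         <- w_new / sum(w_new)          # renormalise so weights sum to one
    fee           <- sum(abs(w_new - w) * cost) * value
    total_cost    <- total_cost + fee
    value         <- value - fee
    w             <- w_new
  }
  max_drift <- max(max_drift, abs(w - target))
}
c(end_value = value, total_cost = total_cost, max_drift = max_drift)
```

Repeating such a run over many simulated paths, for different bands, monitoring frequencies and rebalancing destinations, yields cost versus tracking-error points of the kind plotted in Figure 1.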
Rebalancing Destination Besides the different rebalancing bandwidths and frequencies, different rebalancing destinations are possible. We follow Arnott and Lovell (1993), Masters (2003) and Leland (1999) and consider three different rebalancing destinations in this section: trade to the target, trade to the edge and trade to the midpoint. Again, the equal tolerance bandwidths and the calendar-based rebalancing strategies are considered. Figure 2 shows that the tracking error is smaller for rebalancing to the midpoint than rebalancing to the bandwidth since the midpoint is closer situated to the target ratio. However, this is accompanied with higher trading costs. More assets need to be sold or bought in order to bring the asset weights back to the midpoint versus the bandwidth, as proportional transaction costs are used. The inefficiency of trading back to the target allocations follows from the fact that trading back to the target leads to larger upfront transaction costs than trading to the edge of the no-trade region, as there is a possibility that the mis-weights will be naturally corrected by the market movements.
We observe it is suboptimal to rebalance to the target in terms of relative risk control and transaction costs compared to rebalancing to the midpoint or bandwidth. Trading back to the target asset allocation is not preferred. The same tracking error could be achieved at lower transaction costs if the investor chooses to trade to the boundary or the midpoint instead of trading back to the target.
Fixed cost
So far, our results show that asset weights only have to be brought back to the boundary of the no-trade region when proportional (and not fixed) costs are used. However, when fixed costs are considered together with proportional costs, a different conclusion is obtained. A graphical representation of the annual tracking error and the annual trading costs is shown in Figure 3. We observe from Figure 3 that it is no longer optimal to adjust small changes in the weights, due to the fixed cost component. Rebalancing to the bandwidth becomes more expensive compared to rebalancing to the midpoint. Consequently, it becomes optimal to trade to an inner point in the no-trade region. This confirms the findings of Holden and Holden (2007) and Tokat and Wincas (2007).

Results - Efficient rebalancing strategy
Our results above focused on fixed-band rebalancing strategies. Rather than trading assets that reach fixed bands or rebalancing on a calendar-based approach, investors can minimize the tracking error and transaction costs at each period in time to optimize trades. This minimizes tracking error, transaction costs and expected future transaction costs. Donohue and Yip (2003) and Leland (1999) compare the efficient rebalancing strategy with heuristic strategies that readjust their asset allocations back to the target ratios. However, we concluded that fixed-band strategies that rebalance to the target are suboptimal compared to strategies that rebalance to the bandwidth of the no-trade region. Therefore, we compare the efficient strategy with the fixed-band strategies that rebalance back to the edge of the no-trade region. The transaction costs/tracking error efficient frontier is shown in Figure 4. Our results show that the benefits of the efficient rebalancing strategy diminish when fixed alternatives are used under a specified set of rules that enhance the performance of fixed-band strategies: 1) specify an equal no-trade region around the target asset allocations, 2) monitor the portfolio at least every quarter, and 3) rebalance back to the boundary or midpoint of the no-trade region, depending on the trading costs. Then, by using these key features associated with rebalancing, we find that the 3% tolerance-band strategy that rebalances back to the boundary approaches the effectiveness of efficient rebalancing.
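The single-period trade-off at the heart of the efficient strategy can be sketched as follows. This R snippet only minimizes a weighted sum of squared tracking error and proportional trading costs for one period; the thesis' efficient strategy additionally carries the expected-future-cost (value function) term of the Bellman equation, solved with the Markowitz and van Dijk (2003) approximation, and imposes no-short-sale constraints. Both are omitted here, and the covariance matrix, weights and risk aversion are made-up numbers, so this is only a myopic illustration.

```r
# Myopic cost trade-off: squared tracking error versus proportional transaction costs.
rebalance_myopic <- function(w, target, Sigma, cost, risk_aversion) {
  obj <- function(u_free) {
    u      <- c(u_free, -sum(u_free))              # trades sum to zero, so weights keep summing to one
    active <- (w + u) - target
    te2    <- drop(t(active) %*% Sigma %*% active) # squared tracking error
    risk_aversion * te2 + sum(abs(u) * cost)       # risk penalty plus proportional trading costs
  }
  u_free <- optim(rep(0, length(w) - 1), obj, method = "Nelder-Mead")$par
  c(u_free, -sum(u_free))                          # optimal trade per asset
}

Sigma <- matrix(c(4e-4, 1e-4, 0,
                  1e-4, 1e-4, 0,
                  0,    0,    1e-6), nrow = 3, byrow = TRUE)   # illustrative weekly covariance matrix
rebalance_myopic(w = c(0.50, 0.37, 0.13), target = c(0.45, 0.40, 0.15),
                 Sigma = Sigma, cost = c(0.005, 0.005, 0), risk_aversion = 50)
```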
Conclusion
In order to find a fixed-band rebalancing strategy that has the best averaged performance over all circumstances, we formulated the following research sub-questions: 1) Is there an appropriate no-trade region, and if so, what is it? 2) How frequently should the portfolio be monitored? 3) What is the desired rebalancing destination? The results in this paper indicate that setting a no-trade region around the target is optimal relative to periodic rebalancing, as the former can achieve lower costs while maintaining the same tracking error as the latter. It is important to be aware of conflicting factors when choosing the appropriate no-trade region. The threshold should not be set too low (0% or 1% band), as this results in a high turnover and high trading costs. On the other hand, the boundary should not be set too high (10% band or higher), to the extent that the portfolio fluctuations are never large enough to trigger rebalancing. Second, we find that the portfolio should be monitored at least on a quarterly basis, as a reduction in terms of tracking error and transaction costs can be achieved with respect to yearly monitoring when the portfolio is monitored at least every quarter. Third, our analysis demonstrates that when proportional transaction costs are used, it is optimal to rebalance to the boundary whenever the portfolio is outside the no-trade region. When the transaction costs have fixed elements as well as proportional elements, it becomes more beneficial to rebalance to an intermediate point in the no-trade region. A full rebalance back to the target asset allocation results in a utility loss and is therefore never preferred. Our findings show that by formulating the rebalancing problem as a multi-period optimization problem, both tracking error and transaction costs can be reduced. In addition, we find that the differences between the efficient rebalancing strategy and the fixed-band strategies are reduced when the fixed-band strategies satisfy
the above-stated conditions.

Bibliography
[1] Arnott, R. and Lovell, R. (1993). Rebalancing: Why? When? How often? The Journal of Investing, 2(1):5-10.
[2] Bellman, R. (1957). Dynamic Programming, volume I. Princeton University Press.
[3] Clark, T.A. (1999). Efficient portfolio rebalancing. Dimensional Fund Advisors Inc., pages 1-24.
[4] Donohue, C. and Yip, K. (2003). Optimal portfolio rebalancing with transaction costs. Journal of Portfolio Management, 29:49-63.
[5] Holden, H. and Holden, L. (2007). Optimal rebalancing of portfolios with transaction costs. Norwegian Computing Center, 2:1-34.
[6] Leland, H.E. (1999). Optimal portfolio management with transaction costs and capital gains taxes. Haas School of Business Technical Report, University of California, Berkeley, pages 1-54.
[7] Markowitz, H.M. and van Dijk, E.L. (2003). Single-period mean-variance analysis in a changing world (corrected). Financial Analysts Journal, 59(2):30-44.
[8] Masters, S. (2003). Rebalancing. The Journal of Portfolio Management, 29:52-57.
[9] Pliska, S.R. and Suzuki, K. (2006). Optimal tracking for asset allocation with fixed and proportional transaction costs. Quantitative Finance, 4(2):233-243.
[10] Tokat, Y. and Wincas, N. (2007). Portfolio rebalancing in theory and practice. Journal of Investing, 16(2):52-59.

Notes
[1] Only the results for the risk-neutral investor are shown in this paper. In the full master thesis 'Portfolio Rebalancing with Transaction Costs' you can find the results for different investor profiles.
[2] We use the Markowitz and van Dijk (2003) method to solve the dynamic programming problem.
[3] This paper is a shortened version of the master thesis 'Portfolio Rebalancing with Transaction Costs'. For more details on the absolute performance please consult the full report.
Modeling Credit Risk: An Asymmetric Jump-Diffusion Approach
Pascal L.J. Wissink
Erasmus University Rotterdam
This thesis proposes a new structural credit risk model to measure the asset dynamics of a company by allowing for asymmetric jumps in the asset value. The dynamics of the implied asset value are captured by a double exponential jump-diffusion (DEJD) process. Similar to the traditional Merton model, its closed-form expression allows for analytical tractability and algebraic derivation of risk-related measures. In an empirical assessment of the DEJD structural credit risk model on data of five Dutch banks, I find that the KMV implementation of the DEJD model outperforms the KMV-Merton model for four of the five banks in the sample according to the Bayesian information criterion. On average, 31% of the price movements that are usually ascribed to the volatility component of the Merton model are now captured by the jump component of the DEJD model. This leads to differences in risk assessment, in terms of the probability of default, of up to 37% for the banks in our sample.
Introduction Ever since the extension of the Black and Scholes (1973) formula by Merton (1974) to a framework that can be used to value firms, researchers have made attempts to develop, improve and apply structural credit risk models to capture the asset characteristics of a company. Asset characteristics can be used to assess the creditworthiness of a company, and therefore provide a powerful instrument to companies and financial institutions. Although most literature on structural credit risk models focuses on modeling only the mean and volatility of asset returns, companies may also be subject to sudden drops (jumps) in their asset value, for instance, as a result of an unexpected default of a loan or due to a bank run. This thesis proposes a new structural credit risk model that allows for capturing asymmetric jumps in the asset value dynamics of a company. It relies on the double exponential jump-diffusion (DEJD) framework proposed by Kou (2002) to value financial options. The asymmetry feature of the model enables the model to capture the skewness, kurtosis and other asymmetric tail effects that are often observed in return series. (Also known as the stylized facts of return series; see Cont (2001).) Secondly, unlike most other jump-diffusion models that deal with credit risk, the model retains a closed-form expression. This allows for analytical tractability and hands-on algebraic extraction of risk-related measures, such as the probability of default (PD) and loss given default (LGD). To my knowledge, the development of a combination of all of these features into a single structural credit risk model has not been done before. The primary goal for its use to extract asset price dynamics from equity prices alone leads to a framework that requires the same input data as the Merton model. Combined with a similar analytical structure, the DEJD structural credit risk model that I propose is sufficiently close to the Merton model to allow for a one-to-one comparison of their results. The aim of this thesis is to provide a competitor model to the original Merton model that allows for a greater flexibility in the shape of the underlying distribution function, but retains a similar analytical structure and numerical implementation. I show that jumps are a paramount feature while assessing the credit risk of companies, as 31% of the shocks that are usually ascribed to the volatility component are now being captured by the jump compo-
nent. This has large implications for the probability of default and loss given default of these companies. The results of the model are therefore relevant to investors and regulatory agencies for assessing the current state of banks and companies. Because the model lends itself for submission to stress tests due to the asymmetric jump component, it also provides a tool to assess the impact of sudden changes in the economic environment. The remainder of this paper is organized as follows. Section 2 introduces the DEJD framework and shows how it can be used to model credit risk. Section 3 describes the data that is used for the empirical assessment of the DEJD credit risk model. Section 4 describes the methodology and implementation of the model, followed by a discussion of the results in section 5. Section 6 concludes. The DEJD credit risk model Let denote a twice differentiable random variable which dynamics can be described by a double exponential jumpdiffusion process. Then the dynamics of at time are given by the stochastic differential equation (1) The first two terms of the right hand side of the equation, , denote a geometric Brownian motion with drift term , infinitesimal volatility and standard Brownian motion . The latter part of the equation, , defines the jump component. Here, is a Poisson process with rate and , are independent and identically distributed nonnegative random variables. The density of the jump is implied by , which is given by the double exponential distribution): , (2) The parameters and , , represent the probabilities of an upward and downward jump, respectively, and the values and represent their associated average upward and downward jump intensities. We can use the stochastic process in (1) to describe the dynamics of equity prices. Let denote the equity value of a firm, the asset value of the firm at time 0 and some predefined and fixed default barrier. Now assume that the asset value follows DEJD dynamics given by:
(3)

and suppose that the equity value E_t is a twice differentiable function of the assets with dynamics that can be described by

(4)

Then, under the same assumptions as the Merton model (see Merton (1974)), the value of equity can be modeled as a European-style DEJD call option written on the assets of the firm with a strike price equal to the default barrier D. Using this observation I derive the following result by expanding the findings of Kou (2002) and Wang (2004) on pricing financial derivatives:

(5)

Here, r denotes the risk-free rate and \Upsilon denotes the probability that the asset value exceeds the term between parentheses at maturity T. The mathematical set-up of \Upsilon is rather involved and is omitted here for the sake of brevity (see Kou (2002) for details). Notice that (5) bears a large resemblance to the Merton model, with the normal cumulative distribution function replaced by its DEJD counterpart \Upsilon. As such, given the definition of \Upsilon above, most of the terms attain the same interpretation as in the Merton model. In particular, it can be shown that the probability of the asset value ending up below the default barrier D at maturity T, or probability of default (PD) for short, is given by

(6)

Similarly, the loss given default (LGD) is given by

(7)

These expressions allow us to derive risk-related measures
straight from the results on the parameters of the DEJD credit risk model in (5).

Data
Daily stock prices on five Dutch banks listed on the stock exchange, ING Group (ING), SNS Reaal (SNS), Van Lanschot (LAN), Kas Bank (KAS) and BinckBank (BIN), are obtained from Thomson Reuters Datastream. The sample ranges from December 31, 2005 to December 31, 2011 (1566 observations) for ING, LAN, KAS and BIN. SNS went public on May 18, 2006, so its sample ranges only from May 18, 2006 to December 31, 2011 (1467 observations). The corresponding numbers of outstanding shares for these banks over the same sample periods are obtained from Datastream as well. Daily market capitalization (i.e. the market value representation of equity) is calculated by multiplying the stock price by the number of outstanding shares at each point in time for every bank. Daily log returns on market capitalization are calculated as r_t = \log(E_t / E_{t-1}). The returns are essentially identical to returns on ordinary stock, after correcting for days on which new shares are issued, stocks are repurchased or stocks are split. I also correct the market equity values of ING and SNS for other, non-banking related activities of their parent groups using the bank-to-group ratio of shareholders' equity. To avoid confusion, in the remainder of this paper I use the abbreviations ING and SNS to refer to the banking segments of the groups rather than to the parent groups themselves. As a proxy for the risk-free rate r, data on German treasury bonds with a 5-year time to maturity are obtained from Datastream at a daily frequency for December 31, 2005 to December 31, 2011. Additionally, as a common approach to approximate the default barrier D, I use Compustat to obtain figures on the book value of debt for all banks for the period 2005 to 2011. These figures are manually complemented with data from annual reports.

Methodology
The initial parameter values for the equity process in (4) are obtained by applying the maximum likelihood estimation procedure devised by Ramezani and Zeng (2007). As I find that some of the methods suggested by Ramezani
and Zeng (2007) lack the robustness to compute some of the integrals that are incorporated in the maximum likelihood function, I employ a set of computational enhancements. (Details available on request.) Estimating the parameters of the DEJD process is nonetheless computationally intensive and takes several hours to complete. After obtaining the equity parameters, the next step is to link the equity parameters to the asset parameters using the DEJD credit risk model in (5). The approach that I pursue for this purpose resembles an implementation of the Merton model that is known as the KMV approach (Dwyer et al. (2004)). The KMV implementation of the Merton model is arguably the most widespread adaptation of the Merton model in common practice and therefore also serves as a fair competitor to benchmark against. Although the actual KMV-Merton model is a proprietary invention by Moody's and its exact implementation details are undisclosed, an attempt to replicate its results has been made by Bharath and Shumway (2008). To account for the extra parameters in the DEJD model, I adapt the KMV implementation and propose the following algorithm.
1. Apply the maximum likelihood estimation procedure described by Ramezani and Zeng (2007) to obtain estimates for the equity parameters. Set the initial candidate values for the asset parameters based on these equity estimates and on the observed numbers of upward and downward jumps.
2. Plug the equity series and the asset parameters obtained in the previous step into the DEJD credit risk model (5) in order to calculate the implied asset prices for every day of the sample period preceding the observation of interest.
3. Use the asset values from step 2 to calculate the implied (log) return on assets. Apply maximum likelihood to the return series to obtain new estimates of the volatility σ, the upward and downward jump rates, and the jump magnitude parameters η1 and η2.
4. Repeat steps 2 and 3 until the difference between the old and the new values has fallen below a small fixed tolerance for every parameter.
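To make the iterative structure of steps 2-4 concrete, the following minimal sketch implements such a loop in Python. Because the DEJD pricing relation (5) and its likelihood are not reproduced here, the sketch uses the Merton/Black-Scholes call-price relation as a stand-in for (5) and re-estimates only drift and volatility; all function names, starting values and data are illustrative, not the thesis implementation.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def merton_equity(A, D, r, sigma_A, T):
    """Equity as a European call on assets; a Merton stand-in for the DEJD relation (5)."""
    d1 = (np.log(A / D) + (r + 0.5 * sigma_A**2) * T) / (sigma_A * np.sqrt(T))
    d2 = d1 - sigma_A * np.sqrt(T)
    return A * norm.cdf(d1) - D * np.exp(-r * T) * norm.cdf(d2)

def implied_assets(E, D, r, sigma_A, T):
    """Step 2: invert the pricing equation to back out daily asset values from equity."""
    return np.array([brentq(lambda A: merton_equity(A, D, r, sigma_A, T) - e, e, e + 2 * D)
                     for e in E])

def calibrate(E, D, r, T, tol=1e-4, max_iter=200):
    """Steps 2-4: alternate between implied asset values and re-estimated parameters."""
    sigma_E = np.std(np.diff(np.log(E)), ddof=1) * np.sqrt(250)
    sigma_A = sigma_E * E[-1] / (E[-1] + D)              # step 1: crude initial value
    for _ in range(max_iter):
        A = implied_assets(E, D, r, sigma_A, T)          # step 2
        ret = np.diff(np.log(A))                         # step 3: implied asset returns
        sigma_new = np.std(ret, ddof=1) * np.sqrt(250)
        if abs(sigma_new - sigma_A) < tol:               # step 4: convergence check
            sigma_A = sigma_new
            break
        sigma_A = sigma_new
    mu_A = 250 * ret.mean() + 0.5 * sigma_A**2
    return A, mu_A, sigma_A

# Illustrative run on simulated equity values (EUR mio.); not actual bank data.
rng = np.random.default_rng(0)
E = 4000.0 * np.exp(np.cumsum(rng.normal(0.0, 0.02, 250)))
A, mu_A, sigma_A = calibrate(E, D=60_000.0, r=0.02, T=0.5)
print(f"implied A_T = {A[-1]:.0f}, mu = {mu_A:.4f}, sigma = {sigma_A:.4f}")
```

Extending the loop to the full DEJD parameter set would replace `merton_equity` by the pricing formula (5) and the volatility update by the Ramezani-Zeng maximum likelihood step.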
Table 1: Parameter estimates for the return on assets
Notes: This table reports the asset parameters that result from applying the KMV-Merton model (GBM) and the KMV implementation of the DEJD credit risk model (DEJD) after values for the equity parameters have been obtained for the sample period 2006–2011. Corresponding standard errors are presented in parentheses. The statistics reported in this table relate to the annualized results of applying the models with a time to maturity of T = 0.5 years. Boldface typesetting indicates significance at the 5% level. The following notation is used to describe the asset parameters: A: asset value in EUR mio.; μ: mean; σ: volatility; λ1: number of upward jumps; λ2: number of downward jumps; η1: inverse average upward jump magnitude; η2: inverse average downward jump magnitude.

Results
I apply the algorithm to extract the market value of assets and the associated return dynamics of all five banks for the period 2006–2011, using a time to maturity of T = 0.5 years. The results enable us to calculate the PD and LGD for the next six months as of the end of 2011 using equations (6) and (7), assuming that the asset dynamics remain unchanged. The performance is benchmarked against the KMV implementation of the Merton model, as outlined by Bharath and Shumway (2008). Table 1 reports the results from applying maximum likelihood estimation to the asset parameters of the GBM underlying the Merton model and of the DEJD process underlying the DEJD credit risk model. The accompanying standard errors for the DEJD credit risk model are computed using the outer product method (Hamilton (1994)). The parameter estimates for both the Merton model and the DEJD model are highly significant, with a few exceptions for the GBM mean of BIN and the DEJD jump rate parameters λ1 and λ2 of SNS. On average, Table 1 shows that 31% of the price movements that are ascribed to the volatility component of the Merton model are captured by the jump component of the DEJD model. It is key to understand the logic behind the set-up of credit
risk models to interpret the results in Table 1. Both the GBM and the DEJD credit risk model rely on the accounting identity that the asset value equals the sum of equity and debt. Consequently, for a fixed debt level, the higher the leverage of a company, the less impact a change in the equity return has on the return on assets. Therefore, when the leverage of a company is high, a series of large shocks in equity will bring about only a series of weak shocks in assets. (This can also be observed from the relation for asset volatility in the Merton model.) In practice, however, it is often exactly these shocks in equity that cause a company to 'puncture' its default barrier when it is already close to it (i.e. when equity is low or leverage is high). As a result of this property, the volatility and jump rate estimates of the most highly leveraged banks in our sample (SNS, ING) are relatively low. The jumps that nonetheless do occur for ING are of average magnitude relative to the other banks in the sample, whereas the jumps of SNS are relatively large. Combined with the high leverage of SNS, this causes the implied asset value resulting from the KMV implementation to be much lower if we account for jumps (79,081) than if we do not account for
jumps (79,207). The implied asset values for GBM and DEJD for most of the other banks in Table 1 are much closer to each other, as the expected values of the returns for GBM and DEJD (not reported here) are also much closer to each other in these cases.

Table 2: Asset statistics 2011: KMV results
Notes: This table reports the BIC, PD and LGD at the end of 2011 corresponding to a maturity of T = 0.5 years. The figures are based on the asset parameter estimates reported in Table 1. BIC(A): Bayesian Information Criterion of the model fit to asset returns; BIC(E): Bayesian Information Criterion of the model fit to equity returns; PD: probability of default in %; LGD: loss given default in EUR mio. (corrected for the default-free value of debt); EL: expected losses in EUR mio.

Table 2 shows the results for the fit of both models as expressed by the Bayesian Information Criterion (BIC), and the values for the risk-related measures PD, LGD and EL resulting from the asset parameter estimates in Table 1. Observe that accounting for jumps does not necessarily imply a better fit, since the estimation procedure is also more sensitive to outliers. This is particularly exhibited by the BIC(A) values for SNS (-18,602 and -18,603 for GBM and DEJD, respectively). Also bear in mind that the BIC(A) values reported here relate to the fit of a distribution to the implied asset returns. As I showed earlier, these implied returns may differ depending on the model (GBM or DEJD) that is used. Hence, the BIC(A) values may relate to the fit of the models to different data series, which makes a one-to-one comparison of the BIC(A) values complicated. Therefore, I also report the BIC(E) values from the parameter estimation of the equity returns. Because the equity returns do not suffer from this imperfection, and because we essentially regard the return on assets as a scaled version of the return on equity, the BIC(E) values provide additional support in determining the best fit. From these results, we conclude that under either criterion the DEJD model provides a better fit for at least four out of five banks (ING, LAN, KAS and BIN). In the remaining case (SNS) the results show no clear preference for either model, that is, if we take the worst performing measure BIC(A) instead of BIC(E)
as the leading point of reference. The impact of accounting for jumps on the risk-related measure PD varies from a difference of 0.004% for BIN up to 36.536% for SNS. The large differences in PD for ING (4.872%) and SNS (36.536%) are mainly explained by their lower estimates of the implied asset value at the end of 2011 (see Table 1). The accompanying differences in loss given default (LGD) portray a somewhat different view, where the estimates of the LGD are EUR 3 mio. lower for DEJD than for GBM in the case of BIN, up to EUR 3,832 mio. for ING. The differences in LGD can be explained by the higher values of the drift term μ for DEJD. Although the chance of leaping over the default barrier is higher as a result of the jump component, the higher overall estimates of μ make a gradual recovery from a downward jump easier, resulting in higher LGD estimates. Nonetheless, the resulting figures for the expected losses (EL) are still higher for the DEJD process than for the GBM, ranging from EUR 0 mio. for LAN and BIN to EUR 125 mio. for SNS.
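Since the closed-form expressions (6) and (7) are not reproduced above, the following hedged sketch shows how PD, LGD and EL can instead be approximated by simulating the DEJD asset dynamics in (1)-(2) directly, and how the jump component can be stressed as suggested later in the paper. All parameter values are made up for illustration and are not the estimates of Table 1; the LGD here is simply the expected shortfall below the barrier conditional on default.

```python
import numpy as np

def simulate_dejd_terminal(A0, mu, sigma, lam, p, eta1, eta2, T, n_paths, rng):
    """Terminal asset values under double exponential jump-diffusion dynamics."""
    # Diffusion part of log A_T.
    logA = (np.log(A0) + (mu - 0.5 * sigma**2) * T
            + sigma * np.sqrt(T) * rng.standard_normal(n_paths))
    # Compound Poisson part: N ~ Poisson(lam*T) jumps, double exponential in log space.
    n_jumps = rng.poisson(lam * T, n_paths)
    for i in range(n_paths):
        if n_jumps[i]:
            up = rng.random(n_jumps[i]) < p
            sizes = np.where(up,
                             rng.exponential(1.0 / eta1, n_jumps[i]),
                             -rng.exponential(1.0 / eta2, n_jumps[i]))
            logA[i] += sizes.sum()
    return np.exp(logA)

rng = np.random.default_rng(1)
A0, D = 80_000.0, 75_000.0          # asset value and default barrier in EUR mio. (illustrative)
base = dict(mu=0.03, sigma=0.02, lam=4.0, p=0.4, eta1=150.0, eta2=100.0, T=0.5)

A_T = simulate_dejd_terminal(A0, **base, n_paths=50_000, rng=rng)
pd_hat = np.mean(A_T < D)                                          # probability of default
lgd_hat = np.mean(np.maximum(D - A_T, 0.0)) / max(pd_hat, 1e-12)   # loss given default
print(f"PD ~ {pd_hat:.2%}, LGD ~ EUR {lgd_hat:.0f} mio., EL ~ EUR {pd_hat * lgd_hat:.0f} mio.")

# Simple stress test: more frequent and larger downward jumps.
stressed = dict(base, lam=8.0, eta2=50.0)
A_T_s = simulate_dejd_terminal(A0, **stressed, n_paths=50_000, rng=rng)
print(f"stressed PD ~ {np.mean(A_T_s < D):.2%}")
```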
Conclusion
This thesis develops a new structural credit risk model that incorporates asymmetric jumps and applies it to five Dutch banks for the period 2006–2011. It is based on the DEJD model by Kou (2002) for valuing financial options. I also propose a recursive estimation approach which integrates the maximum likelihood procedure from Ramezani and Zeng (2007) into the KMV framework to obtain parameter estimates for the DEJD credit risk model. The BIC values resulting from the performance evaluation suggest that the DEJD credit risk model provides an overall better fit than the KMV-Merton model. The resulting parameter values of the jump component have large implications for the estimates of some of the risk-related measures, leading to large differences in the probability of default. The results show that asymmetric jumps are an important feature to account for in assessing the credit risk of companies or institutions that have high leverage ratios. The stakes of supplying good estimates of risk measures to central banks and investors are high, especially when a company is close to default. Hence, the model provides a useful tool to complement existing methods in obtaining quantitative risk measurements for regulatory agencies and other interested parties. Additionally, the set-up of the model provides a hands-on tool for stress testing a company of interest by manipulating its asymmetric jump component. A first suggestion for further study is to apply the DEJD credit risk model to a larger dataset. Although the results in this thesis are highly promising, application to a larger number of companies should provide more support for the observed strengths and weaknesses of the model. Furthermore, I also suggest relaxing the assumptions of static drift and volatility terms, along with relaxing the assumption of a fixed default barrier. Increasing the stochasticity of the model may reveal interesting characteristics of the equity and asset dynamics, especially when it is benchmarked against other jump-diffusion models. Finally, improvements in the maximum likelihood estimation procedure may be required in order to account better for the topological equalities among the jump rates of the equity and asset returns.
Acknowledgements
I would like to thank my university thesis supervisors, Dr. Michel van der Wel and Dr. Kees Bouwman, to whom I am very grateful for guiding me throughout the writing process of this thesis. I also owe a lot to the Financial Stability department at the Dutch central bank and Dr. Tijmen Daniëls in particular, who have provided me with numerous helpful remarks and suggestions.

References
[1] S. Bharath and T. Shumway. Forecasting default with the Merton distance to default model. Review of Financial Studies, 25(10):1–31, 2008.
[2] F. Black and M. Scholes. The pricing of options and corporate liabilities. Journal of Political Economy, 81(3):637–654, May–June 1973.
[3] R. Cont. Empirical properties of asset returns: stylized facts and statistical issues. Quantitative Finance, 1(2):223–236, 2001.
An intermodal container network model with flexible due times and the possibility of using subcontracted transport
Bart van Riessen, Rudy Negenborn, Rommert Dekker, Gabriel Lodewijks
Erasmus University Rotterdam
An intermodal container transportation network is being developed between Rotterdam and several inland terminals in North West Europe: the European Gateway Services network. To use this network cost-efficiently, a more integrated planning of the container transportation is required. The most relevant aspects of such planning are identified with a new model. This model introduces three new features to the intermodal network planning problem. Firstly, a combination of a path-based formulation with a minimum cost network flow formulation is used. Secondly, overdue deliveries are penalized instead of prohibited. Thirdly, the model combines self-operated and subcontracted services. Two versions of the model are applied at two different levels. At a tactical level, the optimal service schedule between the network terminals is determined, considering barge and rail modes and both operation types, and the most influential costs in this problem are identified. Another version of the model is applied at an operational level. With this model the impact of a disturbed service is determined, by comparing the undisturbed planning with the best possible update after the disturbance. The difference between an optimal update and the usual local update, defined as the relevance, is also measured. It is shown that each of the models is suitable for solving its respective problem. This paper focuses on the model and methodology at the tactical level to determine the optimal service schedule.
Introduction
This paper proposes a new model to study planning in intermodal networks. A tendency towards more integrated supply chains has sparked initiatives in North-West Europe to create transportation networks for containers [6, 11, 13, 14].

Development of intermodal container networks
These intermodal container transportation networks are generally a cooperation between multiple barge service operators, rail service operators and terminals. Veenstra [18] introduced the concept of an extended gate: a hinterland terminal in close connection to the sea port, where customers can leave or pick up their standardized units as if directly at a seaport. The seaport can choose to control the flow of containers to and from the inland terminal. This control by the seaport distinguishes the extended gate from a dry port as defined by Roso [15] and introduces a central management for the intermodal container network. This concept has been implemented since 2007 in European Gateway Services (EGS), a subsidiary of Europe Container Terminals (ECT), which has three seaports in Rotterdam. The network consists of these three seaports and an increasing number of terminals in North-West Europe (see Figure 1).

Figure 1: Overview of EGS network [source: European Gateway Services]

Definitions: intermodal and synchromodal
The central management of the network allows for central intermodal network planning. Intermodal planning is defined as multimodal transport of goods, in one and the same
intermodal transport unit by successive modes of transport without handling of the goods themselves when changing modes [17]. With intermodal planning, containers can be routed over multiple consecutive services, using intermediate transfers of the containers at network terminals. On top of that, a network with centrally planned transportation can use real-time switching: the possibility of changing the container routing over the network in real time to cope with transportation disturbances. The combination of intermodal planning with real-time switching is often referred to as synchromodal planning, a new term on the agenda of the Dutch Topsector Logistiek [16]. However, no unambiguous definition of synchromodality exists yet. In this study, the following definition of synchromodality is used: intermodal planning with the possibility of real-time switching, or online intermodal planning.

New aspects of the proposed model
The study focuses on the cost impact of using intermediate transfers and real-time switching. Existing intermodal planning models do not suffice for this purpose, because they allow neither flexible time restrictions for delivery nor the combination of self-operated and subcontracted services in the network. Firstly, the daily practice in container transportation is that planners and customers agree on delivery times in mutual consultation; both are flexible in case of disturbances. Secondly, container transportation networks use a combination of self-operated services and subcontracted services. In the latter case, transportation is paid for per TEU (twenty-foot equivalent unit, a standardized container size measure). In the case of self-operated services, the network operator pays for the entire barge or train and incurs no additional costs per TEU, except for the loading and unloading of containers (handling costs). A new model is proposed that copes with both of these aspects of container networks. The model was used in two forms: firstly, for a service network design, where the optimal frequency of services between all terminals in the network is determined; secondly, in an adapted form, to assess the impact of disturbances on the network transportation costs. Also, the difference between a simple local update and a full update to cope with a disturbance was determined.
Structure of the paper
This paper focuses on the model at the tactical level, the service network design model. Section II describes the literature on existing service network design models and Section III introduces the new intermodal container network model. The case of EGS is used as an example for the intermodal container network model of this study in Section IV, and the results of the experiments are discussed in Section V. Section VI concludes the paper and proposes further research.

Literature review
In academic literature, three levels of network planning are distinguished [3, 12]: strategic, tactical and operational planning. The exact boundary between these levels often depends on the point of view of the planning. In general, strategic planning focuses on long-term network design, such as locations of terminals or transport hubs (e.g. Ishfaq and Sox [7]). An overview of hub-location problems (HLPs) is provided by Kagan [10]. Operational planning focuses on the day-to-day planning of network transportation (e.g. Jansen et al. [9], Ziliaskopoulos and Wardell [20]). An overview is provided by Crainic and Kim [5]. The intermodal container network model was also applied at an operational level to identify important categories of disturbances. This paper focuses on tactical-level planning, the service network design. Service network design consists of the following aspects, as described by Crainic [4]: the selection and scheduling of the services to operate, the specification of the terminal operations and the routing of freight. Network design models are often MIP-based formulations of a network structure in which nodes represent terminals and arcs represent services [4]. Multiple modes can travel between the same network terminals; these are represented by multiple arcs. Both the assignment of cargo to routes and the number of services on each corridor are considered simultaneously. In the existing literature on intermodal container transportation networks, several service network design models occur, which can be categorized into two types:
• Minimum cost network flow models (MNCF)
• Path-based network design models (PBND)
Both types of models are able to consider capacitated flow
and multiple commodities; see Table 1. In this sense a commodity, or equivalently cargo class, denotes a set of containers with equal properties, such as mass, origin, destination and delivery time.

Table 1: Examples of existing service network design models

MNCF-type models allow flexible routing of cargo over the various links in the network. Also, explicit constraints on the link capacity can be set. However, the main disadvantage is the number of decision variables in multi-commodity, multi-mode formulations: a variable is required for each cargo class on each arc. For applications with many origin-destination pairs, mass categories and delivery times, the number of decision variables becomes too high for practical computation times. For PBND-type models, the possible paths for each cargo class can be predetermined. A path is the exact route of a container using subsequent services and terminals. This reduces the number of decision variables significantly, provided that the number of possible paths is kept low enough. However, with the traditional PBND formulations, the capacity of the services travelling on each arc cannot be restricted explicitly, as multiple paths for the same or different cargo classes coincide on single services. For this reason, the model introduced in the next section uses a new formulation that combines arc capacity restrictions with the routing of containers over predetermined paths. Some of the existing tactical service network formulations use strict constraints on delivery time (e.g. Ziliaskopoulos [20]). These strict constraints do not model the flexibility that transportation planners have in consultation with customers. Other models use formulations that capture the economies of scale that occur when cargo is consolidated on an arc (e.g. Ishfaq [8]). The practice in current intermodal container networks is that multiple service and terminal operators cooperate, and in this setting the largest economies of scale arise from the choice between services operated by the network operator (self-operated services) and subcontracted services. The difference in cost structure between these two cannot be modeled in
the existing formulations for economies of scale. Hence, the proposed model uses an alternative formulation that better suits the flexible delivery time restrictions and the combined use of self-operated and subcontracted services.

Proposed model
The intermodal container network model proposed in this study differs in three main aspects from existing models for intermodal freight transportation:
1) The model combines a path-based formulation with a minimum cost network flow formulation to restrict the problem size while still including explicit capacity restrictions on the network arcs.
2) Overdue delivery is not strictly prohibited, but penalized per TEU per day overdue.
3) The service network design allows for the combined use of self-operated and subcontracted services.
The model uses four sets of decision variables. The service frequencies denote the number of self-operated services between two terminals with a given mode, together defined as a corridor. The service frequencies are determined while considering multiple demand periods. The amount of TEU of a given mass on self-operated or subcontracted services on a corridor in a period is denoted by the corresponding flow variables. Finally, the path selection variables denote the number of TEU of a cargo class transported on a path in a period. A cargo class is a group of containers with equal origin and destination, the same weight class and the same period for delivery. The objective of the model consists of four cost terms:
(1)
where the first two cost parameters denote the costs of operating a service and of subcontracting one TEU on a corridor, respectively, and the remaining parameters are the cost per transfer, the number of transfers on a path and the cost per TEU for each day of late delivery. The constraints of the model are the following:
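As an illustration of the structure just described, the objective (1) and two representative constraints can be sketched as follows. All symbols in this sketch are assumed, illustrative notation rather than the authors' own, and the capacity constraint is a simplified stand-in for the paper's Constraints 4 and 5.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Assumed (illustrative) notation:
%   y_c     number of self-operated services on corridor c
%   z_{ct}  subcontracted TEU on corridor c in period t
%   w_{kqt} TEU of cargo class k routed on path q in period t
%   u_{kqt} TEU-days delivered overdue;  n_q  transfers on path q
%   d_{kt}  demand;  U_c  TEU capacity per self-operated service
\begin{align}
\min\quad & \sum_{c} C^{\mathrm{own}}_{c}\, y_{c}
  + \sum_{t}\sum_{c} C^{\mathrm{sub}}_{c}\, z_{ct}
  + \sum_{t}\sum_{k}\sum_{q} C^{\mathrm{tr}}\, n_{q}\, w_{kqt}
  + \sum_{t}\sum_{k}\sum_{q} C^{\mathrm{late}}\, u_{kqt} \\
\text{s.t.}\quad & \sum_{q \in Q_{k}} w_{kqt} = d_{kt}
  \qquad \forall\, k, t \quad \text{(all demand is transported)} \nonumber \\
& \sum_{k}\sum_{q \ni c} w_{kqt} \le U_{c}\, y_{c} + z_{ct}
  \qquad \forall\, c, t \quad \text{(corridor capacity; subcontracting unlimited)} \nonumber
\end{align}
\end{document}
```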
Here, each cargo class has a demand per period that must be transported; associated with each cargo class are its weight class and its due period, that is, the time available for transporting a container of that cargo class. The selected paths are mapped to the flow variables on the corridors they use, and each corridor has a TEU capacity and a maximum weight per service. Hence, the first term of the objective represents the cost of the services selected for self-operation; the second term sums all costs for subcontracted transport in all periods; the third term denotes the costs for transfers and the fourth term is the penalty cost for overdue delivery. Constraint 2 ensures that all transportation demand is met in all periods. The allocation of the demand to the paths is mapped to the flow variables by Constraint 3. This mapping depends on the services used (self-operated or subcontracted) in the predefined paths. Constraints 4 and 5 are the capacity constraints on each corridor, dependent on the selected number of services. Note that the capacity of subcontracted services is considered unlimited in this formulation. Constraint 6 ensures that an auxiliary variable equals the total number of overdue days for all TEU of a cargo class on a path, by measuring the difference between the available delivery period and the predetermined path duration. If a cargo class is on time using a path, Constraint 9 ensures that this auxiliary variable equals 0. Constraint 7 ensures the same number of self-operated services back and forth on a corridor, to keep the equipment balanced over the network. Finally, Constraints 8 and 10 ensure the nonnegativity of the other variables and Constraint 11 restricts the service frequencies to the integer set of natural numbers. The model is applied to the real-world case of the network transportation in the European Gateway Services network.

Case study: disturbances at EGS
A. Network and paths
The EGS network has been growing continuously in terminals and connections. This study's focus is on the network as shown in Figure 1: it consists of three ECT seaports in Rotterdam (Delta, Euromax and Home) and several inland terminals in the Netherlands, Belgium and Germany. In this network, suitable paths between all locations are predetermined. To do this, Yen's k-shortest path method is used [19]. This method is able to select shortest paths without loops in a network, based on Dijkstra's algorithm. In this study, paths are selected based on the geographical length of the network arcs, up to a length of three times the length of the shortest path. Subsequently, the number of paths is reduced by omitting all paths that consist of more than three transportation legs and by omitting paths that have a detour of more than 10% in any of their transportation legs. This detour is measured by comparing the distance to the destination from both ends of each leg; a path is considered to make a detour if this comparison exceeds the 10% margin in any of its legs. All of the remaining paths describe a geographic route with one to three transportation legs in the network. The final step is to generate all intermodal possibilities of such a route, based on the availability of barge and train corridors between the network locations. Truck is only considered for the last (first) leg before the hinterland destination (origin). For example, a route Rotterdam Delta – Moerdijk – Willebroek results in the following paths:
where both Delta and Moerdijk have a rail and a barge terminal, but Willebroek does not have a rail terminal. Note that the truck mode is only considered for the last leg. Associated with each path are a travel time and a number of transfers.
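A minimal sketch of this path generation step is given below, using networkx's shortest_simple_paths (a Yen-type method) and a simple mode expansion. The distances, mode availability and terminal set are made up for illustration, the per-leg 10% detour filter and the truck rule are omitted for brevity, and the leg expansion rule is a simplification of the paper's procedure.

```python
import itertools
import networkx as nx

# Illustrative terminal network; edge attribute 'length' is a made-up distance in km.
G = nx.Graph()
G.add_weighted_edges_from([
    ("Rotterdam Delta", "Moerdijk", 60),
    ("Rotterdam Delta", "Willebroek", 110),
    ("Moerdijk", "Willebroek", 70),
    ("Moerdijk", "Tilburg", 50),
    ("Tilburg", "Willebroek", 120),
], weight="length")

MODES_AT = {"Rotterdam Delta": {"barge", "rail"}, "Moerdijk": {"barge", "rail"},
            "Willebroek": {"barge"}, "Tilburg": {"barge", "rail"}}

def candidate_routes(G, origin, destination, max_factor=3.0, max_legs=3):
    """Geographical candidate routes: k-shortest simple paths (a Yen-type method),
    cut off at max_factor times the shortest length and at max_legs legs."""
    shortest = nx.shortest_path_length(G, origin, destination, weight="length")
    routes = []
    for path in nx.shortest_simple_paths(G, origin, destination, weight="length"):
        if nx.path_weight(G, path, weight="length") > max_factor * shortest:
            break                       # paths are generated in order of increasing length
        if len(path) - 1 <= max_legs:   # keep routes with at most three transportation legs
            routes.append(path)
    return routes

def intermodal_variants(route):
    """Expand a geographical route into mode combinations; each leg may use any mode
    available at both of its end terminals (a simplification of the paper's rules)."""
    options = [sorted(MODES_AT[a] & MODES_AT[b]) for a, b in zip(route[:-1], route[1:])]
    return [list(zip(route[:-1], route[1:], combo)) for combo in itertools.product(*options)]

for route in candidate_routes(G, "Rotterdam Delta", "Willebroek"):
    for variant in intermodal_variants(route):
        print(variant)
```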
B. Costs and transportation demand
The cost parameters in the study are based on the actual costs in the current operation of the EGS network. For that reason, the costs in this paper are masked by a confidentiality factor. The corridor costs per service and per TEU are modeled with a linear approximation based on the actual network costs and the corridor length. Fixed costs are used per transfer and per TEU per day of overdue delivery. An analysis of the transportation on the EGS network in the period January 2009 - June 2012 did not show significant periodic behaviour. As the transported volume grew fast in 2010, the weekly demands were further analysed for the period January 2011 - June 2012. Using Pearson's goodness-of-fit test [1], the hypothesis of normality of the distribution of the weekly volume could not be rejected, with a p-value of 0.93. Hence, for each cargo class the parameters of the normal distribution of the weekly volume are determined. With these, ten 10-percentile subsets of the normal distribution are generated for each cargo class. These sets are used as the ten periods in the proposed model. The model solves for the optimal service frequencies simultaneously, optimized over all ten 10-percentile sets.
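The construction of such decile-based demand sets can be sketched as follows with scipy; the weekly volumes below are synthetic and the exact binning of the goodness-of-fit test in the paper may differ from this minimal version.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Illustrative weekly TEU volumes for one cargo class (not EGS data).
weekly_teu = rng.normal(loc=180, scale=35, size=78)    # roughly 1.5 years of weeks

# Goodness-of-fit check against a fitted normal (chi-square on binned counts).
mu, sd = weekly_teu.mean(), weekly_teu.std(ddof=1)
edges = np.quantile(weekly_teu, np.linspace(0, 1, 9))  # 8 equal-count bins
observed, _ = np.histogram(weekly_teu, bins=edges)
expected = len(weekly_teu) * np.diff(stats.norm.cdf(edges, mu, sd))
expected *= observed.sum() / expected.sum()            # renormalize to the same total
chi2, p_value = stats.chisquare(observed, expected, ddof=2)  # 2 estimated parameters
print(f"chi2 = {chi2:.2f}, p-value = {p_value:.2f}")

# Ten demand periods: the midpoints of the ten 10-percentile intervals of the fitted normal.
deciles = stats.norm.ppf(np.arange(0.05, 1.0, 0.10), loc=mu, scale=sd)
print(np.round(deciles, 1))
```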
Results
A. Experiments
The model is solved for the EGS case with AIMMS 3.12, using Cplex 12.4. Four different experiments are carried out. The basic case is the experiment with the parameters as described above. The other three cases are hypothetical situations to assess the influence of some effects:
• a case where the transfer costs are lowered by 50%, to find the effect of transfer costs on the service schedule;
• a case where due times are ignored, or equivalently, the overdue costs are set to zero, which shows the impact of due times on the results;
• a case without the possibility of selecting subcontracts, which shows the impact of using subcontracts along with the network services.
The results of the basic case and the three hypothetical experiments are shown in Table 2. The table shows the resulting costs in total and separately for the four objective terms.

Table 2: Results EGS network design
B. Discussion
The proposed intermodal container network model was able to solve the various experiments quickly in most cases. Computation times varied between 2 and 4 minutes, except for the case in which no subcontracts were allowed; solving that hypothetical case took 1.5 hours. The regular solution time of a few minutes makes the model suitable for the service network design of the current problem instance. With increasing problem sizes, the number of arc-related variables increases quadratically with the number of terminals. The number of paths (and path-related variables) could increase exponentially, but smart path generation based on experience or other insights can be applied to restrict the number of paths. Hence, it is expected that the model will perform well for larger problem instances as well. The case with 50% transfer costs obviously has lower costs for transfers. However, the transportation costs (i.e. the costs for self-operated services and subcontracted transports) are also reduced, by 7.3%. The number of containers for which an intermediate transfer takes place increases from 1.2% to 5.4%. These results suggest that the network operator should look into a combined business model of services and terminals. Terminals with low utilization of the available capacity can easily handle intermediate transfers, and in that way possibly reduce transportation costs. The case in which due times are omitted also shows a reduction of transportation costs, by 22%. Hence, in the studied case, 22% of the transportation costs (€77) are made in order to deliver on time. On top of that, in the basic case the model 'accepts' a fictional penalty of €122 for late delivery. This shows the importance of the overdue delivery flexibility introduced in the model. The case in which subcontracted transports are not considered shows the importance of the combination of self-
operated and subcontracted transports. Without subcontracted transports, all transportation takes place with self-operated services. Operating all these services increases the transportation costs by 61% compared to the basic case solution. Even then, the number of late containers increases by 25%.
Conclusions
The following general conclusions are drawn for intermodal container networks, based on the results of this study:
• With the current cost structure for transportation in North-West Europe, intermediate transfers will not result in a cost reduction.
• A combined business model for network terminals and transportation provides opportunities for reducing transportation costs through additional use of intermediate transfers.
• The linear-cost path-based network model is suitable for determining service frequencies in an intermodal transport network.
However, the model has some limitations as well. The model does not take waiting times at terminals into account. In practice, a container has some waiting time at each terminal, depending on the service schedule. The expected waiting time depends on the resulting service frequencies that the model provides. Hence, a useful extension of the model would include the expected waiting times in the optimal service network design. A second limitation concerns the inland destination of containers. In the EGS example, network terminals are used as the final container destination or origin. In practice, several inland terminals can be used, depending on the inland warehouse address. The model could be extended to include inland addresses or multiple inland terminals for a specific cargo class. Other possible extensions are the inclusion of demurrage and detention costs for containers that remain long at a terminal or in transit, and the inclusion of fill-up cargo of empty containers with low time pressure.

Acknowledgment
The authors would like to thank ECT and EGS for providing the opportunity for the research into the EGS network as well as for the data about network costs and transportation demands. This allowed the authors to apply the newly proposed model to a practical case of current container network development.

References
[1] Cochran, William G. "The χ2 test of goodness of fit." The Annals of Mathematical Statistics (1952): 315-345.
[2] Crainic, Teodor G., and Jean-Marc Rousseau. "Multicommodity, multimode freight transportation: A general modeling and algorithmic framework for the service network design problem." Transportation Research Part B: Methodological 20.3 (1986): 225-242.
[3] Crainic, Teodor Gabriel, and Gilbert Laporte. "Planning models for freight transportation." European Journal of Operational Research 97.3 (1997): 409-438.
[4] Crainic, T.G. "Service network design in freight transportation." European Journal of Operational Research 122.2 (1996): 272-288.
[5] Crainic, Teodor Gabriel, and Kap Hwan Kim. "Intermodal transportation." Transportation 14 (2007): 467-537.
[6] Groothedde, Bas, Cees Ruijgrok, and Lori Tavasszy. "Towards collaborative, intermodal hub networks: a case study in the fast moving consumer goods market." Transportation Research Part E: Logistics and Transportation Review 41.6 (2005): 567-583.
[7] Ishfaq, Rafay, and Charles R. Sox. "Intermodal logistics: The interplay of financial, operational and service issues." Transportation Research Part E: Logistics and Transportation Review 46.6 (2010): 926-949.
[8] Ishfaq, Rafay, and Charles R. Sox. "Design of intermodal logistics networks with hub delays." European Journal of Operational Research (2012).
[9] Jansen, Benjamin, et al. "Operational planning of a large-scale multi-modal transportation system." European Journal of Operational Research 156.1 (2004): 41-53.
[10] Kagan, E. Cost and sustainability trade-offs in intercontinental intermodal supply chain design. Erasmus University Rotterdam (2012).
[11] Lucassen, I.M.P.J. and T. Dogger. "Synchromodality pilot study. Identification of bottlenecks and possibilities for a network between Rotterdam, Moerdijk and Tilburg." TNO, 2012.
[12] Macharis, C., and Y. M. Bontekoning. "Opportunities for OR in intermodal freight transport research: A review." European Journal of Operational Research 153.2 (2004): 400-416.
[13] Port of Rotterdam. "Port Vision 2030." http://www.portofrotterdam.com/en/Port/port-in-general/port-vision-2030/Documents/Port-vision-2030/index.html
[14] Rodrigue, Jean-Paul, and Theo Notteboom. "Dry Ports in European and North American Intermodal Rail Systems: Two of a Kind?" (2012).
[15] Roso, Violeta, Johan Woxenius, and Kenth Lumsden. "The dry port concept: connecting container seaports with the hinterland." Journal of Transport Geography 17.5 (2009): 338-345.
[16] Topsector Logistiek. "Uitvoeringsagenda Topsector Logistiek." Ministerie van Economische Zaken, Landbouw en Innovatie (2011).
[17] UNECE, ITF, Eurostat. "Glossary for Transport Logistics" (2009).
[18] Veenstra, Albert, Rob Zuidwijk, and Eelco van Asperen. "The extended gate concept for container terminals: Expanding the notion of dry ports." Maritime Economics & Logistics 14.1 (2012): 14-32.
[19] Yen, Jin Y. "Finding the k shortest loopless paths in a network." Management Science 17.11 (1971): 712-716.
[20] Ziliaskopoulos, Athanasios, and Whitney Wardell. "An intermodal optimum path algorithm for multimodal networks with dynamic arc travel times and switching delays." European Journal of Operational Research 125.3 (2000): 486-502.
Notes
[1] Masked by confidentiality factor.
[2] Costs masked by a confidentiality factor.
Multivariate Linear Mixed Modeling to Predict BMR
Els Kinable
Erasmus University Rotterdam
Abstract
Background
The purpose of this research is to derive efficient predictions of the basal metabolic rate (BMR) of Malaysian youngsters. BMR is a measure of the minimal rate of energy expenditure while not being involved in any physical activity (Henry, 2005). It is used to calculate energy requirements, which in turn serve general prescriptive and diagnostic purposes, including the assessment of appropriate drug doses and food needs, as well as research into and treatment of obesity and diabetes (Gibson & Numa, 2003; Henry, 2005). We take a new approach to obtaining efficient predictions of BMR by developing a multivariate model that uses linear mixed models (LMM). Linear mixed modeling is a popular method to analyze longitudinal data, because it can deal both with unbalanced data and with the correlation between repeated measurements through the inclusion of random effects (Fieuws & Verbeke, 2006; Jacqmin-Gadda et al., 2007; Laird & Ware, 1982). The data used in this research consist of repeated measurements of 27 anthropometric variables of 139 Malaysian youngsters aged ten to fifteen years. Among these variables are lean body mass, the length of several body parts, circumferences and skin fold measures, as well as general characteristics such as age, weight and height. We conclude that our approach indeed results in efficient predictions of BMR. For confidentiality reasons, this article only discusses the methodological outline; no results are provided. Sections 2 and 3 cover the theory of the univariate and multivariate LMM, respectively. In section 4, we discuss the estimation of the multivariate LMM and in section 5 we discuss the derivation of predictions of BMR.

Univariate Linear Mixed Model
Linear mixed models are a common estimation method for continuous data that consist of longitudinal repeated measurements (Verbeke & Molenberghs, 2009). They were introduced in biostatistics in 1982 by Laird and Ware, after which a variety of names has been given to this type of modeling, including but not limited to hierarchical linear models, multilevel linear models and random or mixed effects models (Cameron & Trivedi, 2009; Verbeke & Molenberghs, 2009). Several features render this method suitable for analyzing nested data. Firstly, mixed models
can deal with the correlation between repeated measurements (Jacqmin-Gadda et al., 2007; Laird & Ware, 1982; Verbeke & Molenberghs, 2009). After all, measurements of the same variable for the same individual are likely to be correlated with previous measurements of this variable for this individual. In ordinary linear models this dependency would result in inaccurate standard errors, because these models assume independent and normally distributed errors with constant variance (Cameron & Trivedi, 2009; Raudenbush & Bryk, 2002). As we will see later, the multilevel approach of an LMM allows for modeling more than one random error term, thus dealing with the dependencies between repeated measurements. By doing so, it enables us to model and analyze the within- and between-individual variation (Laird & Ware, 1982). Another desirable feature of mixed models is their ability to deal with unbalanced data (Laird & Ware, 1982). Unbalanced data occur when there is an unequal number of measurements across subjects or when measurements are not taken at fixed time points (Verbeke & Molenberghs, 2009). This is frequently encountered in repeated measurements, where it is often impossible to maintain exactly the same time interval between two measures for each respondent. Moreover, unbalancedness may result from missing values, which are also common in a longitudinal research setting. Univariate linear mixed models (ULMM) consider a single response variable that is measured repeatedly over time. We assign the random variable Y_{ij} to the response of individual i at time t_{ij}, where i = 1, ..., N and j = 1, ..., n_i. For the i-th subject we then obtain the n_i-dimensional vector Y_i after stacking the repeated measurements of each individual. As the number of measurements is allowed to vary over individuals, Y_i does not necessarily have the same length for each respondent. The hierarchical formulation of the ULMM is provided by (Verbeke & Molenberghs, 2009)

Y_i = X_i \beta + Z_i b_i + \varepsilon_i,  with  b_i \sim N(0, D)  and  \varepsilon_i \sim N(0, \Sigma_i).   (1)
In the ULMM postulated in (1) we can distinguish a deterministic component, X_i \beta, and a random component, Z_i b_i + \varepsilon_i (Cameron & Trivedi, 2009). X_i and Z_i are matrices of known covariates of dimensions n_i × p and n_i × q, respectively. \beta is a vector of length p of unknown regression coefficients, which are called fixed effects. The random component consists of a subject-specific random effects vector, b_i, and of \varepsilon_i, a vector containing the usual random error terms. The random effects in b_i are independent and have a q-dimensional normal distribution with mean vector zero and general covariance matrix D (Verbeke & Molenberghs, 2009). The elements on the diagonal of D represent the between-subject variance. The elements in \varepsilon_i are assumed to be independent and normally distributed with mean vector zero and covariance matrix \Sigma_i. \Sigma_i is frequently chosen to be equal to \sigma^2 I_{n_i}, where I_{n_i} denotes the n_i-dimensional identity matrix (Verbeke & Molenberghs, 2009). \sigma^2 can be considered as the within-subject variance (Cnaan et al., 1997). The merit of the between- and within-subject variation becomes evident in the marginal formulation of the ULMM, which is usually of primary interest (Verbeke & Molenberghs, 2009):

Y_i \sim N(X_i \beta, \; Z_i D Z_i' + \Sigma_i).   (2)

Here the terms Z_i b_i are considered as error terms, implying that the error now consists of two types of error terms (Cnaan et al., 1997). In this way the ULMM enables us to model and analyze the within- and between-individual variation, thus dealing with the dependencies between repeated measurements (Laird & Ware, 1982).
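As a hedged illustration of the univariate model, the sketch below fits an LMM with a random intercept per subject to simulated longitudinal data using statsmodels. The data, variable names and the random-intercept-only structure are assumptions made for the example; the thesis model may include additional fixed and random effects.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated longitudinal data in the spirit of the application: repeated
# BMR-type measurements per subject at several ages (values are made up).
rng = np.random.default_rng(3)
n_subjects, n_visits = 60, 4
subj = np.repeat(np.arange(n_subjects), n_visits)
age = np.tile(np.array([10.0, 11.5, 13.0, 14.5]), n_subjects)
u = rng.normal(0, 0.6, n_subjects)                               # between-subject intercepts
y = 4.0 + 0.25 * age + u[subj] + rng.normal(0, 0.3, subj.size)   # within-subject noise
data = pd.DataFrame({"bmr": y, "age": age, "subject": subj})

# Univariate LMM: fixed effect for age, random intercept per subject.
model = smf.mixedlm("bmr ~ age", data, groups=data["subject"])
result = model.fit()
print(result.summary())
print("between-subject variance:", float(result.cov_re.iloc[0, 0]))
print("within-subject variance: ", result.scale)
```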
Multivariate Linear Mixed Model
A multivariate linear mixed model (MLMM) is used when we have repeated measurements on multiple response variables per individual. Its structure allows connecting the multivariate response variables by specifying a joint multivariate distribution of the random effects and, optionally, through the covariance matrix of the measurement error (Fieuws & Verbeke, 2004; Morrell et al., 2012). This results in increased flexibility, which leads to more efficient parameter estimates and thus benefits the predictive ability of the model (Morrell et al., 2012; Shah et al., 1997). In an MLMM setting, the vector Y_i from the previous section becomes an n_i × r matrix, where r denotes the number of response variables obtained from an individual. Each row is therefore a joint realization of the r responses of respondent i at a single measurement occasion (Schafer & Yucel, 2002; Shah et al., 1997). The multivariate linear mixed model takes on the
following form:
(3)

In this equation, X_i and Z_i remain n_i × p and n_i × q matrices of known covariates, respectively. The matrix of regression coefficients, however, is now common to all respondents, and each individual additionally has a matrix of subject-specific coefficients (Schafer & Yucel, 2002). The error term is now also an n_i × r matrix, of which the rows are independently multivariate normally distributed with an unstructured r × r covariance matrix (Shah et al., 1997). We can vectorize the model in (3) by stacking the columns of the matrices, thus obtaining

(4)

Here, the response matrix is stacked into an n_i r-dimensional vector and the error matrix is stacked in the same way into a vector of the same dimension. The design matrices become block-diagonal matrices, in which the blocks consist of the X_i and Z_i matrices of explanatory variables for each of the dependent variables; as in the univariate model, the random-effects design matrix is usually a subset of the fixed-effects design matrix (Morrell et al., 2012; Shah et al., 1997). Their numbers of columns equal the total number of fixed effects for all response variables and the total number of individual random effects, respectively (Morrell et al., 2012). The corresponding vector of fixed-effects regression parameters is common to all individuals, and the vector of individual random effects is assumed to follow a multivariate normal distribution with mean zero and a general covariance matrix (Shah et al., 1997). The error terms are multivariate normal with mean zero as well (Morrell et al., 2012). Conditional on the random effects, we define responses to be independent over time as before, such that the error covariance matrix is the Kronecker product of the r × r measurement error covariance matrix and an identity matrix (Morrell et al., 2012). We assume different individuals to be independent, and the fixed effects are also independent from the random effects (Shah et al., 1997). The multivariate responses of an individual are tied together through the covariance matrix of the random effects and that of the error terms. We define the random-effects covariance matrix to be unstructured, which allows for covariance between the random effects within a particular response variable and also for covariance among the random effects of different response variables (Fieuws
& Verbeke, 2004; Morrell et al., 2012). The error terms are correlated by allowing the off-diagonal elements of their covariance matrix to be different from zero. The different responses of an individual are therefore joined by allowing both the random effects and the measurement errors to be correlated.

Parameter Estimation in the Multivariate LMM
The parameters in the multivariate linear mixed model are estimated by means of the pairwise fitting approach that was introduced by Fieuws and Verbeke (2006). They developed this approach as a solution for dealing with models that have a large number of outcome variables; previous methods were not able to overcome the computational restrictions related to fitting such high-dimensional models. This dimensionality problem can be solved by a two-step approach using pairwise estimation. For each response variable a univariate model is defined, but instead of maximizing the likelihood of the multivariate model, the approach uses pairwise bivariate models which are fitted separately (Fieuws & Verbeke, 2006). If we collect all fixed effects and covariance parameters of the multivariate model in one vector, then each individual contributes a log-likelihood term in this parameter vector to the multivariate mixed model (Fieuws & Verbeke, 2006). In the first step, we fit bivariate models, namely the joint models for all possible pairs of outcomes (Fieuws et al., 2007). Hence, the log-likelihoods that will be estimated have the form

(5)

where each bivariate joint mixed model has its own vector containing all parameters that correspond to that pair of outcomes (Fieuws & Verbeke, 2006). Hereafter, we refer to each pair by a single index running up to the total number of pairs, so that equation (5) becomes (Fieuws & Verbeke, 2006)

(6)

We stack the pair-specific parameter vectors into one vector. Clearly, an estimate of this stacked vector is obtained by separately maximizing each of the bivariate log-likelihoods (Fieuws et al., 2007). Fieuws and Verbeke (2006) stress the distinction between the full parameter vector and the stacked pair-specific vector. Whereas some parameters
in the full vector have a single corresponding element in the stacked vector, others have multiple equivalents in it. The covariances between random effects of different outcomes, for example, have a unique counterpart in the stacked vector. However, the covariances between random effects of a single outcome, or the fixed effects of a single outcome, have multiple counterparts in it (Fieuws & Verbeke, 2006; Fieuws et al., 2007). In this case, a single estimate can be obtained by averaging the pair-specific elements that refer to the same parameter. However, this approach is not suitable for obtaining the standard errors, as we should also account for the variability among the pair-specific estimates (Fieuws & Verbeke, 2006). Moreover, two pair-specific estimates that belong to two bivariate models with a common outcome contain overlapping information and are therefore correlated (Fieuws & Verbeke, 2006). Correct inference for the asymptotic standard errors of the parameters can be obtained by using a pseudo-likelihood approach (Fieuws et al., 2007). This comprises the second step. The second step in the pairwise fitting approach combines the multiple ML estimates that refer to a single parameter of the multivariate model. We first discuss the estimation of the standard errors using pseudo-likelihood theory, after which we describe the way in which the parameter estimates can be obtained by taking averages. The pseudo-likelihood framework decomposes the joint likelihood into a product of marginal or conditional densities, where this product is supposed to be easier to evaluate than the joint likelihood (Fieuws & Verbeke, 2006). We can rewrite the pairwise log-likelihoods postulated in (5) in a pseudo log-likelihood format as (Fieuws & Verbeke, 2006)

(7)

This formulation deviates somewhat from classical pseudo-likelihood estimation, in which the same parameter vector is present in the different elements of the pseudo-likelihood function, whereas here we have a pair-specific set of parameters for each likelihood (Fieuws et al., 2007). In line with general pseudo-likelihood theory we can define the asymptotic multivariate normal distribution of the stacked estimator as (Fieuws & Verbeke, 2006; Geys et al., 1999)
(8)

where the first matrix is block-diagonal, with the pair-specific information matrices as its diagonal blocks, and the second matrix is symmetric, with blocks as defined by Fieuws and Verbeke (2006) and Geys et al. (1999). While both matrices depend on the unknown parameters, their estimates can be obtained by dropping the expectations and replacing the unknown parameters by their estimates (Fieuws & Verbeke, 2006). Lastly, we obtain estimates for the parameters of the multivariate model by averaging over the available estimates for a particular parameter, i.e. by averaging over the elements of the stacked vector that have a single counterpart in the full parameter vector (Fieuws & Verbeke, 2006). This is achieved by premultiplying the stacked vector of estimates by a matrix that contains the coefficients to calculate the averages: if the stacked vector contains several estimates of the same parameter, this matrix contains the corresponding averaging weights on the appropriate entries. The covariance matrix of the resulting estimator follows from the covariance matrix obtained in (8) (Fieuws & Verbeke, 2006).

Predicting BMR
The pairwise fitting approach provides estimates for the fixed effects and their standard errors, as well as the covariance and correlation matrices of the random effects and the measurement error. We can use the output of the multivariate model to develop predictions of BMR. Since the estimates of the fixed effects of the multivariate model follow a multivariate normal distribution, we can predict BMR by conditioning on the other responses that are included in the multivariate model. To this end we use the fact that the resulting predicted BMR values will also be normally distributed. Formally, we have

(9)

where the left-hand side is the predicted BMR, conditional on the vector of the other response variables in the multivariate model being equal to its vector of observed values. To obtain the mean and covariance of the predicted BMR values, we use

(10)

and

(11)

The elements of these expressions stem from the partitioning
of the mean vector and the covariance matrix of the responses over all response variables included in the multivariate model. The mean value of BMR at a particular point in time is obtained by multiplying the parameter coefficients of BMR resulting from the multivariate estimation by the known covariates of the fixed effects, and the mean values of the other response variables are obtained in the same way. The covariance matrix is a partitioning of the variance that is derived under the marginal formulation of the multivariate model at a particular point in time. In conclusion, by deriving the conditional distribution of the multivariate normal distribution we can obtain predicted values of BMR.

References
[1] Cameron, A.C. and Trivedi, P.K. (2009). Microeconometrics: Methods and Applications. 8th Printing. New York: Cambridge University Press.
[2] Cnaan, A., Laird, N.M. and Slasor, P. (1997). Using the general linear mixed model to analyse unbalanced repeated measures and longitudinal data. Statistics in Medicine, 16, pp. 2349-2380.
[3] Fieuws, S. and Verbeke, G. (2004). Joint modelling of multivariate longitudinal profiles: pitfalls of the random-effects approach. Statistics in Medicine, 23, pp. 3093-3104.
[4] Fieuws, S. and Verbeke, G. (2006). Pairwise fitting of mixed models for the joint modeling of multivariate longitudinal profiles. Biometrics, 62, pp. 424-431.
[5] Fieuws, S., Verbeke, G. and Molenberghs, G. (2007). Random-effects models for multivariate repeated measures. Statistical Methods in Medical Research, 16, pp. 387-397.
[6] Geys, H., Molenberghs, G. and Ryan, L.M. (1999). Pseudolikelihood modeling of multivariate outcomes in developmental toxicology. Journal of the American Statistical Association, 94(447), pp. 734-745.
[7] Gibson, S. and Numa, A. (2003). The importance of metabolic rate and the folly of body surface area calculations. Anaesthesia, 58, pp. 50-83.
[8] Henry, C.J.K. (2005). Basal metabolic rate studies in
humans: measurement and development of new equations. Public Health Nutrition, 8(7A), pp. 1133-1152.
[9] Jacqmin-Gadda, H., Sibillot, S., Proust, C., Molina, J. and Thiébaut, R. (2007). Robustness of the linear mixed model to misspecified error distribution. Computational Statistics & Data Analysis, 51(10), pp. 5142-5154.
[10] Laird, N.M. and Ware, J.H. (1982). Random-effects models for longitudinal data. Biometrics, 38(4), pp. 963-974.
[11] Morrell, C.H., Brant, L.J., Sheng, S. and Metter, E.J. (2012). Screening for prostate cancer using multivariate mixed-effects models. Journal of Applied Statistics, 39(6), pp. 1151-1175.
[12] Raudenbush, S.W. and Bryk, A.S. (2002). Hierarchical Linear Models: Applications and Data Analysis Methods. 2nd Ed. California: Sage Publications, Inc.
[13] Schafer, J.L. and Yucel, R.M. (2002). Computational strategies for multivariate linear mixed-effects models with missing values. Journal of Computational and Graphical Statistics, 11(2), pp. 437-457.
[14] Shah, A., Laird, N. and Schoenfeld, D. (1997). A random-effects model for multiple characteristics with possibly missing data. Journal of the American Statistical Association, 92(438), pp. 775-779.
[15] Verbeke, G. and Molenberghs, G. (2009). Linear Mixed Models for Longitudinal Data. New York: Springer Verlag.
Congratulations to the winners of the βETA!