MET Special Edition βETA 2014
Price Discovery in the S&P 500 Index and Options Market
Explaining individual product choice behavior: online social contagion on twitter
A Multinomial Probit Model to Infer User Segments in On-line Search and Purchase Patterns
Flow formulations for the Time Window Assignment Vehicle Routing Problem
Evaluating the diversification benefits of new asset classes in a portfolio
βETA Special 2014
Award ceremony followed by a celebratory drink!
Date: 30 January 2015
Place: Erasmus Paviljoen
Time: 14:00
Info: beta-rotterdam.nl
Contents
4  Price Discovery in the S&P 500 Index and Options Market – Gelly Fu
8  Explaining individual product choice behavior: online social contagion on twitter – Didier Nibbering
14 A Multinomial Probit Model to Infer User Segments in On-line Search and Purchase Patterns – Sabine den Daas
20 Flow formulations for the Time Window Assignment Vehicle Routing Problem – Kevin Dalmeijer
24 Evaluating the diversification benefits of new asset classes in a portfolio – Derrick Olij
Index of Advertisers
Econometrie.com  3
Veneficus  Back Cover
Colophon
Medium Econometrische Toepassingen (MET) is the scientific journal of FAECTOR (Faculty Association Econometrics & Operations Research), a faculty association for students of the Erasmus University Rotterdam.
Address | Erasmus University Rotterdam | Medium Econometrische Toepassingen | Room H11-02 | P.O. Box 1738 | 3000 DR Rotterdam | The Netherlands | marketing@faector.nl | Acquisition: Bob Maks | +31 - (0)10 - 408 14 39
Website: www.faector.nl/met
Final Editing: Lars van Kempen, Susanne Pols | Design: Haveka, de grafische partner, +31 - (0)78 - 691 23 23 | Printer: Nuance Print, +31 - (0)10 - 592 33 62 | Circulation: 150 copies
©2015 - No portion of the content may be directly or indirectly copied, published, reproduced, modified, displayed, sold, transmitted, rewritten for publication or redistributed in any medium without permission of the editorial board.
Special Edition

Dear reader,

Following the successful previous editions of the Best Econometric Thesis Award (βETA), FAECTOR will continue to organise this prestigious event in cooperation with Veneficus! To honour tradition we have also released a special edition of the MET magazine, also known as the βETA-MET. The βETA is an award organised for our students in Econometrics who have worked incredibly hard on their theses. In previous years the βETA only hosted an award ceremony for Master students in Econometrics & Management Science, but this year the βETA will also reward a Bachelor student for his or her exceptional effort. Students who deem their theses fit for competition can sign up through our website. A selection procedure determines which theses make it into the nominee phase. Three Bachelor theses and four Master theses are nominated, competing against each other within their respective study stages. The nominations are made based on the following criteria: innovation, reproducibility, scientific contribution, potential for publication, and entrepreneurial potential.

I proudly present the nominated Master students for the βETA 2014:
Didier Nibbering – Econometrics
Kevin Dalmeijer – Operations Research & Quantitative Logistics
Sabine den Daas – Business Analytics & Quantitative Marketing
Derrick Olij – Quantitative Finance

A warm congratulations to the authors of the nominated Bachelor theses:
Gelly Fu
Hao Zhou
Mathilde Aarnink

During the event itself our exclusive jury will announce the winners in both categories during the award ceremony. The jury consists of top members of the Econometric Institute together with a representative of our sponsor Veneficus.

Veneficus is a specialist in transforming complex data analyses into clear, visual output. They obtain the very best from your numbers and furthermore provide an improved integration of IT, finance and marketing processes.

This MET presents the adjusted versions of the theses written by the nominees. So are you a Master or Bachelor student who would like to know how to write a good thesis? Read this magazine to gain important insights into this academic skill and to learn more about the topics our nominees have researched. The winners will not only have a publication in our academic magazine, but will also be awarded nice prizes during the ceremony!

Interested in the βETA? Don't hesitate to join the ceremony, which will take place on Friday the 30th of January. Veneficus will give an interesting workshop before the ceremony starts, which will definitely be worthwhile. All in all a great academic day, which will end with a celebratory drink to congratulate the winners once more.

Please enjoy reading this special edition of the MET and I hope to see you at the ceremony,

Lars van Kempen
Educational Officer of the 49th board

Special thanks to:

The master thesis coordinators of the Econometric Institute:
Prof. dr. Richard Paap – Econometrics
Dr. Wilco van den Heuvel – Operations Research & Quantitative Logistics
Dr. Erik Kole – Quantitative Finance
Prof. dr. Dennis Fok – Business Analytics & Quantitative Marketing

The Master jury:
Prof. dr. Dick van Dijk
Dr. Remy Spliet
Prof. dr. Patrick Groenen
Mr. Joost van der Zon

The Bachelor jury:
Dr. Christiaan Heij
Mr. Bart Keijsers
Mr. Hoksan Yip

Veneficus:
Mr. Robbert Bos
Mr. Joost van der Zon
Price Discovery in the S&P 500 Index and Options Market
Gelly Fu
Erasmus University Rotterdam
An index market and the corresponding index options market are linked by the same latent true price. Nevertheless, due to frictions between the markets, new information impacts the two markets in different ways. Price discovery leaders are the markets in which information is processed quickest and most efficiently. The literature tries to explain differences in price discovery leadership with the trading cost hypothesis and the leverage hypothesis. This research uses intraday quotes of the S&P 500 index market and the S&P 500 options market to analyse the price discovery in each market, using different measures of the price discovery process. The results indicate that there is significant price discovery in both markets, and evidence is found supporting the leverage hypothesis.
Introduction
In perfectly efficient and frictionless markets new information impacts the prices of all the different markets simultaneously. All prices move to a new equilibrium in unison, otherwise arbitrage opportunities would exist in the markets. As a consequence, informed investors are indifferent between trading in these markets. In practice, however, frictions exist and markets are not completely efficient, and thus new information will affect different markets in different ways. This imperfection has attracted much attention to price discovery, the process by which markets incorporate new information to arrive at a new equilibrium price. An interesting area of price discovery research is the comparison between the market of a primary asset and the market of its derivatives, such as options. As options would be redundant assets in frictionless and efficient markets, it is important to understand the price discovery process between the primary market and the options market. Knowledge of this price discovery process is relevant to options market makers concerned with managing adverse selection risk, as well as to investors who are searching for signals about future price movements (Chakravarty et al., 2004). Surprisingly little research has been done on the price discovery process between the stock index market and the index options market. The purpose of this research is to fill this gap in the literature and to provide more insight into the field of price discovery.

Data
Intraday price series of the S&P 500 index and S&P 500 options extracted from the Bloomberg database are used in this research. This highly liquid stock index enhances the price discovery analysis, as the intraday data has ample observations to make the results robust. Due to limited data availability, the sample covers the period from March 24th 2014 until April 16th 2014, which is the period between the expiration days of the options in March 2014 and April 2014. As derivatives with short-term maturities are the most liquid, the S&P 500 stock options that expire on April 19th 2014 are used in this research. Also, due to a lack of observations and non-synchronicity, only the options with a strike of 1800 and 1900 are considered.
Methodology
Implied Stock Index Prices
As Holowczak et al. (2006) mention, an important concern is that the price of an option depends not only on the price of the underlying asset, but also on the underlying asset's return volatility and higher moments. This creates an issue of how to identify and control for variables other than the underlying asset price, and thus causes a bias if left unchecked. Following Holowczak et al. (2006), this concern is resolved by forming a portfolio of a long position in a call option and a short position in a put option with the same strike and maturity. This portfolio approach exploits the put-call parity:
C_t - P_t = S_t e^{-d_t \tau} - K e^{-r_t \tau}   (1)
This equality holds for a stock (St), a European call (Ct) and a European put (Pt), both with the same maturity (τ) and the same strike price (K). The dividend yield and interest rate at time t are denoted as dt and rt respectively. The equality holds exactly for the S&P 500 index, as the options on this index are European style. The resulting portfolio value does not depend heavily on the volatility of the stock index returns or on other higher moments. The chosen call and put options, which depend on the option trading activities, are factors that can influence the portfolio value. Holowczak et al. (2006) argue that the relative magnitude of the portfolio value at different times can differ by approximately a constant. They solve this issue by regressing the stock price on the portfolio value:
(2)

The fitted values of this regression are the final option-implied stock index prices, denoted as Ot. Henceforth the price series St and Ot are used for the price discovery analysis.
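As an illustration of this construction, the sketch below builds the option portfolio and runs the correcting regression, assuming synchronised mid-quote arrays are available and that the portfolio value is simply C_t - P_t; the exact specification used in the thesis may differ, and all names are illustrative.

```python
import numpy as np

def implied_index_prices(call, put, index):
    """Option-implied index prices from a long-call/short-put portfolio.

    call, put, index: numpy arrays of synchronised mid quotes for one
    strike and maturity. Returns the fitted values O_t of a regression of
    the index on the portfolio value, which absorbs the (approximately
    constant) level difference between the two series.
    """
    portfolio = call - put                              # long call, short put
    X = np.column_stack([np.ones_like(portfolio), portfolio])
    coef, *_ = np.linalg.lstsq(X, index, rcond=None)    # OLS of S_t on the portfolio value
    return X @ coef                                     # fitted values O_t
```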
The relative contribution of an asset price to the true price innovation variance is defined as the information share (Hasbrouck, 2003). In short, these information shares measure the relative importance of a market in the price discovery process. When a cointegration relation between the price series is established, the method of Baillie et al. (2002) can be applied to obtain the information shares of the stock index market and the index options market. This method allows us to compute the information shares in a straightforward manner. To begin, the price series are modelled with the following Vector Error Correction Model (VECM), denoting Yt as the 2 × 1 vector of the price series St and Ot:

\Delta Y_t = \alpha \beta' Y_{t-1} + \sum_{j=1}^{k} A_j \Delta Y_{t-j} + u_t   (3)
This model uses the cointegration relation with cointegration vector β = (1, -1)', and α is the 2 × 1 error correction vector, which measures the correction of the price series towards their long-run cointegration relation. The 2 × 2 coefficient matrices Aj capture the short-run dynamics that are induced by market imperfections. The number of lags included is k, which is chosen based on the Schwarz Information Criterion (SIC). The 2 × 1 innovation vector ut has a mean of zero and is serially uncorrelated, with the following covariance matrix, denoting σ1² (σ2²) as the variance of u1,t (u2,t) and ρ as the correlation between u1,t and u2,t:
\Omega = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}   (4)
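A minimal numpy sketch of how the VECM in (3) could be estimated equation by equation with OLS, assuming the cointegration vector (1, -1)' is imposed rather than estimated; it returns the error correction vector and the residual covariance matrix that are needed later for the information shares. Function and variable names are illustrative.

```python
import numpy as np

def estimate_vecm(S, O, k):
    """OLS estimation of a bivariate VECM with known cointegration vector (1, -1)'.

    S, O : price series (numpy arrays of equal length)
    k    : number of lagged differences
    Returns (alpha_hat, Omega_hat): the 2x1 error correction vector and
    the 2x2 residual covariance matrix.
    """
    Y = np.column_stack([S, O])
    dY = np.diff(Y, axis=0)                              # Delta Y_t
    ect = (S - O)[k:-1]                                  # beta' Y_{t-1} with beta = (1, -1)'
    lags = [dY[k - j - 1:-j - 1] for j in range(k)]      # Delta Y_{t-1}, ..., Delta Y_{t-k}
    X = np.column_stack([np.ones_like(ect), ect] + lags)
    dYt = dY[k:]
    B, *_ = np.linalg.lstsq(X, dYt, rcond=None)          # one regression per equation
    resid = dYt - X @ B
    alpha_hat = B[1]                                     # coefficients on the error correction term
    Omega_hat = np.cov(resid, rowvar=False)
    return alpha_hat, Omega_hat
```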
Information Shares
In order to quantify the price discovery of a market, the information shares of Hasbrouck (1995) are computed. This measure focuses on the underlying true price of the asset prices and relies on the assumption that price volatility reflects new information. The innovation variance of this true price is decomposed into components that can be attributed to the innovations of each of the asset prices.
Gonzalo and Granger (1995) introduce a method that is based on the VECM to measure the permanent component in cointegrated systems. This component, which they call the “common factor”, is closely related to the information share of Hasbrouck (1995). This common factor associates the price discovery to the adjustment of a market to the price movements of the other markets and it ignores the correlations between the markets. Furthermore, the innovations to the common factor are assumed to be permanent and driven by new information. The common factor is a linear combination of the price series, where the weights are assigned by the common factor coefficient vector. These common factor coefficients measure the long-run impact of the price innovations to
the future prices; therefore the coefficients are used as an alternative measure of price discovery. The common factor coefficients are also required for the computation of the information shares in the method of Baillie et al. (2002). The common factor coefficient vector is derived by Gonzalo and Granger (1995) as the normalized orthogonal complement of the error correction vector:
\gamma = (\gamma_1, \gamma_2) = \left( \frac{\alpha_2}{\alpha_2 - \alpha_1}, \; \frac{-\alpha_1}{\alpha_2 - \alpha_1} \right)   (5)
Hasbrouck (1995) explains that the information shares of a market are not unique, due to the correlation of the price innovations across markets, possibly caused by time aggregation. A suggested solution is to minimize the interval between observations to reduce the correlations. Another solution is to obtain lower and upper bounds by applying the Cholesky factorization to the covariance matrix Ω:
\Omega = M M'   (6)

where M is the lower triangular Cholesky factor of Ω.
The information shares depend on the ordering of the price series in the factorization. The upper (lower) bound of market i’s information share is obtained when the price series is the first (last) variable in the factorization. Baillie et al. (2002) show that the bounds of the information shares can be computed as follows1:
IS_i^{up} = \frac{(\gamma_i \sigma_i + \gamma_j \rho \sigma_j)^2}{\gamma' \Omega \gamma},   (7a)

IS_i^{low} = \frac{\gamma_i^2 \sigma_i^2 (1 - \rho^2)}{\gamma' \Omega \gamma},   (7b)

where IS_i^{up} and IS_i^{low} are the upper bound and lower bound of the information share of market i, respectively, and j denotes the other market.
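Given the estimated error correction vector and innovation covariance matrix, the common factor weights and the Hasbrouck bounds can be computed directly from the Cholesky logic in (6), once for each ordering of the price series. The sketch below assumes a bivariate system and uses illustrative names.

```python
import numpy as np

def information_share_bounds(alpha, Omega):
    """Gonzalo-Granger common factor weights and Hasbrouck information share bounds.

    alpha : length-2 vector of error correction coefficients
    Omega : 2x2 innovation covariance matrix
    Returns (gamma, bounds) where bounds[i] = (lower, upper) for market i.
    """
    # Normalised orthogonal complement of alpha: gamma' alpha = 0, weights sum to 1
    gamma = np.array([alpha[1], -alpha[0]])
    gamma = gamma / gamma.sum()
    total = gamma @ Omega @ gamma                 # variance of the common factor innovation

    bounds = []
    for i in range(2):
        shares = []
        for order in ([0, 1], [1, 0]):            # market i first or last in the factorisation
            M = np.linalg.cholesky(Omega[np.ix_(order, order)])
            g = gamma[order]
            contrib = (g @ M) ** 2 / total        # information shares under this ordering
            shares.append(contrib[order.index(i)])
        bounds.append((min(shares), max(shares)))
    return gamma, bounds
```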
Baillie et al. (2002) provide evidence that the mean of these bounds has a useful interpretation, and hence the mean of the bounds can be used for the price discovery analysis.

Results
As mentioned before, the number of lags of the VECM is chosen based on the SIC, which results in 9 lags and 6 lags for the data sets with a strike of 1800 and 1900, respectively. Table 1 reports the common factor coefficients of Gonzalo and Granger (1995) and the bounds of the information shares of Hasbrouck (1995), along with the mean of the bounds, for the S&P 500 index and the S&P 500 options. Notice that the bounds of the information shares are rather wide. This is due to the high correlations between the markets, with correlation coefficients of 0.43 and 0.46 for the data sets with a strike of 1800 and 1900, respectively. The frequency of the observations in the data set appears not to be high enough to eliminate these correlations. The common factor coefficients are unique, because they are unaffected by the correlations; these coefficients, along with the mean of the bounds, are used to analyse the price discovery of the markets. The results indicate that the stock index is the leader in price discovery compared to the options with a strike of 1800; this can be seen from both the common factor coefficient and the information shares. However, when the stock index is compared with the options with a strike of 1900 the opposite is observed: the options lead the stock index. This difference can be explained by the fact that in the sample period the portfolio of options in equation (1) is in-the-money (ITM) for the data set with a strike of 1800, whereas the portfolio is out-of-the-money (OTM) for a strike of 1900. OTM options provide more leverage, because their prices are lower than those of ITM options. So the differences in price discovery can be explained by the differences in leverage, and thus the results conform to the so-called leverage hypothesis: higher leveraged options provide better price discovery.

Table 1: Information share results of the S&P 500 index and S&P 500 options
Conclusion
This research examines the price discovery process between the S&P 500 index market and the S&P 500 options market. The portfolio approach of Holowczak et al. (2006) is used to obtain option-implied stock index prices, which do not depend on the volatility and other higher moments of the underlying asset's return. The information shares of Hasbrouck (1995) and the common factor coefficients of Gonzalo and Granger (1995) are computed to quantify the price discovery of each market. The results show that the S&P 500 index market leads the price discovery relative to ITM options, but the opposite is observed when the stock index is compared to OTM options: then the options market is the leader in price discovery. These results provide evidence supporting the leverage hypothesis. It should be mentioned that the bounds of the information shares are extremely wide, which might indicate that the results are not robust. Further research with higher-frequency data should be done to obtain tighter information share bounds and to confirm the results of this research.
Notes 1 See Baillie et al. (2002) for an extensive derivation of these information share bounds.
References
[1] Baillie, R. T., Geoffrey Booth, G., Tse, Y., and Zabotina, T. (2002). Price discovery and common factor models. Journal of Financial Markets, 5(3):309-321.
[2] Chakravarty, S., Gulen, H., and Mayhew, S. (2004). Informed trading in stock and option markets. The Journal of Finance, 59(3):1235-1258.
[3] Gonzalo, J. and Granger, C. (1995). Estimation of common long-memory components in cointegrated systems. Journal of Business & Economic Statistics, 13(1):27-35.
[4] Hasbrouck, J. (1995). One security, many markets: Determining the contributions to price discovery. The Journal of Finance, 50(4):1175-1199.
[5] Hasbrouck, J. (2003). Intraday price formation in US equity index markets. The Journal of Finance, 58(6):2375-2400.
[6] Holowczak, R., Simaan, Y. E., and Wu, L. (2006). Price discovery in the US stock and stock options markets: A portfolio approach. Review of Derivatives Research, 9(1):37-65.
Explaining individual product choice behavior: online social contagion on twitter
Didier Nibbering
Erasmus University Rotterdam
The ability to learn from online behaviour stands or falls with a good understanding of the influence of network connections. In this paper we construct a novel econometric framework to examine whether social contagion is at work within an online social network. We apply the methods to a Twitter dataset on the talent contest "The Ultimate Dance Battle" on Dutch television. Because effects of contagion are easily confounded with other effects, we model the individual decision-making process for posting messages in the online social network. In a dynamic multivariate multinomial logit model we distinguish between messages at topic and sentiment level and consider the presence of habit persistence in online behaviour. We correct for homophily in estimates of social contagion effects by including latent classes which capture unobserved individual heterogeneity. In the case-based analysis we are able to distinguish between different motives for online choice behaviour. We find individuals' base preferences and state dependence to be the main driving forces. Moreover, we find evidence for online social contagion of positive posts about participating teams of the talent show.
Introduction
The past few years online social networks have proved to be of large social impact. This holds especially for the microblogging site Twitter, which not only plays an important role in news supply, but is also frequently a major driver of publicity. Examples range from witch-hunts against bankers after the financial crisis of 2007 to the Twitter revolutions in the "Arab Spring" (Lotan et al., 2011). A growing number of businesses and authorities are already tracking Twitter to gauge sentiment and avert potential public-relations problems. However, nobody has the answer on how to respond to mass convergence events, in which a fast growing number of people form a group with a shared opinion. The success of network-based intervention strategies depends on whether people actually influence one another in an online setting (Aral, 2011). Christakis and Fowler (2011) state that "when there is no contagion, then no matter how successful the detailing intervention is, it will not lead to cascade effects". So the key question is whether choices of online media users are driven by choices made in their network. The spread of behaviour from one social network user to another is called social contagion. McGuire et al. (1985) define this process more precisely: "one person serves as the stimulus for the imitative actions of another". Although social contagion within social networks has been the subject of many studies, three main problems arise in this research field. First, it is very cumbersome to obtain good network data that is both quantitatively and qualitatively sufficient. Second, the effects of contagion are easily overestimated. Finally, there is no clarity regarding which types of behaviour are subject to social contagion. Social influence within a network can imply the power to determine which topic is discussed. On the other hand, influence can be measured as the ability to spread a certain opinion. We overcome these difficulties by using the most recent developments in both social networks and econometric techniques. First, we apply our network analysis to Twitter. Because of the openness and availability of messages posted on Twitter we do not have to deal with imperfect response rates or imperfect accuracy of responses. Second, we specify a novel individual choice behaviour model in which we estimate the effects of contagion in the social network, while accounting for individual-specific heterogeneity among the members of the network.
Moreover, we consider potential habit persistence and the possibility that choices made by individuals are caused by their own (past) behaviour patterns rather than by the influence of their social network. By allowing for various forms of influence on the individual decision-making process, in combination with a carefully constructed estimation method, we try to avoid overestimation of the contagion effect. In short, we have to correct for homophily and habit persistence to obtain reliable estimates of social contagion effects. Therefore we first investigate the individual choice behaviour on Twitter and subsequently determine which part of this behaviour can be explained by online social contagion. We distinguish between two levels of contagion. We examine whether activity in online social networks incites individuals to post, and whether Twitter users change or retain the same opinion about a topic as a result of views expressed by connected network users.

Data
We use a data set consisting of messages from the microblogging site Twitter about the television program "The Ultimate Dance Battle", collected by Koster (2012). The Ultimate Dance Battle is a talent contest for dancers on Dutch and Flemish television. Five choreographers and their teams compete against each other. In the final four live shows, one team is eliminated each time based on viewers' votes and the opinions of expert jury members. In the fourth live show, there are two dance teams left and the winner is determined by the votes of the viewers. Throughout, viewers are encouraged to tweet about the show with the hashtag #tudb. Our data set includes all tweets with this hashtag sent during the four weeks in which the four live shows are broadcast. The data makes it possible to examine the effect of the choices made in the previous time period by the social network of an individual on his or her own choices in the current time period. Moreover, we can estimate which part of individual choice behavior can be explained by decisions made in the past. We create five unordered categorical dependent variables corresponding to the five teams participating in the dance competition, using the choreographers' names: Vincent, Thom, Min Hee, Michel and Jaakko.
We perform a sentiment analysis on the data set to assign tweets to a choice category, using the SentiStrength tool, which supports both English and Dutch messages (Thelwall et al., 2011). Each variable corresponds to a tweet decision about a team, and each category in the variables corresponds to the chosen tweet category about that team (send no tweet, send a negative tweet, send a neutral tweet, send a positive tweet) in a certain time period by a certain individual. The dependent variable consists of the last three time periods and the first three time periods are included in a lagged dependent variable. We construct an independent variable on the basis of the directed network ties among the individuals in the data set, distinguishing between followers and friends, i.e., the people whom the person follows. By including the decisions of the friends of an individual, we consider contagion in the online social network of the individual. We also construct a dummy variable indicating which candidates are voted out of the show.

Method
To examine heterogeneous preferences and social contagion we adopt an econometric framework. In this section we first specify a model for the data set and subsequently discuss the parameter estimation method.

Model Specification
We observe the individual choice behavior of Twitter users over multiple time periods in which they make decisions among the same sets of alternatives. At the same time, the individuals are connected to each other through network ties and are therefore aware of each other's choice behavior. Hence, we have an ideal setting to study online social contagion. To avoid confounding network influence with other motives to tweet, we first model the decision-making process and subsequently estimate which part of tweet behavior can be explained by online social contagion. We are dealing with tweet decisions over multiple categories for five different dance teams. The preferences of Twitter users for different dance teams can be correlated, which means that we have to take correlation between decisions into account.
To model the decision-making process of the Twitter users we specify a multivariate model for multinomial choices. By including decisions made in the past as explanatory variables for present choice behavior, we make the model dynamic. In a non-nested multinomial model we assume that Twitter users simultaneously consider all alternatives per dance team, that is, sending no tweet or sending a negative, neutral or positive tweet about a team. However, it may be more plausible that they first decide to send a tweet about a certain team and subsequently determine the tone of the message. We model this process in a dynamic multivariate multinomial nested logit model. Finally, we account for individual-specific heterogeneity among the individuals by implementing a panel data structure. By allowing for in-sample heterogeneity we can distinguish between differences in base preferences of Twitter users and correct for homophily in the estimates of the effect of network influence. Bel and Paap (2014) propose a new multivariate multinomial logit model which deals with multiple decisions over multiple categories and also allows for correlation between decisions. This model has two advantages which become clear in our application. First, we can clearly distinguish the estimated effects of different tweet motives in terms of log odds ratios. Second, parameter inference is feasible using a composite likelihood approach, even in our case where we have four to the power five potential outcomes per individual. We discuss this estimation method in the next section. We specify the multivariate multinomial logit model based on the structure of the decision-making process of the individuals in the data set: the Twitter users. An individual, labelled i, faces in decision j a choice among Kj categories. We have five decisions concerning the participating dance teams, each including four categories: send no tweet, a negative tweet, a neutral tweet, or a positive tweet about that team. The multivariate multinomial logit model is derived from the conditional probabilities for category k given all other decisions made in the same time period. To explain the tweet behavior of an individual by her or his own behavior in the past, we make the model dynamic by including the lagged dependent variable as a regressor in the model. By making use of the panel structure of the data we can correct for heterogeneity among the Twitter users.
Because we have observations over different time periods for each individual we can specify individual-specific base preferences. The inclusion of these fixed effects creates two difficulties. First, the number of incidental parameters grows in the number of individuals and in the number of decisions, which makes estimating independent parameters for each individual infeasible. Second, because there are only four time periods, we have insufficient observations per individual to produce reliable estimates. By segmenting the individuals into latent classes we strongly reduce the number of parameters while still taking heterogeneity into account. Moreover, we are able to distinguish between segments of individuals with similar base preferences towards choice categories. We obtain the following expression for the dynamic panel data multivariate multinomial logit model:
(1)
where i = 1,...,n, t = 2,...,T, si = 1,...,S, and S < N is the number of latent classes. We define the set of all possible realizations of Yit, and denote by yit = (yit1,...,yitJ) the realized outcome vector of Yit. Moreover,
(2)

where I[yi,t-1,j = r] and I[yitl = r] are indicator functions. The parameters capture the differences in base preferences towards the tweet options per latent class. The dummy variable indicating which candidates have already been voted out at the start of show t is represented by xt. Including this variable not only accounts for changes in base preferences as a result of the drop-out of a team, but also factors in that events during the last show influence the base preferences of individuals. The explanatory variable "Friends", zitjr, counts for each individual i in time period t the number of friends who chose a certain category r in decision j. The corresponding parameter represents the effect of online social contagion, and the parameter on the lagged dependent variable the effect of state dependence. We take the interdependencies between choosing category k at decision j and category r at decision l ≠ j in the same time period into account in the cross-dependence parameters. To identify all parameters we impose the standard identification restrictions of the multivariate multinomial logit model; see Bel and Paap (2014) for details.

Parameter Estimation
We have to estimate the parameters of a dynamic panel data multivariate multinomial logit model with an increasing number of latent classes. We estimate the parameters by means of maximum likelihood estimation. We assume that yit, conditional on the regressors xt, zit and yi,t-1 and the parameters, is independently and identically distributed. The estimation method is not as easy as it appears at first sight. The complexity of the model raises three major problems. First, the evaluation of the log-likelihood is computationally complex due to the summation over all outcomes for all observations of all individuals. This multivariate multinomial logit structure makes maximum likelihood estimation impractical in larger problems, given that the outcome space grows exponentially in the number of decisions. We therefore opt for an alternative approach called composite likelihood. This method provides simplified estimation, yielding significant computational time gains in large problems, at the expense of maximizing a slightly misspecified likelihood (Varin et al., 2011).
The next two issues come along with the inclusion of a panel data structure in the model. The number of incidental parameters grows in the number of individuals and in the number of decisions. We explained that we solve this problem by implementing a latent class structure. Because direct maximization of the likelihood function of the model with latent classes does not work smoothly, we use an iterative estimation routine: the Expectation-Maximization (EM) algorithm (Dempster et al., 1977). Finally, due to the inclusion of the lagged dependent variable yi,t-1 in the model we cannot simply take the product over the time periods t = 2,...,T in the likelihood function. The dynamic panel structure of the model causes inconsistency in the standard maximum likelihood estimates when multiple latent classes are included. We use the Heckman approach to the problem of initial conditions (Heckman, 1987).

Results
To obtain new insights into tweet behavior we first consider the number of segments of Tweeters with different preferences. The optimal model contains three segments of Tweeters. We find one large segment including 66.4 percent of the individuals and two small segments with 14.2 and 19.4 percent of the individuals, respectively. Within a segment of Twitter users the individuals are assumed to have the same individual-specific characteristics, while the characteristics are different across the distinct segments.
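To make the latent-class EM idea from the Parameter Estimation section concrete, the sketch below fits a heavily simplified latent-class model for a single repeated categorical choice: no covariates, no dynamics and no multivariate structure, so it only illustrates the E- and M-steps. All names are illustrative.

```python
import numpy as np

def latent_class_em(y, n_classes, n_categories, n_iter=200, seed=0):
    """EM for a latent-class multinomial model without covariates.

    y : (n_individuals, n_periods) array of observed category indices
    Returns (pi, theta): class shares and per-class category probabilities.
    """
    rng = np.random.default_rng(seed)
    n, T = y.shape
    pi = np.full(n_classes, 1.0 / n_classes)                  # class shares
    theta = rng.dirichlet(np.ones(n_categories), n_classes)   # (S, K) category probabilities

    # sufficient statistics: how often each individual chose each category
    counts = np.stack([(y == k).sum(axis=1) for k in range(n_categories)], axis=1)

    for _ in range(n_iter):
        # E-step: posterior class membership probabilities (in logs for stability)
        log_lik = counts @ np.log(theta).T + np.log(pi)       # (n, S)
        log_lik -= log_lik.max(axis=1, keepdims=True)
        post = np.exp(log_lik)
        post /= post.sum(axis=1, keepdims=True)

        # M-step: update class shares and category probabilities
        pi = post.mean(axis=0)
        theta = post.T @ counts + 1e-9                        # small constant avoids log(0)
        theta /= theta.sum(axis=1, keepdims=True)
    return pi, theta
```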
Figure 1: Changes in odds ratios over time per tweet sentiment for each team. The first figure shows the development of the preferences of segment one and so on. The odds ratio becomes a number of times larger or smaller equal to the corresponding exponentiated coefficient, as a result of a one unit increase in the associated explanatory variable. The teams are referred to by color and the sentiment of the tweets per team by shades of the team colors and different shapes.
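The caption's statement about exponentiated coefficients can be made concrete with a made-up number:

```python
import math

coef = 0.8                        # hypothetical estimated logit coefficient
print(round(math.exp(coef), 2))   # 2.23: a one unit increase in the explanatory variable
                                  # multiplies the odds (vs. the base category) by about 2.23
```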
The base preferences for specific dance teams follow from these individual-specific characteristics. Because there can be differences in quality among the performances of the participating teams, and one team simply drops out after every show, these base preferences are likely to change after each live show. The changes over time in the relative preferences for tweeting a certain sentiment about the different teams, as defined by the odds ratios, are shown in Figure 1. Each box presents the development of the base preferences of a different segment of Tweeters. The first one shows the tendency of being positive about team Thom, which is at its peak in the final show. In the second box we find a similar pattern in favour of team Vincent. The third box clearly represents the base preferences of the doubters between team Thom and team Vincent. We see that the preference for sending a tweet over sending no tweet increases fast for the two finalists but does not vary that much for the other candidates. Besides the fact that base preferences toward teams can change over time as a result of changes in the competition, we also take into account that the base preferences for teams can be correlated. Figure 2 shows an overview of the estimated cross-dependencies.
To get a good impression of the relations between the other teams, we analyse the cross-dependencies without team Jaakko included; there is a large amount of uncertainty in these estimates due to the small number of observations after the early drop-out. We see that almost all cross-dependencies are positive, except for the cross-dependencies between team Thom and team Vincent. So this visualization indicates that strong competition exists only between the two finalists of the talent show. Up to now, we have discussed the individual choice behavior of the Tweeters through their base preferences, corrected for influences from the outside and the influence of own behavior in the past. The box on the left in Figure 3 considers the social contagion effect, that is, the effect of the behavior of the network in the past. We find a clear social contagion effect, but this effect is only present between positive tweets. The box on the right side of the figure shows the effect of own tweet behavior in the past on the tweet behavior in the present. We also find here the strongest effect between positive tweet behavior in the past and positive tweet behavior in the present.
Figure 2: Estimated cross-dependencies between the tweet categories per team, without team Jaakko. Red lines indicate negative relations and green lines positive relations. The darker the colour of the line, the stronger the cross-dependency between the two connected tweet categories.
Discussion
This study examined whether online social contagion is at work in a dataset concerning a talent show. First we constructed a model for individual choice behaviour on Twitter. The use of a multivariate multinomial logit model enables us to consider tweets at both topic and sentiment level. By distinguishing between social contagion and other motives that cause similar behaviour between individuals, we reduce the risk of overestimating the effects assigned to the influence of network connections. An important factor is homophily, which arises from segments of individuals who not only share network ties but also individual-specific characteristics. Second, the tendency to be consistent in choice behaviour can result in groups which converge to certain behaviour patterns. We account for these motives for specific tweet behaviour by taking habit persistence into account and by segmenting individuals into latent classes. We find significant effects for both habit persistence and differences in base preferences among segments of Twitter users, which indicates that it is important to consider these effects when examining the presence of contagion in a social network.
Figure 3: Changes in odds ratios as a result of network influence and own behavior in the past. The figure on the left shows the change in odds ratios for a team as a consequence of the received tweets about that team with different sentiments via network ties. The figure on the right shows the same effect following from sending a tweet about the same team with a certain sentiment in the previous period. The odds ratio becomes a number of times larger or smaller equal to the corresponding exponentiated coefficient, as a result of a one unit increase in the associated explanatory variable.

When we assume that the sender of a tweet comprising a positive sentiment and the name of a candidate feels positive about this candidate, we can distinguish between fan clubs of finalists of the talent contest. Moreover, we find that strong competition exists only between the two finalists. Although the performance of the sentiment classifier of the tweets is doubtful, we can distinguish between effects at sentiment level. The most remarkable result addresses the question whether online social contagion is at work: we find evidence only for contagion of positive messages about candidates.

References
[1] Aral, S. (2011). "Commentary: Identifying social influence: A comment on opinion leadership and social contagion in new product diffusion." Marketing Science 30 (2), 217–223.
[2] Bel, K. and R. Paap (2014). "A Multivariate Model for Multinomial Choices." Econometric Institute Report EI 2014-25, Erasmus University Rotterdam.
[3] Christakis, N. A. and J. H. Fowler (2011). "Commentary: Contagion in prescribing behavior among networks of doctors." Marketing Science 30 (2), 213–216.
[4] Dempster, A. P. et al. (1977). "Maximum likelihood from incomplete data via the EM algorithm." Journal of the Royal Statistical Society 39 (1), 1–38.
[5] Heckman, J. J. (1987). The incidental parameters problem and the problem of initial conditions in estimating a discrete time-discrete data stochastic process and some Monte Carlo evidence. University of Chicago Center for Mathematical Studies in Business and Economics.
[6] Koster, S. (2012). "Modelling individual and collective choice behaviour in social networks: An approach combining a nested conditional logit model with latent classes and an Agent Based Model." MA thesis, Erasmus University Rotterdam, the Netherlands.
[7] Lotan, G. et al. (2011). "The Arab Spring | The revolutions were tweeted: Information flows during the 2011 Tunisian and Egyptian revolutions." International Journal of Communication 5, 31.
[8] McGuire, W. et al. (1985). "Handbook of social psychology." Handbook of Social Psychology.
[9] Thelwall, M. et al. (2011). "Sentiment in Twitter events." Journal of the American Society for Information Science and Technology 62 (2), 406–418.
[10] Varin, C. et al. (2011). "An overview of composite likelihood methods." Statistica Sinica 21 (1), 5–42.
A Multinomial Probit Model to Infer User Segments in On-line Search and Purchase Patterns
Sabine den Daas
Erasmus University Rotterdam
A Bayesian framework of a multinomial probit model is extended with a segmentation approach that simultaneously models the choice of events and segments in users' click stream patterns, such that it captures the relation between segment memberships, event choice and descriptive characteristics. Subsequently, a Bayesian learning model is applied to update our beliefs about a new user's segment and his/her near-future stream of click preferences. Since insights into user segments make it possible to reach out to the Web site's customers in the way these customers prefer, the results of the proposed model could be used to increase customer satisfaction and loyalty.
Introduction
In an attempt to gain knowledge about their customers, almost every company tracks its users' paths on its Web site by streaming every click on every check box, button etcetera to its data warehouse. The growing velocity of obtaining and storing data makes analyses on enormous databases, which contain such click stream information, a subject of increasing importance to companies. Unfortunately, the majority of these companies do not yet have access to the appropriate methods that allow them to obtain the relevant information from their databases efficiently. Furthermore, if they have obtained data, most of them are not aware of the analyses they could do on this history (Brown et al., 2011). In this research we aim to contribute to a solution to the above mentioned unawareness of analysis possibilities in the Big Data field. Note that the click streams of users can be seen as a sequence of choices between multiple events. The primary goal of this research is to use a Big Data case to estimate the search and purchase patterns of Web site users and group these users according to their choice at every click. For this purpose, we develop a novel approach to infer Web site users' search and purchase patterns from click stream data. We extend the Bayesian multinomial probit model suggested by Chib and Greenberg (1998) with a segmentation model that simultaneously estimates the event choice and segment parameters in users' click stream patterns, such that it captures the relation between segment memberships, event choice and descriptive user characteristics. Segmentation methods from Kamakura et al. (1994) and Kamakura (2009) are building blocks for the segmentation part of this model. Secondly, a behavioral model of consumer sequential product search (Häubl et al., 2010) is reviewed and used in this research as a guideline for the inclusion of variables. Lastly, Greenberg (2008) provides computationally efficient simulation algorithms for Bayesian inference and model estimation. The second aim of this research is to forecast the group a newly arriving user might belong to. We aim to forecast the group choice and the choice pattern per group in real time, from as little information as possible, in a minimum number of actions by the user (clicks).
Here, a Bayesian learning model based on a method for latent segmentation in click stream analysis from Hauser et al. (2014) is applied to update our beliefs about a new user's segment and his/her near-future stream of click preferences. To show the practical relevance of this research we use a real Big Data case of the Dutch company Marktplaats.nl. Since insights into user segments make it possible to reach out to the Web site's customers, such that they are treated the way these customers prefer, the obtained model results could be used in practice to increase customer satisfaction and loyalty.

Methodology
User segment and paths
We build a model in which every user i chooses between J event types at event number t. The total number of events per month varies per user and is defined as Ti. Let Xitk be the kth event attribute variable at event t for user i, with k = 1,...,K and t = 1,...,Ti. The vector Xit contains the K event attribute values for user i at event t. Furthermore, we define the lth user-specific descriptive variable for user i as Zil. The L variables in Zi are constant over time and capture user-specific information. Finally, the observable event path is captured in y as follows: yijt = 1 if event t of user i is equal to an event of type j, and 0 otherwise. The probability that event t for user i is of type j, given that this user is a member of segment s, is given by (Kamakura et al., 1994):

(1)

where Si is the segment choice for user i and Uijt the choice utility of event type j derived at event t by user i. Here, we assume that the choice utilities are latent functions of the event attributes of event t (Kamakura et al., 1994):

(2)

where the coefficients are parameters to be estimated and the last term is the error term. We have to impose the parameter restriction
that the expectation of UiJt is equal to zero for all observations, hence the corresponding coefficients are set to 0 for all k and s. Due to independence among the choice utilities, we can define the probabilities in Equation 1 as shown in Equation 3.

(3)

For simplicity, we restrict all error variances to one and take a diffuse prior for the remaining parameters. The a priori probability that a user is a member of latent segment s is defined as (Kamakura et al., 1994):

(4)
Here, we assume that the latent segment variables of segment s by user i (Vis) are functions of the user characteristics of user i:
(5)
where the coefficient defines the impact of the lth user characteristic on this prior probability for segment s, and the last term represents the error term. For identification purposes these parameters are restricted to 0 for all l for a reference segment. We assume that the error term follows a normal distribution with a mean of zero and a standard deviation of one, and that there is little prior information on the distribution of the parameters. For ease of calculation, the complete data likelihood of user i, given that this user is a member of segment s, can be rewritten in terms of the latent choice utilities. The conditional data likelihood of user i given that this user is a member of segment s reads:
(6)
where Ui is the vector of choice utilities of user i. The parameters in Equations 2 and 4 are obtained by MCMC simulation with a Gibbs sampler.

Segment and Event Path Forecast of New Users
The main purpose of introducing a slightly different framework than in the previous paragraphs is to forecast a path of events for each new user, taking into account past click stream patterns of both the new user and current users, without knowing the assigned segment beforehand.
The following additional variables have to be defined to take all historical click stream data into account: the event path, a vector of the event types of user i up to and including event t; the vector of all event and user attributes (the Xitk's and Zil's) for user i at event t; the matrix of all previous attribute vectors up to and including event t; the matrix of all estimated previous latent Uijt's up to and including event t; and ws, the vector of average characteristics of the members of segment s. The characteristics in ws are assumed to be prior knowledge, following from the common event attribute information for the members of segment s and the average segment-specific user and event information in the parameter estimates. Given the observed event path of user i, the observed event attributes and characteristics of user i, and ws for all s, the probability that user i belongs to segment s is defined as in Equation 7,

(7)

where P0,s is the a priori probability that a user is a member of segment s. We define f as the total number of forecasted events. Then the probability that the event type vector of user i up to the last forecasted event is equal to the event path, given the segment membership, follows from Equation 8,

(8)

evaluated in the estimated mean of the utilities based on the segment information. With the use of this equation, the forecasted event path for all users is obtained.

Parameter estimation
To sample the parameters in our model we use the Gibbs sampler. The sampler substitutes the computation of high-dimensional integrals by sequences of low-dimensional random draws of single variables (Casella and George, 1992). Each variable is drawn in a predefined order; see Gelfand and Smith (1990) for details.

Full Conditional Distribution
The full conditional distribution of yijt is given by the probability that user i belongs to segment s and that simultaneously the choice utility at event t for this segment s is larger for j than for all other event types, given the data and parameters. This is shown in Equation 9.

(9)

Sampling Distribution of the Choice Utilities
The distribution of the choice utility Uijt follows from Equation 3. The posterior distribution of the choice utilities is a normal distribution truncated to the interval Bijt. Since we assume that these utilities are independent, we draw them from the truncated normal distributions given in Equation 10,

(10)

where we take the observations of our dependent variable into account by restricting the samples of Uijt to the interval Bijt, given in the system of equations in 11,

(11)

where Ui(-j)t denotes the choice utilities for user i at event t without the jth event type.

Sampling Distribution of the Latent Segments
The latent segment probabilities, shown in Equation 4, follow from the probability that Vis is the maximum value of Vim over all m. The components of the linear regression on Vis are shown in Equation 5. The posterior distribution of Vis is proportional to the product of the distribution of the segment utilities and the choice utilities given these segment utilities, as shown in the following equation:

(12)

where Vi(-s) contains all latent segment variables except Vis. The last part of this equation is the conditional likelihood of user i given that this user is a member of segment s, where Ui is sampled conditional on the latent segment choices (Si = s). For ease of notation, we define Lm as this likelihood as if the user were a member of segment m. We restrict the updating scheme to choose between segment s and the segment with the highest segment utility other than segment s, and define this segment as k*. This restriction allows us to simplify Equation 12 to the two-part expression in 13, with the first part up to the threshold Vik* and the second part to the right of this threshold:

(13)

From this system of equations follows the CDF of Vis given the segment utilities in a number x, shown in Equation 14,

(14)

where c is equal to the scaling factor. The inverse CDF technique is used to sample from the distribution given in Equation 14. After the simulation of Vis for all i and s, one could simply assign every user to the segment with the maximum segment utility (Si = s).

Sampling Distribution of the Listing and User Characteristics Parameters
The last parameters to sample are the event attribute parameters and the user characteristic parameters. Both can be obtained from a Bayesian sampler for linear regression. In Equations 15 and 16 respectively their distributions are given in matrix notation:

(15)

with Xs a matrix with the listing characteristics for all users in segment s at all events t (t = 1,..., Ti) and Usj a matrix with the choice utilities associated with the members of segment s for every event t, and

(16)

with Vs a column of the segment utility values for segment s for every user.
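As a small illustration of the sampling steps above, the fragment below draws one choice utility from a truncated normal and assigns a user to the segment with the highest sampled segment utility; the normal(mean, 1) parameterisation follows the text, while all names and numbers are illustrative.

```python
import numpy as np
from scipy.stats import truncnorm

def draw_choice_utility(mean, lower, upper):
    """One draw of U_ijt from a normal(mean, 1) truncated to [lower, upper]."""
    a, b = lower - mean, upper - mean          # scipy expects standardised bounds
    return truncnorm.rvs(a, b, loc=mean, scale=1.0)

def assign_segment(segment_utilities):
    """Assign a user to the segment with the maximum sampled utility V_is."""
    return int(np.argmax(segment_utilities))

u = draw_choice_utility(mean=0.3, lower=0.0, upper=np.inf)
s = assign_segment(np.array([0.2, -0.1, 0.7]))   # -> segment index 2
```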
Data
Data from the largest company in on-line customer-to-customer services in the Netherlands, Marktplaats.nl, is used. The number of data points in this case is enormous: together they sum up to 40GB of data for each day. We decrease the data to 10% of its original size by aggregation and proper selection. The aggregated data set contains observations of 50,000 users over one month in the car category only. These observations include 282,500 sessions and 2,839,000 events. In this research, 66.6% of the data is used as sample data and the other 33.3% of the data is used as test set. After variable selection, the following variables remain in our set of attributes: time since the last event, brand of the car (freight, popular, low-priced, middle-class, extraordinary, luxurious), age of the car in years, mileage in thousand kilometres, fuel (two dummies for respectively gasoline and diesel), price in thousand euros, and a counter for the number of events at the same listing.
Furthermore, as user characteristics the dummies for historical activities in the other categories are taken. Lastly, we summarised the page and photo views in two recency measures.

Results
User Segments
To determine parameter relevance, we use the highest posterior density (HPD) region to calculate a 95% credible interval. Based on the mean absolute prediction error and the hit rate, we found that the number of segments in this data set is three. The posterior probabilities that a user is in segment 1, 2, or 3 are respectively 55.9%, 32.0% and 12.1%. Furthermore, we focus on the interpretation of differences between the segments and event types to make our results useful in practice. For these interpretation purposes we look into the marginal effects of both the event attribute and the user characteristic parameters. From the average marginal effects for Sell it follows, for example, that selling items in categories other than Cars increases the probability of being in segment 3 by 17.6%. As another example, the average marginal effect for Price on the event bid in segment 1 is -0.5%, which means that the probability of a bid decreases on average by 0.5% if the asking price increases by a thousand euros. Furthermore, the intercept parameter of the event type bid for segment 2 (-1.20) indicates that users in segment 2 bid less on a listing than they save a listing to their favourites, the reference choice option.
Combining all marginal effects and intercept information, we conclude that the users in the first segment are mainly interested in 'popular' cars. Users in this segment are more sensitive to price while bidding, and more assertive than users in the other two segments. The second segment consists of users who tend to view details, but are less likely to place a bid or ASQ. Being active in the Other Vehicles category increases the probability of being in the Vehicle Enthusiasts segment. The third segment can mainly be described by the seller activity in both the Cars and other categories. Furthermore, the users in this segment are more likely to bid and ASQ on other listings. Activity in the Cars Other category increases the probability of being in segment 3.

Segment and Event Path Forecast of New Users
By means of our segment inference on the test set we found 54% of the users in the Bargain Hunters segment, 33% in the Vehicle Enthusiasts segment and 12% in the True Born Traders segment. Furthermore, we showed that Marktplaats.nl has to follow a new user for at most 15 events after he/she enters the Cars category to infer his/her segment. The path forecast selects the best event choice for each segment, based on the average click preferences of segment s at each event t. By means of this forecast we expect that users in the first segment click on URLs, save some items as favourites and ASQ. On the other hand, we expect that users in the second segment tend to view only details at first; after 9 events we expect these users to be more seriously interested in cars. The forecast for the third segment indicates that these users tend to mainly sell in the Cars category, interspersed with viewing details, ASQ and bids. The forecast results are logically consistent with the segment characteristics obtained from the parameter interpretation in the first part of the presented model.
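The idea of following a new user for a handful of events can be illustrated with a simple sequential Bayes update over the three segments; the per-event likelihoods are assumed to come from the estimated segment-specific choice probabilities, and the numbers below are made up.

```python
import numpy as np

def update_segment_beliefs(prior, event_likelihoods):
    """Sequentially update P(segment | observed events) for a new user.

    prior             : (S,) prior segment probabilities
    event_likelihoods : (n_events, S) likelihood of each observed event under each segment
    Returns the posterior after each event, shape (n_events, S).
    """
    beliefs = np.array(prior, dtype=float)
    path = []
    for lik in event_likelihoods:
        beliefs = beliefs * np.asarray(lik)    # Bayes rule, unnormalised
        beliefs /= beliefs.sum()
        path.append(beliefs.copy())
    return np.array(path)

# Toy example with three segments and two observed events:
post = update_segment_beliefs([0.56, 0.32, 0.12],
                              [[0.2, 0.5, 0.3],
                               [0.1, 0.6, 0.3]])
```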
Discussion and managerial implications
Our two-part Bayesian framework has the advantage that it is suitable for use in a high-dimensional Big Data case. Through a simulation study we confirm that our model satisfies both the need for an accurate segment membership assignment and the necessary condition of a realistic forecasting performance. We illustrate the usefulness of our approach using a real data case of Marktplaats.nl, the largest company in on-line customer-to-customer services in the Netherlands. Marktplaats.nl could use the gained knowledge about its segments in the Cars category to estimate the number of users with a specific profile and to approach each segment differently. Most importantly, the obtained results give Marktplaats.nl the possibility to increase the satisfaction and loyalty of both car suppliers and potential buyers. Further research is needed to improve the estimation quality of the covariance structure, to implement a method that prevents outnumbering by two choice options, and to allow the user characteristics to change over time.
References
[1] B. Brown, M. Chui, and J. Manyika. Are you ready for the era of 'Big Data'? McKinsey Quarterly, (4):24-35, 2011.
[2] G. Casella and E.I. George. Explaining the Gibbs sampler. The American Statistician, 46(3):167-174, 1992.
[3] S. Chib and E. Greenberg. Analysis of multivariate probit models. Biometrika, 85(2):347-361, 1998.
[4] A.E. Gelfand and A.F.M. Smith. Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85(410):398-409, 1990.
[5] E. Greenberg. Introduction to Bayesian Econometrics. Washington University Press, New York, 1st edition, 2008.
[6] J.R. Hauser, G. Liberali, and G.L. Urban. Website morphing 2.0: Switching costs, partial exposure, random exit, and when to morph. Management Science, forthcoming, 2014.
[7] G. Häubl, B.G.C. Dellaert, and B. Donkers. Tunnel vision: Local behavioral influences on consumer decisions in product search. Marketing Science, 29(3):438-455, 2010.
[8] W.A. Kamakura. American time-styles: A finite-mixture allocation model for time-use analysis. Multivariate Behavioral Research, 44(3):332-361, 2009.
[9] W.A. Kamakura, M. Wedel, and J. Agrawal. Concomitant variable latent class models for conjoint analysis. International Journal of Research in Marketing, 11:451-464, 1994.

Notes
1. This paper is a shortened version of the eponymous Master thesis. For more details on the mentioned methods and more insight into the results and the simulation study, please consult the full report.
2. The data case was obtained during a master thesis internship at Gibbs Quantitative Research and Consulting.
Flow formulations for the Time Window Assignment Vehicle Routing Problem
Kevin Dalmeijer
Erasmus University Rotterdam
In this thesis, we consider flow formulations for the Time Window Assignment Vehicle Routing Problem (TWAVRP), the problem of assigning time windows for delivery before demand volume becomes known. Using the commercial solver CPLEX as a basis, we develop a branch-and-cut algorithm that solves all test instances of Spliet and Gabor (2014) to optimality and is highly competitive with state-of-the-art solution methods. Furthermore, we introduce a novel set of TWAVRP-specific valid inequalities, the precedence inequalities, and show that applying them yields a competitive algorithm as well.
Introduction
The Time Window Assignment Vehicle Routing Problem, or TWAVRP, is a problem that arises naturally in distribution networks of retail chains where the retailers place weekly orders at the distribution center. In this case it is common for the distribution center and the retailer to agree upon a time window for delivery which is fixed for a long time (e.g. one year). Knowing when delivery takes place is important for the retailer, as he has to adjust staffing and control inventory accordingly. The TWAVRP is the problem of assigning time windows to the retailers such that the expected cost of servicing all clients is minimized under demand uncertainty.
The TWAVRP is a generalization of the Vehicle Routing Problem with Time Windows (VRPTW). The VRPTW is the problem of routing vehicles such that deliveries are made at minimum cost, taking into account that deliveries have to be made within time windows set in advance and that vehicle capacity cannot be exceeded. The TWAVRP generalizes the VRPTW by allowing not only deterministic demand, but also uncertain demand randomly drawn from a finite set of scenarios. Furthermore, the TWAVRP allows for changing the time windows, but only before the demand scenario becomes known. We will refer to these time windows as the endogenous time windows. The endogenous time windows are required to lie within larger exogenous time windows. The exogenous time windows can represent the opening hours of the stores, as the time window for delivery is necessarily during opening hours.
The quantity ordered by the retailers only becomes known shortly before delivery has to take place. This requires us to determine the time windows for delivery before the actual demand is revealed. By considering different demand scenarios and minimizing the expected costs, we take the uncertainty in the quantity demanded into account when assigning the time windows.
Spliet and Gabor (2014) model the TWAVRP as a set-partitioning type problem, meaning that they use binary variables to indicate whether a complete vehicle route is used or not. They solve the problem to optimality by using column generation in a branch-price-and-cut algorithm. Numerical experiments lead to the conclusion that instances with up to 25 customers and 3 demand scenarios can be solved within one hour of computation time. In this thesis we model the TWAVRP as a flow type problem
instead of a set-partitioning type problem to improve upon these results.

Solution method
We model the TWAVRP as a mixed integer linear program, based on the 2-index arc flow formulation. Let V' be the set of nodes representing the customers, and extend V' with two additional nodes representing the distribution center (a departure and a return depot). We can then model a vehicle route as a unit flow from the departure depot to the return depot in the complete directed graph G on this extended node set. For each demand scenario, the flow on the directed arc between two nodes i and j is given by a binary decision variable, cij denotes the cost of traveling that arc, and each scenario occurs with a known probability. We then get the following objective and constraints:
(1)
subject to the flow constraints (2)-(7), the time window constraints on the y-variables (8), the time-of-service constraints (9) and the capacity constraints (10), which are described below.
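The equations themselves did not survive the print layout. Purely as an illustration of what a 2-index arc flow formulation of this type looks like, a sketch is given below; the notation (x_{ij}^{\omega} for the flow on arc (i,j) in scenario \omega, p_\omega for the scenario probability, o and d for the departure and return depot, and y_i, w_i, [e_i, l_i] for the endogenous window start, window width and exogenous window of customer i) is assumed here and may differ from the thesis. Constraint (7) is not described in the summary and is shown as the integrality condition on x; (9) and (10) admit several variants, so only their role is indicated.

\begin{align*}
&\min\ \sum_{\omega\in\Omega} p_\omega \sum_{(i,j)\in A} c_{ij}\,x_{ij}^{\omega} \tag{1}\\
&\sum_{j} x_{ij}^{\omega} = 1 && \forall\, i \in V',\ \omega\in\Omega \tag{2}\\
&\sum_{j} x_{ji}^{\omega} = \sum_{j} x_{ij}^{\omega} && \forall\, i \in V',\ \omega\in\Omega \tag{3}\\
&\sum_{j} x_{oj}^{\omega} = \sum_{i} x_{id}^{\omega} && \forall\, \omega\in\Omega \tag{4}\\
&x_{dj}^{\omega} = 0 && \forall\, j,\ \omega\in\Omega \tag{5}\\
&x_{od}^{\omega} = 0 && \forall\, \omega\in\Omega \tag{6}\\
&x_{ij}^{\omega} \in \{0,1\} && \forall\, (i,j)\in A,\ \omega\in\Omega \tag{7}\\
&e_i \le y_i \le l_i - w_i && \forall\, i \in V' \tag{8}
\end{align*}

together with time-of-service constraints (9) forcing each delivery to start within [y_i, y_i + w_i], and capacity constraints (10) limiting the total demand served on each route to the vehicle capacity Q.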
In the objective (1) we minimize the expected traveling costs over all scenarios. This is done by multiplying the costs per scenario by the probability that the scenario occurs. Constraints (2) ensure every customer node has
outflow one. Combined with (3) this enforces that every customer is visited exactly once. Constraints (4) make sure that all flow sent into the network reaches the destination depot, that is, the number of trucks that leave the depot is the same as the number of trucks that return to the depot. Constraints (5) and (6) prohibit flows out of the node representing the destination depot (5) and flows between the two depot nodes (6). Note that the previous constraints do not eliminate cycles in G; this is taken care of by the time-of-service constraints. The x-variables are not the only decision variables, as we also have continuous variables yi depicting the start of the endogenous time window at client i. The time window constraints (8) ensure that the assigned endogenous time windows are valid, i.e. they lie within the exogenous time windows. The time-of-service constraints (9), in turn, make sure that deliveries take place within the endogenous time windows. Finally, the capacity constraints (10) restrict the vehicle capacity.
There are multiple sets of constraints that can be used to restrict the flows such that we only allow vehicle routes for which all clients can be visited within their time window. Furthermore, there are multiple ways to restrict the flows such that we respect vehicle capacity as well. An important part of this thesis involves presenting, testing and discussing a variety of these constraints.

Valid inequalities
In this thesis, we present a variety of constraints that fit the general mixed integer program (1)-(10). Surprisingly, this program turns out to be incredibly hard to solve: even instances with just 10 clients cannot be solved to optimality in reasonable time. To overcome this problem, we introduce different classes of so-called valid inequalities. Valid inequalities are inequalities that are satisfied by the optimal integer solution (valid), but may not be satisfied by the fractional solutions to the LP relaxations. This allows for strengthening the LP relaxation in the following way: first, we solve the LP relaxation. Then, we find valid inequalities that are violated by the fractional solution and add those to the formulation. Next, we solve the LP relaxation again, now including the violated inequalities. Repeating this procedure increases the objective value, and thus gives stronger LP bounds.
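This separation loop can be sketched generically as follows. The helper callables solve_lp_relaxation and find_violated_cuts, and the model object with its add_constraints method, are hypothetical placeholders for the CPLEX calls and separation routines (such as CVRPSEP) used in the thesis; they are not part of any specific library.

def cutting_plane_loop(model, solve_lp_relaxation, find_violated_cuts, max_rounds=50):
    """Repeatedly strengthen the LP relaxation with violated valid inequalities.

    `solve_lp_relaxation(model)` returns (bound, fractional_solution) and
    `find_violated_cuts(solution)` returns a (possibly empty) list of inequalities
    violated by that solution.
    """
    bound, solution = solve_lp_relaxation(model)
    for _ in range(max_rounds):
        cuts = find_violated_cuts(solution)
        if not cuts:                      # no violated inequalities found: stop
            break
        model.add_constraints(cuts)       # add the cuts to the formulation
        bound, solution = solve_lp_relaxation(model)  # re-solve; the bound can only increase
    return bound, solution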
Rounded capacity cuts
The first class of valid inequalities that we use is the rounded capacity cuts (Lysgaard et al., 2004). The rounded capacity cuts assert that the outflow of each subset of customer nodes is greater than or equal to the minimum number of vehicles necessary to serve all clients in that subset. For example, if the demand of a set of clients is such that at least two vehicles are necessary to serve them all, we require the outflow of that set of clients to be at least two. If we let the demand of client i in a scenario be denoted d_i, and we let Q be the vehicle capacity, the rounded capacity cuts require, for every customer subset S in every scenario, that the total flow on arcs leaving S is at least ⌈(∑_{i∈S} d_i)/Q⌉. (11)
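As an illustration of what "violated by a fractional solution" means here, the sketch below checks a single candidate set S against inequality (11) as restated above. The data structures (a dict mapping arcs to fractional flow values in one scenario, a dict of demands) are assumptions made for the example; the thesis relies on the CVRPSEP separation routines rather than on such explicit enumeration.

import math

def capacity_cut_violated(S, x_frac, demand, Q, tol=1e-6):
    """Return True if the rounded capacity cut for customer set S is violated.

    x_frac maps arcs (i, j) to fractional flow values in one scenario,
    demand maps customers to their demand in that scenario, Q is the capacity.
    """
    S = set(S)
    outflow = sum(v for (i, j), v in x_frac.items() if i in S and j not in S)
    required = math.ceil(sum(demand[i] for i in S) / Q)
    return outflow < required - tol

# Example: two customers with demand 6 each, capacity 10, outflow of only 1.3
x_frac = {(1, 2): 0.7, (1, 0): 0.3, (2, 0): 1.0}   # node 0 is the depot
print(capacity_cut_violated({1, 2}, x_frac, {1: 6, 2: 6}, Q=10))  # True: at least 2 vehicles needed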
It is important to mention that identifying the rounded capacity cuts that are violated by a given fractional solution is a difficult problem. Hence, we use the CVRPSEP package of Lysgaard (2003) to find violated inequalities heuristically.

Precedence inequalities
We also introduce the precedence inequalities, a novel set of inequalities specifically for the TWAVRP, first introduced in this thesis. The precedence inequalities are inspired by the fact that route segments that take a lot of time are precedence inducing in all scenarios. As an example, consider a vehicle route in some scenario which first visits client u and 10 hours later visits client v. If we are required to assign time windows of 1 hour in width, it is clear that the time window of client u has to precede the time window of client v in time. This implies that if we want to use the above-mentioned route, it is no longer possible to use routes from v to u in any scenario, because the time window of u is before the time window of v. We formalize this intuition in Theorem 1. Let tij be the time required to travel from i to j and let wi be the width of the time window of client i. For a partial route p from i to j, let Ap be the set of traversed arcs in graph G. We can now state the theorem:
Theorem 1. For given nodes, for any feasible solution to the TWAVRP in which path p is used in one scenario and path q is used in another scenario, the following
holds:
(12)
We can use Theorem 1 to deduce valid inequalities. For example, we can easily find a subset of the precedence inequalities with the following argument: if we can find paths p and q for which (12) does not hold, Theorem 1 states that p and q cannot both occur in the optimal solution. By the integrality of the x-variables, this yields valid inequalities of the form (13). With similar reasoning, we define the complete class of precedence inequalities and we prove correctness. In this summary, the precedence inequalities are stated without proof in Theorem 2. First we introduce some necessary notation: for the set of arcs in A that start in S and end in T, for S and T being nodes or sets of nodes, and for the shortest distance between two nodes using only arcs from a given set while visiting all nodes in a given set. We can now state Theorem 2.
Theorem 2 (Precedence inequalities). For given scenarios, given nodes, given node sets and given arc sets satisfying the required conditions, the corresponding inequalities (14) are valid.
Again we find that identifying the valid inequalities that are violated by a given fractional solution is difficult. In fact, we prove it to be NP-hard under mild assumptions. For this reason we experiment with searching through only certain subsets of the precedence inequalities. One of the proposed strategies can even be performed in polynomial time.

Results and conclusion
We test our branch-and-cut algorithm on the test instances
provided by Spliet and Gabor (2014). Without adding the precedence inequalities, we can already solve all test instances to optimality. The total time necessary to do so is less than 3% of the time required by the branch-price-and-cut algorithm presented in Spliet and Gabor (2014) when run on the same machine. Separating subsets of the precedence inequalities reduces the size of the search tree by about a half and yields a further speedup of about 10%. Tests on instances with more customers show that we can now solve instances with up to 35 customers and 3 scenarios within one hour of computation time, significantly improving upon the maximum number of clients for which the TWAVRP can be solved to optimality.
It has become clear that, for solving the TWAVRP, flow formulations in branch-and-cut are competitive with set-partitioning type formulations in branch-price-and-cut. Furthermore, the novel set of precedence inequalities has proven to be effective. This makes separating the precedence inequalities an interesting subject for further research. Applying the precedence inequalities in branch-cut-and-price can also be of interest.

References
[1] Jens Lysgaard. CVRPSEP: A package of separation routines for the Capacitated Vehicle Routing Problem. Working paper, Aarhus School of Business, 2003.
[2] Jens Lysgaard, Adam N. Letchford, and Richard W. Eglese. A new branch-and-cut algorithm for the capacitated vehicle routing problem. Mathematical Programming, 100(2):423-445, 2004.
[3] Remy Spliet and Adriana F. Gabor. The time window assignment vehicle routing problem. Transportation Science, forthcoming, 2014.
Evaluating the diversification benefits of new asset classes in a portfolio
Derrick Olij
Erasmus University Rotterdam
Diversification is a commonly used method to reduce the risk and improve the returns of an asset portfolio. This risk is traditionally expressed by the variance of the portfolio. Over the last fifteen years several extreme events have made risk managers more focused on the risk of large negative returns (the so-called tail risk). This tail risk is not well represented by the variance, and therefore other risk measures have recently become more popular. In this thesis a framework is introduced that can be used to determine how an asset improves the risk-return trade-off of a portfolio, not only when risk is expressed by the variance, but also when risk is expressed as the Conditional Value at Risk (CVaR). As a case study, the performance improvement of a stock-bond portfolio by Real Estate Investment Trusts (REITs) is investigated.
Introduction
In this article a framework is introduced to determine the diversification benefits caused by adding asset classes to a benchmark portfolio. A diversification benefit is defined as a significant improvement of the performance of an asset portfolio when it becomes possible to invest in certain new asset classes. Traditionally the performance of a portfolio is measured by the mean return and the variance of the portfolio. In recent years, however, other risk measures have become more popular amongst risk managers, for two main reasons:
1. The portfolio's variance depends on both negative and positive return shocks, while a risk manager is mainly interested in the negative return shocks.
2. The variance commonly does not represent the tail risk of negative return shocks well.
In this article, therefore, not only the variance but also the Conditional Value at Risk (CVaR) is used as a risk measure. The CVaR is a measure that only takes negative shocks into account, represents the tail risk much better than the variance and is more suitable for optimization problems than other risk measures (Artzner et al., 1999). To implement the CVaR in the framework, a new multivariate CVaR spanning test is introduced in this article.
The framework presented in this article is used to investigate the diversification benefits of European and American Real Estate Investment Trusts (REITs). Although the REIT structure was already invented in the 1960s in the US, the European REIT market has only recently grown to a significant size, because before 2003 no large country had a legal status comparable with the REIT. Since the start of the millennium several large European countries have introduced REITs. The growth of this market has led to a rapid expansion of the total market capitalization and to a large increase in the number of available funds in Europe. The increase in real estate investment opportunities has made investors curious about the influence these REITs will have on their asset portfolios.
There are three main reasons that suggest an investigation of the effects of REITs on a portfolio is useful. The first reason is that several studies have shown that the correlation of international stock portfolios has increased (e.g. due to globalization). This reduces the diversification potential of this asset class. Many studies have shown
that the Real Estate asset class can increase the diversification of a portfolio. Investing directly in Real Estate is difficult due to some problematic features: it does not have a central market, the transaction sizes are large, the transaction costs are high, it has low liquidity and the need for local market knowledge is high. It is therefore more attractive to invest in Real Estate in an indirect, securitized form by investing via REITs, which do not have the problematic features mentioned here.
The second reason is that the sub-prime crisis in the US has shown that the linkage between the stock, bond and real estate markets is clearly present. An investor wants to know whether the linkage between these asset classes changes during extreme events. This is especially important because REITs contain properties of all three asset classes mentioned: they have stock dynamics because they are publicly traded, they contain bond properties due to the obligatory payout of profits, and they contain properties of Real Estate since that is the main asset class the funds invest in.
The third reason is that not much research has been done on the performance of European REITs for international investors, especially not from the perspective of a European investor. Most literature shows that currency fluctuations have a large impact on the performance of REITs.
In this article the influence of REITs on several stock and bond benchmark portfolios is analysed from 2005 to 2014. It is investigated whether REITs improve the risk-return trade-off of a benchmark portfolio of stocks and bonds. The improvement caused by the addition of REITs to a portfolio is evaluated from both an in-sample and an out-of-sample perspective. In-sample spanning tests are applied and out-of-sample risk-return asset allocation optimization methods are used. The investigation of the in-sample performance of REITs shows what the potential of REITs is to improve the risk-return trade-off of the benchmark portfolio. The out-of-sample analysis gives a more realistic view of the performance of the REITs since, in real life, investors do not have access to future asset information. The in-sample and out-of-sample analyses are executed twice: first with the variance as risk measure, and second with the CVaR.
Methodology
The investor
Recent work of Lustig & Verdelhan (2008) and Lustig et al. (2011) has brought up new evidence showing that foreign exchange returns contain systematic risk and are therefore possibly not randomly and independently distributed. This means that the potential returns of international real estate could be under- or overestimated, because they are influenced by these currency fluctuations. The performance of REITs is therefore evaluated for investors from three different regions (with three different currencies): the EMU zone (euro), the UK (pound) and the US (dollar). To further reduce the influence of exchange rates, hedged returns are used:
(1)
where Pt is the asset price, St is the exchange rate of the local currency of the asset versus the home currency of the investor at time t, and Ft|t+1 is the price at time t of a one-period long forward contract. Per region three investors are specified, who all hold a portfolio with a different level of international exposure. The levels of international exposure of the investors' portfolios are summarized in table 1. It is assumed that when an investor starts investing in foreign assets he first buys bonds and stocks, and only then considers including REITs in his portfolio.
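The exact expression (1) is not legible in this reprint. As an indication only, one common convention for a fully currency-hedged return, under the assumption that the entire position is sold forward at time t, would read as follows in the notation just introduced; the thesis's precise definition may differ:

\[
R^{hedged}_{t+1} \;=\; \frac{P_{t+1}\,F_{t|t+1}}{P_t\,S_t} \;-\; 1
\;\approx\; \Big(\frac{P_{t+1}}{P_t}-1\Big) \;+\; \frac{F_{t|t+1}-S_t}{S_t},
\]

i.e. approximately the local-currency return plus the forward premium.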
In-sample analysis
For the in-sample investigation, mean-variance and mean-CVaR spanning tests are used. A spanning test uses the complete sample of observations. If spanning is rejected,
the efficient frontier of a portfolio has significantly improved due to the addition of a new asset class to the set of assets available for investing. The test uses two types of assets. The first type is the benchmark assets, which are used to construct the investor's initial benchmark portfolio; in this article these are stocks and bonds. The second type is the test assets, which are later added to the benchmark portfolio and whose influence on the benchmark portfolio is evaluated in the proposed framework; in this article these are the REITs.

Mean-variance spanning test
Consider the raw returns of the N + K risky assets at time t, where R1t is a K-vector of returns of the K benchmark assets (stocks and bonds) and R2t is an N-vector of the N test asset returns (REITs). R2t can be projected on R1t:
(2)
where 0N x K is an (N x K) matrix of zeros and 0N is an N-vector of zeros. Huberman & Kandel (1987) show that the null hypothesis for spanning consists of restrictions on this projection: the intercepts are zero and, for each test asset, the coefficients on the benchmark returns sum to one. Also the intersection of the tangency and the GMVP portfolio will be investigated. These tests show whether the risk-return trade-off of the tangency portfolio has improved and whether the riskiness of the GMVP portfolio has decreased when REITs become available for investment. For the tangency and GMVP intersection tests, the respective model restrictions involve the risk-free rate Rf. Ferson et al. (1993) use the multivariate expression of equation 2:
R2 = XB + E (3)
Here T is the length of the time series, R2 is the T x N matrix of test asset returns (REITs), X is the T x (K+1) matrix whose rows contain a constant and the benchmark returns, and E is the T x N matrix of error terms. The null hypothesis for mean-variance spanning can then be written
in multivariate form as a set of linear restrictions on the coefficient matrix B, which is rewritten as:
(4)
A GMM Wald test is now used to test this null hypothesis. This test takes into account that the residuals E of the spanning regression are not normally distributed, are autocorrelated and contain conditional heteroskedasticity. It is assumed that the first four moments of Rt are stationary. The GMM moment conditions are built from the sample moments of the data, and the estimate of B is found by minimizing the corresponding GMM objective, where St is the consistent estimate of the covariance matrix of the moment conditions. With this estimate, the GMM version of the Wald test can be written as:
(5)
Where:
(6)
Here the terms involve the expected value of all assets, the expected return of the benchmark assets, the variance V = Var[R] of all assets and V11, the first element of the matrix V.

Mean-CVaR spanning test
The proposed method to test the diversification benefits of the test assets is a multivariate mean-CVaR spanning test, based on an instrumental variable (IV) quantile regression. The test is inspired by an article of Polbennikov & Melenberg (2005). It evaluates the improvement, due to the test assets (REITs), of the mean-CVaR efficient market portfolio, which is constructed from the available benchmark assets (stocks and bonds). The weighting of this efficient market portfolio (wz) is determined with the allocation algorithm of Rockafellar & Uryasev (2000):
subject to the associated portfolio constraints. (7)
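The printed formulation (7) is not recoverable here. For reference, a standard scenario-based form of the Rockafellar & Uryasev (2000) mean-CVaR problem, in notation chosen for this summary (portfolio weights w, auxiliary variable ζ playing the role of the VaR, scenario returns r_q, mean returns \bar{r}, confidence level α and target expected return u_z), looks as follows; the exact constraint set used in the thesis may differ:

\[
\begin{aligned}
\min_{w,\,\zeta,\,u}\quad & \zeta + \frac{1}{Q\,(1-\alpha)}\sum_{q=1}^{Q} u_q \\
\text{s.t.}\quad & u_q \ge -w^{\top} r_q - \zeta, \qquad u_q \ge 0, \qquad q = 1,\dots,Q,\\
& w^{\top}\mathbf{1} = 1, \qquad w^{\top}\bar{r} \ge u_z, \qquad w \ge 0 .
\end{aligned}
\]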
Here R1 contains the raw returns of the benchmark assets, the confidence level of the Value at Risk (VaR) is fixed in advance, Q is the number of scenarios used to estimate the CVaR, and uz is the expected return of the investor. Again a univariate regression is used as the basis for the spanning test:
(8)
Here the dependent variable is the excess return of a test asset at time t. V is a semi-parametric instrument that represents the empirical CVaR and depends on FZ, the empirical distribution of Z:
(9)
(10)
Here I[.] is an indicator that equals 1 when the condition between the brackets is true and 0 otherwise. The null hypothesis of the mean-CVaR spanning test is that the intercept of regression (8) is zero.
Eq. (8) is estimated in multivariate form with an IV estimation. In the multivariate case the estimated parameters become (1 x N) vectors. For this estimation two instrument matrices, W and U, are introduced, defined by their typical rows. The multivariate IV estimator of eq. (8) is given by:
(11)
Mean-CVaR spanning occurs when the intercept parameter is not significant. Its significance is determined by a multivariate Wald test. The variance that is necessary for this test can be obtained from the asymptotic result of the semi-parametric IV estimation:
(12)
(13)
(14)
where one of the quantities follows a Wishart distribution with N degrees of freedom, IN is the (N x N) identity matrix, one term is a matrix of residuals, and the matrix G is constructed as follows:
(15)
Here j indicates the number of the column in the matrix. The significance of the intercept is then tested with a Wald statistic, given in (16), with N degrees of freedom.
(16)
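To make the regression-based spanning idea concrete, the sketch below estimates the Huberman & Kandel (1987) projection of a test-asset return on benchmark returns by OLS and computes a Wald statistic for the spanning restrictions (intercept zero, betas summing to one). It is a simplified illustration only: the simulated data and the homoskedastic covariance are assumptions made for the example, whereas the thesis uses multivariate GMM-based Wald tests robust to non-normality, autocorrelation and conditional heteroskedasticity, and an IV quantile regression for the CVaR case.

import numpy as np

rng = np.random.default_rng(1)
T, K = 260, 2                                    # weekly data, two benchmark assets

# Simulated benchmark returns (stocks, bonds) and one test asset generated under the null
R1 = rng.normal(0.001, 0.02, size=(T, K))
R2 = 0.0 + R1 @ np.array([0.6, 0.4]) + rng.normal(0, 0.01, size=T)

# OLS of the test asset on a constant and the benchmark returns
X = np.column_stack([np.ones(T), R1])            # T x (K+1)
theta, *_ = np.linalg.lstsq(X, R2, rcond=None)   # [alpha, beta_1, ..., beta_K]
resid = R2 - X @ theta
sigma2 = resid @ resid / (T - K - 1)
cov_theta = sigma2 * np.linalg.inv(X.T @ X)      # homoskedastic OLS covariance

# Spanning restrictions (Huberman & Kandel): alpha = 0 and the betas sum to one
Rmat = np.array([[1.0] + [0.0] * K,              # picks alpha
                 [0.0] + [1.0] * K])             # picks the sum of the betas
r = np.array([0.0, 1.0])
diff = Rmat @ theta - r
wald = diff @ np.linalg.solve(Rmat @ cov_theta @ Rmat.T, diff)
print(f"Wald statistic: {wald:.2f} (chi-squared with 2 df under the null)")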
Out-of-sample
For the out-of-sample allocations a rolling window approach is used. For every timestep t a portfolio of assets is constructed, and its performance is evaluated with the true returns at t+1. The allocation is done twice per timestep: first, portfolios are constructed from the benchmark assets only; second, portfolios are constructed from both benchmark and test assets. The performance difference is compared using the Certainty Equivalent return (CEQ). The significance of the difference between the CEQ with and without REITs is evaluated using the bootstrap approach of the White reality check (White, 2000). The allocation is based on the two risk measures.

Mean-variance allocation
The asset weights of the constructed portfolios (wta) are chosen by maximizing the Sharpe ratio of the portfolios:
(17)
subject to constraints on the portfolio weights. The CEQ is the average of the realized utility:
(18)
The realized mean-variance utility at timestep t is defined as:
(19)
In this expression the realized utility at time t depends on the realized return of the asset allocation at time t and on the expected variance of the portfolio at time t, based on information up to time t - 1. Here win is the length of the rolling window.

Mean-CVaR allocation
The mean-CVaR allocation is executed with the algorithm of Rockafellar & Uryasev (2000) (see eq. 7). The performance is again expressed with the CEQ (eq. 18). The realized utility, which is the input variable of the CEQ equation, is now defined as:
(20)
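To illustrate the rolling-window evaluation described above, a compact sketch is given below. The maximum-Sharpe step, the mean-variance utility of the form r_p - (gamma/2)·sigma_p^2 and the window length are common conventions assumed for the example; they need not coincide with the exact definitions in (17)-(20), and the White reality-check bootstrap is not reproduced.

import numpy as np
from scipy.optimize import minimize

def max_sharpe_weights(returns):
    """Long-only weights maximizing the in-sample Sharpe ratio of `returns` (T x N)."""
    mu, cov, n = returns.mean(axis=0), np.cov(returns.T), returns.shape[1]
    neg_sharpe = lambda w: -(w @ mu) / np.sqrt(w @ cov @ w)
    res = minimize(neg_sharpe, np.full(n, 1.0 / n),
                   bounds=[(0.0, 1.0)] * n,
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
    return res.x

def rolling_ceq(returns, window=104, gamma=5.0):
    """Certainty-equivalent return of a rolling-window max-Sharpe strategy."""
    utilities = []
    for t in range(window, returns.shape[0]):
        w = max_sharpe_weights(returns[t - window:t])        # information up to t - 1
        realized = returns[t] @ w                            # realized portfolio return at t
        forecast_var = w @ np.cov(returns[t - window:t].T) @ w
        utilities.append(realized - 0.5 * gamma * forecast_var)
    return float(np.mean(utilities))

rng = np.random.default_rng(2)
weekly = rng.normal(0.001, 0.02, size=(520, 4))              # 10 years of weekly data, 4 assets
print(f"CEQ without / with the extra asset: {rolling_ceq(weekly[:, :3]):.5f} "
      f"vs {rolling_ceq(weekly):.5f}")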
In eq. (20) the realized return of the asset allocation w is adjusted with the product of the risk aversion coefficient and the VaR at time t, based on the asset weighting chosen at time t.

Results
In table 2 the results of the in-sample analysis are shown. The efficient CVaR frontier is linear and therefore an expected return is required to construct the market portfolio for this analysis. This expected return can be interpreted as the level of risk an investor is willing to take. The construction of the portfolios is done for three different annualized excess expected returns that each represent another risk level: 0.5% (low risk), 2.5% (medium risk) and 5.0% (high risk). Due to the relatively short dataset of the European REITs, a large confidence level of the VaR is used to estimate the CVaR efficient portfolios (0.90). The number of scenarios used to estimate the CVaR is Q = 100 weeks. The risk-return trade-off of the investor's portfolio has improved when spanning is rejected. From the table it can be concluded that this is only the case for the EMU-B portfolio for mean-variance spanning and the UK-A portfolio for mean-CVaR spanning, assuming a confidence level of 95%.
The EMU-B investor holds assets from both the EMU zone and the UK and has the euro as domestic currency. For this investor spanning is rejected, while intersection for the tangency and the GMVP portfolio is not. Thus the mean-variance frontier has significantly moved up, while the tangency and the GMVP portfolio have not. Since every point on the mean-variance frontier can be constructed from a combination of the GMVP and the tangency portfolio, it is counter-intuitive that the frontier significantly improves while its components (the GMVP and the tangency portfolio) do not. The part of the frontier between the GMVP and the tangency portfolio is constructed with positive weights in both portfolios, while the part of the frontier above the tangency portfolio is constructed with negative weights in the GMVP and positive weights in the tangency portfolio. When the movement of the frontier is mainly located at the part above the tangency portfolio, spanning can be rejected while intersection cannot. This situation is only interesting for an investor who can build up short positions. Short positions are difficult to obtain for REITs, and it can therefore be concluded that the risk-return trade-off of the portfolio of the EMU-B investor does not significantly improve.
Conclusions
In this paper a framework is introduced to measure the performance improvement of a portfolio when new asset classes become available for investment. The framework is used to measure the performance improvement of a stock-bond portfolio by REITs. It can be concluded that over the period 2005-2014 REITs did not significantly improve the risk-return trade-off of the benchmark portfolio of bonds and stocks, from both an in-sample and an out-of-sample perspective. Most literature
that is available on REITs shows that before 2005 US REITs positively influenced the risk-return trade-off of mixed-asset portfolios. The tests in this article suggest that US REITs did not show this behavior during the 2005-2014 period, which suggests that the performance behavior of US REITs has changed. It is very likely that this was caused by the subprime crisis, a large event that, partly due to the limited size of the dataset, has a relatively large effect on the returns and the volatility of the assets used in this investigation.
References
[1] Artzner, P., Delbaen, F., Eber, J.M., & Heath, D. 1999. Coherent measures of risk. Mathematical Finance, 9(3), 203-228.
[2] Ferson, W.E., Foerster, S.R., & Keim, D.B. 1993. General tests of latent variable models and mean-variance spanning. The Journal of Finance, 48(1), 131-156.
[3] Huberman, G., & Kandel, S. 1987. Mean-variance spanning. The Journal of Finance, 42(4), 873-888.
[4] Lustig, H., Roussanov, N., & Verdelhan, A. 2011. Common risk factors in currency markets. Review of Financial Studies, 24(11), 3731-3777.
[5] Lustig, H., & Verdelhan, A. 2008. The cross-section of foreign currency risk premia and consumption growth risk. Review of Financial Studies, 25, 3731-3777.
[6] Polbennikov, S., & Melenberg, B. 2005. Testing for mean-coherent regular risk spanning. CentER Discussion Paper, 2005-99.
[7] Rockafellar, R.T., & Uryasev, S. 2000. Optimization of conditional value-at-risk. Journal of Risk, 2, 21-41.
[8] White, H. 2000. A reality check for data snooping. Econometrica, 68(5), 1097-1126.
Congratulations to the winners of the βETA!