The econometric discrete dependent variable multinomial Logit model by Giovanis Eleftherios


This paper examines consumers' preferences in the local furniture market of the Province of Serres. We apply a multinomial logit model to estimate the probability of buying furniture in the following four-month period. We also analyze the demographic characteristics and conclude that they play a major role among the other factors. The questionnaire analyzed in this project is a subset of the original one: the initial questionnaire contained too many questions, which would have made the analysis very long. We therefore restrict ourselves to the most important factors that influence consumers' choice decisions.

Introduction

According to the findings of the sector study carried out by ICAP (COMCENTER, 2007), the majority of furniture production units in Greece are small, usually family-run, and have no automated production. Medium and large production units are estimated to hold about 30% of the market. The study concludes that Greek enterprises show declining export activity, while market share has shifted towards supermarkets and towards importing enterprises operating through franchising. Another conclusion of the study is that furniture purchases and consumption are directly connected with disposable income. The problem that emerges is that an important part of the disposable income of Greek households is absorbed by loan repayments, so households postpone the replacement of their existing furniture. Domestic furniture consumption followed an increasing course during the period 1998-2006, with an average annual growth rate of 4.6%. In 2006 living room furniture is estimated to have accounted for 47.0% of the total domestic market, bedroom furniture for 27.0% and dining room furniture for 26.0% (COMCENTER, 2007).

Jonkers (2006), in a report conducted in collaboration with the CBI Market Survey, finds that one of the major threats in the Greek domestic furniture market is that the

Greek economy is quite dependent on furniture imports, which compete mainly on low prices. This creates opportunities for exporters from developing countries, because imports from these countries are increasing at a faster rate than imports from developed countries. According to Jonkers, the best opportunities are in living and dining room furniture, where domestic production is declining. In the same survey, imports increased by 80% in value between 2001 and 2005, while exports increased by only 14%. The major developing-country exporters are China with € 56 million, Turkey with € 28.8 million, Indonesia with € 16.4 million, Vietnam with € 10.2 million, India with € 4.9 million and Malaysia with € 4.2 million, followed by smaller suppliers such as Albania, Egypt and South Africa. As for furniture exports, the largest destination country is Cyprus, followed by Bulgaria, Germany and Romania.

The most important firm in Serres, and one of the most important in Greece, is “DROMEAS” ABEEA, which was established in 1979 and is located in the Industrial Area outside Serres, about 80 km northeast of Thessaloniki. The firm's achievements include supplying 10,000 seats for the waiting area of Manila's airport and 4,000 seats for two airports in Egypt. Smaller projects bear its stamp in the UK, Saudi Arabia and Australia. The firm also supplied 40.0% of the furniture required by the Olympic Committee (Interwood, 2007). Other furniture firms and shops operating in the Prefecture of Serres are “BLACK RED WHITE”, “Fratzana”, “Kioutsoukis”, “ARREDO”, and shops like “NEOSET”, “SATO” and others.

The main aim of this project is to present some of the most important Logit models that can be used in marketing survey research and to choose the best possible model. This choice is not unique: it depends on the kind of product or service, the questionnaire and sample design, the kind of market, the city or country, and the demographic characteristics of the place where the research is carried out.

Data

The data were obtained from a marketing survey carried out by telephone interview on 12-15 February 2008 and conducted by the firm “Analysis Center”. The sample consists of 387 households in the Prefecture of Serres, in the region of Macedonia, Greece. In the first stage the sample design was random; in the second stage the data were weighted by age and sex. Although the survey refers to households, we are also interested in sex, because we would like to test hypotheses about differences in opinions and preferences between the two sexes. The weights are based on the demographic data provided by the National Statistical Service of Greece. Weighting by urban status is not necessary, because the research covers the city of Serres and the capitals of the regional municipalities, so only the urban population is concerned. We note that if the first-stage sample had not been random but stratified (for example by the industries in a specific sector, by a particular age category of a particular sex, or by a specific geographical region), the weighted models would create problems such as understated standard errors and consequently erroneous interpretation of significance tests. For confidentiality reasons it is not possible to name the firm that commissioned this marketing research; our purpose is only to offer a guide to different approaches to the estimation of Logit models and to the interpretation of the results.

Methodology

The first thing we must explain is why we use the Logit and not the Probit model. In most applications the two models give quite similar results; the main difference is that the logistic distribution has slightly fatter tails, as we can see in Figure 1. There is no compelling reason to choose one model over the other. In practice many researchers prefer the Logit model because of its mathematical simplicity (Gujarati, 2004).

Figure 1 Probit and logit cumulative distributions

In the model we adopt, we would like to estimate the probability of buying furniture in the next four-month period based on the kind of furniture that consumers would generally prefer to buy, the criteria by which they choose the shop, how much money they intend to spend, and demographic data such as sex, age, income and profession. The multinomial logit model in its general theoretical form is:

$$
\begin{aligned}
L_i ={}& \alpha + b_1 C_1 + b_2 C_2 + b_3 C_3 + b_4 C_4 + b_5 C_5 + b_6 \mathit{Crit}_1 + b_7 \mathit{Crit}_2 + b_8 \mathit{Crit}_3 + b_9 \mathit{Crit}_4 \\
& + b_{10} \mathit{Mon}_2 + b_{11} \mathit{Mon}_3 + b_{12} \mathit{Price} + b_{13} \mathit{Variety} + b_{14} \mathit{loy} \\
& + b_{15} \mathit{Inf}_1 + b_{16} \mathit{Inf}_2 + b_{17} \mathit{Inf}_3 + b_{18} \mathit{Inf}_4 + b_{19} \mathit{Inf}_6 + b_{20} \mathit{Sex} + b_{21} \mathit{age} + b_{22} \mathit{id} \\
& + b_{23} \mathit{Inc}_1 + b_{24} \mathit{Inc}_2 + b_{25} \mathit{Inc}_3 + b_{26} \mathit{Inc}_4 \\
& + b_{27} \mathit{Pf}_1 + b_{28} \mathit{Pf}_2 + b_{29} \mathit{Pf}_3 + b_{30} \mathit{Pf}_4 + b_{31} \mathit{Pf}_5 + b_{32} \mathit{Pf}_6 + b_{33} \mathit{Pf}_8
\end{aligned}
$$

where α is a constant. C1 is a dummy variable referring to question 1 (table 3.1), with C1 = 1 for living rooms and C1 = 0 otherwise. Likewise C2, C3, C4 and C5 equal 1 for dining rooms, children's furniture, garden furniture and bedrooms respectively and zero otherwise, so that office furniture is the omitted category. Crit is a set of dummy variables referring to question 2 (table 3.2), with Crit1 = 1 for price and Crit1 = 0 otherwise, and Crit2, Crit3 and Crit4 equal to 1 for quality, variety and trade name respectively and zero otherwise. Mon2 is a dummy variable referring to question 3 (table 3.3), with Mon2 = 1 for 250-600 € and Mon2 = 0 otherwise, and Mon3 = 1 for ≥ 600 € and Mon3 = 0 otherwise.

The variables “Price” and “Variety” are quantitative variables referring to question 4 (table 3.4). Loy is a dummy variable referring to question 5 (table 3.5), with loy = 1 for Serres and loy = 0 otherwise. Inf1 is a dummy variable referring to question 6 (table 3.6), with Inf1 = 1 for TV and Inf1 = 0 otherwise, and similarly for the other Inf variables. Sex is a dummy variable with Sex = 1 for male and Sex = 0 for female. Age is a quantitative variable and is presented in table 3.10. The variable “id” is a dummy that equals 1 when the consumer lives in the Municipality of Serres and 0 when the consumer lives in the regional municipalities of the Prefecture of Serres. We include this variable to examine whether consumers have homogeneous preferences across locations or whether there is heterogeneity among them. We could make the analysis more complicated by clustering the main geographic regions into groups, but we assume that preferences are homogeneous, because in question 2 the “location” criterion accounts for only 0.9% of answers, so it does not play a crucial role in consumer choices.

The Inc variables are the income dummies, with Inc1 = 1 for income < 500 € and Inc1 = 0 otherwise. The same procedure is followed for the other income variables, Inc2, Inc3 and Inc4, which are presented in table 3.8. Finally, the Pf variables (table 3.8) are the profession dummies, with Pf1 = 1 for those employed in the rural sector and Pf1 = 0 otherwise. The same procedure is followed for the other employment variables, Pf2, Pf3, Pf4, Pf5, Pf6 and Pf8.

The polytomous dependent variable Li refers to question 7 (table 3.7), with Li = 1 for those who answered YES, Li = 2 for those who answered NO and Li = 3 for those who answered MAY BE. For a dependent variable with S categories, this requires the calculation of S-1 equations, one for each category relative to the reference category. In multinomial logistic regression one category of the dependent variable is chosen as the comparison category; here this is Li = 3. The probability is defined as

$$
\Pr(y_i = j) = \frac{\exp(X_i \beta_j)}{1 + \sum_{k=1}^{J} \exp(X_i \beta_k)} \qquad (1)
$$

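and the log-likelihood function can be written as

$$
\log L = \sum_{i} \left[ \sum_{j=1}^{J} \mathbf{1}(y_i = j)\, X_i \beta_j - \log\left( 1 + \sum_{k=1}^{J} \exp(X_i \beta_k) \right) \right] \qquad (2)
$$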

where, for the ith individual, yi is the observed outcome (dependent variable) and Xi is a vector of explanatory variables, categorical or not, while j is the particular outcome and J refers to all outcomes except the base category. The unknown parameters βj are estimated by maximum likelihood (Bartels, Boztug & Muller, 1999). The explanatory variables in relation (1) carry no choice-specific subscript, because the regressors of each case are the same for every choice j. With this model we intend to explain how an unordered set of outcomes applies to the different individuals in our sample, which means that the probabilities of all these outcomes depend on the same characteristics (Davidson & MacKinnon, 1999). In the results section we will show a simple estimation example.
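Relations (1) and (2) can be illustrated with a minimal sketch in plain Python. The function names, the toy regressors and the coefficient values below are hypothetical and chosen only for illustration; they are not the estimates of this paper. The base category has its coefficients fixed at zero, which is what produces the "1 +" in the denominator.

```python
import math

def mnl_probabilities(x, betas):
    """Return [P(y=1), ..., P(y=J), P(base)] for one individual.

    x     : regressors of the individual (leading 1.0 for the constant)
    betas : J coefficient vectors, one per non-base outcome
    The base outcome has all coefficients equal to zero, so it
    contributes exp(0) = 1 to the denominator of relation (1).
    """
    scores = [sum(b * v for b, v in zip(beta, x)) for beta in betas]
    denom = 1.0 + sum(math.exp(s) for s in scores)
    probs = [math.exp(s) / denom for s in scores]
    probs.append(1.0 / denom)  # probability of the base category
    return probs

def log_likelihood(X, y, betas):
    """Sum of log P(y_i) over individuals, as in relation (2).

    Outcomes are coded 1..J for the non-base outcomes and J+1 for the base.
    """
    return sum(math.log(mnl_probabilities(x, betas)[outcome - 1])
               for x, outcome in zip(X, y))

# Hypothetical toy data: a constant plus one regressor, three outcomes.
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0]]
y = [1, 3, 2]
betas = [[0.2, 0.8], [-0.1, 0.3]]  # illustrative values only

p = mnl_probabilities(X[0], betas)
print(p, sum(p))                    # the three probabilities sum to one
print(log_likelihood(X, y, betas))  # always negative for probabilities below one
```

A maximum likelihood routine would search for the `betas` that maximize `log_likelihood`. The sketch does not guard against overflow in `math.exp` for very large linear scores.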

The multinomial Logit relies on the assumption called independence from irrelevant alternatives (IIA). This assumption requires that the disturbances are independent and homoscedastic (Greene, 2002). Because the dependent variable has 3 outcomes, we take outcome 3 (MAY BE) as the base reference category and estimate coefficients for the other two outcomes. So the probability for outcome 1 (YES) will be

$$
\Pr(y_i = 1) = \frac{\exp(X_i \beta_1)}{1 + \sum_{k=1}^{J} \exp(X_i \beta_k)} \qquad (3)
$$

for outcome 2 (NO)

$$
\Pr(y_i = 2) = \frac{\exp(X_i \beta_2)}{1 + \sum_{k=1}^{J} \exp(X_i \beta_k)} \qquad (4)
$$

and finally the probability for outcome 3 (MAY BE) is

$$
\Pr(y_i = 3) = \frac{1}{1 + \sum_{k=1}^{J} \exp(X_i \beta_k)} \qquad (5)
$$

(Davidson & MacKinnon, 1999).

A final matter that we must explain is that from question 4 we took only the variables price and variety. The reason is that consumers seem to respond in the same way to several of the characteristics, which means, for example, that price and quality might be considered as a single variable, grouped into one. In this way we reduce the number of variables to avoid the multicollinearity problem. Because the variables of question 4 are hierarchical in nature, we use cluster analysis with an agglomerative hierarchical method that begins with all variables separate, six in our case, each forming its own cluster. In the first step, the two variables closest together are joined. In the next step, either a third variable joins the first two, or two other variables join together into a different cluster. This process continues until all clusters are joined into one, but we decided to keep two groups, as this is more logical for our data. First we must find the similarity measures between the variables, which can be done with the commonly used correlation coefficient distance measure

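$$
r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[ n \sum x^2 - \left( \sum x \right)^2 \right] \left[ n \sum y^2 - \left( \sum y \right)^2 \right]}} \qquad (6)
$$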
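As a small illustration of relation (6), the sketch below computes the correlation coefficient between two variables in plain Python and turns it into a distance. The conversion `1 - |r|` is an assumption suggested by the "absolute correlation coefficient distance" label of the dendrogram; the scores are hypothetical and are not the survey data.

```python
import math

def pearson_r(x, y):
    """Correlation coefficient computed term by term as in relation (6)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator

def abs_correlation_distance(x, y):
    """Assumed distance: variables with |r| near 1 are close together."""
    return 1.0 - abs(pearson_r(x, y))

# Hypothetical marks (1 to 5) given by six respondents.
price = [5, 4, 3, 5, 2, 4]
quality = [5, 4, 2, 5, 1, 4]
variety = [2, 5, 3, 1, 4, 2]

print(abs_correlation_distance(price, quality))  # small: natural candidates to merge
print(abs_correlation_distance(price, variety))
```

An agglomerative method would start from the matrix of such pairwise distances and repeatedly merge the closest clusters.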

The objective of Ward's clustering method is to minimize the sum of squares of the deviations from the mean value (Žiberna et al, 2004)

$$
ESS = \sum_{i} \sum_{j} \sum_{k} \left( X_{ijk} - \bar{x}_{ik} \right)^2 \qquad (7)
$$

The results of Ward's clustering method are presented in Figure 2, from which we conclude that the first group consists of price, quality, service and service after shopping, and the second group consists of variety and delivery. The next step is to take the average of each group to obtain the new variables.

Figure 2 Ward's clustering method. Dendrogram with Ward linkage and absolute correlation coefficient distance (vertical axis: similarity; variables: price, quality, service, service after shopping, variety, delivery)

The second method is principal components. First we compute the covariance matrix of the six variables above and then its eigenvalues, which are reported in table 2.1. There are two components with eigenvalues greater than one. Table 2.2 presents the eigenvector of the first principal component, and we conclude again that price, service, quality and service after shopping can be taken as one variable, and variety and delivery as another.

The first estimation method is the frequency-weighted multinomial logistic regression based on age. The survey was conducted on households, but age is an important weighting variable because there is usually no great age difference within couples, so age carries significant information. The 30-50 age category is the largest, especially in the city. This category therefore receives a greater weight than the 18-24 or the 65-and-over categories, because couples aged 30-50 are more likely to buy furniture for various reasons: marriage, replacement because of deterioration, renovation, or purchases for children who will live in another house or another city for study, work or marriage. The probability is:

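$$
\Pr(y_i = j) = \frac{\exp(W_i X_i \beta_j)}{1 + \sum_{k=1}^{J} \exp(W_i X_i \beta_k)} \qquad (8.a)
$$

while (8.a) can be written as

$$
\Pr(y_i = j) = \frac{\exp\left( X_i \beta_j \, \dfrac{n - j - 1}{m_j} \right)}{1 + \sum_{k=1}^{J} \exp\left( \dfrac{n - k - 1}{m_k} \, X_i \beta_k \right)} \qquad (8.b)
$$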

where n is the number of observations, j is the specific outcome, J denotes all the outcomes except the base category, and m is the number of cases (Langholz & Goldstein, 2001). For example, if three persons aged 30 choose outcome 2 (NO), so that the number of cases m equals three, what is the probability given their answers and demographic data?

The second model is the weighted robust multinomial Logit, which uses the same weights as the weighted multinomial Logit model. The problem with the previous model is that the MLE method and Rao's score test can be misleading under model misspecification caused by misclassification errors or by extreme data points (outliers) in the sample (Pia & Feser, 2000). Pregibon (1982) suggests some tools that remove data from the sample, but because this procedure is iterative it leaves the analyst with a considerably reduced sample. By robust we mean the well-known Huber-White sandwich variance estimator. Probabilities are defined as in (8.a). The Huber-White variance estimator is

$$
\hat{V}_E = \frac{1}{n} \left[ \hat{H}(\hat{\beta}_E) \right]^{-1} \hat{\Phi}(\hat{\beta}_E) \left[ \hat{H}(\hat{\beta}_E) \right]^{-1} \qquad (9)
$$

where

$$
\hat{H}(\hat{\beta}_E) = \frac{1}{n} \sum_{i=1}^{n} \frac{\partial^2 \log g(y_i \mid x_i, \hat{\beta}_E)}{\partial \hat{\beta}_E \, \partial \hat{\beta}_E'} \qquad (9.a)
$$

is the Hessian matrix and

$$
\hat{\Phi}(\hat{\beta}_E) = \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{\partial \log g(y_i \mid x_i, \hat{\beta}_E)}{\partial \hat{\beta}_E} \right] \left[ \frac{\partial \log g(y_i \mid x_i, \hat{\beta}_E)}{\partial \hat{\beta}_E'} \right] \qquad (9.b)
$$

If $\hat{\beta}_E$ is the true MLE estimator, then $\hat{V}_E$ simplifies to $\{-[H(\hat{\beta}_E)]\}^{-1}$ (Greene, 2002). We note that these standard errors, in the case we study, are robust to certain misspecifications of the distribution of the dependent variable and not to heteroscedasticity. We claim this because the assumption that the disturbances are independent and homoscedastic is examined with Hausman's test, which we analyze in the next part of the project.

The third method is the replication method with jackknife standard errors. The jackknife is a nonparametric technique for estimating the standard error of a statistic. The procedure systematically recomputes the statistic, leaving out one observation at a time from the sample. Thus each subsample consists of n − 1 observations, formed by deleting a different observation from the sample. The jackknife estimator and its standard error are then calculated from these truncated subsamples (Greene, 2002). For example, suppose θ is the parameter of interest and let $\hat{\theta}_{(1)}, \hat{\theta}_{(2)}, \ldots, \hat{\theta}_{(n)}$ be estimates of θ based on n subsamples, each of size n − 1. The jackknife estimator of θ is given by (Wolter, 2007)

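$$
\hat{\theta}_J = \frac{1}{n} \sum_{i=1}^{n} \hat{\theta}_{(i)} \qquad (10)
$$

and the jackknife estimate of the standard error of $\hat{\theta}_J$ is

$$
\widehat{SE}(\hat{\theta}_J) = \left[ \frac{n-1}{n} \sum_{i=1}^{n} \left( \hat{\theta}_{(i)} - \hat{\theta}_J \right)^2 \right]^{1/2} \qquad (11)
$$

The t-statistic can be defined as

$$
t = \frac{\sqrt{n} \left( \hat{\theta}_{(i)} - \hat{\theta}_J \right)}{\left[ \dfrac{1}{n-1} \sum_{i=1}^{n} \left( \hat{\theta}_{(i)} - \hat{\theta}_J \right)^2 \right]^{1/2}} \qquad (12)
$$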
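The leave-one-out procedure behind relations (10) and (11) can be sketched in a few lines of plain Python. The function name `jackknife`, the choice of the sample mean as the statistic and the ages below are hypothetical; in the paper the statistic would be each Logit coefficient, re-estimated on every subsample.

```python
import math
import statistics

def jackknife(sample, statistic):
    """Jackknife estimate and standard error of `statistic`.

    Each leave-one-out estimate theta_(i) is computed on the sample
    with observation i deleted, so every subsample has n - 1 points.
    """
    n = len(sample)
    leave_one_out = [statistic(sample[:i] + sample[i + 1:]) for i in range(n)]
    theta_j = sum(leave_one_out) / n                      # relation (10)
    squared_dev = sum((t - theta_j) ** 2 for t in leave_one_out)
    std_error = math.sqrt((n - 1) / n * squared_dev)      # relation (11)
    return theta_j, std_error

def mean(values):
    return sum(values) / len(values)

ages = [23, 31, 35, 38, 42, 47, 51, 58, 64, 70]  # hypothetical ages
theta_j, se = jackknife(ages, mean)

# For the sample mean, the jackknife standard error coincides with the
# textbook formula s / sqrt(n), which gives a convenient sanity check.
print(theta_j, se, statistics.stdev(ages) / math.sqrt(len(ages)))
```

For a nonlinear statistic such as a Logit coefficient no closed form is available, which is where the replication approach becomes useful.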

Results

We note that, to our knowledge, there are no equivalent published results in the literature with which to compare our findings. Marketing research firms deal with these matters, but their results are not publicly available. From the results presented in tables 1-3 in the appendix, we reject the simple weighted Logit model because of the large number of statistically insignificant variables, even though table 4 and the Hausman test indicate that the independence from irrelevant alternatives (IIA) hypothesis holds. We also reject the weighted Logit model with robust Huber-White standard errors because of the presence of heteroscedasticity and hence the violation of the IIA assumption. We therefore accept as the best estimation the weighted multinomial Logit with jackknife standard errors, which also satisfies the IIA assumption. To predict the probability that a consumer will buy, will not buy, or is not sure about buying in the next four-month period, we use the following probabilities:

$$
\Pr(y_i = 1) = \frac{\exp(L_1)}{1 + \sum_{T} \exp(L_T)}, \qquad
\Pr(y_i = 2) = \frac{\exp(L_2)}{1 + \sum_{T} \exp(L_T)} \qquad \text{and} \qquad
\Pr(y_i = 3) = \frac{1}{1 + \sum_{T} \exp(L_T)}
$$

For example, suppose a consumer chose Living rooms in question 1; her main criterion for buying from a furniture shop is price; she is female; she intends to spend 250-600 €; she marks all the characteristics of her previous shopping (price, quality and the others) with 5; she is 30 years old; she prefers Serres as the shopping region; she prefers to be informed by leaflets; her income is 1001-1500 €; her profession is businessman; and she lives in the Municipality of Serres. Then, by Table 3 in the appendix, the values for the multinomial Logit with jackknife standard errors are

L1 = -24.646 + 1.977 - 2.446 + 5*0.837 + 5*0.415 - 30*0.042 + 0.783 - 3.635 - 2.05 + 26.610 - 1.998 + 8.491 = 8.093 for outcome 1 and

L2 = -22.896 - 18.582 - 1.496 + 5*0.541 + 5*0.344 + 30*0.048 + 0.269 + 4.091 1.604 + 32.296 - 1.266 + 16.870 = 8.559 for outcome 2.

$$
\Pr(y_i = 1) = \frac{\exp(L_1)}{1 + \sum_{T} \exp(L_T)} = \frac{\exp(8.093)}{1 + \exp(8.093) + \exp(8.559)} = \frac{3271.48}{8485.94} = 38.55\%
$$

$$
\Pr(y_i = 2) = \frac{\exp(L_2)}{1 + \sum_{T} \exp(L_T)} = \frac{\exp(8.559)}{1 + \exp(8.093) + \exp(8.559)} = \frac{5213.46}{8485.94} = 61.44\%
$$

and

$$
\Pr(y_i = 3) = \frac{1}{1 + \sum_{T} \exp(L_T)} = \frac{1}{8485.94} = 0.01\%
$$
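The last step of this worked example, turning the two linear scores into three probabilities, can be checked with a short Python sketch. The function name `outcome_probabilities` is hypothetical; the inputs 8.093 and 8.559 are the values of L1 and L2 reported in the text, and outcome 3 is the base category.

```python
import math

def outcome_probabilities(l1, l2):
    """Probabilities of outcomes 1 (YES), 2 (NO) and 3 (MAY BE, the base)."""
    denom = 1.0 + math.exp(l1) + math.exp(l2)
    return math.exp(l1) / denom, math.exp(l2) / denom, 1.0 / denom

p_yes, p_no, p_maybe = outcome_probabilities(8.093, 8.559)
print(f"YES: {p_yes:.2%}  NO: {p_no:.2%}  MAY BE: {p_maybe:.2%}")
```

Because both scores are large and positive, almost all of the probability mass falls on outcomes 1 and 2 and the base category receives a negligible share.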

Performance test of the proposed model

The next step is to apply a Monte-Carlo simulation to evaluate the performance of the model we have presented. The expected coefficient value can be defined as (Janke, 2002)

$$
\bar{X} = \frac{1}{N} \sum_{i=1}^{N} f(X_i) \qquad (13)
$$

where $\langle X \rangle$ is the expectation value and the estimator $\bar{X}$ is a random number fluctuating around the theoretical expected value. The variance is $\sigma_X^2 = \langle X^2 \rangle - \langle X \rangle^2$, from which we obtain the standard error

$$
\frac{\sigma_X}{\sqrt{N}} \qquad (14)
$$

The formula for the standard error is important because it shows that the standard error of a Monte-Carlo simulation decreases with the square root of the number of drawings. For example, to halve the error we must quadruple the number of random drawings. As we already know, from relation (1)

$$
\pi_j \equiv \Pr(y_i = j) = \frac{\exp(X_i \beta_j)}{1 + \sum_{k=1}^{J} \exp(X_i \beta_k)} \qquad (15)
$$

So we can draw a predicted value y from a multinomial distribution with parameters equal to πj and n = 1. We simulated the model with 500 sets of parameters and then used relations (13) and (14) to find the mean estimated parameters and their standard errors. We simulate our estimates because our sample is finite, so the parameter estimates are never certain (Tomz et al, 2000) and may not be reliable and efficient. More specifically, the program draws simulations of the parameters from their asymptotic sampling distribution, with mean equal to the vector of estimated parameters and variance equal to the variance-covariance matrix of the estimates (Tomz et al, 2000).

From the results of table 7 we conclude that our model performs fairly well, because the coefficients estimated by Monte-Carlo simulation are very close to the estimated coefficients of the weighted multinomial Logit model with jackknife standard errors.
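The role of relations (13) and (14) in such a simulation can be sketched with Python's standard library. This is a simplified, single-parameter illustration under stated assumptions: the draws come from a univariate normal distribution rather than from the full variance-covariance matrix of the estimates, and `beta_hat` and `se_hat` are illustrative numbers, not a reproduction of the simulation reported here.

```python
import math
import random

def mc_mean_and_se(draws):
    """Monte-Carlo mean (relation 13) and its standard error (relation 14)."""
    n = len(draws)
    mean = sum(draws) / n
    mean_sq = sum(d * d for d in draws) / n
    variance = mean_sq - mean ** 2           # <X^2> - <X>^2
    return mean, math.sqrt(variance / n)     # sigma / sqrt(N)

rng = random.Random(0)                       # fixed seed for repeatability
beta_hat, se_hat = 0.837, 0.064              # illustrative coefficient and st. error

draws_500 = [rng.gauss(beta_hat, se_hat) for _ in range(500)]
draws_2000 = [rng.gauss(beta_hat, se_hat) for _ in range(2000)]

mean_500, err_500 = mc_mean_and_se(draws_500)
mean_2000, err_2000 = mc_mean_and_se(draws_2000)

# Quadrupling the number of drawings should roughly halve the error.
print(mean_500, err_500)
print(mean_2000, err_2000)
print(err_500 / err_2000)
```

With 500 drawings the simulated mean already sits close to `beta_hat`, which mirrors the comparison between simulated and estimated coefficients described above.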

Conclusions

We applied three different multinomial Logit models to the marketing survey conducted in the Prefecture of Serres for the furniture market. The purpose of the research was to estimate the probability of buying furniture in the next four-month period, based on the questionnaire and the demographic characteristics of the potential consumers. We found that the simple weighted multinomial Logit suffers from many statistically insignificant variables, as there is a strong possibility of multicollinearity. On the other hand, the weighted multinomial Logit with Huber-White robust standard errors presents heteroscedasticity and violates the IIA hypothesis. We therefore chose the weighted multinomial Logit with jackknife standard errors. We applied a simple Monte-Carlo simulation and concluded that the proposed model is quite a good option in our case. There are also other good estimation approaches, such as the Principal Components (PC) logit or the bootstrap, but their estimates are quite similar to those of the model we propose here, so it is not necessary to present the results. These methods are still worth mentioning, because in other cases their estimates might be considerably better.

References

COMCENTER (2007), “The highly-fragmented furniture market in Greece”, I.C.A.P.

Bartels K., Boztug Y. & Muller M. (1999), “Testing the multinomial logit model”, working paper, University Potsdam, Humboldt-University at Berlin, Germany.

Davidson R. & MacKinnon J.G. (1999), “Econometric theory and methods”, Oxford University Press, New York, pp. 460-462.

Greene W.H. (2003), “Econometric Analysis”, Fifth edition, Prentice Hall, New Jersey, U.S.A., pp. 518-521, 724, 924.

Gujarati D. (2004), “Basic Econometrics”, Fourth edition, McGraw-Hill, U.S.A., pp. 614-615.

Interwood magazine (2007), “Dromeas presentation”, pp. 12-21.

Janke W. (2002), “Statistical Analysis of Simulations: Data Correlations and Error Estimation”, John von Neumann Institute for Computing, Julich, NIC Series, Vol. 10, pp. 423-445.

Jonkers J. (2006), “The domestic furniture market in Greece”, CBI MARKET SURVEY, Centre for the promotion of imports from developing countries, The Netherlands.

Langholz B. & Goldstein L. (2001), “Conditional logistic analysis of case-control studies with complex sampling”, Biostatistics, 2(1), 63-84.

Pia M. & Feser V. (2000), “Robust Logistic Regression for Binomial Responses”, working paper, University of Geneva.

Pregibon D. (1982), “Resistant fits for some commonly used logistic models with medical applications”, Biometrics, 38, 485-498.

Tomz M., Wittenberg J. & King G. (2000), “Making the Most of Statistical Analyses: Improving Interpretation and Presentation”, American Journal of Political Science, Vol. 44, pp. 341–355.

Wolter K.M. (2007), “Introduction to Variance Estimation”, Second Edition, Statistics for Social and Behavioural Sciences, Springer, pp. 151-153.

Žiberna A., Kejžar N. & Golob P. (2004), “A Comparison of Different Approaches to Hierarchical Clustering of Ordinal Data”, Metodološki zvezki, 1(1), 57-73.

TABLE 1 EIGENVALUES

Eigenvalue    2.2600   1.1140   0.9610   0.6817   0.5238   0.4595
Proportion    0.377    0.186    0.160    0.114    0.087    0.077
Cumulative    0.377    0.562    0.722    0.836    0.923    1.000

TABLE 2. 1st PC factor

Variable                  PC1
price                     0.441
service                   0.489
quality                   0.527
variety                   0.205
delivery                  0.177
service after shopping    0.463

TABLE 3 1. From which furniture category do you intend generally to buy?

                          Percent
Living rooms              54.2
Dining rooms              11.2
Children furniture        9.3
Garden furniture          2.8
Bedrooms                  19.2
Office furniture          3.3

TABLE 4 2. Which are the main criteria of buying from a furniture shop?

                          Percent
Price                     56.2
Quality                   31.2
Variety                   8.7
Trade name                3.0
Location                  0.9

TABLE 5 3. How much money do you intend to give?

                          Percent
≤ 250 €                   19.2
250-600 €                 26.8
≥ 600 €                   54.0

TABLE 6 4. Mark between 1 and 5 (5 is the best and 1 is the worst) the following characteristics you faced in your previous furniture shopping.

                          Percent by mark                        Mean    St. deviation
                          1       2       3       4       5
Service                   0.7     3.2     19.4    41.2    35.5   4.07    0.86
Price                     4.8     5.5     19.0    31.5    39.2   3.95    1.11
Quality                   7.8     6.5     22.6    30.4    32.7   3.74    1.20
Variety                   2.4     6.7     14.1    24.7    52.1   4.17    1.06
Delivery                  35.3    11.7    6.7     9.5     36.8   3.0     1.76
Service after shopping    4.7     5.9     23.9    22.7    42.8   3.93    1.15

TABLE 7 5. For the specific shopping do you prefer the Serres shops or other regions?

                          Percent
Serres                    74.5
Thessaloniki              17.4
Drama                     4.2
Bulgaria                  0.3
Other region              3.6

TABLE 8 6. How would you like to be informed about the furniture products?

                          Percent
TV                        24.8
Radio                     1.4
Newspapers-magazines      11.0
Leaflets                  53.2
Phone contact             1.7
Internet                  7.9

TABLE 9 7. Will you buy furniture in the following four-monthly period?

                          Percent
YES                       19.5
NO                        66.0
MAY BE                    14.5

TABLE 10 Income distribution and profession activity

Income                    Percent
<500 €                    11.2
501-1000 €                33.2
1001-1500 €               29.7
1501-2000 €               12.5
>2000 €                   13.4

Profession                Percent
Rural Sector              6.1
Public Sector Employee    16.9
Private Sector Employee   16.3
Businessman               11.8
Student                   3.9
Household                 20.8
Unemployed                6.1
Pensioner                 18.1

TABLE 11 Sex

                          Percent
MALE                      46.5
FEMALE                    53.5

TABLE 12 Age

Mean                      47.0
St. Deviation             14.4
Std. Error of Mean        0.78

TABLE 1 Weighted multinomial Logit model

            Market = 1                          Market = 2
Variable    Coef. (st. error)          z        Coef. (st. error)          z
C1          -24.19117 (2258.291)       -0.01    -22.16686 (2258.291)       -0.01
C2          -24.3191 (2258.291)        -0.01    -19.07691 (2258.291)       -0.01
C3          -25.38924 (2258.291)       -0.01    -21.85798 (2258.291)       -0.01
C4          -22.60833 (1907391)        -0.00    10.60145 (1369884)         0.00
C5          -24.28615 (2258.291)       -0.01    -21.49782 (2258.291)       -0.01
CRIT1       2.309676* (.184604)        12.51    -17.626 (2258.291)         -0.01
CRIT2       -1.402899 (.)              .        -19.67238 (2258.291)       -0.01
CRIT3       -5.21230* (.3421758)       -15.23   -24.18795 (2258.291)       -0.01
CRIT4       22.21736 (2258.291)        0.01     2.673512 (.)               .
MON2        -1.20423* (.3642494)       -3.31    -1.06954* (.3539236)       -3.02
MON3        -1.69661* (.348464)        -4.87    -.6975685* (.347466)       -2.01
Price       1.564757* (.1200148)       13.04    1.16457* (.1035494)        11.25
Variety     .2101363* (.0860052)       2.44     .3212617* (.0649514)       4.95
LOY         .8190109* (.1819961)       4.50     .2840219 (.1461295)        1.94
INF1        -.0253447 (.3857449)       -0.07    -1.556478* (.3376938)      -4.61
INF2        20.11875 (.)               .        20.32117* (.3704731)       54.85
INF3        -3.28183* (.3923674)       -8.36    -3.631752* (.3446816)      -10.54
INF4        -2.743014* (.3507851)      -7.82    -3.368385* (.3205891)      -10.51
INF5        22.83438                            16.97649* (.3756762)       45.19
Sex         -2.679168* (.1854444)      -14.45   -1.333601* (.1519156)      -8.78
Age         -.0599014* (.0087728)      -6.83    .0313198* (.0069501)       4.51
id          -1.906601* (.2293723)      -8.31    -1.112417* (.1915824)      -5.81
INC1        -5.837068* (.4140417)      -14.10   -.8446702* (.3298488)      -2.56
INC2        -4.76893* (.2847018)       -16.75   -.1101128 (.2469728)       -0.45
INC3        -2.029722* (.2858869)      -7.10    1.624504* (.2553209)       6.36
INC4        -7.152252* (.3930177)      -18.20   -1.967171* (.2919626)      -6.74
PF1         28.05087 (2258.291)        0.01     29.74646* (.2330322)       127.65
PF2         24.8759 (2258.291)         0.01     29.64446* (.2612)          113.49
PF3         28.19031 (2258.291)        0.01     29.98161* (.2875892)       104.25
PF4         28.53592 (2258.291)        0.01     33.55159* (.5056188)       66.36
PF5         21.98089 (5227809)         0.00     63.10384 (4506271)         0.00
PF6         26.89052 (2258.291)        0.01     29.36098* (.2271631)       129.25
PF8         30.49286 (2258.291)        0.01     31.04017 (.)               .
constant    3.03079                             11.004 (.)                 .

Log likelihood = -3154.662    Pseudo R² = 0.4113

Note: market = 3 is the base outcome; st. errors in parentheses; * denotes significance at the 5% level; z denotes z-statistics.

Market = 1 C1 C2 C3 C4 C5 CRIT1 CRIT2 CRIT3 CRIT4 MON2 MON3 Price Variety LOY INF1 INF2 INF3 Log likelihood

TABLE 2 Weighted multinomial Logit model with Huber-White robust standard errors Market = 1 Coef. z Market = 2 Coef. z

Coef.

-24.1911* (1.311869) -24.3191* (1.079975) -25.3892* (1.43201) -22.6083* (1.588602) -24.28615 (.) 2.309676* (.2087961) -1.402899 (.) -5.21230* (.3268497) 22.21736 (.) -1.20423* (.2631651) -1.69661* (.247516) 1.564757* (.1194442) .0859671* (.0859671) .8190109* (.1962168) -.0253447 (.3071136) 20.11875 (.) -3.28183* (.3378955)

-18.44

INF4

-22.52

INF5

-17.73

Sex

-14.23

Age

11.06

INC1

INC2

-15.95

INC3

INC4

-4.58

PF1

-6.85

PF2

13.10

PF3

2.44

PF4

4.17

PF5

-0.08

PF6

PF8

-9.71

constant

-2.743014* (.2599947) 22.83438 (.) -2.679168* (.1868817) -.0599014* (.009422) -1.906601* (.2064382) -5.837068* (.3919177) -4.76893* (.2479698) -2.029722* (.2660761) -7.152252* (.3635541) 28.05087* (2.772297) 24.8759 (.) 28.19031* (.7815903) 28.53592* (3.422268) 21.98089 (.) 26.89052* (2.056111) 30.49286 (.) 3.03079

-10.55

-14.34

-6.36

-9.24

-14.89

CRIT1

-19.23

CRIT2

-7.63

CRIT3

-19.67

CRIT4

10.12

MON2

MON3

36.07

Price

8.34

Variety

LOY

13.08

INF1

INF2

INF3

-22.16686 (.) -19.07691 (.) -21.85798* (1.377216) 10.60145 (.) -21.49782 (.) -17.626* (1.776402) -19.67238 (.) -24.18795 (.) 2.673512 (.) -1.06954* (.2122592) -.6975685* (.2015474) 1.16457* (.1153233) .3212617* (.0571129) .2840219 (.1456701) -1.556478* (.1797354) 20.32117* (.3147879) -3.631752* (.2628416)

Market = 2

INF4

INF5

-15.87

Sex

Age

-9.92

INC1

INC2

INC3

INC4

-5.04

PF1

-3.46

PF2

10.10

PF3

5.63

PF4

1.95

PF5

-8.66

PF6

64.56

PF8

-13.82

constant

Log likelihood = -3154.662

Pseudo R² = 0.4113

Note: Market = 3 is the base outcome; standard errors are in parentheses; * denotes significance at the 5% level; z denotes z-statistics.

Coef.

-3.368385* (.1704796) 16.97649* (.3390611) -1.333601* (.1386592) .0313198* (.007253) -1.112417* (.1512474) -.8446702* (.336651) -.1101128 (.2396833) 1.624504* (.2580996) -1.967171* (.3080401) 29.74646* (.297904) 29.64446* (.2209823) 29.98161* (.2531853) 33.55159* (.3761956) 63.10384* (.4796068) 29.36098* (.2161207) 31.04017 (.) 11.004

-19.76 50.07 -9.62 4.32 -7.35 -2.51 -0.46 6.29 -6.39 99.85 134.15 118.42 89.19 131.57 135.85 . .

Market = 1 C1 C2 C3 C4 C5 CRIT1 CRIT2 CRIT3 CRIT4 MON2 MON3 Price Variety LOY INF1 INF2 INF3 Log likelihood
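The z-statistics reported in Table 2 can be cross-checked by dividing each coefficient by its robust standard error. The short sketch below does this for four (coefficient, standard error, reported z) triples copied from the Table 2 output; the function names and the 1.96 cut-off for a two-sided 5% test are illustrative assumptions, not part of the original estimation code.

```python
# Cross-check of reported z-statistics: z = coefficient / standard error.
# Each tuple is (coefficient, robust standard error, reported z) copied from Table 2.
rows = [
    (-3.368385, 0.1704796, -19.76),
    (16.97649, 0.3390611, 50.07),
    (-1.333601, 0.1386592, -9.62),
    (-0.1101128, 0.2396833, -0.46),
]

# Assumed two-sided 5% critical value of the standard normal distribution.
CRITICAL_5PCT = 1.96


def z_stat(coef, se):
    """Return the z-statistic of a coefficient given its standard error."""
    return coef / se


def significant(coef, se, critical=CRITICAL_5PCT):
    """Return True when |z| exceeds the chosen critical value."""
    return abs(z_stat(coef, se)) > critical


for coef, se, reported in rows:
    z = z_stat(coef, se)
    print(f"coef={coef:10.4f} se={se:8.4f} z={z:8.2f} "
          f"reported={reported:8.2f} significant={significant(coef, se)}")
```

The fourth triple is the only one whose absolute z falls below the cut-off, which agrees with the missing asterisk on that coefficient in the table.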

TABLE 3 Weighted multinomial Logit model with jackknife robust standard errors

Columns: Market = 1 (Coef., t); Market = 2 (Coef., t)

Market = 2

-66.25

INF4

-39.35

INF5

-59.16

Sex

0.08

Age

Coef.

-24.646 (.4233) -23.977 (.6381) -25.565 (.4561) -24.050* (298.313) -25.671 (.4825) 1.977 (.3763) -1.745 (.3833) -5.741 (.3854) 21.855 (.4987) -2.110 (.3318) -2.446 (.2924) 0.837 (.0640) .415 (.0867) .783 (.2189) 0.107* (.3587) 17.416 (.4322) -3.645 (.3517)

-58.22

INF4

-37.57

INF5

-56.05

Sex

-0.08

Age

-53.19

5.25

INC1

-4.55

INC2

-14.90

INC3

43.82

INC4

-6.36

PF1

-8.37

PF2

13.07

PF3

4.79

PF4

3.58

PF5

0.30

PF6

40.29

PF8

-10.36

constant

-3.635 (.2786) 21.162 (.4137) -2.870 (.1950) -.042 (.0090) -1.998 (.1749) -8.056 (.4807) -4.573 (.2258) -2.050 (.2235) -6.932 (.3836) 25.621 (.3814) 22.681 (.3734) 26.076 (.3689) 26.610 (.5446) 16.517* (307.3844) 25.329 (.3955) 28.457 (.4560) 8.491 (.8963)

-13.05

51.15

-14.71

-4.68

-11.42

-16.76

CRIT1

-20.25

CRIT2

-9.17

CRIT3

-18.07

CRIT4

67.16

MON2

60.74

MON3

70.67

Price

48.86

Variety

0.05

LOY

64.04

INF1

62.40

INF2

9.47

INF3

-22.986 (.3469) -18.925 (.4808) -22.645 (.3828) 16.071* (200.5572) -23.192 (.3508) -18.582 (.3514) -20.684 (.2961) -25.852 (.3686) 1.453 (.4287) -2.121 (.2937) -1.496 (.2394) 0.541 (.0510) .344 (.0393) .269* (.1756) -0.916 (.2278) 17.625 (.3227) -3.846 (.2502)

-56.10

-52.87

INC1

-69.84

INC2

-70.12

INC3

3.39

INC4

-7.22

PF1

-6.25

PF2

10.59

PF3

8.75

PF4

1.53

PF5

-4.02

PF6

54.61

PF8

-15.37

constant

Log likelihood = -3154.662

Pseudo R² = 0.4113

Note: Market = 3 is the base outcome; standard errors are in parentheses; * denotes insignificance at the 5% level; t denotes t-statistics.

Coef.

-4.091 (.1582) 15.380 (.2446) -1.583 (.1492) .048 (.0067) -1.266 (.1358) -3.207 (.4115) -.0002* (.218) 1.604 (.2173) -1.630 (.3110) 28.177 (.3529) 28.477 (.2871) 28.921 (.3169) 32.296 (.4236) 66.619* (120.3684) 28.703 (.3357) 29.865 (.4010) 16.870 (.6426)

-25.85 62.87 -10.61 7.14 -9.32 -7.80 0.00 7.38 -5.24 79.84 99.16 91.24 76.24 0.55 85.50 74.46 26.25
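The tables report a log likelihood of -3154.662 together with a pseudo R² of 0.4113. Assuming the reported measure is McFadden's pseudo R² (an assumption, since the paper does not name the definition), the two figures are linked by 1 - LL_model / LL_null, where LL_null is the log likelihood of a constant-only model. The sketch below shows the relation and backs out the null log likelihood that it would imply; the function names are hypothetical.

```python
# McFadden's pseudo R-squared: 1 - LL_model / LL_null.
# Assumption: the reported pseudo R-squared follows this definition.

LL_MODEL = -3154.662   # log likelihood reported in the tables
PSEUDO_R2 = 0.4113     # pseudo R-squared reported in the tables


def mcfadden_pseudo_r2(ll_model, ll_null):
    """Pseudo R-squared from the fitted and constant-only log likelihoods."""
    return 1.0 - ll_model / ll_null


def implied_null_loglik(ll_model, pseudo_r2):
    """Constant-only log likelihood implied by a given pseudo R-squared."""
    return ll_model / (1.0 - pseudo_r2)


ll_null = implied_null_loglik(LL_MODEL, PSEUDO_R2)
print("implied constant-only log likelihood:", round(ll_null, 2))
print("round-trip pseudo R-squared:", round(mcfadden_pseudo_r2(LL_MODEL, ll_null), 4))
```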

TABLE 4 Hausman's specification test for the weighted multinomial logit model

C1 C2 C3 C4 C5 CRIT1 CRIT2 CRIT3 CRIT4 MON2 MON3 Price Variety LOY INF1 INF2

Coefficients (b) partial

Coefficients (B) all

(b-B) Difference

27.582 27.274 -19.141 26.550 19.831 21.965 26.933 -27.040 -0.811 -1.474 -1.2410 -0.334 0.064 2.971 4.271 4.284

25.257 24.948 -9.513 24.588 20.511 22.558 27.073 -12.659 1.069 0.697 -1.164 -0.321 -0.284 1.556 3.631 3.368

2.325 2.325 -9.628 1.962 -0.680 -0.593 -0.140 -14.381 -1.880 -2.172 -0.076 -0.013 0.348 1.414 0.639 0.916

0.098 0.202 1.25E+04 0.103 2.676 2.674 2.667 2.05E+04 2.690 2.687 0.031 0.028 0.091 0.205 0.160 0.192

Coefficients (b) partial INF3 INF4 INF5 Sex Age Id INC1 INC2 INC3 INC4 PF1 PF2 PF3 PF4 PF6 INF2

Test: H0: difference in coefficients not systematic, Pr = 1.0000, *Reject H1 (H0 is not rejected)

-43.365 1.258 -0.027 1.860 -0.763 -1.173 -2.742 1.481 -34.677 -34.536 -34.874 -34.330 -36.476 27.582 27.274 4.284

Coefficients (B) all -29.976 1.333 -0.031 1.112 0.844 0.110 -1.624 1.967 -36.725 -36.620 -36.957 -36.337 -38.0162 25.257 24.948 3.368

(b-B) Difference -13.388 -0.075 0.004 0.747 -1.608 -1.283 -1.118 -0.485 2.044 2.084 2.082 2.006 1.539 2.325 2.325 0.916

3.10E+09 0.064212 0.003141 0.120582 0.154244 0.18215 0.214111 0.185999 0.026322 0.068746 . . 0.037052 0.098427 0.20215 0.192653
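For reference, the Hausman statistic behind tables of this kind is the quadratic form (b - B)' [V_b - V_B]^(-1) (b - B), compared against a chi-square distribution. The sketch below is a minimal illustration under stated assumptions: the two-coefficient vectors and covariance matrices are hypothetical numbers chosen for easy arithmetic, not values from the paper, and a Moore-Penrose pseudo-inverse is used so that the code also runs when the covariance difference is singular.

```python
import numpy as np


def hausman_statistic(b, B, V_b, V_B):
    """Return (statistic, degrees of freedom) for a Hausman-type test.

    b, B     : coefficient vectors from the restricted and the full model
    V_b, V_B : their estimated covariance matrices
    """
    d = np.asarray(b, dtype=float) - np.asarray(B, dtype=float)
    V_diff = np.asarray(V_b, dtype=float) - np.asarray(V_B, dtype=float)
    # Pseudo-inverse handles a singular covariance difference.
    stat = float(d @ np.linalg.pinv(V_diff) @ d)
    dof = int(np.linalg.matrix_rank(V_diff))
    return stat, dof


# Hypothetical inputs for illustration only.
b = [1.5, 0.3]
B = [1.0, 0.5]
V_b = [[0.04, 0.00], [0.00, 0.09]]
V_B = [[0.03, 0.00], [0.00, 0.05]]

stat, dof = hausman_statistic(b, B, V_b, V_B)
print("Hausman statistic:", round(stat, 4), "degrees of freedom:", dof)
```

A large statistic relative to the chi-square critical value for the given degrees of freedom leads to rejecting H0 that the difference in coefficients is not systematic.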

TABLE 5 Hausman's specification test for the weighted multinomial logit model with Huber-White robust standard errors

Columns: Coefficients (b) partial; Coefficients (B) all; (b-B) Difference

C1 C2 C3 C4 C5 CRIT1 CRIT2 CRIT3 CRIT4 MON2 MON3 Price Variety LOY INF1 INF2

27.582 27.274 -19.141 26.550 19.831 21.965 26.933 -27.040 -.811 -1.474 -1.241 -.334 .064 2.971 4.271 4.284

25.254 24.948 -9.513 24.588 20.511 22.558 27.073 -12.659 1.069 .697 -1.164 -.321 -.284 1.556 3.631 3.368

2.325 2.325 -9.628 1.962 -.680 -.593 -.140 -14.381 -1.880 -2.172 -.076 -.013 .348 1.414 .639 .9160

.145 .200 . .066 .263 .159 . . .113 .108 . .022 .062 .164 .184 .160

INF3 INF4 INF5 Sex Age Id INC1 INC2 INC3 INC4 PF1 PF2 PF3 PF4 PF6 INF2

Test: H0: difference in coefficients not systematic, Pr = 0.0000, *Reject H0

Coefficients (b) partial

Coefficients (B) all

-43.364 1.258 -.027 1.860 -.763 -1.173 -2.742 1.481 -34.676 -34.536 -34.874 -34.330 -36.476 27.582 27.274 -43.364

-29.976 1.333 -.031 1.112 .844 .110 -1.624 1.967 -36.724 -36.620 -36.957 -36.337 -38.016 25.257 24.948 -29.976

(b-B) Difference -13.388 -.075 .004 .747 -1.608 -1.283 -1.118 -.485 2.044 2.084 2.082 2.006 1.539 2.325 2.325 -13.388

.317 .047 . .113 .103 .190 .254 .145 .150 .162 . .173 . .145 .200 .317

TABLE 6 Hausman's specification test for the weighted multinomial Logit model with Jackknife standard errors

Columns: Coefficients (b) partial; Coefficients (B) all; (b-B) Difference

C1 C2 C3 C4 C5 CRIT1 CRIT2 CRIT3 CRIT4 MON2 MON3 Price Variety LOY INF1 INF2

27.58222 24.15519 27.27403 -19.14167 26.55099 19.83164 21.96522 26.93368 -27.04027 -.8112801 -1.474704 -1.241008 -.3347701 .0642479 2.971465 -48.46552

25.25704 22.16708 24.94816 -9.513089 24.588 20.51188 22.55825 27.07383 -12.65906 1.06954 .6975685 -1.16457 -.3212617 -.2840219 1.556478 -33.32116

2.325183 1.988104 2.32587 -9.628579 1.962987 -.680234 -.593033 -.1401496 -14.38121 -1.88082 -2.172272 -.0764379 -.0135083 .3482699 1.414987 -15.14435

.2701009 . .2703368 . .1360996 .2246666 .1694605 .1340384 . .1148009 .1102908 . .0229542 .0649966 .1675352 .

INF3 INF4 INF5 Sex Age Id INC1 INC2 INC3 INC4 PF1 PF2 PF3 PF4 PF6 INF2

Coefficients (b) partial

Coefficients (B) all

(b-B) Difference

INF3 INF4 INF5 Sex Age id INC1 INC2 INC3 INC4 PF1 PF2 PF3 PF4 PF6 INF3

4.271423 4.284463 -43.36495 1.258049 -.0270006 1.86003 -.7636035 -1.173496 -2.742845 1.481673 -34.67769 -34.53645 -34.8748 -38.96175 -82.42541 4.271423

3.631752 3.368385 -29.97649 1.333601 -.0313198 1.112417 .8446702 .1101128 -1.624504 1.967171 -36.72252 -36.62053 -36.95767 -40.52765 -72.06926 3.631752

Test: H0: difference in coefficients not systematic, Pr = 0.1126, *Reject H1 (H0 is not rejected at the 5% level)

.6396704 .9160783 -13.38846 -.0755513 .0043192 .747613 -1.608274 -1.283609 -1.11834 -.485498 2.044834 2.084071 2.082875 1.565903 -10.35615 .6396704

Market = 1 C1 C2 C3 C4 C5 CRIT1 CRIT2 CRIT3 CRIT4 MON2 MON3 Price Variety LOY INF1 INF2 INF3

Coef. -24.635 (.3857) -23.9114 (.6307) -25.5435 (.4383) -28.4515* (284.75) -25.6378 (.4543) 1.9965 (.3759) -1.726 (.3801) -5.7675 (.3623) 21.8952 (.5212) -2.1460 (.3175) -2.46375 (.2921) .8361 (.0650) .4138 (.009) .7924 (.2149) 0.1179* (.360) 17.3932 (.4343) -3.6455 (.3493)

TABLE 7 MONTE-CARLO SIMULATION

Columns: Market = 1 (Coef.); Market = 2 (Coef.)

INF4 INF5 Sex Age id INC1 INC2 INC3 INC4 PF1 PF2 PF3 PF4 PF5 PF6 PF8 constant

-3.6452 (.2741) 21.1507 (.4331) -2.86 (.1940) -.04276 (.0090) -2.00327 (.1818) -8.0812 (.4886) -4.5711 (.2262) -2.033 (.2134) -6.9244 (.3731) 25.6271 (.3897) 22.656 (.3735) 26.06 (.3732) 26.6119 (.5485) 26.4515* (296.0381) 25.3177 (.4018) 28.4231 (.4448) 8.5198 (.8868)

C1 C2 C3 C4 C5 CRIT1 CRIT2 CRIT3 CRIT4 MON2 MON3 Price Variety LOY INF1 INF2 INF3

Note: Market = 3 is the base outcome; standard errors are in parentheses; * denotes insignificance at the 5% level.

-22.9653 (.330) -18.8615 (.4767) -22.6257 (.369) 3.852* (201.2064) -23.1686 (.3477) -18.5965 (.3454) -20.6993 (.2911) -25.8888 (.3595) 1.4347 (.4523) -2.1625 (.2904) -1.5261 (.2363) .544 (.053) .3448 (.0397) .2823* (.1715) -.8961 (.2233) 17.602 (.3371) -3.8446 (.2320)

Market = 2 INF4 INF5 Sex Age id INC1 INC2 INC3 INC4 PF1 PF2 PF3 PF4 PF5 PF6 PF8 constant

Coef. -4.085 (.1412) 15.39 (.2485) -1.5663 (.1462) .04814 (.0067) -1.2552 (.1385) -3.2441 (.4225) -.0062* (.2252) 1.6044 (.2132) -1.6485 (.3170) 28.1815 (.36078) 28.4753 (.2908) 28.9281 (.3250) 32.3052 (.4336) 69.009* (120.3516) 28.7143 (.3418) 29.8485 (.4049) 16.8616 (.645)