Simulation study to check the performance of various unequal probability sampling estimators Nadeem Shafique Butt nadeemshafique@hotmail.com College of Statistical and Actuarial Sciences University of the Punjab, Lahore Muhammad Qasier Shahbaz qshahbaz@gmail.com Department of Mathematics COMSATS Institute of Information Technology, Lahore
ABSTRACT A large number of unequal probability sampling estimators are available in literature. Theoretical comparison of available unequal probability sampling estimators is a hard task, so survey statisticians have conducted empirical studies to discuss the performances of various estimators. These empirical comparisons were based on limited number of populations available in the literature on survey sampling. The aim of this paper is to perform a simulation study on various popular unequal probability sampling estimators. This simulation study has been conducted by generating populations with given correlation structure from the Bi-Variate Normal Distribution. The study attempt to obtain a minimum variance estimator in unequal probability sampling for population with specific correlation structure. KEY WORDS : Unequal Probability Sampling, Bi-variate Normal Data, Simulation
1.
INTRODUCTION
Unequal probability sampling has been a popular method of sample selection for estimation of population characteristics. The paper deals with comprehensive comparison of some procedures of unequal probability sampling by Simulation. The available procedures are not mathematical comparable because of complex nature of equations and conditions. Lot of empirical comparisons has been done from time to time to obtain a minimum variance estimator or selection procedure for unequal 1Â Â
probability sampling. These empirical comparisons have been limited to a specific set of populations and no correlation structures have been specified in these empirical comparison. In this article we have carried out a simulation study to search for an optimum selection procedure that can be used with the Horvitz – Thompson (1952) estimator. This simulation study has been carried out by generating random data from bi-variate normal population with specific correlation structure. The methods that are compared are given in the following section.
2.
METHODS USED FOR COMPARISION
In this section the methods used for comparison are given. These methods are selected as they have been widely used in many empirical studies by number of survey statisticians. The methods are listed below: 2.1:
Horvitz-Thompson Estimator (1952) The Horvitz – Thompson (1952) estimator is given as: / Estimate : y HT =∑
i∈ S
Yi πi
(2.1)
N ⎛ y ⎞ / Variance: Var ( yHT (π iπ j − π ij ) ⎜⎜ πyi − π j ⎟⎟ ) = 12 ΣΣ i =1 j =1 j ⎠ ⎝ i j ≠i
2
(2.2)
The Horvitz – Thompson (1952) estimator has been used under following selection procedures: 2.2.1: Yates Grundy (1953) draw by draw Procedure: ⎡
π i = p i ⎢1 + ⎢⎣
pi ⎤ ⎥ 1 − pi ⎥ ⎦
(2.3)
⎡ 1 1 ⎤ + ⎥ ⎣⎢1 − pi 1 − p j ⎦⎥
(2.4)
N
pj
∑ 1− p j =1
− j
π ij = pi p j ⎢
2.2.2: Brewer (1963) Procedure:
π i = 2 pi
π ij =
N 2 pi p j ⎡ 1 pj 1 ⎤ with + = 1 + k ⎥ ⎢ ∑ k ⎢⎣1 − 2 pi 1 − 2 p j ⎥⎦ j =1 1− 2 p j
2
(2.5)
2.2.3: Yates – Grundy Rejective Procedure (1953) πi =
2 pi (1 − pi )
(2.6)
N
1−
∑p
2 j
j =1
π ij =
2 pi p j
(2.7)
N
1−
∑p
2 j
j =1
2.2.4: Shahbaz and Hanif Procedure (2003) πi =
πij = 2.2
pi ⎡ 1 ⎢ + d ⎢1 − pi ⎣
N
N ⎤ pi ⎥;= d =∑ 1− 2 p j ⎥ i =1 1 − 2 pi ⎦
∑ (1− p )(
pj
j =1
j
)
pi p j ⎡ 1 1 + ⎢ d ⎢⎣ (1 − pi )(1 − 2 pi ) 1 − p j 1 − 2 p j
(
)(
)
(2.9)
Murthy (1957) Estimator: Estimate : t symm =
(
Var t symm
)
⎡ yi ⎤ yj 1 (1 − pi )⎥ 1− p j + ⎢ 2 − p i − p j ⎣⎢ p i pj ⎦⎥
(
(
1 N Pi P j 1 − Pi − Pj = 2 i = 1 j = 1 2 − Pi − P j
ΣΣ j ≠i
)
) ⎛⎜ Yi
Yj ⎞ ⋅ − ⎟ ⎜ Pi Pj ⎟ ⎠ ⎝
3
(2.10)
2
The simulation study is given in the following section.
⎤ ⎥ ⎥⎦
(2.8)
(2.11)
3.
SIMULATION STUDY
In this section the results of simulation study have been given. The study has been carried out by drawing random data from bi-variate normal distribution. The random data has been drawn by using various values of correlation coefficient. After drawing the random data; the variance of Horvitz-Thompson Estimator (1952) has been computed under various selection procedures alongside variance of Murthy estimator. The procedure has been replicated various number of times. After this replication the average variance of each method has been computed for various values of correlation coefficient. The results of the study have been given in the table 3.1. Following abbreviations have been used in table 3.1 Vyg
Variance of Yates Grundy (1953) draw by draw procedure
Vr
Variance of Yates Grundy (1953) rejective procedure
Vb
Variance of Brower (1963) procedure
Vs
Variance of Shahbaz and Hanif (2003)
Vm
Variance of Murthy (1957) estimator
Raho Population Correlation Coefficient
4Â Â
Table 3.1: Average Variances of Various Estimators for different number of replicates and different Correlation Structure Replicate Vyg Vb 10 Vr Vm Vs Vyg Vb 25 Vr Vm Vs Vyg Vb 50 Vr Vm Vs Vyg Vb 100 Vr Vm Vs Vyg Vb 250 Vr Vm Vs Vyg Vb 500 Vr Vm Vs Vyg Vb 1000 Vr Vm Vs Vyg Vb 2500 Vr Vm Vs Vyg Vb 5000 Vr Vm Vs Vyg Vb 10000 Vr Vm Vs Vyg Vb Vr 25000 Vm Vs Vyg Vb 50000 Vr Vm Vs
0.1 367814 373242 363146 371720 378142 322869 326943 319415 325910 330631 341313 346060 337277 344594 350346 289818 293382 286812 292543 296622 300222 303650 297332 302834 306763 292662 295813 290013 295015 298685 305239 308630 302379 307752 311712 297927 301263 295116 300398 304298 304084 307459 301237 306574 310529 299232 302605 296388 301757 305671 299785 303119 296975 302269 306150 300857 304207 298033 303354 307254
0.2 314255 316477 312431 315654 318557 257688 260214 255588 259669 262540 276598 278795 274806 278257 280822 271476 274201 269199 273500 276695 282311 284474 280524 283813 286476 274989 277466 272928 276743 279742 276436 278862 274421 278101 281097 277918 280361 275888 279597 282613 276182 278567 274203 277840 280766 275627 278038 273625 277287 280261 277551 280010 275509 279261 282274 277028 279462 275008 278714 281704
0.3 261634 263982 259699 262921 266149 230817 232414 229532 231733 233922 254699 255863 253799 255244 256992 257316 259036 255921 258304 260648 258937 260428 257743 259765 261846 255172 256786 253867 256091 258313 253448 254996 252205 254375 256463 255518 257121 254226 256473 258636 257168 258784 255865 258116 260309 254391 255938 253146 255269 257405 253927 255512 252649 254841 257011 254536 256116 253262 255451 257610
0.4 208861 209528 208400 209056 210238 217205 217912 216683 217440 218645 257192 258298 256336 257662 259379 241046 241583 240684 240934 242171 228193 229203 227416 228570 230198 225084 225681 224666 225111 226312 230203 230899 229700 230312 231618 235577 236327 235032 235726 237094 229185 229870 228694 229279 230579 234400 235090 233905 234473 235804 231460 232170 230949 231566 232902 232143 232853 231631 232253 233586
Rho 0.5 203721 203734 203826 203417 203865 221015 219976 222033 219487 219196 187098 187003 187280 186648 187024 198177 198224 198240 197638 198373 206228 206066 206472 205578 206035 215332 215180 215570 214647 215158 210368 210241 210587 209661 210241 210328 210231 210520 209683 210256 208605 208435 208859 207890 208397 208839 208680 209085 208128 208651 209898 209748 210136 209204 209728 209521 209379 209752 208833 209365
* More shaded circle represent the larger variance 5
0.6 222588 221543 223634 220460 220761 188407 187447 189344 187132 186703 181670 180741 182592 180232 180041 172609 171567 173621 171117 170759 184452 183428 185448 182983 182645 194098 193002 195159 192501 192156 187332 186398 188253 185929 185691 188142 187211 189059 186718 186507 188005 187049 188944 186544 186323 187887 186939 188821 186428 186220 187516 186512 188499 186012 185744 187660 186676 188625 186173 185925
0.7 155648 154544 156703 154251 153684 157279 155622 158841 155044 154299 158427 156402 160295 155995 154747 164012 162354 165557 161914 161005 166560 164680 168303 164203 163146 165507 163766 167131 163251 162348 164127 162338 165790 161869 160884 166078 164328 167708 163853 162907 167797 166012 169458 165516 164561 167005 165229 168658 164747 163786 165986 164196 167651 163717 162740 166735 164940 168405 164456 163480
0.8 146938 145238 148521 144740 143848 158859 156558 160969 155907 154644 155054 152696 157224 152052 150742 153776 150788 156489 150374 148297 148726 146109 151109 145622 143932 150412 147704 152871 147249 145446 145648 143169 147909 142693 141109 145403 142804 147772 142317 140640 144873 142305 147211 141831 140169 145727 143134 148090 142662 140977 145536 142944 147897 142470 140788 145375 142779 147739 142306 140620
0.9 128165 124730 131255 124219 121835 112280 108731 115476 108379 105751 123453 120007 126555 119548 117105 130060 126860 132939 126232 124180 128352 124901 131458 124408 121995 125342 121854 128482 121376 118919 125993 122559 129083 122072 119671 125413 122030 128460 121536 119183 125584 122196 128633 121713 119345 125005 121637 128037 121155 118803 124950 121572 127991 121092 118730 125122 121739 128168 121258 118893
1 127014 121554 131881 121104 116948 121020 116633 124943 115901 112912 105956 101695 109763 101196 98089 110205 105909 114038 105375 102277 108019 103788 111800 103262 100203 104594 100526 108229 100004 97079 102516 98435 106162 97955 94978 104538 100419 108217 99924 96930 105463 101291 109190 100794 97756 104743 100623 108425 100118 97132 106295 102125 110021 101607 98593 105320 101168 109029 100662 97651
4.
CONCLUSIONS
The results of the simulation study have been given in section 3 of the article. The table 3.1 contains the average variance of various selection procedures under different correlation structure. The table has been constructed by using various numbers of replicates from the bivariate normal distribution. From the table we can see that Yates– Grundy (1953) rejective procedure outperform other selection procedures involved in the study for populations having correlation between 0.1 to 0.5. The procedure given by Shahbaz–Hanif (2003) outperform other procedures involved in the study for populations having correlation of 0.6 to 1.0, and this procedure is closely followed by the Murthy (1957) estimator. The other procedures do not perform batter. From the table 3.1; we can, therefore, conclude that the Yates–Grundy (1953) rejective procedure is a batter choice for estimation of population total by using the Horvitz– Thompson (1952) estimator for populations having a correlation coefficient of below 0.5. The Shahbaz – Hanif (2003) procedure is better for populations having high correlation; that is correlation of greater than or equal to 0.6. REFERENCE
1.
Brewer, K. R. W. (1963) “A model of systematic sampling with unequal probabilities”, Aust. J. Stat. 5, 5 – 13.
2.
Horvitz, D. G. and Thompson, D. J. (1952) “A generalization of sampling without replacement from a finite universe”, J. Amer. Stat. Assoc. 47, 663 – 685.
3.
Murthy, M. N. (1957) “Ordered and unordered estimators in sampling without replacement”, Sankhya, 18, 379 – 390.
4.
Shahbaz, M. Q. and Hanif, M. (2003) “A simple procedure for unequl probability sampling without replacement and a sample of size 2”, Pak. J. Stat. 19(1), 151- 160
5.
Yates, F. and Grundy, P. M. (1953) “Selection without replacement from within strata with probability proportional to size”, J. Roy. Stat. Soc., B, 15, 153 – 161.
6