Sampling Strategies for Finite Population Using Auxiliary Information
A Generalized Family of Estimators for Estimating Population Mean Using Two Auxiliary Attributes
¹Sachin Malik, †¹Rajesh Singh and ²Florentin Smarandache
¹Department of Statistics, Banaras Hindu University, Varanasi-221005, India
²Chair of Department of Mathematics, University of New Mexico, Gallup, USA
†Corresponding author, rsinghstat@gmail.com
Abstract
This paper deals with the problem of estimating the finite population mean when information on two auxiliary attributes is available. A class of estimators is defined which includes the estimators recently proposed by Malik and Singh (2012), Naik and Gupta (1996) and Singh et al. (2007) as particular cases. It is shown that the proposed estimator is more efficient than the usual mean estimator and other existing estimators. The study is also extended to two-phase sampling. The results are illustrated numerically using empirical populations considered in the literature.
Keywords
Simple random sampling, two-phase sampling, auxiliary attribute, point biserial correlation, phi correlation, efficiency.
Rajesh Singh ■ Florentin Smarandache (editors)
1. Introduction
There are situations in which, in place of one auxiliary attribute, information is available on two qualitative variables. For illustration, to estimate hourly wages we can use information on marital status and region of residence (see Gujarati and Sangeetha (2007), page 311). Here we assume that both auxiliary attributes have significant point biserial correlation with the study variable and that there is significant phi-correlation (see Yule (1912)) between the auxiliary attributes. The use of auxiliary information can increase the precision of an estimator when the study variable Y is highly correlated with the auxiliary variable X. In survey sampling, auxiliary variables are often measured on a ratio scale (e.g. income, output, prices, costs, height and temperature), but they may also be qualitative or measured on a nominal scale, such as sex, race, color, religion, nationality and geographical region. For example, female workers are found to earn less than their male counterparts, and non-white workers are found to earn less than white workers (see Gujarati and Sangeetha (2007), page 304).
Naik and Gupta (1996) introduced a ratio estimator for the case in which the study variable and the auxiliary attribute are positively correlated. Jhajj et al. (2006) suggested a family of estimators for the population mean in single and two-phase sampling when the study variable and auxiliary attribute are positively correlated. Shabbir and Gupta (2007), Singh et al. (2008), Singh et al. (2010) and Abd-Elfattah et al. (2010) considered the problem of estimating the population mean $\bar{Y}$, taking into consideration the point biserial correlation between the auxiliary attribute and the study variable.
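The two correlation measures mentioned above can be illustrated with a small sketch. Both the point biserial correlation (study variable against a 0/1 attribute) and the phi correlation (one 0/1 attribute against another) are Pearson correlations applied to indicator-coded data. The toy values and the names `wage`, `married` and `urban` below are hypothetical and are not taken from the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy population: hourly wage and two 0/1 attributes.
wage = [12.0, 15.5, 9.0, 20.0, 11.0, 18.5, 8.5, 16.0]
married = [0, 1, 0, 1, 0, 1, 0, 1]  # first auxiliary attribute (phi_1)
urban = [0, 1, 0, 1, 1, 1, 0, 0]    # second auxiliary attribute (phi_2)

rho_pb1 = pearson(wage, married)   # point biserial correlation, wage vs phi_1
rho_pb2 = pearson(wage, urban)     # point biserial correlation, wage vs phi_2
rho_phi = pearson(married, urban)  # phi correlation between phi_1 and phi_2

print(round(rho_pb1, 3), round(rho_pb2, 3), round(rho_phi, 3))
```

The estimators discussed in the following sections only need these three correlations together with the means and variances of the study variable and the attributes.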
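2. Some Estimators in Literature
In order to estimate the population mean of the study variable y, assuming knowledge of the population proportion P, Naik and Gupta (1996) and Singh et al. (2007), respectively, proposed the following estimators:
$$t_1 = \bar{y}\left(\frac{P_1}{p_1}\right) \tag{2.1}$$
$$t_2 = \bar{y}\left(\frac{p_2}{P_2}\right) \tag{2.2}$$
$$t_3 = \bar{y}\exp\left(\frac{P_1 - p_1}{P_1 + p_1}\right) \tag{2.3}$$
$$t_4 = \bar{y}\exp\left(\frac{p_2 - P_2}{p_2 + P_2}\right) \tag{2.4}$$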
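The bias and MSE expressions of the estimators $t_i$ (i = 1, 2, 3, 4), up to the first order of approximation, are respectively given by
$$B(t_1) = \bar{Y} f_1 C_{p_1}^2\left(1 - K_{pb_1}\right) \tag{2.5}$$
$$B(t_2) = \bar{Y} f_1 K_{pb_2} C_{p_2}^2 \tag{2.6}$$
$$B(t_3) = \bar{Y} f_1 \frac{C_{p_1}^2}{2}\left(\frac{3}{4} - K_{pb_1}\right) \tag{2.7}$$
$$B(t_4) = \bar{Y} f_1 \frac{C_{p_2}^2}{2}\left(K_{pb_2} - \frac{1}{4}\right) \tag{2.8}$$
$$MSE(t_1) = \bar{Y}^2 f_1\left[C_y^2 + C_{p_1}^2\left(1 - 2K_{pb_1}\right)\right] \tag{2.9}$$
$$MSE(t_2) = \bar{Y}^2 f_1\left[C_y^2 + C_{p_2}^2\left(1 + 2K_{pb_2}\right)\right] \tag{2.10}$$
$$MSE(t_3) = \bar{Y}^2 f_1\left[C_y^2 + C_{p_1}^2\left(\frac{1}{4} - K_{pb_1}\right)\right] \tag{2.11}$$
$$MSE(t_4) = \bar{Y}^2 f_1\left[C_y^2 + C_{p_2}^2\left(\frac{1}{4} + K_{pb_2}\right)\right] \tag{2.12}$$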
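where
$$f_1 = \frac{1}{n} - \frac{1}{N}, \quad S_{\phi_j}^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(\phi_{ji} - P_j\right)^2, \quad S_{y\phi_j} = \frac{1}{N-1}\sum_{i=1}^{N}\left(y_i - \bar{Y}\right)\left(\phi_{ji} - P_j\right),$$
$$\rho_{pb_j} = \frac{S_{y\phi_j}}{S_y S_{\phi_j}}, \quad C_y = \frac{S_y}{\bar{Y}}, \quad C_{p_j} = \frac{S_{\phi_j}}{P_j}; \quad (j = 1, 2),$$
$$K_{pb_1} = \rho_{pb_1}\frac{C_y}{C_{p_1}}, \quad K_{pb_2} = \rho_{pb_2}\frac{C_y}{C_{p_2}}.$$
Let
$$s_{\phi_1\phi_2} = \frac{1}{n-1}\sum_{i=1}^{n}\left(\phi_{1i} - p_1\right)\left(\phi_{2i} - p_2\right) \quad\text{and}\quad \hat{\rho}_{\phi} = \frac{s_{\phi_1\phi_2}}{s_{\phi_1} s_{\phi_2}}$$
be the sample phi-covariance and phi-correlation between $\phi_1$ and $\phi_2$ respectively, corresponding to the population phi-covariance and phi-correlation
$$S_{\phi_1\phi_2} = \frac{1}{N-1}\sum_{i=1}^{N}\left(\phi_{1i} - P_1\right)\left(\phi_{2i} - P_2\right) \quad\text{and}\quad \rho_{\phi} = \frac{S_{\phi_1\phi_2}}{S_{\phi_1} S_{\phi_2}}.$$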
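As a rough numerical illustration of the MSE expressions (2.9) to (2.12), the sketch below evaluates the percent relative efficiency (PRE) of $t_1$ to $t_4$ with respect to the sample mean, using the summary values reported for the rice data in Section 4. Because the PRE is a ratio of MSEs, the common factor $\bar{Y}^2 f_1$ cancels and no sample size is needed. Variable names are illustrative, and the signs used for $t_2$, $t_3$ and $t_4$ follow the first-order expansion of the estimators as defined in (2.2) to (2.4).

```python
# Summary values reported for the rice data in Section 4 of the paper.
Y_bar, P1, P2 = 61.3, 0.4247, 0.3425
S2_y, S2_phi1, S2_phi2 = 12371.4, 0.225490, 0.228311
rho_pb1, rho_pb2 = 0.621, 0.673

# Squared coefficients of variation and the K_pb constants.
C2_y = S2_y / Y_bar ** 2
C2_p1 = S2_phi1 / P1 ** 2
C2_p2 = S2_phi2 / P2 ** 2
K_pb1 = rho_pb1 * (C2_y / C2_p1) ** 0.5
K_pb2 = rho_pb2 * (C2_y / C2_p2) ** 0.5

# MSE divided by (Y_bar^2 * f1) for each estimator.
rel_mse = {
    "t1": C2_y + C2_p1 * (1 - 2 * K_pb1),   # ratio estimator using P1
    "t2": C2_y + C2_p2 * (1 + 2 * K_pb2),   # product estimator using P2
    "t3": C2_y + C2_p1 * (0.25 - K_pb1),    # exponential ratio-type, P1
    "t4": C2_y + C2_p2 * (0.25 + K_pb2),    # exponential product-type, P2
}

# PRE relative to the sample mean, whose MSE / (Y_bar^2 * f1) equals C2_y.
pre = {name: 100 * C2_y / value for name, value in rel_mse.items()}

for name in sorted(pre):
    print(name, round(pre[name], 2))
```

With positive point biserial correlations, the product-type forms $t_2$ and $t_4$ come out less efficient than the sample mean for this population, which is what the signs in (2.10) and (2.12) suggest.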
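Malik and Singh (2012) proposed estimators $t_5$ and $t_6$ as
$$t_5 = \bar{y}\left(\frac{P_1}{p_1}\right)^{\alpha_1}\left(\frac{P_2}{p_2}\right)^{\alpha_2} \tag{2.13}$$
$$t_6 = \bar{y}\exp\left(\frac{P_1 - p_1}{P_1 + p_1}\right)^{\beta_1}\exp\left(\frac{p_2 - P_2}{p_2 + P_2}\right)^{\beta_2} \tag{2.14}$$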
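where $\alpha_1$, $\alpha_2$, $\beta_1$ and $\beta_2$ are real constants. The bias and MSE expressions of the estimators $t_5$ and $t_6$, up to the first order of approximation, are respectively given by
$$B(t_5) = \bar{Y} f_1\left[C_{p_1}^2\left(\frac{\alpha_1^2}{2} + \frac{\alpha_1}{2} - \alpha_1 K_{pb_1}\right) + C_{p_2}^2\left(\frac{\alpha_2^2}{2} + \frac{\alpha_2}{2} - \alpha_2 K_{pb_2}\right) + \alpha_1\alpha_2 K_{\phi} C_{p_2}^2\right] \tag{2.15}$$
$$B(t_6) = \bar{Y} f_1\left[C_{p_1}^2\left(\frac{\beta_1^2}{8} + \frac{\beta_1}{4} - \frac{\beta_1}{2}K_{pb_1}\right) + C_{p_2}^2\left(\frac{\beta_2^2}{8} - \frac{\beta_2}{4} + \frac{\beta_2}{2}K_{pb_2}\right) - \frac{\beta_1\beta_2}{4}K_{\phi} C_{p_2}^2\right] \tag{2.16}$$
$$MSE(t_5) = \bar{Y}^2 f_1\left[C_y^2 + C_{p_1}^2\left(\alpha_1^2 - 2\alpha_1 K_{pb_1}\right) + C_{p_2}^2\left(\alpha_2^2 - 2\alpha_2 K_{pb_2}\right) + 2\alpha_1\alpha_2 K_{\phi} C_{p_2}^2\right] \tag{2.17}$$
$$MSE(t_6) = \bar{Y}^2 f_1\left[C_y^2 + C_{p_1}^2\left(\frac{\beta_1^2}{4} - \beta_1 K_{pb_1}\right) + C_{p_2}^2\left(\frac{\beta_2^2}{4} + \beta_2 K_{pb_2}\right) - \frac{\beta_1\beta_2}{2}K_{\phi} C_{p_2}^2\right] \tag{2.18}$$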
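The MSE expressions for $t_5$ and $t_6$ can be explored numerically with a short sketch. It uses the rice data summary values from Section 4 and assumes $K_{\phi} = \rho_{\phi} C_{p_1}/C_{p_2}$, so that $K_{\phi} C_{p_2}^2 = \rho_{\phi} C_{p_1} C_{p_2}$; this definition is an assumption made here for the illustration, as are the function names. The signs follow the first-order expansion of (2.13) and (2.14).

```python
# Summary values reported for the rice data in Section 4 of the paper.
Y_bar, P1, P2 = 61.3, 0.4247, 0.3425
S2_y, S2_phi1, S2_phi2 = 12371.4, 0.225490, 0.228311
rho_pb1, rho_pb2, rho_phi = 0.621, 0.673, 0.889

C2_y = S2_y / Y_bar ** 2
C2_p1 = S2_phi1 / P1 ** 2
C2_p2 = S2_phi2 / P2 ** 2
K_pb1 = rho_pb1 * (C2_y / C2_p1) ** 0.5
K_pb2 = rho_pb2 * (C2_y / C2_p2) ** 0.5
K_phi = rho_phi * (C2_p1 / C2_p2) ** 0.5  # assumed definition of K_phi

def rel_mse_t5(a1, a2):
    """MSE(t5) / (Y_bar^2 * f1) for exponents alpha_1 = a1, alpha_2 = a2."""
    return (C2_y
            + C2_p1 * (a1 ** 2 - 2 * a1 * K_pb1)
            + C2_p2 * (a2 ** 2 - 2 * a2 * K_pb2)
            + 2 * a1 * a2 * K_phi * C2_p2)

def rel_mse_t6(b1, b2):
    """MSE(t6) / (Y_bar^2 * f1) for exponents beta_1 = b1, beta_2 = b2."""
    return (C2_y
            + C2_p1 * (b1 ** 2 / 4 - b1 * K_pb1)
            + C2_p2 * (b2 ** 2 / 4 + b2 * K_pb2)
            - b1 * b2 * K_phi * C2_p2 / 2)

def pre(rel_mse):
    """Percent relative efficiency with respect to the sample mean."""
    return 100 * C2_y / rel_mse

print(round(pre(rel_mse_t5(1, 0)), 2))   # t5 reduces to t1 when alpha = (1, 0)
print(round(pre(rel_mse_t6(0, 1)), 2))   # t6 reduces to t4 when beta = (0, 1)
print(round(pre(rel_mse_t6(1, -1)), 2))  # both exponential factors ratio-type
```

Under these assumptions the choice $\beta_1 = 1$, $\beta_2 = -1$ gives a PRE roughly in line with the value near 180.5 listed in Table 4.1 for the same exponents with $L_5 = L_7 = 1$ and $L_6 = L_8 = 0$.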
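3. The Suggested Class of Estimators
Using a linear combination of $t_i$ (i = 0, 1, 2), we define an estimator of the form
$$t_p = \sum_{i=0}^{2} w_i t_i \in H \tag{3.1}$$
such that
$$\sum_{i=0}^{2} w_i = 1 \quad\text{and}\quad w_i \in R \tag{3.2}$$
where
$$t_0 = \bar{y}, \quad t_1 = \bar{y}\left[\frac{L_1 P_1 + L_2}{L_1 p_1 + L_2}\right]^{\alpha_1}\left[\frac{L_3 P_2 + L_4}{L_3 p_2 + L_4}\right]^{\alpha_2}$$
and
$$t_2 = \bar{y}\exp\left[\frac{(L_5 P_1 + L_6) - (L_5 p_1 + L_6)}{(L_5 P_1 + L_6) + (L_5 p_1 + L_6)}\right]^{\beta_1}\exp\left[\frac{(L_7 p_2 + L_8) - (L_7 P_2 + L_8)}{(L_7 p_2 + L_8) + (L_7 P_2 + L_8)}\right]^{\beta_2}$$
where $w_i$ (i = 0, 1, 2) denotes the constants used for reducing the bias in the class of estimators, H denotes the set of those estimators that can be constructed from $t_i$ (i = 0, 1, 2) and R denotes the set of real numbers (for details see Singh et al. (2008)). Also, $L_i$ (i = 1, 2, ..., 8) are either real numbers or functions of the known parameters of the auxiliary attributes.
Expressing $t_p$ in terms of e's, with $\bar{y} = \bar{Y}(1 + e_0)$, $p_1 = P_1(1 + e_1)$ and $p_2 = P_2(1 + e_2)$, we have
$$t_p = \bar{Y}(1 + e_0)\left[w_0 + w_1\left(1 + \phi_1 e_1\right)^{-\alpha_1}\left(1 + \phi_2 e_2\right)^{-\alpha_2} + w_2\exp\left\{-\theta_1 e_1\left(1 + \theta_1 e_1\right)^{-1}\right\}^{\beta_1}\exp\left\{\theta_2 e_2\left(1 + \theta_2 e_2\right)^{-1}\right\}^{\beta_2}\right] \tag{3.3}$$
where
$$\phi_1 = \frac{L_1 P_1}{L_1 P_1 + L_2}, \quad \phi_2 = \frac{L_3 P_2}{L_3 P_2 + L_4}, \quad \theta_1 = \frac{L_5 P_1}{2\left(L_5 P_1 + L_6\right)}, \quad \theta_2 = \frac{L_7 P_2}{2\left(L_7 P_2 + L_8\right)}.$$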
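After expanding, subtracting $\bar{Y}$ from both sides of equation (3.3) and neglecting terms of power greater than two, we have
$$t_p - \bar{Y} = \bar{Y}\left[e_0 - w_1\left(\alpha_1\phi_1 e_1 + \alpha_2\phi_2 e_2\right) - w_2\left(\beta_1\theta_1 e_1 - \beta_2\theta_2 e_2\right)\right] \tag{3.4}$$
Squaring both sides of (3.4) and then taking expectations, we get the MSE of the estimator $t_p$ up to the first order of approximation as
$$MSE(t_p) = \bar{Y}^2 f_1\left[C_y^2 + w_1^2 T_1 + w_2^2 T_2 + 2w_1 w_2 T_3 - 2w_1 T_4 - 2w_2 T_5\right] \tag{3.5}$$
which is minimized for
$$w_1 = \frac{T_2 T_4 - T_3 T_5}{T_1 T_2 - T_3^2}, \quad w_2 = \frac{T_1 T_5 - T_3 T_4}{T_1 T_2 - T_3^2} \tag{3.6}$$
and
$$T_1 = \phi_1^2\alpha_1^2 C_{p_1}^2 + \phi_2^2\alpha_2^2 C_{p_2}^2 + 2\alpha_1\alpha_2\phi_1\phi_2 k_{\phi} C_{p_2}^2$$
$$T_2 = \theta_1^2\beta_1^2 C_{p_1}^2 + \theta_2^2\beta_2^2 C_{p_2}^2 - 2\beta_1\beta_2\theta_1\theta_2 k_{\phi} C_{p_2}^2$$
$$T_3 = \alpha_1\beta_1\phi_1\theta_1 C_{p_1}^2 - \alpha_2\beta_2\phi_2\theta_2 C_{p_2}^2 + \alpha_2\beta_1\phi_2\theta_1 k_{\phi} C_{p_2}^2 - \alpha_1\beta_2\phi_1\theta_2 k_{\phi} C_{p_2}^2$$
$$T_4 = \alpha_1\phi_1 k_{pb_1} C_{p_1}^2 + \alpha_2\phi_2 k_{pb_2} C_{p_2}^2$$
$$T_5 = \beta_1\theta_1 k_{pb_1} C_{p_1}^2 - \beta_2\theta_2 k_{pb_2} C_{p_2}^2 \tag{3.7}$$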
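The optimum weights can be checked numerically. The sketch below computes the five quadratic-form terms of (3.7), the minimizing weights of (3.6) and the resulting PRE for the rice data of Section 4, with all $L_i = 1$ and $\alpha_1 = \alpha_2 = \beta_1 = \beta_2 = 1$. It assumes $k_{\phi} = \rho_{\phi} C_{p_1}/C_{p_2}$ and uses signs obtained from the first-order expansion of (3.3); the function name `optimum_tp` is hypothetical.

```python
def optimum_tp(alpha, beta, L, P1, P2, C2_y, C2_p1, C2_p2, K1, K2, K_phi):
    """Return the terms of (3.7), the weights of (3.6) and MSE / (Y_bar^2 f1)."""
    a1, a2 = alpha
    b1, b2 = beta
    L1, L2, L3, L4, L5, L6, L7, L8 = L
    phi1 = L1 * P1 / (L1 * P1 + L2)
    phi2 = L3 * P2 / (L3 * P2 + L4)
    th1 = L5 * P1 / (2 * (L5 * P1 + L6))
    th2 = L7 * P2 / (2 * (L7 * P2 + L8))
    c12 = K_phi * C2_p2  # equals rho_phi * C_p1 * C_p2 under the assumption
    q1 = (phi1 * a1) ** 2 * C2_p1 + (phi2 * a2) ** 2 * C2_p2 \
        + 2 * a1 * a2 * phi1 * phi2 * c12
    q2 = (th1 * b1) ** 2 * C2_p1 + (th2 * b2) ** 2 * C2_p2 \
        - 2 * b1 * b2 * th1 * th2 * c12
    q3 = (a1 * b1 * phi1 * th1 * C2_p1 - a2 * b2 * phi2 * th2 * C2_p2
          + a2 * b1 * phi2 * th1 * c12 - a1 * b2 * phi1 * th2 * c12)
    q4 = a1 * phi1 * K1 * C2_p1 + a2 * phi2 * K2 * C2_p2
    q5 = b1 * th1 * K1 * C2_p1 - b2 * th2 * K2 * C2_p2
    det = q1 * q2 - q3 ** 2
    w1 = (q2 * q4 - q3 * q5) / det
    w2 = (q1 * q5 - q3 * q4) / det
    rel_mse = (C2_y + w1 ** 2 * q1 + w2 ** 2 * q2 + 2 * w1 * w2 * q3
               - 2 * w1 * q4 - 2 * w2 * q5)
    return (q1, q2, q3, q4, q5), (w1, w2), rel_mse

# Summary values reported for the rice data in Section 4 of the paper.
Y_bar, P1, P2 = 61.3, 0.4247, 0.3425
C2_y = 12371.4 / Y_bar ** 2
C2_p1 = 0.225490 / P1 ** 2
C2_p2 = 0.228311 / P2 ** 2
K1 = 0.621 * (C2_y / C2_p1) ** 0.5
K2 = 0.673 * (C2_y / C2_p2) ** 0.5
K_phi = 0.889 * (C2_p1 / C2_p2) ** 0.5  # assumed definition of k_phi

terms, (w1, w2), rel_mse = optimum_tp((1, 1), (1, 1), (1,) * 8,
                                      P1, P2, C2_y, C2_p1, C2_p2, K1, K2, K_phi)
pre_tp = 100 * C2_y / rel_mse
print(round(w1, 4), round(w2, 4), round(pre_tp, 2))
```

Under these assumptions the computed PRE lands close to the value 183.60 reported for $t_p$ at the end of Section 4, which suggests that figure corresponds to the optimum weights rather than to a fixed choice of $w_1$ and $w_2$.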
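4. Empirical Study
Data: (Source: Government of Pakistan (2004))
The population consists of rice cultivation areas in 73 districts of Pakistan. The variables are defined as:
Y = rice production (in 000' tonnes, with one tonne = 0.984 ton) during 2003,
$P_1$ = proportion of farms where rice production is more than 20 tonnes during the year 2002, and
$P_2$ = proportion of farms with rice cultivation area more than 20 ha during the year 2003.
For this data, we have
N = 73, $\bar{Y}$ = 61.3, $P_1$ = 0.4247, $P_2$ = 0.3425, $S_y^2$ = 12371.4, $S_{\phi_1}^2$ = 0.225490, $S_{\phi_2}^2$ = 0.228311, $\rho_{pb_1}$ = 0.621, $\rho_{pb_2}$ = 0.673, $\rho_{\phi}$ = 0.889.

Table 4.1: PRE of different estimators of $\bar{Y}$ with respect to $\bar{y}$.

Choice of scalars, when $w_0 = 0$, $w_1 = 1$, $w_2 = 0$:

| $\alpha_1$ | $\alpha_2$ | $L_1$ | $L_2$ | $L_3$ | $L_4$ | PRE |
|---|---|---|---|---|---|---|
| 0 | 1 | | | 1 | 0 | 179.77 |
| 1 | 0 | 1 | 0 | | | 162.68 |
| 1 | 1 | 1 | 1 | 1 | 1 | 156.28 |
| -1 | 1 | 1 | 0 | 1 | 0 | 112.97 |
| 1 | 1 | $C_{p_1}$ | $\rho_{pb_1}$ | $C_{p_2}$ | $\rho_{pb_2}$ | 178.10 |
| 1 | 1 | $NP_1$ | $K_{pb_1}$ | $NP_2$ | $K_{pb_2}$ | 110.95 |
| -1 | 1 | $NP_1$ | $f$ | $NP_2$ | $f$ | 112.78 |
| -1 | 1 | $N$ | $K_{pb_1}$ | $N$ | $K_{pb_2}$ | 112.68 |
| -1 | 1 | $NP_1$ | $P_1$ | $NP_2$ | $P_2$ | 112.32 |
| 1 | 1 | $n$ | $P_1$ | $n$ | $P_2$ | 115.32 |
| -1 | 1 | $N$ | $\rho_{pb_1}$ | $N$ | $\rho_{pb_2}$ | 112.38 |
| -1 | 1 | $n$ | $P_1$ | $n$ | $P_2$ | 113.00 |
| -1 | 1 | $N$ | $P_1$ | $N$ | $P_2$ | 112.94 |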
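When $w_0 = 0$, $w_1 = 0$, $w_2 = 1$:

| $\beta_1$ | $\beta_2$ | $L_5$ | $L_6$ | $L_7$ | $L_8$ | PRE |
|---|---|---|---|---|---|---|
| 1 | 0 | 1 | 0 | 1 | 0 | 141.81 |
| 0 | 1 | 1 | 0 | 1 | 0 | 60.05 |
| 1 | -1 | 1 | 0 | 1 | 0 | 180.50 |
| 1 | -1 | 1 | 1 | 1 | 1 | 127.39 |
| 1 | -1 | 1 | 1 | 1 | 0 | 170.59 |
| 1 | -1 | $C_{p_1}$ | $\rho_{pb_1}$ | $C_{p_2}$ | $\rho_{pb_2}$ | 143.83 |
| 1 | -1 | $NP_1$ | $K_{pb_1}$ | $NP_2$ | $K_{pb_2}$ | 179.95 |
| 1 | -1 | $NP_1$ | $f$ | $NP_2$ | $f$ | 180.52 |
| 1 | -1 | $N$ | $K_{pb_1}$ | $N$ | $K_{pb_2}$ | 180.56 |
| 1 | -1 | $NP_1$ | $P_1$ | $NP_2$ | $P_2$ | 180.53 |
| 1 | -1 | $n$ | $P_1$ | $n$ | $P_2$ | 179.49 |
| 1 | -1 | $N$ | $\rho_{pb_1}$ | $N$ | $\rho_{pb_2}$ | 180.55 |
| 1 | -1 | $n$ | $P_1$ | $n$ | $P_2$ | 180.36 |
| 1 | -1 | $N$ | $P_1$ | $N$ | $P_2$ | 180.57 |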
When the weights take their optimum values (3.6), and also $L_i = 1$ (i = 1, 2, ..., 8) and $\alpha_1 = \alpha_2 = \beta_1 = \beta_2 = 1$:
PRE($t_p$) = 183.60
5. Double Sampling
It is assumed that the population proportion $P_1$ for the first auxiliary attribute $\phi_1$ is unknown but that the corresponding proportion $P_2$ is known for the second auxiliary attribute $\phi_2$. When $P_1$ is unknown, it is sometimes estimated from a preliminary large sample of size $n'$ on which only the attribute $\phi_1$ is measured. Then a second-phase sample of size $n$ ($n < n'$) is drawn and Y is observed. Let
$$p'_j = \frac{1}{n'}\sum_{i=1}^{n'}\phi_{ji}, \quad (j = 1, 2).$$
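The two-phase design described above can be sketched as follows. The code draws a first-phase simple random sample without replacement, then a second-phase subsample from it, and forms the first-phase proportions $p'_1$, $p'_2$ and the second-phase quantities $p_1$ and $\bar{y}$. The toy population, the sample sizes and the assumption that the second-phase sample is a subsample of the first are illustrative choices, not values from the paper.

```python
import random

def two_phase_sample(y, phi1, phi2, n_first, n_second, seed=0):
    """Draw a two-phase sample and return the basic sample quantities."""
    rng = random.Random(seed)
    first = rng.sample(range(len(y)), n_first)  # first-phase units (SRSWOR)
    second = rng.sample(first, n_second)        # second-phase subsample
    p1_first = sum(phi1[i] for i in first) / n_first   # p'_1
    p2_first = sum(phi2[i] for i in first) / n_first   # p'_2
    p1 = sum(phi1[i] for i in second) / n_second       # p_1
    y_bar = sum(y[i] for i in second) / n_second       # sample mean of y
    return first, second, p1_first, p2_first, p1, y_bar

# Hypothetical toy population of N = 34 units with two 0/1 attributes.
pop_rng = random.Random(42)
N = 34
phi1 = [1 if pop_rng.random() < 0.65 else 0 for _ in range(N)]
phi2 = [1 if pop_rng.random() < 0.70 else 0 for _ in range(N)]
y = [120 + 90 * a + 60 * b + pop_rng.gauss(0, 40) for a, b in zip(phi1, phi2)]
P2 = sum(phi2) / N  # population proportion of phi_2, assumed known

first, second, p1_first, p2_first, p1, y_bar = two_phase_sample(
    y, phi1, phi2, n_first=20, n_second=10)

# Ratio-type two-phase estimators; guarded against a zero denominator.
t_d1 = y_bar * p1_first / p1 if p1 > 0 else None
t_d2 = y_bar * P2 / p2_first if p2_first > 0 else None
print(y_bar, t_d1, t_d2)
```

Repeating the draw over many seeds and comparing the spread of the estimates with that of the sample mean would give a simulation-based check of the first-order MSE expressions.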
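The estimators $t_1$, $t_2$, $t_3$ and $t_4$ in two-phase sampling take the following form
$$t_{d1} = \bar{y}\left(\frac{p'_1}{p_1}\right) \tag{5.1}$$
$$t_{d2} = \bar{y}\left(\frac{P_2}{p'_2}\right) \tag{5.2}$$
$$t_{d3} = \bar{y}\exp\left(\frac{p'_1 - p_1}{p'_1 + p_1}\right) \tag{5.3}$$
$$t_{d4} = \bar{y}\exp\left(\frac{p'_2 - P_2}{p'_2 + P_2}\right) \tag{5.4}$$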
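The bias and MSE expressions of the estimators $t_{d1}$, $t_{d2}$, $t_{d3}$ and $t_{d4}$, up to the first order of approximation, are respectively given as
$$B(t_{d1}) = \bar{Y} f_3 C_{p_1}^2\left(1 - K_{pb_1}\right) \tag{5.5}$$
$$B(t_{d2}) = \bar{Y} f_2 C_{p_2}^2\left(1 - K_{pb_2}\right) \tag{5.6}$$
$$B(t_{d3}) = \bar{Y} f_3 \frac{C_{p_1}^2}{2}\left(\frac{3}{4} - K_{pb_1}\right) \tag{5.7}$$
$$B(t_{d4}) = \bar{Y} f_2 \frac{C_{p_2}^2}{2}\left(K_{pb_2} - \frac{1}{4}\right) \tag{5.8}$$
$$MSE(t_{d1}) = \bar{Y}^2\left[f_1 C_y^2 + f_3 C_{p_1}^2\left(1 - 2K_{pb_1}\right)\right] \tag{5.9}$$
$$MSE(t_{d2}) = \bar{Y}^2\left[f_1 C_y^2 + f_2 C_{p_2}^2\left(1 - 2K_{pb_2}\right)\right] \tag{5.10}$$
$$MSE(t_{d3}) = \bar{Y}^2\left[f_1 C_y^2 + f_3 \frac{C_{p_1}^2}{4}\left(1 - 4K_{pb_1}\right)\right] \tag{5.11}$$
$$MSE(t_{d4}) = \bar{Y}^2\left[f_1 C_y^2 + f_2 \frac{C_{p_2}^2}{4}\left(1 + 4K_{pb_2}\right)\right] \tag{5.12}$$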
where
$$S_{\phi_j}^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(\phi_{ji} - p_j\right)^2, \quad S'^2_{\phi_j} = \frac{1}{n'-1}\sum_{i=1}^{n'}\left(\phi_{ji} - p'_j\right)^2,$$
$$f_2 = \frac{1}{n'} - \frac{1}{N}, \quad f_3 = \frac{1}{n} - \frac{1}{n'}.$$
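The estimators $t_5$ and $t_6$, in two-phase sampling, take the following form
$$t_{d5} = \bar{y}\left(\frac{p'_1}{p_1}\right)^{m_1}\left(\frac{P_2}{p'_2}\right)^{m_2} \tag{5.13}$$
$$t_{d6} = \bar{y}\exp\left(\frac{p'_1 - p_1}{p'_1 + p_1}\right)^{n_1}\exp\left(\frac{p'_2 - P_2}{p'_2 + P_2}\right)^{n_2} \tag{5.14}$$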
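where $m_1$, $m_2$, $n_1$ and $n_2$ are real constants. The bias and MSE expressions of the estimators $t_{d5}$ and $t_{d6}$, up to the first order of approximation, are respectively given by
$$B(t_{d5}) = \bar{Y}\left[f_3 C_{p_1}^2\left(\frac{m_1^2}{2} + \frac{m_1}{2} - m_1 K_{pb_1}\right) + f_2 C_{p_2}^2\left(\frac{m_2^2}{2} + \frac{m_2}{2} - m_2 K_{pb_2}\right)\right] \tag{5.15}$$
$$B(t_{d6}) = \bar{Y}\left[f_3\left(\frac{n_1^2}{8} + \frac{n_1}{4} - \frac{n_1}{2}K_{pb_1}\right)C_{p_1}^2 + f_2\left(\frac{n_2^2}{8} - \frac{n_2}{4} + \frac{n_2}{2}K_{pb_2}\right)C_{p_2}^2\right] \tag{5.16}$$
$$MSE(t_{d5}) = \bar{Y}^2\left[f_1 C_y^2 + f_3 C_{p_1}^2\left(m_1^2 - 2m_1 K_{pb_1}\right) + f_2 C_{p_2}^2\left(m_2^2 - 2m_2 K_{pb_2}\right)\right] \tag{5.17}$$
$$MSE(t_{d6}) = \bar{Y}^2\left[f_1 C_y^2 + f_3\left(\frac{n_1^2}{4} - n_1 K_{pb_1}\right)C_{p_1}^2 + f_2\left(\frac{n_2^2}{4} + n_2 K_{pb_2}\right)C_{p_2}^2\right] \tag{5.18}$$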
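The two-phase MSE expressions (5.17) and (5.18) can be evaluated with a short sketch. It uses the wheat-farm summary values reported with Table 6.1 (reading the garbled point biserial value as 0.599). The sample sizes `n_first = 20` and `n = 10` are hypothetical, because the sample sizes are not stated in this text, so the printed PREs are illustrative only.

```python
# Summary values reported for the wheat-farm data used in Table 6.1.
N, Y_bar, P1, P2 = 34, 199.4, 0.6765, 0.7353
S2_y, S2_phi1, S2_phi2 = 22564.6, 0.225490, 0.200535
rho_pb1, rho_pb2 = 0.599, 0.559

# Hypothetical sample sizes (not given in the text): n' = 20 and n = 10.
n_first, n = 20, 10
f1 = 1 / n - 1 / N
f2 = 1 / n_first - 1 / N
f3 = 1 / n - 1 / n_first

C2_y = S2_y / Y_bar ** 2
C2_p1 = S2_phi1 / P1 ** 2
C2_p2 = S2_phi2 / P2 ** 2
K1 = rho_pb1 * (C2_y / C2_p1) ** 0.5
K2 = rho_pb2 * (C2_y / C2_p2) ** 0.5

def mse_td5(m1, m2):
    """First-order MSE of t_d5 for exponents m1, m2, as in (5.17)."""
    return Y_bar ** 2 * (f1 * C2_y
                         + f3 * C2_p1 * (m1 ** 2 - 2 * m1 * K1)
                         + f2 * C2_p2 * (m2 ** 2 - 2 * m2 * K2))

def mse_td6(n1, n2):
    """First-order MSE of t_d6 for exponents n1, n2, as in (5.18)."""
    return Y_bar ** 2 * (f1 * C2_y
                         + f3 * C2_p1 * (n1 ** 2 / 4 - n1 * K1)
                         + f2 * C2_p2 * (n2 ** 2 / 4 + n2 * K2))

mse_mean = Y_bar ** 2 * f1 * C2_y  # MSE of the sample mean
print(round(100 * mse_mean / mse_td5(1, 1), 2))
print(round(100 * mse_mean / mse_td6(1, -1), 2))
```

Note that $f_1 = f_2 + f_3$, so the first-phase and second-phase contributions to the variance split the single-phase factor between them.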
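6. Estimator $t_{pd}$ in Two-Phase Sampling
Using a linear combination of $t_{di}$ (i = 0, 1, 2), we define an estimator of the form
$$t_{pd} = \sum_{i=0}^{2} h_i t_{di} \in H \tag{6.1}$$
such that
$$\sum_{i=0}^{2} h_i = 1 \quad\text{and}\quad h_i \in R \tag{6.2}$$
where
$$t_{d0} = \bar{y}, \quad t_{d1} = \bar{y}\left[\frac{L_1 p'_1 + L_2}{L_1 p_1 + L_2}\right]^{m_1}\left[\frac{L_3 P_2 + L_4}{L_3 p'_2 + L_4}\right]^{m_2}$$
and
$$t_{d2} = \bar{y}\exp\left[\frac{(L_5 p'_1 + L_6) - (L_5 p_1 + L_6)}{(L_5 p'_1 + L_6) + (L_5 p_1 + L_6)}\right]^{n_1}\exp\left[\frac{(L_7 p'_2 + L_8) - (L_7 P_2 + L_8)}{(L_7 p'_2 + L_8) + (L_7 P_2 + L_8)}\right]^{n_2}$$
where $h_i$ (i = 0, 1, 2) denotes the constants used for reducing the bias in the class of estimators, H denotes the set of those estimators that can be constructed from $t_{di}$ (i = 0, 1, 2) and R denotes the set of real numbers (for details see Singh et al. (2008)). Also, $L_i$ (i = 1, 2, ..., 8) are either real numbers or functions of the known parameters of the auxiliary attributes.
Expressing $t_{pd}$ in terms of e's, with $p'_1 = P_1(1 + e'_1)$ and $p'_2 = P_2(1 + e'_2)$, we have
$$t_{pd} = \bar{Y}(1 + e_0)\left[h_0 + h_1\left(1 + \phi_1 e'_1\right)^{m_1}\left(1 + \phi_1 e_1\right)^{-m_1}\left(1 + \phi_2 e'_2\right)^{-m_2} + h_2\exp\left\{\theta_1\left(e'_1 - e_1\right)\left[1 + \theta_1\left(e'_1 + e_1\right)\right]^{-1}\right\}^{n_1}\exp\left\{\theta_2 e'_2\left(1 + \theta_2 e'_2\right)^{-1}\right\}^{n_2}\right] \tag{6.3}$$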
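After expanding, subtracting $\bar{Y}$ from both sides of equation (6.3) and neglecting terms of power greater than two, we have
$$t_{pd} - \bar{Y} = \bar{Y}\left[e_0 + h_1\left(m_1\phi_1 e'_1 - m_1\phi_1 e_1 - m_2\phi_2 e'_2\right) + h_2\left(n_1\theta_1 e'_1 - n_1\theta_1 e_1 + n_2\theta_2 e'_2\right)\right] \tag{6.4}$$
Squaring both sides of (6.4) and then taking expectations, we get the MSE of the estimator $t_{pd}$ up to the first order of approximation as
$$MSE(t_{pd}) = \bar{Y}^2\left[f_1 C_y^2 + h_1^2 R_1 + h_2^2 R_2 + 2h_1 h_2 R_3 - 2h_1 R_4 - 2h_2 R_5\right] \tag{6.5}$$
which is minimized for
$$h_1 = \frac{R_2 R_4 - R_3 R_5}{R_1 R_2 - R_3^2}, \quad h_2 = \frac{R_1 R_5 - R_3 R_4}{R_1 R_2 - R_3^2} \tag{6.6}$$
and
$$R_1 = \phi_1^2 m_1^2 f_3 C_{p_1}^2 + \phi_2^2 m_2^2 f_2 C_{p_2}^2$$
$$R_2 = \theta_1^2 n_1^2 f_3 C_{p_1}^2 + \theta_2^2 n_2^2 f_2 C_{p_2}^2$$
$$R_3 = m_1 n_1\phi_1\theta_1 f_3 C_{p_1}^2 - m_2 n_2\phi_2\theta_2 f_2 C_{p_2}^2$$
$$R_4 = m_1\phi_1 f_3 k_{pb_1} C_{p_1}^2 + m_2\phi_2 f_2 k_{pb_2} C_{p_2}^2$$
$$R_5 = n_1\theta_1 f_3 k_{pb_1} C_{p_1}^2 - n_2\theta_2 f_2 k_{pb_2} C_{p_2}^2 \tag{6.7}$$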
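A minimal sketch of the optimum weights for $t_{pd}$ follows. It computes the five terms of (6.7), the weights of (6.6) and the resulting PRE for the wheat-farm data, with all $L_i = 1$ and $m_1 = m_2 = n_1 = n_2 = 1$. The sample sizes are hypothetical (they are not stated in this text), the function name `optimum_tpd` is illustrative, and the sign conventions are those obtained from the first-order expansion of (6.3).

```python
def optimum_tpd(m, nn, L, P1, P2, C2_y, C2_p1, C2_p2, K1, K2, f1, f2, f3):
    """Return the terms of (6.7), the weights of (6.6) and MSE / Y_bar^2."""
    m1, m2 = m
    n1, n2 = nn
    L1, L2, L3, L4, L5, L6, L7, L8 = L
    phi1 = L1 * P1 / (L1 * P1 + L2)
    phi2 = L3 * P2 / (L3 * P2 + L4)
    th1 = L5 * P1 / (2 * (L5 * P1 + L6))
    th2 = L7 * P2 / (2 * (L7 * P2 + L8))
    r1 = (phi1 * m1) ** 2 * f3 * C2_p1 + (phi2 * m2) ** 2 * f2 * C2_p2
    r2 = (th1 * n1) ** 2 * f3 * C2_p1 + (th2 * n2) ** 2 * f2 * C2_p2
    r3 = m1 * n1 * phi1 * th1 * f3 * C2_p1 - m2 * n2 * phi2 * th2 * f2 * C2_p2
    r4 = m1 * phi1 * f3 * K1 * C2_p1 + m2 * phi2 * f2 * K2 * C2_p2
    r5 = n1 * th1 * f3 * K1 * C2_p1 - n2 * th2 * f2 * K2 * C2_p2
    det = r1 * r2 - r3 ** 2
    h1 = (r2 * r4 - r3 * r5) / det
    h2 = (r1 * r5 - r3 * r4) / det
    rel_mse = (f1 * C2_y + h1 ** 2 * r1 + h2 ** 2 * r2 + 2 * h1 * h2 * r3
               - 2 * h1 * r4 - 2 * h2 * r5)
    return (r1, r2, r3, r4, r5), (h1, h2), rel_mse

# Summary values reported for the wheat-farm data used in Table 6.1.
N, Y_bar, P1, P2 = 34, 199.4, 0.6765, 0.7353
rho_pb1, rho_pb2 = 0.599, 0.559
C2_y = 22564.6 / Y_bar ** 2
C2_p1 = 0.225490 / P1 ** 2
C2_p2 = 0.200535 / P2 ** 2
K1 = rho_pb1 * (C2_y / C2_p1) ** 0.5
K2 = rho_pb2 * (C2_y / C2_p2) ** 0.5

# Hypothetical sample sizes (not given in the text): n' = 20 and n = 10.
n_first, n = 20, 10
f1, f2, f3 = 1 / n - 1 / N, 1 / n_first - 1 / N, 1 / n - 1 / n_first

terms, (h1, h2), rel_mse = optimum_tpd((1, 1), (1, 1), (1,) * 8, P1, P2,
                                       C2_y, C2_p1, C2_p2, K1, K2, f1, f2, f3)
pre_tpd = 100 * f1 * C2_y / rel_mse
print(round(h1, 4), round(h2, 4), round(pre_tpd, 2))
```

Under these sign conventions the minimized value reduces to $C_y^2\left(f_1 - f_3\rho_{pb_1}^2 - f_2\rho_{pb_2}^2\right)$ times $\bar{Y}^2$, which gives a quick cross-check on the computed weights.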
Data: (Source: Singh and Chaudhary (1986), p. 177).
The population consists of 34 wheat farms in 34 villages in a certain region of India. The variables are defined as:
y = area under wheat crop (in acres) during 1974,
$p_1$ = proportion of farms under wheat crop which have more than 500 acres land during 1971, and
$p_2$ = proportion of farms under wheat crop which have more than 100 acres land during 1973.
For this data, we have
N = 34, $\bar{Y}$ = 199.4, $P_1$ = 0.6765, $P_2$ = 0.7353, $S_y^2$ = 22564.6, $S_{\phi_1}^2$ = 0.225490, $S_{\phi_2}^2$ = 0.200535, $\rho_{pb_1}$ = 0.599, $\rho_{pb_2}$ = 0.559, $\rho_{\phi}$ = 0.725.

Table 6.1: PRE of different estimators of $\bar{Y}$ with respect to $\bar{y}$.

Choice of scalars, when $h_0 = 0$, $h_1 = 1$, $h_2 = 0$:

| $m_1$ | $m_2$ | $L_1$ | $L_2$ | $L_3$ | $L_4$ | PRE |
|---|---|---|---|---|---|---|
| 0 | 1 | | | 1 | 0 | 108.16 |
| 1 | 0 | 1 | 0 | | | 121.59 |
| 1 | 1 | 1 | 1 | 1 | 1 | 142.19 |
| 1 | 1 | 1 | 0 | 1 | 0 | 133.40 |
| 1 | 1 | $C_{p_1}$ | $\rho_{pb_1}$ | $C_{p_2}$ | $\rho_{pb_2}$ | 144.78 |
| 1 | 1 | $NP_1$ | $K_{pb_1}$ | $NP_2$ | $K_{pb_2}$ | 136.90 |
| 1 | 1 | $NP_1$ | $f$ | $NP_2$ | $f$ | 133.30 |
| 1 | 1 | $N$ | $K_{pb_1}$ | $N$ | $K_{pb_2}$ | 135.73 |
| 1 | 1 | $NP_1$ | $P_1$ | $NP_2$ | $P_2$ | 137.09 |
| 1 | 1 | $n$ | $P_1$ | $n$ | $P_2$ | 138.23 |
| 1 | 1 | $N$ | $\rho_{pb_1}$ | $N$ | $\rho_{pb_2}$ | 135.49 |
| 1 | 1 | $n$ | $P_1$ | $n$ | $P_2$ | 138.97 |
| 1 | 1 | $N$ | $P_1$ | $N$ | $P_2$ | 135.86 |
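When $h_0 = 0$, $h_1 = 0$, $h_2 = 1$:

| $n_1$ | $n_2$ | $L_5$ | $L_6$ | $L_7$ | $L_8$ | PRE |
|---|---|---|---|---|---|---|
| 1 | 0 | 1 | 0 | 1 | 0 | 130.89 |
| 0 | -1 | 1 | 0 | 1 | 0 | 108.93 |
| 1 | -1 | 1 | 0 | 1 | 0 | 146.63 |
| 1 | -1 | 1 | 1 | 1 | 1 | 121.68 |
| 1 | -1 | 1 | 1 | 1 | 0 | 127.24 |
| 1 | -1 | $C_{p_1}$ | $\rho_{pb_1}$ | $C_{p_2}$ | $\rho_{pb_2}$ | 123.43 |
| 1 | -1 | $NP_1$ | $K_{pb_1}$ | $NP_2$ | $K_{pb_2}$ | 145.49 |
| 1 | -1 | $NP_1$ | $f$ | $NP_2$ | $f$ | 146.57 |
| 1 | -1 | $N$ | $K_{pb_1}$ | $N$ | $K_{pb_2}$ | 145.84 |
| 1 | -1 | $NP_1$ | $P_1$ | $NP_2$ | $P_2$ | 145.43 |
| 1 | -1 | $n$ | $P_1$ | $n$ | $P_2$ | 145.03 |
| 1 | -1 | $N$ | $\rho_{pb_1}$ | $N$ | $\rho_{pb_2}$ | 145.92 |
| 1 | -1 | $n$ | $P_1$ | $n$ | $P_2$ | 144.85 |
| 1 | -1 | $N$ | $P_1$ | $N$ | $P_2$ | 145.80 |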
When the weights take their optimum values (6.6), and also $L_i = 1$ (i = 1, 2, ..., 8) and $m_1 = m_2 = n_1 = n_2 = 1$:
PRE($t_{pd}$) = 154.28
7. Conclusion
In this paper, we have suggested a class of estimators in single and two-phase sampling using the point biserial correlation and the phi correlation coefficient. From Table 4.1 and Table 6.1, we observe that the proposed estimators $t_p$ and $t_{pd}$ perform better than the other estimators considered in this paper.
References
1. Abd-Elfattah, A.M., El-Sherpieny, E.A., Mohamed, S.M. and Abdou, O.F., 2010, Improvement in estimating the population mean in simple random sampling using information on auxiliary attribute. Appl. Mathe. and Compt. doi:10.1016/j.amc.2009.12.041
2. Government of Pakistan, 2004, Crops Area Production by Districts (Ministry of Food, Agriculture and Livestock Division, Economic Wing, Pakistan).
3. Gujarati, D.N. and Sangeetha, 2007, Basic Econometrics. Tata McGraw-Hill.
4. Jhajj, H.S., Sharma, M.K. and Grover, L.K., 2006, A family of estimators of population mean using information on auxiliary attribute. Pak. Journ. of Stat., 22(1), 43-50.
5. Malik, S. and Singh, R., 2012, A family of estimators of population mean using information on point bi-serial and phi-correlation coefficient. Intern. Jour. Stat. and Econ. (accepted).
6. Naik, V.D. and Gupta, P.C., 1996, A note on estimation of mean with known population proportion of an auxiliary character. Jour. Ind. Soc. Agri. Stat., 48(2), 151-158.
7. Shabbir, J. and Gupta, S., 2007, On estimating the finite population mean with known population proportion of an auxiliary variable. Pak. Journ. of Stat., 23(1), 1-9.
8. Singh, D. and Chaudhary, F.S., 1986, Theory and Analysis of Sample Survey Designs (John Wiley and Sons, New York).
9. Singh, R., Chauhan, P., Sawan, N. and Smarandache, F., 2007, Auxiliary information and a priori values in construction of improved estimators. Renaissance High Press.
10. Singh, R., Chauhan, P., Sawan, N. and Smarandache, F., 2008, Ratio estimators in simple random sampling using information on auxiliary attribute. Pak. J. Stat. Oper. Res., 4(1), 47-53.
11. Singh, R., Kumar, M. and Smarandache, F., 2010, Ratio estimators in simple random sampling when study variable is an attribute. WASJ, 11(5), 586-589.
12. Yule, G.U., 1912, On the methods of measuring association between two attributes. Jour. of The Royal Soc., 75, 579-642.