A Generalized Family of Estimators for Estimating Population Mean Using Two Auxiliary Attributes by Florentin Smarandache

Sachin Malik, Rajesh Singh Department of Statistics, Banaras Hindu University Varanasi-221005, India

Florentin Smarandache University of New Mexico, Gallup, USA

A Generalized Family of Estimators for Estimating Population Mean Using Two Auxiliary Attributes

Published in: Rajesh Singh, F. Smarandache (Editors) SAMPLING STRATEGIES FOR FINITE POPULATION USING AUXILIARY INFORMATION The Educational Publisher, Columbus, USA, 2015 ISBN 978-1-59973-348-7 pp. 9 - 20

Sampling Strategies for Finite Population Using Auxiliary Information

Abstract This paper deals with the problem of estimating the finite population mean when some information on two auxiliary attributes are available. A class of estimators is defined which includes the estimators recently proposed by Malik and Singh (2012), Naik and Gupta (1996) and Singh et al. (2007) as particular cases. It is shown that the proposed estimator is more efficient than the usual mean estimator and other existing estimators. The study is also extended to two-phase sampling. The results have been illustrated numerically by taking empirical population considered in the literature.

Keywords

Simple random sampling, two-phase sampling, auxiliary attribute, point biserial correlation, phi correlation, efficiency.

1. Introduction There are some situations when in place of one auxiliary attribute, we have information on two qualitative variables. For illustration, to estimate the hourly wages we can use the information on marital status and region of residence (see Gujrati and Sangeetha (2007), page-311). Here we assume that both auxiliary attributes have significant point biserial correlation with the study variable and there is significant phi-correlation (see Yule (1912)) between the auxiliary attributes. The use of auxiliary information can increase the precision of an estimator when study variable Y is highly correlated with auxiliary variables X. In survey sampling, auxiliary variables are present in form of ratio scale variables (e.g. income, output, prices, costs, height and temperature) but sometimes may present in the form of qualitative or nominal scale such as sex, race, color, religion, nationality and geographical region. For example, female workers are found to earn less than their male counterparts do or non-white workers are found to earn less than whites (see Gujrati and Sangeetha (2007), page 304). Naik and Gupta (1996) introduced a ratio estimator when the study variable and the auxiliary attribute are positively correlated. Jhajj et al. (2006) suggested a family of estimators for the population mean in single and two-phase sampling when the study variable 9

Rajesh Singh ■ Florentin Smarandache (editors)

and auxiliary attribute are positively correlated. Shabbir and Gupta (2007), Singh et al. (2008), Singh et al. (2010) and Abd-Elfattah et al. (2010) have considered the problem of estimating population mean Y taking into consideration the point biserial correlation between auxiliary attribute and study variable.

2. Some Estimators in Literature In order to have an estimate of the study variable y, assuming the knowledge of the population proportion P, Naik and Gupta (1996) and Singh et al. (2007) respectively, proposed following estimators:

P  t 1  y 1   p1 

(2.1)

p  t 2  y 2   P2 

(2.2)

P p  1 1 t 3  y exp P p   1 1

(2.3)

 p  P2   t 4  y exp 2  p 2  P2 

(2.4)

The Bias and MSE expression’s of the estimator’s t i (i=1, 2, 3, 4) up to the first order of approximation are, respectively, given by



Bt 1   Yf1C 2p1 1  K pb1

Bt 2   Yf1K pb2 Cp



(2.5)

Bt 3   Yf1

Bt 4   Yf1

C 2p

(2.6)

1   K pb 2   2 4 

C 2p

(2.7)

1   K pb 2   2 4  2













MSE t 1   Y f1 C 2y  C 2p1 1  2K pb1 2

MSE t 2   Y f1 C 2y  C 2p1 1  2K pb 2 2

(2.8)

(2.9) (2.10) 10

Sampling Strategies for Finite Population Using Auxiliary Information 2  1  MSE t 3   Y f1 C 2y  C 2p1   K pb2  4  

(2.11)

2  1  MSE t 4   Y f1 C 2y  C 2p2   K pb2  4  

(2.12) 2

1 1 1 N , S2 j  where, f1    ji  Pj  , n N N  1 i1  pb j 

S y j S y S j

, Cy 

K pb1   pb1

s 12 

Cy C p1

Sy Y

, Cp j 

S j Pj

Sy j 





1 N  yi  Y  ji  Pj , N  1 i1

; ( j  1,2),

, K pb 2   pb 2

Cy Cp 2

s 1 n 1i  p1  2i  p 2  and    12 be the sample phi-covariance and phi n  1 i 1 s 1 s 2

correlation between 1 and  2 respectively, corresponding to the population phi-covariance and phi-correlation S12 

and   

S12 S1 S2

1 N  1i  P1  2i  P2  N  1 i 1

Malik and Singh (2012) proposed estimators t5 and t6 as

P  t 5  y  1   p1 

1

 P2   p2

  

2

(2.13) 1

 P  p1   p  P2   exp 2  t 6  y exp 1  P1  p1   p 2  P2 

2

(2.14)

where 1 ,  2 , 1 and  2 are real constants. The Bias and MSE expression’s of the estimator’s t 5 and t 6 up to the first order of approximation are, respectively, given by   2 1   2 2  2 1 2 2  B( t 5 )  Yf1 C p    1k pb1   C p     2 k pb 2  1 2 k     1 2  2 2 2    2   

(2.15)

Rajesh Singh ■ Florentin Smarandache (editors)

  2 1   2   2  1 2  2 2  B( t 6 )  Yf1 C p  K pb1  C p  K pb 2  1 2 K     2  4 2 2 4  1  4    2









MSE(t 5 )  Y f 1 C2y  C2p1 12  21K pb1  C2p2  22  2 2 K pb2  21 2 K 

(2.16)



(2.17)

2   β2   β2 β β  MSE(t 6 )  Y f1 C 2y  C 2p1  1  β1K pb1   C 2p2  2  1 2 K φ  β 2 K pb1  2  4   4  

(2.18)

3. The Suggested Class of Estimators Using linear combination of t i i  0,1,2, we define an estimator of the form 3

t p  witi  H i 0

(3.1)

Such that,  w i  1 and w i  R

(3.2)

i 0

Where, α1

 L P  L 2   L 3 P2  L 4  t 0  y , t1  y  1 1     L1p1  L 2   L 3 p 2  L 4 

α2

β1

 (L P  L 6 )  (Lp1  L 6 )   (L7 p 2  L 6 )  (L7 P2  L8 )  and t 2  exp 5 1  exp   (L1P1  L 2 )  (L5 p1  L 6 )   (L7 p 2  L 2 )  (L7 P2  L8 ) 

β2

where w i i  0,1,2 denotes the constants used for reducing the bias in the class of estimators, H denotes the set of those estimators that can be constructed from t i i  0,1,2 and R denotes the set of real numbers (for detail see Singh et. al (2008)). Also, Li i  1,2,...,8 are either real numbers or the functions of the known parameters of the auxiliary attributes. Expressing tp in terms of e’s, we have  α1  α2    w 0  w1 1  φ1e1  1  φ 2e2     1 β1 t p  Y 1  e0    w 2exp θ1e1 1  θ1e1     exp θ e 1  θ e 1 β2  2 2 2 2  





where,

(3.3)

Sampling Strategies for Finite Population Using Auxiliary Information

   L 3 P2  φ2  L 3 P1  L 4   L 5 P1  θ1  2L 5 P2  L 6    L 7 P2  θ2  2L 7 P2  L 8  φ1 

L1P1 L1P1  L 2

After expanding, Subtracting Y from both sides of the equation (3.3) and neglecting the term having power greater than two, we have

t



 Y  Ye 0  w 1 α1φ1e1  α 2 φ 2 e 2   w 2 β1θ1e1  β 2θ 2 e 2 

(3.4) Squaring both sides of (3.4) and then taking expectations, we get MSE of the estimator t p up to the first order of approximation, as



MSE t p   Y f w 12 T1  w 22 T2  2w1 w 2 T3  2w1T4  2w 2 T5 2

 (3.5)

where,

w1 

L 2 L 4  L3L5 L1L 2  L23

w2 

L1L 5  L 3 L 4 L1L 2  L23

     

(3.6)

and

  L 2  θ12β12 c 2p1  θ 22β 22 c 2p1  2β1β 2 φ1θ 2 k φ C 2p2   L 3  α1β1θ1C 2p1  α 2β 2 θ 2 C 2p2  α 2β1φ 2 θ1k φ C 2p2  α1φ1θ 2β 2 k φ C 2p2   L 4  α1φ1k pb1 C 2p1  α 2 φ 2 k pb2 C 2p2   2 2 L 5  β1θ1k pb1 C p1  β 2 θ 2 k pb2 C p2    L1  φ12 α12 C 2p1  φ 22 α 22 C 2p2  2α1α 2 φ1φ 2 k φ C 2p2

(3.7)

Rajesh Singh ■ Florentin Smarandache (editors)

4. Empirical Study Data: (Source: Government of Pakistan (2004))

The population consists rice cultivation areas in 73 districts of Pakistan. The variables are defined as: Y= rice production (in 000’ tonnes, with one tonne = 0.984 ton) during 2003,

P1 = production of farms where rice production is more than 20 tonnes during the year 2002, and P2 = proportion of farms with rice cultivation area more than 20 ha during the year 2003. For this data, we have N=73, Y =61.3, P1 =0.4247, P2 =0.3425, S 2y =12371.4, S 21 =0.225490, S22 =0.228311,

 pb1 =0.621,  pb2 =0.673,   =0.889. Table 4.1: PRE of different estimators of Y with respect to y . CHOICE OF SCALERS, when w 0  0 w 1  1 w 2  0 L3

PRE’S

179.77

α1

α2

156.28

-1

112.97

C p1

 pb1

C p2

 pb2

178.10

NP1

K pb1

NP2

K pb 2

110.95

-1

NP1

NP2

112.78

-1

K pb1

K pb 2

112.68

-1

NP1

NP2

112.32

115.32

-1

 pb1

 pb2

112.38

-1

113.00

-1

112.94

162.68

When, w 0  0 w1  0 w 2  1 14

Sampling Strategies for Finite Population Using Auxiliary Information

β1

β2

PRE’S

141.81

60.05

-1

180.50

-1

127.39

-1

170.59

-1

C p1

 pb1

C p2

 pb2

143.83

-1

NP1

K pb1

NP2

K pb 2

179.95

-1

NP1

NP2

180.52

-1

K pb1

K pb 2

180.56

-1

NP1

NP2

180.53

-1

179.49

-1

 pb1

 pb2

180.55

-1

180.36

-1

180.57

When, w 0  0 w1  0 w 2  1 also Li i  1,2,...,8  1

PREt p  =183.60

α 1  α 2  β1  β 2  1

5. Double Sampling It is assumed that the population proportion P1 for the first auxiliary attribute 1 is unknown but the same is known for the second auxiliary attribute  2 . When P1 is unknown, it is some times estimated from a preliminary large sample of size n  on which only the attribute 1 is measured. Then a second phase sample of size n (n< n  ) is drawn and Y is observed. Let pj 

1 n   ji , ( j  1,2). n i 1

The estimator’s t1, t2, t3 and t4 in two-phase sampling take the following form

 p'  t d1  y 1   p1 

(5.1) 15

Rajesh Singh ■ Florentin Smarandache (editors)

P  t d 2  y 2'   p2 

(5.2)

 p '  p1   t d 3  y exp 1' p  p 1   1

(5.3)

td4

 p '2  P2    y exp '  p 2  P2 

(5.4)

The bias and MSE expressions of the estimators td1, td2, td3 and td4 up to first order of approximation, are respectively given as



Bt d 1   Yf 3 C 2p1 1  k pb1





Bt d 2   Yf 2 C 2p 2 1  K pb 2 Bt d 3   Yf 3



(5.6)

1  K 

C 2p 2

Bt d 4   Yf 3

(5.5)

pb 2

C 2p 2 4

(5.7)

1  K pb  2



(5.8)



MSE t d1   Y f1C 2y  f 3 C 2P1 1  2K pb1 2







MSE t d 2   Y f1C 2y  f 2 C 2p 2 1  2K kp2 2



 2 C 2p1 1  4K pb1 MSE t d 3   Y f1C y  f 3 4 





2

MSE t d 4   Y f1C 2y  f 3 

(5.10)



 

(5.11)

 1  4K pb1   4 

C 2p



(5.9)





(5.12)

where, 2





n 1 n  ji  p j  , S' j 2  '1   ji  p 'j , S   n  1 i 1 n  1 i 1 2 J

f2 

1 n



1 , N

f3 

1 1  . n n' 16

Sampling Strategies for Finite Population Using Auxiliary Information

The estimator’s t5 and t6, in two-phase sampling, takes the following form

 p' t d5  y 1 p  1

t d6

   

 P2     p'   2

 p1'  p1    y exp  p'  p   1 1

(5.13)

 p '2  P2   exp  p'  P  2  2

(5.14)

Where m1 , m 2 , n1 and n 2 are real constants. The Bias and MSE expression’s of the estimator’s t d5 and t d6 up to the first order of approximation are, respectively, given by

  m2 m   m2 m  Bt d5   Y f 3C 2p1  1  1  m1K pb1   f 2 C 2P2  2  2  m 2 k pb2  2 2  2   2  

(5.15)

  n2 n n   n2 n  n Bt d 6   Y f 3  1  1  1 K pb1 C 2p1  f 2  2  2  2 K pb2  8 2 8 2   8    8









MSE t d 5   Y f1C 2y  f 3C 2p1 m12  2m1K pb1  f 2 C 2p2 2 m 22  2m 2 K pb2

(5.16)



(5.17)

  n2   n2  2 MSE t d 6   Y f 1C 2y  f 3  1  n 1 K pb1 C 2p1  f 2  2  n 2 K pb 2 C 2p 2   4   4   

(5.18)

6. Estimator tpd in Two-Phase Sampling Using linear combination of t di i  0,1,2, we define an estimator of the form 3

t pd   h i t di  H i 0

Such that,

h i 0

(6.1)

 1 and h i  R

(6.2)

where, m1

 L p'  L   L P  L 4  t 0  y , t d1  y  1 1 2   3 2   L1p1  L 2   L 3 p' 2  L 4 

and t d2

 (L p'  L )  (Lp1  L 6 )   (L7 p' 2  L 6 )  (L7 P2  L 8 )   exp 5 1 6  exp   (L1p'1  L 2 )  (L5 p1  L 6 )   (L7 p' 2  L 2 )  (L7 P2  L 8 ) 

where h i i  0,1,2 denotes the constants used for reducing the bias in the class of estimators,

H denotes the set of those estimators that can be constructed from t di i  0,1,2 and R 17

Rajesh Singh ■ Florentin Smarandache (editors)

denotes the set of real numbers (for detail see Singh et. al. (2008)). Also, Li i  1,2,...,8 are either real numbers or the functions of the known parameters of the auxiliary attributes. Expressing tpd in terms of e’s, we have



t p  Y1  e 0  h 0  h 1 1  φ1e'1  1 1  φ1e1 

 m1

1  φ 2 e'2 -m





 h 2 exp θ1 e'1 e1 1  θ1 e'1 e1 

1 n1

expθ 2 e'2 1  θ 2 e'2 

(6.3)

After expanding, subtracting Y from both sides of the equation (6.3) and neglecting the terms having power greater than two, we have

t



 Y  Ye 0  h 1 m1φ1e'1 m1φ1e1  m 2 φ 2 e'2   h 2 n 1θ1e'1 n 1θ1e1  n 2 θ 2 e'2 

(6.4) Squaring both sides of (6.4) and then taking expectations, we get MSE of the estimator t p up to the first order of approximation, as



MSE t pd   Y h12 R 1  h 22 R 2  2h1h 2 R 3  2h1R 4  2h 2 R 5 2

 (6.5)

where,

h1 

R 2 R 4  R 3R 5 R 1R 2  R 32

h2 

R 1R 5  R 3 R 4 R 1R 2  R 32

     

(6.6)

and

  R 2  θ12 n 12 f 3C 2p1  θ 22 n 22 f 2 C 2p2   R 3  m 2 n 2 f 2 φ 2 θ 2 C 2p2 - n 1m1φ1θ1f 2 k φ C 2p1   R 4  m1φ1f 3 k pb1 C 2p1  m 2 φ 2 f 2 k pb2 C 2p2   R 5  n 1θ1f 3 k pb1 C 2p1  n 2 θ 2 f 2 k pb2 C 2p2    R 1  φ12 m12 f 3C 2p1  φ 22 m 22 f 2 C 2p2

(6.7)

Data: (Source: Singh and Chaudhary (1986), p. 177).

The population consists of 34 wheat farms in 34 villages in certain region of India. The variables are defined as: y = area under wheat crop (in acres) during 1974.

p1 = proportion of farms under wheat crop which have more than 500 acres land during 1971. and 18

Sampling Strategies for Finite Population Using Auxiliary Information

p 2 = proportion of farms under wheat crop which have more than 100 acres land during 1973. For this data, we have N=34, Y =199.4, P1 =0.6765, P2 =0.7353, S 2y =22564.6, S 21 =0.225490, S22 =0.200535,

 pb1 =0599,  pb2 =0.559,   =0.725. Table 6.1: PRE of different estimators of Y with respect to y CHOICE OF SCALERS, when h 0  0 h1  1 h 2  0 L3 m1 m2 L2 L4 L1 0 1 1 1 1

1 0 1 1 1

1 1 1 C p1

PRE’S

0 1 0  pb1

1 1 C p2

1 0  pb2

108.16 121.59 142.19 133.40 144.78

NP1

K pb1

NP2

K pb 2

136.90

NP1 N

NP2 N

133.30

K pb 2

135.73

137.09

138.23

K pb1

NP1 n

NP2 n

 pb1

 pb2

135.49

138.97

135.86

PRE’S

When, h 0  0 h1  0 h 2  1 L5 n2 n1 1 0 1 1 1 1

0 -1 -1 -1 -1 -1

1 1 1 1 1 C p1

0 0 0 1 1  pb1

1 1 1 1 1 C p2

0 0 0 1 0  pb2

130.89 108.93 146.63 121.68 127.24 123.43

-1

NP1

K pb1

NP2

K pb 2

145.49

-1

146.57

-1

NP2 N

NP1 N

K pb 2

145.84

-1

145.43

-1

145.03

NP1 n

K pb1

P1 P1

NP2 n 19

Rajesh Singh ■ Florentin Smarandache (editors)

-1

 pb1

 pb2

145.92

-1

144.85

-1

145.80

When, h 0  0 h1  0 h 2  1 also Li i  1,2,...,8  1 m1  m 2  n 1  n 2  1 PREt pd  =154.28

7. Conclusion In this paper, we have suggested a class of estimators in single and two-phase sampling by using point bi serial correlation and phi correlation coefficient. From Table 4.1 and Table 6.1, we observe that the proposed estimator tp and tpd performs better than other estimators considered in this paper.

References 1. Abd-Elfattah, A.M. El-Sherpieny, E.A. Mohamed, S.M. Abdou, O. F., 2010, Improvement in estimating the population mean in simple random sampling using information on auxiliary attribute. Appl. Mathe. and Compt. doi:10.1016/j.amc.2009.12.041 2. Government of Pakistan, 2004, Crops Area Production by Districts (Ministry of Food, Agriculture and Livestock Division, Economic Wing, Pakistan). 3. Gujarati, D. N. and Sangeetha, 2007, Basic econometrics. Tata McGraw – Hill. 4. Jhajj, H.S., Sharma, M.K. and Grover, L.K., 2006 , A family of estimators of population mean using information on auxiliary attribute. Pak. Journ. of Stat., 22(1), 43-50. 5. Malik, S. And Singh, R. ,2012, A Family Of Estimators Of Population Mean Using Information On Point Bi-Serial And Phi-Correlation Coefficient. Intern. Jour. Stat. And Econ. (accepted). 6. Naik,V.D and Gupta, P.C., 1996, A note on estimation of mean with known population proportion of an auxiliary character. Jour. Ind. Soc. Agri. Stat., 48(2), 151-158. 7. Shabbir, J. and Gupta, S., 2007, On estimating the finite population mean with known population proportion of an auxiliary variable. Pak. Journ. of Stat., 23 (1), 1-9. 8. Singh, D. and Chaudhary, F. S., 1986, Theory and Analysis of Sample Survey Designs (John Wiley and Sons, NewYork). 9. Singh, R., Cauhan, P., Sawan, N. and Smarandache, F., 2007, Auxiliary information and a priori values in construction of improved estimators. Renaissance High press. 10. Singh, R. Chauhan, P. Sawan, N. Smarandache, F., 2008, Ratio estimators in simple random sampling using information on auxiliary attribute. Pak. J. Stat. Oper. Res. 4(1) 47–53. 11. Singh, R., Kumar, M. and Smarandache, F., 2010, Ratio estimators in simple random sampling when study variable is an attribute. WASJ 11(5): 586-589. 12. Yule, G. U., 1912, On the methods of measuring association between two attributes. Jour. of The Royal Soc. 75, 579-642.