Studies in Surveying and Mapping Science (SSMS) Volume 2, 2014

www.as-se.org/ssms

A Brief Review of Fuzzy Soft Classification and Assessment of Accuracy Methods for Identification of Single Land Cover

Priyadarshi Upadhyay*1, S. K. Ghosh2, Anil Kumar3

1,2 Department of Civil Engineering, Indian Institute of Technology Roorkee, India.
3 Indian Institute of Remote Sensing, Dehradun, India.

priyadarshiu@gmail.com; 2scangfce@iitr.ernet.in; 3anil@iirs.gov.in

Abstract

Identification or classification of a specific (single) class through a traditional classification scheme is not an easy task. In this study, several fuzzy set theory based soft classification methods have been studied for classifying a specific class present in remote sensing imagery. Further, the methods capable of assessing the accuracy of these classifiers have also been reviewed. Among the various fuzzy set theory based methods, Possibilistic c-Means (PCM), Noise Clustering (NC) and Noise Clustering with Entropy (NCE) have been found suitable for identification of a specific class. Amongst the various accuracy assessment methods, only the error matrix, entropy and Receiver Operating Characteristic (ROC) are capable of assessing the accuracy of the soft classified outputs from the aforementioned classifiers.

Keywords

PCM; NC; NCE; Error Matrix; Entropy and ROC

Introduction

Land cover, the physical material present on the surface of the earth, is important for many scientific, resource management and policy purposes and for a range of human activities (Cihlar, 2000). Remote sensing is an efficient, economical and fast way to retrieve land cover information over wide areas. To utilize land cover information for different mapping purposes, a criterion is needed for grouping land cover into discrete and distinct groups having similar characteristics, so a proper classification procedure is required. However, in certain cases, the user is interested in a specific or single land cover only and has no use for the other classes (Foody et al., 2006; Li and Guo, 2010; Li et al., 2011). For example, a user who wants to update the transport system will be interested in extracting only road features from the satellite data. Other features such as water bodies, agricultural land and forest will be of no use to that user. To extract a specific land cover class with traditional supervised classification, it is necessary to have information on all land cover types at the training stage; in other words, the classes should be exhaustively defined (Boyd et al., 2006; Foody et al., 2006; Sanchez-Hernandez et al., 2007; Li and Guo, 2010; Li et al., 2011). This not only increases the cost and labour of classification (Foody et al., 2006; Li et al., 2011) but may also produce substantial error (Foody et al., 2006) in the output. Thus, for specific class extraction, the conventional supervised (or hard) classification method is inappropriate (Foody et al., 2006). In addition, conventional supervised classification may also face the problem of mixed pixels (Upadhyay et al., 2013a).

Therefore, soft classifiers that can handle the mixed pixel problem and can also perform the classification when the user has information (or training data) for a single land cover only have been reviewed in this study. Further, among the various accuracy assessment methods, only a few are applicable to a single class, and these have also been reviewed in this study.

Fuzzy Soft Classification Method

To address the problem of mixed pixels, researchers have proposed many 'soft' classification techniques that decompose a pixel into its class proportions. Some of the well known ones are statistical classifiers such as the Maximum Likelihood Classifier (MLC) and the Linear Mixture Model (LMM) (Sanjeevi and Barnsley, 2000; Lu et al., 2004); fuzzy set theory based classifiers such as Fuzzy c-Means (FCM) (Dunn, 1973; Bezdek, 1981; Bezdek et al., 1984), Possibilistic c-Means (PCM) (Krishnapuram and Keller, 1993, 1996) and Noise Clustering (NC) (Dave, 1991); and others based on the Support Vector Machine (SVM) (Vapnik, 1995) and neural networks (Li and Eastman, 2006; Li, 2008).

The MLC, which is a statistical classifier, has generally been adopted for hard classification output. However, the output of MLC may be softened so that it depicts partial and multiple class membership for each pixel, as reported by a number of studies (Bruzzone et al., 2002; Eastman and Laney, 2002; Guerschman et al., 2003; Shalan et al., 2003; Lu et al., 2004). LMM is also a statistical classifier and is based on the assumption that the spectral response of a pixel is the linear sum of the spectral responses of the classes weighted by their corresponding proportional areas. This method can be used to produce class proportions (i.e. soft classification) which sum to one for a pixel (Ibrahim, 2004). Neural network classification is based on machine learning theory, whereas the SVM is based on the decision boundary.

Fuzzy theory based classification is a soft classification technique that deals with vagueness, ambiguity and uncertainty in class definition. Mixed pixels are normally found along gradients or at boundaries between two or more mapping units (Kumar et al., 2006). A number of studies have used fuzzy set theory based classification to handle mixed pixels (Foody, 2000; Shalan et al., 2003; Ibrahim et al., 2005; Ghosh et al., 2011; Upadhyay et al., 2012). However, its use for specific class identification is less explored. Various fuzziness based soft classifiers such as Fuzzy c-Means (FCM) (Bezdek, 1981; Bezdek et al., 1984), FCM with Entropy (FCME) (Miyamoto, 2008), Possibilistic c-Means (PCM) (Krishnapuram and Keller, 1993, 1996), PCM with Entropy (PCME) (Miyamoto, 2008), Noise Clustering (NC) (Dave, 1991) and NC with Entropy (NCE) (Miyamoto et al., 2008) have been considered in this study. All the above classification approaches are based on the basic fuzzy clustering algorithm.
Originally, all the above were developed as unsupervised classifiers, yet they can be modified for use in the supervised mode by providing the information class (or cluster, in the unsupervised case) means directly from the training dataset (Foody, 2000). Detailed descriptions of these classification algorithms are given in the following sections.

Fuzzy c-Means (FCM)

Soft classification approaches can help in quantifying the uncertainties in the areas of transition between different types of land cover (Ibrahim et al., 2005). FCM (Bezdek, 1981; Bezdek et al., 1984), originally proposed by Dunn (1973), is one of the popular approaches for fuzzy set theory based soft classification. It is an iterative clustering method that can be employed to partition the pixels of a satellite image into different class membership values. Each pixel in the satellite image is related to every information class by a function known as the membership function. The value of the membership function, known simply as the membership, varies between zero and one. A membership close to one means that the pixel is more representative of that particular information class, while a membership close to zero means that the pixel has little similarity with the information class (Bezdek et al., 1984). The net effect of such a function is to produce a fuzzy c-partition of the given data (or satellite image in the case of remote sensing). The objective function of the FCM classifier is given by:

$$J_{fcm}(U, V) = \sum_{k=1}^{N} \sum_{i=1}^{c} (\mu_{ki})^{m} D(x_k, v_i) \tag{1}$$

where

$$D(x_k, v_i) = d_{ki}^{2} = \lVert x_k - v_i \rVert_A^{2} = (x_k - v_i)^T A (x_k - v_i) \tag{2}$$

subject to the constraints:

$$\sum_{i=1}^{c} \mu_{ki} = 1 \quad \text{for all } k \tag{3}$$

$$\sum_{k=1}^{N} \mu_{ki} > 0 \quad \text{for all } i \tag{4}$$

$$0 \le \mu_{ki} \le 1 \quad \text{for all } k, i \tag{5}$$

where $U$ is the $N \times c$ membership matrix, $V = (v_1, \ldots, v_c)$ is the collection of cluster centre vectors $v_i$, $\mu_{ki}$ is the class membership value of pixel $k$ in class $i$, $d_{ki}$ is the distance in feature space between $x_k$ and $v_i$, $x_k$ is the vector (or feature vector) denoting the spectral response of pixel $k$, $v_i$ is the vector (or prototype vector) denoting the cluster centre of class $i$, $c$ and $N$ are the total numbers of clusters and pixels respectively, $A$ is the weight matrix and $m$ is the weighting exponent (or fuzzifier) such that $1 < m < \infty$. When $m \to 1$ the membership function is hard, and when $m \to \infty$ the memberships are maximally fuzzy (Krishnapuram and Keller, 1993). The weight matrix $A$ controls the shape of the optimal information class (Bezdek et al., 1984). Generally, it takes one of the following norms:

$$A = I \quad \text{(Euclidean norm)} \tag{6}$$

$$A = D_i^{-1} \quad \text{(Diagonal norm)} \tag{7}$$

$$A = C_i^{-1} \quad \text{(Mahalanobis norm)} \tag{8}$$

where $I$ is the identity matrix, $D_i$ is the diagonal matrix whose diagonal elements are the eigenvalues of the covariance matrix, and $C_i$ is given by:

$$C_i = \sum_{k=1}^{N} (x_k - c_i)(x_k - c_i)^T \tag{9}$$

where

$$c_i = \frac{1}{N} \sum_{k=1}^{N} x_k \tag{10}$$

By considering the objective function of the FCM from Equation (1), the membership value can be calculated as follows:

$$\mu_{ki} = \left[ \sum_{j=1}^{c} \left( \frac{D(x_k, v_i)}{D(x_k, v_j)} \right)^{\frac{1}{m-1}} \right]^{-1} \tag{11}$$

where

$$D(x_k, v_j) = \sum_{i=1}^{c} D(x_k, v_i)$$

and the computed $\mu_{ki}$ is the realization of the class membership value. From Equation (1), the centre of the information class $v_i$ can be computed as:

$$v_i = \frac{\sum_{k=1}^{N} (\mu_{ki})^{m} x_k}{\sum_{k=1}^{N} (\mu_{ki})^{m}} \tag{12}$$
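As a minimal sketch of Equations (11) and (12), the following Python code computes FCM memberships from a table of distances and then updates the class centres. The function names, the squared Euclidean distance (Equation (6) used as the norm) and the pixel values are illustrative assumptions, not part of the reviewed methods, and all distances are assumed to be strictly positive.

```python
def sq_euclidean(x, v):
    """Squared Euclidean distance, i.e. Equation (2) with A = I (assumed norm)."""
    return sum((a - b) ** 2 for a, b in zip(x, v))


def fcm_memberships(dist, m=2.0):
    """Equation (11): dist[k][i] = D(x_k, v_i) > 0, returns mu[k][i]."""
    p = 1.0 / (m - 1.0)
    return [[1.0 / sum((d_i / d_j) ** p for d_j in row) for d_i in row]
            for row in dist]


def fcm_centers(mu, pixels, m=2.0):
    """Equation (12): membership-weighted mean of the pixel vectors per class."""
    centers = []
    for i in range(len(mu[0])):
        w = [row[i] ** m for row in mu]          # (mu_ki)^m for class i
        total = sum(w)
        centers.append([sum(wk * x[b] for wk, x in zip(w, pixels)) / total
                        for b in range(len(pixels[0]))])
    return centers


# Hypothetical two-band pixels and two initial class centres (made-up values).
pixels = [[1.0, 1.0], [2.0, 1.0], [8.0, 9.0], [9.0, 9.0]]
centers = [[1.4, 1.0], [8.6, 9.0]]

dist = [[sq_euclidean(x, v) for v in centers] for x in pixels]
mu = fcm_memberships(dist, m=2.0)
new_centers = fcm_centers(mu, pixels, m=2.0)

for row in mu:
    print([round(u, 4) for u in row])
```

In a full classifier these two steps would be repeated until the centres stop changing; in the supervised mode described above, the centres would instead be taken from the training data.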

where the computed $v_i$ is the realization of the information class centre value.

Entropy Based Fuzzy c-Means (EFCM)

According to the entropy theory given by Shannon (1948), entropy is lower for orderly arranged data points and higher for disorderly arranged data points. In an orderly arrangement, if the complete dataset is viewed from an individual point, then for most individual data points some data points are close to it (i.e. they probably belong to the same information class) and the others are far from it. In a disorderly arrangement, most of the data points are scattered randomly. Therefore, the data point with minimum entropy is a good candidate for the information class (or cluster) centre (Yao et al., 2000). When noisy data are present, this is not valid unless the noise is removed before the information class centre is determined. In fuzzy clustering, entropy is evaluated at each data point and the data point with the least entropy is selected as the first information class centre. Then, the centre of this information class and all the data points which are similar to it within a defined threshold are removed (Yao et al., 2000; Chattopadhyay et al., 2011). The second information class centre is then selected from the remaining data points, and so on until no data point is left.

Based upon the ideas of entropy based fuzzy clustering, Miyamoto et al. (2008) suggested setting $m$ to 1 and adding another term to Equation (1) that considers the entropy $K(u)$ along with a regularizing parameter ($\nu$), giving the Entropy based Fuzzy c-Means (EFCM). It may be noted that, in the entropy based approach, the fuzzification is carried out by the entropy and not by $m$. The objective function for the EFCM approach can be expressed by:

$$J_{efcm}(U, V) = \sum_{k=1}^{N} \sum_{i=1}^{c} \mu_{ki} D(x_k, v_i) + \nu \sum_{k=1}^{N} \sum_{i=1}^{c} \mu_{ki} \log \mu_{ki} \tag{13}$$

where $\nu$ is the 'regularizing parameter'.

By considering the objective function of the EFCM from Equation (13), the membership value can be calculated as follows:

$$\mu_{ki} = \frac{\exp\left( -\dfrac{D(x_k, v_i)}{\nu} \right)}{\sum_{j=1}^{c} \exp\left( -\dfrac{D(x_k, v_j)}{\nu} \right)} \tag{14}$$
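The membership of Equation (14) has the form of a normalized exponential over the class distances. The sketch below evaluates it for one pixel; the function name and the example values are assumptions made for illustration, and the subtraction of the smallest distance is only a numerical safeguard that cancels in the ratio.

```python
import math


def efcm_memberships(dist_row, nu):
    """Equation (14) for one pixel: dist_row[i] = D(x_k, v_i), nu > 0."""
    shift = min(dist_row)                      # cancels between numerator and denominator
    e = [math.exp(-(d - shift) / nu) for d in dist_row]
    total = sum(e)
    return [x / total for x in e]


# Hypothetical distances of one pixel to three class centres.
mu_row = efcm_memberships([0.5, 2.0, 6.0], nu=1.0)
print([round(u, 4) for u in mu_row])
```

A small $\nu$ pushes the memberships towards a hard assignment, while a large $\nu$ spreads them more evenly, which reflects the statement above that fuzzification is controlled by the entropy term rather than by $m$.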

Possibilistic c-Means (PCM)

To derive accurate estimates of sub-pixel land cover composition from FCM, it is necessary to have information on all classes in the training stage of the classification (Foody, 2000). Krishnapuram and Keller (1993) added a new term to the FCM approach to introduce the Possibilistic c-Means (PCM) approach. The effect of this new term is that it emphasizes (assigns high membership values to) the representative feature points and de-emphasizes (assigns low membership values to) the unrepresentative feature points present in the data. Further, the presence of untrained classes does not affect the classification outputs in the PCM (Foody, 2000). The objective function for the PCM classifier (Krishnapuram and Keller, 1993) is given by:

$$J_{pcm}(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ki})^{m} D(x_k, v_i) + \sum_{i=1}^{c} \eta_i \sum_{k=1}^{N} (1 - \mu_{ki})^{m} \tag{15}$$

subject to the constraints:

$$\max_{i} \mu_{ki} > 0 \quad \text{for all } k \tag{16}$$

$$\sum_{k=1}^{N} \mu_{ki} > 0 \quad \text{for all } i \tag{17}$$

$$0 \le \mu_{ki} \le 1 \quad \text{for all } k, i \tag{18}$$

where $\eta_i$ is a suitable positive number and $m$ is again a weighting exponent (or fuzzifier) such that $1 < m < \infty$.

The first term in the objective function of PCM demands that the distance between the feature vectors and the prototype vector should be as low as possible. The second term, on the other hand, forces the membership $\mu_{ki}$ to be as large as possible. However, the interpretation of $m$ is different for FCM and PCM (Krishnapuram and Keller, 1996). In the case of FCM, increasing values of $m$ represent increased sharing of the pixels of the remote sensing image (or data) among all information classes, whereas for PCM, increasing values of $m$ represent an increased possibility of all pixels completely belonging to a given information class. By considering the objective function of the PCM from Equation (15), the membership value can be calculated as follows:

$$\mu_{ki} = \frac{1}{1 + \left( \dfrac{D(x_k, v_i)}{\eta_i} \right)^{\frac{1}{m-1}}} \tag{19}$$

where the computed $\mu_{ki}$ is the realization of the membership function and $\eta_i$ can be calculated as follows:

$$\eta_i = K \times \frac{\sum_{k=1}^{N} \mu_{ki}^{m} D(x_k, v_i)}{\sum_{k=1}^{N} \mu_{ki}^{m}} \tag{20}$$

where $K$ is a constant and generally kept as 1.
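The two PCM quantities can be illustrated with a short Python sketch: Equation (20) for the bandwidth parameter of one class and Equation (19) for the membership of one pixel. The function names and numbers are hypothetical, and the initial memberships passed to `bandwidth` are assumed to come from an earlier step (for example an FCM run).

```python
def bandwidth(mu_col, dist_col, m=2.0, K=1.0):
    """Equation (20): eta_i from memberships mu_ki and distances D(x_k, v_i) of class i."""
    w = [u ** m for u in mu_col]
    return K * sum(wk * d for wk, d in zip(w, dist_col)) / sum(w)


def pcm_membership(d, eta, m=2.0):
    """Equation (19): possibilistic membership for distance d and bandwidth eta."""
    return 1.0 / (1.0 + (d / eta) ** (1.0 / (m - 1.0)))


# Hypothetical distances of five pixels to one class centre and assumed
# initial memberships for that class.
dist_col = [0.2, 0.5, 1.0, 6.0, 20.0]
mu_init = [0.9, 0.9, 0.8, 0.3, 0.1]

eta = bandwidth(mu_init, dist_col)
mu_pcm = [pcm_membership(d, eta) for d in dist_col]
print(round(eta, 4), [round(u, 4) for u in mu_pcm])
```

Unlike the FCM memberships, these values are computed per class without reference to the other classes, so they need not sum to one for a pixel.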

$\eta_i$ is also known as the bandwidth parameter (Foody, 2000); it is the distance at which the membership to a class equals 0.5.

Entropy Based Possibilistic c-Means (EPCM)

As suggested by Miyamoto et al. (2008), the Entropy based Possibilistic c-Means (EPCM) is identical to the EFCM. Therefore, the objective function and membership function can be represented by Equations (13) and (14) respectively.

Noise Clustering (NC)

In FCM, noisy points (i.e. outliers) are grouped with the information classes with the same overall membership value of one. The idea of proper handling of noisy points was first proposed by Ohashi in 1984 (Dave and Krishnapuram, 1997). According to Dave (1991), noise classes (or outliers) can be segregated from the core information classes (or clusters), so that they do not degrade the quality of the clustering analysis. The main concept of the NC algorithm is to introduce a single noise information class ($c+1$) that will contain all noise data points. The objective function of the NC can be obtained by adding another term to FCM for the $(c+1)$th noise information class as follows (Dave, 1991):

$$J_{nc}(U, V) = \sum_{k=1}^{N} \sum_{i=1}^{c} (\mu_{ki})^{m} D(x_k, v_i) + \sum_{k=1}^{N} (\mu_{k,c+1})^{m} \delta \tag{21}$$

where $U$ is the $N \times (c+1)$ membership matrix and $V = (v_1, \ldots, v_c)$. The noise information class has no centre, and the dissimilarity $D_{k,c+1}$ between $x_k$ and this noise information class can be expressed as (Miyamoto et al., 2008):

$$D_{k,c+1} = \delta \tag{22}$$

where $\delta > 0$ is a fixed parameter. The constraints imposed on the objective function of Equation (21) are given by:

$$U_f = \left\{ \mu = (\mu_{ki}) : \sum_{j=1}^{c+1} \mu_{kj} = 1,\ 1 \le k \le N;\ \ \mu_{ki} \in [0, 1],\ 1 \le k \le N,\ 1 \le i \le c+1 \right\} \tag{23}$$

From the objective function of NC, the membership values of the information classes and of the noise class are given by Equations (24) and (25) respectively, while Equation (26) gives the mean value of the information classes.

$$\mu_{ki} = \left[ \sum_{j=1}^{c} \left( \frac{D(x_k, v_i)}{D(x_k, v_j)} \right)^{\frac{1}{m-1}} + \left( \frac{D(x_k, v_i)}{\delta} \right)^{\frac{1}{m-1}} \right]^{-1}, \quad 1 \le i \le c \tag{24}$$

$$\mu_{k,c+1} = \left[ \sum_{j=1}^{c} \left( \frac{\delta}{D(x_k, v_j)} \right)^{\frac{1}{m-1}} + 1 \right]^{-1} \tag{25}$$

$$v_i = \frac{\sum_{k=1}^{N} (\mu_{ki})^{m} x_k}{\sum_{k=1}^{N} (\mu_{ki})^{m}}, \quad 1 \le i \le c \tag{26}$$
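A minimal Python sketch of Equations (24) and (25) for a single pixel is given below. The function name and the example distances are assumptions made for illustration, and the distances are taken to be strictly positive.

```python
def nc_memberships(dist_row, delta, m=2.0):
    """Equations (24) and (25): returns (class memberships, noise membership)."""
    p = 1.0 / (m - 1.0)
    mu = [1.0 / (sum((d_i / d_j) ** p for d_j in dist_row) + (d_i / delta) ** p)
          for d_i in dist_row]
    noise = 1.0 / (sum((delta / d_j) ** p for d_j in dist_row) + 1.0)
    return mu, noise


# A pixel close to the first class centre (hypothetical distances).
mu_in, noise_in = nc_memberships([1.0, 4.0], delta=2.0)

# An outlier that is far from both class centres.
mu_out, noise_out = nc_memberships([100.0, 100.0], delta=2.0)

print([round(u, 4) for u in mu_in], round(noise_in, 4))
print([round(u, 4) for u in mu_out], round(noise_out, 4))
```

For the outlier, almost all of the membership goes to the noise class, which is the behaviour the noise distance $\delta$ is meant to provide.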

The noise class always remains at a constant distance from all data points. This constant distance is referred to as the noise distance and is represented by the parameter $\delta$, also known as the 'resolution parameter'. If $\delta$ is assigned a very small value, most of the points will be classified as noise points, while for a large value of $\delta$ most of the points will be classified into the information classes rather than the noise class (Dave, 1991; Rehm et al., 2007). Thus, the role of the $(c+1)$th class is to absorb the effect of outliers during classification.

Noise Clustering with Entropy (NCE)

The objective function for the NCE approach can be expressed by (Miyamoto et al., 2008):

$$J_{nce}(U, V) = \sum_{k=1}^{N} \sum_{i=1}^{c} \mu_{ki} D(x_k, v_i) + \sum_{k=1}^{N} \mu_{k,c+1}\, \delta + \nu \sum_{k=1}^{N} \sum_{i=1}^{c+1} \mu_{ki} \log \mu_{ki} \tag{27}$$

where $U$ is the $N \times (c+1)$ membership matrix, $V = (v_1, \ldots, v_c)$ and $\nu$ is the 'regularizing parameter'. The constraints imposed on the above objective function can be enumerated as:

$$U_f = \left\{ \mu = (\mu_{ki}) : \sum_{j=1}^{c+1} \mu_{kj} = 1,\ 1 \le k \le N;\ \ \mu_{ki} \in [0, 1],\ 1 \le k \le N,\ 1 \le i \le c+1 \right\} \tag{28}$$

From the objective function of NCE (Equation 27), the membership values of the information classes and of the noise class are given by Equations (29) and (30) respectively, while Equation (31) gives the mean value of the information classes.

$$\mu_{ki} = \frac{\exp\left( -\dfrac{D(x_k, v_i)}{\nu} \right)}{\sum_{j=1}^{c} \exp\left( -\dfrac{D(x_k, v_j)}{\nu} \right) + \exp\left( -\dfrac{\delta}{\nu} \right)}, \quad 1 \le i \le c \tag{29}$$

$$\mu_{k,c+1} = \frac{\exp\left( -\dfrac{\delta}{\nu} \right)}{\sum_{j=1}^{c} \exp\left( -\dfrac{D(x_k, v_j)}{\nu} \right) + \exp\left( -\dfrac{\delta}{\nu} \right)} \tag{30}$$

$$v_i = \frac{\sum_{k=1}^{N} \mu_{ki}\, x_k}{\sum_{k=1}^{N} \mu_{ki}}, \quad 1 \le i \le c \tag{31}$$
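Equations (29) and (30) can be sketched in Python as follows. The function name and example values are hypothetical; the common shift applied inside the exponentials is only a numerical safeguard and cancels in the ratios.

```python
import math


def nce_memberships(dist_row, delta, nu):
    """Equations (29) and (30): returns (class memberships, noise membership)."""
    shift = min(min(dist_row), delta)          # cancels in every ratio
    e = [math.exp(-(d - shift) / nu) for d in dist_row]
    e_noise = math.exp(-(delta - shift) / nu)
    total = sum(e) + e_noise
    return [x / total for x in e], e_noise / total


# Hypothetical distances of one pixel to two class centres.
mu_nce, noise_nce = nce_memberships([0.5, 3.0], delta=2.0, nu=1.0)
print([round(u, 4) for u in mu_nce], round(noise_nce, 4))
```

In this formulation the noise class behaves like one more class whose distance to every pixel is fixed at $\delta$.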

Fuzzy Set Theory Based Classifier's Application for a Single Information Class

In general, it is observed that while classifying satellite data, the analyst is often interested in identifying or extracting a single information class (Sengar et al., 2013; Upadhyay et al., 2013a; Upadhyay et al., 2013b). However, the analyst has to devote time and effort to extracting information of no interest along with the class of interest. In this study, it is proposed to use a fuzzy set theory based classifier such that only the class of interest is extracted and the rest is simply ignored or discarded. In this section, the behaviour of fuzzy set theory based classifiers for a single information class (or cluster) is presented. It may be noted that the uniqueness of this approach is that training data are provided for that class only.

Considering the membership function of FCM from Equation (11) for a single information class present in the dataset, $D(x_k, v_i) = D(x_k, v_j)$ and $\mu_{ki} = 1$, which means that the membership of all features will be equal to one. Therefore, all the pixels in a remote sensing image would belong to the single information class, which is not true. Thus, the FCM algorithm fails when a single information class is extracted from remote sensing data. The PCM algorithm, on the other hand, follows FCM only in the initial iterations. Thus, by considering Equation (20) with $\mu_{ki} = 1$, the bandwidth parameter ($\eta_i$) for a single information class is given by:

$$\eta_i = K \times \frac{\sum_{k=1}^{N} D(x_k, v_i)}{N} \tag{32}$$

Therefore, $\mu_{ki}$ for PCM can be calculated by Equation (19), and it may take an acceptable value, contrary to FCM. Considering the membership function of EFCM from Equation (14) for a single information class, $D(x_k, v_i) = D(x_k, v_j)$ and $\mu_{ki} = 1$. Thus, the EFCM algorithm, which is identical to the EPCM, is not suitable for single land cover identification. Further, for extraction of a single information class, the membership values of the NC classifier can be calculated by substituting $D(x_k, v_i) = D(x_k, v_j) \cong D(x_k, v_c)$ in Equations (24) and (25). Therefore, the membership values for the single information class in the case of the NC classifier can be obtained as:

$$\mu_{kc} = \left[ 1 + \left( \frac{D(x_k, v_c)}{\delta} \right)^{\frac{1}{m-1}} \right]^{-1} \tag{33}$$

and

$$\mu_{k,c+1} = \left[ 1 + \left( \frac{\delta}{D(x_k, v_c)} \right)^{\frac{1}{m-1}} \right]^{-1} \tag{34}$$

From Equations (33) and (34), it is clear that for a single information class, the membership values for both the good cluster and the noise cluster remain significant, with $\mu_{kc} = 1 - \mu_{k,c+1}$. Thus, the membership value of a noise point in a good cluster is not forced to one (Dave and Krishnapuram, 1997). Similarly, the membership values using the NCE classifier for a single information class can be expressed as follows:

$$\mu_{kc} = \frac{\exp\left( -\dfrac{D(x_k, v_c)}{\nu} \right)}{\exp\left( -\dfrac{D(x_k, v_c)}{\nu} \right) + \exp\left( -\dfrac{\delta}{\nu} \right)} \tag{35}$$

and

$$\mu_{k,c+1} = \frac{\exp\left( -\dfrac{\delta}{\nu} \right)}{\exp\left( -\dfrac{D(x_k, v_c)}{\nu} \right) + \exp\left( -\dfrac{\delta}{\nu} \right)} \tag{36}$$
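The single-class behaviour discussed in this section can be checked numerically. The sketch below evaluates Equation (11) with one class, Equations (19) and (32) for PCM, Equation (33) for NC and Equation (35) for NCE on a few hypothetical distances; the function names and parameter values are assumptions chosen only for illustration.

```python
import math


def fcm_single(d, m=2.0):
    """Equation (11) with c = 1: the sum has the single term (d / d), so mu = 1."""
    return 1.0 / ((d / d) ** (1.0 / (m - 1.0)))


def pcm_single(dists, m=2.0, K=1.0):
    """Equation (32) for eta (mean distance times K), then Equation (19)."""
    eta = K * sum(dists) / len(dists)
    return [1.0 / (1.0 + (d / eta) ** (1.0 / (m - 1.0))) for d in dists]


def nc_single(d, delta, m=2.0):
    """Equation (33): membership in the class of interest."""
    return 1.0 / (1.0 + (d / delta) ** (1.0 / (m - 1.0)))


def nce_single(d, delta, nu):
    """Equation (35): membership in the class of interest."""
    a = math.exp(-d / nu)
    b = math.exp(-delta / nu)
    return a / (a + b)


dists = [0.5, 1.0, 4.0, 50.0]      # hypothetical distances to the class centre
delta, nu = 2.0, 1.0               # assumed parameter values

pcm = pcm_single(dists)
for d, u_pcm in zip(dists, pcm):
    print(d, fcm_single(d), round(u_pcm, 4),
          round(nc_single(d, delta), 4), round(nce_single(d, delta, nu), 4))
```

FCM returns a membership of one for every pixel regardless of its distance, whereas the PCM, NC and NCE memberships fall as the distance to the class centre grows.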

where $\mu_{kc} = 1 - \mu_{k,c+1}$. Thus, it can be concluded that amongst all the fuzzy set theory based classifiers, only a few are capable of single land cover classification. The utility of the different fuzzy based classifiers for mixed pixels and for single land cover identification is shown in Table 1.

TABLE 1 LIST OF FUZZY SET THEORY BASED CLASSIFIERS FOR SINGLE LAND COVER IDENTIFICATION.

Fuzzy Set Theory Based Classifier   Mixed Pixel   Single Class Identification
FCM                                 Yes           No
EFCM                                Yes           No
PCM                                 Yes           Yes
EPCM                                Yes           No
NC                                  Yes           Yes
NCE                                 Yes           Yes

Assessment of Accuracy

In general, the land cover identified from satellite data through remote sensing techniques is represented by a thematic map. How accurately a class is shown in a thematic map depends upon the accuracy achieved during the classification of the remote sensing data. Many efforts have been made to check the quality of thematic mapping using the Error Matrix or Confusion Matrix (Congalton et al., 1983; Congalton, 1991; Foody, 2002). Congalton et al. (1983) used Cohen's Kappa statistic for the assessment of classification output from remote sensing data. The Kappa coefficient makes some compensation for chance agreement (Foody, 2002). Although the Error Matrix and the Kappa measure have proven to be accurate measures of classification accuracy, they have their own limitations when working with coarse spatial resolution satellite images. Foody (2002) raised the question of accuracy assessment for large areas, especially at regional or global scale, where a coarse spatial resolution remote sensing image may contain a large number of mixed pixels. Therefore, the accuracy assessment techniques need to be improved for work with coarse resolution satellite images. In addition, Foody (2008) has raised questions regarding the widely accepted target accuracy of 85% given by Anderson (1976). According to Foody (2008), the 85% target accuracy is acceptable for broad land cover classes at a small scale, and the target suggested by Anderson (1976) for his mapping application is not universally applicable. For the assessment of accuracy of soft classified output, there is no standard method available, unlike for hard classifiers, where assessment methods such as the Error Matrix and the Kappa coefficient are available.
When the reference and output data are fraction images or membership grades, the data have to be hardened in order to use the conventional error matrix for assessment of accuracy. The hardening of soft classified data leads to loss of information (Binaghi et al., 1999). Various methods are available to evaluate the accuracy of soft classification, such as Entropy (Maselli et al., 1994), Cross Entropy (Foody, 1995), Euclidean and L1 distance (Foody and Arora, 1996), and Correlation Coefficients (Maselli et al., 1996). All these methods are treated as indirect methods of accuracy assessment, since the evaluation of accuracy is interpretative rather than a representation of an actual value as given by the traditional error matrix (Ibrahim, 2004).

Fuzzy ERror Matrix (FERM)

Binaghi et al. (1999) proposed the Fuzzy ERror Matrix (FERM) for the assessment of soft classified data. The FERM is based on fuzzy set theory and is a generalization of the traditional confusion matrix. For hard classified data and hard reference data, it reduces to the traditional confusion matrix. FERM takes fraction (soft classified) images as input instead of traditional hard classified images. It accounts for the diagonalization characteristic (a perfect matching case, agreement up to pixel level) of the sub-pixel confusion matrix. The derived indices of FERM (Producer's, User's and Overall accuracy) are based on the diagonal elements and the total grades of the soft reference and soft classified datasets. These derived indices do not depend on the off-diagonal entries of the FERM. The layout of the fuzzy error matrix (Table 2) is similar to that of the traditional matrix, except that the elements of a FERM can be any non-negative real numbers instead of non-negative integers. The rows of the FERM generally define the soft classified data and the columns define the soft reference data. The elements of the FERM represent the class proportions corresponding to the reference data (i.e. soft reference data) and the classified outputs (i.e. soft classified image) respectively. The fuzzy minimum operator (MIN) (Table 4) is used to construct the FERM and to determine the matrix elements, in which the degree of membership in the fuzzy intersection between the classified and reference partitions $C_i \cap R_j$ is computed as in Equation (37) (Binaghi et al., 1999; Stehman et al., 2007):

$$M(i, j) = \lvert C_i \cap R_j \rvert = \sum_{x_k \in X} \min(s_{ki}, r_{kj}) \tag{37}$$

TABLE 2 LAYOUT OF A FUZZY ERROR MATRIX

                        Soft Reference Data
Soft Classification     Class 1   Class 2   ...   Class c   Total Grades
Class 1                 M(1,1)    M(1,2)    ...   M(1,c)    C1
Class 2                 M(2,1)    M(2,2)    ...   M(2,c)    C2
...                     ...       ...       ...   ...       ...
Class c                 M(c,1)    M(c,2)    ...   M(c,c)    Cc
Total Grades            R1        R2        ...   Rc

Definition of terms: M(i,j) is the member of the FERM in the i-th class of the soft classified output and the j-th class of the soft reference data, Ci is the sum of class proportions of class i in the classified output and Rj is the sum of class proportions of class j in the reference data.
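The construction of the FERM from Equation (37) and the layout of Table 2 can be sketched in a few lines of Python. The index formulas used here (user's accuracy as M(i,i)/Ci, producer's accuracy as M(j,j)/Rj and overall accuracy as the diagonal sum over the total reference grades) are stated as assumptions consistent with the description above, and the function names and fraction values are made up for illustration.

```python
def fuzzy_error_matrix(classified, reference):
    """Equation (37): classified[k][i] = s_ki, reference[k][j] = r_kj."""
    c = len(classified[0])
    M = [[0.0] * c for _ in range(c)]
    for s, r in zip(classified, reference):
        for i in range(c):
            for j in range(c):
                M[i][j] += min(s[i], r[j])     # fuzzy MIN operator
    return M


def ferm_accuracies(M, classified, reference):
    """Assumed index definitions; totals C_i and R_j must be non-zero."""
    c = len(M)
    C = [sum(s[i] for s in classified) for i in range(c)]   # classified total grades
    R = [sum(r[j] for r in reference) for j in range(c)]    # reference total grades
    users = [M[i][i] / C[i] for i in range(c)]
    producers = [M[j][j] / R[j] for j in range(c)]
    overall = sum(M[i][i] for i in range(c)) / sum(R)
    return users, producers, overall


# Two hypothetical pixels with two-class fractions.
classified = [[0.7, 0.3], [0.2, 0.8]]
reference = [[0.6, 0.4], [0.1, 0.9]]

M = fuzzy_error_matrix(classified, reference)
users, producers, overall = ferm_accuracies(M, classified, reference)
print(M, round(overall, 4))
```

Only the diagonal of `M` and the total grades enter the three indices, in line with the remark that the derived indices do not depend on the off-diagonal entries.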

Composite Operator Based FERM

Although FERM is one of the most basic approaches for assessing the accuracy of soft classification, it is not used as a standard accuracy measure. The reason is that the cross comparison in FERM is not consistent with the traditional confusion matrix. For the cross comparison to be consistent, it must yield a diagonal matrix when a map is compared to itself, and its marginal totals must match the totals of membership grades from the reference and assessed data. However, the composite operator based accuracy measures proposed by Pontius and Cheuk (2006) can be used for the computation of the cross comparison matrix. For these composite operators, certain fundamental properties of agreement and disagreement have been established so that meaningful matrix entries can be made (Silván-Cárdenas and Wang, 2008). The agreement and disagreement measure for the composite operator is given by:

$$C(s_{ki}, r_{kj}) = \begin{cases} A(s_{ki}, r_{kj}), & \text{if } i = j \\ D(s'_{ki}, r'_{kj}), & \text{if } i \ne j \end{cases} \tag{38}$$

$$s'_{ki} = s_{ki} - \min(s_{ki}, r_{ki}) \tag{39}$$

$$r'_{kj} = r_{kj} - \min(s_{kj}, r_{kj}) \tag{40}$$

where $A$ and $D$ denote the agreement and disagreement operators respectively, $s_{ki}$ and $r_{kj}$ denote the classified and reference grades of classes $i$ and $j$ respectively at pixel $k$, and $s'_{ki}$ and $r'_{kj}$ denote the over- and underestimation errors at pixel $k$. The operators $A$ and $D$ satisfy the basic properties for agreement and disagreement measures listed in Table 3. In this study, the MIN-MIN, MIN-PROD and MIN-LEAST composite operators (Pontius and Cheuk, 2006; Silván-Cárdenas and Wang, 2008) are used for the assessment of soft classified outputs. These composite operators are derived from three basic operators, viz. the minimum operator (MIN), the product operator (PROD) and the LEAST operator. The MIN operator is a fuzzy set intersection operator and measures the maximum sub-pixel class overlap, the PROD operator measures the expected sub-pixel class overlap, and the LEAST operator measures the minimum possible sub-pixel class overlap between the classified and reference sub-pixel partitions. Expressions for the basic and composite operators are given in Tables 4 and 5 respectively.

TABLE 3 BASIC PROPERTIES FOR AGREEMENT AND DISAGREEMENT MEASURES

(Silván-Cárdenas and Wang, 2008).

Property        Definition                        Agreement   Disagreement
Commutativity   C(s, r) = C(r, s)                 Yes
Positivity      s > 0 ∨ r > 0 ⇒ C(s, r) > 0       Yes
Nullity         s = 0 ∧ r = 0 ⇒ C(s, r) = 0       Yes
Upper Bound     C(s, r) ≤ C(r, r)                 Yes
Homogeneity     C(as, ar) = aC(r, s)              Yes

C(s, r) denotes a comparison (agreement or disagreement) measure between grades s and r, and a is a positive number.

TABLE 4 THREE BASIC OPERATORS (Silván-Cárdenas and Wang, 2008).

Operator ID   Form                                Traditional Interpretation   Sub-pixel Interpretation
MIN           $\min(s_{ki}, r_{kj})$              Fuzzy Set Intersection       Maximum Overlap
PROD          $s_{ki} \times r_{kj}$              Joint Probability            Expected Overlap
LEAST         $\max(s_{ki} + r_{kj} - 1,\ 0)$                                  Minimum Overlap

TABLE 5 THREE COMPOSITE OPERATORS (Silván-Cárdenas and Wang, 2008)

Operator ID   Agreement ($i = j$)        Disagreement ($i \ne j$)                                        Sub-pixel confusion
MIN-MIN       $\min(s_{ki}, r_{ki})$     $\min(s'_{ki}, r'_{kj})$                                        Constrained maximum
MIN-PROD      $\min(s_{ki}, r_{ki})$     $s'_{ki} \times r'_{kj} \,/\, \sum_{i=1}^{c} r'_{ki}$           Constrained expected
MIN-LEAST     $\min(s_{ki}, r_{ki})$     $\max(s'_{ki} + r'_{kj} - \sum_{i=1}^{c} r'_{ki},\ 0)$          Constrained minimum
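As a hedged illustration of Equations (38) to (40) and the operator forms in Table 5, the following Python sketch computes the contribution of one pixel to a cross comparison matrix entry. The function name, the `kind` argument and the fraction values are hypothetical, and the convention of returning zero when there is no underestimation at the pixel is an assumption made to avoid division by zero.

```python
def composite_entry(s, r, i, j, kind):
    """Per-pixel entry C(s_ki, r_kj) for kind in {"MIN-MIN", "MIN-PROD", "MIN-LEAST"}."""
    if i == j:
        return min(s[i], r[i])                               # agreement: MIN operator
    c = len(s)
    s_over = [s[a] - min(s[a], r[a]) for a in range(c)]      # Equation (39)
    r_under = [r[a] - min(s[a], r[a]) for a in range(c)]     # Equation (40)
    total_under = sum(r_under)
    if kind == "MIN-MIN":
        return min(s_over[i], r_under[j])
    if total_under == 0.0:
        return 0.0                                           # assumed convention: nothing left to allocate
    if kind == "MIN-PROD":
        return s_over[i] * r_under[j] / total_under
    if kind == "MIN-LEAST":
        return max(s_over[i] + r_under[j] - total_under, 0.0)
    raise ValueError("unknown composite operator: " + kind)


# One hypothetical pixel with four classes: classes 0 and 1 are overestimated,
# classes 2 and 3 are underestimated.
s = [0.6, 0.4, 0.0, 0.0]    # classified grades
r = [0.2, 0.2, 0.3, 0.3]    # reference grades

values = {k: composite_entry(s, r, 0, 2, k)
          for k in ("MIN-MIN", "MIN-PROD", "MIN-LEAST")}
print({k: round(v, 4) for k, v in values.items()})
```

For this pixel the three disagreement values are ordered from the constrained maximum (MIN-MIN) through the constrained expected (MIN-PROD) to the constrained minimum (MIN-LEAST), matching the sub-pixel confusion column of Table 5.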

Since the MIN operator satisfies all the basic properties listed in Table 3, all composite operators use the MIN operator for the agreement case. The MIN-PROD operator uses MIN for the diagonal (agreement) cells and a normalized PROD for the off-diagonal (disagreement) cells, thus combining the fuzzy set view with a probabilistic view (Silván-Cárdenas and Wang, 2008). The MIN-MIN operator uses MIN for both the agreement and the disagreement case. Similarly, the MIN-LEAST operator uses MIN for the diagonal cells and a normalized LEAST for the off-diagonal cells. The solutions for the different sub-pixel class overlaps given by the aforementioned composite operators are constrained to the unmatched sub-pixel fractions.

Sub-pixel Confusion Uncertainty Matrix (SCM)

Although the MIN operator has been observed to be an appropriate candidate for measuring agreement in the sub-pixel confusion matrix, it fails when the measure of disagreement is considered. This can be solved by using composite operator based measures. However, the disagreement among off-diagonal elements produces uncertainty in the sub-pixel distribution, leading to an underspecified problem termed the sub-pixel area allocation problem. To account for this problem, Silván-Cárdenas and Wang (2008) proposed a cross comparison matrix known as the Sub-pixel Confusion Uncertainty Matrix (SCM). It uses confusion intervals, expressed as centre value ± maximum error, to account for this uncertainty. These confusion intervals express the possible confusion among classes and are formed by the MIN-MIN and MIN-LEAST composite operators. For a unique solution of the area allocation problem, these confusion intervals should be tight.

TABLE 6 GENERAL STRUCTURE OF SCM

(Silván-Cárdenas and Wang, 2008).

                        Soft Reference Data
Soft Classification     Class 1    Class 2    ...   Class c    Total Grades
Class 1                 P11        P12±U12    ...   P1c±U1c    P1+±U1+
Class 2                 P21±U21    P22        ...   P2c±U2c    P2+±U2+
...                     ...        ...        ...   ...        ...
Class c                 Pc1±Uc1    Pc2±Uc2    ...   Pcc        Pc+±Uc+
Total Grades            P+1±U+1    P+2±U+2    ...   P+c±U+c    P++±U++

Silván-Cárdenas and Wang (2008) represented the confusion interval in the form $P_{ij} \pm U_{ij}$, where $P_{ij}$ represents the centre value of the interval and $U_{ij}$ the interval half-width (Table 6). These values are computed as:

$$P_{ij} = \frac{P_{ij}^{MIN-MIN} + P_{ij}^{MIN-LEAST}}{2}, \qquad U_{ij} = \frac{P_{ij}^{MIN-MIN} - P_{ij}^{MIN-LEAST}}{2} \tag{41}$$

where $i, j = 1, 2, \ldots, c$.

Amongst the aforementioned error matrix based accuracy measures, only FERM is applicable for assessing the accuracy of PCM classification in the case of single land cover identification (Table 7). The composite operators (MIN-MIN, MIN-PROD and MIN-LEAST) and SCM cannot be used, because they all account for the diagonal as well as the off-diagonal values of the error matrix. Further, in the case of single land cover identification through the NC and NCE methods, all the assessment methods mentioned above are applicable (Table 7). The reason is the provision of a separate noise class in both the NC and NCE soft classified outputs.

TABLE 7 ASSESSMENT OF ACCURACY METHODS FOR FUZZY SET THEORY BASED CLASSIFIERS FOR SINGLE LAND COVER IDENTIFICATION.

| Assessment of Accuracy Methods | PCM | NC | NCE |
|---|---|---|---|
| FERM | ✓ | ✓ | ✓ |
| Composite Operator based FERM | ✗ | ✓ | ✓ |
| SCM | ✗ | ✓ | ✓ |
| ROC | ✓ | ✓ | ✓ |
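The confusion intervals of Eq. (41) can be illustrated for a single pixel with a minimal Python sketch. The off-diagonal formulas used here for MIN-MIN and MIN-LEAST are not restated in this section; they are written as recalled from Silván-Cárdenas and Wang (2008) and should be read as assumptions, as should the function names and the sample grades, which are purely illustrative. The sketch also assumes that the grades of a pixel sum to one, which holds for NC and NCE outputs with their noise class but not for PCM.

```python
def composite_confusion(s, r, disagreement):
    """Per-pixel confusion matrix for one composite operator.

    s, r: classified and reference grades of one pixel (each assumed to sum to 1).
    disagreement: "MIN" gives MIN-MIN, "LEAST" gives MIN-LEAST (assumed forms).
    Rows follow the soft classification, columns the soft reference data.
    """
    c = len(s)
    agree = [min(s[k], r[k]) for k in range(c)]    # MIN operator on the diagonal
    s_left = [s[k] - agree[k] for k in range(c)]   # unmatched classified fraction
    r_left = [r[k] - agree[k] for k in range(c)]   # unmatched reference fraction
    total_left = sum(r_left)
    p = [[0.0] * c for _ in range(c)]
    for k in range(c):
        for l in range(c):
            if k == l:
                p[k][l] = agree[k]
            elif disagreement == "MIN":
                p[k][l] = min(s_left[k], r_left[l])
            elif disagreement == "LEAST":
                p[k][l] = max(s_left[k] + r_left[l] - total_left, 0.0)
            else:
                raise ValueError("disagreement must be 'MIN' or 'LEAST'")
    return p


def scm_interval(p_min_min, p_min_least):
    """Centre P and half-width U of the confusion intervals, following Eq. (41)."""
    c = len(p_min_min)
    centre = [[(p_min_min[i][j] + p_min_least[i][j]) / 2.0 for j in range(c)]
              for i in range(c)]
    half_width = [[(p_min_min[i][j] - p_min_least[i][j]) / 2.0 for j in range(c)]
                  for i in range(c)]
    return centre, half_width


# Hypothetical pixel: classified as classes 1 and 2, reference says classes 3 and 4.
s = [0.5, 0.5, 0.0, 0.0]
r = [0.0, 0.0, 0.5, 0.5]
centre, half_width = scm_interval(composite_confusion(s, r, "MIN"),
                                  composite_confusion(s, r, "LEAST"))
print(centre[0][2], half_width[0][2])  # prints 0.25 0.25
```

For this pixel the interval for the (Class 1, Class 3) cell is 0.25 ± 0.25, i.e. the allocation is not unique. In practice the per-pixel matrices would presumably be accumulated over all test pixels before Eq. (41) is applied.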

Receiver Operating Characteristic (ROC) for the Assessment of Accuracy of Single Land Cover Identification

The Receiver Operating Characteristic (ROC), which is based on the Neyman-Pearson detection theory, is used for the evaluation of detection performance in signal processing, communication and medical diagnosis (Chang et al., 2001; Wang et al., 2005; Miyamoto et al., 2008; Chang, 2010). The ROC curve illustrates the performance of a binary classifier system. The detection performance is measured by the area under the ROC curve. This area is denoted by Az and is bounded between ½ and 1; the closer it is to 1, the better the detection (Wang et al., 2005). The 2-D ROC curve is plotted with the False Alarm Rate (FAR) on the x-axis and the True Positive (TP) rate on the y-axis. The 3-D ROC curve, on the other hand, is plotted with the FAR on the x-axis, the detection threshold (τ) on the y-axis and the TP rate on the z-axis. The 2-D ROC can be used for the hard decisions produced by a classifier, whereas the 3-D ROC is used for soft decisions (Wang et al., 2005). The TP rate and the FAR are defined as follows:

$$\mathrm{TP} = \frac{\text{Total number of target pixels detected as target}}{\text{Total number of target pixels present in the sample}} \qquad (42)$$

$$\mathrm{FAR} = \frac{\text{Total number of background pixels detected as target}}{\text{Total number of background pixels present in the sample}} \qquad (43)$$
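Eqs. (42) and (43) can be sketched in Python for a soft output as follows. This is a minimal illustration under stated assumptions, not the procedure of the cited authors: a pixel is taken as "detected as target" when its membership value is at least the threshold τ, Az is approximated with the trapezoidal rule, and the function names and sample values are hypothetical.

```python
def roc_points(memberships, is_target, thresholds):
    """Return one (FAR, tau, TP) triple per threshold, i.e. points of a 3-D ROC."""
    n_target = sum(1 for t in is_target if t)
    n_background = len(is_target) - n_target
    points = []
    for tau in thresholds:
        # Assumed decision rule: membership >= tau means "detected as target".
        detected = [m >= tau for m in memberships]
        tp = sum(1 for d, t in zip(detected, is_target) if d and t) / n_target
        far = sum(1 for d, t in zip(detected, is_target) if d and not t) / n_background
        points.append((far, tau, tp))
    return points


def area_under_roc(points):
    """Approximate Az by the trapezoidal rule on the (FAR, TP) projection."""
    curve = sorted((far, tp) for far, _tau, tp in points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical sample: three target pixels followed by three background pixels.
memberships = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
is_target = [True, True, True, False, False, False]
points = roc_points(memberships, is_target, [0.0, 0.25, 0.5, 0.75, 1.0])
print(round(area_under_roc(points), 4))  # prints 0.8889
```

Keeping τ in each triple gives the 3-D ROC points, while dropping it gives the 2-D curve. The thresholds should span the full membership range so that the curve runs from (0, 0) to (1, 1).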

In the case of single land cover identification, the single class to be identified is the target class, whereas the remaining classes are together considered as the background class. This method is applicable for assessing the accuracy of all the fuzzy set theory based classification methods for single land cover identification (Table 7).

Conclusion

In this study, various fuzzy set theory based soft classification methods have been reviewed. Amongst these, only PCM, NC and NCE have been found suitable for the identification of a single land cover. Further, amongst the various assessment of accuracy methods, only FERM and ROC are applicable for assessing single land cover identification with the PCM classifier, whereas for the NC and NCE classifiers, FERM, composite operator based FERM, SCM and ROC are all suitable for assessing the identification of a single land cover from remote sensing imagery.

REFERENCES

[1] Anderson, J.R., Hardy, E.E., Roach, J.T. and Witmer, R.E. “A land use and land cover classification system for use with remote sensor data.” U.S. Geological Survey Professional Paper 964, 1976.

[2] Bezdek, J.C. “Pattern recognition with fuzzy objective function algorithms.” Plenum, New York, USA, 1981.

[3] Bezdek, J.C., Ehrlich, R. and Full, W. “FCM: The Fuzzy C-Means Clustering algorithm.” Computers & Geosciences 10(1984): 191-203.

[4] Binaghi, E., Brivio, P.A., Chessi, P. and Rampini, A. “A fuzzy set based accuracy assessment of soft classification.” Pattern Recognition Letters 20(1999): 935-948.

[5] Boyd, D.S., Sanchez-Hernandez, C. and Foody, G.M. “Mapping a specific class for priority habitats monitoring from satellite sensor data.” International Journal of Remote Sensing 27(2006): 2631-2644.

[6] Bruzzone, L., Cossu, R. and Vernazza, G. “Combining parametric and nonparametric algorithms for a partially unsupervised classification of multitemporal remote sensing images.” Information Fusion 3(2002): 289-297.

[7] Chang, C.I. “Multi parameter receiver operating characteristic analysis for signal detection and classification.” IEEE Sensors Journal 10(2010): 423-442.

[8] Chang, C.I., Ren, H., Chiang, S.S. and Ifarraguerri, A. “An ROC analysis for subpixel detection.” IEEE International Geoscience and Remote Sensing Symp., Australia, July 24-28, 2001.

[9] Chattopadhyay, S., Pratihar, D.K. and Sarkar, S.C.D. “A comparative study of fuzzy c-means algorithm and entropy-based fuzzy clustering algorithms.” Computing and Informatics 30(2011): 701-720.

[10] Cihlar, J. “Land cover mapping of large areas from satellites: status and research priorities.” International Journal of Remote Sensing 21(2000): 379-387.

[11] Congalton, R.G. “A Review of Assessing the Accuracy of Classifications of Remotely Sensed Data.” Remote Sensing of Environment 37(1991): 35-46.

[12] Congalton, R.G., Oderwald, R.G. and Mead, R.A. “Assessing Landsat classification accuracy using discrete multivariate statistical techniques.” Photogrammetric Engineering & Remote Sensing 49(1983): 1671-1678.

[13] Dave, R.N. “Characterization and detection of noise in Clustering.” Pattern Recognition Letters 12(1991): 657-664.

[14] Dave, R.N. and Krishnapuram, R. “Robust clustering methods: a unified view.” IEEE Transactions on Fuzzy Systems 5(1997): 270-293.

[15] Dunn, J.C. “A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters.” Cybernetics and Systems 3(1973): 32-57.

[16] Eastman, J.R. and Laney, R.M. “Bayesian soft classification for subpixel analysis: a critical evaluation.” Photogrammetric Engineering & Remote Sensing 68(2002): 1149-1154.

[17] Foody, G.M. “Cross entropy for the evaluation of the accuracy of a fuzzy land cover classification with fuzzy ground data.” ISPRS Journal of Photogrammetry and Remote Sensing 50(1995): 2-12.

[18] Foody, G.M. “Estimation of sub-pixel land cover composition in the presence of untrained classes.” Computers & Geosciences 26(2000): 469-478.

[19] Foody, G.M. “Status of land cover classification accuracy assessment.” Remote Sensing of Environment 80(2002): 185-201.

[20] Foody, G.M. “Harshness in image classification accuracy assessment.” International Journal of Remote Sensing 29(2008): 3137-3158.

[21] Foody, G.M., Mathur, A., Sanchez-Hernandez, C. and Boyd, D.S. “Training set size requirements for the classification of a specific class.” Remote Sensing of Environment 104(2006): 1-14.

[22] Foody, G.M. and Arora, M.K. “Incorporation of mixed pixel in training, allocation and testing stage of supervised classification.” Pattern Recognition Letters 17(1996): 1389-1398.

[23] Ghosh, A., Mishra, N.S. and Ghosh, S. “Fuzzy clustering algorithms for unsupervised change detection in remote sensing images.” Information Sciences 181(2011): 699-715.

[24] Guerschman, J.P., Paruelo, J.M., Di Bella, C., Giallorenzi, M.C. and Pacin, F. “Land cover classification in the Argentine Pampas using multitemporal machines for land cover classification.” International Journal of Remote Sensing 23(2003): 725-749.

[25] Ibrahim, M.A. “Evaluation of Soft Classification for Remote Sensing Data.” PhD thesis, Indian Institute of Technology, Roorkee, India, 2004.

[26] Ibrahim, M.A., Arora, M.K. and Ghosh, S.K. “Estimating and accommodating uncertainty through the soft classification of remote sensing data.” International Journal of Remote Sensing 26(2005): 2995-3007.

[27] Krishnapuram, R. and Keller, J.M. “A possibilistic approach of clustering.” IEEE Transactions on Fuzzy Systems 1(1993): 429-437.


[28] Krishnapuram, R. and Keller, J.M. “The Possibilistic C-Means Algorithm: Insights and Recommendations.” IEEE Transactions on Fuzzy Systems 4(1996): 385-393.

[29] Kumar, A., Ghosh, S.K. and Dadhwal, V.K. “Sub-pixel land cover mapping: SMIC System.” ISPRS International Symposium on Geospatial Databases for Sustainable Development, Goa, India, September 27-30, 2006.

[30] Li, W. and Guo, Q. “A maximum entropy approach to one class classification of remote sensing imagery.” International Journal of Remote Sensing 31(2010): 2227-2235.

[31] Li, W., Guo, Q. and Elkan, C. “A Positive and Unlabeled Learning Algorithm for One-Class Classification of Remote-Sensing Data.” IEEE Transactions on Geoscience and Remote Sensing 49(2011): 717-725.

[32] Li, Z. “Fuzzy ARTMAP based neuro-computational spatial uncertainty measures.” Photogrammetric Engineering & Remote Sensing 74(2008): 1573-1584.

[33] Li, Z. and Eastman, J.R. “The nature of and classification of unlabelled neurons in the use of Kohonen’s Self-Organizing Map for supervised classification.” Transactions in GIS 10(2006): 599-613.

[34] Lu, D., Mausel, P., Batistella, M. and Moran, E. “Comparison of land cover classification methods in the Brazilian Amazon basin.” Photogrammetric Engineering & Remote Sensing 70(2004): 723-732.

[35] Maselli, F., Conese, C. and Petkov, L. “Use of probability entropy for the estimation and graphical representation of the accuracy of maximum likelihood classifications.” ISPRS Journal of Photogrammetry and Remote Sensing 49(1994): 13-20.

[36] Maselli, F., Rodulf, A. and Conese, C. “Fuzzy classification of spatially degraded thematic mapper for the estimation of subpixel components.” International Journal of Remote Sensing 17(1996): 537-551.

[37] Miyamoto, S., Ichihashi, H. and Honda, K. “Algorithms for fuzzy clustering, studies in fuzziness and soft computing.” Springer, 2008.

[38] Ohashi, Y. “Fuzzy clustering and robust estimation.” 9th Meet. SAS Users Grp. Int., Hollywood Beach, FL, 1984.

[39] Pontius Jr, R.G. and Cheuk, M.L. “A generalized cross-tabulation matrix to compare soft-classified maps at multiple resolutions.” International Journal of Geographical Information Science 20(2006): 1-30.

[40] Rehm, F., Klawonn, F. and Kruse, R. “A novel approach to noise clustering for outlier detection.” Soft Computing 11(2007): 489-494.

[41] Sanjeevi, S. and Barnsley, M.J. “Spectral unmixing of Compact Airborne Spectrographic Imager (CASI) data for quantifying sub-pixel proportions of parameters in a coastal dune system.” Journal of the Indian Society of Remote Sensing 28(2000): 187-204.

[42] Sanchez-Hernandez, C., Boyd, D.S. and Foody, G.M. “One-Class Classification for Mapping a Specific Land-Cover Class: SVDD Classification of Fenland.” IEEE Transactions on Geoscience and Remote Sensing 45(2007): 1061-1073.

[43] Sengar, S.S., Kumar, A., Ghosh, S.K. and Wason, H.R. “Liquefaction Identification using IRS-1D Temporal Indices Data.” Journal of Indian Society of Remote Sensing 41(2013): 355-363.

[44] Shalan, M.A., Arora, M.K. and Ghosh, S.K. “An evaluation of fuzzy classification from IRS 1C LISS III data.” International Journal of Remote Sensing 24(2003): 3179-3186.

[45] Shannon, C.E. “A mathematical theory of communication.” AT&T Tech J 27(1948): 379-423.

[46] Silván-Cárdenas, J.L. and Wang, L. “Sub-pixel confusion–uncertainty matrix for assessing soft classifications.” Remote Sensing of Environment 112(2008): 1081-1095.

[47] Stehman, S.V., Arora, M.K., Kasetkasem, T. and Varshney, P.K. “Estimation of fuzzy error matrix accuracy measures under stratified random sampling.” Photogrammetric Engineering & Remote Sensing 73(2007): 165-173.

[48] Upadhyay, P., Kumar, A., Roy, P.S., Ghosh, S.K. and Gilbert, I. “Effect on specific crop mapping using WorldView-2 multispectral add-on bands: soft classification approach.” Journal of Applied Remote Sensing 6(2012): 063524.

[49] Upadhyay, P., Ghosh, S.K. and Kumar, A. “Moist Deciduous Forest Identification using Temporal MODIS Data - a comparative study using fuzzy based classifiers.” Ecological Informatics 18(2013a): 117-130.


[50] Upadhyay, P., Kumar, A. and Ghosh, S.K. “Fuzzy Based Approach for Moist Deciduous Forest Identification using MODIS Temporal Data.” Journal of Indian Society of Remote Sensing 41(2013b): 777-786.

[51] Vapnik, V. “The Nature of Statistical Learning Theory.” Springer-Verlag, New York, 1995.

[52] Wang, J., Chang, C.I., Yang, S.C., Hsu, G.C., Hsu, H.H., Chung et al. “3D ROC analysis for medical diagnosis evaluation.” Proceedings of 27th Annual International Conference, IEEE Engineering Medical Biological Society (EMBS), Shanghai, China (2005): 7545-7548.

[53] Yao, J., Dash, M., Tan, S.T. and Liu, H. “Entropy-Based Fuzzy Clustering and Fuzzy Modeling.” Fuzzy Sets and Systems 113(2000): 381-388.

Authors’ Biographies

Priyadarshi Upadhyay is a Ph.D. student at the Department of Civil Engineering, Indian Institute of Technology Roorkee, Roorkee, India. He received his M.Tech. degree in Remote Sensing from B.I.T. Mesra, Ranchi, India in 2007 and his M.Sc. degree in Physics from Kumaon University, Nainital, India in 2004. He worked with DTRL (Ministry of Defence), Delhi, India during 2007-2008. His research interests include soft computing, time series image analysis, microwave remote sensing and surveying. He has published more than 10 research papers in the area of remote sensing, both in peer-reviewed international journals and in international conferences.

Dr. Sanjay Kumar Ghosh is Professor at the Department of Civil Engineering, Indian Institute of Technology, Roorkee – 247667, Uttarakhand, India. He received his B.E. (Civil) and M.E. (Civil) degrees, both from the University of Roorkee (now IIT Roorkee), in 1980 and 1982 respectively. He was awarded his Ph.D. degree by the University of Strathclyde, Glasgow in 1991, under the Commonwealth Fellowship program of the British Council, U.K. He has published more than 100 papers in various journals and conferences and has guided 12 Ph.D. theses, with 8 more in progress under his guidance. He has also guided 66 M.Tech. theses, with 2 more in progress. His current interests are in the areas of remote sensing, image processing and GIS applications.

Dr. Anil Kumar is Scientist/Engineer ‘SF’ at the Indian Institute of Remote Sensing, Indian Space Research Organization, Dehradun, India, where he has worked since 1998. He received his B.Tech. degree in Civil Engineering from the University of Lucknow, India, and his Master of Engineering degree in Photogrammetry and Remote Sensing, India, and Ph.D. in Soft Computing from Indian Institute of Technology Roorkee. He has published more than 30 papers in various journals and conferences and has guided 1 Ph.D. thesis, with 4 more in progress under his guidance. His research interests include soft computing, digital photogrammetry, GPS and LiDAR.