Detection of Global Salient Region via High Dimensional Color Transform and Local Spatial Support

T. Jeyapriya (1), G. Rajasekaran (2)
(1) P.G. Scholar, (2) Assistant Professor (Sr. Grade)
(1,2) Department of Information Technology, Mepco Schlenk Engineering College, Sivakasi 626005, India

Abstract

This paper proposes a novel automatic salient region detection method for images that uses both global and local features. The main motivation behind this approach is to construct a saliency map by using a linear combination of colors in a high-dimensional color space. Human perception is highly complicated and nonlinear, and salient regions generally consist of colors that are distinct from the background. An optimal linear combination for the saliency map is estimated by mapping the low-dimensional colors to high-dimensional feature vectors. Furthermore, the relative location and color contrast between superpixels are used to improve the performance. The method was tested on three distinct datasets to evaluate its applicability and practicability.

Keywords: salient region detection, superpixel, trimap, random forest, color feature, high-dimensional color transform

I. INTRODUCTION

Salient region detection aims to detect the important region of an image in terms of a saliency map. Many methods have been applied to this problem in previous studies, and color is a very important visual cue in salient region detection techniques. Salient region detection is also used in applications such as segmentation [20] and object recognition [21].

This work applies a novel approach. A tree-based classifier first estimates the location of the salient region by classifying each superpixel as background, foreground, or unknown; these regions form the initial trimap. From the trimap, two methods are proposed: a global method based on the high-dimensional color transform (HDCT) and a local learning-based method. The HDCT-based method separates the foreground and background regions using color features: it joins many representative color spaces and maps the low-dimensional color space into a high-dimensional color feature space. The local learning-based method applies a random forest [50] that uses the relative location and color contrast between superpixels. The random forest estimates the saliency of a superpixel by comparing its distance and color contrast to the K-nearest foreground superpixels and the K-nearest background superpixels. The saliency maps from the HDCT-based method and the local learning-based method are then joined by a weighted combination.

The key contributions of this work are summarized as follows:
• An HDCT-based method that estimates a linear combination of high-dimensional color coefficients to separate the foreground and background regions.
• A learning-based method that considers the local spatial relations and color contrast between superpixels.
• The proposed method can improve the performance of other salient region detection methods by using their results as the initial saliency trimap.

II. RELATED WORK

A survey and a benchmark comparison of state-of-the-art salient region detection algorithms are available in [3] and [4], respectively. Local-contrast-based models recognize salient regions by detecting the rarity of image features in a small local region. Itti et al. [5] proposed a saliency detection method that uses visual filters called “centre-surround differences” to compute local color contrast. Harel et al. [6] proposed a graph-based visual saliency (GBVS) model, which is based on a Markovian approach on an activation map. This model explores the variance of centre-surround feature histograms. Many methods determine saliency at the superpixel level instead of the pixel level because this reduces the computation time. The method in [34] decomposes an image into compact and perceptually homogeneous elements and then considers the uniqueness and spatial distribution of these elements in the CIELab color space to detect salient regions. These models tend to detect only part of the object, and they tend to give non-uniform weights to the same salient object when different features are present in it.

Fig. 1: Overview of our algorithm: (a) Input image. (b) Over-segmentation into superpixels. (c) Initial salient trimap. (d) Global salient region via HDCT. (e) Local salient region detection via random forest. (f) Our final saliency map.

Global-contrast-based models use color contrast with respect to the whole image to determine salient regions. These models can determine the salient regions of an image uniformly with low computational complexity. Achanta et al. [7] proposed a frequency-tuned approach that detects centre-surround contrast using color and luminance in the frequency domain as features. Li et al. [43] showed that the unique refocusing capability of light fields can robustly handle challenging saliency detection problems, such as a similar foreground and background in a single image. Global-contrast-based methods give reliable results at low computational cost because they mostly consider a few specific colors that separate the foreground and the background of an image.

Statistical-learning-based models have also been investigated for saliency detection. Wang et al. [15] proposed a method that jointly estimates the segmentation of objects learned by a trained classifier, called the auto-context model, to enhance an appearance-based energy minimization framework for salient region detection. Yang et al. [36] ranked the similarity of image regions to foreground cues and background cues using graph-based manifold ranking on affinity matrices and successfully conducted saliency detection. Borji and Itti [16] used local and global learning for saliency in several color spaces (RGB and LAB) and then joined the results into a final saliency map for salient region detection. These methods are usually more accurate and have a simple detection structure, but they require more computation time. Superpixel-wise saliency detection is used to overcome this problem.

III. INITIAL SALIENT TRIMAP GENERATION

The salient trimap determines the location of the salient region in an image. The method operates at the superpixel level. A salient trimap contains a background region, a foreground region, and an unknown region. The method computes a feature vector for the image that includes color, histogram, and location features.

A. Superpixel Saliency Feature

First, over-segment the input image to form superpixels $X = \{X_1, \dots, X_N\}$. The SLIC superpixel method [1] is used for the over-segmentation because it has a low computational cost and high performance. The number of superpixels is set to $N = 500$. The feature vector for saliency detection combines various kinds of information used in saliency detection. First, concatenate the x and y locations of the superpixel into the feature vector. Then concatenate the color features using various color space representations. Next, concatenate the histogram features. The histogram feature of the ith superpixel, $D_{H_i}$, is measured using the chi-square distance to the histograms of the other superpixels. It is defined as

$$D_{H_i} = \sum_{j=1}^{N} \sum_{k=1}^{b} \frac{(h_{ik} - h_{jk})^2}{h_{ik} + h_{jk}}, \tag{1}$$

where $b$ is the number of histogram bins. In this work, eight bins are used for each histogram. The global contrast of the ith superpixel, $D_{G_i}$, is given by

$$D_{G_i} = \sum_{j=1}^{N} d(c_i, c_j), \tag{2}$$

where $d(c_i, c_j)$ denotes the Euclidean distance between the color values $c_i$ and $c_j$ of the ith and jth superpixels. The color contrast is computed over eight color channels: RGB, CIELab, hue, and saturation. The local contrast of the color feature, $D_{L_i}$, is defined by

$$D_{L_i} = \sum_{j=1}^{N} \omega^{p}_{i,j}\, d(c_i, c_j), \tag{3}$$

$$\omega^{p}_{i,j} = \frac{1}{Z_i} \exp\left(-\frac{1}{2\sigma_p^2} \|p_i - p_j\|_2^2\right), \tag{4}$$

where $p_i \in [0,1] \times [0,1]$ denotes the normalized position of the ith superpixel and $Z_i$ is the normalization term. The weight function gives more weight to neighboring superpixels. In this work, $\sigma_p^2 = 0.25$.

The superpixel area, the histogram of oriented gradients (HOG), and the singular value feature (SVF) are used as texture and shape features. The HOG provides appearance features using pixel gradient information. The SVF is used to detect blurred regions in a test image. The SVF is a feature based on eigenimages [25], which decompose an image into a weighted summation of a number of eigenimages, where each weight is a singular value obtained by singular value decomposition. The eigenimages corresponding to the largest singular values capture the overall outline of the original image, and those with smaller singular values represent detailed information.
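The three contrast features of Eqs. (1) to (4) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, empty histogram bins (0/0) are assumed to contribute zero, and $Z_i$ is assumed to normalize each row of weights to sum to one, neither of which is specified in the text.

```python
import numpy as np

def chi_square_feature(hists):
    """D_H per superpixel: summed chi-square distance to all histograms (Eq. 1).
    hists: (N, b) array of non-negative bin counts."""
    hists = np.asarray(hists, dtype=float)
    diff = hists[:, None, :] - hists[None, :, :]       # shape (N, N, b)
    total = hists[:, None, :] + hists[None, :, :]
    # Assumption: bins where both histograms are empty contribute zero.
    terms = np.divide(diff ** 2, total, out=np.zeros_like(total), where=total > 0)
    return terms.sum(axis=(1, 2))

def global_contrast(colors):
    """D_G per superpixel: sum of Euclidean color distances (Eq. 2).
    colors: (N, C) array of mean color values."""
    colors = np.asarray(colors, dtype=float)
    dist = np.linalg.norm(colors[:, None, :] - colors[None, :, :], axis=2)
    return dist.sum(axis=1)

def local_contrast(colors, positions, sigma2=0.25):
    """D_L per superpixel: position-weighted color contrast (Eqs. 3 and 4).
    positions: (N, 2) array normalized to [0, 1] x [0, 1]."""
    colors = np.asarray(colors, dtype=float)
    positions = np.asarray(positions, dtype=float)
    dist = np.linalg.norm(colors[:, None, :] - colors[None, :, :], axis=2)
    sq_pos = ((positions[:, None, :] - positions[None, :, :]) ** 2).sum(axis=2)
    weights = np.exp(-sq_pos / (2.0 * sigma2))
    # Assumption: Z_i normalizes the weights of superpixel i to sum to one.
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights * dist).sum(axis=1)
```

The pairwise arrays have size N × N (and N × N × b for the histograms), which is manageable for N = 500 superpixels but would not scale to pixel-level processing.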

B. Initial Salient Trimap via Random Forest Classification

After computing the feature vector for each superpixel, a classification algorithm checks whether each region is salient. This work uses the random forest classification method, which operates by constructing multiple decision trees at training time. The random forest model combines the bootstrap aggregating idea with random feature selection; together, these two ideas reduce the generalization error. A small number of features is randomly selected when growing each decision tree. The previous method [2] used regression for every superpixel followed by classification via adaptive thresholding, which labels each superpixel as background or foreground. Instead of this binary classification, we apply a three-class classification to the output of the random forest to generate a trimap, which identifies the reliable foreground and background regions. Each superpixel is assigned to the foreground candidates, the background candidates, or the unknown region using the response value extracted from the classifier. In this work, the threshold values are $T_{fore} = 1$ and $T_{back} = -1$. If a superpixel's response value exceeds $T_{fore}$, it is set to the foreground; if the value is lower than $T_{back}$, it is set to the background; otherwise, it is considered unknown.
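The three-class thresholding rule can be written directly. The sketch below assumes the classifier responses are already available as a list of numbers; the function name and the string labels are illustrative choices, not part of the original method description.

```python
def build_trimap(responses, t_fore=1.0, t_back=-1.0):
    """Label each superpixel from its classifier response value.

    A response strictly above t_fore becomes a foreground candidate, one
    strictly below t_back becomes a background candidate, and everything
    else (including the thresholds themselves) is left as unknown.
    """
    labels = []
    for r in responses:
        if r > t_fore:
            labels.append("foreground")
        elif r < t_back:
            labels.append("background")
        else:
            labels.append("unknown")
    return labels
```

For example, `build_trimap([1.5, -2.0, 0.3])` labels the three superpixels as foreground, background, and unknown, respectively.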

IV. SALIENCY ESTIMATION FROM TRIMAP

This section presents global salient region detection via HDCT and a local learning-based method. The HDCT-based method assumes that pixels in the salient region have an independent and identical color distribution, so that a linear combination of high-dimensional color channels can separate salient regions from the background. The local color contrast features reduce the gap between the independent and identical color distribution model implied by HDCT and the true distributions of realistic images.

A. Global Saliency Estimation via HDCT

The goal of this method is to find a linear combination of color features in the HDCT space such that the colors of the salient region and the background are separated. To build the HDCT space, first concatenate the nonlinear RGB color space, the CIELab color space, and the hue and saturation channels of the HSV color space. Color gradients in the RGB space are also included, because salient regions and backgrounds have different amounts of color contrast. In total, 11 different color channels are used in the HDCT space. After normalizing each color coefficient to [0, 1], a power-law transformation is applied to it with three gamma values: 0.5, 1.0, and 2.0. This results in a high-dimensional matrix that represents the colors of an image:

$$K = \begin{bmatrix} R_1^{\gamma_1} & R_1^{\gamma_2} & R_1^{\gamma_3} & G_1^{\gamma_1} & \cdots \\ R_2^{\gamma_1} & R_2^{\gamma_2} & R_2^{\gamma_3} & G_2^{\gamma_1} & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \\ R_N^{\gamma_1} & R_N^{\gamma_2} & R_N^{\gamma_3} & G_N^{\gamma_1} & \cdots \end{bmatrix} \in \mathbb{R}^{N \times l}, \tag{5}$$

in which $R_i$ and $G_i$ denote the mean pixel values of the R and G color channels of the ith superpixel of the test image. Using 11 color channels gives an HDCT matrix $K$ with $l = 11 \times 3 = 33$. To evaluate the effectiveness of the color channels and power-law transformations, this work used 2,500 images of the MSRA-B dataset.

To obtain the saliency map, the foreground and background candidate color samples in the trimap are used to estimate an optimal linear combination of color coefficients that separates the salient region colors from the background colors. This problem is defined as an $\ell_2$-regularized least-squares problem:

$$\min_{\alpha} \|U - \tilde{K}\alpha\|_2^2 + \lambda \|\alpha\|_2^2, \tag{6}$$

where $\alpha \in \mathbb{R}^{l}$ is the coefficient vector, $\lambda$ is a weighting parameter, and $\tilde{K}$ is an $M \times l$ matrix in which each row corresponds to a color sample in the foreground/background candidate regions:

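$$\tilde{K} = \begin{bmatrix} R_{FS_1}^{\gamma_1} & R_{FS_1}^{\gamma_2} & R_{FS_1}^{\gamma_3} & G_{FS_1}^{\gamma_1} & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \\ R_{FS_f}^{\gamma_1} & R_{FS_f}^{\gamma_2} & R_{FS_f}^{\gamma_3} & G_{FS_f}^{\gamma_1} & \cdots \\ R_{BS_1}^{\gamma_1} & R_{BS_1}^{\gamma_2} & R_{BS_1}^{\gamma_3} & G_{BS_1}^{\gamma_1} & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \\ R_{BS_b}^{\gamma_1} & R_{BS_b}^{\gamma_2} & R_{BS_b}^{\gamma_3} & G_{BS_b}^{\gamma_1} & \cdots \end{bmatrix}, \tag{7}$$

where $FS_i$ and $BS_j$ denote the ith foreground candidate superpixel and the jth background candidate superpixel, respectively. $M$ is the number of color samples, and $f$ and $b$ denote the numbers of foreground and background candidate samples, such that $M = f + b$. $U$ is an $M$-dimensional vector whose entries equal 1 if a color sample belongs to the foreground candidates and 0 if it belongs to the background candidates:

$$U = [\,\underbrace{1\ 1\ \cdots\ 1}_{f\ 1\text{'s}}\ \underbrace{0\ 0\ \cdots\ 0}_{b\ 0\text{'s}}\,]^T \in \mathbb{R}^{M \times 1}. \tag{8}$$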
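To make the roles of $K$, $\tilde{K}$, and $U$ concrete, the following NumPy sketch builds the HDCT matrix from per-superpixel channel means and solves the regularized least-squares problem of Eq. (6) with the standard closed-form ridge solution. The function names, the input layout (an N × 11 array already normalized to [0, 1]), and the index lists for the trimap candidates are assumptions made for illustration.

```python
import numpy as np

GAMMAS = (0.5, 1.0, 2.0)  # power-law exponents stated in the text

def hdct_matrix(channels):
    """Build K (N x 3C) from per-superpixel channel means in [0, 1].

    Columns are ordered as in Eq. (5): the three gamma powers of the first
    channel, then the three gamma powers of the second channel, and so on.
    """
    channels = np.asarray(channels, dtype=float)
    powers = np.stack([channels ** g for g in GAMMAS], axis=2)  # (N, C, 3)
    return powers.reshape(channels.shape[0], -1)

def hdct_saliency(K, fg_idx, bg_idx, lam=0.05):
    """Solve min ||U - K~ a||^2 + lam ||a||^2 and score every superpixel."""
    K_tilde = np.vstack([K[fg_idx], K[bg_idx]])            # foreground rows first
    U = np.concatenate([np.ones(len(fg_idx)), np.zeros(len(bg_idx))])
    l = K.shape[1]
    # Closed-form ridge solution: (K~^T K~ + lam I)^-1 K~^T U
    alpha = np.linalg.solve(K_tilde.T @ K_tilde + lam * np.eye(l), K_tilde.T @ U)
    return K @ alpha, alpha
```

Using `np.linalg.solve` instead of forming the inverse explicitly is a numerical convenience; with $\lambda > 0$ the system matrix is always invertible, even when there are fewer color samples than the 33 columns.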


The $\ell_2$-regularized least-squares problem is a well-conditioned problem that can be readily minimized with respect to $\alpha$ as $\alpha^* = (\tilde{K}^T \tilde{K} + \lambda I)^{-1} \tilde{K}^T U$. Setting $\lambda = 0.05$ produced the best results. After $\alpha^*$ is obtained, the saliency map can be constructed as

$$S_G(X_i) = \sum_{j=1}^{l} K_{ij}\, \alpha_j^*, \quad i = 1, 2, \dots, N,$$

which is a linear combination of the HDCT color coefficients. The $\ell_2$ regularizer in the least-squares formulation makes the saliency map more reliable for both the foreground and background superpixels that are initially classified in the trimap. Several values of $\lambda$ were tested, and the regularized least-squares formulation with nonzero $\lambda$ produces better saliency maps than the least-squares method without a regularizer ($\lambda = 0$). Both the foreground and background superpixels in the HDCT space are important for this work. The overall process of the HDCT-based saliency detection is described in Algorithm 1.

B. Local Saliency Estimation via Regression

The local method first finds, for each superpixel $X_i$, the K-nearest foreground superpixels $X_{FS} = \{X_{FS_1}, X_{FS_2}, \dots, X_{FS_K}\}$ and the K-nearest background superpixels $X_{BS} = \{X_{BS_1}, X_{BS_2}, \dots, X_{BS_K}\}$, and uses the Euclidean distances between $X_i$ and the superpixels in $X_{FS}$ or $X_{BS}$ as features.

Fig. 2: An illustration of local saliency features. Black, white, and gray regions denote background, foreground, and unknown superpixels, respectively.

The Euclidean distance features of the ith superpixel to the K-nearest foreground superpixels ($d_{FS_i} \in \mathbb{R}^{K \times 1}$) and to the K-nearest background superpixels ($d_{BS_i} \in \mathbb{R}^{K \times 1}$) are defined as follows:

$$d_{FS_i} = \begin{bmatrix} \|p_i - p_{FS_{i1}}\|_2^2 \\ \|p_i - p_{FS_{i2}}\|_2^2 \\ \vdots \\ \|p_i - p_{FS_{iK}}\|_2^2 \end{bmatrix}, \quad d_{BS_i} = \begin{bmatrix} \|p_i - p_{BS_{i1}}\|_2^2 \\ \|p_i - p_{BS_{i2}}\|_2^2 \\ \vdots \\ \|p_i - p_{BS_{iK}}\|_2^2 \end{bmatrix}, \tag{9}$$

where $FS_{ij}$ denotes the jth nearest foreground superpixel and $BS_{ij}$ denotes the jth nearest background superpixel from the ith superpixel. The spatial distances between a candidate superpixel and the nearby foreground/background superpixels can be a useful feature for evaluating the saliency degree. The feature vectors of color distances from the ith superpixel to the K-nearest foreground superpixels ($d_{CF_i} \in \mathbb{R}^{8K \times 1}$) and to the K-nearest background superpixels ($d_{CB_i} \in \mathbb{R}^{8K \times 1}$) are defined as follows:

$$d_{CF_i} = \begin{bmatrix} d(c_i, c_{FS_{i1}}) \\ d(c_i, c_{FS_{i2}}) \\ \vdots \\ d(c_i, c_{FS_{iK}}) \end{bmatrix}, \quad d_{CB_i} = \begin{bmatrix} d(c_i, c_{BS_{i1}}) \\ d(c_i, c_{BS_{i2}}) \\ \vdots \\ d(c_i, c_{BS_{iK}}) \end{bmatrix}. \tag{10}$$
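The spatial and color distance features to the K-nearest foreground and background superpixels can be assembled as in the sketch below. It is an illustration under stated assumptions: the function name is hypothetical, nearest neighbors are chosen by spatial position, and the per-channel color distance is taken to be the absolute difference in each of the eight channels, which the text does not spell out.

```python
import numpy as np

def local_features(i, positions, colors, fg_idx, bg_idx, k=2):
    """Feature vector [d_FS, d_BS, d_CF, d_CB] for superpixel i.

    positions: (N, 2) normalized superpixel centers.
    colors:    (N, 8) mean values of the eight color channels.
    fg_idx, bg_idx: indices of foreground/background candidates in the trimap.
    """
    positions = np.asarray(positions, dtype=float)
    colors = np.asarray(colors, dtype=float)

    def nearest(candidates):
        candidates = np.asarray(candidates)
        # Squared spatial distance from superpixel i to every candidate.
        d2 = ((positions[candidates] - positions[i]) ** 2).sum(axis=1)
        order = np.argsort(d2)[:k]
        return candidates[order], d2[order]

    fs, d_fs = nearest(fg_idx)
    bs, d_bs = nearest(bg_idx)
    # Assumption: per-channel distance is the absolute difference (8 values
    # per neighbor, so 8K values per group).
    d_cf = np.abs(colors[fs] - colors[i]).ravel()
    d_cb = np.abs(colors[bs] - colors[i]).ravel()
    return np.concatenate([d_fs, d_bs, d_cf, d_cb])
```

With K neighbors per group, the resulting vector has 2K spatial entries and 16K color entries, which would then be passed to the second-stage random forest.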



Eight color channels are used to measure the color distance, where $c_i$, $c_{FS_{ij}}$, and $c_{BS_{ij}}$ are eight-dimensional color vectors. The distance vector $d(c_i, c_{FS_{ij}})$ is also eight-dimensional, where each element is the distance in a single color channel. For saliency estimation, a superpixel-wise random forest [50] algorithm is used, with feature vectors derived from the initial trimap, which is itself produced by the random forest classification method. Because two stages of random forests are used, the training data set is divided into two disjoint sets so that the second random forest is trained with more realistic inputs. The first random forest is trained with one set and then generates the training data for the second random forest. This process is repeated in a manner similar to five-fold cross-validation.

C. Final Saliency Map Generation

The final saliency map is generated from the global and local saliency maps. The HDCT-based saliency map tends to catch the object precisely, but its false negative rate is rather high due to textures or noise. In contrast, the learning-based saliency map is less affected by noise, and it has a low false negative rate but a high false positive rate. The two maps are therefore combined into the final saliency map. Two approaches for combining the two saliency maps are proposed. The first approach is pixel-wise multiplication of the two maps:

$$S_{mult} = \frac{1}{Z}\,\big(p(S_G) \times p(S_L)\big), \tag{11}$$

where $Z$ is a normalization factor, $p(\cdot)$ is a pixel-wise combination function, $S_G$ is the global saliency result, and $S_L$ is the local saliency result. The second approach joins the two maps using a summation:

$$S_{sum} = \frac{1}{Z}\,\big(p(S_G) + p(S_L)\big). \tag{12}$$

More weight is given to the highly salient regions. The weight values are computed by comparing the saliency map with the ground truth. The optimal weight values for the linear summation are obtained by solving the following nonlinear least-squares problem:

$$\min_{\omega_1 \ge 0,\ \omega_2 \ge 0,\ \omega_3 \ge 0,\ \omega_4 \ge 0} \|\omega_1\, p(\omega_2 S_G) + \omega_3\, p(\omega_4 S_L) - GT\|_2^2, \tag{13}$$

where $GT$ is the ground truth of an image in the training data. The final solution for the objective function in Eq. (13) is obtained as $\omega_1 = 1.15$, $\omega_2 = 0.74$, $\omega_3 = 1.57$, and $\omega_4 = 0.89$. Fig. 3 shows the precision-recall curve of the joined map.

Fig. 3: Comparison of precision-recall curves of each step on the MSRA-B dataset

The final saliency map combination is given by

$$S_{final} = \frac{1}{Z}\,\big(\omega_1\, p(\omega_2 S_G) + \omega_3\, p(\omega_4 S_L)\big). \tag{14}$$

The performance increases considerably after the two maps are combined: highly salient regions that have been caught by the local saliency map are preserved, and false negative regions that are ambiguously salient are suppressed. The learning-based method can estimate the saliency degree by considering the spatial distribution of the nearest foreground and background superpixels. The learning-based method also gives a better result than the matting algorithm.
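The multiplicative and weighted-sum combinations of the global and local maps can be sketched as follows. The weights are the values reported above; everything else is an assumption for illustration: the text does not give the form of $p(\cdot)$ or of $Z$, so `np.exp` is used as a placeholder for $p$ and $Z$ is taken to be the maximum of the combined map.

```python
import numpy as np

# Weights reported in the text for the weighted summation.
W1, W2, W3, W4 = 1.15, 0.74, 1.57, 0.89

def _normalize(s):
    """Divide by Z, assumed here to be the maximum value of the map."""
    z = s.max()
    return s / z if z > 0 else s

def combine_mult(s_g, s_l, p=np.exp):
    """Pixel-wise multiplication of the two maps (first approach)."""
    return _normalize(p(s_g) * p(s_l))

def combine_final(s_g, s_l, p=np.exp):
    """Weighted summation of the two maps (final combination)."""
    return _normalize(W1 * p(W2 * s_g) + W3 * p(W4 * s_l))
```

With the exponential placeholder the combined values are strictly positive, so the normalized map lies in (0, 1] rather than starting at zero; a different choice of $p$ or $Z$ would change that range.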

V. EXPERIMENTS

A. Benchmark Datasets for Salient Region Detection

1) MSRA-B Dataset

The MSRA-B dataset contains 5,000 images with pixel-wise ground truth. In this dataset, the color of the salient region differs from that of the background region. The same training set of 2,500 images is used, and the test set contains 2,000 images.

Fig. 4: F-measure curve compared with state-of-the-art algorithms on the MSRA-B dataset

B. Performance Evaluation

Two standard criteria are used to evaluate salient region detection algorithms: the precision-recall rate and the F-measure rate.

1) Precision-Recall Evaluation

The precision is also called the positive predictive value, and it is defined as the ratio of the number of ground-truth pixels retrieved as a salient region to the total number of pixels retrieved as the salient region.
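The evaluation criteria can be computed from a binarized saliency map and the ground-truth mask as in the sketch below. Precision follows the definition above; recall is taken in its standard sense (the fraction of ground-truth salient pixels that are retrieved). The F-measure weighting `beta2=0.3` is an assumption borrowed from common practice in the saliency literature, since the text does not state the value used.

```python
def precision_recall_f(pred, gt, beta2=0.3):
    """Precision, recall, and weighted F-measure for flat binary masks.

    pred: iterable of 0/1 values, the thresholded saliency map.
    gt:   iterable of 0/1 values, the ground-truth mask (same length).
    """
    pred = list(pred)
    gt = list(gt)
    true_pos = sum(1 for p, g in zip(pred, gt) if p and g)
    n_pred = sum(1 for p in pred if p)   # pixels retrieved as salient
    n_gt = sum(1 for g in gt if g)       # ground-truth salient pixels
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gt if n_gt else 0.0
    denom = beta2 * precision + recall
    f_measure = (1 + beta2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure
```

Sweeping the binarization threshold over the range of the saliency map and calling this function at each step yields the points of a precision-recall curve.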

VI. CONCLUSION

This paper presented a novel salient region detection method that estimates the foreground regions from a trimap using two different methods: global saliency estimation via HDCT and local saliency estimation via regression. The trimap-based robust estimation overcomes the limitations of inaccurate initial saliency classification. As a result, the method achieves good performance and is computationally efficient in comparison to the state-of-the-art methods. Among the methods compared in this work, the proposed method performed best for salient region detection. As future work, we aim to extend the features used for the initial trimap to further improve the performance of the algorithm.

REFERENCES

[1] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, “SLIC superpixels compared to state-of-the-art superpixel methods,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 11, pp. 2274–2282, Nov. 2012.
[2] J. Kim, D. Han, Y.-W. Tai, and J. Kim, “Salient region detection via high-dimensional color transform,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2014, pp. 883–890.
[3] A. Borji, M.-M. Cheng, H. Jiang, and J. Li. (2015). “Salient object detection: A benchmark.” [Online]. Available:
[4] R. Achanta, S. Hemami, F. Estrada, and S. Süsstrunk, “Frequency-tuned salient region detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2009, pp. 1597–1604.
[5] L. Breiman, “Random forests,” Mach. Learn., vol. 45, no. 1, pp. 5–32, Oct. 2001.
[6] J. Wang and M. F. Cohen, “Optimized color sampling for robust matting,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2007, pp. 1–8.
[7] H. Jiang, J. Wang, Z. Yuan, Y. Wu, N. Zheng, and S. Li, “Salient object detection: A discriminative regional feature integration approach,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2013, pp. 2083–2090.
[8] A. Borji, D. N. Sihite, and L. Itti, “Salient object detection: A benchmark,” in Proc. IEEE Eur. Conf. Comput. Vis. (ECCV), Oct. 2012, pp. 414–429.
[9] W. Zhu, S. Liang, Y. Wei, and J. Sun, “Saliency optimization from robust background detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2014, pp. 2814–2821.
[10] A. Levin, A. Rav-Acha, and D. Lischinski, “Spectral matting,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 10, pp. 1699–1712, Oct. 2008.

