A New Unsupervised Approach for Segmenting and Counting Cells in High-Throughput Microscopy Image Sets
Abstract: New technological advances in automated microscopy have given rise to large volumes of data, which have made human-based analysis infeasible and heightened the need for automatic systems for high-throughput microscopy applications. In particular, in the field of fluorescence microscopy, automatic tools for image analysis make an essential contribution to increasing the statistical power of the cell analysis process. The development of these automatic systems is a difficult task due to both the diversity of the staining patterns and the local variability of the images. In this paper, we present an unsupervised approach for automatic cell segmentation and counting, namely CSC, in high-throughput microscopy images. The segmentation is performed by dividing the whole image into square patches that undergo a gray level clustering followed by an adaptive thresholding. Subsequently, the cell labeling is obtained by detecting the centers of the cells, using both distance transform and curvature analysis, and by applying a region growing process. The advantages of CSC are manifold. The foreground detection process works on gray levels rather than on individual pixels, so it proves to be very efficient. Moreover, the combination of distance transform and curvature analysis makes the counting process very robust to clustered cells. A further strength of the CSC method is the limited number of parameters that must be tuned. Indeed, two different versions of the method have been considered, CSC-7 and CSC-3, depending on the number of parameters to be tuned. The CSC method has been tested on several publicly available image datasets of real and synthetic images. Results in terms of standard metrics and spatially-aware measures show that CSC outperforms the current state-of-the-art techniques.

Existing system: In this work, we propose a new unsupervised method for cell segmentation and counting, namely CSC, in high-throughput microscopy images. The foreground detection process works locally by dividing the image into square overlapping patches. Each patch is quantized by means of gray level clustering and binarized by applying an adaptive thresholding so as to extract the foreground pixels. The cell detection process works on the foreground pixels in two subsequent steps, namely the detection of isolated cells and the partitioning of cell clusters. Note that the detected foreground corresponds to the final result of many existing segmentation methods in the literature and can be considered sufficient for a comparison with such methods according to standard metrics. Our partitioning of the image into single cells is obtained at the end of the counting process, i.e. after the center of each cell has been detected and all the pixels of the cells have been labeled by means of region growing. This allows a comparison, using appropriate metrics, with the class of approaches that also provide a labeling of single cells.

Proposed system: The approaches in the first class are based on thresholding and combinations of some morphological operations.
One method exploits morphological operators to initialize a level set function that is iteratively evolved until it converges to the contour of the cells. Similarly, another method implements an iterative approach based on morphological operations: its authors proposed an iterative erosion method based on information about gray level and gradient intensity. Though this method works for different types of cell images, it suffers from a high number of false seeds because noise blobs might be detected as real cells. A further approach tries to address this problem by implementing a multithresholding operation controlled by a rule-based verification procedure, so that each segmented blob is considered a cell on the basis of information such as area, shape and position. This approach was tested only on four common antinuclear antibody (ANA) patterns, namely homogeneous, speckled, nucleolar and centromere, while difficult staining patterns such as Golgi, nuclear membrane and mitotic spindle were not considered. In general, methods based on global thresholding followed by morphological operations encounter problems with cells characterized by irregular patterns.

Advantages: The advantages of the CSC approach are manifold. First of all, it is unsupervised and can be applied to different image sets and to different types of cells. The computational cost of the foreground detection process is mainly related to the number of gray tones, which is fixed, rather than to the number of pixels, which increases with the resolution of the image. Moreover, the counting process not only computes the number of cells in the image, but also generates a partitioning of the image into cells.

Disadvantages: Among the existing methods, the multithresholding approach with rule-based verification was tested only on four common ANA patterns (homogeneous, speckled, nucleolar and centromere), while difficult staining patterns such as Golgi, nuclear membrane and mitotic spindle were not considered. In general, methods based on global thresholding followed by morphological operations encounter problems with cells characterized by irregular patterns.
Thus, newer approaches try to overcome this limitation by differentiating the segmentation process into two or more branches according to the appearance of the regions.

Modules:
Foreground detection: The foreground detection process is conceived as a quantization of the gray levels by means of a clustering process, so that a foreground/background separation is made possible through a thresholding operation. Since microscopy images generally show very different characteristics in terms of resolution, contrast, noise level and staining pattern, image preprocessing may be necessary. However, the image correction only mitigates the local distortions, which still hinder the application of a global gray level clustering. For this reason, the clustering is made local by dividing the whole image into overlapping patches. The patches are then processed independently and the resulting clusters are assigned either to the foreground or to the background. Since a pixel can belong to several patches and can be marked as foreground in some and as background in others, a merging criterion is implemented to recompose a global binary mask that represents the output of this segmentation process. A graphical scheme of the whole foreground detection pipeline is shown in Figure 1, while the single steps are discussed in the following.

Partitioning of grouped cells in clusters: Connected components of the foreground that have not been assigned a label during the first stage are considered as clusters of cells. These connected components include more than one seed, but not necessarily one for each cell in the cluster. Missed seeds can be detected by exploiting the information provided by the boundary of the connected component. The contour is partitioned into small segments, which are processed independently by an ellipse fitting algorithm, so that the center of each approximating ellipse may be considered as a candidate new seed for inclusion in the seed set S.
Since the boundary of a cell is a smooth curve, while the regions where cells touch are characterized by sharp bends, the partitioning of the contour of a connected component is based on curvature analysis. However, computing the curvature point-wise is time-consuming, especially for high-resolution images. To keep the curvature analysis efficient yet effective, an estimate of the curvature is computed rather than its exact value.
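The idea of splitting a contour at sharp bends using a cheap curvature estimate can be sketched as follows. This is a minimal illustration under stated assumptions, not the estimator used by CSC: the curvature is approximated by the signed turning angle between two chords spanning k contour points, the contour is assumed to be closed and ordered counter-clockwise (with y pointing up), and the names turning_angles, split_at_concavities, k and min_turn are hypothetical. The helper two_touching_circles only builds synthetic test data.

```python
import math

def turning_angles(contour, k=3):
    """Signed turning angle (radians) at each point of a closed contour,
    measured between the chords p[i-k]->p[i] and p[i]->p[i+k].
    For a counter-clockwise contour, negative values mark concave bends."""
    n = len(contour)
    angles = []
    for i in range(n):
        x0, y0 = contour[(i - k) % n]
        x1, y1 = contour[i]
        x2, y2 = contour[(i + k) % n]
        ax, ay = x1 - x0, y1 - y0
        bx, by = x2 - x1, y2 - y1
        angles.append(math.atan2(ax * by - ay * bx, ax * bx + ay * by))
    return angles

def split_at_concavities(contour, k=3, min_turn=0.5):
    """Cut the contour at the sharpest point of every concave run.
    min_turn is an illustrative threshold (radians), not a CSC parameter."""
    n = len(contour)
    angles = turning_angles(contour, k)
    concave = [a < -min_turn for a in angles]
    if not any(concave) or all(concave):
        return [], [list(contour)]
    start = concave.index(False)  # begin outside a run so runs never wrap
    cuts, run = [], []
    for off in range(n):
        i = (start + off) % n
        if concave[i]:
            run.append(i)
        elif run:
            cuts.append(min(run, key=lambda j: angles[j]))
            run = []
    if run:
        cuts.append(min(run, key=lambda j: angles[j]))
    segments = []
    for a, b in zip(cuts, cuts[1:] + cuts[:1]):
        length = (b - a) % n or n
        segments.append([contour[(a + t) % n] for t in range(length + 1)])
    return cuts, segments

def two_touching_circles(r=20.0, d=14.0, m=48):
    """Synthetic outline of two overlapping circles centred at (-d, 0), (d, 0)."""
    t0 = math.acos(d / r)
    span = 2 * (math.pi - t0)
    right = [(d + r * math.cos(-(math.pi - t0) + span * i / m),
              r * math.sin(-(math.pi - t0) + span * i / m)) for i in range(m)]
    left = [(-d + r * math.cos(t0 + span * i / m),
             r * math.sin(t0 + span * i / m)) for i in range(m)]
    return right + left

cuts, segments = split_at_concavities(two_touching_circles())
print(sorted(cuts), [len(s) for s in segments])
```

Each returned segment could then be passed to an ellipse fitting routine, whose center would serve as a candidate seed. In image coordinates, where y points down, the sign convention of the turning angle is reversed.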
Materials: The datasets used for the evaluation of the CSC method are the Indirect Immunofluorescence dataset (IIF-dataset) and the Broad Bioimage Benchmark Collection (BBBC). The IIF-dataset is publicly available and consists of real images that have been adopted as a benchmark for two international challenges on HEp-2 cell segmentation and classification. Note that only the training set is publicly available. The IIF-dataset includes 1008 specimen images, acquired by means of a fluorescence microscope (40-fold magnification) coupled with a 50 W mercury vapor lamp and a digital camera. The image resolution is 1388 × 1038 pixels at 24 bits. As stated in [25], in the images produced by the ANA test, the segmentation map obtained using DAPI (a fluorescent stain that binds strongly to A-T rich regions in DNA) can be considered the gold standard. However, the use of DAPI complicates the laboratory workflow, so the goal is to analyze the ANA test images to obtain a binary segmentation of the cells that is as accurate as that obtained with DAPI. The IIF-dataset provides both fluorescence images and the corresponding binary masks obtained from DAPI. In light of the above considerations, in this work we used such masks as ground truth, even though they were neither produced nor reviewed by experts in the field.

Parameter setting: The foreground detection and cell counting processes are driven by a small number of parameters, whose purpose is detailed in this section. Table II reports the values selected for the parameters on the different image datasets. The image pre-processing is an optional step regulated by a Boolean parameter that is set to true when image correction is required and to false otherwise. The foreground detection is based on a sliding window, whose behavior is determined by two parameters, namely the window size n and the sliding step.
The patch size n strongly depends on the homogeneity of the input image: the more homogeneous the objects in the image, the larger the value that can be assigned to n. The sliding step determines the degree of overlap between the patches. An appropriate value for the step can be selected according to the same heuristics adopted for n.
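To make the role of the window size n and the sliding step concrete, the following sketch shows one way a patch-wise foreground detection could be organized. It is an illustration under several assumptions and not the CSC implementation: a two-class iterative clustering of the patch histogram stands in for the gray level clustering and adaptive thresholding, a majority vote stands in for the merging criterion (which is not specified here), and the names patch_origins, two_means_threshold, foreground_mask, step and min_contrast are hypothetical. Images are plain lists of rows with 8-bit gray values.

```python
def patch_origins(length, n, step):
    """Start offsets of windows of size n moved by `step` along one axis;
    a final window is clamped to the border so the whole axis is covered."""
    if n >= length:
        return [0]
    origins = list(range(0, length - n + 1, step))
    if origins[-1] != length - n:
        origins.append(length - n)
    return origins

def two_means_threshold(hist):
    """Two-class clustering of gray levels computed on the histogram,
    so the cost depends on the number of gray tones, not of pixels.
    Returns (threshold, distance between the two class means)."""
    levels = [g for g, c in enumerate(hist) if c]
    t = (levels[0] + levels[-1]) / 2
    m0 = m1 = float(levels[0])
    for _ in range(100):  # safety cap; converges in a few iterations
        w0 = sum(hist[g] for g in levels if g <= t)
        w1 = sum(hist[g] for g in levels if g > t)
        if w0 == 0 or w1 == 0:
            return t, 0.0
        m0 = sum(g * hist[g] for g in levels if g <= t) / w0
        m1 = sum(g * hist[g] for g in levels if g > t) / w1
        new_t = (m0 + m1) / 2
        if abs(new_t - t) < 0.5:
            return new_t, m1 - m0
        t = new_t
    return t, m1 - m0

def foreground_mask(image, n, step, min_contrast=30):
    """Binarize each n-by-n patch locally, then merge by majority vote."""
    h, w = len(image), len(image[0])
    votes = [[0] * w for _ in range(h)]
    cover = [[0] * w for _ in range(h)]
    for r0 in patch_origins(h, n, step):
        for c0 in patch_origins(w, n, step):
            rows = range(r0, min(r0 + n, h))
            cols = range(c0, min(c0 + n, w))
            hist = [0] * 256
            for r in rows:
                for c in cols:
                    hist[image[r][c]] += 1
            t, contrast = two_means_threshold(hist)
            for r in rows:
                for c in cols:
                    cover[r][c] += 1
                    # low-contrast patches are treated as pure background
                    if contrast >= min_contrast and image[r][c] > t:
                        votes[r][c] += 1
    return [[1 if 2 * votes[r][c] > cover[r][c] else 0 for c in range(w)]
            for r in range(h)]

# Toy example: a bright 4x4 "cell" on a dark 12x12 background.
image = [[200 if 4 <= r < 8 and 4 <= c < 8 else 10 for c in range(12)]
         for r in range(12)]
mask = foreground_mask(image, n=6, step=3)
print(sum(map(sum, mask)))
```

A step smaller than n produces overlapping patches, so each pixel receives several votes. A step equal to n gives a non-overlapping tiling in which the merge becomes trivial.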