A Maximum Likelihood Approach for Depth Field Estimation Based on Epipolar Plane Images
Abstract: In this work, a multi-resolution method for depth estimation from dense image arrays is presented. Recent progress in consumer electronics has enabled the development of low-cost hand-held plenoptic cameras. In these systems, multiple views of a scene are captured in a single shot by means of a micro-lens array placed at the focal plane of the main camera lens, in front of the imaging sensor. These views can be processed jointly to obtain accurate depth maps. In this contribution, to reduce the computational complexity associated with global optimization schemes based on matching cost functions, we make a local estimate based on the maximization of the total log-likelihood spatial density aggregated along the epipolar lines corresponding to each view pair. This method includes the local Maximum Likelihood (ML) estimation of the depth field based on Epipolar Plane Images (EPIs). To counter the accuracy losses associated with the ambiguity problem that arises in flat surface regions, while preserving spatial bandwidth at the edges, we adopt a multi-resolution scheme: the depth map resolution is reduced in regions where maximizing the higher-resolution functional is ill-conditioned. The main benefits of the proposed system are a reduced computational complexity and a high accuracy of the estimated depth.
Experimental results show that the proposed scheme represents a good trade-off among accuracy, robustness, and discontinuity handling. Existing system: The use of an ML local estimation of the depth field by exploiting a multi-view approach has already been investigated by the authors. This contribution extends that previously proposed method. More specifically, the mathematical framework of the local ML solution is reformulated in terms of EPI analysis, thus showing the relation between the theoretically optimal ML solution and EPI-based approaches. The advantage of this new conceptual model is twofold. On the one side, it enables the comparison of EPI-based algorithms with the ML solution, and it shows how multiple epipolar directions can be exploited in EPI methods. On the other side, it makes it possible to extend the classical ML solution with the occlusion-handling techniques designed for the EPI domain, which results in an increased reliability of the estimated depth map. In addition, compared to the earlier work, an extensive validation against state-of-the-art methods has been performed. Proposed system: In this contribution we analyze a local optimization scheme based on a matching cost function given by the log-likelihood spatial density aggregated along the epipolar line corresponding to each view pair. As detailed in the following, the proposed method includes as a special case the Maximum Likelihood (ML) estimate of the depth field based on EPIs. In this case, given an EPI, the method computes the depth for each pixel of the row corresponding to the reference view.
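The local scheme just described can be illustrated with a minimal 1-D sketch: under an i.i.d. Gaussian noise assumption, the log-likelihood of a depth hypothesis reduces, up to constants, to the negative sum of squared differences between the reference scanline and each view sampled along the epipolar line at the induced disparity. The function name `local_depth_1d`, the unit-focal-length disparity model d = b/z, and the candidate depth grid are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def local_depth_1d(ref, views, baselines, depths, win=2):
    """Local ML-style depth estimate on a 1-D scanline (illustrative sketch).

    For each pixel x and depth hypothesis z, the Gaussian log-likelihood
    reduces (up to constants) to the negative SSD between the reference
    scanline and every other view sampled along the epipolar line at the
    disparity d = b / z (unit focal length assumed). The SSD is aggregated
    over a smoothing window of half-width `win`, and the hypothesis with the
    highest aggregated log-likelihood is kept."""
    n = len(ref)
    out = np.empty(n)
    for x in range(n):
        scores = []
        for z in depths:
            ll = 0.0
            for v, b in zip(views, baselines):
                d = b / z                       # disparity induced by depth z
                for k in range(-win, win + 1):  # aggregate over the window
                    xs = x + k
                    xv = int(round(xs + d))     # epipolar correspondence
                    if 0 <= xs < n and 0 <= xv < n:
                        ll -= (ref[xs] - v[xv]) ** 2
            scores.append(ll)
        out[x] = depths[int(np.argmax(scores))]
    return out
```

On a synthetic scanline pair with a constant true disparity, the interior pixels recover the corresponding depth hypothesis exactly, since the correct shift yields a zero SSD.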
In the following, the tests performed to assess the validity of the proposed method are reported. In more detail, tests were first carried out to tune the parameters of the system and to perform a first assessment of the method. Advantages:
The usage of a weighted median filter is motivated by the fact that the variance of the estimation error changes with the magnitude of the image gradient, and therefore cannot be considered constant over the depth field. This approach allows one to remove outliers while preserving the edges and structures of the depth map. In the classical median filter each pixel has the same importance, which can result in artifacts such as the rounding of sharp corners or the erasure of thin structures. In the weighted median filter adopted here, the pixels falling inside the smoothing window are weighted according to the variance of the estimate, computed as the inverse of the Fisher information, as detailed in the Appendix. The disparity estimate at the output of the median filter is then reported.
Disadvantages: As shown by the experimental results, the extension to dense arrays of the block matching techniques employed to recover the disparity field from stereo pairs can benefit from the high redundancy associated with multiple views. The image database consists of 28 light field scenes, designed to include issues that are particularly challenging for the depth estimation procedure: occlusion of boundaries, presence of fine structures, low texture, smooth surfaces, and camera noise. With a plenoptic camera it is possible to record multiple views of a scene in a single shot using a single device, thus avoiding the problems related to calibration and camera synchronization. The micro-lens array records information on the direction of the incident light at different positions, that is, the light field. In this contribution we address the problem of estimating the 3D model of the scene, conveyed by the depth map, thanks to the redundancy available in dense image arrays.
Modules: Depth map: In order to estimate the depth of a scene, stereo imaging systems try to emulate the Human Visual System (HVS).
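The Fisher-weighted median filtering described in the Advantages section above can be sketched as follows. The helper names `weighted_median` and `wmf_depth` are hypothetical, and the Fisher information map is assumed given (the paper derives it in the Appendix); this is a minimal sketch, not the authors' implementation.

```python
import numpy as np

def weighted_median(values, weights):
    """Weighted median: the smallest value whose cumulative weight reaches
    half of the total weight."""
    order = np.argsort(values)
    v, w = np.asarray(values)[order], np.asarray(weights)[order]
    cum = np.cumsum(w)
    return v[np.searchsorted(cum, 0.5 * cum[-1])]

def wmf_depth(depth, fisher_info, win=1):
    """Weighted median filtering of a depth map: each pixel in the smoothing
    window is weighted by its Fisher information (the inverse of the variance
    of the local estimate), so that well-conditioned, high-gradient pixels
    dominate the smoothing and outliers are suppressed without rounding
    corners or erasing thin structures."""
    H, W = depth.shape
    out = depth.copy()
    for i in range(H):
        for j in range(W):
            i0, i1 = max(0, i - win), min(H, i + win + 1)
            j0, j1 = max(0, j - win), min(W, j + win + 1)
            out[i, j] = weighted_median(depth[i0:i1, j0:j1].ravel(),
                                        fisher_info[i0:i1, j0:j1].ravel())
    return out
```

Note the design choice: unlike a plain median, a single high-weight (reliable) pixel can outvote several low-weight neighbors, which is exactly the behavior motivated in the text.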
However, the use of a short inter-ocular distance, or baseline, such as that of the HVS, results in a low accuracy of the depth map. Thus, extensions of the traditional binocular system to more complex frameworks
based on multiple cameras have been investigated. In this context, novel imaging systems have been introduced, based on the concept of integral imaging introduced by G. Lippmann. The methods for depth estimation from plenoptic images can be mainly classified into two groups. The first one is based on the minimization of some matching cost functional. The methods belonging to this group can be considered an extension of those originally developed for stereo pairs; as such, they mainly differ in the adopted matching cost function itself and in the way the matching cost is aggregated, either globally or locally. A detailed taxonomy of stereo correspondence algorithms can be found in the literature. Due to their high computational complexity, global optimization methods usually exploit a reduced number of views. Multi-resolution research: In flat, uniform regions the gradient of the image can be too small, so that a large ambiguity in the computation of the minimum of Eq. (32) may persist even when a large window is adopted. This fact has been extensively confirmed by the experimental activity. This effect can be mitigated by applying the depth map estimator to the image array at lower resolution. Given a view L^(p,q)(x), let us denote by L^(p,q)_{↓2^k}(x) its version downsampled by a factor of 2^k. Then, starting from the highest resolution, to determine whether the functional obtained by applying Eq. (32) to L^(p,q)_{↓2^k}(x) is prone to ambiguity, we count how many of the quantized depths z_m ∈ Z_Q yield a functional value falling within a small range above its minimum. Maximum Likelihood: The rest of the paper is organized as follows.
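The ambiguity test of the multi-resolution logic above (counting how many quantized depths yield a functional value within a small range above the minimum) can be sketched as follows. The threshold parameter `rel_range`, the near-minimum count `max_near`, and the block-averaging downsampler are illustrative assumptions standing in for the paper's exact operators.

```python
import numpy as np

def is_ambiguous(costs, rel_range=0.05, max_near=3):
    """Flag an ill-conditioned minimization: count how many quantized depth
    hypotheses yield a cost within a small range above the minimum; more than
    `max_near` near-minima means the matching functional is too flat to
    localize the depth reliably at this resolution."""
    c = np.asarray(costs, dtype=float)
    cmin, cmax = c.min(), c.max()
    thresh = cmin + rel_range * (cmax - cmin + 1e-12)
    return int(np.sum(c <= thresh)) > max_near

def downsample2(img):
    """Dyadic downsampling by 2x2 block averaging, a simple stand-in for the
    low-pass-and-decimate operator producing L^(p,q)_{down 2^k}."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    b = img[:h, :w]
    return 0.25 * (b[0::2, 0::2] + b[1::2, 0::2] + b[0::2, 1::2] + b[1::2, 1::2])
```

Pixels whose cost profile is flagged as ambiguous at the current resolution would then be re-estimated on the downsampled array, applying the same test at each coarser level.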
In Section II the ML depth estimation method is described, while its simplified EPI version, based on the local estimation of the depth field maximizing the average log-likelihood spatial density, is described in Section III. In Section IV, the logic adopted to handle occlusions is presented. In Section V, the adaptive logics employed for the selection of the smoothing window used in the local estimate and for the selection of the depth map estimates at different resolutions are presented. In Section VI the experimental tests performed for evaluating the effectiveness of the proposed
method are described and discussed. Finally, Section VII draws the conclusions. In the Appendix, the computation of the Fisher information exploited in the proposed method is detailed. Nevertheless, ambiguity in the selection of the maxima of the likelihood functional in flat, uniform areas still persists. This fact motivated the adaptive control of the spatial bandwidth as a mitigation of the effects induced by the lack of gradient energy.
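The per-pixel selection among depth estimates at different resolutions, mentioned above for Section V, can be sketched as follows. This assumes the per-level depth maps and ill-conditioning masks have already been brought back to full resolution; the name `select_finest` and the simple overwrite rule are assumptions for illustration, not the paper's exact selection logic.

```python
import numpy as np

def select_finest(estimates, ambiguous):
    """Per-pixel multi-resolution selection.

    estimates: depth maps from finest (index 0) to coarsest, all upsampled to
    full resolution. ambiguous: ill-conditioning masks for every level except
    the coarsest. The coarsest estimate is the unconditional fallback; finer
    levels overwrite it wherever their local maximization was
    well-conditioned, with the finest well-conditioned level winning."""
    out = estimates[-1].copy()
    # Apply coarser levels first, finer levels last, so the finest
    # well-conditioned estimate ends up in the output.
    for est, amb in list(zip(estimates[:-1], ambiguous))[::-1]:
        out[~amb] = est[~amb]
    return out
```

This keeps the full-resolution accuracy at edges and textured areas, while flat regions, where the fine-scale functional is ambiguous, inherit the more stable coarse-scale depth.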