Accurate, efficient, and noise-robust method to compute a relative depth map from an image or video sequence captured by a single camera (monocular).



In the context of computer vision, the monocular depth estimation problem can be defined as inferring the depth order of the objects present in a scene using only information from a single camera (an image or a video sequence). In general, the extracted depth relations are relative: it can only be concluded that one object is closer to, farther from, or at the same depth as another object with respect to the camera, and no information about its absolute depth can be extracted.
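To illustrate what a relative (ordinal) depth result looks like, the sketch below turns hypothetical pairwise relations of the form "object A is closer to the camera than object B" into depth levels. It is not the method described in this fact sheet, which derives such relations from perceptual cues in the image; the function name `depth_levels` and the example objects are made up for illustration. It relies only on Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter, CycleError


def depth_levels(in_front_of):
    """Assign ordinal depth levels from pairwise relations.

    in_front_of: iterable of (near, far) pairs, meaning `near` is closer to
    the camera than `far`. Returns a dict mapping object -> level, where
    level 0 is nearest. Levels encode order only, never absolute distance.
    Raises graphlib.CycleError if the relations contradict each other.
    """
    # graphlib expects node -> predecessors; the predecessors of a far
    # object are the objects known to be in front of it.
    graph = {}
    for near, far in in_front_of:
        graph.setdefault(far, set()).add(near)
        graph.setdefault(near, set())

    # static_order() yields every predecessor before the node itself.
    levels = {}
    for obj in TopologicalSorter(graph).static_order():
        # An object sits one level behind the deepest object in front of it.
        levels[obj] = max((levels[p] + 1 for p in graph[obj]), default=0)
    return levels
```

For the relations `("person", "table")`, `("table", "wall")` and `("lamp", "wall")`, this assigns `person` and `lamp` to level 0, `table` to level 1 and `wall` to level 2. Objects sharing a level are merely unordered with respect to each other in this sketch, and nothing in the result says how far away any object actually is, which matches the relative nature of the problem.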

The problem is fundamental because of the impact that relative depth information (the position of an object compared to other objects in the scene) can have on high-level or semantic image processing and computer vision systems and applications, including the generation of 3D content from 2D sources.



Practical and efficient solution for the computation of relative depth information in still images and video sequences. Multimedia material available at:

The technology extracts depth information at a low level, so no knowledge or understanding of the image content is required. It is based on a mathematical model that quantitatively encodes perceptual depth cues at different scales, such as convexity/concavity, inclusion, and T-junctions, leading to an interpretation that is consistent with the perception of the human visual system. The model is easy to interpret and can be tuned to a specified visual response.
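The cues are evaluated at different scales, but the pyramid used by the actual implementation is not described here. As a generic illustration only, the sketch below builds a simple multi-resolution pyramid by repeatedly averaging 2x2 pixel blocks; a cue detector (not shown, and not specified by this fact sheet) could then be run on each level. The function names are hypothetical, and plain block averaging is an assumption chosen for simplicity in place of the Gaussian smoothing often used for image pyramids.

```python
def downsample(img):
    """Halve resolution by averaging non-overlapping 2x2 blocks.

    img: grayscale image as a list of equal-length rows of numbers.
    A trailing odd row or column is dropped.
    Assumption: plain block averaging instead of Gaussian smoothing.
    """
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [
            (img[2 * i][2 * j] + img[2 * i][2 * j + 1]
             + img[2 * i + 1][2 * j] + img[2 * i + 1][2 * j + 1]) / 4
            for j in range(w)
        ]
        for i in range(h)
    ]


def build_pyramid(img, levels):
    """Return [full resolution, 1/2, 1/4, ...] with at most `levels` entries.

    Stops early once a level is too small to halve again.
    """
    pyramid = [img]
    for _ in range(levels - 1):
        top = pyramid[-1]
        if len(top) < 2 or len(top[0]) < 2:
            break
        pyramid.append(downsample(top))
    return pyramid
```

Coarse levels summarise large image regions, so a cue computed there responds to large structures, while the full-resolution level preserves fine detail; processing the small coarse levels is also cheap, which is the usual motivation for a pyramidal design.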



  • Accurate, noise-robust, and temporally consistent dense maps of relative depth, without compromising the performance of the overall system.
  • Efficient (pyramidal) implementation, easy video extension, simple configuration, and fast performance.
  • Significant improvement in accuracy and efficiency over state-of-the-art approaches.




MATLAB implementation for still images and video sequences.



High-level applications in media, entertainment, security, and telecommunications: object detection and recognition, conversion of 2D video content to 3D, multi-camera view generation or interpolation, video editing, advertisement insertion in video content, and seamless visual effects. Also suitable for applications with computational or time (real-time) constraints.



Technology available for licensing and codevelopment.



Marc Santandreu
Technology Transfer Unit 
(+34) 93 542 2896
[email protected]




Monocular depth, depth order, occlusion detection, single camera



Ref: TEC-0071


Fact Sheet