A method for creating an augmented reality scene, the method comprising, by a computing device with a processor and a memory, receiving a first video image data and a second video image data; calculating an error value for a current pose between the two images by comparing the pixel colors in the first video image data and the second video image data; warping pixel coordinates into a second video image data through the use of the map of depth hypotheses for each pixel; varying the pose between the first video image data and the second video image data to find a warp that corresponds to a minimum error value; calculating, using the estimated poses, a new depth measurement for each pixel that is visible in both the first video image data and the second video image data.