A method for real-time visual odometry comprises capturing a first three-dimensional image of a location at a first time, capturing a second three-dimensional image of the location at a second time that is later than the first time, and extracting one or more features and their descriptors from each of the first and second three-dimensional images. One or more features from the first three-dimensional image are then matched with one or more features from the second three-dimensional image. The method further comprises determining changes in rotation and translation between the first and second three-dimensional images from the first time to the second time using a random sample consensus (RANSAC) process and a unique iterative refinement technique.