Patent attributes
A plurality of temporally successive vehicle sensor images are received as input to a variational autoencoder neural network that outputs an averaged semantic birds-eye view image, in which each pixel is determined by averaging the semantic class values of the corresponding pixels in the respective images of the plurality of temporally successive vehicle sensor images. Based on the averaged semantic birds-eye view image, a topological node closest to the vehicle is determined from a plurality of topological nodes that each specify a respective real-world location, together with a three degree-of-freedom pose for the vehicle relative to that topological node. A real-world three degree-of-freedom pose for the vehicle is determined by combining the three degree-of-freedom pose for the vehicle relative to the topological node closest to the vehicle with the real-world location of that topological node.
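The final combining step can be illustrated with a minimal sketch. The abstract does not state how the relative pose and the node location are combined, so the following assumes a planar rigid-body composition in which a three degree-of-freedom pose is (x, y, yaw) and each topological node carries a heading in addition to its location. The names `Pose2D`, `wrap_angle`, and `compose_pose` are hypothetical and are introduced only for illustration; if nodes are aligned with the world axes, the node yaw would simply be zero.

```python
import math
from typing import NamedTuple


class Pose2D(NamedTuple):
    """Hypothetical planar pose: position in meters, yaw in radians."""
    x: float
    y: float
    yaw: float


def wrap_angle(angle: float) -> float:
    """Wrap an angle to the closed interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def compose_pose(node: Pose2D, relative: Pose2D) -> Pose2D:
    """Express a node-relative vehicle pose in the real-world frame.

    Assumption: `node` gives the real-world location of the topological
    node plus an assumed node heading; `relative` is the vehicle pose
    measured in that node's frame.
    """
    c, s = math.cos(node.yaw), math.sin(node.yaw)
    # Rotate the relative offset into the world frame, then translate.
    x = node.x + c * relative.x - s * relative.y
    y = node.y + s * relative.x + c * relative.y
    return Pose2D(x, y, wrap_angle(node.yaw + relative.yaw))


# Example: a node at (10, 5) facing +y, vehicle 2 m ahead of the node.
node_pose = Pose2D(10.0, 5.0, math.pi / 2)
relative_pose = Pose2D(2.0, 0.0, 0.0)
world_pose = compose_pose(node_pose, relative_pose)
```

In this example the vehicle lies 2 m along the node's forward axis, which points along world +y, so the resulting world position is approximately (10, 7) with a yaw of pi/2.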