Techniques are provided for estimating a three-dimensional (3D) pose of an object. An image including the object can be obtained, and a plurality of two-dimensional (2D) projections of a 3D bounding box of the object in the image can be determined. The plurality of 2D projections of the 3D bounding box can be determined by applying a trained regressor to the image. The trained regressor is trained, based on a plurality of training images, to predict 2D projections of the 3D bounding box of the object in a plurality of poses. The 3D pose of the object is estimated using the plurality of 2D projections of the 3D bounding box.
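
As an illustration only, the quantity the regressor is trained to predict can be sketched as follows: the eight corners of an object's 3D bounding box are transformed by a known pose and projected into the image. The sketch assumes a simple pinhole camera model, which the text above does not specify, and every name in it (`box_corners`, `rotation_y`, `project_corners`, the focal length and principal point values) is hypothetical rather than taken from the described techniques.

```python
import math
from itertools import product


def box_corners(width, height, depth):
    """Return the eight corners of a 3D bounding box centered at the object origin."""
    hw, hh, hd = width / 2.0, height / 2.0, depth / 2.0
    return [(sx * hw, sy * hh, sz * hd)
            for sx, sy, sz in product((-1.0, 1.0), repeat=3)]


def rotation_y(angle_rad):
    """Return a 3x3 rotation matrix for a rotation about the y axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]


def project_corners(corners, rotation, translation, focal, cx, cy):
    """Project 3D corners into the image under an assumed pinhole camera.

    rotation (3x3) and translation (length 3) together form the object pose
    in the camera frame. focal, cx, and cy are assumed camera intrinsics.
    """
    projections = []
    for corner in corners:
        # Transform the corner from the object frame to the camera frame.
        cam = [sum(rotation[r][k] * corner[k] for k in range(3)) + translation[r]
               for r in range(3)]
        if cam[2] <= 0.0:
            raise ValueError("corner lies behind the camera")
        # Perspective division followed by the shift to pixel coordinates.
        u = focal * cam[0] / cam[2] + cx
        v = focal * cam[1] / cam[2] + cy
        projections.append((u, v))
    return projections


# Example: a 2 x 2 x 2 box placed 5 units in front of the camera, unrotated.
corners = box_corners(2.0, 2.0, 2.0)
projections = project_corners(corners, rotation_y(0.0), (0.0, 0.0, 5.0),
                              focal=100.0, cx=320.0, cy=240.0)
```

Each pose of the object yields one such set of eight 2D points, so a training image with a known pose could be paired with its projected corners as a regression target. Going the other way, from predicted 2D projections and the known 3D corners back to a pose, is commonly treated as a Perspective-n-Point problem (for example with OpenCV's `solvePnP`); the text above does not state which solver is used.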