A depth image of an object can be input to a deep neural network to determine a first four-degree-of-freedom (4-DoF) pose of the object. The first 4-DoF pose and a three-dimensional model of the object can be input to a silhouette rendering program to determine a first two-dimensional silhouette of the object. A second two-dimensional silhouette of the object can be determined by thresholding the depth image. A loss function can be determined by comparing the first two-dimensional silhouette to the second two-dimensional silhouette. Parameters of the deep neural network can be optimized based on the loss function, and the trained deep neural network can be output.
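The thresholding and comparison steps described above can be illustrated with a minimal sketch. It makes several assumptions that the text does not state: the function names `silhouette_from_depth` and `silhouette_loss` and the parameter `max_depth` are hypothetical, the threshold treats any valid pixel closer than `max_depth` as part of the object, and the comparison is taken to be one minus intersection-over-union, which is only one possible choice. The deep neural network and the silhouette rendering program are outside the scope of the sketch; the rendered silhouette is supplied as a plain array.

```python
import numpy as np


def silhouette_from_depth(depth, max_depth):
    """Return a binary silhouette from a depth image.

    Assumption: a pixel belongs to the object when its depth reading is
    valid (greater than zero) and closer than max_depth.
    """
    depth = np.asarray(depth, dtype=np.float64)
    return ((depth > 0.0) & (depth < max_depth)).astype(np.float64)


def silhouette_loss(rendered, thresholded):
    """Return 1 - IoU for two silhouettes with pixel values in [0, 1].

    Assumption: IoU is used here purely for illustration; the source only
    says the two silhouettes are compared.
    """
    rendered = np.asarray(rendered, dtype=np.float64)
    thresholded = np.asarray(thresholded, dtype=np.float64)
    intersection = np.sum(rendered * thresholded)
    union = np.sum(rendered + thresholded - rendered * thresholded)
    if union == 0.0:
        # Both silhouettes are empty, so they agree trivially.
        return 0.0
    return float(1.0 - intersection / union)


# Toy 4x4 depth image: background at 5.0, a 2x2 object at 1.0.
depth = np.full((4, 4), 5.0)
depth[1:3, 1:3] = 1.0
second_silhouette = silhouette_from_depth(depth, max_depth=2.0)

# Stand-in for the rendered (first) silhouette, one column too wide.
first_silhouette = np.zeros((4, 4))
first_silhouette[1:3, 1:4] = 1.0

loss = silhouette_loss(first_silhouette, second_silhouette)
```

In this toy case the two silhouettes overlap on four pixels and cover six pixels in total, so the loss is 1 - 4/6. A loss of zero would indicate that the rendered silhouette matches the thresholded one exactly. To optimize network parameters with gradient descent, the rendered silhouette would need to come from a differentiable renderer, which this sketch does not attempt.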