Patent attributes
One embodiment provides techniques for automatically pre-labeling point cloud data with cuboid annotations. The point cloud data is processed using machine learning (ML) models that detect, associate, and localize objects in it, producing cuboid tracks; each cuboid track is a series of cuboid annotations associated with one object. An object detection model, which detects objects and performs coarse localization, is trained using a loss function that evaluates the distances between the corners of predicted cuboids and the corners of ground truth cuboids separately for position, size, and yaw. A refinement model then performs more accurate localization. It takes as input the cuboid tracks predicted by the object detection model, together with 2D projections of the regions surrounding those tracks, and outputs refined cuboid tracks. The refined cuboid tracks are filtered down to a set of keyframes, and the cuboids for the in-between frames are interpolated. The resulting cuboid tracks can then be presented to a user for viewing and editing.
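The corner-based loss that evaluates position, size, and yaw separately can be illustrated with a minimal NumPy sketch. It assumes each cuboid is parameterized by a center `(x, y, z)`, a size `(l, w, h)`, and a yaw about the z-axis. Each term rebuilds the predicted cuboid with only one parameter group taken from the prediction and the rest taken from the ground truth, then measures the mean corner-to-corner distance. The function names and this exact decomposition are illustrative assumptions, not the embodiment's actual implementation. A real training loop would express the same computation in a differentiable framework.

```python
import numpy as np

def cuboid_corners(center, size, yaw):
    """Return the 8 corners (8x3) of a cuboid rotated by `yaw` about the z-axis."""
    l, w, h = size
    # All sign combinations of the half-extents give the corners in the local frame.
    signs = np.array([[sx, sy, sz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)],
                     dtype=float)
    local = signs * np.array([l, w, h], dtype=float) / 2.0
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    # Rotate the local corners, then translate them to the cuboid center.
    return local @ rot.T + np.asarray(center, dtype=float)

def corner_distance(a, b):
    """Mean Euclidean distance between corresponding corners of two cuboids."""
    return float(np.mean(np.linalg.norm(a - b, axis=1)))

def decoupled_corner_loss(pred, gt):
    """Corner loss split into position, size, and yaw terms (hypothetical sketch).

    pred, gt: dicts with 'center' (3,), 'size' (3,), and 'yaw' (radians).
    Each term substitutes only one predicted parameter group into the ground truth.
    """
    gt_corners = cuboid_corners(gt["center"], gt["size"], gt["yaw"])
    position = corner_distance(cuboid_corners(pred["center"], gt["size"], gt["yaw"]), gt_corners)
    size = corner_distance(cuboid_corners(gt["center"], pred["size"], gt["yaw"]), gt_corners)
    yaw = corner_distance(cuboid_corners(gt["center"], gt["size"], pred["yaw"]), gt_corners)
    return {"position": position, "size": size, "yaw": yaw}
```

Keeping the three terms apart means an error in one group, such as yaw, is not masked or amplified by errors in the others. A scalar training loss could then be formed by summing, or weighting, the three values.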
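The abstract does not say how the 2D projections fed to the refinement model are formed. One plausible choice is a bird's-eye-view (BEV) occupancy grid centered on, and aligned with, a predicted cuboid. The sketch below is entirely an assumption: the function `bev_crop` and its `extent` and `resolution` parameters are hypothetical. It transforms points into the cuboid's frame and marks the grid cells that contain at least one point.

```python
import numpy as np

def bev_crop(points, center, yaw, extent=8.0, resolution=0.25):
    """Rasterize points near a cuboid into a square bird's-eye-view occupancy grid.

    points: (N, 3) array; center: (3,); yaw: radians. The grid is centered on the
    cuboid, aligned with its heading, and `extent` meters on a side (hypothetical).
    """
    # Express the points' xy coordinates in the cuboid frame: translate, then rotate by -yaw.
    c, s = np.cos(-yaw), np.sin(-yaw)
    xy = np.asarray(points, dtype=float)[:, :2] - np.asarray(center[:2], dtype=float)
    local = xy @ np.array([[c, s], [-s, c]])
    # Map local coordinates in [-extent/2, extent/2) to integer cell indices.
    n = int(round(extent / resolution))
    idx = np.floor((local + extent / 2.0) / resolution).astype(int)
    keep = np.all((idx >= 0) & (idx < n), axis=1)
    grid = np.zeros((n, n), dtype=np.float32)
    grid[idx[keep, 0], idx[keep, 1]] = 1.0  # occupied cells
    return grid
```

Because the grid is aligned with the cuboid's heading, one such crop could be produced for each cuboid in a track. The stacked crops, together with the track itself, would then form the refinement model's input.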
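Filtering a track to keyframes and interpolating the in-between frames can be sketched as follows. The selection rule is an assumption, because the abstract does not specify one: it greedily extends each segment for as long as linear interpolation between its endpoints reproduces every intermediate cuboid within a tolerance `tol`. Cuboids are assumed to be `(x, y, z, l, w, h, yaw)` rows. Yaw is interpolated along the shortest arc so that it does not swing the long way around near ±π. All function names are hypothetical.

```python
import numpy as np

def wrap_angle(a):
    """Map an angle to [-pi, pi)."""
    return (a + np.pi) % (2 * np.pi) - np.pi

def lerp_cuboid(a, b, alpha):
    """Interpolate two (x, y, z, l, w, h, yaw) cuboids; yaw follows the shortest arc."""
    out = a + alpha * (b - a)
    out[6] = wrap_angle(a[6] + alpha * wrap_angle(b[6] - a[6]))
    return out

def segment_error(track, i, j):
    """Largest parameter deviation if frames i+1..j-1 were interpolated from i and j."""
    err = 0.0
    for t in range(i + 1, j):
        diff = lerp_cuboid(track[i], track[j], (t - i) / (j - i)) - track[t]
        diff[6] = wrap_angle(diff[6])
        err = max(err, float(np.max(np.abs(diff))))
    return err

def select_keyframes(track, tol=0.05):
    """Greedily keep the fewest frames whose interpolation stays within `tol` (hypothetical rule)."""
    track = np.asarray(track, dtype=float)
    keys, i = [0], 0
    while i < len(track) - 1:
        j = i + 1
        # Extend the segment while interpolating across it remains accurate enough.
        while j + 1 < len(track) and segment_error(track, i, j + 1) <= tol:
            j += 1
        keys.append(j)
        i = j
    return keys

def interpolate_track(track, keys):
    """Rebuild every non-keyframe cuboid by interpolating between neighboring keyframes."""
    track = np.asarray(track, dtype=float)
    out = track.copy()
    for i, j in zip(keys, keys[1:]):
        for t in range(i + 1, j):
            out[t] = lerp_cuboid(track[i], track[j], (t - i) / (j - i))
    return out
```

For an object moving at constant velocity, only the first and last frames survive as keyframes. A change in direction forces an additional keyframe at the turn, which leaves the user fewer cuboids to review and edit.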