Patent attributes
An actuator and end effector are controlled according to images from cameras having a surface in their field of view. Vessels (cups, bowls, etc.) and other objects are identified in the images and their configuration is assigned to a finite set of categories by a classifier that does not output a 3D bounding box or determine a 6D pose. For objects assigned to a first subset of categories, grasping parameters for controlling the actuator and end effector are determined using only 2D bounding boxes, such as oriented 2D bounding boxes. For objects not assigned to the first subset, a righting operation may be performed using only 2D bounding boxes. Objects that are still not in the first set may then be grasped by estimating a 3D bounding box and 6D pose.