Patent attributes
The described techniques relate to predicting object behavior based on top-down representations of an environment, including top-down representations of image features in the environment. For example, a top-down representation may comprise a multi-channel image that includes semantic map information along with additional information for a target object and/or other objects in the environment. A top-down image feature representation may also be a multi-channel image, one that incorporates tensors for different image features into channels of the multi-channel image, and may be generated directly from an input image. A prediction component can generate predictions of object behavior based at least in part on the top-down image feature representation and, in some cases, based on the top-down image feature representation together with the additional top-down representation.
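As a rough illustration only, the multi-channel layout described above might be sketched as follows. The grid size, the channel ordering, the box coordinates, and the helper names (`rasterize_boxes`, `stack_channels`) are all hypothetical and do not come from the described techniques. The image feature channels are filled with placeholder values, since the source does not specify how they are computed from an input image, and no prediction component is modeled here.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]       # one H x W channel
MultiChannel = List[Grid]      # C channels, each H x W
Box = Tuple[int, int, int, int]  # (row0, col0, row1, col1), end-exclusive


def rasterize_boxes(h: int, w: int, boxes: Sequence[Box]) -> Grid:
    """Return an H x W grid with 1.0 in every cell covered by a box."""
    grid = [[0.0] * w for _ in range(h)]
    for r0, c0, r1, c1 in boxes:
        # Clip each box to the grid so out-of-range boxes are harmless.
        for r in range(max(r0, 0), min(r1, h)):
            for c in range(max(c0, 0), min(c1, w)):
                grid[r][c] = 1.0
    return grid


def stack_channels(*groups: MultiChannel) -> MultiChannel:
    """Concatenate channel groups along the channel axis."""
    out: MultiChannel = []
    for group in groups:
        for channel in group:
            same_shape = not out or (
                len(channel) == len(out[0])
                and len(channel[0]) == len(out[0][0])
            )
            if not same_shape:
                raise ValueError("all channels must share the same H x W")
            out.append(channel)
    return out


H, W = 4, 6  # hypothetical grid resolution

# Top-down representation: semantic map plus object channels (assumed order).
semantic = rasterize_boxes(H, W, [(1, 0, 3, 6)])  # e.g. a drivable lane
target = rasterize_boxes(H, W, [(1, 1, 2, 2)])    # the target object
others = rasterize_boxes(H, W, [(2, 4, 3, 5)])    # another object
top_down = [semantic, target, others]

# Placeholder for a top-down image feature representation (two channels).
image_features = [
    [[0.1 * (r + c) for c in range(W)] for r in range(H)] for _ in range(2)
]

# Combined input that a prediction component could consume.
combined = stack_channels(top_down, image_features)
shape = (len(combined), len(combined[0]), len(combined[0][0]))
```

Stacking along the channel axis keeps every channel aligned on the same spatial grid, which is one simple way the two representations could be used together.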