Object localization inside a cabin of a vehicle is provided. A camera image of the cabin is received from an image sensor mounted at a first location with respect to a plurality of seating zones of the vehicle. Object detection is performed on the camera image to identify one or more objects in it. A machine-learning model trained on images taken at the first location is then used to place the one or more objects into the seating zones of the vehicle, according to a plurality of bounding boxes that correspond to the plurality of seating zones for the first location.
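The zone-placement step can be illustrated with a minimal sketch. It assumes an upstream object detector has already produced axis-aligned pixel boxes, and it swaps in a plain geometric overlap rule in place of the trained machine-learning model: each detected object is assigned to the seating-zone bounding box that covers the largest fraction of it. The zone names, coordinates, the `first_location` key, and the helpers `ZONES_BY_LOCATION`, `assign_zone`, and `localize` are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates: (x1, y1) top-left, (x2, y2) bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)

    def intersection(self, other: "Box") -> float:
        # Overlap width/height clipped at zero when the boxes do not touch.
        w = min(self.x2, other.x2) - max(self.x1, other.x1)
        h = min(self.y2, other.y2) - max(self.y1, other.y1)
        return max(0.0, w) * max(0.0, h)


# Hypothetical seating-zone boxes for one sensor location in a 640x480 image.
# The mapping from image region to seat is illustrative; a real layout depends
# on where the camera is mounted and which way it faces.
ZONES_BY_LOCATION: Dict[str, Dict[str, Box]] = {
    "first_location": {
        "front_left": Box(0, 0, 320, 300),
        "front_right": Box(320, 0, 640, 300),
        "rear_left": Box(0, 300, 213, 480),
        "rear_middle": Box(213, 300, 426, 480),
        "rear_right": Box(426, 300, 640, 480),
    }
}


def assign_zone(obj: Box, zones: Dict[str, Box], min_overlap: float = 0.5) -> Optional[str]:
    """Return the zone covering the largest fraction of the object box, or None
    if no zone covers at least `min_overlap` of it (a stand-in for the trained model)."""
    if obj.area() == 0:
        return None
    best_zone, best_frac = None, 0.0
    for name, zone in zones.items():
        frac = obj.intersection(zone) / obj.area()
        if frac > best_frac:
            best_zone, best_frac = name, frac
    return best_zone if best_frac >= min_overlap else None


def localize(detections: List[Box], location: str) -> List[Optional[str]]:
    """Place each detected object into a seating zone using the zone boxes
    defined for the sensor location that captured the image."""
    zones = ZONES_BY_LOCATION[location]
    return [assign_zone(d, zones) for d in detections]


if __name__ == "__main__":
    dets = [Box(50, 50, 250, 250), Box(250, 350, 400, 470), Box(700, 0, 800, 100)]
    print(localize(dets, "first_location"))
```

Keying the zone boxes by sensor location mirrors the abstract's point that the zone definitions are specific to the first location; moving the camera would require a different set of boxes (or, in the described approach, a model trained on images from the new location).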