A method and system may be used to provide a user interface allowing a user to identify electronic devices by placing bounding boxes on a video of a scene. Each bounding box may identify one or more electronic devices. Coordinates of the bounding boxes may be stored to allow determining when a user is gesturing towards one of the bounding boxes. The coordinates of the bounding boxes may be updated by a computer vision process that determines whether one or more of the electronic devices have been moved. When it is detected that a user is gesturing towards a bounding box, an electronic device associated with that bounding box may be controlled.
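The flow described above (store bounding-box coordinates, update them when a device moves, and control the device whose box a gesture targets) might be sketched as follows. This is an illustrative sketch only, not the claimed implementation. It assumes the gesture has already been reduced to a single target point in video-frame pixel coordinates, and every name in it (`BoundingBox`, `DeviceRegistry`, `set_box`, `device_at`, `handle_gesture`) is hypothetical. The computer vision process and the gesture detector are outside its scope.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class BoundingBox:
    # Pixel coordinates in the video frame (assumed convention):
    # (x_min, y_min) is the top-left corner, (x_max, y_max) the bottom-right.
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, point: Point) -> bool:
        """Return True if the point lies inside or on the edge of the box."""
        x, y = point
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


class DeviceRegistry:
    """Hypothetical store mapping a device identifier to its bounding box."""

    def __init__(self) -> None:
        self._boxes: Dict[str, BoundingBox] = {}

    def set_box(self, device_id: str, box: BoundingBox) -> None:
        # Used both when the user first places a box and when a vision
        # process reports that the device has moved.
        self._boxes[device_id] = box

    def device_at(self, point: Point) -> Optional[str]:
        # Assumption: if boxes overlap, the earliest registered device wins.
        for device_id, box in self._boxes.items():
            if box.contains(point):
                return device_id
        return None

    def handle_gesture(
        self, point: Point, control: Callable[[str], None]
    ) -> Optional[str]:
        """Invoke `control` on the device the gesture targets, if any."""
        device_id = self.device_at(point)
        if device_id is not None:
            control(device_id)
        return device_id
```

A caller might register a box with `set_box("lamp", BoundingBox(100, 50, 200, 150))`, call `set_box` again with new coordinates if the lamp is detected at a new position, and pass each gesture target point to `handle_gesture` together with a callback that issues the actual control command.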