Patent attributes
Systems and processes for operating a digital assistant are provided. An example process for performing a task includes, at an electronic device having one or more processors and memory, receiving a spoken input including a request, receiving an image input including a plurality of objects, selecting a reference resolution module of a plurality of reference resolution modules based on the request and the image input, determining, with the selected reference resolution module, whether the request references a first object of the plurality of objects based on at least the spoken input, and in accordance with a determination that the request references the first object of the plurality of objects, determining a response to the request including information about the first object.