Patent attributes
Methods and systems for resolving entities using multi-modal functionality are described herein. Voice-activated electronic devices may, in some embodiments, be capable of displaying content using a display screen. Contextual metadata representing the content rendered by the display screen may describe entities having attributes similar to those of an intent identified by natural language understanding processing. When natural language understanding processing attempts to resolve one or more declared slots for a particular intent, matching slots from the contextual metadata may be determined, and the matching entities may be placed in an intent selected context file to be included with the output data of the natural language understanding processing. The output data may be provided to a corresponding application for causing one or more actions to be performed.
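As one illustrative, non-limiting sketch, the slot matching described above may be expressed in Python as follows. The class names (`Entity`, `Intent`), the function names (`select_context`, `build_output`), the dictionary layout of the output data, and the matching rule itself (an entity matches when it shares at least one slot name with the intent and does not contradict any slot value already filled) are assumptions made for illustration and are not taken from the source.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entity:
    """A displayed item described by contextual metadata (hypothetical shape)."""
    name: str
    slots: Dict[str, str]  # slot name -> value shown on the display screen


@dataclass
class Intent:
    """An intent identified by NLU processing (hypothetical shape)."""
    name: str
    # Declared slot name -> value; None means the slot is still unresolved.
    declared_slots: Dict[str, Optional[str]]


def select_context(intent: Intent, displayed: List[Entity]) -> List[Entity]:
    """Return displayed entities whose slots match the intent's declared slots.

    Assumed rule: the entity must share at least one slot name with the
    intent, and every shared slot that is already filled must agree
    (case-insensitively) with the entity's value.
    """
    matches = []
    for entity in displayed:
        shared = set(intent.declared_slots) & set(entity.slots)
        if not shared:
            continue  # entity has no attributes in common with the intent
        consistent = all(
            intent.declared_slots[slot] is None
            or intent.declared_slots[slot].lower() == entity.slots[slot].lower()
            for slot in shared
        )
        if consistent:
            matches.append(entity)
    return matches


def build_output(intent: Intent, displayed: List[Entity]) -> dict:
    """Bundle the intent with its selected context, standing in for the
    output data that would be handed to the corresponding application."""
    return {
        "intent": intent.name,
        "slots": dict(intent.declared_slots),
        "selected_context": [
            {"name": e.name, "slots": dict(e.slots)}
            for e in select_context(intent, displayed)
        ],
    }
```

For example, given a hypothetical `PlayMusicIntent` whose `song_name` slot is filled with "song one" and whose `artist_name` slot is unresolved, and a display screen listing two songs, only the list item whose `song_name` agrees would be placed in the selected context. The application receiving the output data could then read the missing `artist_name` from that entity.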