One embodiment provides a method including: receiving, at a device, voice input comprising at least one command; identifying, using an image of a user, a direction of user focus; and, responsive to identifying that the direction of user focus is directed toward the device, performing an action based on the at least one command. Other aspects are described and claimed.
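
By way of illustration only, the gating logic of the method above might be sketched as follows. This is a minimal sketch and not the claimed implementation. It assumes that speech recognition has already turned the voice input into a command string and that image analysis has already produced an estimate of the direction of user focus as yaw and pitch angles relative to the device. The names `FocusEstimate`, `is_focused_on_device`, and `handle_voice_input`, the 15 degree threshold, and the action table are all hypothetical and do not come from the source.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class FocusEstimate:
    """Hypothetical output of image analysis: direction of user focus,
    in degrees, relative to the axis pointing from the user to the device."""
    yaw_deg: float
    pitch_deg: float


def is_focused_on_device(focus: FocusEstimate, max_angle_deg: float = 15.0) -> bool:
    """Treat the user as focused on the device when both angles fall within
    an assumed tolerance (the 15 degree default is illustrative only)."""
    return (abs(focus.yaw_deg) <= max_angle_deg
            and abs(focus.pitch_deg) <= max_angle_deg)


def handle_voice_input(
    command: str,
    focus: Optional[FocusEstimate],
    actions: Dict[str, Callable[[], str]],
    max_angle_deg: float = 15.0,
) -> Optional[str]:
    """Perform the action for `command` only if the direction of user focus
    is toward the device; otherwise ignore the voice input and return None."""
    # No usable image (e.g. no face found) is treated as "not focused".
    if focus is None or not is_focused_on_device(focus, max_angle_deg):
        return None
    action = actions.get(command)
    if action is None:
        return None  # Unrecognized command: nothing to perform.
    return action()


# Example action table (hypothetical commands).
ACTIONS: Dict[str, Callable[[], str]] = {
    "play music": lambda: "playing music",
    "stop": lambda: "stopped",
}

if __name__ == "__main__":
    looking_at_device = FocusEstimate(yaw_deg=3.0, pitch_deg=-2.0)
    looking_away = FocusEstimate(yaw_deg=40.0, pitch_deg=0.0)
    print(handle_voice_input("play music", looking_at_device, ACTIONS))
    print(handle_voice_input("play music", looking_away, ACTIONS))
```

In this sketch the focus check acts as a gate in place of a spoken trigger phrase: the same voice input is acted on when the user is looking toward the device and ignored otherwise. How the direction of focus is derived from the image is left outside the sketch.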