Enhanced camera-based input, in which a detection region surrounding a user is defined in an image of the user within a scene, and a position of an object (such as a hand) within the detection region is detected. Additionally, a control (such as a key of a virtual keyboard) in a user interface is interacted with based on the detected position of the object.