The present disclosure describes techniques for implementing user interface interaction. The disclosed techniques comprise playing a video on an interface; monitoring user input performed on the interface; determining a target frame based at least in part on the user input; determining a location where the user input is performed on the interface; determining whether the location of the user input is in a predetermined area of the target frame, wherein the predetermined area is associated with at least one object in the target frame; and implementing an operation associated with the at least one object in response to determining that the location of the user input is in the predetermined area of the target frame.
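By way of non-limiting illustration, the steps recited above may be sketched as follows. The sketch assumes a constant frame rate, a rectangular predetermined area, and an input location already expressed in frame pixel coordinates; none of these are specified by the disclosure. The names `Region`, `target_frame_index`, and `handle_input` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Region:
    """Hypothetical predetermined area: an axis-aligned rectangle in
    frame pixel coordinates, associated with one object in the frame."""
    x: float
    y: float
    width: float
    height: float
    object_id: str

    def contains(self, px: float, py: float) -> bool:
        # Left/top edges are inclusive, right/bottom edges exclusive.
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def target_frame_index(timestamp_s: float, fps: float) -> int:
    """Determine the target frame from the playback time at which the
    user input occurred (assumption: constant frame rate)."""
    return int(timestamp_s * fps)


def handle_input(
    timestamp_s: float,
    x: float,
    y: float,
    fps: float,
    regions_by_frame: Dict[int, List[Region]],
    operations: Dict[str, Callable[[str], str]],
) -> Optional[str]:
    """Run the operation tied to the object whose predetermined area
    contains the input location; return None if no area is hit."""
    frame = target_frame_index(timestamp_s, fps)
    for region in regions_by_frame.get(frame, []):
        if region.contains(x, y):
            return operations[region.object_id](region.object_id)
    return None


# Example: one object ("jacket") annotated on frame 60 of a 30 fps video.
regions = {60: [Region(x=100, y=50, width=200, height=100, object_id="jacket")]}
operations = {"jacket": lambda object_id: f"open:{object_id}"}

result = handle_input(2.0, 150, 75, 30.0, regions, operations)
```

In this sketch the per-frame lookup table stands in for whatever association between predetermined areas and objects an embodiment maintains, and the callable stands in for the operation associated with the object, for example displaying information about it.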