Disclosed herein is a robot including an output interface including at least one of a display or a speaker, and a processor configured to acquire output data of a predetermined playback time point of content output via the robot or an external device, recognize a first emotion corresponding to the acquired output data, and control the output interface to output an expression based on the recognized first emotion.