A device is provided in which a video image controller acquires information for a plurality of video images from a video image source; a cursor position controller calculates cursor position information and generates cursor image information; a display image generator synthesizes the plurality of video images and the cursor image information and displays the synthesized image on a display device; a distance information generator generates distance information based on position information of the video images and the cursor position information; and an audio output controller determines the volume of the audio for the plurality of video images based on this distance information and outputs the audio to an audio output device.
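The relationship between the distance information and the audio volume described above can be illustrated with a minimal sketch. The abstract does not specify how distance is measured or how it maps to volume, so everything here is an assumption made for illustration: the Euclidean distance from the cursor to each video image's center, a linear falloff to silence at a hypothetical `max_distance`, and the names `VideoImage`, `distance_info`, and `decide_volumes`.

```python
import math
from dataclasses import dataclass


@dataclass
class VideoImage:
    """Hypothetical record: a video image and the center of its on-screen position."""
    name: str
    center_x: float
    center_y: float


def distance_info(cursor, images):
    """Return the cursor-to-center distance for each video image.

    Assumption: Euclidean distance in screen pixels. The source does not
    state which distance measure is used.
    """
    cx, cy = cursor
    return {
        img.name: math.hypot(img.center_x - cx, img.center_y - cy)
        for img in images
    }


def decide_volumes(distances, max_distance=500.0):
    """Map each distance to a volume in the range [0.0, 1.0].

    Assumption: linear falloff, full volume at distance 0 and silence at
    max_distance or beyond. Other mappings are equally plausible.
    """
    return {
        name: max(0.0, 1.0 - d / max_distance)
        for name, d in distances.items()
    }


if __name__ == "__main__":
    images = [
        VideoImage("A", 100.0, 100.0),
        VideoImage("B", 100.0, 350.0),
        VideoImage("C", 400.0, 500.0),
    ]
    cursor = (100.0, 100.0)
    volumes = decide_volumes(distance_info(cursor, images))
    for name, volume in volumes.items():
        print(f"{name}: volume {volume:.2f}")
```

Under these assumptions, the video image directly under the cursor plays at full volume, and images farther away are progressively quieter, so moving the cursor shifts audio emphasis from one video image to another.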