Information representing a position and orientation of a captured image, or information for deriving the position and orientation, is acquired from the captured image as extraction information. A reduced-information-amount image is generated by reducing the amount of information in the captured image in accordance with the distance from the position in the captured image at which a user is gazing. The reduced-information-amount image and the extraction information are output to an external device, and a composite image is received, the composite image having been generated based on the reduced-information-amount image and an image of a virtual space that the external device generated based on the extraction information.
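
The distance-dependent reduction can be illustrated with a minimal sketch. The abstract does not state how the amount of information is reduced, so the approach here (discarding low-order bits of each pixel, with more bits discarded farther from the gaze position) is an assumption chosen only for illustration. The function name `reduce_information` and the parameters `full_radius` and `max_drop_bits` are hypothetical and do not come from the source.

```python
import math


def reduce_information(image, gaze, full_radius=2.0, max_drop_bits=4):
    """Return a copy of `image` with less information far from `gaze`.

    `image` is a list of rows of 8-bit grayscale values (0..255) and
    `gaze` is the (row, col) position the user is looking at.
    Assumption: information is reduced by zeroing low-order bits; the
    source does not specify the reduction method.
    """
    gaze_row, gaze_col = gaze
    reduced = []
    for r, row in enumerate(image):
        new_row = []
        for c, value in enumerate(row):
            # Euclidean distance in pixels from the gaze position.
            dist = math.hypot(r - gaze_row, c - gaze_col)
            # Pixels closer than full_radius keep all 8 bits; one more bit
            # is dropped for each further full_radius, up to max_drop_bits.
            drop = min(max_drop_bits, int(dist // full_radius))
            new_row.append((value >> drop) << drop)
        reduced.append(new_row)
    return reduced


# Usage: an 8x8 all-white frame with the gaze at the top-left corner.
frame = [[255] * 8 for _ in range(8)]
foveated = reduce_information(frame, gaze=(0, 0))
```

Pixels near the gaze position are left untouched, while distant pixels take fewer distinct values and therefore compress better before being sent to the external device. Blurring or lowering resolution with distance would fit the abstract equally well.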