Patent attributes
Generating images and videos depicting a human subject wearing textually defined attire is described. An image generation system receives a two-dimensional reference image depicting a person and a textual description of target clothing that the person is to be depicted as wearing. To maintain the personal identity of the person, the image generation system implements a generative model, trained using both a discriminator loss and a perceptual quality loss, that is configured to generate images from text. In some implementations, the image generation system is configured to train the generative model to output visually realistic images depicting the human subject in the target clothing. The image generation system is further configured to apply the trained generative model to individual frames of a reference video depicting a person and to output frames depicting the person wearing the textually described target clothing.
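
The two mechanisms named in the abstract, a training objective that combines a discriminator loss with a perceptual quality loss, and frame-by-frame application of the trained model to a video, can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the functions `combined_loss` and `dress_video`, the weighted-sum form of the objective, the weights `adv_weight` and `perc_weight`, the `Frame` representation, and the `placeholder_generator` stand-in, which simply copies its input where a trained generative model would be called.

```python
from typing import Callable, List

# Hypothetical frame representation: rows of pixel intensities.
Frame = List[List[float]]
# Hypothetical generator signature: (reference frame, clothing description) -> output frame.
Generator = Callable[[Frame, str], Frame]


def combined_loss(discriminator_loss: float,
                  perceptual_loss: float,
                  adv_weight: float = 1.0,
                  perc_weight: float = 1.0) -> float:
    """Weighted sum of the two loss terms named in the abstract.

    The weighted-sum form and both weights are assumptions made for
    illustration; the abstract only states that both losses are used.
    """
    return adv_weight * discriminator_loss + perc_weight * perceptual_loss


def dress_video(frames: List[Frame],
                description: str,
                generate: Generator) -> List[Frame]:
    """Apply a generator to each frame of a reference video independently."""
    return [generate(frame, description) for frame in frames]


def placeholder_generator(frame: Frame, description: str) -> Frame:
    """Stand-in for a trained model: returns an unchanged copy of the frame."""
    return [row[:] for row in frame]


if __name__ == "__main__":
    video = [[[0.0, 0.1]], [[0.2, 0.3]]]  # two tiny one-row frames
    output = dress_video(video, "a red wool jacket", placeholder_generator)
    print(len(output), "frames processed")
    print("loss:", combined_loss(0.5, 0.25, adv_weight=1.0, perc_weight=2.0))
```

Passing the generator as a parameter keeps the per-frame loop independent of any particular model, so the same loop would serve a single reference image (a one-frame list) or a full video.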