Conditional generative adversarial networks (cGANs) are a deep learning method in which a conditional setting is applied: both the generator and the discriminator are conditioned on some sort of auxiliary information, such as class labels or data from other modalities. As a result, the ideal model can learn a multi-modal mapping from inputs to outputs by feeding it different contextual information.
cGANs are an extension of the generative adversarial network (GAN), a framework for training generative models to produce data with similar characteristics to the input training data. GANs can generate novel image, video, or audio data from a random input, using two competing networks:
- Generator—When provided a vector of random values as an input, the generator produces data with the same structure as its training data.
- Discriminator—Given batches of data containing observations from both the training data and the generated data (from the generator), the discriminator attempts to classify each observation as "real" or "generated."
A cGAN also takes advantage of auxiliary information, such as labels, during the training process:
- Generator—Given a label and a random array as input, the generator outputs data with the same structure as the training-data observations corresponding to that label.
- Discriminator—Given batches of labeled data containing observations from both the training data and the generated data (from the generator), the discriminator attempts to classify each observation as "real" or "generated."
Using additional information means cGANs offer faster convergence and greater control over the output of the generator.
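As a concrete illustration of this conditioning, the minimal sketch below (in Keras, not taken from any particular cGAN implementation) embeds an integer class label and concatenates the embedding with each network's primary input. The layer sizes, embedding dimension, and MNIST-style 784-dimensional flattened images are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_CLASSES = 10   # e.g. MNIST digit labels (assumption)
LATENT_DIM = 100   # size of the random noise vector (assumption)
IMG_DIM = 784      # flattened 28x28 image (assumption)

def build_generator():
    # Inputs: a random noise vector and an integer class label.
    noise = layers.Input(shape=(LATENT_DIM,))
    label = layers.Input(shape=(1,), dtype="int32")
    # Embed the label and concatenate it with the noise vector,
    # so the generator is conditioned on the class.
    label_emb = layers.Flatten()(layers.Embedding(NUM_CLASSES, 50)(label))
    x = layers.Concatenate()([noise, label_emb])
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dense(512, activation="relu")(x)
    out = layers.Dense(IMG_DIM, activation="tanh")(x)  # generated sample
    return Model([noise, label], out)

def build_discriminator():
    # Inputs: a (real or generated) sample and its class label.
    img = layers.Input(shape=(IMG_DIM,))
    label = layers.Input(shape=(1,), dtype="int32")
    label_emb = layers.Flatten()(layers.Embedding(NUM_CLASSES, 50)(label))
    x = layers.Concatenate()([img, label_emb])
    x = layers.Dense(512, activation="relu")(x)
    x = layers.Dense(256, activation="relu")(x)
    out = layers.Dense(1, activation="sigmoid")(x)  # probability "real"
    return Model([img, label], out)
```

Training then follows the standard GAN recipe: the discriminator is optimized with binary cross-entropy to separate real from generated label–sample pairs, while the generator is optimized to make the discriminator score its outputs as real.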
cGANs were introduced in a 2014 paper titled Conditional Generative Adversarial Nets by Mehdi Mirza and Simon Osindero. In a uni-modal experiment, Mirza and Osindero trained a cGAN on 784-dimensional MNIST images conditioned on their class labels. The results were comparable to those of some other methods but were outperformed by non-conditional GANs. Another experiment demonstrated automated image tagging, using cGANs to generate (possibly multi-modal) distributions of tag-vectors conditional on image features. This showed promise and attracted further exploration of possible uses for cGANs.
A wide-ranging suite of possible applications for cGANs includes the following:
- Image-to-image translation—cGANs were demonstrated to be effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks (see the loss sketch after this list). This led to the development of cGAN-based software such as pix2pixHD.
- Text-to-image synthesis—An experimental TensorFlow implementation of text-to-image synthesis builds on top of the TensorFlow/TensorLayer implementation of DCGANs (deep convolutional generative adversarial networks).
- Video generation—A deep neural network can be used to predict future frames in a natural video sequence.
- Convolutional face generation—cGANs can be used to generate faces with specific attributes from random noise conditioned on those attribute labels.
- Generating shadow maps—This approach introduced an additional sensitivity parameter to the generator that effectively parameterized the loss of the trained detector, proving more efficient than previous state-of-the-art methods.
- Diversity-sensitive computer vision tasks—This approach explicitly regularizes the generator to produce diverse outputs depending on latent codes, with demonstrated effectiveness on three cGAN tasks: image-to-image translation, image inpainting, and video prediction/generation.
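As a rough sketch of the image-to-image translation item above, a pix2pix-style cGAN combines an adversarial loss with an L1 reconstruction term in the generator objective. The function names and argument layout below are illustrative assumptions; the weighting of 100 on the L1 term mirrors the setting used in the original pix2pix work.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def generator_loss(disc_fake_output, generated, target, lam=100.0):
    # Adversarial term: the generator wants the discriminator
    # to classify its outputs as real (ones).
    adversarial = bce(tf.ones_like(disc_fake_output), disc_fake_output)
    # L1 term: keep the translated image close to the ground-truth target.
    l1 = tf.reduce_mean(tf.abs(target - generated))
    return adversarial + lam * l1

def discriminator_loss(disc_real_output, disc_fake_output):
    # Real input/target pairs should be scored as real,
    # generator outputs as generated.
    real = bce(tf.ones_like(disc_real_output), disc_real_output)
    fake = bce(tf.zeros_like(disc_fake_output), disc_fake_output)
    return real + fake
```

The L1 term is what ties the output to the conditioning image; without it, the generator would only need to fool the discriminator and could ignore the structure of its input.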