Patent attributes
The present disclosure includes methods and systems for generating captions for digital images. In particular, the disclosed systems and methods can train an image encoder neural network and a sentence decoder neural network to generate a caption from an input digital image. For instance, in one or more embodiments, the disclosed systems and methods train an image encoder neural network (e.g., a character-level convolutional neural network) utilizing a semantic similarity constraint, training images, and training captions. Moreover, the disclosed systems and methods can train a sentence decoder neural network (e.g., a character-level recurrent neural network) utilizing training sentences and an adversarial classifier.