Patent attributes
Disclosed herein are methods, systems, and computer-readable media for generating an image corresponding to a text input. In an embodiment, operations may include accessing a text description and inputting the text description into a text encoder. The operations may include receiving, from the text encoder, a text embedding, and inputting at least one of the text description or the text embedding into a first sub-model configured to generate, based on at least one of the text description or the text embedding, a corresponding image embedding. The operations may include inputting at least one of the text description or the corresponding image embedding, generated by the first sub-model, into a second sub-model configured to generate, based on at least one of the text description or the corresponding image embedding, an output image. The operations may include making the output image, generated by the second sub-model, accessible to a device.
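By way of non-limiting illustration, the data flow of the operations above may be sketched as follows. The sketch shows only the ordering of the stages: text description, text embedding, image embedding, output image. All names (`text_encoder`, `first_sub_model`, `second_sub_model`, `generate_image`, `EMBED_DIM`) are hypothetical, and each function body is a deterministic placeholder standing in for a trained model. None of it is drawn from the disclosure.

```python
import hashlib
from typing import List

EMBED_DIM = 8  # hypothetical embedding size, chosen only for this sketch


def text_encoder(text: str) -> List[float]:
    """Placeholder text encoder: hashes the text into a fixed-length vector."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:EMBED_DIM]]


def first_sub_model(text: str, text_embedding: List[float]) -> List[float]:
    """Placeholder first sub-model: maps a text embedding to an image embedding.

    A trained model would go here; this stand-in only inverts each component.
    """
    return [1.0 - v for v in text_embedding]


def second_sub_model(text: str, image_embedding: List[float],
                     size: int = 4) -> List[List[int]]:
    """Placeholder second sub-model: turns an image embedding into a small
    grayscale image, represented as a size x size grid of 0..255 values."""
    n = len(image_embedding)
    return [
        [int(255 * image_embedding[(r * size + c) % n]) for c in range(size)]
        for r in range(size)
    ]


def generate_image(text: str) -> List[List[int]]:
    """Chains the three stages in the order the operations describe."""
    text_embedding = text_encoder(text)
    image_embedding = first_sub_model(text, text_embedding)
    return second_sub_model(text, image_embedding)


# The returned image could then be made accessible to a device,
# for example by writing it to storage or sending it over a network.
image = generate_image("a corgi playing a trumpet")
```

Keeping the first and second sub-models as separate functions mirrors the two-stage structure of the operations: the image embedding is an explicit intermediate value that could be inspected or replaced before the output image is generated.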