
DeepFloyd

DeepFloyd is an AI research lab in Stability AI developing a text-to-image generator model.


All edits

Edits on 2 May, 2023

Arthur Smalley

edited on 2 May, 2023

Edits made to:

Timeline (+1 events) (+253 characters)

Article (+1 images) (+5031/-197 characters)

Article

Overview

DeepFloyd is a multimodal AI research lab developing a text-to-image generator model called IF. The DeepFloyd team works within Stability AI. IF is designed to improve on other AI models with respect to generating text and captions in images based on the prompt provided. Stability AI released a non-commercial research preview of DeepFloyd IF on April 28, 2023, providing research labs the opportunity to examine and experiment with the text-to-image model. Stability AI plans to release IF as a fully open-source model in the future.

...

IF is a modular, cascaded, pixel diffusion model, which means:

Modular: the model consists of several neural networks that solve independent tasks such as generating images from prompts or upscaling.
Cascaded: IF models high-resolution data in a cascading manner using a series of individually trained models at different resolutions. The process begins with a base model that produces unique low-res samples that are upscaled by successive models known as amplifiers.
Diffusion: the base and super-resolution models are diffusion models, in which a Markov chain of steps injects random noise into data and the process is then reversed to generate new samples.
Pixel: this diffusion is implemented on a pixel level, unlike latent diffusion models (such as Stable Diffusion) that utilize latent representations.
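The noise-injecting Markov chain described in the list above can be illustrated with a minimal sketch. It assumes the standard Gaussian forward step used by many diffusion models; the linear noise schedule, the step count, and all function and variable names are hypothetical choices for illustration, not IF's actual configuration.

```python
import math
import random


def forward_diffusion(pixels, betas, rng):
    """Apply the forward (noising) Markov chain to a flat list of pixel values.

    Each step depends only on the previous state:
        x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * noise
    """
    x = list(pixels)
    trajectory = [x]
    for beta in betas:
        keep = math.sqrt(1.0 - beta)
        spread = math.sqrt(beta)
        x = [keep * v + spread * rng.gauss(0.0, 1.0) for v in x]
        trajectory.append(x)
    return trajectory


# Hypothetical linear schedule with 100 steps (an assumption, not IF's schedule).
NUM_STEPS = 100
betas = [1e-4 + (0.05 - 1e-4) * t / (NUM_STEPS - 1) for t in range(NUM_STEPS)]

rng = random.Random(0)
pixels = [1.0] * 16  # a tiny all-white "image", kept in pixel space
trajectory = forward_diffusion(pixels, betas, rng)

# Fraction of the original signal that survives all steps.
signal_kept = math.prod(math.sqrt(1.0 - b) for b in betas)
```

Generation runs this chain in reverse: a trained network removes a little noise at each step, starting from pure noise. That part needs a trained model and is not sketched here.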

Images are generated in three stages. Before the first stage, the text prompt is passed through the frozen T5-XXL language model, which converts it to a qualitative text representation.

The base diffusion model transforms natural language text into a 64x64 image. DeepFloyd has trained three versions of the base model, each with different parameters: IF-I 400M, IF-I 900M, and IF-I 4.3B.
To ‘amplify’ the image, two text-conditional super-resolution models (Efficient U-Net) are applied to the output of the base model. The first of these upscales the 64x64 image to a 256x256 image. Two versions of this model are available: IF-II 400M and IF-II 1.2B.
The second super-resolution diffusion model is applied to produce a vivid 1024x1024 image. This third-stage model, IF-III, has 700M parameters.
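The resolution flow through the three stages can be sketched as follows. The stage names, parameter variants, and output sizes come from the description above; the `Stage` class, `placeholder_upscale`, and `run_cascade` are hypothetical helpers, and the nearest-neighbour resize only stands in for the super-resolution diffusion models, which it does not imitate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    variants: tuple  # parameter-count variants listed in the article
    out_size: int    # side length of the square output, in pixels


STAGES = (
    Stage("IF-I", ("400M", "900M", "4.3B"), 64),
    Stage("IF-II", ("400M", "1.2B"), 256),
    Stage("IF-III", ("700M",), 1024),
)


def placeholder_upscale(image, out_size):
    """Nearest-neighbour resize, a stand-in for a super-resolution model."""
    factor = out_size // len(image)
    return [
        [image[r // factor][c // factor] for c in range(out_size)]
        for r in range(out_size)
    ]


def run_cascade(base_image):
    """Pass a 64x64 base image through the two upscaling stages."""
    image = base_image
    sizes = [len(image)]
    for stage in STAGES[1:]:
        image = placeholder_upscale(image, stage.out_size)
        sizes.append(len(image))
    return image, sizes


# A synthetic 64x64 grayscale image plays the role of the base model's output.
base = [[(r + c) % 256 for c in range(64)] for r in range(64)]
final, sizes = run_cascade(base)
```

Each stage only needs the previous stage's output and the text representation, which is why the models can be trained individually.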

Diagram showing the image generation process of DeepFloyd IF and the various models it uses.

Features

DeepFloyd IF features include:

Deep text prompt understanding

IF's generation pipeline utilizes the large language model T5-XXL-1.1 as a text encoder. A significant number of text-image cross-attention layers also provide better alignment between prompt and image.

Text descriptions in images

Incorporating the T5 model, IF generates coherent and clear text alongside objects of different properties appearing in various spatial relations.

Photorealism

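IF achieves a zero-shot FID score of 6.66 on the COCO dataset. FID (Fréchet Inception Distance) is a metric used to evaluate the performance of text-to-image models; lower scores indicate generated images that are statistically closer to real ones.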
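For reference, the Fréchet Inception Distance (FID) is commonly defined as the Fréchet distance between two Gaussians fitted to Inception-network features of real and generated images. The symbols below are introduced here for illustration: \mu_r, \Sigma_r are the mean and covariance of the real-image features, and \mu_g, \Sigma_g those of the generated-image features. The exact evaluation protocol behind the 6.66 score is not described in this article.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

The distance is zero when the two feature distributions have identical means and covariances, so lower values are better.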

Aspect ratio shifts

IF can generate images with a non-standard aspect ratio, vertical or horizontal, as well as the standard square aspect.

Zero-shot image-to-image translations

Image modification is possible by resizing the original image to 64 pixels, adding noise through forward diffusion, and using backward diffusion with a new prompt to denoise the image. The style can be changed further through super-resolution modules via a prompt text description.
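The first two steps of this procedure, resizing and forward-diffusion noising, can be sketched as below. This is a minimal illustration under stated assumptions: the closed-form noising formula is the one commonly used for Gaussian diffusion, `signal_fraction` is a hypothetical parameter controlling how much of the original image survives, and `denoise_with_prompt` is a placeholder for the backward diffusion, which requires a trained model.

```python
import math
import random


def resize_to_64(image):
    """Box-average a square grayscale image (side a multiple of 64) to 64x64."""
    factor = len(image) // 64
    area = factor * factor
    small = []
    for r in range(64):
        row = []
        for c in range(64):
            total = sum(
                image[r * factor + dr][c * factor + dc]
                for dr in range(factor)
                for dc in range(factor)
            )
            row.append(total / area)
        small.append(row)
    return small


def add_noise(image, signal_fraction, rng):
    """Jump to a noised state: x_t = sqrt(a) * x_0 + sqrt(1 - a) * noise."""
    keep = math.sqrt(signal_fraction)
    spread = math.sqrt(1.0 - signal_fraction)
    return [[keep * v + spread * rng.gauss(0.0, 1.0) for v in row] for row in image]


def denoise_with_prompt(noisy, prompt):
    """Hypothetical stand-in for backward diffusion guided by the new prompt."""
    raise NotImplementedError("requires a trained text-conditional diffusion model")


rng = random.Random(0)
original = [[0.5] * 128 for _ in range(128)]  # synthetic 128x128 input image
small = resize_to_64(original)
noisy = add_noise(small, 0.3, rng)  # keep 30% of the signal (assumed value)
```

A smaller `signal_fraction` discards more of the original image, leaving the new prompt more freedom during denoising.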

Training

DeepFloyd IF was trained on a custom high-quality LAION-A dataset, containing 1B image-text pairs. LAION-A is an aesthetic subset of the English part of the LAION-5B dataset. It was obtained after deduplication based on similarity hashing, extra cleaning, and other modifications to the original dataset. The DeepFloyd team’s custom filters were used to remove watermarked, NSFW, and other inappropriate content.
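The article does not say which similarity hash was used for deduplication. As an illustration of the general idea only, the sketch below uses a simple average hash with a Hamming-distance threshold; the hash choice, the threshold, and the function names are all assumptions.

```python
def average_hash(image):
    """Return a bit tuple: 1 where a pixel is brighter than the image mean."""
    flat = [v for row in image for v in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if v > mean else 0 for v in flat)


def hamming(a, b):
    """Count the positions where two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def deduplicate(pairs, max_distance=1):
    """Keep the first image-text pair from each group of near-identical images."""
    kept, kept_hashes = [], []
    for image, caption in pairs:
        h = average_hash(image)
        if all(hamming(h, seen) > max_distance for seen in kept_hashes):
            kept.append((image, caption))
            kept_hashes.append(h)
    return kept


# Tiny 2x2 grayscale "images" with captions, made up for this example.
pairs = [
    ([[10, 200], [220, 30]], "a checker pattern"),
    ([[12, 198], [221, 29]], "checker pattern, re-encoded"),  # near-duplicate
    ([[200, 10], [30, 220]], "an inverted checker pattern"),
]
unique = deduplicate(pairs)
```

Small pixel-level differences, such as those from re-encoding, leave the hash unchanged, so the second pair is dropped while the visually different third pair is kept.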

Limitations and bias

DeepFloyd IF does not achieve perfect photorealism. It was trained primarily with English captions, which limits its ability to return accurate images for prompts in other languages; texts and images from other languages are likely to be insufficiently accounted for. While filters were applied, the LAION dataset used to train the model does contain adult, violent, and sexual content. IF may also reinforce or exacerbate social biases.

License

DeepFloyd IF was initially released under a research license, with plans to move to a permissive license. Any attempt to deploy the model in production requires not only that the license is followed but also that the person deploying the model assumes full liability. Stability AI believes research on DeepFloyd IF can lead to the development of novel applications in various domains including art, design, storytelling, virtual reality, accessibility, and more. Possible areas and tasks include:

Generation of artistic imagery and use in design
Safe deployment of models which have the potential to generate harmful content
Probing and understanding the limitations and biases of generative models
Applications in educational or creative tools
Research on generative models

Excluded uses of IF include:

Out-of-scope use: the model was not trained to produce factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model.
Misuse and malicious use: using the model to generate content that is cruel to individuals is a misuse of this model.

Timeline

April 28, 2023

Stability AI releases a non-commercial research preview of DeepFloyd IF.

The release offers research labs the opportunity to examine and experiment with the text-to-image model. Stability AI plans to release IF as a fully open-source model in the future.

Edits on 18 Apr, 2023

Amy Tomlinson Gayle

edited on 18 Apr, 2023

Edits made to:

Article (+9/-9 characters)

Article

DeepFloyd is an AI research lab developing a text-to-image generator model called IF. The DeepFloyd team works within Stability AI. IF is designed to improve on other AI models with respect to generating text and captions in images based on the prompt provided. The model is in early access and has been praised for its ability to generate realistic and well-written text. The lead researcher of DeepFloyd is Misha Konstantinov. The model is expected to be released in 2023 and be open source.

Arthur Smalley

edited on 18 Apr, 2023

Edits made to:

Infobox (+11 properties)

Description (+91 characters)

Article (+1 images) (+540 characters)

Further Resources (+1 rows) (+4 cells) (+139 characters)

DeepFloyd

DeepFloyd is an AI research lab in Stability AI developing a text-to-image generator model.

Article

DeepFloyd is an AI research lab developing a text-to-image generator model called IF. The DeepFloyd team works within Stability AI. IF is designed to improve on other AI models with respect to generating text and captions in images based on the prompt provided. The model is in early access and has been praised for its ability to generate realistic and well-written text. The lead researcher of DeepFloys is Misha Konstantinov. The model is expected to be released in 2023 and be open source.