DeepFloyd is an AI research lab in Stability AI developing a text-to-image generator model.
DeepFloyd is a multimodal AI research lab developing a text-to-image generator model called IF. The DeepFloyd team works within Stability AI. IF is designed to improve on other AI models with respect to generating text and captions in images based on the prompt provided, and it has been praised for its ability to generate realistic and well-written text. The lead researcher of DeepFloyd is Misha Konstantinov. Stability AI released a non-commercial research preview of DeepFloyd IF on April 28, 2023, providing research labs the opportunity to examine and experiment with the text-to-image model. Stability AI plans to release IF as a fully open-source model in the future.
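IF is a modular, cascaded, pixel diffusion model, which means that it is built from several neural modules, generates images through a series of models working at increasing resolutions, and runs the diffusion process directly in pixel space rather than in a compressed latent space.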
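Images are generated using a three-stage process. The text prompt is first passed through the frozen T5-XXL language model to convert it to a text representation. That representation then conditions each stage: a base model generates a 64x64 pixel image, and two super-resolution models upscale it to 256x256 and then 1024x1024 pixels.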
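The flow of that process can be pictured with a toy sketch. Nothing here calls the real model: the stage labels and the `encode_prompt` stand-in are hypothetical, and the 64, 256, and 1024 pixel sizes are taken from the public IF release description rather than from this article. The point is only that the prompt is encoded once and the same representation is reused while the resolution grows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str      # illustrative label, not an official module name
    out_size: int  # side length in pixels of the square image this stage emits


# Assumed resolutions: a 64 px base image, then two super-resolution steps.
STAGES = (
    Stage("base", 64),
    Stage("super-resolution 1", 256),
    Stage("super-resolution 2", 1024),
)


def encode_prompt(prompt: str) -> list[float]:
    """Toy stand-in for the frozen text encoder: one number per word."""
    return [float(len(word)) for word in prompt.split()]


def run_cascade(prompt: str) -> list[tuple[str, int, int]]:
    """Return (stage name, input size, output size) for each stage.

    Every stage is conditioned on the same text embedding; only the image
    resolution changes from one stage to the next.
    """
    embedding = encode_prompt(prompt)  # computed once, reused by all stages
    if not embedding:
        raise ValueError("prompt must contain at least one word")
    trace = []
    in_size = 0  # the base stage starts from noise, so it has no input image
    for stage in STAGES:
        trace.append((stage.name, in_size, stage.out_size))
        in_size = stage.out_size
    return trace


if __name__ == "__main__":
    for name, src, dst in run_cascade("a red fox holding a sign"):
        print(f"{name}: {src or 'noise'} -> {dst} px")
```

Keeping the text encoder frozen and separate from the image stages is what lets each stage be swapped or retrained on its own, which is the practical meaning of "modular" here.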
DeepFloyd IF features include:
IF's generation pipeline uses the large language model T5-XXL-1.1 as a text encoder. A significant number of text-image cross-attention layers also provides better alignment between the prompt and the image.
Incorporating the T5 model, IF generates coherent and clear text alongside objects of different properties appearing in various spatial relations.
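IF achieves a zero-shot FID score of 6.66 on the COCO dataset. FID (Fréchet Inception Distance) is a metric used to evaluate the performance of text-to-image models; lower scores indicate generated images that are statistically closer to real ones.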
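For reference, FID is usually computed by fitting a Gaussian to Inception-network features of real images and another to features of generated images, then measuring the Fréchet distance between the two. The symbols below are notation chosen here for illustration: \mu_r, \Sigma_r are the mean and covariance of the real-image features, and \mu_g, \Sigma_g those of the generated-image features.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

The score is zero when the two feature distributions have identical means and covariances, which is why lower values are better.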
IF can generate images with a non-standard aspect ratio, vertical or horizontal, as well as the standard square aspect.
Image modification is possible by resizing the original image to 64 pixels, adding noise through forward diffusion, and using backward diffusion with a new prompt to denoise the image. The style can be changed further through super-resolution modules via a prompt text description.
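The noising step in that procedure can be sketched as follows. This assumes the standard closed-form forward process used by DDPM-style diffusion models; the article does not state which noise schedule IF uses, and the function name and the `alpha_bar` parameter are illustrative, not part of any IF API.

```python
import math
import random


def forward_diffuse(pixels, alpha_bar, rng):
    """Noise an image in one step: x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps.

    pixels    -- flat list of floats, e.g. a 64x64 image scaled to [-1, 1]
    alpha_bar -- cumulative signal fraction in (0, 1]; smaller means noisier
    rng       -- random.Random instance supplying the Gaussian noise eps
    """
    if not 0.0 < alpha_bar <= 1.0:
        raise ValueError("alpha_bar must be in (0, 1]")
    signal = math.sqrt(alpha_bar)        # weight kept from the original image
    noise = math.sqrt(1.0 - alpha_bar)   # weight given to fresh Gaussian noise
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in pixels]


if __name__ == "__main__":
    original = [0.5, -0.25, 0.0, 1.0]  # stand-in for a resized 64 px image
    print(forward_diffuse(original, 0.6, random.Random(0)))
```

In this kind of image-to-image setup, an `alpha_bar` close to 1 keeps most of the original image, so the new prompt can only make small changes during denoising, while a smaller value discards more of the original and gives the prompt more influence.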
DeepFloyd IF was trained on a custom high-quality LAION-A dataset, containing 1B image-text pairs. LAION-A is an aesthetic subset of the English part of the LAION-5B dataset. It was obtained after deduplication based on similarity hashing, extra cleaning, and other modifications to the original dataset. The DeepFloyd team’s custom filters were used to remove watermarked, NSFW, and other inappropriate content.
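As an illustration of what deduplication by similarity hashing can look like, here is a minimal sketch using an average hash and Hamming distance. This is a swapped-in, generic technique: the article does not say which hash or threshold the DeepFloyd team used, and every name below is hypothetical.

```python
def average_hash(gray):
    """Hash a small grayscale grid: one bit per pixel, set if above the mean."""
    flat = [p for row in gray for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | int(p > mean)
    return bits


def hamming(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def deduplicate(images, max_distance=1):
    """Keep the first image of each group whose hashes are within max_distance.

    images -- iterable of (name, grayscale grid) pairs, all grids the same size
    """
    kept, kept_hashes = [], []
    for name, gray in images:
        h = average_hash(gray)
        if all(hamming(h, k) > max_distance for k in kept_hashes):
            kept.append(name)
            kept_hashes.append(h)
    return kept


if __name__ == "__main__":
    a = [[10, 200], [220, 15]]
    b = [[12, 198], [225, 11]]   # slightly altered copy of a
    c = [[200, 10], [15, 220]]   # different picture
    print(deduplicate([("a", a), ("b", b), ("c", c)]))
```

Because the hash depends only on whether each pixel is brighter than the image mean, small changes such as recompression leave it nearly unchanged, so near-duplicates collide while unrelated images do not. The pairwise comparison here is quadratic and only suitable for a toy example; a billion-image dataset would need an indexed lookup.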
DeepFloyd IF does not achieve perfect photorealism. It was trained primarily with English captions, which limits its ability to return accurate images for prompts in other languages; texts and images from other languages are likely to be insufficiently accounted for. Although filters were applied, the LAION dataset used to train the model does contain adult, violent, and sexual content. IF may also reinforce or exacerbate social biases.
DeepFloyd IF was initially released under a research license, with plans to move to a permissive license. Anyone attempting to deploy the model in production must not only follow the license but also accept full liability for the deployment. Stability AI believes research on DeepFloyd IF can lead to the development of novel applications in various domains including art, design, storytelling, virtual reality, accessibility, and more. Possible areas and tasks include:
Excluded uses of IF include:
April 28, 2023
Stability AI releases a non-commercial research preview of DeepFloyd IF. The release offers research labs the opportunity to examine and experiment with the text-to-image model. Stability AI plans to release IF as a fully open-source model in the future.