
Stable Diffusion 3.0 Debuts New Architecture To Reinvent Text-To-Image Gen AI

Friday, February 23, 2024, 02:00 PM, from Slashdot
An anonymous reader quotes a report from VentureBeat: Stability AI is out today with an early preview of Stable Diffusion 3.0, its next-generation flagship text-to-image generative AI model. The new Stable Diffusion 3.0 model aims to provide improved image quality and better performance when generating images from multi-subject prompts. It will also provide significantly better typography than prior Stable Diffusion models, enabling more accurate and consistent spelling inside generated images. Typography has been an area of weakness for Stable Diffusion in the past, and one that rivals including DALL-E 3, Ideogram, and Midjourney have also been working on in recent releases. Stability AI is building out Stable Diffusion 3.0 in multiple model sizes ranging from 800M to 8B parameters.

Stable Diffusion 3.0 isn't just a new version of a model Stability AI has already released; it's based on a new architecture. 'Stable Diffusion 3 is a diffusion transformer, a new type of architecture similar to the one used in the recent OpenAI Sora model,' Emad Mostaque, CEO of Stability AI, told VentureBeat. 'It is the real successor to the original Stable Diffusion.' Stable Diffusion 3.0 takes a different approach by using diffusion transformers. 'Stable Diffusion did not have a transformer before,' Mostaque said.

Transformers are at the foundation of much of the gen AI revolution and are widely used as the basis of text generation models. Image generation has largely been in the realm of diffusion models. The research paper that details Diffusion Transformers (DiTs) explains that it is a new architecture for diffusion models that replaces the commonly used U-Net backbone with a transformer operating on latent image patches. The DiT approach can use compute more efficiently and can outperform other forms of diffusion image generation. The other big innovation that Stable Diffusion 3.0 benefits from is flow matching. The research paper on flow matching explains that it is a new method for training Continuous Normalizing Flows (CNFs) to model complex data distributions. According to the researchers, using Conditional Flow Matching (CFM) with optimal transport paths leads to faster training, more efficient sampling, and better performance compared to diffusion paths.
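The two ideas above can be made concrete with short sketches. The first (PyTorch, with layer sizes and variable names that are illustrative assumptions, not Stability AI's actual configuration) shows what "a transformer operating on latent image patches" means: the VAE latent is cut into patches, each patch becomes a token, and an ordinary transformer block processes the token sequence in place of a U-Net.

    # Minimal diffusion-transformer sketch: patchify a latent, then attend over the patches.
    # Shapes and hidden sizes are illustrative; a real DiT also conditions on the timestep.
    import torch
    import torch.nn as nn

    latent = torch.randn(1, 4, 64, 64)                        # VAE latent: (batch, channels, H, W)
    p = 2                                                      # 2x2 latent patches -> 32*32 = 1024 tokens

    patches = latent.unfold(2, p, p).unfold(3, p, p)           # (1, 4, 32, 32, 2, 2)
    tokens = patches.permute(0, 2, 3, 1, 4, 5).reshape(1, 32 * 32, 4 * p * p)
    hidden = nn.Linear(4 * p * p, 384)(tokens)                 # project each patch to a token embedding

    block = nn.TransformerEncoderLayer(d_model=384, nhead=6, batch_first=True)
    out = block(hidden)                                        # (1, 1024, 384): denoising happens here

The second sketch gives the gist of conditional flow matching with optimal-transport (straight-line) paths, dropping the paper's small sigma_min term for brevity; the function and variable names are assumptions for illustration. The model v_theta is trained to predict the constant velocity x1 - x0 along the line from a noise sample x0 to a data sample x1.

    # Conditional flow matching loss with straight (OT) paths; a sketch, not SD3's training code.
    def cfm_loss(v_theta, x1):
        x0 = torch.randn_like(x1)                              # noise endpoint of the path
        t = torch.rand(x1.shape[0], *([1] * (x1.dim() - 1)))   # one time value per sample, in [0, 1]
        x_t = (1 - t) * x0 + t * x1                            # point on the straight path at time t
        target = x1 - x0                                       # the path's (constant) velocity
        return ((v_theta(x_t, t) - target) ** 2).mean()        # regress the model onto that velocity

Straight paths are cheap to integrate at sampling time, which lines up with the faster training and more efficient sampling the researchers report relative to curved diffusion paths.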

Read more of this story at Slashdot.
https://slashdot.org/story/24/02/23/0126238/stable-diffusion-30-debuts-new-architecture-to-reinvent-...

