MIT and NVIDIA launch faster AI image generation model

hackster.io

Researchers from MIT and NVIDIA have created a new AI architecture called the Hybrid Autoregressive Transformer (HART). It combines two popular approaches to image generation, autoregressive models and diffusion models, to produce high-quality images more quickly. Autoregressive models generate an image sequentially, predicting it one small piece at a time; they are fast but often miss fine detail. Diffusion models, by contrast, produce more detailed images by iteratively refining them, a slow process that typically requires many iterations.

HART draws on the strengths of both. An autoregressive model quickly lays down the basic structure of an image, and a smaller diffusion model then adds the fine detail, as sketched below. The result is that HART can produce images nearly nine times faster than traditional diffusion models without sacrificing quality, cutting the number of refinement steps from more than 30 to about eight. That efficiency lets HART run on standard laptops and even some smartphones. It also has 31% lower computational requirements than current diffusion models while achieving comparable or better image quality, which makes it well suited for integration with AI systems that work with both text and images.

The researchers believe HART could also be useful beyond image generation. It might help train AI-powered robots to understand visual data more effectively, and video game designers could use it to create intricate environments and characters much faster. In future work, the team hopes to adapt HART to video and audio, expanding its capabilities toward real-time multimedia generation.
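To make the two-stage flow concrete, here is a minimal toy sketch in Python (NumPy only). The grid size, the stand-in "autoregressive model", and the denoising rule are all placeholders invented for illustration, not HART's actual networks or parameters; only the control flow, a sequential coarse pass followed by roughly eight refinement steps, mirrors the description above.

```python
# Toy sketch of a hybrid autoregressive + diffusion pipeline.
# The model components are simple stand-ins; the point is the two-stage flow.
import numpy as np

rng = np.random.default_rng(0)

GRID = 16          # coarse token grid (illustrative: 16x16 tokens)
UPSCALE = 4        # decoded image is 64x64 in this toy example
REFINE_STEPS = 8   # small number of diffusion-style refinement steps

def autoregressive_coarse_tokens(grid=GRID):
    """Stage 1: fill in coarse image tokens one at a time, in raster order.
    A real model would condition each prediction on all previous tokens;
    here a placeholder rule stands in to show the sequential loop."""
    tokens = np.zeros((grid, grid))
    for i in range(grid):
        for j in range(grid):
            context_mean = tokens[: i + 1, :].mean() if (i or j) else 0.0
            tokens[i, j] = 0.5 * context_mean + rng.normal(scale=0.1)
    return tokens

def decode_tokens(tokens, upscale=UPSCALE):
    """Decode the coarse token grid into a low-detail image (nearest-neighbor)."""
    return np.kron(tokens, np.ones((upscale, upscale)))

def diffusion_refine(image, steps=REFINE_STEPS):
    """Stage 2: a small diffusion-style loop that iteratively denoises a
    residual detail layer and adds it to the coarse image."""
    residual = rng.normal(size=image.shape)           # start from noise
    for t in range(steps, 0, -1):
        predicted_noise = residual * (t / steps)      # stand-in denoiser
        residual = residual - predicted_noise / steps # peel off a bit of noise
    return image + 0.1 * residual                     # add refined detail

coarse = autoregressive_coarse_tokens()
base_image = decode_tokens(coarse)
final_image = diffusion_refine(base_image)
print(base_image.shape, final_image.shape)  # (64, 64) (64, 64)
```

The split reflects the article's point: the sequential pass happens once at coarse resolution, and the diffusion loop only has to clean up residual detail, so a handful of steps can stand in for the 30-plus a full diffusion sampler would typically need.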


With a significance score of 4.2, this news ranks in the top 6.2% of today's 23,372 analyzed articles.
