NVIDIA unveils the latest developments in visual AI
NVIDIA showcases cutting-edge generative AI advancements at CVPR, ranging from custom image creation to self-driving capabilities and visual language understanding.
NVIDIA researchers are presenting new visual generative AI models and techniques at the Computer Vision and Pattern Recognition (CVPR) conference in Seattle this week.
The work spans custom image generation, 3D scene editing, visual language understanding, and improved perception for self-driving cars.
Jan Kautz, VP of learning and perception research at NVIDIA, said, “Artificial intelligence, and generative AI in particular, is a major step forward in technology.”
At CVPR, NVIDIA Research is showing how it is pushing the limits of what's possible, from powerful image generation models that could enhance the productivity of professional creators to autonomous driving software that could pave the way for the next generation of self-driving cars.
Two of the more than 50 NVIDIA research projects on show have been selected as finalists for CVPR's Best Paper Awards.
One paper looks at how training dynamics affect diffusion models, and the other is about high-definition maps for self-driving cars.
NVIDIA also beat more than 450 other teams around the world to win the CVPR Autonomous Grand Challenge’s End-to-End Driving at Scale track.
The milestone highlights NVIDIA's pioneering use of generative AI in end-to-end models for self-driving cars, and it earned the company an Innovation Award from CVPR.
Among the highlighted research projects is JeDi, a new technique that lets creators quickly adapt diffusion models, the leading method for turning text into images, to depict a specific object or character from just a few reference images, rather than through time-consuming fine-tuning on custom datasets.
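To make the idea concrete, here is a toy sketch, not JeDi's actual architecture, of the general pattern it replaces fine-tuning with: encode a handful of reference images and pool their embeddings into a single conditioning vector that could steer a generator toward the depicted subject. The helper names (`encode_image`, `subject_conditioning`) and the random-projection "encoder" are illustrative stand-ins, not anything from the paper.

```python
import numpy as np

EMBED_DIM = 64

def encode_image(img):
    """Stand-in image encoder: a fixed random projection of pixel values,
    normalized to unit length. A real system would use a learned vision
    encoder instead."""
    proj = np.random.default_rng(42).standard_normal((EMBED_DIM, img.size))
    v = proj @ img.ravel()
    return v / np.linalg.norm(v)

def subject_conditioning(reference_images):
    """Pool a few reference-image embeddings into one unit-length
    conditioning vector, with no gradient updates to any model."""
    embs = np.stack([encode_image(im) for im in reference_images])
    pooled = embs.mean(axis=0)
    return pooled / np.linalg.norm(pooled)

# Three small "reference images" of the same hypothetical subject.
rng = np.random.default_rng(0)
refs = [rng.standard_normal((8, 8)) for _ in range(3)]
cond = subject_conditioning(refs)  # one vector summarizing the subject
```

The point of the sketch is the workflow: a few reference images go in, a single conditioning signal comes out, and no custom dataset or lengthy fine-tuning run is needed.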
FoundationPose is a new foundation model that can instantly estimate and track the 3D pose of objects in videos without being trained on each object individually.
It set a speed record and could open up new applications in augmented reality and robotics.
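For readers unfamiliar with the task, a classical baseline helps show what "estimating 3D pose" means. The sketch below is not FoundationPose; it is the textbook Kabsch algorithm, which recovers a rigid rotation and translation from matched 3D point sets. Foundation models like FoundationPose tackle the much harder version of this problem directly from video, with no per-object training.

```python
import numpy as np

def estimate_rigid_pose(src, dst):
    """Recover rotation R and translation t with R @ src_i + t ≈ dst_i
    from two matched Nx3 point sets (Kabsch algorithm)."""
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so R is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t
```

Given a point cloud of an object and the same points observed in a new frame, the function returns the object's pose change between the two views.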
Innovations in NVIDIA’s visual AI research
Researchers at NVIDIA also created NeRFDeformer, a method for modifying a 3D scene captured by a Neural Radiance Field (NeRF) using a single 2D snapshot, rather than editing and animating the scene by hand or regenerating the NeRF from scratch.
This could make it easier to edit 3D scenes for use in robotics, graphics, and digital twins.
NVIDIA and MIT worked together to create VILA, a new family of vision language models that achieve state-of-the-art performance in understanding images, videos, and text.
By combining visual and linguistic understanding with strong reasoning skills, VILA can even make sense of internet memes.
NVIDIA's visual AI research covers a wide range of fields: more than a dozen papers, for example, explore new approaches to perception, mapping, and planning for self-driving cars.
Sanja Fidler, the Vice President of NVIDIA’s AI Research team, will discuss the potential use of vision language models in self-driving cars.
NVIDIA's CVPR research shows how generative AI could empower creators, accelerate automation in healthcare and manufacturing, and advance autonomy and robotics.