This AI research from Stability AI and Tripo AI introduces TripoSR, a model for fast feed-forward 3D generation from a single image

Written By Adarsh Shankar Jha

In the field of 3D artificial intelligence, the line between 3D generation and 3D reconstruction from a small number of views is beginning to blur. This convergence is driven by several breakthroughs, including the emergence of large-scale public 3D datasets and advances in generative model architectures.

To work around the scarcity of 3D training data, recent research uses 2D diffusion models to generate 3D objects from input images or text prompts. One example is DreamFusion, which pioneered score distillation sampling (SDS): a 3D model is optimized under the guidance of a 2D diffusion model. Because it brings 2D priors to 3D generation, this approach was a turning point for creating detailed 3D objects. However, the per-object optimization is computationally expensive and the output is hard to control precisely, so these methods are usually slow to generate results. Feed-forward 3D reconstruction models are much more computationally efficient. Several recent methods in this vein have demonstrated the potential of scalable training on diverse 3D datasets. They greatly improve the efficiency and practicality of 3D generation, allowing fast inference and potentially better control over the results.
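For reference, the score distillation gradient can be written as below. This follows the form given in the DreamFusion paper; the symbols are not defined elsewhere in this article and are introduced here for illustration: $g(\theta)$ is the differentiable renderer producing an image $x$ from 3D parameters $\theta$, $\hat{\epsilon}_\phi$ is the noise predicted by the frozen 2D diffusion model, $y$ is the conditioning prompt, and $w(t)$ is a timestep-dependent weight.

```latex
x_t = \alpha_t\, x + \sigma_t\, \epsilon, \qquad x = g(\theta), \qquad \epsilon \sim \mathcal{N}(0, I)

\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right]
```

Each optimization step requires rendering the 3D model and running the diffusion model, and many such steps are needed per object, which is why this family of methods is slow compared with a single feed-forward pass.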

A new study from Stability AI and Tripo AI presents TripoSR, a feed-forward model that can generate a 3D model from a single image in less than half a second on an A100 GPU. Building on the LRM architecture, the team introduces improvements in data curation and rendering, model design, and training methodology. Like LRM, TripoSR uses a transformer architecture for 3D reconstruction from a single image. It takes an RGB image of an object and produces a 3D model.

The TripoSR model includes three main parts:

  • An image encoder
  • A triplane-based neural radiance field (NeRF)
  • An image-to-triplane decoder
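To make the triplane-based NeRF in the list above more concrete, here is a minimal NumPy sketch of how a triplane representation is typically queried. It is a generic illustration, not TripoSR's implementation: the function names (`bilinear_sample`, `query_triplane`, `decode_point`), the plane resolution, the channel count, and the tiny two-layer MLP with random weights are all assumptions made for the example.

```python
import numpy as np

def bilinear_sample(plane, u, v):
    """Bilinearly sample a (C, H, W) feature plane at u, v in [-1, 1]."""
    _, H, W = plane.shape
    # Map [-1, 1] to continuous pixel coordinates.
    x = (u + 1.0) * 0.5 * (W - 1)
    y = (v + 1.0) * 0.5 * (H - 1)
    x0 = int(np.clip(np.floor(x), 0, W - 2))
    y0 = int(np.clip(np.floor(y), 0, H - 2))
    tx, ty = x - x0, y - y0
    top = (1 - tx) * plane[:, y0, x0] + tx * plane[:, y0, x0 + 1]
    bottom = (1 - tx) * plane[:, y0 + 1, x0] + tx * plane[:, y0 + 1, x0 + 1]
    return (1 - ty) * top + ty * bottom

def query_triplane(planes, point):
    """Project a 3D point onto the XY, XZ and YZ planes and concatenate features.

    planes: (3, C, H, W) array; point: (x, y, z) with each coordinate in [-1, 1].
    """
    x, y, z = point
    f_xy = bilinear_sample(planes[0], x, y)
    f_xz = bilinear_sample(planes[1], x, z)
    f_yz = bilinear_sample(planes[2], y, z)
    return np.concatenate([f_xy, f_xz, f_yz])

def decode_point(features, w1, b1, w2, b2):
    """Hypothetical tiny MLP mapping triplane features to (density, RGB)."""
    hidden = np.maximum(features @ w1 + b1, 0.0)   # ReLU
    out = hidden @ w2 + b2
    density = np.log1p(np.exp(out[0]))             # softplus keeps density >= 0
    color = 1.0 / (1.0 + np.exp(-out[1:4]))        # sigmoid keeps RGB in [0, 1]
    return density, color

# Toy setup: random planes and weights stand in for learned values.
rng = np.random.default_rng(0)
C, H, W = 8, 16, 16
planes = rng.normal(size=(3, C, H, W))
w1, b1 = rng.normal(size=(3 * C, 32)) * 0.1, np.zeros(32)
w2, b2 = rng.normal(size=(32, 4)) * 0.1, np.zeros(4)

features = query_triplane(planes, (0.25, -0.5, 0.1))
density, color = decode_point(features, w1, b1, w2, b2)
```

In a full pipeline, the density and color returned for many points along each camera ray would be combined by volume rendering. The appeal of the triplane layout is that three 2D feature grids are far cheaper to store and to predict than a dense 3D grid of the same resolution.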

The image encoder is initialized from DINOv1, a pretrained vision transformer, and plays a critical role in TripoSR. It converts an RGB image into a set of latent vectors that encode the global and local image features needed to reconstruct the 3D object.
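As a rough illustration of how a vision transformer turns an image into a sequence of vectors, the sketch below shows only the first step: splitting the image into patches and projecting each patch linearly. The transformer layers that follow are omitted. The function names, the 32×32 input, the 16-pixel patch size, the 64-dimensional embedding, and the random projection matrix are assumptions for the example and are not taken from TripoSR or DINO.

```python
import numpy as np

def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    H, W, C = image.shape
    if H % patch_size or W % patch_size:
        raise ValueError("image height and width must be multiples of patch_size")
    gh, gw = H // patch_size, W // patch_size
    # (gh, p, gw, p, C) -> (gh, gw, p, p, C) -> (gh * gw, p * p * C)
    patches = image.reshape(gh, patch_size, gw, patch_size, C)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * C)

def embed_patches(patches, projection):
    """Project each flattened patch to a token vector (one row per patch)."""
    return patches @ projection

# Toy input: a random 32x32 RGB image and a random projection matrix.
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
patches = patchify(image, patch_size=16)           # 4 patches of 16*16*3 values
projection = rng.normal(size=(16 * 16 * 3, 64)) * 0.02
tokens = embed_patches(patches, projection)        # one 64-d token per patch
```

Each token starts out describing one local patch. The attention layers of the transformer then mix information across tokens, which is how the resulting latent vectors can carry both local and global image information.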

The proposed approach avoids explicit camera parameter conditioning, which makes the model more robust and flexible: it can handle a wide range of real-world inputs without relying on precise camera data. Important design factors include the number of transformer layers, the triplane size, the details of the NeRF model, and the main training settings.

Because the quality of the training data is critical, the team made two improvements to how the training data is collected:

  • Data curation: Selecting a curated subset of the Objaverse dataset, which is distributed under the CC-BY license, improved the quality of the training data.
  • Data rendering: The team adopted a range of data rendering techniques that more closely mimic the distribution of real-world images. This improves the model's ability to generalize, even though it is trained exclusively on the Objaverse dataset.

Experiments show that TripoSR outperforms competing open-source solutions both quantitatively and qualitatively. Together with the release of the pretrained model, an online interactive demo, and the source code under the MIT license, this represents significant progress in artificial intelligence (AI), computer vision (CV), and computer graphics (CG). The team expects a transformative impact in these fields by equipping researchers, developers, and artists with these cutting-edge tools for 3D generative AI.


Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.




Dhanshree Shenwai is a Computer Science Engineer with good experience in FinTech companies covering Finance, Cards & Payments and Banking with strong interest in AI applications. He is enthusiastic about exploring new technologies and developments in today’s evolving world that make everyone’s life easy.

