This master’s thesis explores recent advancements in machine learning, particularly those enabling the generation of 3D graphics assets from textual descriptions or input images. Central to this research is the evaluation and enhancement of the DreamGaussian pipeline, a method leveraging Gaussian Splatting and diffusion models to create 3D scenes from minimal input data. The research identifies the limitations of the DreamGaussian approach and proposes refinements, including the introduction of an additional depth prior to improve model convergence.
The thesis provides a comprehensive overview of state-of-the-art techniques in 3D generation, discussing key methods such as Gaussian Splatting, Score Distillation Sampling (SDS), and diffusion models. It addresses the specific issues encountered when adapting 2D image and text generation techniques for 3D applications. A significant focus is placed on the "depth loss" approach, detailing the process of inferring and refining depth maps to enhance 3D asset quality.
Experimental results validate the proposed improvements, highlighting the impact of the depth prior on the generation process. Additionally, a "Text-to-Image-to-3D" pipeline is introduced, showcasing an alternative method for generating 3D assets from textual input by first inferring a reference view image.
The thesis concludes with a discussion of the achievements and limitations of the current work, offering insights into potential future research directions.