Repositori Digital de la UPF

Joint multi-view RGB optimization for clothed 3D avatar reconstruction

The creation of high-fidelity 3D human avatars from images is a central challenge in computer vision, with wide-ranging applications in virtual reality, gaming, and telepresence. However, state-of-the-art single-view reconstruction methods are inherently limited by self-occlusion and viewpoint ambiguity, often resulting in geometrically inaccurate or incomplete models, especially for subjects with complex clothing. This thesis introduces ExECON, a novel pipeline for avatar reconstruction. Our method extends ECON, a state-of-the-art framework that uses a single front-view RGB image, by leveraging sparse multi-view RGB inputs for improved robustness and geometric accuracy. The cornerstone of ExECON is a proposed multi-view algorithm, named Joint Multi-view Body Optimization (JMBO), which optimizes a single, canonical SMPL-X body model across all available viewpoints simultaneously. This multi-view-consistent body prior then serves as a more accurate foundation for a subsequent detailed surface reconstruction stage, which leverages real front and back views to improve both body pose and clothing geometry. Experimental validation demonstrates the efficacy of JMBO: the multi-stage approach is critical for achieving global consistency, which significantly improves key pose and geometric quality metrics. An end-to-end evaluation of the final reconstructed avatars reveals substantial quantitative enhancements: the overall geometric error is reduced by nearly 65% compared to the single-view baseline. Qualitative results show that the method successfully reconstructs challenging loose clothing geometries, such as hoodies, which are a common failure case for single-view systems. Our work demonstrates that establishing a geometrically consistent multi-view body prior and using it to guide the surface reconstruction can resolve the critical ambiguities of single-view methods, producing more accurate and complete 3D human avatars.
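The core idea behind joint multi-view optimization — fitting one shared set of body parameters against the observations from all cameras at once, rather than fitting a separate body per view — can be illustrated with a minimal toy sketch. This is a hypothetical illustration, not the actual JMBO or SMPL-X code: the "joints" here are plain 3D points, the cameras are simple orthographic projections, and the loss is a summed reprojection error minimized by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth 3D joints (a stand-in for SMPL-X joint locations).
J_true = rng.normal(size=(5, 3))

def make_camera(rng):
    """Toy orthographic camera: the top two rows of a random rotation."""
    A = rng.normal(size=(3, 3))
    Q, _ = np.linalg.qr(A)  # random 3x3 rotation
    return Q[:2]            # 2x3 projection matrix

cams = [make_camera(rng) for _ in range(4)]
obs = [P @ J_true.T for P in cams]  # per-view 2D observations (2 x N)

# Jointly optimize ONE canonical joint set across all views by gradient
# descent on the summed reprojection error sum_v 0.5 * ||P_v J^T - y_v||^2.
J = np.zeros((5, 3))
lr = 0.1
for _ in range(500):
    grad = np.zeros_like(J)
    for P, y in zip(cams, obs):
        r = P @ J.T - y          # residual in this view (2 x N)
        grad += (P.T @ r).T      # gradient w.r.t. J (N x 3)
    J -= lr * grad

err = np.linalg.norm(J - J_true)
print(f"joint error after optimization: {err:.6f}")
```

Note that a single orthographic view constrains only two of the three spatial dimensions (the projection has rank 2), so the single-view problem is underdetermined — the same depth ambiguity the abstract attributes to single-view reconstruction. Stacking several views makes the joint system well-posed and the shared solution unique.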

Ece Ugur, Fulden (2025)