Composite recurrent network with internal denoising for facial alignment in still and video images in the wild
- dc.contributor.author Aspandi, Decky
- dc.contributor.author Martinez, Oriol
- dc.contributor.author Sukno, Federico Mateo
- dc.contributor.author Binefa i Valls, Xavier
- dc.date.accessioned 2021-05-21T08:22:14Z
- dc.date.available 2021-05-21T08:22:14Z
- dc.date.issued 2021
- dc.description.abstract Facial alignment is an essential task for many higher-level facial analysis applications, such as animation, human activity recognition and human-computer interaction. Although the recent availability of big datasets and powerful deep-learning approaches has enabled major improvements in state-of-the-art accuracy, the performance of current approaches can severely deteriorate when dealing with images in highly unconstrained conditions, which limits the real-life applicability of such models. In this paper, we propose a composite recurrent tracker with internal denoising that jointly addresses both single-image facial alignment and deformable facial tracking in the wild. Specifically, we incorporate multilayer LSTMs to model temporal dependencies of variable length and introduce an internal denoiser which selectively enhances the input images to improve the robustness of our overall model. We achieve this by combining four sub-networks that specialize in each of the key tasks required, namely face detection, bounding-box tracking, facial region validation and facial alignment with internal denoising. These blocks are endowed with novel algorithms, resulting in a facial tracker that is accurate, robust to in-the-wild settings, and resilient against drifting. We demonstrate this by testing our model on the 300-W and Menpo datasets for single-image facial alignment, and on the 300-VW dataset for deformable facial tracking. Comparison against 20 other state-of-the-art methods demonstrates the excellent performance of the proposed approach.
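To make the composition described in the abstract concrete, the sketch below shows one possible way such a pipeline could be wired together: per-frame sub-modules for detection, denoising and feature encoding, a multilayer LSTM over the frame sequence, and heads for region validation and landmark regression. This is a minimal illustrative sketch in PyTorch under assumed shapes and module names (ComposedTracker, feat_dim, the placeholder encoders), not the authors' implementation or code.

```python
# Illustrative sketch only: a composite tracker that chains hypothetical sub-networks
# for face detection, denoising + encoding, region validation and landmark alignment,
# with a multilayer LSTM modelling temporal dependencies across video frames.
import torch
import torch.nn as nn

class ComposedTracker(nn.Module):
    def __init__(self, feat_dim=256, n_landmarks=68, lstm_layers=2):
        super().__init__()
        # Placeholder CNNs standing in for the specialised sub-networks of the paper.
        self.detector = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 4))      # face bounding box
        self.denoiser = nn.Conv2d(3, 3, 3, padding=1)                      # enhanced input frame
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())                         # per-frame features
        # Multilayer LSTM over the frame sequence (variable-length temporal context).
        self.lstm = nn.LSTM(feat_dim, feat_dim, num_layers=lstm_layers, batch_first=True)
        self.validator = nn.Linear(feat_dim, 1)                            # is the region still a face?
        self.aligner = nn.Linear(feat_dim, 2 * n_landmarks)                # (x, y) per landmark

    def forward(self, frames):                                             # frames: (B, T, 3, H, W)
        b, t, c, h, w = frames.shape
        flat = frames.view(b * t, c, h, w)
        boxes = self.detector(flat).view(b, t, 4)                          # per-frame boxes
        feats = self.encoder(self.denoiser(flat)).view(b, t, -1)           # denoised features
        temporal, _ = self.lstm(feats)                                     # temporal modelling
        valid = torch.sigmoid(self.validator(temporal))                    # validation score
        landmarks = self.aligner(temporal).view(b, t, -1, 2)               # aligned landmarks
        return boxes, valid, landmarks

# Example: a batch of 2 clips, 5 frames each, 128x128 RGB.
boxes, valid, landmarks = ComposedTracker()(torch.randn(2, 5, 3, 128, 128))
```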
- dc.description.sponsorship This work is partly supported by the Spanish Ministry of Economy and Competitiveness under project grant TIN2017-90124-P, the Ramón y Cajal Programme, the María de Maeztu Units of Excellence Programme (MDM-2015-0502) and the donation bahi2018-19 to the CMTech at the UPF.
- dc.format.mimetype application/pdf
- dc.identifier.citation Aspandi D, Martinez O, Sukno F, Binefa X. Composite recurrent network with internal denoising for facial alignment in still and video images in the wild. Image Vis Comput. 2021;111:104189. DOI: 10.1016/j.imavis.2021.104189
- dc.identifier.doi http://dx.doi.org/10.1016/j.imavis.2021.104189
- dc.identifier.issn 0262-8856
- dc.identifier.uri http://hdl.handle.net/10230/47639
- dc.language.iso eng
- dc.publisher Elsevier
- dc.relation.ispartof Image and Vision Computing. 2021;111:104189
- dc.relation.projectID info:eu-repo/grantAgreement/ES/2PE/TIN2017-90124-P
- dc.rights © 2021 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
- dc.rights.accessRights info:eu-repo/semantics/openAccess
- dc.rights.uri http://creativecommons.org/licenses/by/4.0/
- dc.subject.keyword Facial alignment
- dc.subject.keyword Facial tracking
- dc.subject.keyword Temporal modeling
- dc.subject.keyword Internal denoising
- dc.title Composite recurrent network with internal denoising for facial alignment in still and video images in the wild
- dc.type info:eu-repo/semantics/article
- dc.type.version info:eu-repo/semantics/publishedVersion