Continuous lip reading in Spanish
- dc.contributor.author Ronquillo, Yadira
- dc.date.accessioned 2022-10-25T18:21:44Z
- dc.date.available 2022-10-25T18:21:44Z
- dc.date.issued 2022
- dc.description Supervisors: Federico Sukno, Adriana Fernández López
- dc.description Bachelor's thesis in Biomedical Engineering
- dc.description.abstract Lip reading, also known as visual speech recognition, is the task of decoding text from lip movement by analysing changes in the speaker's lip shape. It has a wide range of applications, including security, assisted driving systems, virtual reality, speech transcription when audio is not available, and communication for people who are hearing-impaired. In the last case, it can be an extremely helpful tool for communicating through video calls or for understanding what the other person is saying. For these reasons, lip reading has been the subject of a vast research effort over the last few decades. Currently, deep learning is used to tackle this task. However, training a lip-reading model relies on a large amount of data. As a result, lip reading has been largely limited to English, since this is the only language with large-scale datasets. In this work, we used a new audio-visual dataset in the Spanish language, built from a subset of the RTVE database. It is the largest publicly available sentence-level lip-reading dataset in Spanish to date and consists of over 13 hours of video extracted from Canal 24 horas. We used it to develop an Automatic Lip-Reading (ALR) system for continuous speech recognition in Spanish. For this purpose, we employed the Audio-Visual Hidden Unit BERT (AV-HuBERT) model, which is based on a transformer network. The resulting system can distinguish some short sentences. In addition, we observed that transfer learning works better when the languages are similar, and that there is a relationship between the size of the dataset and the transfer learning method.
- dc.format.mimetype application/pdf
- dc.identifier.uri http://hdl.handle.net/10230/54585
- dc.language.iso eng
- dc.rights © All rights reserved
- dc.rights.accessRights info:eu-repo/semantics/openAccess
- dc.subject.keyword Deep Learning
- dc.subject.keyword Automatic speech recognition
- dc.subject.keyword Transfer learning
- dc.subject.keyword Lip reading
- dc.title Continuous lip reading in Spanish
- dc.type info:eu-repo/semantics/bachelorThesis