Ronquillo, Yadira
2022-10-25
2022-10-25
2022
http://hdl.handle.net/10230/54585
Tutors: Federico Sukno, Adriana Fernández López
Bachelor's thesis in Biomedical Engineering

Lip reading, also known as visual speech recognition, is the task of decoding text from lip movement, which involves analysing changes in the speaker's lip shape. It has wide application in security, assisted driving systems, virtual reality, speech transcription when audio is not available, and communication for people who are hearing-impaired. In the last case, it can also be an extremely helpful tool for communicating through video calls or for understanding what the other person is saying. For these reasons, lip reading has been the subject of a vast research effort over the last few decades. Currently, deep learning is used to tackle this task. However, training a lip-reading model requires a large amount of data. As a result, the applicability of lip reading has been limited to English, since this is the only language with large-scale datasets. In this work, we used a new audio-visual dataset in Spanish, built from a subset of the RTVE database. It is the largest publicly available sentence-level lip-reading dataset in Spanish to date and consists of over 13 hours of video extracted from Canal 24 horas. We used it to develop an Automatic Lip-Reading (ALR) system for continuous speech recognition in Spanish. For this purpose, we employed the Audio-Visual Hidden Unit BERT (AV-HuBERT) model, which is based on a transformer network. The resulting system can differentiate some short sentences. We also observed that transfer learning works better when the languages are similar, and that there is a relationship between the size of the dataset and the transfer learning method.

application/pdf
eng
© All rights reserved
Continuous lip reading in Spanish
info:eu-repo/semantics/bachelorThesis
Deep Learning
Automatic speech recognition
Transfer learning
Lip reading
info:eu-repo/semantics/openAccess