Communication presented at: 9th International Conference on Spoken Language Processing; 17-21 September 2006, Pittsburgh, United States of America
Multimodal person recognition systems normally use short-term spectral features as voice information. In this paper, prosodic information is added to a system based on face and voice spectrum features. Using two fusion techniques, support vector machines and matcher weighting, different fusion strategies based on fusing the monomodal scores in several steps are proposed. The performance of the system is clearly improved when the prosodic information is added, and the best results are achieved when the prosodic scores are fused first and the resulting scores are then fused with the spectral and facial scores. Speech and face scores have been obtained on the Switchboard-I and XM2VTS databases, respectively.
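The abstract names matcher weighting as one of the two score-level fusion techniques. The sketch below illustrates a common formulation of matcher weighting, in which each modality's weight is inversely proportional to its equal error rate (EER) estimated on development data; the min-max normalization, the EER values, and the function names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def min_max_normalize(scores):
    """Map raw matcher scores to [0, 1] (illustrative normalization choice)."""
    s = np.asarray(scores, dtype=float)
    return (s - s.min()) / (s.max() - s.min())

def matcher_weighting_fusion(score_matrix, eers):
    """Fuse per-matcher scores with weights inversely proportional to each
    matcher's EER: w_m = (1/EER_m) / sum_k (1/EER_k).

    score_matrix: (n_samples, n_matchers) array of normalized scores.
    eers:         per-matcher EERs estimated on development data.
    """
    inv = 1.0 / np.asarray(eers, dtype=float)
    weights = inv / inv.sum()
    return np.asarray(score_matrix) @ weights

# Toy example: three matchers (spectral voice, prosody, face) with hypothetical EERs.
rng = np.random.default_rng(0)
raw = rng.random((5, 3))  # stand-in raw scores for 5 trials
norm = np.column_stack([min_max_normalize(raw[:, m]) for m in range(3)])
fused = matcher_weighting_fusion(norm, eers=[0.08, 0.15, 0.05])
print(fused)
```

In the multi-step strategies the paper describes, the same fusion step can be applied first to the prosodic scores alone and the resulting score then combined with the spectral and facial scores; the SVM-based alternative would instead train a classifier on the stacked monomodal scores.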