Dubbing is a type of audiovisual translation where dialogues are
translated and enacted so that they give the impression that the
media is in the target language. It requires a careful alignment
of dubbed recordings with the lip movements of performers in
order to achieve visual coherence. In this paper, we deal with
the specific problem of prosodic phrase synchronization within
the framework of machine dubbing. Our methodology exploits
the attention mechanism output in neural machine translation
to find plausible phrasing for the translated dialogue lines and
then uses them to condition their synthesis. Our initial results
show a speech rate ratio comparable to that of professional
dubbing translation, and improved lip-syncing of long dialogue
lines.
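
As a rough illustration of how attention output could suggest phrasing for a translated line, the sketch below picks a single phrase break in the target sentence given a known break in the source. It is a minimal sketch under stated assumptions, not the method of this paper: the function name `best_target_break`, the scoring rule (attention mass that stays on the same side of both breaks), and the toy attention matrix are all hypothetical.

```python
from typing import List


def best_target_break(attn: List[List[float]], src_break: int) -> int:
    """Return the target index t that best mirrors a source phrase break.

    attn[i][j] is the attention weight from target token i to source token j.
    Source tokens [0, src_break) form the first phrase. The score of a
    candidate t is the attention mass linking the first target phrase to the
    first source phrase plus the mass linking the second to the second.
    (Hypothetical scoring rule, chosen for illustration only.)
    """
    n_tgt = len(attn)
    if n_tgt < 2:
        raise ValueError("need at least two target tokens to place a break")
    n_src = len(attn[0])
    if not 0 < src_break < n_src:
        raise ValueError("src_break must fall strictly inside the source")

    best_t, best_score = 1, float("-inf")
    for t in range(1, n_tgt):
        score = 0.0
        for i, row in enumerate(attn):
            if i < t:
                score += sum(row[:src_break])  # first phrase to first phrase
            else:
                score += sum(row[src_break:])  # second phrase to second phrase
        if score > best_score:
            best_t, best_score = t, score
    return best_t


# Toy example: 4 target tokens attending over 3 source tokens (made-up values).
attention = [
    [0.9, 0.1, 0.0],
    [0.8, 0.1, 0.1],
    [0.1, 0.7, 0.2],
    [0.0, 0.2, 0.8],
]
split = best_target_break(attention, src_break=1)
print("target phrase break before token index", split)
```

The exhaustive search over break positions is quadratic in sentence length, which is acceptable for dialogue lines; the chosen break could then be passed to a synthesizer as a pause position.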