Deepfakes are flooding the internet with synthetic media that cannot be easily
distinguished from real content by the human eye. This content can be classified into
face synthesis, face swap, and facial attribute manipulation. Encoder-decoder and
generative networks are the leading architectures for deepfake creation because
they produce highly realistic, high-quality content. Flow-based models have been
emerging in recent years owing to several properties that make them attractive to
the scientific community. Nevertheless, only face synthesis and attribute
manipulation have been explored with this architecture.
Here we present a first approach to face swapping with flow-based models, based on
transferring facial expressions between identities through vector arithmetic.
Different arithmetic expressions have been used to generate deepfake content, which
has then been evaluated in terms of its likeness to the target identity, the quality
of the expression transfer, and the probability of the image being fake.
The arithmetic expression that produces the best deepfake content according to our
metric is a linear combination of the mean face of the target identity and an
expression vector, obtained as the midpoint of the difference between the source
face carrying the expression to be transferred and the source's mean face, scaled
by a factor α. The results show that there are no clear guidelines on the α value
that yields the best expression transfer, since this value depends not only on the
expression to be transferred but also on the target identity. Regarding the quality
of the synthesized expression, the mean error in the intensity of the Action Units
characterizing the selected expressions is 0.3 on the 0-5 intensity scale. This work
provides a first insight into face swapping via expression transfer in flow-based
models, offering an initial pipeline for both the generation and the evaluation of
this type of deepfake content.
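As a rough illustration of the latent arithmetic described above, the following is a
minimal NumPy sketch, not the paper's code. It assumes the flow model gives an
invertible mapping between images and latent codes, that z_source_expr,
z_source_mean and z_target_mean are such codes, and that "midpoint of the
difference regulated by a factor α" corresponds to scaling the difference by α/2;
all names and the latent dimensionality are hypothetical.

```python
import numpy as np

def transfer_expression(z_source_expr, z_source_mean, z_target_mean, alpha):
    """Build a swapped latent code by moving the target's mean face
    along the source's scaled expression direction.

    z_source_expr : latent code of the source face showing the expression
    z_source_mean : mean latent code (mean face) of the source identity
    z_target_mean : mean latent code (mean face) of the target identity
    alpha         : scalar regulating the strength of the transferred expression
    """
    # Expression direction: midpoint of the difference between the source
    # expression code and the source mean face, regulated by alpha.
    expression_vector = alpha * 0.5 * (z_source_expr - z_source_mean)
    # Linear combination with the target identity's mean face.
    return z_target_mean + expression_vector

# Toy usage with random vectors standing in for latent codes obtained from
# the flow model's forward (encoding) pass.
d = 512                          # assumed latent dimensionality
rng = np.random.default_rng(0)
z_src = rng.normal(size=d)       # source face with the expression to transfer
z_src_mean = rng.normal(size=d)  # source identity's mean face
z_tgt_mean = rng.normal(size=d)  # target identity's mean face

z_swap = transfer_expression(z_src, z_src_mean, z_tgt_mean, alpha=1.5)
# z_swap would then go through the flow model's inverse pass to synthesize
# the target identity displaying the transferred expression.
```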