Emojis are very common in social media
and understanding their underlying semantics
is of great interest from a Natural Language
Processing point of view. In this work,
we investigate emoji prediction in short text
messages using a multi-task pipeline that simultaneously
predicts emojis, their categories
and sub-categories. The categories are either
manually predefined in the Unicode standard
or automatically obtained by clustering over
word embeddings. We show that this
categorical information is a meaningful additional signal
that improves performance on the
emoji prediction task. We systematically analyze
the performance of the emoji prediction
task while varying the number of training samples,
and we also perform a qualitative analysis using
attention weights from the prediction task.
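
As an illustration of the multi-task idea summarized above, the following is a minimal sketch of a joint training objective over three prediction heads (emoji, category, sub-category). The abstract does not specify the loss or the architecture, so everything here is an assumption made for illustration: the function names `cross_entropy` and `multitask_loss`, the unweighted sum of per-task losses, and the toy logits are all hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one example; `target` is the gold class index."""
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def multitask_loss(heads, targets, weights=None):
    """Weighted sum of per-task cross-entropy losses.

    heads:   dict mapping task name -> list of logits for that task
    targets: dict mapping task name -> gold class index
    weights: optional dict mapping task name -> float (defaults to 1.0);
             equal weighting is an assumption, not taken from the paper.
    """
    weights = weights or {}
    return sum(
        weights.get(task, 1.0) * cross_entropy(heads[task], targets[task])
        for task in heads
    )


# Toy logits for one message (hypothetical values).
heads = {
    "emoji": [2.0, 0.5, -1.0, 0.0],
    "category": [1.5, 0.2],
    "subcategory": [0.3, 1.1, -0.4],
}
targets = {"emoji": 0, "category": 0, "subcategory": 1}
loss = multitask_loss(heads, targets)
```

In a setup like this, the category and sub-category heads act as auxiliary tasks: their losses shape the shared representation even though only the emoji prediction is of primary interest.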