Short-Term Meaning Shift: A Distributional Exploration

We present the first exploration of meaning shift over short periods of time in online communities using distributional representations. We create a small annotated dataset and use it to assess the performance of a standard model for meaning shift detection on short-term meaning shift. We find that the model has problems distinguishing meaning shift from referential phenomena, and propose a measure of contextual variability to remedy this.


Introduction
Semantic change has received increasing attention in empirical Computational Linguistics / NLP in the last few years (Tang, 2018; Kutuzov et al., 2018). Almost all studies so far have focused on meaning shift over long periods of time, from decades to centuries. However, the genesis of meaning shift and the mechanisms that produce it operate at much shorter time spans, ranging from the online agreement on words' meaning in dyadic interactions (Brennan and Clark, 1996) to the rapid spread of new meanings in relatively small communities of people (Wenger, 1998; Eckert and McConnell-Ginet, 1992). This paper is, to the best of our knowledge, the first exploration of the latter phenomenon, which we call short-term meaning shift, using distributional representations.
More concretely, we focus on meaning shift arising within a period of 8 years, and explore it on data from an online community of speakers, because there the adoption of new meanings happens at a fast pace (Clark, 1996; Hasan, 2009). Indeed, short-term shift is usually hard to observe in standard language, such as the language of books or news, which has been the focus of long-term shift studies (e.g., Hamilton et al., 2016; Kulkarni et al., 2015), since it takes a long time for a new meaning to be widely accepted in the standard language.
Our contribution is twofold. First, we create a small dataset of short-term shift for analysis and evaluation, and qualitatively analyze the types of meaning shift we find.¹ This is necessary because, unlike studies of long-term shift, we cannot rely on material previously gathered by linguists or lexicographers. Second, we test the behavior of a standard distributional model of semantic change when applied to short-term shift. Our results show that this model successfully detects most shifts in our data, but it overgeneralizes. Specifically, the model is confused by contextual changes that arise because speakers in the community often talk about particular people and events, which is frequent over short time spans. We propose to use a measure of contextual variability to remedy this and showcase its potential to spot false positives of a referential nature like these. We thus make progress in understanding the nature of semantic shift and towards improving computational models thereof.

Related Work
Distributional models of semantic change are based on the hypothesis that a change in context of use mirrors a change in meaning. This in turn stems from the Distributional Hypothesis, which states that similarity in meaning results in similarity in context of use (Harris, 1954). Therefore, all models (including ours) spot semantic shift as a change in the word representation in different time periods. Among the most widely used techniques are Latent Semantic Analysis (Sagi et al., 2011; Jatowt and Duh, 2014), Topic Modeling (Wijaya and Yeniterzi, 2011), and classic distributional representations based on co-occurrence matrices of target words and context terms (Gulordava and Baroni, 2011). More recently, researchers have used word embeddings computed using the skip-gram model by Mikolov et al. (2013). Since embeddings computed in different semantic spaces are not directly comparable, time-related representations are usually made comparable either by aligning different semantic spaces through a transformation matrix (Kulkarni et al., 2015; Azarbonyad et al., 2017; Hamilton et al., 2016) or by initializing the embeddings at tᵢ₊₁ using those computed at tᵢ (Kim et al., 2014; Del Tredici et al., 2016; Phillips et al., 2017; Szymanski, 2017). We adopt the latter methodology (see Section 3.2).
Unlike most previous work, we focus on the language of online communities. Recent studies of this type of language have investigated the spread of new forms and meanings (Del Tredici and Fernández, 2017, 2018; Stewart and Eisenstein, 2018), competing lexical variants (Rotabi et al., 2017), and the relation between conventions in a community and the social standing of its members (Danescu-Niculescu-Mizil et al., 2013). None of these works has analyzed the ability of a distributional model to capture these phenomena, which is what we do in this paper for short-term meaning shift. Kulkarni et al. (2015) consider meaning shift in short time periods on Twitter data, but without providing an analysis of the observed shift or systematically assessing the performance of the model, as we do here.
Evaluation of semantic shift is difficult, due to the lack of annotated datasets (Frermann and Lapata, 2016). For this reason, even for long-term shift, evaluation is usually performed by manually inspecting the n words whose representation changes the most according to the model under investigation (Hamilton et al., 2016; Kim et al., 2014). Our dataset allows for a more systematic evaluation and analysis, and enables comparison in future studies.

Data
We exploit user-generated language from an online forum of football fans, namely, the r/LiverpoolFC subreddit, one of the many communities hosted by the Reddit platform.² Del Tredici and Fernández (2018) showed that this subreddit presents many characteristics that favour the creation and spread of linguistic innovations, such as a topic that reflects a strong external interest and a high density of connections among its users. This makes it a good candidate for our investigation. We focus on a short period of eight years, between 2011 and 2017. In order to enable a clearer observation of short-term meaning shift, we define two non-consecutive time bins: the first one (t₁) contains data from 2011-2013 and the second one (t₂) from 2017.³ We also use a large sample of community-independent language for the initialization of the word vectors, namely, a random crawl from Reddit in 2013. Table 1 shows the size of each sample.

Model
We adopt the model proposed by Kim et al. (2014), a representative method for computing diachronic meaning shift.⁴ While other methods might be equally suitable (see Section 2), we expect our results not to be method-specific, because they concern general properties of short-term shift, as we show in Sections 4 and 5. In the model by Kim et al. (2014), word embeddings for the first time bin t₁ are initialized randomly; then, given a sequence of time-related samples, embeddings for tᵢ are initialized using the embeddings of tᵢ₋₁ and further updated. If at tᵢ the word is used in the same contexts as in tᵢ₋₁, its embedding will only be marginally updated, whereas a major change in the context of use will lead to a stronger update of the embedding. The model makes embeddings across time bins directly comparable.
We implement the following steps. First, we train randomly initialized word embeddings on the large sample Reddit₁₃ to obtain meaning representations that are community-independent. We then use these embeddings to initialize those in LiverpoolFC₁₃, update the vectors on this sample, and thus obtain embeddings for time t₁. This step adapts the general embeddings to the LiverpoolFC community. Finally, we initialize the word embeddings for LiverpoolFC₁₇ with those of t₁, train on this sample, and get embeddings for t₂.
The vocabulary is defined as the intersection of the vocabularies of the three samples (Reddit₁₃, LiverpoolFC₁₃, LiverpoolFC₁₇), and includes 157k words. For Reddit₁₃, we include only words that occur at least 20 times in the sample, so as to ensure meaningful representations for each word, while for the other two samples we do not use any frequency threshold. Since the embeddings used for the initialization of LiverpoolFC₁₃ encode community-independent meanings, if a word does not occur in LiverpoolFC₁₃ its representation will simply be as in Reddit₁₃, which reflects the idea that if a word is not used in a community, then its meaning is not altered within that community. We train with standard skip-gram parameters (Levy et al., 2015): window 5, learning rate 0.01, embedding dimension 200, hierarchical softmax.
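The initialization chain described above can be illustrated with a toy sketch. This is not the implementation used in the paper: the function names `train_skipgram` and `train_chain` are hypothetical, the shared vocabulary is passed in explicitly, and negative sampling replaces hierarchical softmax to keep the code short. The point it shows is only that each time bin continues training from the vectors of the previous one, so a word that does not occur in a sample keeps its earlier representation.

```python
import math
import random


def train_skipgram(corpus, emb_in, emb_out, window=5, lr=0.01,
                   negatives=2, epochs=5, seed=0):
    """Update the embeddings in place on `corpus` (a list of token lists).

    Assumption: negative sampling is used for brevity, whereas the text
    reports hierarchical softmax.
    """
    rng = random.Random(seed)
    # Negatives are drawn only from in-vocabulary words of this sample.
    present = sorted({w for sent in corpus for w in sent if w in emb_in})
    for _ in range(epochs):
        for sent in corpus:
            sent = [w for w in sent if w in emb_in]
            for i, target in enumerate(sent):
                lo, hi = max(0, i - window), min(len(sent), i + window + 1)
                for j in range(lo, hi):
                    if j == i:
                        continue
                    pairs = [(sent[j], 1.0)]
                    pairs += [(rng.choice(present), 0.0)
                              for _ in range(negatives)]
                    v = emb_in[target]
                    grad_v = [0.0] * len(v)
                    for ctx, label in pairs:
                        u = emb_out[ctx]
                        score = sum(a * b for a, b in zip(v, u))
                        g = lr * (label - 1.0 / (1.0 + math.exp(-score)))
                        for k in range(len(v)):
                            grad_v[k] += g * u[k]
                            u[k] += g * v[k]
                    for k in range(len(v)):
                        v[k] += grad_v[k]


def train_chain(samples, vocab, dim=200, seed=0):
    """Train on each sample in order, starting from the previous vectors.

    Returns one snapshot of the word vectors per sample, e.g. for
    [Reddit 2013, LiverpoolFC 2011-2013, LiverpoolFC 2017].
    """
    rng = random.Random(seed)
    words = sorted(vocab)
    emb_in = {w: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)]
              for w in words}
    emb_out = {w: [0.0] * dim for w in words}
    snapshots = []
    for corpus in samples:
        train_skipgram(corpus, emb_in, emb_out, seed=seed)
        snapshots.append({w: list(v) for w, v in emb_in.items()})
    return snapshots
```

Under these assumptions, the last two snapshots play the role of the t₁ and t₂ embeddings, and they live in the same space without any alignment step.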

Evaluation dataset
Our dataset consists of 97 words from the r/LiverpoolFC subreddit with annotations by members of the subreddit, that is, community members with domain knowledge (needed for this task) but no linguistic background.
To ensure that we would get enough cases of semantic shift to enable a meaningful analysis, we started out from content words that increase their relative frequency between t₁ and t₂.⁵ A threshold of 2 standard deviations above the mean yielded ∼200 words. The first author manually identified 34 semantic shift candidates among these words by analyzing their contexts of use in the r/LiverpoolFC data. Semantic shift is defined here as a change in the ontological type that a word denotes, which takes place when the word starts to be used to denote an entity which is different from the one originally denoted and the new use spreads among the members of a community (see examples in Sec. 4). We added two types of confounders: 33 words with a significant frequency increase but not marked as meaning shift candidates, and 33 words with constant frequency between t₁ and t₂, included as a sanity check. All words have absolute frequency in range .
The participants were shown the 100 words (in randomized order) together with randomly chosen contexts of usage from each time period (µ = 4.7 contexts per word) and, for simplicity, were asked to make a binary decision about whether there was a change in meaning. In order to familiarize the redditors with the concept of meaning change, we first provided them with an intuitive, non-technical definition, and then with a set of cases that exemplify it. The instructions to participants can be found in the project's GitHub repository (see footnote 1).
Semantic shift is arguably a graded notion. In line with a suggestion by Kutuzov et al. (2018) to account for this fact, we aggregate the annotations into a graded semantic shift index, ranging from 0 (no shift) to 1 (shift) depending on how many subjects spotted semantic change. The shift index is exclusively based on the judgments of the redditors, and does not consider the preliminary candidate selection done by us. Overall, 26 members of r/LiverpoolFC participated in the survey, and each word received on average 8.8 judgements. Further details about the dataset are in Appendix A.
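The frequency-based preselection can be sketched as follows. The text does not specify how the frequency increase is measured, so this sketch assumes the difference in relative frequency between the two bins, computed over words that occur in both; the function names are hypothetical, and the restriction to content words is left out.

```python
from collections import Counter
from statistics import mean, stdev


def relative_freqs(tokens):
    """Relative frequency of each word in a flat list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def frequency_increase_candidates(tokens_t1, tokens_t2, n_std=2.0):
    """Words whose relative-frequency increase exceeds mean + n_std * std.

    Assumption: the increase is the difference rf(t2) - rf(t1), taken
    over the words that occur in both time bins.
    """
    rf1, rf2 = relative_freqs(tokens_t1), relative_freqs(tokens_t2)
    shared = sorted(set(rf1) & set(rf2))
    delta = {w: rf2[w] - rf1[w] for w in shared}
    threshold = mean(delta.values()) + n_std * stdev(delta.values())
    return [w for w in shared if delta[w] > threshold]
```

Other operationalizations (for example, a ratio of relative frequencies) would be equally compatible with the description and may select a different set of words.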
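As an illustration of the aggregation step, the sketch below computes the shift index as the fraction of annotators who judged a word to have changed meaning. This is the most direct reading of "depending on how many subjects spotted semantic change", but the exact formula is an assumption, as are the function names and the toy judgements.

```python
def shift_index(judgements):
    """Fraction of binary judgements (1 = shift, 0 = no shift) equal to 1."""
    if not judgements:
        raise ValueError("at least one judgement is required")
    return sum(judgements) / len(judgements)


def aggregate(annotations):
    """Map each word to its shift index, given word -> list of judgements."""
    return {word: shift_index(js) for word, js in annotations.items()}


# Toy judgements, not taken from the dataset.
toy = {"f5": [1, 1, 1, 0], "stubborn": [0, 0, 0]}
indices = aggregate(toy)
```

Because words receive different numbers of judgements, a normalized fraction keeps the index comparable across words.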

Types of Meaning Shift
We identify three main types of shift in our data via qualitative analysis of examples with a shift index > 0.5: metonymy, metaphor, and meme.
Metonymy. In metonymic shifts, a highly salient characteristic of an entity is used to refer to it. Among these cases are, for example, 'highlighter', which in t₂ occurs in sentences like 'we are playing with the highlighter today', or 'what's up with the hate for this kit? This is great, ten times better than the highlighter', used to talk about a kit in a colour similar to that of a highlighter pen; or 'lean', in 'I hope a lean comes soon!', 'Somebody with speed... make a signing... Cuz I need a lean', used to talk about hiring players, because new hires typically lean on a Liverpool symbol when posing for a photo right after signing for the club. Particularly illustrative is the 'F5' example shown in Table 2. While 'F5' is initially used in its common sense of a shortcut for refreshing a page (1), it then starts to denote the act of refreshing in order to get the latest news about the possible transfer of a new player to LiverpoolFC (2). This use catches on and many redditors use it to express their tension while waiting for good news (3-5),⁶ though not all members are aware of the new meaning of the word (6). When the transfer is almost done, someone leaves the 'F5 squad' (7), and after a while, another member recalls the period in which the word was used (8).
Metaphor. Metaphorical shifts lead to a broadening of the original meaning of a word through analogy. For example, in t₂ 'shovel' occurs in sentences such as 'welcome aboard, here is your shovel' or 'you boys know how to shovel coal': the team is seen as a train that is running through the season, and every supporter is asked to figuratively contribute by shoveling coal into the train boiler.
Meme. Finally, memes are another prominent source of meaning shift. In this case, fans use a word to make jokes and be sarcastic, and the new usage quickly spreads within the community.

Modeling Results and Analysis
The positive correlation between cosine distance and semantic shift index (Pearson's r = 0.49, p < 0.001, see Figure 1) confirms the hypothesis that meaning shift is mirrored by a change in context of use. However, we also find systematic deviations.
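The two quantities involved in this comparison can be written down in a few lines. The sketch below gives standard definitions of cosine distance (between a word's t₁ and t₂ vectors) and of Pearson's r (between per-word distances and shift indices); the function names are illustrative and the significance test is omitted.

```python
import math


def cosine_distance(u, v):
    """1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)
```

Given the embeddings of each annotated word in the two time bins, the model's score for a word is `cosine_distance(vec_t1, vec_t2)`, and `pearson_r` relates the list of scores to the list of shift indices.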

False negatives
A small but consistent group is that of words that undergo semantic shift but are not captured by the model (blue ellipse in Figure 1; shift index > 0.5, cosine distance < 0.25). These are all metaphorical shifts; in particular, cases of extended metaphor (Werth, 1994), where the metaphor is developed throughout the whole text. For instance, besides the 'shovel' example mentioned in Section 4, we find 'pharaoh', the nickname of an Egyptian player who joined Liverpool in 2017, used in sentences like 'approved by our new Pharaoh Tutankhamun', or 'our dear Egyptian Pharaoh, let's hope he becomes a God'. Despite the metaphoric usage, the local context of these words is similar to the literal one, and so the model does not spot the meaning shift. We expect this to happen in long-term shift models, too, but we are not aware of results confirming this.

False positives
A larger group of problematic cases is that of words that do not undergo semantic shift despite showing relatively large differences in context between t₁ and t₂ (red ellipse in Figure 1; shift index = 0, cosine distance > 0.25). Manual inspection reveals that most of these "errors" are due to a referential effect: words are used almost exclusively to refer to a specific person or event in t₂, and so the context of use is different from the contexts in t₁. For instance, 'stubborn' is almost always used to talk about a coach who was not there in 2013 but only in 2017; 'entourage', for the entourage of a particular star of the team; 'independence' for the political events in Catalonia (Spain). In all these cases, the meaning of the word stays the same, despite the change in context. In line with the Distributional Hypothesis, the model spots the context change, but it is not sensitive to its nature. We expect long-term shift to not be as susceptible to referential effects like these because embeddings are aggregated over a larger and more varied number of occurrences. We expect that in referential cases the contexts of use will be narrower than for words with actual semantic shift, as they are specific to one person or event. Hence, a measure of contextual variability should help spot false positives. To test this hypothesis, we define contextual variability as follows: for a target word, we create a vector for each of its contexts (5 words on both sides of the target) in t₂ by averaging the embeddings of the words occurring in it, and define variability as the average pairwise cosine distance between context vectors.⁷
We find that contextual variability is indeed significantly correlated with semantic shift in our dataset (Pearson's r = 0.55, p < 0.001), while it is independent from cosine distance (Pearson's r = 0.18, p > 0.05). These two aspects are thus complementary. While both shift words and referential cases change context of use in t₂, context variability captures the fact that only in referential cases words occur in a restricted set of contexts. Figure 2 shows this effect visually. This result can inform future models of short-term meaning shift.
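The definition of contextual variability given above can be sketched directly. The code assumes tokenized sentences from t₂ and a dictionary `emb` mapping words to vectors; the function names are hypothetical, and skipping context words without an embedding is a choice made here rather than a detail stated in the text.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def context_vectors(sentences, target, emb, window=5):
    """One averaged embedding per occurrence of `target`."""
    vectors = []
    for sent in sentences:
        for i, tok in enumerate(sent):
            if tok != target:
                continue
            # Up to `window` words on each side, excluding the target itself.
            ctx = sent[max(0, i - window):i] + sent[i + 1:i + 1 + window]
            vecs = [emb[w] for w in ctx if w in emb]
            if vecs:
                dim = len(vecs[0])
                vectors.append([sum(v[k] for v in vecs) / len(vecs)
                                for k in range(dim)])
    return vectors


def contextual_variability(sentences, target, emb, window=5):
    """Average pairwise cosine distance between the context vectors."""
    pairs = list(combinations(context_vectors(sentences, target, emb, window), 2))
    if not pairs:
        raise ValueError("need at least two usable contexts for the target")
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)
```

A word used in a narrow, repetitive set of contexts yields values near 0, whereas a word used across dissimilar contexts yields higher values. Note that the number of pairs grows quadratically with the number of occurrences, so frequent words may need to be subsampled.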

Conclusion
The goal of this initial study was to bring to the attention of the NLP community short-term meaning shift, an under-studied problem in the field. Our hope is that it will spark further research into a phenomenon which, besides being of theoretical interest, has potential practical implications for NLP downstream tasks concerned with user-generated language, as modeling how word meanings rapidly change in communities would allow a better understanding of what their members say. Future research should experiment with other datasets (subreddits from other domains, other online communities) and also alternative models that address the challenges described here.
⁷ There are alternative ways of measuring contextual variability, but we expect them to yield the same picture. For instance, we experimented with a different window size and obtained the same pattern.

A Further Details on Evaluation Dataset
For our experiment, we considered content words only, which we identified by using the external list of common words available at https://www.wordfrequency.info/free.asp. Three words were discarded from the initial list after analysis of the redditor data: 'discord' and 'owls' due to homonymy with proper names that was not detected when the survey was implemented; 'tracking' because the chosen examples clearly misled the judgements of the redditors.
As detailed in Section 3.3, 26 members of r/LiverpoolFC participated in the survey, and each word received on average 8.8 judgements. We computed inter-annotator agreement as Krippendorff's alpha, and obtained α = 0.58, a relatively low value but common in semantic tasks (Artstein and Poesio, 2008).
The results of the annotation validate our initial word sampling procedure:
• the words that present a significant increase in frequency and were annotated as meaning shift by us received an average shift annotation of 0.72 (± 0.15);
• the words that present a significant increase in frequency but that were not annotated as meaning shift by us received an average shift annotation of 0.15 (± 0.16);
• the words that keep a constant frequency between t₁ and t₂, and that we do not consider examples of meaning shift, received 0.07 (± 0.12).
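For reference, Krippendorff's alpha for nominal labels can be computed from a coincidence matrix, which also handles words judged by different numbers of annotators. The text does not state which variant of alpha was used, so the nominal version below is an assumption, and the function name is illustrative.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` holds one list of labels per item (here, per word); items
    with fewer than two labels are not pairable and are skipped.
    """
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        # Every ordered pair of labels within an item counts 1 / (m - 1).
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b]
                   for a in totals for b in totals if a != b) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is used")
    return 1.0 - observed / expected
```

A value of 1 indicates perfect agreement, 0 indicates agreement at chance level, and negative values indicate systematic disagreement.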

Figure 2: Semantic shift index vs. context variability. Red ellipse: referential cases which are assigned high cosine distance values by the model (false positives).

Table 1: Time bin and size of the datasets.

Table 2: Examples of use of 'F5' with time stamps, which illustrate the speed of the meaning shift process. All the examples are from LiverpoolFC₁₇.
Communities of practice: Learning, meaning, and identity. Cambridge University Press.
Paul Werth. 1994. Extended metaphor - a text-world account. Language and Literature, 3(2):79-103.
Derry Tanti Wijaya and Reyyan Yeniterzi. 2011. Understanding semantic change of words over centuries. In Proceedings of the 2011 International Workshop on DETecting and Exploiting Cultural diversiTy on the Social Web, pages 35-40. ACM.