Introduction

Hypothesis: frequency change in a word will lead to direct competition with (and possibly replacement of) near-synonym(s), unless the lexical subspace experiences high communicative need.
Is it possible to describe some variance in terms of which successful words compete with their neighbors and which do not?

Data

522 unique words (COHA, 1890-1999) with frequency increase \(\ln\geq 2\) between any 2 successive spans of 10 years (& occur in \(\geq 2\) years & \(\geq 100\) times in the latter span).

(click on the legend to show/hide traces)

Quantifying competition

Embed targets into vector space (LSA) of the preceding decade, compute semantic neighbors
Important: word occurrence probabilities sum up to 1; increase in x => decrease in y.
The measure: where probability mass gets equalized, i.e., target increase\(\geq \sum_{}^{}\)(neighbors’ decreases). Either cosine distance, or n increasing neighbors.
Indicates if the increasing target replaced semantically close word(s) (direct competition, obvious likely source of probability mass).
Example: relativism, increasing +13.2pmw between 1965-1974, 1975-1984:

word	freq. change	cumulative sum of decreases	cosine sim	normd. dist
relativism	+13.2
marxism	-5.68	5.68	0.68	0
thesis	+9.00	5.68	0.67	0.01
jacksonian	-11.64	17.32>13.2	0.66	0.03

(some outliers removed to faciliate visualization)

Communicative need

Topical advection as a proxy: weighted mean log frequency change in the top \(n\) (PPMI-weighted) context words of the target.

Results

(hover over the markers to see more info)

	Estimate	p	clearer competition signal if…
Linear regression model predicting the cosine distance (normalized by value of top neighbor) where probability mass gets equalized
advection	0.0999	<0.001	lower comm. need
occurs in n years	0.0086	<0.001	bursty series
abs. freq. change	0.0005	0.011	lower freq (change)
max %decrease	0.0008	<0.001	a clear loser
R²=0.2, F=12.13(12,509), p<0.001

Also controlled for in the model, but all p>0.05: • standard deviation of yearly frequencies (burstiness) • semantic subspace instability • uniqueness of the form • smallest edit distance among closest semantic neighbors • polsemy • leftover probability mass • age of the word in the corpus • target decade.

Conclusions

Controlling for a range of factors, communicative need (operationalized by advection), describes a moderate amount of variance in competitive interactions between words: low advection words are more likely to replace a word with a similar meaning. Presumably high communicative need facilitates the co-existence of similar words.

Appendix

Notes on the competition measure

We made sure to avoid auto-correlation between the advection measure and the dependent variable by filtering the neighbour lists of each target so that no topic word of the target (i.e. those with a PPMI>0 with the target, which are used to calculate the advection value) would be account for as a neighbor. This also makes sense from a semantic point of view: if two words, even if very similar occur near each other (e.g., “salt and pepper”), then it’s less palusible that they would be competing against one another. Exceptions are certainly possible, such as meta-linguistic expressions (“vapor, previously spelled as vapour”), but we assume these would be rare.
We also filtered out target words with higher-than-expected lexical dissemination (a proxy to polysemy, cf. Stewart & Eisenstein 2018), and those with a leftover probability mass >100% of its frequency.
We did not make use of the entire Corpus of Historical American English, as most of the 19th century decades are less balanced and smaller in size, the imbalance extends to the occurrence of non-standard dialects or registers in occasional year subcorpora.
This approach certainly has limitations stemming from the imperfect nature of corpus tagging, composition balance, and vector semantics (LSA). We also disregard issues such as homonymy (although we control for polysemy in the targets) and multi-word units.
We ran randomized baselines to make sure the observed correlation with advection is not some (unknown) artefact of the machine learning models used here. This was done by randomizing similarity matrices, i.e. each target was assigned a random list of neighbors, with random similarity values (drawn from the concatenation of all similarity vectors). After hundreds of iterations, the advection variable would come out with a p-value below 0.05 in only about 5% of the runs (i.e., as expected with an \(\alpha=0.05\)).

The polysemy measure

Ongoing and future work

The same model has now been tested on two more historical corpora in two additional languages, with very similar results; we are working on including more corpora.
We also experimented using GAMs for modelling the effect, which slightly improved descriptive power.
Variability in LSA vector space density is controlled for in this version by normalizing distance by the distance of the first neighbor. Another approach would be to z-score each distance using the mean and standard deviation (of the first neighbor), taken from the individual vector spaces build for each target.

References

Karjus, A., Blythe, R.A., Kirby, S., Smith, K., 2018. Quantifying the dynamics of topical fluctuations in language. ArXiv preprint.
Regier, T., Carstensen, A., Kemp, C., 2016. Languages Support Efficient Communication about the Environment: Words for Snow Revisited. PLOS ONE 11, 1–17.
Gibson, E., Futrell, R., Jara-Ettinger, J., Mahowald, K., Bergen, L., Ratnasingam, S., Gibson, M., Piantadosi, S.T., Conway, B.R., 2017. Color naming across languages reflects color use. Proceedings of the National Academy of Sciences.
Hamilton, W.L., Leskovec, J., Jurafsky, D., 2016. Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change, in: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers.
Xu, Y., Kemp, C., 2015. A Computational Evaluation of Two Laws of Semantic Change., in: CogSci.
Schlechtweg, Dominik, Stefanie Eckmann, Enrico Santus, Sabine Schulte im Walde, and Daniel Hole, 2017. German in Flux: Detecting Metaphoric Change via Word Entropy. arXiv preprint.
Petersen, A.M., Tenenbaum, J., Havlin, S., Stanley, H.E., 2012. Statistical Laws Governing Fluctuations in Word Use from Word Birth to Word Death. Scientific Reports 2.
Stewart, I., Eisenstein, J., 2018. Making “fetch” happen: The influence of social and linguistic context on nonstandard word growth and decline, in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Brussels, Belgium, pp. 4360–4370.
Turney P.D., Mohammad S.M., 2019 The natural selection of words: Finding the features of fitness. PLOS ONE 14(1): e0211512.

This research was supported by a Kristjan Jaak scholarship, funded and managed by Archimedes Foundation in collaboration with the Ministry of Education and Research of Estonia.

Modelling lexical interactions
in diachronic corpora

Andres Karjus, Richard A. Blythe, Simon Kirby, Kenny Smith

Centre for Language Evolution, University of Edinburgh
andreskarjus.github.io | a.karjus(@)sms.ed.ac.uk

Introduction

Data

Quantifying competition

Communicative need

Results

Conclusions

Appendix

Modelling lexical interactionsin diachronic corpora

Andres Karjus, Richard A. Blythe, Simon Kirby, Kenny Smith

Centre for Language Evolution, University of Edinburghandreskarjus.github.io | a.karjus(@)sms.ed.ac.uk

Introduction

Data

Quantifying competition

Communicative need

Results

Conclusions

Appendix

Modelling lexical interactions
in diachronic corpora

Centre for Language Evolution, University of Edinburgh
andreskarjus.github.io | a.karjus(@)sms.ed.ac.uk