522 unique words (COHA, 1890–1999) with an increase in log frequency of \(\geq 2\) (natural log) between any two successive 10-year spans, and which occur in \(\geq 2\) years and \(\geq 100\) times in the latter span.
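A schematic version of this selection filter, as a sketch only: it assumes a year-by-word table of raw occurrence counts and omits the per-span corpus size normalization a real pipeline would need.

```python
import numpy as np
import pandas as pd

def candidate_words(counts: pd.DataFrame, t=2.0, min_years=2, min_count=100):
    """Words whose log frequency rises by >= t between successive decades.

    counts: raw occurrence counts, rows = years (integer index), cols = words.
    NB: real corpus counts should first be normalized by span size.
    """
    decade = counts.index // 10 * 10
    spans = counts.groupby(decade).sum()        # occurrences per 10-year span
    years = (counts > 0).groupby(decade).sum()  # years attested per span
    picked = set()
    for i in range(1, len(spans)):
        rise = np.log(spans.iloc[i]) - np.log(spans.iloc[i - 1])
        ok = (rise >= t) & (years.iloc[i] >= min_years) & (spans.iloc[i] >= min_count)
        picked |= set(spans.columns[ok])
    return picked
```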
Example of the competition measure: the target relativism gains +13.2 in log frequency; the frequency decreases among its nearest semantic neighbors are summed, in order of cosine similarity, until they offset the gain (17.32 > 13.2). The normalized distance at which this happens is the outcome variable.

word | freq. change | cumulative sum of decreases | cosine sim. | normd. dist.
---|---|---|---|---
relativism (target) | +13.2 | | |
marxism | -5.68 | 5.68 | 0.68 | 0
thesis | +9.00 | 5.68 | 0.67 | 0.01
jacksonian | -11.64 | 17.32 (> 13.2) | 0.66 | 0.03
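A rough Python sketch of that procedure. All names are hypothetical, and the normalization is inferred from the example values (the similarity drop from the top neighbor, divided by the top neighbor's similarity); the exact formula may differ.

```python
def competition_distance(target_gain, neighbors):
    """Normalized distance at which neighbors' cumulative frequency losses
    offset the target's gain ("probability mass gets equalized").

    target_gain: the target's log frequency increase (e.g. 13.2)
    neighbors:   (cosine_sim, freq_change) pairs, sorted by decreasing
                 similarity to the target
    """
    top_sim = neighbors[0][0]          # similarity of the closest neighbor
    lost = 0.0
    for sim, change in neighbors:
        if change < 0:                 # only decreases count as losses
            lost += -change
        if lost >= target_gain:        # losses now offset the gain
            return (top_sim - sim) / top_sim
    return None                        # gain never offset in this neighborhood

# The relativism example from the table above:
competition_distance(13.2, [(0.68, -5.68), (0.67, 9.00), (0.66, -11.64)])
# -> (0.68 - 0.66) / 0.68 ≈ 0.03
```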
Topical advection as a proxy for communicative need: the weighted mean log frequency change of the target's top \(n\) context words, weighted by PPMI.
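A minimal sketch of that computation, assuming precomputed PPMI scores and per-word log frequency changes; all names and the n = 75 cutoff are illustrative placeholders, not the paper's code.

```python
import numpy as np

def topical_advection(ppmi, log_freq_change, n=75):
    """Weighted mean log frequency change of a target's top-n context words.

    ppmi:            dict, context word -> PPMI(target, context)
    log_freq_change: dict, word -> log frequency change between two spans
    n:               number of top context words to use (placeholder value)
    """
    top = sorted(ppmi, key=ppmi.get, reverse=True)[:n]  # top-n by PPMI weight
    top = [w for w in top if w in log_freq_change]
    weights = np.array([ppmi[w] for w in top])
    changes = np.array([log_freq_change[w] for w in top])
    return float(np.average(changes, weights=weights))  # PPMI-weighted mean
```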
R² = 0.2. The competition signal is clearer given lower communicative need/advection (b = 0.09, p < 0.001), a bursty time series, smaller frequency changes, and a clear loser among the neighbors; the full model and the further controls (all p > 0.05) are detailed below.
Controlling for a range of factors, communicative need (operationalized as advection) explains a moderate amount of variance in competitive interactions between words: low-advection words are more likely to replace a word with a similar meaning. Presumably, high communicative need facilitates the co-existence of similar words.
Notes on the competition measure
The polysemy measure
Details of the linear regression model for the English COHA data
Linear regression model predicting the cosine distance (normalized by the value of the top neighbor) at which probability mass gets equalized:

predictor | Estimate | p | clearer competition signal if…
---|---|---|---
advection | 0.0999 | <0.001 | lower comm. need
occurs in n years | 0.0086 | <0.001 | bursty series
abs. freq. change | 0.0005 | 0.011 | smaller freq. change
max % decrease | 0.0008 | <0.001 | a clear loser

R² = 0.2, F(12, 509) = 12.13, p < 0.001
Also controlled for in the model, but all p > 0.05: • standard deviation of yearly frequencies (burstiness) • semantic subspace instability • uniqueness of the form • smallest edit distance among closest semantic neighbors • polysemy • leftover probability mass • age of the word in the corpus • target decade.
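A sketch of how such a model can be fit with statsmodels; the data frame `targets` (one row per target word) and all column names are hypothetical placeholders, but the 12 predictors mirror those listed above and match the reported F(12, 509).

```python
import statsmodels.formula.api as smf

# targets: one row per target word; all column names are placeholders
model = smf.ols(
    "norm_dist ~ advection + n_years + abs_freq_change + max_pct_decrease"
    " + freq_sd + subspace_instability + uniqueness + min_edit_distance"
    " + polysemy + leftover_mass + word_age + decade",
    data=targets,
).fit()
print(model.summary())  # coefficient estimates, p-values, R², F statistic
```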
Ongoing and future work
The first author was supported by a Kristjan Jaak scholarship, funded and managed by Archimedes Foundation in collaboration with the Ministry of Education and Research of Estonia.