
Challenges in
detecting evolutionary forces
in language change
using diachronic corpora

Andres Karjus
(supervised by Kenny Smith, Richard A. Blythe, Simon Kirby)
Centre for Language Evolution, University of Edinburgh

CLE seminar, 6.11.2018

1 / 33

A bit of background

2 / 33


A bit of background

  • All natural languages change over time
  • Many have suggested that language change, like other evolutionary processes, involves both directed selection and stochastic drift (Sapir 1921, Jespersen 1922, Andersen 1987, McMahon 1994, Croft 2000, Blythe 2012)
  • There are a number of ways in which selective biases may influence language change (Kirby 2008, Smith 2013, Enfield 2014, Croft 2000, Haspelmath 1999, Labov 2011, McMahon 1994, Zipf 1949, Baxter 2006, Daoust 2017; among others)
  • Signatures of selection should be inferable from usage data (Sindi 2016, Reali 2010, Bentley 2008, Amato 2018, Kandler 2017; among others)
3 / 33
4 / 33


Newberry et al. 2017, Detecting evolutionary forces in language change

  • "...we quantify the strength of selection relative to stochastic drift in language evolution."

  • "...time series derived from large corpora of annotated texts"

    • English verb (ir)regularization; COHA
    • Frequency Increment Test (FIT)
  • "...this work provides a method for testing selective theories of language change against a null model and reveals an underappreciated role for stochasticity in language evolution."

5 / 33


The Frequency Increment Test (FIT)

  • Feder et al. 2014 (from a family of tests of selection, cf. refs in paper)
  • Series of relative variant frequencies $v_i \in (0,1)$ at times $t_i$
  • Transformed into frequency increments
  • $Y_i = (v_i - v_{i-1}) \big/ \sqrt{2\, v_{i-1}(1 - v_{i-1})(t_i - t_{i-1})}$
  • Rationale: under neutral evolution, the increments $v_i - v_{i-1}$ are normally distributed with mean 0 and variance $\propto v_{i-1}(1 - v_{i-1})(t_i - t_{i-1})$ (inversely proportional to the effective population size; valid when $0 \ll v_i \ll 1$; a Gaussian approximation of the Wright-Fisher diffusion process)
  • Testing the null hypothesis of drift ~ testing that the rescaled increments $Y_i$ are normally distributed with mean 0 (e.g. a one-sample t-test)
6 / 33
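The test described above can be sketched in a few lines. This is my own minimal Python, not the authors' code; the function name `fit_test` is hypothetical:

```python
import numpy as np
from scipy import stats

def fit_test(v, t):
    """Frequency Increment Test sketch (after Feder et al. 2014).

    v: relative variant frequencies in (0, 1), one per time point
    t: the corresponding time points (e.g. bin midpoints)
    Returns the rescaled increments Y_i and the two-sided p-value of a
    one-sample t-test of the null hypothesis that their mean is 0 (drift).
    """
    v, t = np.asarray(v, dtype=float), np.asarray(t, dtype=float)
    # rescale each increment by its drift-expected standard deviation
    y = np.diff(v) / np.sqrt(2 * v[:-1] * (1 - v[:-1]) * np.diff(t))
    _, p = stats.ttest_1samp(y, 0.0)
    return y, p
```

A steadily rising series (suggesting selection) should yield a small p-value, while a flat noisy series away from the 0/1 boundaries should not.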

7 / 33


Problem: how to bin the data for time series

  • Microbial experiments: samples taken at chosen intervals and resequenced
  • Common approach with corpora: bin into fixed-length time segments
    • there is always a minimal time precision threshold (COHA: years)
    • but often not enough observations at fine precision
    • so: decades, years, days, minutes
    • example: a daily newspaper
  • Newberry et al.: use variable-width quantile binning, with n(bins) = log(total frequency). This ensures ~the same number of occurrences per bin (but bins cover different lengths of time)
8 / 33
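A minimal sketch of the variable-width quantile binning described above, assuming the natural log for the bin count (the slide does not specify the base) and with a hypothetical function name:

```python
import numpy as np

def quantile_bins(years):
    """Variable-width binning sketch: the number of bins is log(total
    token count), and bin edges are quantiles of the occurrence years,
    so each bin holds roughly the same number of occurrences but spans
    a different amount of time.

    years: one entry per token occurrence (e.g. the year each token
    of the variant appeared in the corpus).
    """
    years = np.sort(np.asarray(years))
    n_bins = max(1, int(round(np.log(len(years)))))  # assumed natural log
    edges = np.quantile(years, np.linspace(0, 1, n_bins + 1))
    return edges, n_bins
```

With the edges in hand, the per-bin frequency of each variant can then be computed and fed to the FIT; note that unlike fixed-width binning, the bin widths here depend on how the data are distributed over time.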
[Figure: two example variant-frequency time series, 1850-2000, y-axis 0-1.
Left: lit vs. lighted, variable-width binning, c=1.25; FIT p=0.01, Shapiro-Wilk p=0.59.
Right: spelt vs. spelled, variable-width binning, c=1 (Newberry et al.); FIT p=0.1, Shapiro-Wilk p=0.33.
Legend: p<0.05, p<0.2, p>0.2]

9 / 33

Replication of Newberry et al. 2017 (36 verbs)

10 / 33

Replication of Newberry et al. 2017 (36 verbs)

11 / 33

Replication of Newberry et al. 2017 (36 verbs)

12 / 33

Replication of Newberry et al. 2017 (36 verbs)

13 / 33

Replication of Newberry et al. 2017 (36 verbs)

14 / 33


Some thoughts

  • In broad strokes, the generalization by Newberry et al. 2017 holds: selection is indeed detected in only ~3-7 verbs (depending on binning), and drift is quite prevalent (at α=0.05).

  • However, for most individual time series, the FIT result varies between binnings (except for ~3 almost unambiguous cases)

  • So is it a good approach to study language change?
    Depends on the goal.

  • But still, what's the deal with the variation in the results...?

15 / 33

What's going on?

16 / 33

(e.g. spill, burn)

17 / 33

(e.g. knit)

18 / 33

(differences between number of bins)

19 / 33

20 / 33

(e.g., tell)

21 / 33

Simulating change and applying binning
to determine the reasonable application range
of the FIT

22 / 33


Simulating change and binning

  • Run a large number of Wright-Fisher simulations with 200 different selection coefficients $s \in [0, 5]$

  • 200 generations, with the "mutant" starting at 5% and at 50% of a population of size 1000.

  • For each s, bin the series into successively fewer bins,
    e.g. 200 (bin length 1) -> 100 (length 2) -> 66 (length 3), etc.

  • Repeat every combination 100x for good measure

23 / 33
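The simulation setup above can be sketched as follows. This is my own minimal Python under the stated parameters (binomial-sampling Wright-Fisher with a relative fitness advantage of 1+s for the mutant; function names are mine), not the authors' code:

```python
import numpy as np

def wright_fisher(s, n_gen=200, pop_size=1000, start=0.05, rng=None):
    """Wright-Fisher simulation sketch with selection: each generation the
    mutant's expected share is inflated by its fitness advantage (1 + s),
    then the next generation is a binomial sample of pop_size individuals."""
    rng = rng or np.random.default_rng()
    freqs = np.empty(n_gen)
    x = start
    for g in range(n_gen):
        freqs[g] = x
        p = x * (1 + s) / (x * (1 + s) + (1 - x))   # selection step
        x = rng.binomial(pop_size, p) / pop_size    # drift step
    return freqs

def rebin(freqs, bin_len):
    """Bin a per-generation series into fixed-length bins by averaging,
    e.g. bin_len=3 turns 200 generations into 66 bins (remainder dropped)."""
    n = len(freqs) // bin_len * bin_len
    return freqs[:n].reshape(-1, bin_len).mean(axis=1)
```

Each binned series can then be handed to the FIT, sweeping over s and bin length to map where the test's verdict starts to depend on the binning.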
[Figure: left, the parameter space of the selection strength s (200 values in [0, 5]); right, example Wright-Fisher simulation trajectories (population 1000, 200 generations) for s = 0, 0.00284, 0.00878, 0.02718, 0.08413, 0.26041, 0.806, 2.49467.]

24 / 33

25 / 33

(start at 5%)

26 / 33

(start at 50%)

27 / 33


Observations

  • The FIT is insensitive to binning when selection is too weak (s < 0.01) to be detected; beyond about s > 0.02 (depending on the start value), sensitivity to binning increases (false negatives)
  • 0.01 < s < 0.02 is relatively insensitive, but is also where binning can instead decrease the FIT p-value (false positives)
  • The normality assumption is systematically violated as s approaches 0.1 (unless extreme binning is applied, which increases the false negative rate)
28 / 33

Range of applicability of the FIT for linguistic data

  • Conditions where the FIT is not reliably applicable:
    • partially completed changes, too-short series
    • too few data points (sensitive to binning & absorption adjustment)
    • too-long series (multiple events or processes)
    • too strong selection (particularly with heavy binning)
    • small near-boundary fluctuations (false positives)
    • steep changes from boundary to non-boundary values
    • monotonically increasing series (normality assumption)
  • Where it is reliably applicable:
    • weak selection, non-monotonic series away from 0/1, but a window covering enough of (a single) change
29 / 33


Conclusions

  • What a time to be alive! (data, methods, tools)
  • We evaluated the proposal of Newberry et al. 2017
    Found that the results are dependent on corpus binning, small sample effects, and the specifics of the FIT.
  • Testing vs generating hypotheses; degrees of freedom
  • Fixing the issues would invite answers to numerous interesting questions
30 / 33
  • Fixing these issues would invite answers to numerous interesting questions, such as:
    • Do different parts of the grammar/lexicon experience stronger drift?
    • What is the relationship between selection strength and niche in language change? (cf. Laland 2001, Altmann 2011)
    • Can different types of selection (top-down, grassroots, momentum) be distinguished? (Amato 2018, Stadler 2016)
    • What is the role of drift in creole evolution? (Strimling 2015)
    • In semantic change? (Hamilton 2016)
    • Are some languages changing more due to drift than others? Is there a relation to community size? (Reali 2018, Atkinson 2015; among others)
31 / 33

Conclusions

  • What a time to be alive! (data, methods, tools)
  • We evaluated the proposal of Newberry et al. 2017
    Found that the results are dependent on corpus binning, small sample effects, and the specifics of the FIT.
  • Testing vs generating hypotheses; degrees of freedom
  • Fixing the issues would invite answers to numerous interesting questions
  • Identifying the role of drift vs selection in language change is an important goal, but: care with applying such tests to linguistic data, to avoid biases due to specifics of the domain and the particular test.
  • Slides, code & arXiv link at http://andreskarjus.github.io
32 / 33

Acknowledgements...

  • Kenny Smith, Richard Blythe, Simon Kirby
  • Mitchell Newberry
  • Alison Feder
  • Support by the Kristjan Jaak program, funded by the Archimedes Foundation & Ministry of Education and Research of Estonia
33 / 33
