The math behind the English language

January 23, 2020


Like many other languages, English is based on grammatical rules. It has all the parts of speech needed to build a basic sentence, such as nouns, verbs and adverbs. None of this has stayed the same over time. Scientists have known for decades that every language evolves in strikingly similar ways to adapt to the needs of its users. Not only the words but also the grammar changes and mutates over time, with new versions slowly rising to take over from the older ones. This development is not a bad thing: if English had not changed since, for instance, 1950, we would not have words to refer to WiFi, TV, smartphones and laptops. The language shifts so slowly from year to year that we hardly notice it. Nonetheless, the changes add up: reading Shakespeare’s writing from the sixteenth century can be quite tough, and older texts like the ‘Canterbury Tales’ and ‘Beowulf’ are the English language’s version of the fossil record.


Since the beginning, irregular verbs have been the bane of any effort to learn English. Today, the past tense of the majority of English verbs takes the suffix ‘-ed’. Besides these so-called regular verbs, with past forms such as ‘laughed’ and ‘helped’, there are irregular verbs. These can obey more ancient rules of conjugation, as in ‘sang/sung’ or ‘drank/drunk’, or obey no rules at all, like ‘went’ and ‘had’. For those who are not fans of irregular verbs, however, there is good news. Of the 177 irregular verbs that existed 1,200 years ago, only 145 survived into Middle English and just 98 remain irregular today. Many formerly irregular verbs have put on new regular guises: the past tense of ‘help’, for example, used to be ‘holp’ but has been regularized to ‘helped’.

This record was the basis of a study by Erez Lieberman, Martin Nowak and their colleagues at Harvard University, who modelled mathematically how the irregular verbs evolved over time and how they will change in the future.

Lieberman and his colleagues built upon a previous study of seven competing rules for verb conjugation in Old English, found in ‘Beowulf’, six of which have gradually faded from use over time. Back then, only about 75% of verbs followed the one surviving rule, which adds an ‘-ed’ suffix to form the simple past and the past participle. Today, fewer than 3% of verbs are irregular. Among them are the ten most commonly used English verbs: ‘be’, ‘have’, ‘do’, ‘go’, ‘say’, ‘can’, ‘will’, ‘see’, ‘take’ and ‘get’. Lieberman found that this is because irregular verbs are phased out much more slowly if they are commonly used. In addition, verbs that did not exist in earlier forms of English, such as ‘google’ and ‘email’, also take the standard regular form with the suffix ‘-ed’ (‘googled’, ‘emailed’).

Using the CELEX corpus, a massive online database of modern texts, the team worked out the frequency of these verbs in modern English. Their mathematical analysis of this linguistic evolution reveals that irregular verb conjugations behave in an extremely regular way, so that one can make predictions about the future stages of a verb’s evolutionary trajectory.

The frequency of the irregular verbs was plotted against the number of irregular verbs and against the regularization rate, as can be seen in the figure below. The team found that the rate at which verbs regularize is ‘inversely proportional to the square root of their frequency’. In other words, a verb that is used 100 times less frequently will regularize 10 times as fast, and one that is used 10,000 times less frequently will regularize 100 times as fast.
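The square-root relationship is easy to check with a few lines of Python. This is only a minimal sketch of the proportionality quoted above, not the model used in the study; the function name `relative_regularization_rate` is a hypothetical one chosen for this illustration.

```python
import math


def relative_regularization_rate(frequency_ratio: float) -> float:
    """Regularization rate of a verb relative to a reference verb.

    frequency_ratio = (frequency of the verb) / (frequency of the reference verb).
    Assumption: rate is proportional to 1 / sqrt(frequency), as quoted in the text.
    """
    if frequency_ratio <= 0:
        raise ValueError("frequency_ratio must be positive")
    return 1.0 / math.sqrt(frequency_ratio)


# A verb used 100 times and 10,000 times less often than the reference verb.
for times_less_frequent in (100, 10_000):
    rate = relative_regularization_rate(1 / times_less_frequent)
    print(f"{times_less_frequent} times less frequent: "
          f"regularizes about {rate:.0f} times as fast")
```

Only the ratio of two frequencies matters here, which is why the function does not need the absolute frequency of either verb.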

Using this model, the team managed to estimate how much staying power the remaining irregular verbs have and assigned half-lives to them, just as scientists do to radioactive isotopes that decay over time. They found that the two most common irregular verbs, ‘be’ and ‘have’, which have a frequency of more than 10^{-1}, have half-lives of over 38,000 years. This is such a long time that one can conclude they are effectively immune to regularization and very unlikely to change. This result gives an astonishingly precise description of something linguists have suspected for a long time: the most frequently used irregular verbs are repeated so often that they are unlikely ever to go extinct. In addition, the team found that sixteen of the 98 remaining irregular verbs will probably have adopted the ‘-ed’ suffix by 2500.
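The half-life analogy can be made concrete with a short Python sketch. It assumes simple exponential decay, as for radioactive isotopes, and it assumes that half-life scales with the square root of frequency (the inverse of the regularization rate). The function names are hypothetical, and 38,000 years is used only as a round reference value taken from the text; the numbers it prints are illustrative and are not results from the study.

```python
import math


def probability_still_irregular(years: float, half_life: float) -> float:
    """Chance that a verb is still irregular after `years`.

    Assumption: exponential decay with the given half-life (in years).
    """
    return 0.5 ** (years / half_life)


def scaled_half_life(reference_half_life: float, frequency_ratio: float) -> float:
    """Half-life of a verb whose frequency is `frequency_ratio` times
    that of a reference verb.

    Assumption: half-life is proportional to sqrt(frequency).
    """
    return reference_half_life * math.sqrt(frequency_ratio)


REFERENCE_HALF_LIFE = 38_000  # years; round value for 'be' and 'have' from the text
YEARS_UNTIL_2500 = 2500 - 2020  # counted from the year this article was published

p_common = probability_still_irregular(YEARS_UNTIL_2500, REFERENCE_HALF_LIFE)
print(f"Very common verb still irregular in 2500: {p_common:.3f}")

# Hypothetical verb that is 10,000 times less frequent than the reference.
rare_half_life = scaled_half_life(REFERENCE_HALF_LIFE, 1 / 10_000)
p_rare = probability_still_irregular(YEARS_UNTIL_2500, rare_half_life)
print(f"Rare verb: half-life about {rare_half_life:.0f} years, "
      f"still irregular in 2500 with probability {p_rare:.3f}")
```

Under these assumptions a very common verb is almost certain to stay irregular over the next few centuries, while a rare one is far more likely to regularize than to survive.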

Based on the results of the study, Lieberman also suspects that the next verb to be regularized is ‘wed’, because it is one of the least commonly used modern irregular verbs. If so, the irregular past form ‘wed’ will soon be replaced by ‘wedded’.

Language used to be considered too messy and difficult for mathematical study, but this work shows that at least one aspect of how language changes and develops can be evaluated successfully.



Reference: Lieberman, Michel, Jackson, Tang & Nowak. 2007. Quantifying the evolutionary dynamics of language. Nature doi:10.1038/nature06137

This article was written by Deirdre Westenbrink

