Academetrics: the math behind predicting Oscar winners

January 21, 2020


Hollywood's biggest night is almost upon us! On the night of February 9th, the annual Academy Awards will be presented. The biggest names in Hollywood will gather in the Dolby Theatre in Los Angeles to celebrate yet another year of cinematic achievements. During the ceremony, 24 golden statuettes, each officially an “Academy Award of Merit” but better known as an “Oscar”, will be handed out. Every year there is an incredible fuss when the nominees are announced, and the excitement lasts until the actual award ceremony. This makes you wonder whether the excitement is justified. The Academy Awards are not the first ceremony to celebrate the cinematic achievements of the previous year, and this is far from the first time the Academy Awards have been held. A vast amount of data is therefore available on earlier award ceremonies and on the films themselves, which raises the question of whether the winners can be predicted. Luckily for you, you can find out in this article!


Academy Awards

The Academy Awards will be held for the 92nd time this year, which makes them the oldest of Hollywood's award ceremonies. They are one of the ‘big four’ awards in show business, together with the Emmys (television), Grammys (recording) and Tonys (theater). This year the Academy Awards will once again conclude the so-called ‘award season’. Before them come the Golden Globes, the Critics’ Choice Movie Awards, the Screen Actors Guild Awards and the British Academy Film Awards, on January 5th, January 12th, January 19th and February 2nd respectively. It is no exaggeration to say that the market for movie awards is saturated. Despite the enormous number of prizes to be won, the Academy Awards are still the most prestigious. This is, as mentioned, partly because they are the longest-running celebration of movie productions, but mostly because these awards are voted upon by fellow actors, directors, and composers. These are the people who know what the job entails and whose acknowledgment is worth the most.


[Image: Academy Awards 2019 winners]


Predicting outcomes

It is understandable that such a large event, where probability, chance, and luck play an important role, attracts fortune seekers who want to make money by betting on the outcomes. Naturally, this leads people to use the power of mathematics to get the best possible predictions. Many people attempt to build models that successfully pinpoint the winner in each category, and the most famous of them is Ben Zauzmer. He recently published a book, Oscarmetrics: The Math Behind the Biggest Night in Hollywood, full of insights into his model and the history of the Oscars itself. There are two reasons why Zauzmer is the most famous Oscar predictor. Firstly, he is very active (and funny) on Twitter in the run-up to the award ceremony. Secondly, and probably more importantly, he has had a success rate of 77% since he started in 2012. Last year he even correctly predicted 20 of the 21 categories for which he has a model. This suggests that it is possible to build a reliable model for predicting the outcomes.


[Image: Ben Zauzmer]



As with any other model, constructing this one starts with collecting data. This means searching for indicators that are useful in predicting Oscar outcomes. First of all, Oscar data from previous years is needed: to predict future outcomes, one needs to know in which categories movies were nominated in the past and how well they performed in each category. Furthermore, nominations and wins at other award ceremonies are very informative, for example the Golden Globes and the BAFTAs, but also the Directors Guild awards and similar prizes. Lastly, one needs attributes of every movie, such as its genre, duration, and IMDb score. Other important attributes are a movie's scores on Rotten Tomatoes and Metacritic, which reflect the critics’ reviews.

Of course, having data alone does not get you anywhere. Each category is best explained by different indicators. If you want to predict Best Director, you should look at the Directors Guild Awards, while Best Original Score can be predicted using the Golden Globes. Hence, it is important to find the appropriate weight of each indicator for each category.
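
To make the data step a little more concrete, here is a minimal Python sketch of how the indicators for one nominee could be stored and turned into a numeric vector. The class name `Nominee`, its fields, the rescaling constants, and the two example films are all hypothetical and chosen purely for illustration; they are not taken from Zauzmer's model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Nominee:
    """One nominated film in one Oscar category (all fields are illustrative)."""
    title: str
    won_guild_award: bool    # e.g. the Directors Guild award for Best Director
    won_golden_globe: bool
    won_bafta: bool
    critic_score: float      # a critics' score on a 0-100 scale
    total_nominations: int   # Oscar nominations the film received this year

    def features(self) -> List[float]:
        """Return the explanatory variables of this nominee as a numeric vector."""
        return [
            float(self.won_guild_award),
            float(self.won_golden_globe),
            float(self.won_bafta),
            self.critic_score / 100.0,      # rescale to roughly [0, 1]
            self.total_nominations / 10.0,  # rescale to roughly [0, 1]
        ]


# Made-up nominees for a single category, not real data.
choice_set = [
    Nominee("Film A", True, False, True, 88.0, 10),
    Nominee("Film B", False, True, False, 92.0, 6),
]
feature_matrix = [nominee.features() for nominee in choice_set]
```

Each row of `feature_matrix` then describes one nominee, and a model only has to learn how much weight each column deserves in a given category.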


That brings us to the modeling part of the problem. Predicting the winner in each category can be framed as a discrete choice model, in which exactly one winner is selected per category. This model also allows the weight of each indicator to vary across categories. The probability of selecting movie j, conditional on the choice set C_i consisting of all movies nominated in category i, is then given by:

    \[\mathbb{P}[Y = j \mid \mathbf{x}_i] = \int \frac{\exp(\boldsymbol{\beta}^T\mathbf{x}_{ij})}{\sum_{h \in C_i} \exp(\boldsymbol{\beta}^T\mathbf{x}_{ih})} f(\boldsymbol{\beta})\,\text{d}\boldsymbol{\beta}\]

Here x_{ij} = (x_{ij1},…,x_{ijp})^T denotes the values of the p explanatory variables for film j, which has been nominated for award i, and β is the vector of variable weights. The weights are treated as random with density f(β), so the expression is a logit probability averaged over that density. Estimating this model for all available years using the maximum likelihood method gives the weight of each variable in each year. Combining these yearly weights with locally weighted scatterplot smoothing (which is beyond the scope of this article), one can convert the collected data into the variable weights that are used to predict the outcome of the coming Academy Awards.
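
As a rough illustration of the formula, the Python sketch below evaluates it numerically. For a fixed weight vector the integrand is an ordinary logit (softmax) probability; the integral is then approximated by averaging that probability over random draws of the weights. The independent normal distribution used for f(β), the function names, and all numbers are assumptions made for this example only.

```python
import math
import random


def logit_probabilities(beta, X):
    """Logit probability of each nominee (row of X) winning, for fixed weights."""
    utilities = [sum(b * x for b, x in zip(beta, row)) for row in X]
    shift = max(utilities)  # subtract the maximum for numerical stability
    exp_u = [math.exp(u - shift) for u in utilities]
    total = sum(exp_u)
    return [e / total for e in exp_u]


def mixed_logit_probabilities(mean, sd, X, draws=2000, seed=0):
    """Approximate the integral over beta by a Monte Carlo average.

    Assumption: each weight is drawn independently from a normal
    distribution with the given mean and standard deviation.
    """
    rng = random.Random(seed)
    average = [0.0] * len(X)
    for _ in range(draws):
        beta = [rng.gauss(m, s) for m, s in zip(mean, sd)]
        for j, p in enumerate(logit_probabilities(beta, X)):
            average[j] += p / draws
    return average


# Made-up category with three nominees and three indicators per nominee:
# [won guild award, won Golden Globe, critic score scaled to 0-1].
X = [
    [1.0, 1.0, 0.88],
    [0.0, 1.0, 0.92],
    [0.0, 0.0, 0.75],
]
mean = [2.0, 1.0, 0.5]  # hypothetical average weights
sd = [0.5, 0.5, 0.2]    # hypothetical spread of the weights

fixed = logit_probabilities(mean, X)
mixed = mixed_logit_probabilities(mean, sd, X)
```

Both `fixed` and `mixed` are lists of winning probabilities that sum to one; the second accounts for uncertainty in the weights, which typically pulls the probabilities slightly toward each other.
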
However, one may encounter a small problem here: a lot of the data, such as the outcomes of other award shows, has only been around for a relatively short time. For example, the Screen Actors Guild Awards have only existed for 26 years. This means that adding one more year’s worth of data can sometimes strongly influence the weight of a certain indicator, which in turn implies that the model has to be re-estimated every year with the outcomes of the most recent year.
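
Re-estimating the weights when a new year of results comes in amounts to re-running a maximum likelihood fit on the extended data set. The Python sketch below does this for a simplified version of the model in which the weights are fixed rather than random (a plain conditional logit, not the integral form above), using gradient ascent. The small ridge penalty, the step size, and the toy data are assumptions added for illustration; the penalty keeps the estimates finite when only a handful of years are available.

```python
import math


def probabilities(beta, X):
    """Logit probability of each nominee (row of X) winning, given weights beta."""
    utilities = [sum(b * x for b, x in zip(beta, row)) for row in X]
    shift = max(utilities)  # subtract the maximum for numerical stability
    exp_u = [math.exp(u - shift) for u in utilities]
    total = sum(exp_u)
    return [e / total for e in exp_u]


def log_likelihood(beta, awards):
    """Sum of the log-probabilities the model assigns to the actual winners."""
    return sum(math.log(probabilities(beta, X)[winner]) for X, winner in awards)


def fit(awards, n_features, steps=500, rate=0.1, ridge=0.1):
    """Maximise the ridge-penalised log-likelihood by gradient ascent."""
    beta = [0.0] * n_features
    for _ in range(steps):
        grad = [-ridge * b for b in beta]  # gradient of the penalty term
        for X, winner in awards:
            p = probabilities(beta, X)
            for k in range(n_features):
                expected = sum(p_j * row[k] for p_j, row in zip(p, X))
                grad[k] += X[winner][k] - expected  # observed minus expected
        beta = [b + rate * g for b, g in zip(beta, grad)]
    return beta


# Toy history: each entry is (nominee features, index of the actual winner).
# Features per nominee: [won the guild award (0/1), critic score scaled to 0-1].
awards = [
    ([[1, 0.80], [0, 0.90], [0, 0.70]], 0),
    ([[0, 0.85], [1, 0.80], [0, 0.60]], 1),
    ([[1, 0.70], [0, 0.95], [0, 0.80]], 1),  # an upset: the guild winner loses
    ([[0, 0.90], [0, 0.70], [1, 0.85]], 2),
]
beta_hat = fit(awards, n_features=2)

# Apply the fitted weights to a new, equally made-up set of nominees.
new_category = [[1, 0.75], [0, 0.90], [0, 0.85]]
forecast = probabilities(beta_hat, new_category)
```

Appending one more tuple to `awards` and calling `fit` again is all it takes to update the weights, which also makes it easy to see how much a single extra year moves them.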



Once the BAFTAs have been handed out, all the data can be collected. Ben Zauzmer will certainly make a prediction for the upcoming Oscars. With the given tools, however, you are now also able to construct a model and predict the outcomes yourself. The only question that remains is whether you actually want to know in advance which movie will take home which prize. It is still Hollywood's biggest night, and the greatest charm of such an award show will always be the anticipation and the tension during the announcements. It is wonderful what mathematics can do, but luckily you are not obliged to use it or believe it.

This article was written by Jochem Hak
