It seems I’ve been way too busy lately to actually post anything here, so today we’ve got a guest post from David Hartley.

With Australia making it back into the world group of the Davis Cup after six years and Steve taking the week (month?) off, I thought this would be a great opportunity to write a guest blog. Now I will try to keep this in the same vein as Steve’s blogs but I suspect I won’t be able to stop myself including some equations (the worst ones I will provide links to instead of including in the text).

As a mathematician and sports fan I often find myself checking out various statistics (for example F1 sector and lap differentials) and trying to predict what will happen at the end of the match/race. One of the most interesting sets I’ve found is tennis statistics, especially since you can win more points than another player but still lose in straight sets, and since most matches end with only slightly different numbers of points won but again are over in straight sets. In this blog we will see why this is so.

A while ago (nine and a half years apparently) I came across a maths question in an interesting probability book. It asked: If the probability of you winning a point is $p$, what is the probability of you winning a game of tennis? Having just completed high school and therefore knowing some basic binomial theory, I thought I should tackle this.

Firstly, $p$ is a number between 0 and 1 that represents the likelihood you win a point (for example, if you win 55% of points then $p=0.55$). Next, the probability of two (or more) independent events all happening is the product of their probabilities. For example, the probability of you winning a game of tennis to love (nil) is $p\times p\times p\times p=p^4$, that is, the probability of winning four straight points.

The next best way to win a game of tennis is by winning a point after being up 40-15 (or 3-1 if tennis had a sensible scoring system). This sort of game can occur four ways, since your opponent can win their point on the first, second, third or fourth points of the game (this is often written as the binomial coefficient $\binom{4}{1}$). Therefore the probability of you winning to 15 (scoring a point from being 40-15 up) is $4p^4(1-p)$, since you lose one point with a probability $1-p$ but win four points each with a probability $p$.

Likewise, the probability of winning a game to 30 is $10p^4(1-p)^2$, since there are ten different ways your opponent can win 2 out of the first 5 points (or ten ways you can win 3 out of those 5 points if, like me, you prefer to keep track of the points you win).
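If you don't trust the counting, the short Python sketch below brute-forces it. It lists every possible order of points before the final (winning) point and counts the orders that give you exactly three wins, then evaluates the three terms at $p=0.55$. The function names `count_orderings` and `win_before_deuce` are made up for this illustration.

```python
from itertools import product
from math import comb

def count_orderings(points_lost):
    """Count the ways to win a game while losing `points_lost` points (0, 1 or 2).

    You must win the last point, so only the earlier 3 + points_lost points
    can be shuffled, and exactly 3 of those must be wins.
    """
    earlier = 3 + points_lost
    return sum(1 for seq in product("WL", repeat=earlier) if seq.count("W") == 3)

def win_before_deuce(p, points_lost):
    """Probability of winning the game to love, 15 or 30 (points_lost = 0, 1, 2)."""
    return comb(3 + points_lost, points_lost) * p**4 * (1 - p)**points_lost

for lost, label in enumerate(["love", "15", "30"]):
    print(label, count_orderings(lost), round(win_before_deuce(0.55, lost), 4))
```

The brute-force counts should agree with the binomial coefficients 1, 4 and 10 used in the text.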

So far, so good, but what happens next? You can’t win from 40-40 with a single point; instead you need to win the next two points. Therefore the probability of winning this way is $20p^5(1-p)^3$. Unfortunately/fortunately this isn’t the end of the story. If you only win one of the next two points you are not out of the game but back at deuce, so we seem to be trapped in a loop. To get out I’m afraid we need to use math(s) (Note to editor: go to hell Steve it’s maths!).

We know that the probability of getting to deuce is $20p^3(1-p)^3$. If we let $d$ be the probability of winning from deuce, then $d$ is the sum of two probabilities: that of winning both of the next two points, and that of winning exactly one of the next two points and then winning from deuce. Mathematically it is written as:

$d=p^2+2p(1-p)d$.

This can be solved for $d$ to give:

$d=\frac{p^2}{1-2p(1-p)}$.
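For anyone who wants the intermediate step, here is one way to fill it in, using nothing beyond the equation for $d$. Collect the $d$ terms on the left-hand side:

$d-2p(1-p)d=p^2 \quad\Rightarrow\quad d\left[1-2p(1-p)\right]=p^2$.

Dividing through gives the result. Since $1-2p(1-p)=p^2+(1-p)^2$, the same answer can also be written as

$d=\frac{p^2}{p^2+(1-p)^2}$,

which has a tidy reading: from deuce, the game ends when one player wins two points in a row, and $d$ is your share of those two outcomes.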

With that little trick of algebra the problem is solved and the probability of you winning the game is:

$p^4+4p^4(1-p)+10p^4(1-p)^2+\frac{20p^5(1-p)^3}{1-2p(1-p)}$.
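As a rough sketch, the formula translates almost term by term into Python. The function name `game_win_prob` is just a label chosen here, and the two printed checks are sanity tests rather than anything from the original derivation: an even contest should be a coin flip, and the two players' chances should add up to one.

```python
def game_win_prob(p):
    """Probability of winning a game when each point is won with probability p."""
    q = 1 - p                                   # probability of losing a point
    win_from_deuce = p**2 / (1 - 2 * p * q)     # the d from the text
    return (p**4                                # win to love
            + 4 * p**4 * q                      # win to 15
            + 10 * p**4 * q**2                  # win to 30
            + 20 * p**3 * q**3 * win_from_deuce)  # reach deuce, then win

# Sanity checks: evenly matched players, and probabilities summing to one.
print(game_win_prob(0.5))
print(game_win_prob(0.7) + game_win_prob(0.3))
```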

To get a feel for the formula, assume you win 55% of points on average against another player (a relatively small difference considering a tight set might have around 60 points); then the probability you win a game is 62.3%. A steep increase indeed (the slope has a maximum value of 2.5 when $p=0.5$). Figure 1 shows the probability curve.
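Both numbers are easy to check numerically. The sketch below re-implements the game formula and estimates the slope with a central difference; the helper name `slope` and the step size `h` are arbitrary choices for this illustration.

```python
def game_win_prob(p):
    """Closed-form probability of winning a game at point-win probability p."""
    q = 1 - p
    win_from_deuce = p**2 / (1 - 2 * p * q)
    return p**4 * (1 + 4 * q + 10 * q**2) + 20 * p**3 * q**3 * win_from_deuce

def slope(p, h=1e-6):
    """Central-difference estimate of the derivative of game_win_prob at p."""
    return (game_win_prob(p + h) - game_win_prob(p - h)) / (2 * h)

print(round(game_win_prob(0.55), 3))   # chance of winning a game at p = 0.55
print(round(slope(0.5), 3))            # steepness of the curve at p = 0.5

# Scan a grid of p values to see where the curve is steepest.
grid = [i / 100 for i in range(1, 100)]
print(max(grid, key=slope))
```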

The probability curve for winning a game of tennis. Sexy, huh?

The situation gets worse if you consider a set of tennis. For this we need some extra formulas, such as those for the probability of winning a tie-break and for both a tie-break set and an advantage set. The probability curve for an advantage set (Figure 2) shows that if you have a 55% chance of winning a point, you get an 82% chance of winning the set. Remember, if there are 60 points in the set, that means on average the points won would be 33-27.
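The set formulas themselves aren't reproduced in this post, so the following is only a sketch of how an advantage set could be computed under the same simplification as before: every point is won with the same probability $p$, whoever is serving. It reuses the game formula, counts the 6-0 to 6-4 scorelines with binomial coefficients, and treats 5-5 exactly like deuce, but with games instead of points. The name `advantage_set_prob` is made up here.

```python
from math import comb

def game_win_prob(p):
    """Closed-form probability of winning a game at point-win probability p."""
    q = 1 - p
    win_from_deuce = p**2 / (1 - 2 * p * q)
    return p**4 * (1 + 4 * q + 10 * q**2) + 20 * p**3 * q**3 * win_from_deuce

def advantage_set_prob(p):
    """Probability of winning an advantage set (assumes one p for every point)."""
    g = game_win_prob(p)    # chance of winning any single game
    h = 1 - g               # chance of losing it
    # Win 6-0 to 6-4: you win the last game and 5 of the earlier 5 + k games.
    early = sum(comb(5 + k, k) * g**6 * h**k for k in range(5))
    # Reach 5-5, then win two games in a row before your opponent does.
    win_from_five_all = g**2 / (1 - 2 * g * h)
    return early + comb(10, 5) * g**5 * h**5 * win_from_five_all

print(round(advantage_set_prob(0.55), 3))
```

Under these assumptions the value at $p=0.55$ comes out at roughly 82%, in line with Figure 2.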

The probability curve for winning an advantage set. Mmmmm Matlab.

The curves for the probability of winning a match can also be computed, and they show that for a five set match where the last set goes to advantage, a person who wins 55% of points has over a 95% chance of winning the match.
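A rough version of the match calculation fits in a few lines if you are willing to assume that every set is won independently with the same probability $s$. That is a simplification of the format described above, where only the last set is an advantage set, so treat the output as a ballpark figure. The name `match_win_prob` and its `sets_to_win` parameter are illustrative choices.

```python
from math import comb

def match_win_prob(s, sets_to_win=3):
    """Probability of winning a best-of-(2 * sets_to_win - 1) set match.

    Assumes each set is won independently with the same probability s.
    You win the final set plus sets_to_win - 1 of the earlier ones,
    while losing k sets along the way.
    """
    return sum(comb(sets_to_win - 1 + k, k) * s**sets_to_win * (1 - s)**k
               for k in range(sets_to_win))

# Plug in the 82% advantage-set figure quoted for a 55% point winner.
print(round(match_win_prob(0.82), 3))
```

Plugging in $s=0.82$ lands above 95%, consistent with the claim in the text.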

The probability curve for winning a five set advantage match. Wowwee, don’t bother showing up unless you win more than a third of the points on average.

Of course this analysis is very simplified. For instance, anyone who remembers Wayne Arthurs will know that the probability of winning a point on serve can be vastly different from the probability of winning a point while receiving. In 1999 Arthurs won 73% of points on serve, leading to 91% of service games won (our model gives 93%), but only 29% of points while receiving, with 8% of receiving games won (8.8% by the model). Including both parameters changes the formulas for the probability of winning a tie-break game or either type of set; since mathematically it makes no difference whether you serve first or second in the set, the match formulas don’t change. Using those formulas, the data tells us he had a ~59% chance of winning a match; his actual winning percentage that year was 51% (he only played 27 matches, so the data set is small). However, the moral of the story is that even if your favourite player has a scoreline that looks like they were easily beaten, this may just be a result of compounding probabilities.
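To play with the serve/receive split, here is a rough sketch for an advantage set only. Tie-breaks and full matches are left out, so it will not reproduce the ~59% figure. It steps through the games score by score with the server alternating, and it lets you check numerically that serving first or second makes no difference. The names `advantage_set_prob`, `hold` and `brk` are made up for this illustration, and the code assumes the probabilities are strictly between 0 and 1.

```python
from functools import lru_cache

def game_win_prob(p):
    """Closed-form probability of winning a game at point-win probability p."""
    q = 1 - p
    win_from_deuce = p**2 / (1 - 2 * p * q)
    return p**4 * (1 + 4 * q + 10 * q**2) + 20 * p**3 * q**3 * win_from_deuce

def advantage_set_prob(p_serve, p_return, serve_first=True):
    """Probability of winning an advantage set with separate serve/return odds."""
    hold = game_win_prob(p_serve)    # you win a game on your own serve
    brk = game_win_prob(p_return)    # you win a game on your opponent's serve

    # From 5-5 the next two games are one service game each, in either order.
    win_both = hold * brk
    lose_both = (1 - hold) * (1 - brk)
    win_from_five_all = win_both / (win_both + lose_both)

    @lru_cache(maxsize=None)
    def from_score(a, b):
        """Chance of winning the set from a games to b."""
        if a == 5 and b == 5:
            return win_from_five_all
        if a == 6:
            return 1.0
        if b == 6:
            return 0.0
        # Servers alternate; the first server serves the even-numbered games.
        you_serve = ((a + b) % 2 == 0) == serve_first
        g = hold if you_serve else brk
        return g * from_score(a + 1, b) + (1 - g) * from_score(a, b + 1)

    return from_score(0, 0)

# Arthurs' 1999 point percentages from the text: 73% on serve, 29% receiving.
print(round(game_win_prob(0.73), 3), round(game_win_prob(0.29), 3))
print(advantage_set_prob(0.73, 0.29, serve_first=True))
print(advantage_set_prob(0.73, 0.29, serve_first=False))
```

The last two printed values should agree up to floating-point rounding, which is the "serving first or second makes no difference" claim in action.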