Problem 73:  Collecting voles ($✓$) 2000 Paper II

A group of biologists attempts to estimate the magnitude, $N$, of an island population of voles (Microtus agrestis). Accordingly, the biologists capture a random sample of 200 voles, mark them and release them. A second random sample of 200 voles is then taken of which 11 are found to be marked. Show that the probability, ${p}_{N}$, of this occurrence is given by

${p}_{N}=k\,\frac{{\left(\left(N-200\right)!\right)}^{2}}{N!\,\left(N-389\right)!}\,,$

where $k$ is independent of $N$.

The biologists then estimate $N$ by calculating the value of $N$ for which ${p}_{N}$ is a maximum. Find this estimate.

All unmarked voles in the second sample are marked and then the entire sample is released. Subsequently a third random sample of 200 voles is taken. Using your estimate for $N$, write down the probability that this sample contains exactly $j$ marked voles, leaving your answer in terms of binomial coefficients.

Deduce that

$\sum_{j=0}^{200}\binom{389}{j}\binom{3247}{200-j}=\binom{3636}{200}\,.$

This is really just an exercise in combinations. (Recall that a permutation is a reordering of a set of objects, and a combination is a selection of a subset from a set.) You assume that you are equally likely to choose any given subset of the same size, so that the probability of a set of specific composition is the number of ways of choosing a set of that composition divided by the total number of ways of choosing any set of the same size. Of course, you are assuming that the voles are indistinguishable, except for the marks made by the biologists.

Maximising a discrete (not a continuous) function of $N$ came up on one of the previous questions: you have to compare adjacent terms.

The numbers look rather bad, though they turn out OK. My instinct would be to do it algebraically first: replace 200 by $a$ and 11 by $b$, then substitute back at the end. I am sure it will lead to a better understanding of what is going on.

Solution to problem 73

For the second sample, 200 out of $N$ voles are already marked, so ${p}_{N}$ is just the number of ways of choosing 11 from 200 and 189 from $N-200$ divided by the number of ways of choosing 200 from $N$:

${p}_{N}=\frac{\binom{200}{11}\binom{N-200}{189}}{\binom{N}{200}}=\frac{\frac{200!}{11!\,189!}\,\frac{\left(N-200\right)!}{189!\,\left(N-389\right)!}}{\frac{N!}{200!\,\left(N-200\right)!}}\,,$

so

$k=\frac{{\left(200!\right)}^{2}}{11!\phantom{\rule{0.3em}{0ex}}{\left(189!\right)}^{2}}\phantom{\rule{0.3em}{0ex}}.$
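If you want reassurance that the factorials have been collected correctly, the two forms of ${p}_{N}$ can be compared using exact rational arithmetic. This is only a sketch: the function names `p_direct` and `p_closed` are my own labels, and the values of $N$ tested are arbitrary.

```python
from fractions import Fraction
from math import comb, factorial


def p_direct(N):
    """Probability of 11 marked voles in the second sample, by counting subsets."""
    return Fraction(comb(200, 11) * comb(N - 200, 189), comb(N, 200))


def p_closed(N):
    """The same probability written as k * ((N-200)!)^2 / (N! (N-389)!)."""
    k = Fraction(factorial(200) ** 2, factorial(11) * factorial(189) ** 2)
    return k * Fraction(factorial(N - 200) ** 2, factorial(N) * factorial(N - 389))


# N must be at least 389 for the observation to be possible at all.
for N in (389, 1000, 3636):
    assert p_direct(N) == p_closed(N)
```

Working with `Fraction` rather than floating point avoids any rounding, so equality here means the two expressions agree exactly for the values tried.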

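At the maximum value, adjacent terms are approximately equal, ${p}_{N}\approx {p}_{N-1}$. Cancelling the factorials in the ratio ${p}_{N}/{p}_{N-1}$ leaves only one factor from each, so the condition is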

$\frac{{\left(N-200\right)}^{2}}{N\left(N-389\right)}\approx 1\phantom{\rule{0.3em}{0ex}},$

which gives $N\approx 200^{2}/11\approx 3636$ (just divide 40000 by 11).
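The comparison of adjacent terms can also be carried out exactly, in the general form suggested in the hints (sample size $a$ in place of 200 and $b$ marked voles in place of 11). The following is a minimal sketch, assuming $b\ge 1$; the names `ratio_exceeds_one` and `estimate` are mine. It uses the fact that ${p}_{N}>{p}_{N-1}$ exactly when ${\left(N-a\right)}^{2}>N\left(N-2a+b\right)$, which simplifies to $N<a^{2}/b$.

```python
def ratio_exceeds_one(N, a=200, b=11):
    """True when p_N > p_{N-1}, i.e. (N - a)^2 > N (N - 2a + b), in exact integers."""
    return (N - a) ** 2 > N * (N - 2 * a + b)


def estimate(a=200, b=11):
    """Largest N with p_N > p_{N-1}; assumes b >= 1 so that the search terminates."""
    N = 2 * a - b  # smallest population consistent with the two samples
    while ratio_exceeds_one(N + 1, a, b):
        N += 1
    return N


print(estimate())      # 3636
print(200 ** 2 // 11)  # 3636
```

The terms increase up to $N=3636$ and decrease from $N=3637$ onwards, so the integer maximum agrees with rounding $40000/11$ down.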

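At the third sample, there are 389 marked voles and an estimated $3636-389=3247$ unmarked voles. Hence

$\mathrm{P}\left(j\text{ marked voles}\right)=\frac{\binom{389}{j}\binom{3247}{200-j}}{\binom{3636}{200}}\,.$

The final part follows immediately, using

$\sum_{j=0}^{200}\mathrm{P}\left(j\text{ marked voles}\right)=1\,.$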
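The identity in the question involves numbers far too large to check by hand, but Python's arbitrary-precision integers handle them without difficulty. A quick check, offered as a sanity test rather than a proof (the variable names `lhs` and `rhs` are mine):

```python
from math import comb

# Left-hand side: choose j of the 389 marked voles and 200 - j of the 3247 unmarked ones.
lhs = sum(comb(389, j) * comb(3247, 200 - j) for j in range(201))

# Right-hand side: choose any 200 of the estimated 3636 voles.
rhs = comb(3636, 200)

print(lhs == rhs)  # True
```

Dividing both sides by `rhs` recovers the statement that the probabilities of $j=0,1,\dots ,200$ marked voles sum to 1.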

Post-mortem

Adding the Latin name of the species was a nice touch, I thought (not my idea); it adds an air of verisimilitude to the problem.

I suppose that this might be the basis of a method of estimating the population of voles — rather clever really. I don’t know how well it works in practice though. The assumption mentioned earlier, that picking one set of voles of size 200 is just as likely as picking any other set, surely relies on perfect mixing of the marked voles, which would be rather difficult to achieve (especially as female voles can be highly territorial).

You might be asking yourself why it was OK to find the maximum by setting ${p}_{N}\approx {p}_{N-1}$. This is a standard method, but of course it only works if the distribution is one-humped, like a normal distribution. An alternative approach would have been to approximate the distributions using Stirling’s approximation, which at its most basic is

$\ln N!\approx N\ln N-N\,.$

This gives exactly the same equation as the ${p}_{N}\approx {p}_{N-1}$ method, and shows that the distribution is indeed one-humped.
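A numerical illustration of the Stirling approach is sketched below. It is a check over a finite range, not a proof; the range of $N$ searched (390 to 20000) and the names `stirling` and `log_p` are my own choices, and the constant $\ln k$ is dropped because it does not affect where the maximum lies.

```python
from math import log


def stirling(n):
    """Basic Stirling approximation to ln(n!), for n >= 1."""
    return n * log(n) - n


def log_p(N):
    """Approximate ln(p_N), omitting the additive constant ln(k)."""
    return 2 * stirling(N - 200) - stirling(N) - stirling(N - 389)


values = {N: log_p(N) for N in range(390, 20001)}
best = max(values, key=values.get)

# Count how often the sequence switches between rising and falling.
steps = [values[N + 1] > values[N] for N in range(390, 20000)]
turns = sum(1 for x, y in zip(steps, steps[1:]) if x != y)

print(best, turns)
```

A single switch from rising to falling over the range searched is what a one-humped distribution would produce, and the location of the maximum should lie within one of $40000/11$.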