Saturday, October 20, 2012

Beta Distribution


Out of the 5 probability distribution groups I discussed in my previous blog post, the one I like the most is the beta distribution. Compared with the other 4 groups, it sits alone (like me while writing this post). Another distribution that stands alone is the hypergeometric, but it can be approximated by the binomial distribution once the population is large compared with the sample size.
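To illustrate the remark about the hypergeometric distribution, here is a small Python sketch using only the standard library. It compares the hypergeometric probability of 3 successes in 10 draws without replacement with the binomial probability for p = 0.3 as the population size N grows. The function names and numbers are hypothetical choices for illustration. The point is only that the two values converge when N is much larger than the sample.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(k successes in n draws without replacement from N items, K of which are successes)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical numbers: sample of 10, 3 successes, 30% of the population are successes.
n, k, p = 10, 3, 0.3
for N in (20, 100, 1_000, 100_000):
    K = round(p * N)  # number of "success" items in a population of size N
    print(N, round(hypergeom_pmf(k, N, K, n), 5), round(binomial_pmf(k, n, p), 5))
```

For small N the hypergeometric value differs noticeably from the binomial one; by N = 100,000 they agree to several decimal places.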

The beta distribution is still connected to the binomial, but with a clever twist involving conditional probability. The story is worth telling, and it goes like this.

The binomial distribution gives the probability of getting r successes in n trials, where the order in which the r successes occur among the n trials does not matter. It is equal to $\frac{n!}{r!\,(n-r)!}\, p^r (1-p)^{n-r}$, and it is a conditional probability, $P(r \mid P = p)$, since it gives the probability of getting r successes in n trials when the probability of success in each trial is p.
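A minimal Python sketch of the binomial formula, using the standard-library `math.comb` (Python 3.8+) for n!/(r!(n−r)!). The function name `binomial_pmf` and the values n = 10, p = 0.3 are illustrative choices, not taken from the post. With p fixed, the probabilities over r = 0..n add up to 1.

```python
from math import comb

def binomial_pmf(r: int, n: int, p: float) -> float:
    """P(R = r | P = p): probability of r successes in n trials, order ignored."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

# Hypothetical example: 10 trials, success probability 0.3.
n, p = 10, 0.3
probs = [binomial_pmf(r, n, p) for r in range(n + 1)]
print(round(probs[3], 4))     # probability of exactly 3 successes
print(round(sum(probs), 10))  # for fixed p, the probabilities over r sum to 1
```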

In many situations the value of p is unknown. For instance, suppose I work on a port authority's container inspection staff and thousands of containers arrive at the port. It is impossible to inspect all of them, so I can only sample a fraction. Among the sampled containers, some will need further processing and the rest will pass security and other checks. From this limited sample, I then have to estimate the fraction p of all incoming containers that actually need further processing, and from it their most likely total number. That estimate is useful if, for example, I want to propose hiring additional inspection staff. This is exactly the kind of situation where the beta distribution becomes very useful.

From the example above, you can imagine many other useful applications of the beta distribution.

The beta distribution is obtained from the binomial using Bayes' theorem, since we now want the distribution of the probability itself, $P(p \mid R = r)$, given r successes out of the n sampled trials. Bayes' theorem states that $P(p \mid R = r)\, P(r) = P(r \mid P = p)\, P(p)$. If we assume no prior knowledge of p, so that p is uniformly distributed on [0, 1] and $P(p) = 1$, then $P(r) = \int_0^1 \frac{n!}{r!\,(n-r)!}\, p^r (1-p)^{n-r}\, dp = \frac{1}{n+1}$ for every r, and we arrive at the beta distribution:
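The marginal probability P(r) under a uniform prior can be checked numerically by integrating the binomial probability over p from 0 to 1. The sketch below uses the midpoint rule, which is an illustrative numerical choice rather than anything from the post. The helper names are hypothetical. For n = 10 the result comes out close to 1/(n + 1) ≈ 0.0909 for every r, which is what makes the normalizing constant (n+1)!/((n−r)! r!) work.

```python
from math import comb

def binomial_pmf(r: int, n: int, p: float) -> float:
    """P(R = r | P = p) for n trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def marginal_prob(r: int, n: int, steps: int = 20_000) -> float:
    """P(R = r) = integral over [0, 1] of P(r | p) * P(p) dp, with uniform prior P(p) = 1.

    Approximated with the midpoint rule on `steps` equal subintervals."""
    h = 1.0 / steps
    return sum(binomial_pmf(r, n, (k + 0.5) * h) for k in range(steps)) * h

n = 10
for r in (0, 3, 7, 10):
    print(r, round(marginal_prob(r, n), 6), round(1 / (n + 1), 6))
```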

$$P(p \mid R = r) = \frac{(n+1)!}{(n-r)!\, r!}\, p^r (1-p)^{n-r}.$$
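Here is a short Python sketch of this formula, treating p as the variable and r, n as fixed by the sample. The function name `beta_posterior_density` and the sample "3 flagged containers out of 10 inspected" are hypothetical. The integral over p is approximated with the midpoint rule to show that, unlike the binomial (which sums to 1 over r), this function integrates to 1 over p.

```python
from math import factorial

def beta_posterior_density(p: float, r: int, n: int) -> float:
    """Density of p given r successes in n trials, assuming a uniform prior on p."""
    coeff = factorial(n + 1) / (factorial(n - r) * factorial(r))
    return coeff * p**r * (1 - p) ** (n - r)

# Hypothetical sample: 3 flagged containers out of 10 inspected.
r, n = 3, 10
for p in (0.1, 0.3, 0.5):
    print(p, round(beta_posterior_density(p, r, n), 4))

# Midpoint-rule check that the density integrates to 1 over p in [0, 1].
steps = 20_000
h = 1.0 / steps
area = sum(beta_posterior_density((k + 0.5) * h, r, n) for k in range(steps)) * h
print(round(area, 6))
```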

It looks similar to the binomial distribution, but the two are very different because p is the variable in the beta distribution: it gives the probability distribution of the success probability p, given that r successes were observed in n trials. In other words, r and n are fixed by the sample, and as we vary p the formula tells us how plausible each value of p is.

If the inspections are repeated many times, it is reasonable to look for the average (mean) of p. But if the inspection is done only once, it is the most likely value of p that we need to find. Analyzing the beta distribution further, for example how its spread narrows as the sample size n grows, lets us determine the optimum sampling size: one that keeps the estimation error small while keeping the inspection cost down.
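One possible way to turn this into a sampling-size rule is sketched below in Python. It is not the author's method. It uses the standard variance of a Beta(α, β) distribution, αβ/((α+β)²(α+β+1)), with α = r + 1 and β = n − r + 1. It then searches for the smallest n whose standard deviation of p falls below a target. The expected flag fraction (10%) and the target standard deviation (0.02) are hypothetical planning inputs, as are the function names.

```python
from math import sqrt

def posterior_sd(r: int, n: int) -> float:
    """Standard deviation of Beta(r+1, n-r+1), the distribution of p after r successes in n trials."""
    a, b = r + 1, n - r + 1
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

def smallest_sample(expected_fraction: float, target_sd: float, n_max: int = 100_000) -> int:
    """Smallest n whose posterior SD is at or below target_sd, assuming (as a planning
    guess) that about expected_fraction of the sampled items will be successes."""
    for n in range(1, n_max + 1):
        r = round(expected_fraction * n)  # anticipated number of successes
        if posterior_sd(r, n) <= target_sd:
            return n
    raise ValueError("target_sd not reached within n_max")

# Hypothetical planning numbers: about 10% of containers flagged, SD of p within 0.02.
print(smallest_sample(0.10, 0.02))
```

A tighter target or an expected fraction closer to 0.5 pushes the required sample size up. The inspection cost per container can then be weighed against that size.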

The mean of p is $\frac{r+1}{n+2}$, with standard deviation $\sqrt{\frac{(r+1)(n-r+1)}{(n+2)^2\,(n+3)}}$. How about the most likely value of p? I will leave that for you to calculate.
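The closed-form mean (r+1)/(n+2) can be compared against a direct numerical integration of the beta formula. The same holds for the standard deviation of Beta(r+1, n−r+1), √((r+1)(n−r+1)/((n+2)²(n+3))). The Python sketch below does this with the midpoint rule for a hypothetical sample of r = 3, n = 10. It deliberately does not compute the most likely value of p, which is left as the exercise above.

```python
from math import factorial, sqrt

def beta_posterior_density(p: float, r: int, n: int) -> float:
    """Density of p given r successes in n trials, assuming a uniform prior on p."""
    return factorial(n + 1) / (factorial(n - r) * factorial(r)) * p**r * (1 - p) ** (n - r)

def numeric_moments(r: int, n: int, steps: int = 20_000) -> tuple[float, float]:
    """Mean and standard deviation of p, integrated with the midpoint rule."""
    h = 1.0 / steps
    grid = [(k + 0.5) * h for k in range(steps)]
    weights = [beta_posterior_density(p, r, n) * h for p in grid]
    mean = sum(p * w for p, w in zip(grid, weights))
    var = sum((p - mean) ** 2 * w for p, w in zip(grid, weights))
    return mean, sqrt(var)

r, n = 3, 10
mean_formula = (r + 1) / (n + 2)
sd_formula = sqrt((r + 1) * (n - r + 1) / ((n + 2) ** 2 * (n + 3)))
mean_num, sd_num = numeric_moments(r, n)
print(round(mean_formula, 6), round(mean_num, 6))
print(round(sd_formula, 6), round(sd_num, 6))
```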
