I am teaching a probability and statistics course for engineers this Fall term. One challenging task for students is to pick the most appropriate probability distribution function for a given problem. It requires them to think about the underlying probability structure and to extract every bit of information from the problem statement. They learn that phrases like "at least" or "given that" carry significant implications.
I cover the standard discrete probability functions: (i) binomial, (ii) geometric, (iii) negative binomial, (iv) hypergeometric, (v) Poisson, and (vi) uniform; and the continuous probability functions: (vii) uniform, (viii) normal, (ix) exponential, (x) gamma, (xi) Weibull, (xii) lognormal, and (xiii) beta.
I am going to summarize these distribution functions with simple illustrations that highlight the thought process behind selecting the most appropriate one.
1. The first group of the 13 distribution functions consists of the discrete uniform and the continuous uniform. We pick them if we know only the possible outcomes of a problem, without a priori information on their individual likelihoods. The outcomes are distinct from each other and are consequently thought to be independent of each other: the presence of one outcome does not influence the likelihood of another.
We use the discrete uniform if we can count these outcomes as pure numbers (i.e., integers). For instance, a possible outcome of buying groceries is 2 apples. We use the continuous uniform if we deal with intervals, whether of distance, time, or something else. A 12 km distance is an interval, thus a continuous variable. If we instead use the signposts along the 12 km stretch as counters, then we ought to use the discrete uniform.
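To make the distinction concrete, here is a minimal sketch using Python's scipy.stats (my choice of library for these notes, not anything the problems themselves assume):

```python
# Discrete vs. continuous uniform, using scipy.stats.
from scipy.stats import randint, uniform

# Discrete uniform on the integers 0..4 (say, five equally likely signposts).
# randint's upper bound is exclusive, hence (0, 5).
print(randint.pmf(2, 0, 5))              # 1/5 = 0.2 for any of the five outcomes

# Continuous uniform on the interval [0, 12] (a point along a 12 km stretch).
print(uniform.cdf(3, loc=0, scale=12))   # P(X <= 3) = 3/12 = 0.25
```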
2. The second group is headed by the binomial. We need this function to describe the probability of r successes occurring in x attempts. The binomial does not care about the sequence of these r successes. For example, 3 successes and 1 failure can be described as p^3(1–p), where p is the probability of a success. If we want 3 successes first, followed by 1 failure, precisely in that order, then the expression p^3(1–p) equals the probability of that outcome. However, in many problems we do not care about the order in which the 3 successes appear. The sequence can then be any one of these 4: ppp(1–p), pp(1–p)p, p(1–p)pp, (1–p)ppp. Thus, there are 4 possible orderings, and the probability of having 3 successes and 1 failure, in whichever order they appear, is 4p^3(1–p).
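A quick numerical check of that counting argument, again with scipy.stats (the value of p below is arbitrary, picked just for the check):

```python
# Verify that 4 orderings x p^3(1-p) matches the binomial pmf.
from scipy.stats import binom

p = 0.6                            # arbitrary success probability
by_hand = 4 * p**3 * (1 - p)       # 4 orderings of 3 successes and 1 failure
print(by_hand)                     # 0.3456
print(binom.pmf(3, 4, p))          # same value from the binomial pmf
```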
The binomial function leads to the Poisson function if the probability of success, p, is small. How small? It depends on the number of attempts x. Roughly, the product xp should be several orders of magnitude smaller than x, and x needs to be large (e.g., 1000 or more).
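Here is a small sketch of that limit (the specific x and p are arbitrary illustrations):

```python
# Poisson approximation to the binomial: large x, small p, modest xp.
from scipy.stats import binom, poisson

x, p = 10_000, 0.0003
lam = x * p                        # lambda = xp = 3
for r in range(6):
    print(r, binom.pmf(r, x, p), poisson.pmf(r, lam))   # nearly identical
```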
The binomial function leads to the normal function if the number of attempts is large and our data are expressed in continuous variables, such as time duration or distance. The normal function requires that we know the mean and the standard deviation. Because of this limiting behavior, the normal function can also be used to approximate the binomial and Poisson functions.
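A sketch of the normal approximation, with the usual continuity correction (the numbers are again arbitrary):

```python
# Normal approximation to the binomial.
from scipy.stats import binom, norm

x, p = 1_000, 0.4
mu = x * p                          # binomial mean
sigma = (x * p * (1 - p)) ** 0.5    # binomial standard deviation

print(binom.cdf(420, x, p))                  # exact P(R <= 420)
print(norm.cdf(420.5, loc=mu, scale=sigma))  # +0.5 is the continuity correction
```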
The lognormal function can be thought of as a variation of the normal function. The probability of an outcome can be described by the binomial function, but an exponential relationship connects the outcome variable to the one the binomial describes; equivalently, it is the logarithm of the variable that is normally distributed. (Complicated, eh?)
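One way to see the connection without the complication: the logarithm of a lognormal variable is normal, which a quick simulation confirms (numpy here, with arbitrary parameters):

```python
# The log of lognormal samples behaves like a normal variable.
import numpy as np

rng = np.random.default_rng(0)
samples = rng.lognormal(mean=1.0, sigma=0.5, size=100_000)
logs = np.log(samples)
print(logs.mean(), logs.std())     # close to the underlying normal's 1.0 and 0.5
```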
3. The third group is headed by the negative binomial. This group also describes the probability of r successes in x attempts, but we impose a peculiar sequence of appearance: we require the r-th success to occur on the last of the x attempts. This means ppp(1–p) is ruled out as a possible outcome, since its last attempt results in a failure of probability (1–p). Why do we need this peculiar sequence? Because it describes "waiting time" problems: the probability of waiting a given number of attempts for a computer to malfunction, for a train to arrive, and so on.
If we want 1 success only, the negative binomial function reduces to the geometric function. From the negative binomial we can also get the exponential function.
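A sketch of that reduction (note that scipy's nbinom counts failures before the r-th success, while geom counts the trial on which the first success lands, so the arguments are shifted by one):

```python
# With r = 1 the negative binomial collapses to the geometric.
from scipy.stats import nbinom, geom

p = 0.3
for k in range(1, 6):              # k = trial on which the first success occurs
    print(k, geom.pmf(k, p), nbinom.pmf(k - 1, 1, p))   # identical values
```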
If our variable is continuous, then the negative binomial becomes the gamma distribution. (Calling it the "gamma function" would be inappropriate, since that name already refers to a different well-known function.) The exponential function can be obtained from the gamma distribution when we want 1 success only.
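A sketch of that special case, with an arbitrary mean waiting time:

```python
# The gamma density with shape a = 1 is exactly the exponential density.
from scipy.stats import gamma, expon

scale = 2.0                        # mean waiting time, i.e., 1/lambda
for t in (0.5, 1.0, 3.0):
    print(t, gamma.pdf(t, a=1, scale=scale), expon.pdf(t, scale=scale))
```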
The exponential function can also be derived from the Poisson, so we have an overlap between the second and third groups. This overlap occurs because we can describe the outcome of having at least 1 success using either the Poisson (thus, the binomial) or the negative binomial function. The probability of having at least 1 success is the complement of the probability of having no success, and the description of the no-success outcome, (1–p)^x for x attempts, is the same for either the binomial or the negative binomial.
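The complement rule is easy to check numerically (x and p are arbitrary):

```python
# P(at least 1 success) as the complement of P(no success).
from scipy.stats import binom

x, p = 20, 0.1
print(1 - (1 - p) ** x)            # complement of the no-success outcome
print(binom.sf(0, x, p))           # same value: P(R > 0) from the binomial
```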
The Weibull function can be thought of as an improvement of the exponential function, since the latter describes the lifetime of a spare part that does not age. A spare part made of metal, however, ages with time: a ball bearing, for example, degrades gradually rather than failing suddenly.
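The difference shows up in the hazard rate, the instantaneous failure rate of a part that has survived so far. A sketch, with a Weibull shape of 2 chosen arbitrarily to represent wear-out:

```python
# Exponential: constant hazard ("no aging"). Weibull with shape > 1: rising hazard.
import numpy as np
from scipy.stats import expon, weibull_min

t = np.array([0.5, 1.0, 2.0, 4.0])
print(expon.pdf(t) / expon.sf(t))                        # constant: 1.0 everywhere
print(weibull_min.pdf(t, c=2) / weibull_min.sf(t, c=2))  # 2t: grows with time
```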
4. The fourth group is the hypergeometric function. It describes the probability of drawing a given number of "success" objects when we sequentially pick r objects from a basket filled with x objects, the population of x objects being divided into two classes: "success" objects and "failure" objects. It is similar to the binomial, but the hypergeometric imposes the condition that p, the probability of success, is not constant: p depends on the objects remaining in the basket as we remove one object after another until we have a total of r objects.
It is expected, therefore, that if the number of objects in the basket is very large, then taking 1 object out will not significantly change p. In that case, p can be considered constant as long as we do not take too many objects from the basket, i.e., r is much smaller than x. The hypergeometric function can thus be approximated by the binomial when x is very large and r is much smaller than x.
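A sketch of the approximation (the basket contents are arbitrary; note scipy's argument order for hypergeom is pmf(k, population, successes, draws)):

```python
# With a large basket and few draws, hypergeometric ~ binomial.
from scipy.stats import hypergeom, binom

x, K, r = 10_000, 3_000, 10        # basket size, success objects, draws (r << x)
p = K / x                          # nearly constant success probability
for k in range(r + 1):
    print(k, hypergeom.pmf(k, x, K, r), binom.pmf(k, r, p))   # very close
```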
5. The fifth group is the beta function. It is related to the binomial, but instead of asking for the number of successes in a given number of attempts, we want to know the most likely probability of success given only information on the number of successes and failures observed. The beta function, which comes from the beta integral, tells us how to guess the fairest value of p from such incomplete information.
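A sketch of that guessing game, assuming a flat prior over p (the observed counts below are arbitrary):

```python
# Guessing p from observed successes and failures via the beta density.
from scipy.stats import beta

s, f = 7, 3                        # observed successes and failures
a, b = s + 1, f + 1                # Beta(s+1, f+1) under a flat prior
mode = (a - 1) / (a + b - 2)       # peak of the density: the most likely p
print(mode)                        # 0.7 = 7 / (7 + 3)
print(beta.pdf(mode, a, b))        # density height at that most likely p
```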