On ‘Probability’, guesses and an 18th century mathematician’s insight

When considering risk, we invariably introduce, and expect others to work with, a concept of probability – indeed it is a key component of the professional lives of those who work with risk.  Clearly, to be credible, those advising on or working with risk must truly understand the concept and be able to clarify what is meant by the word ‘probability’.

It is easy to define and discuss probability in terms of frequency – a simple concept, and one that most people have little difficulty with.  It is normally explained in terms of the throw of dice or the toss of a coin.  For example, what is the probability of throwing heads with the toss of a 20c coin?  The maths is simple enough: over, say, a hundred tosses, divide the number of heads by the number of tosses and you get a probability of about 1 in 2 – simple stuff.  And what is the probability of throwing a seven with the toss of two dice?  Divide the number of ways of making seven (1&6, 6&1, 2&5, 5&2, 3&4, 4&3) by the total number of possible outcomes (6 x 6 = 36) and you get the probability 1 in 6. Still not too complex.
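For readers who like to check the arithmetic, here is a minimal sketch (in Python, purely illustrative) that simply counts the two-dice outcomes rather than relying on any formula:

```python
from itertools import product

# Enumerate all 36 equally likely outcomes of throwing two dice
outcomes = list(product(range(1, 7), repeat=2))

# Keep the outcomes that sum to seven: (1,6), (6,1), (2,5), (5,2), (3,4), (4,3)
sevens = [pair for pair in outcomes if sum(pair) == 7]

print(len(sevens), "of", len(outcomes))               # 6 of 36
print("probability =", len(sevens) / len(outcomes))   # 0.1666..., i.e. 1 in 6
```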

Most people are comfortable with this way of thinking about probability, as it tends to agree with common sense and works just fine in most cases.  However, there is a fundamental problem with this approach, and one that is directly relevant to very many risk assessment problems.  Typically, when we quote a probability in our everyday lives, and in some studies, we are referring to situations of which we have considerable experience, or that are based upon relatively large data sets, or that are easily replicated or repeated and thus quantifiable.  However, it is not uncommon, say within a ‘risk workshop’, to be referring to a one-off or otherwise unique occurrence that has not yet occurred and may never occur.  Even if it did occur, it may not be repeatable, and, just as importantly, the quoted probability can’t be tested – one can’t prove it.

Consider, for example, that:

  • The Governor of a Reserve Bank has concluded that she needs to raise interest rates, as there is a high probability that the economy will continue to grow in the next quarter. Neither she nor her advisors have numerous copies of their nation, complete with copies of the national economy, somewhere in a vault that they can tend, monitor and watch over for the yet-to-occur quarter. So how can she or her staff make, and then prove, such a claim?
  • You are running a risk workshop in which a project manager claims that his project has a 95% chance of success, but there is only one project. How can he reach this conclusion, let alone prove his case, however the project turns out?  He could equally have said 10%, and still he could not prove one way or the other that he was right, and neither could anyone else.

So what does the assignment of a probability value to a unique, one-off event mean?  The economist John Maynard Keynes wrote extensively on the subject of risk.  At one point he stated, “…it is difficult to find an intelligible account of the meaning of ‘probability’, or of how we are ever to determine the probability of any particular proposition.”  Although some mathematicians and philosophers have been working on this challenge for more than two centuries and have made good progress, across most professional fields the take-up and understanding of the concepts remain very limited.  Some sectors can reasonably model situations, replicating a particular condition many times, and can thus prove a given ‘probability’ value (typically using fault-tree or statistical models).  However, except in the most technical applications, this can’t legitimately be done for most risk assessment problems in many decision-making contexts.

The answer to the problem described by Keynes was actually already in existence, but it was little known and even less understood.  Anyone considering risk faces this situation, though many will not have dwelled on it for long.  The answer lies in the development of a numerical measure that expresses ‘the degree of rational belief in the correctness of a specific inference from a given hypothesis’.  This number – the probability – is invariably designated by the symbol [p], and is established without reference to a frequency of known events, counting occurrences, analysing data, or the use of statistics.  It should not, however, be assigned arbitrarily, but by a rational process of argument and logic, though in many cases this is no more than an informed guess – hopefully one based upon relevant knowledge and experience.

This concept of chance can be termed the ‘rational definition’ in order to distinguish it from the more determinate, frequency-based ‘statistical definition’.  We can think of ‘rational probabilities’ as what one would use down at the TAB – you can think about the outcome of an event, apply your knowledge of the game, and draw a conclusion that includes a degree of belief.  The probability is your belief as to the likelihood that a particular outcome will occur, with the degree of belief influencing the amount you are willing to wager.

OK, most of us can run with this concept, but when self-doubt creeps in, can we turn to something more for reassurance?  Here’s where a brilliant 18th century insight comes in.

This particularly rigorous and proven method of defining rational probabilities is a theorem named after its creator, Thomas Bayes (1702–61), an English Non-conformist minister who was also an outstanding mathematician.  His place in history is assured by the increasing use of the term ‘Bayesian probability’.  Bayes wrote an essay setting out his theorem that was published after his death and that lay largely unrecognised and unvalued for many years.  Although not widely taught, it has become more widely known in recent years as its usefulness is now being recognised.  Its direct benefit to the study of economics, medical research and engineering can be expected to see it increasingly taken up by many professions.

The basic, if unexpected, idea behind the Bayesian approach is that if there is no data from which to calculate a probability, you start by assuming or estimating a probability (or probabilities). We can then use this assumption to calculate how likely various events are.  Finally, as new information arrives, we can use it to update our assumptions – and it is Bayes’ theorem that allows us to do this. The key point is that you assume some probabilities at the start. This starting assumption is called the “prior probability” or “prior probability distribution”, or simply the “prior”, and you then work from that initial point.
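As a small illustration of that idea (a sketch only, with invented numbers and an invented project scenario, not a prescribed method), suppose a project manager’s prior belief that a project is fundamentally on track is 0.7, and each completed work package – arriving on time or late – is treated as new information used to revise that belief:

```python
def update(prior, likelihood_if_true, likelihood_if_false, observed):
    """One Bayesian update of belief in a hypothesis, given a yes/no observation."""
    # Probability of the observation under each hypothesis
    p_obs_true = likelihood_if_true if observed else 1 - likelihood_if_true
    p_obs_false = likelihood_if_false if observed else 1 - likelihood_if_false
    # Total probability of the observation (the p(I) of the discussion below)
    p_obs = prior * p_obs_true + (1 - prior) * p_obs_false
    # Posterior = prior x (likelihood of the information / probability of the information)
    return prior * p_obs_true / p_obs

# Invented numbers: prior belief the project is on track, plus how likely a
# work package is to arrive on time if it is (0.9) or is not (0.5) on track.
belief = 0.7
for on_time in [True, True, False, True]:   # an assumed run of observed work packages
    belief = update(belief, likelihood_if_true=0.9, likelihood_if_false=0.5, observed=on_time)
    print(round(belief, 3))                 # belief rises with good news, falls with bad
```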

The following is taken, with only slight change, from a particularly neat summary by Hans Christian von Baeyer, a Professor of Physics, who uses it in his book on the importance of information in science.  He is making a different point, but the explanation serves our purpose just as well here.

Bayes’ theorem answers the following question: suppose you know, or believe you know, the probability that a certain conclusion follows from an initial assessment (let’s say as tabled during a normal risk assessment workshop).  Suppose further that new information is subsequently obtained and added to the assessment.  How do you then compute the updated probability that the conclusion is true, based on the combination of the original assessment and the new information?  The value of the simple formula Bayes derived in answer to this question (a question very relevant to much risk assessment work) lies in its rigour.  Although some of its inputs (rationally defined probabilities) may involve guesswork, the way they fit together does not: the theorem is indisputable.  As mentioned above, the most important practical application of Bayes’ theorem is that it obliges you to estimate an initial probability – a task required for almost all risk assessments.  Given new information, it then leads to the question: “how is the likelihood changed by the new information?”

Bayes’ answer, as contained in his theorem, is a bit cumbersome to express, and so to understand at first.  However, here goes:

The theorem states that the new probability [p’] (the posterior probability) for belief in the conclusion [C], based on added information [I], is equal to the prior probability multiplied by a factor.  If that factor is greater than one, the new information increases rational belief in the conclusion; if less than one, the rational belief is diminished.  The factor, in turn, is the ratio of two probabilities and is written [p(I¦C)] divided by [p(I)].  The number [p(I¦C)] is the key ingredient in the calculation and represents a kind of inverse of the desired answer: it is the probability that, if the conclusion [C] were assumed, the information [I] would follow.  The final ingredient [p(I)] is the probability that the information [I] is true, regardless of any other assumptions.
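For those who prefer symbols, the same statement can be written out as a formula (the expansion of p(I) into its two cases is an addition here, not part of von Baeyer’s summary):

```latex
% Bayes' theorem in the notation used above:
%   p  = prior probability of the conclusion C
%   p' = posterior probability of C once the information I is known
p' \;=\; p \times \frac{p(I \mid C)}{p(I)},
\qquad\text{where}\qquad
p(I) \;=\; p(I \mid C)\,p \;+\; p(I \mid \bar{C})\,(1 - p).
```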

Don’t worry too much if you couldn’t follow the above; here is an example that illustrates the strength of the theorem, in this case producing a surprising conclusion.

Suppose that a certain cancer has an incidence of one in a hundred, so the prior probability [p] that you have it is 1%.  Suppose further that a new test for cancer has been developed, and has an accuracy of 99%.  If you are unfortunate enough to test positive, an unsettling piece of information we can designate [I], what is the posterior probability [p’] that you really do have cancer?  How much should you worry?  The unexpected answer can be derived by use of Bayes’ theorem.

The number labelled [p(I¦C)] is the likelihood that, if you really had the cancer, the test result would be positive.  Since the test is so good, that probability is very high, so we set [p(I¦C)] as approximately equal to one.  Now for the last element, [p(I)].  Out of a hundred people, one is expected to have the cancer and would probably test positive, while approximately one other, healthy, person will give a ‘false positive’, because the test errs once in every hundred cases.

Therefore, on average, two people in a hundred will test positive.  This means that when you take the test you can expect a 2% chance of a positive outcome.  The multiplying factor in Bayes’ theorem is therefore approximately [p(I¦C)/p(I) = 1/(2%)].  The probability that you are really ill, calculated as the prior probability times the Bayes factor, is [p’ = p x p(I¦C)/p(I) = (1%)/(2%) = 1/2].  The chance that you are ill is just 50% – or, should that be, the chance that you are well is 50% – even though the test, which was assumed to be virtually infallible, was positive!  If this is extended to a sample of 10,000 people, 100 of whom have the cancer, roughly 198 will be told they have cancer – the 99 correctly detected, plus about 99 healthy people who receive a false positive.
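To see the same arithmetic laid out end to end, here is a small check (a Python sketch of the counting argument above, not any statistical library):

```python
population = 10_000
incidence = 0.01      # 1 in 100 have the cancer (the prior, p)
accuracy = 0.99       # the test is right 99% of the time

with_cancer = population * incidence                # 100 people
without_cancer = population - with_cancer           # 9,900 people

true_positives = with_cancer * accuracy             # 99 correctly detected
false_positives = without_cancer * (1 - accuracy)   # 99 healthy people flagged

total_positives = true_positives + false_positives  # 198 people told they have cancer
posterior = true_positives / total_positives        # chance you are ill given a positive test

print(total_positives)  # 198.0
print(posterior)        # 0.5
```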

OK, aside from a few engineers and scientists, most of us shouldn’t need to turn to Bayes that often!  However, we do need to have a good handle on the concept of ‘rational probabilities’ – not only so that we can believe in our own work, even when it is based upon little more than guesses, but also so that we can defend their use in our risk assessment studies.  It is reassuring to know that guesses – albeit considered guesses – can also have a sound mathematical basis!

Geraint Bermingham