Statisticians have long been divided into two camps over how to interpret their trade philosophically. These schools of thought are usually called Frequentists and Bayesians.

Frequentists believe that a probability, p\in[0,1], associated with a specific possible outcome of an observable occurrence or process, simply tells you that, could you observe this occurrence (or process) infinitely many times, the fraction of such observations that would yield that specific outcome is p. Using the age-old coin toss example: tossing the coin is the occurrence or process, and recording a Heads or a Tails are the two possible observations. The number 0.5 \left(P(\text{Tails})=0.5=P(\text{Heads})\right) tells a Frequentist that, in the pursuit of infinitely many coin tosses, the ratio of Heads recorded to the number of tosses performed asymptotically approaches 0.5. And that’s all! The value should not be interpreted as the most likely outcome for the next observation or sample taken from the process (though I’ve always wondered how a Frequentist would gamble…). If taken to its logical extreme, the Frequentist interpretation affects how one should reason about probabilities, but such a discussion would be, aside from boring, beyond the scope of this article.
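
If you’d like to see this convergence for yourself, here’s a minimal simulation (in Python, with entirely arbitrary numbers) of the running fraction of Heads in repeated fair-coin tosses:

```python
# A toy illustration of the Frequentist picture: the running fraction of
# Heads in repeated fair-coin tosses creeps towards 0.5.
import random

random.seed(0)
heads = 0
for n in range(1, 100_001):
    heads += random.random() < 0.5  # Heads with probability 0.5
    if n in (10, 100, 1_000, 10_000, 100_000):
        print(f"after {n:>6} tosses: fraction of Heads = {heads / n:.4f}")
```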

In admittedly subtle contrast, those of the Bayesian school of thought say that everything has a probability of occurring. This is a direct conclusion from Bayes’ rule: to know how the probability of event a is affected by event b, you must also know the probabilities of events a and b.

Of course, in reality, both Frequentists and Bayesians make use of Bayes’ Rule, which is the only correct way to reason about conditional probabilities. Bayes’ Rule is

p(a|b)=\frac{p(b|a)p(a)}{p(b)},

where p(a|b) is the conditional probability of a given b: i.e. the probability associated with a given that b exists or has occurred. Bayes’ Rule follows from a basic law of probability: the probability of both a and b happening, p(a,b), can be factorised by conditioning on either event, and both factorisations must agree:

p(a|b)p(b)=p(a,b)=p(b|a)p(a).
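
To make this concrete, here’s a quick sanity check of Bayes’ Rule in Python. The rain-and-wet-grass setup and all the numbers are invented purely for illustration:

```python
# Sanity-checking Bayes' Rule on a made-up discrete example:
# let a = "it rained" and b = "the grass is wet".
p_a = 0.3              # p(a)
p_b_given_a = 0.9      # p(b|a)
p_b_given_not_a = 0.2  # p(b|not a)

# Total probability gives the denominator: p(b) = p(b|a)p(a) + p(b|not a)p(not a).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)

# Bayes' Rule: p(a|b) = p(b|a) p(a) / p(b).
p_a_given_b = p_b_given_a * p_a / p_b

# Both factorisations of the joint p(a,b) must agree.
assert abs(p_a_given_b * p_b - p_b_given_a * p_a) < 1e-12
print(f"p(a|b) = {p_a_given_b:.3f}")  # ~0.659: rain is more likely once we see wet grass
```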

So why are you a Bayesian? Well suppose you are being accosted by a large, noisy, flying insect (this is a fairly realistic scenario here in Africa) and, to add some trauma, suppose it’s night time. How does your brain tell where the danger is coming from?

Your brain has two sources of information available to it: you can both hear the insect and, to some extent, see it. But which one does it trust? Vision? But it’s dark and you can hardly see! Hearing? But this is hardly precise… The answer is BOTH!

We can model this mathematically using Bayes’ rule. Suppose your eyes are telling you that the insect is at point \mu_v, but you can’t be sure, so you say it’s at \mu_v give or take \sigma_v. On the other hand, it sounds like the insect is at position \mu_a, give or take \sigma_a.

Both the auditory and visual systems are giving your brain a probability distribution for where that sensory system detects the insect to be, given that there is an insect out there. In this case, let’s express these beliefs about where the insect is with Gaussian distributions¹, centred at the best-guess values and each with its respective variance.

So the auditory system is giving you a distribution for where the source of the noise is, given that it’s actually at X_a (which neither we, nor the insect, will ever truly know). Similarly, the visual system gives you a distribution over where the visually detected object is, given that it’s at X_v.

Of course, your wily brain has every reason to believe these two ‘objects’ are in fact the same vicious winged buzzing beast! So assume X_a=X_v=X. How do we now best infer what X really is? Or rather, what’s our best guess for the insect’s location?

Figure 1 gives a schematic idea of what’s going on: given that there is a vicious killer insect in the world at ‘true’ position X, the signals from our auditory and visual systems give us uncertain information (i.e. distributions) about where point X is. So we write:

p(A|X)=\mathcal{N}(\mu_a,\sigma_a^2) (distribution of insect’s position from auditory cue),
p(V|X)=\mathcal{N}(\mu_v,\sigma_v^2) (distribution of insect’s position from visual cue).

Where p(A|X) and p(V|X) are distributions over the auditory and visual source locations respectively. Here, \mathcal{N}(a,b) denotes a Gaussian distribution with mean a and variance b.
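
If you prefer to think in code, the two cue distributions might be set up like this. This is just a sketch: the numerical values are invented, and the use of scipy is my own choice for illustration:

```python
# The two cues as Gaussian densities over position X.
from scipy.stats import norm

mu_a, sigma_a = 1.2, 0.8  # auditory cue: broader, i.e. less precise
mu_v, sigma_v = 1.0, 0.3  # visual cue: narrower, i.e. more precise

p_A_given_X = norm(loc=mu_a, scale=sigma_a)  # p(A|X)
p_V_given_X = norm(loc=mu_v, scale=sigma_v)  # p(V|X)

# e.g. the density each cue assigns to the insect being at X = 1.1:
print(p_A_given_X.pdf(1.1), p_V_given_X.pdf(1.1))
```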


Figure 1: Killer insect location. If X marks the ‘true’ position of the signal source, audio and visual cues give slightly different distributions for where the source might be. Notice the auditory distribution is a bit broader, as our spatial precision on auditory stimuli is generally worse than that from visual stimuli.

What’s our best guess for where the insect is? Bayes’ rule tells us:

p(X|A,V)=\frac{p(A,V|X)p(X)}{p(A,V)}=\frac{p(A|X)p(V|X)p(X)}{p(A,V)}

Where we’ve assumed the independence of the auditory and visual cues². If we further assume that the prior, p(X), is Gaussian too \left(p(X)=\mathcal{N}(0,1)\right), we can compute this new distribution explicitly. Note we’re only interested in the factors with explicit X dependence, hence we write:

p(A|X)p(V|X)p(X)\propto\exp\left\{-\frac{1}{2}\left(\frac{(X-\mu_a)^2}{\sigma_a^2}+\frac{(X-\mu_v)^2}{\sigma_v^2}+X^2\right)\right\}
\propto\exp\left\{-\frac{1}{2}\left(X^2(1/\sigma_a^2+1/\sigma_v^2+1)-2(\mu_a/\sigma_a^2+\mu_v/\sigma_v^2)X\right)\right\},

where in the second line we’ve expanded the squares and dropped the terms with no X dependence.

Now, if we define \sigma^{-2}:=1/\sigma_a^2+1/\sigma_v^2+1, we can “complete the square” in X, noticing that \mu_v/\sigma_v^2+\mu_a/\sigma_a^2=\left(\frac{\mu_v}{1+\sigma_v^2/\sigma_a^2+\sigma_v^2}+\frac{\mu_a}{1+\sigma_a^2/\sigma_v^2+\sigma_a^2}\right)\sigma^{-2}. It then makes sense to define \mu:=\frac{\mu_v}{1+\sigma_v^2/\sigma_a^2+\sigma_v^2}+\frac{\mu_a}{1+\sigma_a^2/\sigma_v^2+\sigma_a^2}, so we can write

p(A|X)p(V|X)p(X)\propto\exp\left\{-\frac{1}{2}\frac{(X-\mu)^2}{\sigma^2}+\text{const.}\right\},

where “const.” collects the terms with no X dependence.

The reason we don’t care about the constant terms is the same reason we didn’t worry about the denominator p(A,V) earlier: we’re interested in obtaining a probability distribution over X, so any multiplicative constants are absorbed into the normaliser. More technically, a Gaussian distribution is fully defined by its sufficient statistics, the mean and variance, so that’s all we need.
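
If you don’t feel like checking the “complete the square” identity from a few paragraphs up by hand, sympy can do the algebra for us (this block only verifies that identity; the symbol names are mine):

```python
# Symbolic check of the identity used when completing the square.
import sympy as sp

mu_a, mu_v, s_a, s_v = sp.symbols('mu_a mu_v sigma_a sigma_v', positive=True)
prec = 1/s_a**2 + 1/s_v**2 + 1  # this is sigma^{-2}
lhs = mu_a/s_a**2 + mu_v/s_v**2
rhs = (mu_v/(1 + s_v**2/s_a**2 + s_v**2) + mu_a/(1 + s_a**2/s_v**2 + s_a**2)) * prec
print(sp.simplify(lhs - rhs))  # prints 0: the identity holds
```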

So our estimate for the position of the dreaded insect is given by a Gaussian distribution with mean \mu=\frac{\mu_a}{1+\sigma_a^2/\sigma_v^2+\sigma_a^2}+\frac{\mu_v}{1+\sigma_v^2/\sigma_a^2+\sigma_v^2} and variance \sigma^2=(1/\sigma_a^2+1/\sigma_v^2+1)^{-1}.
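
Here’s a minimal sketch of this result in Python, assuming the \mathcal{N}(0,1) prior from the text and invented cue values (combine_cues is a helper name I’ve made up), together with a brute-force numerical check of the closed-form answer:

```python
import numpy as np

def combine_cues(mu_a, var_a, mu_v, var_v):
    """Posterior mean and variance of X for two Gaussian cues and a N(0,1) prior."""
    precision = 1.0 / var_a + 1.0 / var_v + 1.0  # this is sigma^{-2}
    var = 1.0 / precision
    mu = (mu_a / var_a + mu_v / var_v) * var     # algebraically the same weighted sum as above
    return mu, var

# Invented example values: a broad auditory cue and a sharper visual one.
mu, var = combine_cues(mu_a=1.2, var_a=0.8**2, mu_v=1.0, var_v=0.3**2)

# Brute-force check: normalise p(A|X) p(V|X) p(X) on a fine grid
# and compare the numerical moments with the closed-form answer.
X = np.linspace(-5.0, 5.0, 200_001)
dx = X[1] - X[0]
gauss = lambda x, m, v: np.exp(-0.5 * (x - m) ** 2 / v)
post = gauss(X, 1.2, 0.8**2) * gauss(X, 1.0, 0.3**2) * gauss(X, 0.0, 1.0)
post /= post.sum() * dx
print(mu, (X * post).sum() * dx)               # the two means should agree
print(var, ((X - mu) ** 2 * post).sum() * dx)  # and the two variances
```

Working with precisions (inverse variances) makes the combination rule transparent: precisions simply add, and the means combine weighted by precision.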

It’s worth taking a moment to think about what these results are telling us. Consider the definition of \mu: it’s really just a weighted sum of the expected positions obtained from the audio and visual cues. In particular, if the audio cue were extremely uncertain (i.e. \sigma_a^2\to\infty), then \mu\to\frac{\mu_v}{1+\sigma_v^2}. This agrees with our intuition that, when one cue is very uncertain, our best estimate depends only on the more precise of the two. Similarly, in the same limit, \sigma^2\to(1/\sigma_v^2+1)^{-1}. Of course, if the audio signal were far more accurate (\sigma_a^2\ll\sigma_v^2<1), then we'd naturally ignore the uncertainty in the visual signal: \sigma^2\approx(1/\sigma_a^2)^{-1}=\sigma_a^2.
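
We can also check the first limit numerically (same invented values as before):

```python
# As the auditory cue becomes useless (sigma_a^2 -> infinity),
# mu should tend to mu_v / (1 + sigma_v^2).
mu_a, mu_v, var_v = 1.2, 1.0, 0.3**2
for var_a in (1.0, 100.0, 1e6):
    precision = 1.0 / var_a + 1.0 / var_v + 1.0
    mu = (mu_a / var_a + mu_v / var_v) / precision
    print(f"sigma_a^2 = {var_a:>9}: mu = {mu:.4f}")
print("limit mu_v/(1+sigma_v^2) =", round(mu_v / (1 + var_v), 4))
```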

So Bayesian inference is the natural way to integrate different beliefs (probabilities) about something, but what's that got to do with you? Well, there is a lot of experimental evidence that this is exactly what your brain does! In particular, Bayesian models like the simple one above don't just give the best prediction given the information; they also tell us how uncertain we should be. Experimentalists have found that similar models correctly predict how often people get spatial location tasks right or wrong, given the precision of the visual and tactile data they receive [1]! There are many other ways in which Bayesian inference appears in neural function, so I encourage the interested reader to follow the references below.

References

1. Pouget, A. et al., Probabilistic brains: knowns and unknowns, Nature Neuroscience, vol. 16, no. 9, Sept. 2013.
2. Doya, K. et al. (eds.), Bayesian Brain: Probabilistic Approaches to Neural Coding, MIT Press, 2007.

Footnotes

1. In reality, these variables should be three-dimensional, since your visual system locates points in three dimensions. For simplicity here, we’re sticking to one.
2. This is not a particularly realistic assumption, and much evidence indicates otherwise. But correlations can be taken into account in this framework.
