## Covid-19 tests: probabilities

Bayes’ Theorem is applied to medical tests, to calculate the probability of being infected with a virus, given a positive or negative test result. What drives the uncertainty is false negative results, or false positive results. In this article, I give a practical outline as to how one can interpret one’s test result, after calculating the relevant probability using Bayes’ Theorem.

To start off with, we need two estimates. For a negative covid-19 test, we need the rate of false negative results, and the current actual prevalence of the disease in the community. On the other hand, for a positive covid-19 test, we need the rate of false positives, and the current prevalence of the disease. False outcomes in tests vary according to the laboratory doing the test, and probably also the skill with which each individual test is carried out, but, for the sake of a rational understanding of the usefulness of these tests, we can use common statistics to calculate feasible probabilities.…
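As a rough illustration of the calculation described above, here is a minimal Python sketch of Bayes’ Theorem applied to a test result. The prevalence and error rates are made-up placeholder values, not figures from this article, and the function name `posterior_infected` is only for illustration.

```python
def posterior_infected(prevalence, sensitivity, specificity, positive):
    """Return P(infected | test result) using Bayes' Theorem."""
    if positive:
        true_pos = sensitivity * prevalence               # infected and tests positive
        false_pos = (1 - specificity) * (1 - prevalence)  # healthy but tests positive
        return true_pos / (true_pos + false_pos)
    false_neg = (1 - sensitivity) * prevalence            # infected but tests negative
    true_neg = specificity * (1 - prevalence)             # healthy and tests negative
    return false_neg / (false_neg + true_neg)


# Hypothetical inputs, chosen only to show the arithmetic.
prevalence = 0.05           # assumed share of the community currently infected
false_negative_rate = 0.20  # assumed
false_positive_rate = 0.01  # assumed

sensitivity = 1 - false_negative_rate
specificity = 1 - false_positive_rate

p_pos = posterior_infected(prevalence, sensitivity, specificity, positive=True)
p_neg = posterior_infected(prevalence, sensitivity, specificity, positive=False)

print(f"P(infected | positive test) = {p_pos:.3f}")
print(f"P(infected | negative test) = {p_neg:.3f}")
```

With these assumed inputs, a positive result leaves roughly an 81% chance of infection and a negative result roughly a 1% chance. Changing the prevalence shifts both numbers considerably, which is why the prevalence estimate matters as much as the error rates.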

By | January 1st, 2021|News|0 Comments

## A simple introduction to causal inference

Introduction

Causal inference is a branch of Statistics that is increasing in popularity. This is because it allows us to answer questions in a more direct way than other methods do. Usually, we can make inferences about the association or correlation between a variable and an outcome of interest, but these are often subject to outside influences and may not help us answer the questions in which we are most interested.

Causal inference seeks to remedy this by measuring the effect on the outcome (or response variable) that we see when we change another variable (the ‘treatment’). In a sense, we are looking to reproduce the situation that we have when we do a designed experiment (with a ‘treated’ and a ‘control’ group). The goal here is to have groups that are otherwise the same (with regard to factors that might influence the outcome) but where one is ‘treated’ and the other is not.…

## Correlation vs Mutual Information

This post is based on a (very small) part of the (dense and technical) paper Fooled by Correlation by N.N. Taleb, found at (1)

Notes on the main ideas in this post are available from Universidad de Cantabria, found at (2)

The aims of this post are 1) to introduce mutual information as a measure of similarity and 2) to show the nonlinear relationship between correlation and information by means of a relatively simple example.

Introduction

A significant part of Statistical analysis is understanding how random variables are related: how much knowledge of the value of one variable tells us about the value of another. This post will consider this issue in the context of Gaussian random variables. More specifically, we will compare correlation and mutual information, and discuss the relationship between them.

Mutual Information

The Mutual Information between 2 random variables is the amount of information that one gains about a random variable by observing the value of the other.…
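To give a feel for the nonlinear relationship mentioned in the aims, here is a small Python sketch. It uses the standard closed-form result for a bivariate Gaussian with correlation $\rho$, namely $I = -\tfrac{1}{2}\ln(1-\rho^2)$ nats; this formula and the function name are supplied here for illustration and are not quoted from the excerpt.

```python
import math


def gaussian_mutual_information(rho):
    """Mutual information (in nats) of a bivariate Gaussian with correlation rho."""
    if not -1 < rho < 1:
        raise ValueError("rho must lie strictly between -1 and 1")
    return -0.5 * math.log(1 - rho ** 2)


# Equal steps in correlation do not correspond to equal steps in information.
for rho in (0.0, 0.25, 0.5, 0.75, 0.9, 0.99):
    mi = gaussian_mutual_information(rho)
    print(f"rho = {rho:4.2f}  ->  I = {mi:.4f} nats")
```

Under this formula, moving from a correlation of 0.5 to 0.9 increases the mutual information more than fivefold, so correlations should not be read on a linear scale.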

## The Objective Function

In both Supervised and Unsupervised machine learning, most algorithms are centered around minimising (or, equivalently, maximising) some objective function. This function is supposed to somehow represent what the model knows/can get right. As one would expect, the objective function does not always reflect exactly what we want.

The objective function presents 2 main problems: 1) how do we minimise it (the answer to this is up for debate and there is lots of interesting research about efficient optimisation of non-convex functions) and 2) assuming we can minimise it perfectly, is it the correct thing to be minimising?

It is point 2 which is the focus of this post.

Let’s take the example of square-loss linear regression: we train a linear regression model with a square loss $\mathcal{L}(\mathbf{w})=\sum_i (y_i - \mathbf{w}^Tx_i)^2$, where we take the inner product of the learned weights with a vector of features for each observation to predict the outcome.…
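As a concrete, hedged sketch of the loss above, the Python below evaluates $\mathcal{L}(\mathbf{w})$ and minimises it for two-dimensional feature vectors by solving the normal equations by hand. The data points and the helper name `fit_two_features` are invented for illustration.

```python
def square_loss(w, xs, ys):
    """L(w) = sum_i (y_i - w^T x_i)^2."""
    return sum(
        (y - sum(wj * xj for wj, xj in zip(w, x))) ** 2
        for x, y in zip(xs, ys)
    )


def fit_two_features(xs, ys):
    """Minimise the square loss for 2-dimensional x_i via the normal equations."""
    a = sum(x[0] * x[0] for x in xs)
    b = sum(x[0] * x[1] for x in xs)
    d = sum(x[1] * x[1] for x in xs)
    r0 = sum(x[0] * y for x, y in zip(xs, ys))
    r1 = sum(x[1] * y for x, y in zip(xs, ys))
    det = a * d - b * b  # determinant of X^T X
    return ((d * r0 - b * r1) / det, (a * r1 - b * r0) / det)


# Hypothetical data: the first feature is a constant 1 (an intercept term).
xs = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0), (1.0, 3.0)]
ys = [1.1, 2.9, 5.2, 6.8]

w = fit_two_features(xs, ys)
print("fitted weights:", w)
print("loss at fitted weights:", square_loss(w, xs, ys))
```

The fitted weights are the exact minimiser of this particular loss. Whether that loss is the right thing to minimise is a separate question, which is point 2 above.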

Introduction

A key consideration when analysing stratified data is how the behaviour of each category differs and how these differences might influence the overall observations about the data. For example, a data set might be split so that one large category dictates the overall behaviour, or there may be a category whose statistics differ significantly from those of the other categories and so skew the overall numbers. It is important to be aware of these features of the data, and to look for them, to prevent drawing erroneous conclusions from your analysis. Context, the source of the data and a careful analysis of the data can prevent this. Simpson’s paradox is an interesting result of some of these effects.

Simpson’s paradox occurs in statistics when a trend appears in each of a number of different groups but either disappears or reverses in the overall data.

Observing the overall data might therefore lead us to draw a conclusion, but when the data is grouped we might conclude something different.…
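A small numerical sketch may make the reversal concrete. The Python below uses illustrative counts (not data from this post) for two treatments, A and B, applied in two groups: A does better within each group, yet B does better once the groups are pooled.

```python
# (successes, patients) for each group and treatment. Illustrative counts only.
counts = {
    "group 1": {"A": (81, 87), "B": (234, 270)},
    "group 2": {"A": (192, 263), "B": (55, 80)},
}


def rate(successes, trials):
    return successes / trials


# Within each group, treatment A has the higher success rate.
for group, by_treatment in counts.items():
    for treatment, (s, n) in by_treatment.items():
        print(f"{group}, treatment {treatment}: {rate(s, n):.3f}")

# Pooling the groups reverses the comparison.
overall = {}
for treatment in ("A", "B"):
    s = sum(counts[g][treatment][0] for g in counts)
    n = sum(counts[g][treatment][1] for g in counts)
    overall[treatment] = rate(s, n)
    print(f"overall, treatment {treatment}: {overall[treatment]:.3f}")
```

The reversal arises because the treatments are spread unevenly across the groups: in these counts A is used mostly in the group with the lower success rates, which drags its pooled rate down.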

## The (Central) Cauchy distribution

The core of this post comes from Mathematical Statistics and Data Analysis by John A. Rice which is a useful resource for subjects such as UCT’s STA2004F.

Introduction

The Cauchy distribution has a number of interesting properties and is considered a pathological (badly behaved) distribution. What is interesting about it is that we can think about it in a number of different ways*, and we can formulate the probability density function in each of these ways. This post will handle the derivation of the Cauchy distribution as a ratio of independent standard normals and as a special case of the Student’s t distribution.

Like the normal and t-distributions, the standard form is centred on, and symmetric about, 0. But unlike these distributions, it is known for its very heavy (fat) tails. Whereas you are unlikely to see values that are significantly larger or smaller than 0 coming from a normal distribution, this is just not the case when it comes to the Cauchy distribution.…
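The heavy tails can be checked with a quick simulation. The sketch below, in plain Python with an arbitrary seed and sample size, draws ratios of independent standard normals and compares how often they fall far from 0 with how often a standard normal does. The theoretical tail probability uses the standard Cauchy result $P(|X| > c) = 1 - \tfrac{2}{\pi}\arctan(c)$.

```python
import math
import random

rng = random.Random(0)  # arbitrary seed, for reproducibility only
n = 100_000

# A ratio of two independent standard normals follows a standard Cauchy distribution.
# (A denominator of exactly 0.0 is so unlikely that it is ignored in this sketch.)
ratios = [rng.gauss(0, 1) / rng.gauss(0, 1) for _ in range(n)]
normals = [rng.gauss(0, 1) for _ in range(n)]


def frac_beyond(sample, c):
    """Share of the sample with absolute value greater than c."""
    return sum(abs(v) > c for v in sample) / len(sample)


def cauchy_tail(c):
    """Theoretical P(|X| > c) for a standard Cauchy variable."""
    return 1 - (2 / math.pi) * math.atan(c)


for c in (1, 5, 20):
    print(
        f"|x| > {c:2d}: ratio of normals {frac_beyond(ratios, c):.4f}, "
        f"Cauchy theory {cauchy_tail(c):.4f}, "
        f"normal sample {frac_beyond(normals, c):.4f}"
    )
```

About one draw in eight from the ratio lands more than 5 units from 0, whereas a standard normal sample of this size is unlikely to produce even one such value.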


## p-values (part 3): meta distribution of p-values

Introduction

So far we have discussed what p-values are and how they are calculated, as well as how bad experiments can lead to artificially small p-values. The next thing that we will look at comes from a paper by N.N. Taleb (1), in which he derives the meta-distribution of p-values i.e. what ranges of p-values we might expect if we repeatedly did an experiment where we sampled from the same underlying distribution.

The derivations are pretty in depth and this content and the implications of the results are pretty new to me, so any discrepancies/misinterpretations found should be pointed out and/or discussed.

Thankfully, in this video (2) there is an explanation that covers some of what the paper says as well as some Monte-Carlo simulations. My discussion will focus on some simulations of my own that are based on those that are done in the video.

We have already discussed what p-values mean and how they can go wrong.…
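To illustrate what repeating an experiment does to the p-value, here is a minimal Monte Carlo sketch in Python. It is not the derivation from the paper nor the simulation from the video: it assumes a simple two-sided z-test with known standard deviation, and the true mean, sample size and seed are arbitrary choices.

```python
import math
import random
import statistics


def z_test_p_value(sample, sigma=1.0):
    """Two-sided p-value for H0: mean = 0, with known standard deviation sigma."""
    z = statistics.fmean(sample) / (sigma / math.sqrt(len(sample)))
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_p_values(true_mean, n, repeats, rng):
    """Repeat the same experiment many times and collect the p-values."""
    return [
        z_test_p_value([rng.gauss(true_mean, 1.0) for _ in range(n)])
        for _ in range(repeats)
    ]


rng = random.Random(1)  # arbitrary seed
p_values = simulate_p_values(true_mean=0.3, n=30, repeats=5000, rng=rng)

deciles = statistics.quantiles(p_values, n=10)
share_significant = sum(p < 0.05 for p in p_values) / len(p_values)

print("deciles of the p-value:", [round(d, 3) for d in deciles])
print(f"share of experiments with p < 0.05: {share_significant:.3f}")
```

Even though every experiment samples from the same underlying distribution, the p-values range from well below 0.01 to above 0.5, so a single p-value says little about what a replication would show.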

## p-values (part 2): p-Hacking. Why drinking red wine is not the same as exercising

What is p-hacking?

You might have heard about a reproducibility problem with scientific studies. Or you might have heard that drinking a glass of red wine every evening is equivalent to an hour’s worth of exercise.

Part of the reason that you might have heard about these things is p-hacking: ‘torturing the data until it confesses’. The reason for doing this is mostly pressure on researchers to find positive results (as these are more likely to be published) but it may also arise from misapplication of Statistical procedures or bad experimental design.

Some of the content here is based on a more serious video from Veritasium: https://www.youtube.com/watch?v=42QuXLucH3Q. John Oliver has also spoken about this on Last Week Tonight, for those who are interested in some more examples of science that makes its way onto morning talk shows.

p-hacking can be done in a number of ways: basically anything that is done, either consciously or unconsciously, to produce statistically significant results where there aren’t any.…
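One common route, used here purely as an assumed example, is to measure many outcomes and report only the smallest p-value. The Python sketch below computes how often at least one of `k` independent tests comes out ‘significant’ when there is no real effect at all, and checks the formula by simulation. It relies on the fact that a valid p-value is uniformly distributed on [0, 1] when the null hypothesis is true.

```python
import random


def chance_of_false_positive(k, alpha=0.05):
    """P(at least one of k independent null tests has p < alpha)."""
    return 1 - (1 - alpha) ** k


def simulate_false_positive_rate(k, alpha, repeats, rng):
    """Monte Carlo estimate of the same probability."""
    hits = 0
    for _ in range(repeats):
        # Under a true null hypothesis, each p-value is uniform on [0, 1].
        p_values = [rng.random() for _ in range(k)]
        if min(p_values) < alpha:
            hits += 1
    return hits / repeats


rng = random.Random(2)  # arbitrary seed
for k in (1, 5, 20):
    exact = chance_of_false_positive(k)
    estimate = simulate_false_positive_rate(k, 0.05, 20_000, rng)
    print(f"{k:2d} outcomes tested: exact {exact:.3f}, simulated {estimate:.3f}")
```

With 20 outcomes and no real effects, the chance of finding at least one ‘significant’ result at the 5% level is about 64%.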

## A quick argument for why we don’t accept the null hypothesis

Introduction

When doing hypothesis testing, an often-repeated rule is ‘never accept the null hypothesis’. The reason for this is that we aren’t making probability statements about true underlying quantities; rather, we are making statements about the observed data, given a hypothesis.

We reject the null hypothesis if the observed data is unlikely to be observed given the null hypothesis. In a sense we are trying to disprove the null hypothesis and the strongest thing we can say about it is that we fail to reject the null hypothesis.

That is because observing data that is not unlikely given that a hypothesis is true does not make that hypothesis true. That is a bit of a mouthful, but basically what we are saying is that if we make some claim about the world and then we see some data that does not disprove this claim, we cannot conclude that the claim is true.…

## p-values: an introduction (Part 1)

The starting point

This is the first of (at least) 3 posts on p-values. p-values are everywhere in statistics, especially in fields that require experimental design.

They are also pretty tricky to get your head around at first. This is because of the nature of classical (frequentist) statistics. So to motivate this I am going to talk about a non-statistical situation that will hopefully give some intuition about how to think when interpreting p-values and doing hypothesis testing.

My New Car

I want to buy a car. So I go down to the second-hand car dealership to get one. I walk around a bit until I find one that I like.

I think to myself: ‘this is a good car’.

Now because I am at a second-hand car dealership I find it appropriate to gather some data. So I chat to the lady there (looks like a bit of a scammer, but I am here for a deal) about the car.…