Introduction

In this post we introduce two important concepts in multivariate calculus: the gradient vector and the directional derivative. Both extend the idea of the derivative of a function of one variable, each in a different way. The aim of this post is to clarify what these concepts are and how they differ, and to show that the directional derivative is maximised in the direction of the gradient vector.

The gradient vector is simply a vector of partial derivatives. To find it, we can 1) find the partial derivatives and 2) put them into a vector. So far so good. Let’s start on some familiar territory: a function of 2 variables.

That is, let $f: \mathbb{R}^2 \rightarrow \mathbb{R}$ be a function of 2 variables, $x$ and $y$. Then the gradient vector can be written as: $\nabla f(x,y) = \left [ {\begin{array}{c} \frac{\partial f(x,y)}{\partial x} \\ \frac{\partial f(x,y)}{\partial y} \\ \end{array} } \right]$

For a more tangible example, let $f(x,y) = x^2 + 2xy$, then: $\nabla f(x,y) = \left [ {\begin{array}{c} 2x + 2y \\ 2x \\ \end{array} } \right]$
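As a quick sanity check on the example above, here is a minimal Python sketch that compares the analytic gradient with a finite-difference estimate, and then scans unit directions to see where the directional derivative is largest. The evaluation point $(1, 2)$, the step size `h` and the helper names are our own choices for illustration, not part of the original derivation.

```python
import math

def f(x, y):
    # The example function from the text: f(x, y) = x^2 + 2xy
    return x**2 + 2 * x * y

def grad_f(x, y):
    # Analytic gradient from the text: (2x + 2y, 2x)
    return (2 * x + 2 * y, 2 * x)

def numerical_gradient(func, x, y, h=1e-6):
    # Central differences approximate each partial derivative
    dfdx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    dfdy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return (dfdx, dfdy)

def directional_derivative(func, x, y, angle, h=1e-6):
    # Rate of change of func along the unit vector (cos(angle), sin(angle))
    ux, uy = math.cos(angle), math.sin(angle)
    return (func(x + h * ux, y + h * uy) - func(x - h * ux, y - h * uy)) / (2 * h)

point = (1.0, 2.0)  # arbitrary point chosen for illustration
analytic = grad_f(*point)
numeric = numerical_gradient(f, *point)

# Scan 3600 evenly spaced directions and keep the steepest one
angles = [2 * math.pi * k / 3600 for k in range(3600)]
best_angle = max(angles, key=lambda a: directional_derivative(f, *point, a))
gradient_angle = math.atan2(analytic[1], analytic[0])

print("analytic gradient: ", analytic)
print("numerical gradient:", numeric)
print("steepest direction (radians):", best_angle)
print("gradient direction (radians):", gradient_angle)
```

The two gradients should agree up to rounding, and the steepest scanned direction should sit within one grid step of the direction of the gradient.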

So far, so good. Now we can generalise this to a function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ taking in a vector $\mathbf{x} = (x_1, x_2, x_3, \dots, x_n)$.…


## The (Central) Cauchy distribution

The core of this post comes from Mathematical Statistics and Data Analysis by John A. Rice which is a useful resource for subjects such as UCT’s STA2004F.

Introduction

The Cauchy distribution has a number of interesting properties and is considered a pathological (badly behaved) distribution. What makes it interesting is that we can think about it in a number of different ways*, and we can derive its probability density function from each of them. This post will handle the derivation of the Cauchy distribution as a ratio of independent standard normals and as a special case of the Student’s t distribution.

Like the normal and t-distributions, the standard form is centred on, and symmetric about, 0. But unlike these distributions, it is known for its very heavy (fat) tails. Whereas you are unlikely to see values that are significantly larger or smaller than 0 coming from a standard normal distribution, this is just not the case when it comes to the Cauchy distribution.…
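To get a feel for the heavy tails, here is a rough simulation sketch. It draws standard Cauchy values as the ratio of two independent standard normals (the construction mentioned above) and compares how often the draws land further than 3 from zero. The sample size, the threshold of 3 and the seed are arbitrary choices for illustration.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

def cauchy_from_normals():
    # Ratio of two independent standard normals
    # (an exactly-zero denominator is possible in principle but vanishingly unlikely)
    return random.gauss(0, 1) / random.gauss(0, 1)

n = 100_000
cauchy_draws = [cauchy_from_normals() for _ in range(n)]
normal_draws = [random.gauss(0, 1) for _ in range(n)]

def tail_fraction(draws, threshold):
    # Share of draws whose absolute value exceeds the threshold
    return sum(abs(v) > threshold for v in draws) / len(draws)

def cauchy_tail(threshold):
    # P(|X| > t) for a standard Cauchy, using the CDF F(x) = 1/2 + arctan(x)/pi
    return 1 - 2 * math.atan(threshold) / math.pi

print("simulated P(|X| > 3), ratio of normals:", tail_fraction(cauchy_draws, 3))
print("theoretical P(|X| > 3), standard Cauchy:", cauchy_tail(3))
print("simulated P(|Z| > 3), standard normal:", tail_fraction(normal_draws, 3))
```

Roughly a fifth of the Cauchy draws should fall beyond 3 in absolute value, compared with well under 1% of the normal draws.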


## K-means: Intuitions, Maths and Percy Tau

Much of this content is based on lecture slides from Professor David Barber at University College London; related resources can be found at: www.cs.ucl.ac.uk/staff/D.Barber/brml

The K-means algorithm

The K-means algorithm is one of the simplest unsupervised learning* algorithms. The aim of the K-means algorithm is, given a set of observations $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$, to group these observations into K different groups in the best way possible (‘best way’ here refers to minimising a loss/cost/objective function).

This is a clustering algorithm, where we want to assign each observation to a group that has other similar observations in it. This could be useful, for example, to split Facebook users into groups that will each be shown a different advertisement.

* unsupervised learning is performed on data without labels, i.e. we have a group of data points $x_1, \dots, x_n$ (scalar or vector) and we want to find something out about how this data is structured.…
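To make the idea concrete, here is a minimal sketch of K-means in plain Python. The excerpt above does not spell out the objective or the update steps, so this assumes the usual choices: the sum of squared Euclidean distances to the cluster centres as the loss, and the standard alternating scheme (often called Lloyd's algorithm) of assigning points to the nearest centre and then moving each centre to the mean of its points. The function names and the toy data are made up for illustration.

```python
import random

def squared_distance(a, b):
    # Squared Euclidean distance between two points given as tuples
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def mean_point(points):
    # Coordinate-wise mean of a non-empty list of points
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, k, iterations=100, seed=0):
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # start from k distinct observations
    assignments = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centres[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # nothing changed, so the algorithm has converged
        assignments = new_assignments
        # Update step: each centre moves to the mean of its members
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # leave an empty cluster's centre where it is
                centres[j] = mean_point(members)
    return centres, assignments

def objective(points, centres, assignments):
    # Assumed loss: total squared distance from each point to its centre
    return sum(squared_distance(p, centres[a]) for p, a in zip(points, assignments))

# Toy data: two well-separated groups of three points each
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centres, assignments = k_means(points, k=2)
print("centres:", centres)
print("assignments:", assignments)
print("objective:", objective(points, centres, assignments))
```

On data this cleanly separated the two groups are recovered, but in general the result depends on the starting centres and the algorithm is only guaranteed to reach a local minimum of the loss.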

September 7th, 2019 | English

## What’s the shortest known Normal Number?

Well, the answer is that it has to be infinitely long; the real question is what the most compact form of a Normal Number is.

I was motivated to look into this from a lovely Numberphile video about all the real numbers.

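Normal numbers in base 10 are, for the purposes of this post, those for which you can find every natural number somewhere in the base 10 decimal expansion. (Strictly, normality also requires every string of digits of a given length to appear with the same limiting frequency; the property above is the weaker one that the construction in this post relies on.)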

Champernowne’s number is a very simple example of this where it is simply written as:

0.12345678910111213…etc.

I thought that it might be interesting to see if one could write a more compact Normal Number, but using a similar procedure to Champernowne. I haven’t seen this done anywhere else. For example, in the above expression, you don’t need to include the 12 explicitly as it’s already there at the beginning. You could write

0.12345678910113

So you skip the 12, and 11 and 13 together become 113. We will do all of this just with the list of digits, rather than the number in base 10.…
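Here is a small sketch of one plausible reading of this procedure, working purely on strings of digits. For each natural number in turn, we skip it if its digits already appear somewhere in the string so far; otherwise we append it, reusing any digits at the end of the string that match the start of the number. The function names are invented for illustration, and the full post may well refine the rule further.

```python
def compact_digits(limit):
    # Build the digit string for 1..limit, skipping numbers that already
    # appear and overlapping each new number with the end of the string
    digits = ""
    for n in range(1, limit + 1):
        token = str(n)
        if token in digits:
            continue  # e.g. 12 is already present at the very start
        overlap = 0
        # Look for the longest proper prefix of token that ends the string
        for k in range(len(token) - 1, 0, -1):
            if digits.endswith(token[:k]):
                overlap = k
                break
        digits += token[overlap:]  # e.g. ...11 followed by 13 becomes ...113
    return digits

def champernowne_digits(limit):
    # Plain concatenation of 1..limit, as in Champernowne's number
    return "".join(str(n) for n in range(1, limit + 1))

print("0." + compact_digits(13))  # 0.12345678910113, matching the example above
print(len(compact_digits(1000)), "digits versus", len(champernowne_digits(1000)))
```

By construction every number up to `limit` still appears somewhere in the compact string, which can be checked directly with `all(str(n) in compact_digits(200) for n in range(1, 201))`.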


## p-values (part 3): meta distribution of p-values

Introduction

So far we have discussed what p-values are and how they are calculated, as well as how bad experiments can lead to artificially small p-values. The next thing that we will look at comes from a paper by N.N. Taleb (1), in which he derives the meta-distribution of p-values i.e. what ranges of p-values we might expect if we repeatedly did an experiment where we sampled from the same underlying distribution.

The derivations are quite in-depth, and both this content and the implications of the results are fairly new to me, so please point out or discuss any discrepancies or misinterpretations you find.

Thankfully, in this video (2) there is an explanation that covers some of what the paper says as well as some Monte-Carlo simulations. My discussion will focus on some simulations of my own that are based on those that are done in the video.
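To give a flavour of what such a simulation looks like, here is a minimal sketch. It is not the simulation from the video or the derivation from the paper: it simply repeats the same experiment many times and looks at how much the p-value moves around. The setup is assumed for illustration: normally distributed data with known standard deviation 1, a one-sided z-test of "mean = 0" against "mean > 0", 30 observations per experiment and a true mean of 0.3.

```python
import math
import random

def one_sided_p_value(sample, sigma=1.0):
    # z-test of H0: mean = 0 against H1: mean > 0, with known sigma
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))  # P(Z > z) for a standard normal

def simulate_p_values(true_mean, n_samples, n_experiments, seed=0):
    # Repeat the same experiment many times and record each p-value
    rng = random.Random(seed)
    return [
        one_sided_p_value([rng.gauss(true_mean, 1.0) for _ in range(n_samples)])
        for _ in range(n_experiments)
    ]

def fraction_below(p_values, alpha=0.05):
    return sum(p < alpha for p in p_values) / len(p_values)

null_p = simulate_p_values(true_mean=0.0, n_samples=30, n_experiments=5000)
effect_p = simulate_p_values(true_mean=0.3, n_samples=30, n_experiments=5000)

print("no real effect:  share of p < 0.05 =", fraction_below(null_p))
print("real effect 0.3: share of p < 0.05 =", fraction_below(effect_p))
ordered = sorted(effect_p)
print("real effect 0.3: 5th, 50th, 95th percentile of p =",
      ordered[250], ordered[2500], ordered[4750])
```

With no real effect the p-values are spread uniformly, so about 5% fall below 0.05. With this particular effect size only about half of the experiments do, even though the underlying distribution never changes, which shows how widely a single p-value can vary from one run to the next.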

We have already discussed what p-values mean and how they can go wrong.…

## p-values (part 2): p-hacking. Why drinking red wine is not the same as exercising

What is p-hacking?

You might have heard about a reproducibility problem with scientific studies. Or you might have heard that drinking a glass of red wine every evening is equivalent to an hour’s worth of exercise.

Part of the reason that you might have heard about these things is p-hacking: ‘torturing the data until it confesses’. The reason for doing this is mostly pressure on researchers to find positive results (as these are more likely to be published), but it may also arise from misapplication of statistical procedures or bad experimental design.

Some of the content here is based on a more serious video from Veritasium: https://www.youtube.com/watch?v=42QuXLucH3Q. John Oliver has also spoken about this on Last Week Tonight, for those who are interested in some more examples of science that makes its way onto morning talk shows.

p-hacking can be done in a number of ways: basically anything that is done, either consciously or unconsciously, to produce statistically significant results where there aren’t any.…
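As a rough illustration of one common form (our choice of example, since the excerpt above is cut off before listing specific methods), suppose a study measures 20 unrelated outcomes, none of which has any real effect, and reports whichever one comes out 'significant'. The sketch below simulates this with a two-sided z-test on pure noise; the numbers of outcomes, observations and studies are arbitrary.

```python
import math
import random

def two_sided_p_value(sample, sigma=1.0):
    # z-test of H0: mean = 0 with known sigma, two-sided
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n) / sigma
    return math.erfc(abs(z) / math.sqrt(2))  # P(|Z| > |z|)

def study_finds_something(rng, n_outcomes=20, n_samples=20, alpha=0.05):
    # Every outcome is pure noise, so any 'significant' result is a false positive
    p_values = [
        two_sided_p_value([rng.gauss(0.0, 1.0) for _ in range(n_samples)])
        for _ in range(n_outcomes)
    ]
    return min(p_values) < alpha

rng = random.Random(0)
n_studies = 2000
hits = sum(study_finds_something(rng) for _ in range(n_studies))
false_positive_rate = hits / n_studies
expected_rate = 1 - 0.95 ** 20  # chance that at least one of 20 independent tests has p < 0.05

print("share of studies with at least one p < 0.05:", false_positive_rate)
print("expected for 20 independent tests:", expected_rate)
```

Although each individual test has only a 5% false positive rate, roughly 64% of these noise-only studies end up with something to report.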