## The Wisdom of the Crowds

This content comes primarily from the notes of Mark Herbster (contributed to by Massi Pontil and John Shawe-Taylor) of University College London.

Introduction

The Wisdom of the Crowds, majority rule, and related ideas come up often. Democracy is based (partly) on the majority of people being able to make the correct decision; in a group of friends you often decide based on what most people want; and it is logical to take popular opinion into account when reasoning about issues where you have imperfect information. On the other hand, there is the Argumentum ad Populum fallacy: the mistake of concluding that a belief is true merely because it is popular.

This idea also appears in Applied Machine Learning: ensemble methods such as Random Forests, Gradient Boosted Models (especially XGBoost) and stacking of Neural Networks have resulted in overall more powerful models. This is especially notable in Kaggle competitions, where it is almost always an ensemble model (a combination of models) that achieves the best score.…
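As a rough illustration of why combining many imperfect decision makers can help, here is a minimal sketch that computes the probability that a strict majority of voters is correct. It assumes every voter is independent and correct with the same probability `p`; both assumptions are simplifications introduced here, and the function name `majority_correct_prob` is purely illustrative.

```python
from math import comb


def majority_correct_prob(n_voters: int, p: float) -> float:
    """Probability that a strict majority of n independent voters is correct.

    Assumes each voter is correct with the same probability p and that
    votes are independent (a simplifying assumption for this sketch).
    """
    if n_voters % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    needed = n_voters // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_voters, k) * p**k * (1 - p) ** (n_voters - k)
        for k in range(needed, n_voters + 1)
    )


# Each voter is only slightly better than a coin flip.
for n in (1, 11, 101):
    print(n, majority_correct_prob(n, 0.6))
```

With `p` above one half, the printed probability grows with the number of voters. Real models in an ensemble are rarely independent, so this should be read as an idealised picture.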

## Automatic Differentiation

Much of this content is based on lecture slides from Professor David Barber at University College London. Resources relating to this can be found at www.cs.ucl.ac.uk/staff/D.Barber/brml

What is Autodiff?

Autodiff, or Automatic Differentiation, is a method of determining the exact derivative of a function with respect to its inputs. It is widely used in machine learning. In this post I will give an overview of what autodiff is and why it is a useful tool.

The above is not a very helpful definition, so we can compare autodiff first to symbolic differentiation and numerical approximations before going into how it works.

Symbolic differentiation is what we do when we calculate derivatives by hand, i.e. given a function $f$, we find a new function $f'$. This is really good when we want to know how a function behaves across all inputs. For example, if we had $f(x) = x^2 + 3x + 1$, we can find the derivative as $f'(x) = 2x + 3$ and then evaluate it at any value of $x$.…
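To make the comparison concrete, the sketch below evaluates the hand-derived $f'(x) = 2x + 3$ next to a numerical approximation. The central-difference formula and the step size `h` are choices made for this illustration; this is not autodiff itself, only the two baselines it is being compared against.

```python
def f(x: float) -> float:
    """The example function f(x) = x^2 + 3x + 1."""
    return x**2 + 3 * x + 1


def f_prime(x: float) -> float:
    """Derivative found symbolically (by hand): f'(x) = 2x + 3."""
    return 2 * x + 3


def central_difference(func, x: float, h: float = 1e-5) -> float:
    """Numerical approximation of the derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)


for x in (-2.0, 0.0, 1.5):
    print(x, f_prime(x), central_difference(f, x))
```

The symbolic derivative is a function we can evaluate anywhere, while the numerical version only gives an approximate value at one point at a time and depends on the choice of `h`.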


## What did you expect? Some notes on the Expectation operator.

Introduction

A significant amount of focus in statistics is on making inferences about the averages or means of phenomena. For example, we might be interested in the average number of goals scored per game by a football team, the average global temperature, or the average cost of a house in a particular area.

The two types of averages that we usually focus on are the sample mean from a set of data and the expectation that comes from a probability distribution. For example, if three men weigh 70kg, 80kg, and 90kg respectively, then the sample mean of their weights is $\bar x = \frac{70+80+90}{3} = 80$. Alternatively, if we say that the arrival times of trains are exponentially distributed with parameter $\lambda = 3$, we can use the properties of the exponential distribution to find the mean (or expectation). In this case the mean is $\mu = \frac{1}{\lambda} = \frac{1}{3}$.

It is this second kind of mean (which we will call the expectation from now on), along with the generalisation of taking the expectation of functions of random variables that we will focus on.…
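The two kinds of mean described above can be checked numerically. The sketch below computes the sample mean of the three weights directly, and approximates the expectation of the exponential distribution by averaging simulated draws. The seed and the number of draws are arbitrary choices for this illustration.

```python
import random

# Sample mean: computed from observed data.
weights_kg = [70, 80, 90]
sample_mean = sum(weights_kg) / len(weights_kg)

# Expectation: a property of the distribution, here Exponential(lambda = 3).
lam = 3.0
theoretical_mean = 1 / lam

# Approximate the expectation by averaging many simulated draws.
rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [rng.expovariate(lam) for _ in range(200_000)]
simulated_mean = sum(draws) / len(draws)

print(sample_mean)
print(theoretical_mean, simulated_mean)
```

The simulated average is itself a sample mean, which is why it lands close to, but not exactly on, the expectation $\frac{1}{3}$.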

## What’s the shortest known Normal Number?

Well, the answer is that it has to be infinitely long, but the question is what is the most compact form of a Normal Number possible.

I was motivated to look into this from a lovely Numberphile video about all the real numbers.

Normal numbers in base 10 are those for which, in the base 10 decimal expansion, you can find every natural number.

Champernowne’s number is a very simple example of this. It is written as:

0.12345678910111213…etc.

I thought that it might be interesting to see if one could write a more compact Normal Number, but using a similar procedure to Champernowne. I haven’t seen this done anywhere else. For example, in the above expression, you don’t need to include the 12 explicitly as it’s already there at the beginning. You could write

0.12345678910113

So you skip the 12, and 11 and 13 together become 113. We will do all of this just with the list of digits, rather than the number in base 10.…
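One way to read the procedure described above is as a greedy rule: go through the natural numbers in order, skip any number whose digits already appear, and otherwise append only the digits that are not covered by an overlap with the end of the string. The sketch below is an assumed reconstruction of that idea, not necessarily the exact procedure used later in the post; the function name `compact_digits` is illustrative.

```python
def compact_digits(limit: int) -> str:
    """Greedily build a digit string containing every number from 1 to limit."""
    digits = ""
    for n in range(1, limit + 1):
        token = str(n)
        if token in digits:
            continue  # already present somewhere, e.g. 12 inside "123..."
        # Find the longest prefix of token that the string already ends with.
        overlap = 0
        for k in range(len(token) - 1, 0, -1):
            if digits.endswith(token[:k]):
                overlap = k
                break
        digits += token[overlap:]  # append only the missing digits
    return digits


print("0." + compact_digits(13))  # prints 0.12345678910113
```

For numbers up to 13 this reproduces the string given in the text: 12 is skipped, and 13 reuses the final 1 of 11.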

## A quick argument for why we don’t accept the null hypothesis

Introduction

When doing hypothesis testing, an often-repeated rule is ‘never accept the null hypothesis’. The reason for this is that we aren’t making probability statements about true underlying quantities, rather we are making statements about the observed data, given a hypothesis.

We reject the null hypothesis if the observed data is unlikely to be observed given the null hypothesis. In a sense we are trying to disprove the null hypothesis and the strongest thing we can say about it is that we fail to reject the null hypothesis.

That is because observing data that is not unlikely given that a hypothesis is true does not make that hypothesis true. That is a bit of a mouthful, but basically what we are saying is that if we make some claim about the world and then we see some data that does not disprove this claim, we cannot conclude that the claim is true.…

## Cantor–Schröder–Bernstein Theorem

Knowledge this post assumes: what a set is, set cardinality, a function, the image of a function, and an injective (one-to-one) function.

David Hilbert imagines a hotel with an infinite number of rooms. In this hotel, each room can only be occupied by one guest, and each room is indeed occupied by exactly one guest. What happens if more guests show up? Can they be accommodated?

PAUSE: WHAT DO YOU THINK AND WHY?

Suppose we propose they cannot be accommodated, since all the rooms are occupied. Hilbert then claims that he can define the functions $f:A \to B$ and $g:B \to C,$ where $A$ is the set of all current guests, $f$ maps each guest to a room in the set $B$, and $g$ maps each room in $B$ to a new one in $C$. Notice that these functions must be injective: each room holds only one guest, so if two guests are assigned the same room, they must be the same guest; recall that injectivity means $f(a) = f(b) \implies a = b$.…

## 1.6 Partitions

Recall the relation $\equiv \text{ mod} (4)$ on the set $\mathbb{Z}.$

One of the equivalence classes is $[0] = \{ ..., -8, -4, 0, 4, 8, ...\},$ which can equally be written as $[0] = [4] = [-4] = [8] = [-8] = ...$

We can do this because the equivalence class collects all the integers that are related to zero under the relation $\equiv \text{ mod} (4).$

The following theorem generalises this idea from the relation $\equiv \text{ mod} (n)$ on the set $\mathbb{Z}$ to any equivalence relation on any set.

Let $R$ be an equivalence relation on set $A.$ If $a, b \in A,$ then $[a] = [b] \iff aRb.$

Essentially, the equivalence classes $[a]$ and $[b]$ are equal if the elements $a, b \in A$ are related under the relation $R.$ Conversely, if the equivalence classes $[a]$ and $[b]$ are equal, then the elements $a$ and $b$ are related under $R.$

More generally, an equivalence relation on a set $A$ divides $A$ into equivalence classes; for example, $\equiv \text{ mod} (n)$ divides $\mathbb{Z}$ into $n$ equivalence classes. We call this situation a partition of set $A.$

A partition of a set $A$ is defined as a set of non-empty subsets of $A,$ such that both these conditions are simultaneously satisfied:

(i) the union of all these subsets equals $A.$

(ii) the intersection of any two different subsets is the empty set $\emptyset.$

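Let’s return to our example: $\equiv \text{ mod} (4)$ on the set $\mathbb{Z}.$ We could represent this set as:

[Figure: the set drawn as four regions, one for each of the equivalence classes $[0], [1], [2], [3]$]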

• NOTE: Each equivalence class above represents an infinite set and despite the drawing suggesting $[0]$ is larger than $[3]$ for instance, this is not true.
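The two partition conditions and the theorem $[a] = [b] \iff aRb$ can be checked on a small example. Since the equivalence classes themselves are infinite, the sketch below works on a finite window of integers from $-8$ to $8$; that window and the helper name `equivalence_class` are assumptions made for this illustration.

```python
MODULUS = 4
sample = range(-8, 9)  # finite window standing in for the full infinite set


def related(a: int, b: int) -> bool:
    """a is related to b when they are congruent mod 4."""
    return (a - b) % MODULUS == 0


def equivalence_class(a: int) -> frozenset:
    """All elements of the window related to a."""
    return frozenset(x for x in sample if related(a, x))


# Collecting the class of every element removes duplicates like [0] = [4].
classes = {equivalence_class(a) for a in sample}

# (i) the union of the classes is the whole set
covers = set().union(*classes) == set(sample)
# (ii) any two different classes have an empty intersection
disjoint = all(c1.isdisjoint(c2) for c1 in classes for c2 in classes if c1 != c2)

print(len(classes), covers, disjoint)
for cls in sorted(classes, key=min):
    print(sorted(cls))
```

Within the window, each of the four classes is printed once, even though seventeen different representatives were used to build them.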

## Review: Calculus Reordered

Book title: Calculus Reordered: A History of the Big Ideas
Author: David M. Bressoud
Publisher: Princeton University Press
Link to the book: Calculus Reordered: A History of the Big Ideas

Discussions of the history of a field are usually dry and wordy, and they are often hard to read when you are studying that field. This is because they are usually geared towards a general audience, and in doing so most authors tend to strip away the very exciting technical details. I expected the same treatment from the author, but I was pleasantly surprised.

The book contains five chapters, which are the following:

1) Accumulations
2) Ratios of Change
3) Sequences of Partial Sums
4) The Algebra of Inequalities
5) Analysis

Each of these chapters covers a central theme, but they are not at all disjoint. For instance, the last three contain the history of concepts that would normally be found in a first course in Real Analysis, while the first two sit at the more applied end of the spectrum and serve as motivation for going through all this trouble, although they can certainly stand on their own.…

## Investigating Practical Ordering of Grids

In Reinforcement Learning there is an environment known as Gridworld. In this environment you have a grid, and there is an agent that learns how to find the shortest path from one cell to another. The theme of reinforcement learning is that you do not want to hard-code the rules; instead, you want the agent to explore until it finds a set of moves that is optimal for the problem at hand. Usually you can alter the grids to make the tasks tougher: set ‘traps’, add obstacles, etc. We are considering grids with obstacles, and an interesting question that came up is the following:

Given two grids of size ${N}$, say ${G, \,G'}$ which have respectively ${k,l}$ obstacles where ${k,l\in \mathbb{N},\,k,l\geq 0}$, what are reasonable ways to put an order on the ‘complexity’ of the grids?

In other words, we want to be able to say that, for instance, in ${G}$ the agent will find the optimal path more easily than in ${G'}$ given any two grids ${G,G'}$.…
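To make the question concrete, the sketch below computes the length of the shortest path in a grid with obstacles using breadth-first search. It assumes a square grid with ${N}$ cells per side and moves in four directions, and it uses path length only as one possible proxy for how hard a grid is. These are assumptions for illustration, not the ordering proposed in this post.

```python
from collections import deque


def shortest_path_length(size, obstacles, start, goal):
    """Breadth-first search on a size x size grid with 4-directional moves.

    Returns the number of moves on a shortest path, or None if the goal
    cannot be reached.
    """
    blocked = set(obstacles)
    if start in blocked or goal in blocked:
        return None
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (row, col), dist = queue.popleft()
        if (row, col) == goal:
            return dist
        for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (row + d_row, col + d_col)
            inside = 0 <= nxt[0] < size and 0 <= nxt[1] < size
            if inside and nxt not in blocked and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


start, goal = (0, 0), (0, 2)
grid_g = [(0, 1)]                # k = 1 obstacle
grid_g_prime = [(0, 1), (1, 1)]  # l = 2 obstacles

print(shortest_path_length(3, grid_g, start, goal))        # 4
print(shortest_path_length(3, grid_g_prime, start, goal))  # 6
```

Here the grid with more obstacles also has the longer shortest path, but the two measures need not agree in general, which is part of what makes the ordering question interesting.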

## 1.5 Equivalence classes (Infinite sets)

Let’s find the equivalence classes of the following finite set S:

Given $S = \{ -1, 1, 2, 3, 4 \},$ we can form the following relation $R = \{ (-1, -1), (1,1), (2,2), (3,3), (4,4), (1,3), (3,1), (2,4), (4,2) \}.$

Note: writing the relation $R$ on set $S$ in the following ways is equivalent:

$-1R-1, 1R1, 2R2, 3R3, 4R4, 1R3, 3R1, 2R4, 4R2$

or

$-1\le -1, 1 \le1, 2 \le2, 3 \le3, 4 \le4, 1 \le 3, 3 \le 1, 2 \le 4, 4 \le 2$

This relation $R$ has been given the symbol $\le,$ but in this case it means “the same sign and parity”. For instance, $(1,3)$ or $1 \le 3$ tells us that one and three are both odd and have the same sign (both positive) in the set $S.$

The equivalence classes for this relation are the following sets:

$\{ -1 \}, \{ 1, 3\} \text{ and } \{2, 4 \}$

We obtained the above equivalence classes by asking ourselves:

• How is the element $-1$ related to any other element in the set $S$ under the definition of $R?$

Since $R$ is defined as “the same sign and same parity,” we’re really asking ourselves whether $-1$ has the same sign as any other element in $S.$ Since all the other elements are positive, the equivalence class of $-1$ contains only itself. Another question we would’ve asked ourselves is whether $-1$ is even or odd. …
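The same question can be asked of every element of $S$ in turn. The sketch below does exactly that: it rebuilds the relation $R$ from the rule “same sign and same parity” and then collects the equivalence classes. The helper name `related` is illustrative.

```python
S = [-1, 1, 2, 3, 4]


def related(a: int, b: int) -> bool:
    """True when a and b have the same sign and the same parity."""
    same_sign = (a > 0) == (b > 0)  # S contains no zero, so this is enough
    same_parity = a % 2 == b % 2
    return same_sign and same_parity


# The relation as a set of ordered pairs, built from the rule.
R = {(a, b) for a in S for b in S if related(a, b)}

# For each element, collect everything related to it; keep each class once.
classes = []
for a in S:
    cls = {b for b in S if related(a, b)}
    if cls not in classes:
        classes.append(cls)

print(sorted(R))
print(classes)  # [{-1}, {1, 3}, {2, 4}]
```

The pairs generated from the rule match the relation $R$ listed earlier, and the classes match the three sets given above.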