
Everything is Predictable

From Slow Like Wiki

Introduction: A Theory of Not Quite Everything

  • Everything we do, all the time, is predicting the future; we couldn't function if we couldn't.
  • As dictated by Bayes' theorem, your response to new information is influenced by the beliefs you already hold.

1. From the Book of Common Prayer to the Full Monty Carlo

  • "An Essay towards solving a Problem in the Doctrine of Chances" - published posthumously.
  • Derivative - the rate of change (slope) of a curve on a graph - on a distance-time graph it gives you the speed at any given moment.
  • Second derivative - the rate of change of the speed itself, which gives you the acceleration.
  • The study of probability starts in the 16th century with Gerolamo Cardano thinking about dice rolls, and picks up in the 17th century with:
    • Antoine Gombaud 1654
    • Blaise Pascal - The idea is not to look at the chances that something would happen, but to look at the chances it wouldn't happen.
    • Pierre de Fermat
  • The great insight of probability theory: that we should look at the possible outcomes from a given situation, not what has gone before.
  • Pascal's Triangle - simplifies working out the probabilities of outcomes built from 50/50 (binomial) trials.
  • When working out how likely something is, count the number of outcomes that give the result you care about and the total number of possible outcomes. The probability of an event is the number of ways that event can occur, divided by the total number of things that can occur (sketched in code after this list).
  • Outside of games, what is the probability of the world being a certain way, given the results that we're seeing?:
    • Sampling probabilities - What can we predict about a sample of something, given what we know about the whole?
    • Inferential probabilities - What can we know about the whole, given a sample we've taken?
  • Jacob Bernoulli - You can never be truly certain of things. There are three components to the question and you can optimize two:
    • How big a sample do you take?
    • How close to the true answer do you need to be?
    • How confident in your answer do you need to be?
  • de Moivre - the "normal distribution" or "bell curve". The accuracy of your estimate grows in proportion to the square root of your sample size (see the small simulation after this list).
  • Standard deviation - A measure of how spread out your data is around the mean (average). Calculated by taking each value's deviation from the mean (positive or negative), squaring it (to make them all positive), averaging those squares (the variance), and then taking the square root (checked in code after this list).
    • Example 1: Three children are 157, 160, and 163cm tall. The squared deviations from the mean (160cm) are 9, 0, and 9; their average is 6, and the square root of 6 is approx 2.4. The SD is 2.4, so the shorter child is 3/2.4 = 1.25 SD below the mean and the taller child is 1.25 SD above it.
    • Example 2: Three children are 220, 130, and 130cm tall. The squared deviations from the mean (160cm) are 3600, 900, and 900; their average is 1800, and the square root of 1800 is approx 42.4. The SD is 42.4, so the two shorter children are 30/42.4 = 0.7 SD below the mean and the taller child is 60/42.4 = 1.4 SD above it.
    • With normally distributed data and a sufficiently large sample, you can reliably predict what percentage of results fall within a given distance of the mean. In general:
      • 68.27% will be within 1 SD
      • 95.45% will be within 2 SD
      • 99.73% will be within 3 SD
  • Inverse probability - Given the results I've seen, what can I say about my hypothesis?
  • For Bayes, probability is an expression of our lack of knowledge about the world. Probability is subjective. It's a statement about our ignorance and our best guesses of the truth. It's not a property of the world around us, but of our understanding of the world. You must take into account how likely you thought the hypothesis was in the first place, ie take your subjective beliefs into account (a worked example follows this list).
  • Pierre-Simon Laplace independently arrived at a similar conclusion in 1774.
  • Adolphe Quetelet (1796-1874) was interested in the "average man", producing normal distributions for all kinds of attributes of people. But his findings seemed to be in conflict with free will, casting our behaviors and choices as the products of our attributes.
  • Bayes vs Frequentism:
    • Bayes asks - How likely is the hypothesis to be true, given the data I've seen?
      • Treats probability as subjective, a statement about our ignorance of the world.
    • Frequentism asks - How likely am I to see this data, assuming a given hypothesis is true?
      • Treats probability as objective, a statement about how often some outcome will happen, if you did it a huge number of times.
  • The problem of Bayesian priors is that they're subjective - a statement not about the world, but about our own knowledge and ignorance. Frequentists argue that if you don't know which outcome is the most likely, then you should treat them as equally likely.
  • Francis Galton - First to explain regression to the mean. Coined the phrase "nature and nurture".
  • Statistical Significance:
    • A p-value is the likelihood of seeing results at least as extreme as those you've seen, given the null hypothesis. P-values below 0.05 are conventionally treated as "statistically significant" (a small example follows this list).
    • The null hypothesis is the hypothesis that whatever effect you're looking for isn't real. Eg hair color has no effect on soup eating.
  • All our lives, we are in a sense betting. Whenever we go to the station, we are betting that a train will really run, and if we had not a sufficient degree of belief in this, we should decline the bet and stay at home.
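
A minimal Python sketch (an illustration, not anything from the book) of the counting idea in the Pascal's Triangle bullets above: the probability of an event is the number of ways it can occur divided by the total number of equally likely outcomes, and the entries of Pascal's triangle count those ways for runs of 50/50 trials.

```python
from math import comb

def binomial_probability(k, n, p=0.5):
    """P(exactly k successes in n independent trials with success chance p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Row n of Pascal's triangle: entry k = comb(n, k), the number of ways
# to get k heads in n coin tosses.
n = 4
row = [comb(n, k) for k in range(n + 1)]   # [1, 4, 6, 4, 1]
total_outcomes = 2**n                      # 16 equally likely sequences

# Probability of exactly 2 heads in 4 tosses: favourable / total = 6/16
print(row[2] / total_outcomes)             # 0.375
print(binomial_probability(2, 4))          # 0.375, same answer
```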
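
A small simulation of de Moivre's point that the accuracy of an estimate grows with the square root of the sample size; the coin-toss setup and the particular sample sizes are illustrative assumptions.

```python
import random
import statistics

random.seed(0)

def estimate_error(sample_size, trials=2000):
    """Typical error when estimating a fair coin's heads rate from a sample."""
    estimates = [
        sum(random.random() < 0.5 for _ in range(sample_size)) / sample_size
        for _ in range(trials)
    ]
    return statistics.pstdev(estimates)    # spread of the estimates around 0.5

for n in (100, 400, 1600):                 # each step quadruples the sample...
    print(n, round(estimate_error(n), 4))  # ...and roughly halves the error
```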
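
The standard-deviation recipe and both worked examples above, checked in code (heights in cm, as in the bullets).

```python
from math import sqrt

def mean_and_sd(values):
    """Mean, and the standard deviation: the square root of the average
    squared deviation from the mean."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, sqrt(variance)

for heights in ([157, 160, 163], [220, 130, 130]):   # Examples 1 and 2
    mean, sd = mean_and_sd(heights)
    z_scores = [round((h - mean) / sd, 2) for h in heights]
    print(f"mean={mean:.0f}cm  sd={sd:.1f}  z-scores={z_scores}")

# Example 1: sd ~ 2.4; tallest child about 1.2 SD above the mean
#            (1.25 if the SD is first rounded to 2.4, as in the bullet).
# Example 2: sd ~ 42.4; tallest child ~1.4 SD above, the other two ~0.7 below.
```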
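
A worked example of the inverse-probability idea, and of why the prior matters in the Bayes-vs-Frequentism contrast. The screening-test numbers (1% prevalence, 99% hit rate, 5% false-positive rate) are illustrative assumptions, not figures from the book.

```python
def posterior(prior, p_data_given_h, p_data_given_not_h):
    """Bayes' theorem: P(H | data) = P(data | H) * P(H) / P(data)."""
    p_data = p_data_given_h * prior + p_data_given_not_h * (1 - prior)
    return p_data_given_h * prior / p_data

# Illustrative screening test: 1% of people have the condition (the prior),
# the test flags 99% of true cases and 5% of healthy people.
print(posterior(prior=0.01, p_data_given_h=0.99, p_data_given_not_h=0.05))
# ~0.166: even after a positive result the hypothesis is still unlikely,
# because the prior was so low. The frequentist question alone ("how likely
# is this data if the hypothesis is true?") would not tell you that.
```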
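
And a small example of the p-value definition under Statistical Significance: the chance, assuming the null hypothesis, of a result at least as extreme as the one actually seen. The coin-toss numbers are again illustrative.

```python
from math import comb

def two_sided_p_value(heads, tosses):
    """P-value for the null hypothesis 'the coin is fair': the probability,
    under the null, of a result at least as far from tosses/2 as `heads`."""
    extremity = abs(heads - tosses / 2)
    return sum(
        comb(tosses, k) * 0.5**tosses
        for k in range(tosses + 1)
        if abs(k - tosses / 2) >= extremity
    )

print(two_sided_p_value(60, 100))   # ~0.057: not conventionally "significant"
print(two_sided_p_value(65, 100))   # ~0.0035: would usually be called significant
```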

2. Bayes in Science

  • Around 2011, science entered the replication crisis.
    • Hypothesizing After the Results are Known (HARKing) and p-hacking.
    • The scientific literature prioritizes novel findings, which incentivizes (even if unconsciously) p-hacking.
  • Popper's philosophy of science said that you never prove a scientific hypothesis - you only disprove it or fail to do so. The Bayesian idea that you can build up evidence for or against is very much opposed to it.
  • Hume: "All our experimental conclusions proceed upon the supposition that the future will be conformable to the past."
  • Popper: "We choose the theory which best holds its own in competition with other theories; the one which, by natural selection, proves itself the fittest to survive. This will be the one which not only has hitherto stood up to the severest tests, but the one which is also testable in the most rigorous way." He called such a theory "corroborated".
  • Instinctively, at least, scientists think like Bayesians.
  • In frequentist experiments, the p-value can rise and fall suddenly as the sample size gets bigger. For Bayesians, though, there is always prior data, so each new data point moves your opinion much less, and forms part of the new prior for your next bit of information (see the sequential-update sketch after this list).
  • Bayesian techniques don't just reject or accept the null hypothesis - they don't just say yes or no to a hypothesis, but give degrees of belief to a range of possible realities. In reality, there is no such thing as a null hypothesis, or rather the null hypothesis is always, ultimately false.
  • Bayesians make an estimate of the size of an effect and give a probability distribution (sketched in code after this list):
    • If your prior is very strong - your curve is really tall and narrow - and the new data is fairly weak, giving a likelihood curve that is low and wide, then the resulting curve will look more like the prior.
    • If your prior is weak but you've got really good data, so your likelihood curve is tall and pointy, the new data will wash out the prior, and the posterior will be more like the likelihood.
    • If the posterior curve is tall and pointy compared to the prior, then you have something noteworthy and worth following up.
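
A sketch of the sequential-update point above (each new data point forming part of the next prior), using a Beta distribution as the belief curve for a coin's heads rate; the flat prior and the data batches are illustrative assumptions, not the book's.

```python
# Beta(a, b) is a convenient prior for a probability: after seeing `heads`
# heads and `tails` tails, the posterior is simply Beta(a + heads, b + tails).
def update(prior, heads, tails):
    a, b = prior
    return a + heads, b + tails

def mean(belief):
    a, b = belief
    return a / (a + b)

belief = (1, 1)                           # flat prior: no idea about the coin
for batch in [(3, 1), (2, 2), (6, 2)]:    # successive batches of (heads, tails)
    belief = update(belief, *batch)       # yesterday's posterior is today's prior
    print(belief, round(mean(belief), 3))
# The estimate drifts smoothly as data accumulates, rather than flipping
# between "significant" and "not significant" at a threshold.
```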
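
And a sketch of the prior-versus-likelihood trade-off in the last three bullets, with Beta distributions again standing in for the curves: a tall, narrow prior barely moves under weak data, while strong data overwhelms a weak prior. The numbers are illustrative.

```python
def posterior_mean(prior_a, prior_b, heads, tails):
    """Mean of the Beta posterior after combining a Beta(prior_a, prior_b)
    prior with binomial data of `heads` heads and `tails` tails."""
    return (prior_a + heads) / (prior_a + prior_b + heads + tails)

# Strong prior (equivalent to having already seen 200 tosses) + weak data:
# the posterior barely moves from the prior's 50%.
print(posterior_mean(100, 100, heads=8, tails=2))    # ~0.514

# Weak prior + strong data: the posterior follows the data's 80%.
print(posterior_mean(1, 1, heads=80, tails=20))      # ~0.794
```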

3. Bayesian Decision Theory

4. Bayes in the World

5. The Bayesian Brain

Conclusion: Bayesian Life