
Everything is Predictable

From Slow Like Wiki

Revision as of 17:12, 18 November 2025

Introduction: A Theory of Not Quite Everything

  • All that we do, all the time, is predict the future. We couldn't function if we couldn't.
  • As dictated by Bayes' theorem, your response to new information is influenced by the beliefs you already hold.

1. From the Book of Common Prayer to the Full Monty Carlo

  • Thomas Bayes's "An Essay towards solving a Problem in the Doctrine of Chances" was published posthumously in 1763.
  • Derivative - the slope of a curve on a graph, i.e. its rate of change - differentiating distance with respect to time gives you speed.
  • Second derivative - the rate of change of the rate of change - differentiating speed with respect to time gives you acceleration.
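As an illustration (a sketch of my own, not from the book - the example function and step sizes are assumptions), both derivatives can be approximated numerically from a distance function:

```python
# Illustrative: distance s(t) = 5*t**2, i.e. constant acceleration of 10.
def derivative(f, t, h=1e-5):
    """Central-difference approximation of f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)

def second_derivative(f, t, h=1e-4):
    """Central-difference approximation of f''(t)."""
    return (f(t + h) - 2 * f(t) + f(t - h)) / h ** 2

def position(t):
    return 5 * t ** 2

speed = derivative(position, 3.0)                # ds/dt = 10t, so ~30 at t = 3
acceleration = second_derivative(position, 3.0)  # d2s/dt2 = 10
```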
  • The study of probability starts in the 16th century with Gerolamo Cardano thinking about dice rolls:
    • Antoine Gombaud (the Chevalier de Méré), whose gambling questions in 1654 prompted the correspondence between Pascal and Fermat
    • Blaise Pascal - The idea is not to look at the chances that something would happen, but to look at the chances it wouldn't happen.
    • Pierre de Fermat
  • The great insight of probability theory: that we should look at the possible outcomes from a given situation, not what has gone before.
  • Pascal's Triangle - Simplifies working out probabilities of outcomes with binomial (50/50) distribution.
  • To work out how likely something is, count the number of outcomes that produce the event and the total number of possible outcomes. Assuming all outcomes are equally likely, the probability of an event is the number of ways it can occur divided by the total number of things that can occur.
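A small sketch of both ideas (my own examples, not the book's): counting outcomes, and reading binomial probabilities off a row of Pascal's Triangle.

```python
from math import comb

# Counting outcomes: probability of rolling a total of 7 with two dice.
ways = sum(1 for a in range(1, 7) for b in range(1, 7) if a + b == 7)
p_seven = ways / 36                 # 6 ways out of 36, i.e. 1/6

# Pascal's Triangle: row n gives the number of ways to get k heads in n flips.
def pascal_row(n):
    return [comb(n, k) for k in range(n + 1)]

row4 = pascal_row(4)                # [1, 4, 6, 4, 1]
p_two_heads = row4[2] / 2 ** 4      # 6/16 = 0.375
```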
  • Outside of games, what is the probability of the world being a certain way, given the results that we're seeing?:
    • Sampling probabilities - What can we predict about a sample of something, given what we know about the whole?
    • Inferential probabilities - What can we know about the whole, given a sample we've taken?
  • Jacob Bernoulli - You can never be truly certain of things. There are three components to the question and you can optimize two:
    • How big a sample do you take?
    • How close to the true answer do you need to be?
    • How confident in your answer do you need to be?
  • de Moivre - the "normal distribution" or "bell curve". The accuracy of your estimate grows in proportion to the square root of your sample size.
  • Standard deviation - A measure of how spread out your data is around the mean (average). Take each value's deviation from the mean (positive or negative), square it (to make them all positive), average those squares (this average is the variance), and take the square root.
    • Example 1: Three children are 157, 160, and 163cm tall. The squared deviations are 9, 0, and 9; their average is 6, and the square root of 6 is approximately 2.45. With an SD of 2.45, the shortest child is 3/2.45 ≈ 1.2 SD below the mean, and the tallest is 1.2 SD above it.
    • Example 2: Three children are 220, 130, and 130cm tall. The squared deviations are 3600, 900, and 900; their average is 1800, and the square root of 1800 is approximately 42.4. With an SD of 42.4, the two shorter children are 30/42.4 ≈ 0.7 SD below the mean, and the tallest is 60/42.4 ≈ 1.4 SD above it.
    • With normally distributed data and a sufficiently large sample, you can reliably predict what percentage of results fall within a given distance of the mean. In general:
      • 68.27% will be within 1 SD
      • 95.45% will be within 2 SD
      • 99.73% will be within 3 SD
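Both the worked examples and the 68/95/99.7 rule can be checked with a short script (a sketch using Python's standard library; the simulation size and seed are my own choices):

```python
import random
from math import sqrt

def std_dev(xs):
    """Population standard deviation: root of the average squared deviation."""
    mean = sum(xs) / len(xs)
    return sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))

heights = [157, 160, 163]
sd = std_dev(heights)               # sqrt(6) ≈ 2.45
z_short = (157 - 160) / sd          # ≈ -1.2: shortest child vs the mean

# Empirical check of the rule on a large normal sample:
random.seed(0)
sample = [random.gauss(0, 1) for _ in range(100_000)]
within_1sd = sum(abs(x) <= 1 for x in sample) / len(sample)  # ≈ 0.68
```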
  • Inverse probability - Given the results I've seen, what can I say about my hypothesis?
  • For Bayes, probability is an expression of our lack of knowledge about the world. Probability is subjective: a statement about our ignorance and our best guesses at the truth. It's not a property of the world around us, but of our understanding of the world. You must take into account how likely you thought the hypothesis was in the first place, ie take your subjective beliefs into account.
  • Pierre-Simon Laplace independently arrived at a similar conclusion in 1774.
  • Adolphe Quetelet (1796-1874) was interested in the "average man", producing normal distributions for all kinds of attributes of people. But his findings seemed to be in conflict with free will, casting our behaviors and choices as the products of our attributes.
  • Bayes vs Frequentism:
    • Bayes asks - How likely is the hypothesis to be true, given the data I've seen?
      • Treats probability as subjective, a statement about our ignorance of the world.
    • Frequentism asks - How likely am I to see this data, assuming a given hypothesis is true?
      • Treats probability as objective, a statement about how often some outcome would happen if you repeated the process a huge number of times.
  • The problem with Bayesian priors is that they're subjective - a statement not about the world, but about our own knowledge and ignorance. Frequentists argue that if you don't know which outcome is the most likely, you should treat them all as equally likely.
  • Francis Galton - First to explain regression to the mean. Coined the phrase "nature and nurture".
  • Statistical significance:
    • A p-value is the likelihood of seeing results at least as extreme as those you've seen, given the null hypothesis. By convention, p-values below 0.05 are treated as "statistically significant".
    • The null hypothesis is the hypothesis that whatever effect you're looking for isn't real. Eg hair color has no effect on soup eating.
  • "All our lives, we are in a sense betting. Whenever we go to the station, we are betting that a train will really run, and if we had not a sufficient degree of belief in this, we should decline the bet and stay at home." (Frank Ramsey)
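To make the inverse-probability question concrete, here is the standard diagnostic-test example (the numbers are illustrative assumptions, not figures from the book): a test that catches 99% of cases, with a 1% false-positive rate, for a condition with a 1-in-1,000 prior.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Bayes' theorem: P(hypothesis | positive test)."""
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

# Frequentist question: how likely is a positive result if I'm healthy? 1%.
# Bayesian question: how likely am I to be ill, given the positive result?
p_ill = posterior(prior=0.001, sensitivity=0.99, false_positive_rate=0.01)
# ≈ 0.09 - the low prior means a positive result still leaves you probably healthy.
```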

2. Bayes in Science

  • In 2011, science was hit by the replication crisis:
    • HARKing (Hypothesizing After the Results are Known) and p-hacking.
    • The scientific literature prioritizes novel findings, which incentivizes p-hacking (even if unconsciously).
  • Popper's philosophy of science said that you never prove a scientific hypothesis - you only disprove it or fail to do so. The Bayesian idea that you can build up evidence for or against is very much opposed to it.
  • Hume: "All our experimental conclusions proceed upon the supposition that the future will be conformable to the past."
  • Popper: "We choose the theory which best holds its own in competition with other theories; the one which, by natural selection, proves itself the fittest to survive. This will be the one which not only has hitherto stood up to the severest tests, but the one which is also testable in the most rigorous way." He called such a theory "corroborated".
  • Instinctively, at least, scientists think like Bayesians.
  • In frequentist experiments, the p-value can rise and fall suddenly as the sample size gets bigger. For Bayesians, though, there is always prior data, so each new data point moves your opinion much less and forms part of the new prior for the next bit of information.
  • Bayesian techniques don't just reject or accept the null hypothesis - they don't just say yes or no to a hypothesis, but give degrees of belief to a range of possible realities. In reality, there is no such thing as a null hypothesis, or rather the null hypothesis is always, ultimately false.
  • Bayesians make an estimate of the size of an effect and give it a probability distribution:
    • If your prior is very strong - your curve is really tall and narrow - and the new data is fairly weak, giving a likelihood curve that is low and wide, then the resulting curve will look more like the prior.
    • If your prior is weak but you've got really good data, so your likelihood curve is tall and pointy, the new data will wash out the prior, and the posterior will be more like the likelihood.
    • If the posterior curve is tall and pointy compared to the prior, then you have something noteworthy and worth following up.
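The three cases can be sketched with a grid approximation (an illustrative toy of my own, not code from the book): score every candidate effect size by prior times likelihood, then normalize.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mean, sd):
    return exp(-((x - mean) ** 2) / (2 * sd ** 2)) / (sd * sqrt(2 * pi))

grid = [i / 100 for i in range(-300, 301)]   # candidate effect sizes

def posterior_mean(prior_mean, prior_sd, data_mean, data_sd):
    weights = [normal_pdf(x, prior_mean, prior_sd) * normal_pdf(x, data_mean, data_sd)
               for x in grid]
    total = sum(weights)
    return sum(x * w for x, w in zip(grid, weights)) / total

# Tall, narrow prior + weak data: the posterior stays near the prior.
near_prior = posterior_mean(prior_mean=0.0, prior_sd=0.2, data_mean=2.0, data_sd=1.5)
# Weak prior + tall, pointy likelihood: the posterior follows the data.
near_data = posterior_mean(prior_mean=0.0, prior_sd=1.5, data_mean=2.0, data_sd=0.2)
```

With these numbers, `near_prior` lands close to 0 and `near_data` close to 2, matching the first two bullets.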

3. Bayesian Decision Theory

  • George Boole described the operations of propositional logic as "the laws of thought".
  • In our reasoning we depend very much on prior information to help us in evaluating the degree of plausibility in a new problem. This reasoning process goes on unconsciously, almost instantaneously, and we conceal how complicated it really is by calling it common sense.
  • Cromwell's rule: never assign a probability of exactly 0 or 1 - you should never be completely certain.
  • Probability and utility together make expected value.
  • If humans were perfect reasoning machines with full access to our own underlying preferences, we could work out the maths using a combination of Bayes' theorem and utility theory. That is, roughly speaking, what modern AI does, in a much more explicit way.
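A toy expected-value calculation (the scenario and utility scores are illustrative assumptions of my own, not the book's):

```python
def expected_value(outcomes):
    """Sum of probability * utility over mutually exclusive outcomes."""
    return sum(p * u for p, u in outcomes)

# Should I take an umbrella? The utilities are arbitrary illustrative scores.
p_rain = 0.3
take = expected_value([(p_rain, -1), (1 - p_rain, -1)])   # small fixed hassle
leave = expected_value([(p_rain, -10), (1 - p_rain, 0)])  # soaked if it rains
best = max([("take umbrella", take), ("leave it", leave)], key=lambda t: t[1])
# Taking the umbrella wins: an expected -1 beats an expected -3.
```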
  • Hyperpriors deal with uncertainty about which world you're in. You use hyperparameters to determine your hyperpriors, and these then restrict the parameters from which you choose your priors.

4. Bayes in the World

  • People are confused by:
    • The availability heuristic - judging how likely something is by how easily an example comes to mind.
    • The conjunction fallacy - thinking that two things happening together can be more likely than one of them happening alone (which is logically impossible).
    • Framing effects - where the wording of a statement can dramatically change how it is interpreted.
  • Humans are amazingly rational if information is presented to us in ways that we are designed to process it.
  • The gaze heuristic - a shortcut for calculating the path of a ball. We use it automatically, fixing our gaze on the ball and adjusting our running speed so that the angle of gaze remains constant.
  • We use other heuristics as shortcuts:
    • Recency bias - Where we overweight more recent evidence
    • Anchoring - Where the first thing we see tends to set our expectations
    • Frequency bias - Where we shortcut to the thing we've seen most often.
  • Making decisions under uncertainty is hard. We don't have access to all the information, and even if we did, integrating it all into the Bayes equation would be computationally impossible. So instead we use shortcuts and heuristics, and our instinctive decision-making, from a Bayesian perspective, isn't that bad. By and large, we have grey areas and weak spots, but we're kind of OK.
  • Tetlock suggested that there are two distinct groups of experts:
    • Hedgehogs - Who think the world is simple and can be explained and predicted simply using "one big idea". Hedgehogs tell nice, straightforward stories, which are easy to package for the media
    • Foxes - Who think that the world is complicated - that the specifics and details of each situation matter, and that predictions are difficult and uncertain. Foxes do somewhat better at predicting things, with the top 2% called by Tetlock "Superforecasters".
  • Good forecasters often use the "Fermi estimate": make several estimates of small things instead of one big estimate, and if there's no reason why the errors should be systematically high or low, they will tend to cancel each other out. Fermi's calculation of the number of piano tuners in Chicago broke down into the population of Chicago, the percentage of people who own pianos, and the time it takes to tune one once a year. He estimated 62.5 against the real answer of about 80.
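The piano-tuner estimate can be written out step by step. The intermediate figures below are my own illustrative assumptions in the spirit of Fermi's chain (the note above only gives the final 62.5):

```python
# Fermi estimate: chain several rough guesses whose errors tend to cancel.
population = 3_000_000        # Chicago, roughly, in Fermi's day (assumed)
people_per_piano = 60         # assumed: one piano per 60 people
tunings_per_year = 1          # each piano tuned about once a year
tunings_per_day = 4           # assumed daily workload of one tuner
working_days = 200            # assumed working days per year

pianos = population / people_per_piano                      # 50,000 pianos
tunings_needed = pianos * tunings_per_year                  # 50,000 tunings a year
tuners = tunings_needed / (tunings_per_day * working_days)  # 62.5 tuners
```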
  • A huge amount of our public discourse comes down to our efforts to place things, groups, people, and concepts into categories. Is this movement fascist? Is that group a cult? Is this person racist? But the categories are all fuzzy.
  • Take the category "games". You learned what defines a "game" in a Bayesian way: starting with one data point and then refining your category with further data points. As you learn to attach the label "game" to more and more concepts, you get more accurate estimates of the probability of seeing certain characteristics in those concepts. Your prior probability that games involve balls was low; then someone pointed out that hockey, football, tennis, cricket and ping-pong are all games, so you updated.
  • This method works better than the idealist philosophical notion that categories have fixed, essential definitions.
  • See also the paradox of the heap. When does a heap of sand from which you are removing grains turn into "not a heap"?
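The "games involve balls" update above can be sketched as a beta-binomial model (a toy of my own, not the book's): each labelled example nudges the estimated probability that a game has a given feature.

```python
def update(alpha, beta, has_feature):
    """One Bayesian observation of a category member (beta-binomial update)."""
    return (alpha + 1, beta) if has_feature else (alpha, beta + 1)

# Low prior that games involve balls: beta(1, 4) has mean 0.2.
alpha, beta = 1, 4
for game_has_ball in [True, True, True, True, True]:  # hockey, football, ...
    alpha, beta = update(alpha, beta, game_has_ball)

p_ball = alpha / (alpha + beta)   # posterior mean: 6/10 = 0.6
```

Because the estimate is a probability rather than a hard rule, fuzzy category boundaries (as in the heap paradox) pose no practical problem.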

5. The Bayesian Brain

Conclusion: Bayesian Life