Sociological Gobbledygook

Simulation for fun and profit

One really useful thing that you can do with programming skills is simulate things that you don't fully understand. This is something that hard science people do all the time: it turns out that there are some kinds of problems that aren't solvable in a deductive, analytic kind of way, but if you can write a simulation of some of the broad influences on the problem, then you can sometimes get a good idea of what's going on.

For example, John Conway's famous Game of Life can be understood as a representation of an evolutionary process and has been the object of tons of study via computer simulation.

In statistics there are many, many examples of simulation that actually help people learn more about data. For example, there's a technique called the "bootstrap" that involves taking your data and generating synthetic samples from it via simulation, which statisticians can use to squeeze more information out of the original data. There's also a wide array of techniques called Monte Carlo simulations in statistics that can be used to do freakishly powerful stuff. Simulation is great.
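
To make the bootstrap idea a little more concrete, here's a minimal sketch using only Python's standard library. The data values are made up purely for illustration, and the function name `bootstrap_means` is my own invention, not something from a statistics library. It resamples the data with replacement many times, takes the mean of each resample, and uses the spread of those means to get a rough interval for the true mean.

```python
import random
import statistics


def bootstrap_means(data, n_resamples=2000, seed=0):
    """Resample the data with replacement and return the mean of each resample."""
    rng = random.Random(seed)  # seeded so the result is repeatable
    means = []
    for _ in range(n_resamples):
        # each synthetic sample has the same size as the original data
        resample = rng.choices(data, k=len(data))
        means.append(statistics.mean(resample))
    return means


# hypothetical measurements, invented for this example
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 3.1]

means = sorted(bootstrap_means(data))
low = means[int(0.025 * len(means))]   # 2.5th percentile of the resampled means
high = means[int(0.975 * len(means))]  # 97.5th percentile of the resampled means

print("sample mean:", round(statistics.mean(data), 2))
print("rough 95% interval for the mean:", round(low, 2), "to", round(high, 2))
```

This is the simplest "percentile" flavor of the bootstrap; real statistical work often uses more careful variants.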

However, for our purposes in this class, the most useful thing about simulations is how they allow us to explore how probability affects real-world results. It turns out that a lot of truths in probability are very counterintuitive to our unaided imaginations, but are really easy to simulate using a computer. And there's a very happy mathematical fact called the law of large numbers (which we'll learn more about in the stats section of the course): as you repeat a probabilistic simulation a lot of times (which a computer can do very fast), you're likely, in many cases, to observe behavior that reflects the aggregate behavior that we'd expect to see in the real world.
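
Here's a small sketch of the law of large numbers in action, assuming a fair coin. The function name `heads_fraction` is just something I made up for this example. With only a few flips the fraction of heads bounces around, but as the number of flips grows it tends to settle close to one half.

```python
import random


def heads_fraction(n_flips, rng):
    """Flip a fair coin n_flips times and return the fraction that came up heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


rng = random.Random(42)  # seeded so you get the same numbers each run
for n in [10, 100, 10000, 100000]:
    print(n, "flips:", heads_fraction(n, rng))
```

Try changing the seed (or removing it) and rerunning: the small runs will jump around quite a bit, while the big ones barely move.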

Here's an example. There's a very famous probability problem called "Monty Hall." I have some more details about it in a later lesson, but here's the crux of it. There was an old game show called "Let's Make a Deal" (Monty Hall was the host) and one of the games they played involved presenting the contestant with three closed doors. Behind one of the doors was a car, and behind each of the other two was a booby prize, like a goat or something. (I dunno, goats are cute, I'd be cool with that.)

The contestant would choose a door, and then Monty Hall would open one of the remaining doors not chosen by the contestant and reveal a goat. Then Monty Hall would ask the contestant: do they want to stay with their original door, or do they want to switch? Then, after the contestant makes their final choice, Monty reveals what's behind the door they landed on, and the contestant gets the prize.

Pause for a moment, and think about the game. Can you figure out a good strategy for it? Should the contestant always stick with their original door? Should they always switch? Should they flip a coin? Does it matter at all? Think about it, and try to justify your answer. I'll be back below...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

Almost everyone who hasn't seen this problem concludes that it doesn't matter: stick, switch, flip a coin, either way, you have a 1 in 2 chance of getting the car. Actually, this is wrong. The correct strategy is to always switch to the last closed door after Monty opens a door with a goat. (But don't feel bad if you didn't realize that. Like I said, almost nobody who doesn't know the problem realizes that, including math professors.)

Suppose you don't believe me. This is really counterintuitive! How can it possibly make a difference, when we start the game with one car and three doors?!!!?!?? Well, this is a probability problem, so we can simulate it, and taking a look at the results might convince you that maybe I'm telling you the truth after all.

Here's some code to simulate the Monty Hall problem. It uses object-oriented programming concepts; please see the lesson about object-orientation to understand what's going on here. It turns out that object-oriented programming, as a style of writing code, is really well suited to simulations, because you can do things like make a player and its strategies one class, and a round of the game another class, and then just have them interact in order to produce your result.

import random

class Round(object):
    def __init__(self):
        self.doors = ["a", "b", "c"]
        self.car = random.choice(self.doors)

    def pick(self, door):
        self.player_choice = door

    def open_door(self):
        available_to_open = [x for x in self.doors if x not in [self.player_choice, self.car]]
        return random.choice(available_to_open)

    def evaluate(self):
        return self.player_choice == self.car  # true if player chose the car, false if not!


class Player(object):
    def __init__(self, strategy):
        self.strategy = strategy

    def first_pick(self):
        doors = ["a", "b", "c"]
        first_choice = random.choice(doors)
        self.first_choice = first_choice
        return first_choice

    def final_pick(self, open_door):
        switch_choice = [x for x in ["a", "b", "c"] if x not in [self.first_choice, open_door]][0]
        if self.strategy == "stick":
            return self.first_choice
        elif self.strategy == "switch":
            return switch_choice
        else:  # default strategy is assumed to be "random"
            return random.choice([self.first_choice, switch_choice])


class MontySimulation(object):
    def __init__(self, player):
        self.player = player
        self.wins = 0

    def report(self):
        percentage_won = 100 * self.wins / (self.rounds)
        print("Player with {} in {} rounds won {}% of rounds".format(self.player.strategy, self.rounds, percentage_won))

    def play(self, rounds):
        self.rounds = rounds
        for r in range(rounds):
            current_round = Round()
            first_pick = self.player.first_pick()
            current_round.pick(first_pick)
            opened = current_round.open_door()
            final_pick = self.player.final_pick(opened)
            current_round.pick(final_pick)  # it's perfectly ok to call the method a second time to reset the instance variable to the final result
            won = current_round.evaluate()
            if won:
                self.wins += 1
        self.report()

I've added some comments to the code to make it a bit more readable, but I won't explain it line by line (maybe in class). I want you to try to make sense of it, and then figure out how to run it. (I'll post an explanation of how to run it after a few lines of whitespace.)

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

Here's a good way to run this code, with 10,000 rounds per strategy, which should run very quickly and is highly likely to produce reports reflecting the underlying probabilities of winning. As you can see if you run this code, switching is a much better strategy than sticking!

strategies = ["stick", "switch", "random"]
for strategy in strategies:
    player = Player(strategy)
    game = MontySimulation(player)
    game.play(10000)

We will talk about Monty Hall later on in the course, and I'll give you the mathematical reasoning behind the correct strategy.

Incidentally, while object-oriented programming is a good match for simulation, it isn't necessary. To prove it, here's another Monty Hall simulation I wrote a couple of years ago in a programming language that doesn't even support object-oriented programming. The part that does the simulation actually comes out shorter than the example above (although the linked code also generates a webpage to see the simulation live, which you can play with here if you want). If you want to see what the same simulation looks like in another language, go check it out... I'll bet you can understand the broad outlines, even though you've never been exposed to the other language.
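
To see the same point without leaving Python, here's a sketch of the simulation written with plain functions and no classes. To be clear, this is not the linked code (which is in a different language); it's just my own rough translation of the class-based version above, and the names `play_round` and `win_rate` are made up for this example.

```python
import random

DOORS = ["a", "b", "c"]


def play_round(strategy, rng):
    """Play one round of Monty Hall and return True if the player wins the car."""
    car = rng.choice(DOORS)
    first_choice = rng.choice(DOORS)
    # Monty opens a door that is neither the player's pick nor the car
    opened = rng.choice([d for d in DOORS if d not in (first_choice, car)])
    # the one door that is still closed and wasn't the player's first pick
    other = [d for d in DOORS if d not in (first_choice, opened)][0]
    if strategy == "stick":
        final = first_choice
    elif strategy == "switch":
        final = other
    else:  # anything else is treated as "random", like the class-based version
        final = rng.choice([first_choice, other])
    return final == car


def win_rate(strategy, rounds=10000, seed=0):
    """Return the fraction of rounds won using the given strategy."""
    rng = random.Random(seed)
    wins = sum(play_round(strategy, rng) for _ in range(rounds))
    return wins / rounds


for strategy in ["stick", "switch", "random"]:
    print(strategy, win_rate(strategy))
```

The win rates should land near one third for sticking, two thirds for switching, and one half for flipping a coin.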

In the remainder of this class, I will liberally use simulation to demonstrate important concepts in probability and statistics. I encourage you to poke around in those simulations, change their assumptions (for example, can you figure out a way to change the rules of Monty Hall to change the player's optimal strategy? If you can, simulate it to check!), and poke around the edges in order to get a feel for the knowledge they represent. I think you'll find that being able to dig your hands into a probability/statistics process will make it more understandable and intuitive than just looking at some mathematical proof.
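
If you'd like to check your own answer to that rule-changing question, here's a sketch of one possible variation (spoiler warning!). The assumption, which is mine and not part of the original game, is that Monty is careless: he opens one of the two doors you didn't pick at random, so sometimes he reveals the car by accident. The sketch throws those rounds away and only counts rounds where a goat was revealed. The function name `careless_monty` is made up for this example.

```python
import random

DOORS = ["a", "b", "c"]


def careless_monty(rounds=30000, seed=0):
    """Simulate a Monty who opens an unchosen door at random.

    Rounds where he accidentally reveals the car are skipped. Returns the win
    rates for sticking and switching among the rounds that were counted.
    """
    rng = random.Random(seed)
    counted = stick_wins = switch_wins = 0
    for _ in range(rounds):
        car = rng.choice(DOORS)
        first_choice = rng.choice(DOORS)
        # Monty doesn't know (or care) where the car is
        opened = rng.choice([d for d in DOORS if d != first_choice])
        if opened == car:
            continue  # car revealed by accident, so this round doesn't count
        counted += 1
        other = [d for d in DOORS if d not in (first_choice, opened)][0]
        stick_wins += first_choice == car
        switch_wins += other == car
    return stick_wins / counted, switch_wins / counted


stick_rate, switch_rate = careless_monty()
print("stick:", stick_rate)
print("switch:", switch_rate)
```

Under this changed rule, both rates should come out close to one half, so switching no longer helps. The only thing that changed from the original game is what Monty knows when he opens a door.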


Contents:

Introduction to Python and setup.

Published: Mon 22 October 2018
By Paul Gowder

In Lessons.

tags: python programming week1 setup

In this class, we will be using the Python programming language. In this first week, we will have a basic introduction to Python and to the fundamentals of computer programming.

Before we get started programming, we have to set up a system to write our code and get the computer …
Introduction to Programming and Python

Published: Sun 04 November 2018
By Paul Gowder

In Lessons.

tags: week1 python programming

In this first week of the course, we're going to cover some core concepts of computer programming. With this, we will build a foundation to do more interesting things in the coming weeks.

What is Computer Programming?
Computers basically just do one thing: they complete very simple operations at astonishing speed.

For example, suppose you wanted to calculate the thousandth number of the Fibonacci Sequence …
Files and How Computers Represent Data

Published: Fri 30 November 2018
By Paul Gowder

In Lessons.

tags: python programming week2

In this lesson, we're going to learn how to open files and work with data from the disk. We'll start with the mechanical process of opening text files, and then move on to learn a little bit more about different kinds of data you'll see.

Here's the basic method of opening and reading text files. Suppose I have a file called hello.txt in my working directory. (Your working directory is the directory you run Python from on your hard drive. For those of you using Azure Notebooks, this should be your library, but talk to me if you see a file there and can't read it from Python.)
Getting Data from the Internet With Python

Published: Thu 06 December 2018
By Paul Gowder

In Lessons.

tags: python programming networking internet week2

In addition to reading files locally, you can also read them over the internet.

When you use a web browser like Chrome to go to a URL ("uniform resource locator," or web address) like https://sociologicalgobbledygook.com, what you're actually doing is sending a request using the HTTPS protocol (which …
Functions and Scope

Published: Thu 20 December 2018
By Paul Gowder

In Lessons.

tags: python programming week1

Recall how in the first Python lesson we looked at the while loop and saw how it allows us to repeat instructions to the computer as many times as you want.

The next step up from a loop is a function, which allows us to wrap up a series of commands into a single command on its own. Let's take a look at an example.
More Loops and Control Flow

Published: Fri 21 December 2018
By Paul Gowder

In Lessons.

tags: python programming week1

In this lesson, we'll think about more ways to direct Python to do things repeatedly, or conditionally.

Let's start with more loops. I showed you the while loop before, remember?

Simple Data Types (draft)

Published: Fri 21 December 2018
By Paul Gowder

In Lessons.

tags: python programming week1

In Python, the data you work with (like the things assigned to variables) have types, which specify the kinds of data they are and the things you can do with them.

A good way to understand this is to think about the difference between letters and numbers. While we can write both down, there are different things we can do to them. It wouldn't make sense (except in an algebra context) to multiply and divide letters; it wouldn't make sense to talk about a capital and a lowercase number 3.
Complex Data Types

Published: Sat 22 December 2018
By Paul Gowder

In Lessons.

tags: python programming week1

Some kinds of data can store other kinds of data.

Lists
We've actually seen the most common complex data type a few times before, I just haven't pointed it out to you. We make a list by enclosing the elements of a list in square brackets.

Key Python Libraries for Working with Data
Published: Sat 22 December 2018
By Paul Gowder

In Lessons.

tags: python programming statistics week5
In this lesson I'm just going to describe the main libraries that we'll see when we work with data in Python.

Numpy
Numpy is the first library we work with. By convention, it's imported with import numpy as np. Numpy really provides two things to our workflow:

Math that goes faster than unadorned Python could do it---which is important when you're doing statistics, because under the hood computational stats can take a lot of calculations.
Dealing with Errors

Published: Mon 31 December 2018
By Paul Gowder

In Lessons.

tags: python programming week2

An unavoidable fact of life for people who write code is error messages. You're happily programming along, and then, all of a sudden, you get a massive screen of terrifying text telling you that you screwed up!!

Object-Oriented Programming

Published: Fri 18 January 2019
By Paul Gowder

In Lessons.

tags: python programming week3 object-oriented

Object-oriented programming (OOP) isn't all that special, it's just a particular style of programming that Python is particularly well designed for. This is a short lesson, we won't cover the theory of OOP, or features you might hear about elsewhere like "inheritance"---see your reading in the P4E book for more.
Regular Expressions

Published: Fri 18 January 2019
By Paul Gowder

In Lessons.

tags: python programming week3 regex strings

Regular expressions (or "regex"/"regexes") are one of the most powerful programming tools for lawyers. Essentially, regular expressions are a powerful specialized programming language built into other languages like Python, which allow you to express complicated text searching operations.

The utility of this for lawyers should be obvious: lawyers have to deal with lots and lots and lots of documents, and sometimes need to search through those documents for specific information. If those documents are in electronic form, regular expressions can provide you with a much more powerful way of searching than what is built into ordinary applications.
In-class example: talking to an API

Published: Tue 22 January 2019
By Paul Gowder

In class_examples.

tags: week2 python programming examples apis networking

This is a lightly edited version of the notebook that we worked through in class on 1/22/19.

In class, we went through how to make an API call end-to-end, to get a look at common tasks like figuring out documentation, using libraries, making HTTP requests, etc. Over the weekend, practice with this API and others (you might also try the one at opensecrets.org …