Contents:


  1. Post-Week 1 Notes

    Well, we've survived our first week of Sociological Gobbledygook (and one room change---which we're keeping, BTW). We've learned the basic mechanics of programming, and, more importantly, we've worked on building a mental model for how talking to a computer works by completing a number of exercises.

    For the in-class time …

    read more
  2. Week 4 recap: Total Probability Rule

    The probability lecture this week kind of hit a wall at one point in our Bayes Rule example. Here's a clearer explanation.

    Remember, we had an example problem involving figuring out the posterior probability of someone being drunk, given that they blew a positive result on a breathalyzer.
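
    In symbols, the calculation we were building toward looks like this, with the total probability rule supplying the denominator (writing "drunk" and "positive" for our two events):

    $$P(\text{drunk}|\text{positive}) = \frac{P(\text{positive}|\text{drunk})\,P(\text{drunk})}{P(\text{positive}|\text{drunk})\,P(\text{drunk}) + P(\text{positive}|\text{sober})\,P(\text{sober})}$$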

    The place …

    read more
  3. Using libraries in Python

    Programming would be incredibly time consuming if you couldn't reuse code. You've seen the basic form of code reuse already---the function, which allows you to lock up a transformation from some data to some other data, give it a name, and then apply it repeatedly, and to arbitrary data.
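
    As a sketch of that idea (the function here is made up for illustration), a function bottles up a transformation, and a library like Python's built-in math module is a collection of such functions that someone else already wrote:

    ```python
    # Reuse via a function we define ourselves...
    def double(x):
        return x * 2

    print(double(21))  # 42

    # ...and reuse via a library: importing gives us access to
    # functions other people have already written and tested.
    import math
    print(math.sqrt(16))  # 4.0
    ```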

    Well …

    read more
  4. Why Statistics for Lawyers?

    The brunt of this course will be devoted to statistics and exploratory data analysis.

    Exploratory data analysis is just looking at data to see what you see. We will spend some time, for example, looking at how to see the shape of data and what that can tell you about …

    read more
  5. Introduction to Programming and Python

    In this first week of the course, we're going to cover some core concepts of computer programming. With this, we will build a foundation to do more interesting things in the coming weeks.

    What is Computer Programming?

    Computers basically just do one thing: they complete very simple operations at astonishing speed.

    For example, suppose you wanted to calculate the thousandth number of the Fibonacci Sequence …
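    One way to get a feel for "very simple operations at astonishing speed": a loop that builds the sequence one addition at a time (a sketch; the lesson's own walkthrough is in the full post).

    ```python
    # Compute the thousandth Fibonacci number by repeated addition.
    a, b = 0, 1
    for _ in range(1000):
        a, b = b, a + b
    print(a)  # the 1000th Fibonacci number -- over 200 digits long
    ```

    read more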

  6. Introduction to Exploring Data in Python

    In this lesson, we're going to learn about how to get a feel for data in Python, using basic tools to look at our data.

    First, however, let's make sure we have the right version of a library we need called Seaborn. Run the code in the next block and make sure that you get the right version.
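
    That block isn't reproduced in this excerpt; the usual way to check a library's version looks like this (the specific version number the course requires is in the full lesson):

    ```python
    # Print the installed Seaborn version so you can compare it
    # against the one the lesson expects.
    import seaborn
    print(seaborn.__version__)
    ```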

    read more
  7. Files and How Computers Represent Data

    In this lesson, we're going to learn how to open files and work with data from the disk. We'll start with the mechanical process of opening text files, and then move on to learn a little bit more about different kinds of data you'll see.

    Here's the basic method of opening and reading text files. Suppose I have a file called hello.txt in my working directory. (Your working directory is the directory you run Python from on your hard drive. For those of you using Azure Notebooks, this should be your library, but talk to me if you see a file there and can't read it from Python.)
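
    A standard version of that pattern (assuming hello.txt really is in your working directory):

    ```python
    # Open hello.txt, read the whole file into a string, and let
    # the with-block close the file for us when we're done.
    with open("hello.txt") as f:
        contents = f.read()
    print(contents)
    ```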

    read more
  8. Introduction to Distributions

    What's a Distribution, Anyway?

    Statistics people often talk about distributions, like a normal distribution. Here's what they mean: suppose you could see all of the instances of the thing you're trying to study. What kind of pattern would their values have? That's the distribution.

    For example, suppose you expect that most of the values of the thing you care about will be clustered around some average value. IQ is a good example: most IQs in the population are around 100, and as values get further away from 100 in either direction, the fraction of the total number of instances in that range gets smaller. There are lots of folks with an IQ between 85 and 115; fewer between 70 and 85 or between 115 and 130; many fewer between 55 and 70 or between 130 and 145; and a (proportionally) truly tiny number between 40 and 55 or between 145 and 160.
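
    A quick simulation makes the same point, using the conventional IQ scaling (mean 100, standard deviation 15):

    ```python
    # Draw a large sample of simulated IQ scores from a normal
    # distribution and check what fraction lands in each band.
    import numpy as np

    rng = np.random.default_rng(0)
    iqs = rng.normal(loc=100, scale=15, size=100_000)

    print(((iqs >= 85) & (iqs < 115)).mean())   # roughly 0.68
    print(((iqs >= 130) | (iqs < 70)).mean())   # a few percent
    print(((iqs >= 145) | (iqs < 55)).mean())   # a tiny fraction
    ```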

    read more
  9. When Regressions Attack

    This lesson is all about what can go wrong in linear regression. Here's an outline of the ways things can go wrong.

    • data isn't linear
      • extreme outliers (see the sketch below)
    • heteroskedasticity
    • multicollinearity
    • conditioning on a collider
    • confounder bias
    • non-normal residuals

    There's also a problem known as "autocorrelation" which mainly appears in time series data (i.e., when one tries to run a regression on something that changes over time, like stock market prices). Time series analysis is a fairly advanced topic that is beyond the scope of this course, but you should have alarm bells ringing if anyone tries to do ordinary linear regression on data that spans time like that.
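
    To make the outlier sub-item concrete, here's a small sketch (made-up data) of how a single extreme observation can yank a fitted line around:

    ```python
    # Fit a line to well-behaved data, then corrupt one point and
    # watch the estimated slope move.
    import numpy as np

    rng = np.random.default_rng(0)
    x = np.arange(20.0)
    y = 2 * x + rng.normal(0, 1, size=20)

    slope_clean = np.polyfit(x, y, 1)[0]

    y_bad = y.copy()
    y_bad[-1] += 200                      # one wild observation
    slope_dirty = np.polyfit(x, y_bad, 1)[0]

    print(slope_clean, slope_dirty)       # the slope shifts substantially
    ```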

    read more
  10. The Basics of Probability

    What is Probability?

    Probability is the mathematical representation of the likelihood of an event under a given set of circumstances (conditions) in a given period of time. We will say, for example, that the probability of winning the jackpot in the lottery from buying one ticket this week is some …

    read more
  11. P-Values and Bayes Rule

    Recall from the previous lesson what a p-value is: it's the probability of observing a value of your statistic as extreme as (as far away from the null hypothesis value as) the one you in fact observed, if the null hypothesis were true.
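
    For a concrete (illustrative) version with a z-statistic:

    ```python
    # Two-sided p-value: the chance of a statistic at least this
    # far from the null value, in either direction, if the null
    # hypothesis is true.
    from scipy.stats import norm

    z = 1.96                        # an illustrative observed statistic
    p = 2 * (1 - norm.cdf(abs(z)))
    print(p)                        # about 0.05
    ```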

    In other words, if you’re doing a (two-sided …

    read more
  12. Functions and Scope

    Recall how in the first Python lesson we looked at the while loop and saw how it allows us to repeat instructions to the computer as many times as we want.

    The next step up from a loop is a function, which allows us to wrap up a series of commands into a single command on its own. Let's take a look at an example.
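
    A minimal version of the idea (the lesson's own example is in the full post; this one is made up):

    ```python
    # Several commands wrapped into one reusable command.
    def greet(name):
        message = "Hello, " + name + "!"
        print(message)

    greet("class")   # Hello, class!
    greet("world")   # same commands, different data
    ```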

    read more
  13. Common Data Transformations

    It's often useful in performing data analysis to transform some of your variables to fit a common scale; this is especially useful in exploratory data analysis, because these transformations often make it much easier to eyeball the relationship between variables. (Also, some statistical techniques require these transformations.)

    In this short lesson, we'll introduce two common methods of transforming data---the log transform …
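    As a taste of the first of those (the numbers here are illustrative):

    ```python
    # The log transform compresses large values and spreads out
    # small ones, which tames heavily skewed variables like income.
    import numpy as np

    incomes = np.array([20_000, 45_000, 80_000, 1_500_000])
    print(np.log(incomes))  # the millionaire no longer dwarfs everyone else
    ```

    read more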

  14. Introduction to Linear Regression

    The standard technique for measuring the relationship between one or more continuous independent variables and a continuous dependent variable is linear regression.

    The basic idea of linear regression can be expressed simply. A linear regression is a line (or some higher-dimensional geometric thingy) that maps the independent variables to the best predicted value for the dependent variable.
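
    In symbols, with one independent variable:

    $$y = \beta_0 + \beta_1 x + \varepsilon$$

    where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon$ is the error the line doesn't capture; more independent variables just add more $\beta$ terms.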

    read more
  15. Simple Data Types (draft)

    In Python, the data you work with (like the things assigned to variables) have types, which specify the kinds of data they are and the things you can do with them.

    A good way to understand this is to think about the difference between letters and numbers. While we can write both down, there are different things we can do to them. It wouldn't make sense (except in an algebra context) to multiply and divide letters; it wouldn't make sense to talk about a capital and a lowercase number 3.
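
    A quick illustration of the difference:

    ```python
    # The same symbol can be a number or a string (letters), and
    # the operations available depend on the type.
    print(type(3))       # <class 'int'>
    print(type("3"))     # <class 'str'>

    print(3 * 2)         # 6 -- arithmetic on a number
    print("3" * 2)       # '33' -- "multiplying" a string just repeats it
    print("b".upper())   # 'B' -- capitalization makes sense only for letters
    ```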

    read more
  16. Complex Data Types

    Some kinds of data can store other kinds of data.

    Lists

    We've actually seen the most common complex data type a few times before; I just haven't pointed it out to you. We make a list by enclosing the elements of a list in square brackets.
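
    For example:

    ```python
    # A list: ordered elements inside square brackets.
    grades = ["A", "B+", "A-"]
    print(grades[0])     # elements are fetched by position, counting from 0
    grades.append("C")   # lists can grow after they're created
    print(len(grades))   # 4
    ```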

  17. Key Python Libraries for Working with Data

    In this lesson I'm just going to describe the main libraries that we'll see when we work with data in Python.

    Numpy

    Numpy is the first library we work with. By convention, it's imported with import numpy as np. Numpy really provides two things to our workflow:

    1. Math that goes faster than unadorned Python could do it---which is important when you're doing statistics, because under the hood computational stats can take a lot of calculations.
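
      For instance, the same arithmetic written both ways (the speed gap is large on big arrays):

      ```python
      # Vectorized numpy math versus a plain Python loop over the same data.
      import numpy as np

      xs = np.arange(1_000_000)
      fast = xs ** 2                             # one vectorized operation
      slow = [x ** 2 for x in range(1_000_000)]  # pure Python, much slower
      ```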

      read more
  18. Practical Basic Hypothesis Tests

    In this lesson, we're going to very quickly rip through the basic hypothesis tests, their uses, and how to achieve them in Python. I won't spend a lot of time on this, because the mathematical details are covered in the assigned reading, and, at any rate, I think for practical purposes regression analysis is more important for lawyers. Also, this is basically AP/undergrad stats material, so you've probably seen it somewhere already.
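
    As one example of the pattern we'll follow (with made-up data):

    ```python
    # A two-sample t-test in scipy: do these two groups have
    # different means, beyond what chance would explain?
    from scipy.stats import ttest_ind

    group_a = [5.1, 4.9, 6.2, 5.8, 5.5]
    group_b = [4.2, 4.8, 4.5, 4.1, 4.7]

    result = ttest_ind(group_a, group_b)
    print(result.statistic, result.pvalue)
    ```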

    read more
  19. Object-Oriented Programming

    Object-oriented programming (OOP) isn't all that special; it's just a particular style of programming that Python is particularly well designed for. This is a short lesson, so we won't cover the theory of OOP, or features you might hear about elsewhere like "inheritance"---see your reading in the P4E book for more.
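
    To fix the flavor of the style (a made-up example, not from the reading):

    ```python
    # OOP bundles data and the functions that work on that data
    # together into a class.
    class Case:
        def __init__(self, name, year):
            self.name = name
            self.year = year

        def cite(self):
            return f"{self.name} ({self.year})"

    marbury = Case("Marbury v. Madison", 1803)
    print(marbury.cite())  # Marbury v. Madison (1803)
    ```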

    read more
  20. Regular Expressions

    Regular expressions (or "regex"/"regexes") are one of the most powerful programming tools for lawyers. Essentially, regular expressions are a powerful specialized programming language built into other languages like Python, which allow you to express complicated text searching operations.

    The utility of this for lawyers should be obvious: lawyers have to deal with lots and lots and lots of documents, and sometimes need to search through those documents for specific information. If those documents are in electronic form, regular expressions can provide you with a much more powerful way of searching than what is built into ordinary applications.
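
    A tiny taste (the pattern and text here are illustrative):

    ```python
    # Find every four-digit year in a block of text.
    import re

    text = "The case was filed in 1998 and decided in 2003."
    print(re.findall(r"\b\d{4}\b", text))  # ['1998', '2003']
    ```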

    read more
  21. Week 5 Recap

    In week 5, we began by continuing our probability lecture from last week, and then, as an exercise, tried to prove the correct answer to the Monty Hall problem using Bayes Rule.

    Monty Redux

    Here's that solution again. Remember our formula for Bayes Rule:

    $$P(B|A) = \frac{P(A …
    read more
  22. Causation and Counterfactuals

    In law as in science, one of the things we typically want to resolve is questions of causation. For example, in biology we might ask things like "does this drug reduce deaths from heart disease," while in economics we might ask "does raising the minimum wage increase unemployment." In law, such questions will …

    read more
  23. Post-Week 9 Notes

    Two notes following week 9 (our intro to regressions).

    First, a couple students asked whether you need to center and scale (standardize) data to make it work with linear regression. I think I had a brain freeze and gave an inconsistent answer, so let me clarify here: no, you don't …

    read more
  24. Confidence Intervals and Bayesian Statistics oh my!

    One of the readings for week 13, "The Bayesian New Statistics," covers a variety of different approaches to statistics, as contrasted with the standard frequentist hypothesis-testing method. I don't expect you to come out of this class being able to work any of those alternative paradigms, but you should be able to recognize them and understand broadly how they operate. That article is a very good summary of the landscape, but this supplemental lesson aims to provide a briefer and slightly more basic introduction.

    read more
  25. Prediction vs. Inference

    So far, our statistics material in this course has fallen into two buckets. The first, and most straightforward, is descriptive statistics, that is, just describing what our data looks like---mean, median, correlation, that kind of stuff. The second is "inferential statistics," that is, the use of statistics to make inferences about …

    read more
  26. Installing an external library on the UI systems

    It's a bit difficult to install libraries to make them usable with Jupyter notebooks in general; it's slightly more difficult to make it work in a permission-controlled environment. However, if you want to try my plottyprint library for problem set 2 (which has the advantage of being easy to use), here's some code to try.
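
    The code itself is behind the link; the general approach, installing into your user directory via the notebook's own Python, usually looks something like this (a sketch of the usual technique, not necessarily the post's exact code):

    ```python
    # Install plottyprint for the current user, using the same
    # Python interpreter that the notebook is running on.
    import subprocess, sys
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "--user", "plottyprint"],
        check=True,
    )
    ```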

    read more
