Syllabus

Introduction to Quantitative & Computational Legal Reasoning (LAW:8645)

Revisions for coronavirus shutdown

We are losing a week of class. Accordingly, I have cancelled what was originally scheduled for week 13, and the schedule below reflects that change.
To minimize the burden on students who have had their lives disrupted, I have consolidated the last two problem sets, which will now be due on May 8.

Spring 2020; Monday + Tuesday 2:00-3:30; Classroom 265.

Professor: Paul Gowder

Office: 408
Email: paul-gowder@uiowa.edu
Phone: 319-384-3202
Office Hours: Mon., Tue. 11am-1pm, and by appointment

Assistant: Diana DeWalle

Office: 469
Email: diana-dewalle@uiowa.edu
Phone: 319-335-9036

Welcome

Welcome to Introduction to Quantitative and Computational Legal Reasoning, informally named "Sociological Gobbledygook" in honor of Chief Justice Roberts's slightly math-phobic remark at oral argument in Gill v. Whitford. This course is offered at the University of Iowa College of Law in Spring 2020, by Paul Gowder, and has previously been offered in Spring 2019: you can go look at last year's syllabus if you're curious. For a little bit more about the motivation for the course, you can check out the manifesto.

This is a totally open-source course. There is a GitHub repository which contains all of our materials and discussions. Please feel free to make a suggestion or start a conversation in the issues, or even make a pull request. For unstructured navigation of this website, you can use the [list of tags](/tags.html) that describe the subject matter of our lessons.

Note: If you're accessing this syllabus in PDF form on ICON, some of the links may not work, and it may not be fully updated. The canonical syllabus will always appear in HTML form on https://sociologicalgobbledygook.com. Much of the language below (like "this website") assumes that you are accessing the syllabus from there.

Course Summary

This course will review basic principles of probability, statistics, and computational reasoning (including elementary programming) for law students. Throughout, the emphasis will be on mathematically modest intuition, practical skills, and legal applications. No mathematical background beyond high school algebra will be assumed.

This course is not advised for students with substantial statistical or computational backgrounds---it is designed as a beginner course. Nor will it prepare students to be competent empirical researchers or computer programmers---the goal is to give students the capacity to critically evaluate and understand statistical reasoning, and to use computational methods to do so (as well as in their legal practices more generally). Focus will be on breadth rather than depth, as well as legal applications.

Introduction to Quantitative and Computational Legal Reasoning is experimental. This syllabus is not a contract; I reserve the right to make radical changes in how the course operates throughout the semester, depending on how student learning progresses. (However, as this is version 2 of the course, you can expect the changes to be a bit less radical compared to last year.)

Course Resources

Course website: https://sociologicalgobbledygook.com/
Course GitHub repository

Note that lessons are downloadable directly from this website, including (where lessons are in that format) Jupyter notebooks that you can execute on the IDAS service (about which below), but also in PDF.

Readings

Texts

The main readings for this course will be on this website. There may be a little bit of copyrighted stuff that I can't distribute publicly on ICON.

In addition, there will be supplemental readings drawn from Charles Severance, Python for Everybody, which is available for free online, and Michael Finkelstein & Bruce Levin, Statistics for Lawyers, which should be free to download as a PDF through our library's subscription. (You may have to be on the campus network to download it; you might also have to search for it through the library's directory.) We will also be using some excerpts from the Federal Judicial Center's Reference Manual on Scientific Evidence.

We'll make use of some videos and exercises online. We'll talk more about this on the first day, but I will be assigning you mini-online-introductory courses to do in order to get practice with programming, from Dataquest's free tier (maybe) and from Hackerrank and Project Euler. You'll want to grab accounts at those places.

We will also use some free instructional materials from the nonprofit organizations Software Carpentry and Data Carpentry.

For the contents of this website, it's probably easiest to access them using the week-by-week links below or the content-based tags built into this website. There might be formatting glitches with some of the lessons, due to conversion between different file types and html, but every lesson will have a link to a downloadable and printable PDF at the bottom which will ordinarily have cleaner formatting (except for long lines of code, which may be cut off in PDF but should be fine on web). Some lessons should also have downloadable Jupyter notebooks associated with them, which will also be linked at the bottom.

Bonus Reading Suggestions

I am committed to only assigning resources which are free to students. However, the nature of this material is that sometimes one explanation will just "click" where another might not. So in addition to the assigned readings, I offer you this list of additional, non-free, readings which you might consult for a different perspective on the material---or for deeper engagement and exploration.

Statistics

I really like the Aspen textbook by Lawless, Robbennolt and Ulen, Empirical Methods in Law. It has very good clear explanations of a number of research methods topics, and is not overly math-y. If you want to dig deeper into stats and empirical research in law, I highly recommend it. I also recommend Lee Epstein & Andrew Martin, An Introduction to Empirical Legal Research.

If you want to do serious research on your own, you will need to move to more advanced texts, but the direction you go will depend on the particular kind of research you want to conduct. For experimental research, especially experimental research out in the world (like the kinds of things done by discrimination testers, about which we will talk), a classic text is Gerber & Green, Field Experiments: Design, Analysis and Interpretation; if you are more interested in observational research, I really like Angrist & Pischke, Mastering Metrics. Both of those books are rather more math-y than the others (or our class).

Charles Wheelan, Naked Statistics: Stripping the Dread from the Data, is a well-liked book on the other end of the spectrum---it focuses on intuitive and non-math-y explanations of statistics topics. It's quite good in that respect, but I'm not fully comfortable recommending it for other reasons. The author thinks he's funnier than he actually is, and the book features a number of fairly tasteless, and in some cases offensive, jokes. If you can put up with that, however, the book is good at explaining stats in a clear way.

Some other books that might be of interest to you, though I haven't reviewed them as closely and hence can't clearly endorse, include:

Peter Bruce & Andrew Bruce, Practical Statistics for Data Scientists
Michael A. Bailey, Real Stats: Using Econometrics for Political Science and Public Policy
Uri Bram, Thinking Statistically

Python Programming

My favorite introductory Python book (not free) is John Guttag, Introduction to Computation and Programming Using Python. This book is also the basis for a wonderful electronic course by almost the same name from MIT on EdX --- and you can go through the course for free, and without buying the book. I really do think that course (and the second course in the same series) is an amazing way to learn Python, and programming in general.

Blessedly, there are a lot of good introductory Python programming books out there which are also available online for free. One of my favorites is Al Sweigart, Automate the Boring Stuff with Python. For more advanced (and non-free) learning, I really love Luciano Ramalho's Fluent Python, although by the time you need that you should be looking at building fairly substantial programs.

It is better to use a Python book that is based on Python 3, not Python 2.

General Learning

I highly recommend Barbara Oakley's book A Mind for Numbers, which is basically a self-help book on the psychology of learning difficult things---which can help you not just in math-y classes but in law school and other classes in general. There's an online course based on her book on Coursera, called Learning How to Learn; I've never looked at that course but everyone who has done so has raved about it.

How Class Will Go

This course is structured as a vaguely "flipped" lab-style process. You will largely consume the talking-ish "content" kind of instruction outside of the classroom, primarily through readings. In classroom time, I will demonstrate the practical usage of the things you've learned about outside of class, maybe do a teeny tiny bit of lecturing, and (more commonly) assign exercises for you to carry out, with the opportunity to work together to figure them out and with me looming over your shoulder to help.

Please bring a computer to every class. Mac, Linux, or Windows computers will work best. Chromebooks and tablets will work less well, though we can get them to work if need be.

The coverage, pace, and workload in this class will be a continuing work in progress. Because this class isn't taught a lot in law schools, there is not much collective wisdom on how to do it successfully, and I expect to have to adapt the assignments and the pace to accommodate how readily the class takes to the material. So don't expect the syllabus to stay stable as we move through the course.

Class technological resources

This class will use the University's Interactive Data Analytics Service (IDAS). You do not need to request access to this; I've made arrangements to get accounts for the whole class, and I'll walk you through getting access to this resource on the first day of class. (You can also see the links on the bottom of this page.)

In addition, every session will be recorded on Panopto, and we'll use ICON to distribute materials which (for copyright reasons, etc.) we're not allowed to distribute outside of the class. Also, you will use the discussion feature of ICON to share information and ask questions out of class.

This class is intended in part to produce resources which will be available to the legal profession at large in order to help your fellow lawyers understand code and stats as well; accordingly, many of the reading assignments will be to lessons posted on this website at sociologicalgobbledygook.com. Those assignments will also be available on the course GitHub repository; you will find it useful to get the assignments there in order to execute and mess around with the code yourself. I'll explain how GitHub works on the first day of class too.

Evaluation

Evaluation will be primarily based on four problem sets. The first two will be computer programming-based (with the second possibly including a probability problem or two), and will be worth 17.5% of the grade each. The third will be probability and statistics based and will be worth 25% of the grade. The fourth will be comprehensive, with emphasis on the statistics side, and will be worth 30% of the grade.

The weird fractions are to accommodate 10% of the grade which will be based on classroom participation and preparation, and which is meant to enforce the lab-style classroom format. Students who get full credit for that 10% will complete the simple out-of-class tasks which I will periodically assign and participate in good faith in collective problem-solving in the classroom. (This is an effort-based 10%, not a performance-based 10%.)

Under the policy described in Student Handbook section B.3 ("The curve is not applicable in upper-level seminars and other upper-level classes in which a student's grade is based primarily on the student's performance on graded skills-oriented tasks (including writing) other than a final exam."), this course will not be curved.

In order to ensure that students in this course aren’t disadvantaged by unfamiliarity with the format or a collection of too-hard problem sets on which everyone struggles, there will be a floor for the distribution of grades for this course: at the lowest, the median grade for this course will be the official law school median of 3.3. In other words, you can’t do worse than the standard curve would otherwise dictate. But you can do better.

Last year's problem sets are available on this website, along with answers to them; looking them over will give you an idea of the approximate challenges that you'll be asked to complete.

All problem sets will be turned in via email to Diana DeWalle. The only place your name should appear in problem sets is in the filename.

Collaboration

Problem sets should be your own work. You are allowed to discuss the general approach to problem sets with one another, but you are not allowed to show one another your math or code.

For example: "I solved that problem by writing a loop over the list of cases" is acceptable. "Look at this code I wrote" is not.

Students will be asked to agree to an honor code.

Collaboration on in-class tasks and on homework assignments that are not one of the official problem sets is highly encouraged and probably necessary. However, you should try to do the homework assignments on your own first before consulting with your classmates, in order to maximize your learning. (Struggle and frustration are normal, expected, and healthy.)

Technology, Bugs, and Accommodations

This course will be technologically driven, obviously. Please let me know ASAP if there are any glitches of any kind.

Also, please contact me or the dean of students as soon as humanly possible if you need accommodations, so that these accommodations can be built into the tech. All course materials will be provided in formats that I believe are accessible (e.g., to screen readers); however, if I'm mistaken about their accessibility, please let me know and the problem will promptly be fixed.

Office Hours, Contacts, etc.

I will maintain office hours (Mon., Tue. 11am-1pm, and by appointment). I'm also happy to make appointments at other times, and you're always free to drop by when my door is open. I'm very good at replying to e-mail and very bad at checking telephone messages.

That being said, I very strongly encourage you to ask substantive questions in a way that will be accessible to your fellow students. This means using the copious time that will be made available in class time for that purpose, as well as making use of the discussion forum on ICON (in which I will very actively participate). If you have a question, it's almost certainly the case that several other people do too.

Some schedule notes

Spring Break is March 14-22. No class then, obviously. (Addendum: also no class the week after, for transition week.)

Problem sets are due on the following Fridays, each at 5pm Central: Feb 14 (week 4), Mar 6 (week 7), and May 8 (combined problem sets 3 and 4, in the middle of finals period).

Learning outcomes

By the end of the course, you should be able to:

Write simple Python scripts to automate common tasks and explore data
Visualize data, and understand basic data visualizations
Understand common errors made in empirical research and quantitative claims, and identify those errors in published and expert reports.
Conduct basic statistical analyses.
Reason about probability and statistics at a sufficient level to be an informed consumer of quantitative claims.
Identify the confusions in the following, from oral argument in Department of Commerce v. New York:

embedded image: Justice Gorsuch makes a statistical oops

Coverage by Week

The first few weeks will be spent on computation; subsequent weeks will be spent on data analysis and statistics. As we get further into the future, the below becomes more subject to change, obviously.

Week 1 (Jan 21)

Coverage: Basic ideas of programming, units of computation, functions and loops. Computational logic and legal logic, law as computation. We will front-load the quantity of reading a little heavily to get us started quickly, but it'll ease off as we move on to more conceptually difficult material. (Also, I know the reading seems like a lot, but it goes faster than cases in 1L year!) In particular, don't feel obliged to fully absorb everything from Python for Everybody on the first reading. Just read it quickly so you get a feel for the terrain, and then more carefully read the stuff posted on this site, then dig back into Python for Everybody to fill out the details.

Read:

the Introduction to Python on this site
the first Python lesson.
functions and scope
more loops and control flow
simple data types
complex data types
Software Carpentry lesson on functions
Software Carpentry lesson on making choices
the Software Carpentry lesson on loops
pages 1-55 of Python for Everybody.

Like I said, I know this is a lot of reading. Don't worry, there won't be nearly so much as we move forward.

On the first day of class, we will get everyone set up with the different services and installation options for the software we need in the course, and, if there's time, demonstrate some basic programming ideas and work through some exercises.

For Week 1 practice homework, do the "Introduction" problems (click the introduction checkbox on the right of the screen) in the HackerRank Python Domain.

I'd also like you to complete the factorial exercise at the end of the first Python lesson; you don't need to turn it in, but let me know if you can't complete it; we'll look at people's solutions next week and use this as our test to make sure everyone is set up and functional.

Here is the in-class notebook that we saw on day 1.

Week 2 (Jan 27, 28)

Coverage: using Python to get access to other people's code, libraries. Accessing the filesystem and the internet from Python. Error handling. Strings.

Readings:

Python for Everybody pg. 55-126
my brief notes from week 1 in 2019 (some of this may be a little obsolete, like references to some of the specific resources we used last year, such as Microsoft Azure).
Libraries
Errors
Files and Strings
Network Requests

In class on Tuesday for this week last year we worked through an example of accessing the Openstates API. We'll probably do that exercise again this year, and when it's done you can take a look at the example here.

For Week 2 practice homework, do the "Basic Data Types" problems (click the Basic Data Types checkbox on the right of the screen) in the HackerRank Python Domain, except the "List Comprehensions" and "Lists" problems, which you can skip (we'll try to do those in class).

Here are our in-class notebooks for week 2: Jan 27 (Monday) and Jan 28 (Tuesday). The due dates for the first two problem sets have also changed, and this change is reflected on the page you're looking at. I've also moved around the practice homework a bit.

Week 3 (Feb 3, 4)

Coverage: Regular expressions. Simulation and why you might want to do it. A very light introduction to object-oriented programming. (But we're a little behind and so we'll start this week with the networking and API stuff from last week.)
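
As a rough illustration of the kind of task regular expressions are good for, here is a minimal sketch that pulls reporter citations out of a sentence. It is not taken from the course lessons: the pattern is a deliberately simplified assumption that only knows a handful of federal reporters, and the sample sentence is made up for the example.

```python
import re

# Hypothetical, simplified pattern (an assumption for illustration only):
# a volume number, one of a few federal reporter abbreviations, a first page.
CITATION = re.compile(
    r"\b(\d{1,3})\s+(U\.S\.|S\. ?Ct\.|F\.(?:2d|3d)?)\s+(\d{1,4})\b"
)

text = (
    "See ATA Airlines, Inc. v. Federal Express Corp., 665 F.3d 882 "
    "(7th Cir. 2011); see also Ricci v. DeStefano, 557 U.S. 557 (2009)."
)

# Each match has three groups (volume, reporter, page); join them back up.
citations = [" ".join(m.groups()) for m in CITATION.finditer(text)]
print(citations)  # ['665 F.3d 882', '557 U.S. 557']
```

A real citation parser would need many more reporters and edge cases; the point here is only that a short pattern can describe a whole family of strings.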

Readings:

Automate the Boring Stuff With Python chapter 6 (strings)
Automate the Boring Stuff With Python chapter 7 (regular expressions)
Python for Everybody pp. 127-140 and 171-184 (chapters 11 and 14).
Regular Expressions
Object-Oriented Programming
Simulations

For Week 3 practice homework, go to the "Strings" problems in the HackerRank Python Domain and do:

"sWAP cASE"
"String Split and Join"
"Mutations"
"Find a string"
"String Validators"
"Text Wrap" and
"Capitalize!"

Here are our in-class notebooks for week 3: Feb 3 (Monday) and Feb 4 (Tuesday).

Week 4 (Feb 10, 11)

Basic probability math. Bayes rule and conditional probability.

Focused legal applications: probabilistic causation in torts, junk science in criminal trials.
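
To make the Bayes' rule idea concrete, here is a small worked sketch in the spirit of the forensic-evidence examples. All of the numbers are hypothetical assumptions chosen for illustration (a 1-in-1,000 prior, a test that always flags the true source, and a 1% false-match rate); the function name is likewise made up for this example.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """P(hypothesis | positive evidence), computed with Bayes' rule."""
    # Total probability of seeing the evidence at all.
    p_evidence = (true_positive_rate * prior
                  + false_positive_rate * (1 - prior))
    return true_positive_rate * prior / p_evidence

# Hypothetical numbers: the defendant is one of 1,000 equally likely
# sources, the test always matches the true source, and it falsely
# matches an innocent person 1% of the time.
p = posterior(prior=0.001, true_positive_rate=1.0, false_positive_rate=0.01)
print(round(p, 3))  # 0.091
```

Under these assumptions a "match" leaves only about a 9% chance that the defendant is the source, which is very different from the 99% a jury might naively infer from a "1% error rate."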

Readings:

the Arbital Guide to Bayes Rule (at the beginning, where it asks you to pick a level of depth, choose the full/deep presentation)
Probability
Chapter 3 (pp. 61-100) of the Finkelstein and Levin Book. Direct link to F&L assignment via UI Library Proxy Server
After reading the F&L assignment read Abel and Baker Redux
(video) Peter Donnelly's TED talk How Statistics Fools Juries
NBC News 'We are going backward': How the justice system ignores science in the pursuit of convictions
A note from last year on a week 4 recap/further explanation of the probability issue the class that year got stuck on

Problem set 1 due Friday, February 14, at 5pm Central time. Here are my answers to pset 1.

See last year's Problem set 1, which you can do for practice if you'd like. After doing that practice, you can check out my answers to it.

No week 4 practice homework because the problem set is due.

Here are our in-class notebooks for week 4: Feb 10 (Monday); and Feb 11 (Tuesday).

Week 5 (Feb 17, 18)

Initial explorations into data with data visualization in Python. Basic properties of data, measures of central tendency, exploratory data analysis.

Reading:

Introduction to stats
Key Python Data Libraries
Exploring data
Common Data Transformations
Federal Judicial Center Manual on Scientific Evidence pp. 236-240
Finkelstein & Levin chapter 1 (pp. 1-45)
Explore the different data visualizations available with From Data to Viz
Skim over the following two Data Carpentry lessons, paying special attention to the parts about Pandas (and ignoring the parts about SQL and visualization libraries, except insofar as they involve Pandas):
- Data Analysis and Visualization in Python for Ecologists
- Data Analysis and Visualization with Python for Social Scientists
(TBD: other Pandas tutorial material)
Week 5 recap from last year.

For Week 5 practice homework, do the "Errors and Exceptions" problems in the HackerRank Python Domain, then go to the "Regular Expressions and Parsing" section and do:

"Detect Floating Point Number"
"Re.split()"
"Validating phone numbers"
"Validating and Parsing Email Addresses"

Here are our in-class notebooks for week 5: Feb 17 (Monday); for Feb 18, instead look in the class exercises for the answers to the data scavenger hunt.

Week 6 (Feb 24, 25)

Probability distributions, central limit theorem, hypothesis testing.
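
One way to get a feel for the central limit theorem is to simulate it. The sketch below is an illustration under assumed settings (fair six-sided dice, 30 rolls per sample, 5,000 samples, a fixed seed), not code from the course lessons. It averages many batches of die rolls and checks that the averages cluster around 3.5 with a spread close to what the theory predicts.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so reruns give the same numbers

def sample_mean(n_rolls):
    """Average of n_rolls fair six-sided die rolls."""
    rolls = [random.randint(1, 6) for _ in range(n_rolls)]
    return sum(rolls) / n_rolls

# Assumed settings for illustration: 5,000 samples of 30 rolls each.
means = [sample_mean(30) for _ in range(5000)]

center = statistics.mean(means)
spread = statistics.stdev(means)

# Theory: a single roll has mean 3.5 and variance 35/12, so the mean of
# 30 rolls should have standard deviation sqrt(35/12) / sqrt(30).
predicted_spread = math.sqrt(35 / 12) / math.sqrt(30)
print(center, spread, predicted_spread)
```

Plotting a histogram of `means` would show the familiar bell shape, even though a single die roll is nothing like normally distributed.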

Reading:

Distributions
The Normal Distribution
Hypothesis Testing
a Catalogue of Basic Hypothesis Tests
Finkelstein & Levin, pp. 101-126
A Concrete Introduction to Probability using Python by Google's director of research,
these two excellent blog posts by a Google data scientist:
- Statistics for People in a Hurry
- Never Start With a Hypothesis

For Week 6 practice homework, do the "Debugging" problems in the HackerRank Python Domain

Here is our one in-class notebook for week 6: Feb 25 (Tuesday); Feb 24 was a lecture without a notebook.

Week 7 (Mar 2, 3)

Experiments, random assignment. Causation and correlation. Focused legal application: audit tests in discrimination cases.

Readings:

Causation and Counterfactuals
Devah Pager, The Use of Field Experiments for Studies of Employment Discrimination: Contributions, Critiques, and Directions for the Future, 609 Annals of the American Academy of Political and Social Science 104 (2007). We have access to this article via our library's subscription; this proxy link should work to download it.
The Open Introduction to Statistics pp. 19-26. This book can be downloaded for free, but I will probably post an excerpt of the relevant parts on ICON (remind me!).
A collection of edited discrimination tester cases

Problem set 2 due Friday, March 6, at 5pm Central time. Here are my answers to pset 2.

See last year's problem set 2, which, as before, you should do for practice; afterward you can look at my answers.

Here are our in-class notebooks for week 7: Mar 2 (Monday), plus a supplemental tutorial on installing third-party libraries; and Mar 3 (Tuesday).

Week 8 (Mar 9, 10)

Focus week: statistical extrapolation and simulation in the law. Shonubi case.

Readings:

United States v. Shonubi (Edited version kindly supplied by Josh Fischman of UVA.)
Commentary on Shonubi, posted on ICON

We'll spend this week catching up further, if necessary, discussing the Shonubi case, and replicating the data analysis used by experts in that case (approximately---we don't have quite the identical dataset). As time permits, I'll introduce the basic concepts of linear regression for next week.

For Week 8 practice homework, do the Multiples of 3 and 5 and Smallest multiple problems on Project Euler. (Note, you don't need to submit code on that site, just write the code to get the correct math answer.)

Week 9 (Mar 23, 24)

Regression analysis. Linear regression. Statistical evidence of discrimination.
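
For a sense of what fitting a regression line actually involves, here is a minimal sketch of simple (one-variable) least squares written out by hand. The data points are made up for illustration, and in practice you would likely use a statistics library rather than this arithmetic; the sketch only shows where the slope and intercept come from.

```python
# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Slope = (sum of products of deviations) / (sum of squared x deviations).
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
s_xx = sum((x - mean_x) ** 2 for x in xs)

slope = s_xy / s_xx
# The fitted line always passes through the point (mean_x, mean_y).
intercept = mean_y - slope * mean_x

print(round(slope, 2), round(intercept, 2))  # 0.6 2.2
```

Read the result as: for these made-up points, each one-unit increase in x goes with an average increase of 0.6 in y.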

Readings:

Regression Introduction
Federal Judicial Center reference manual on scientific evidence, appendix to reference guide on multiple regression, pp. 333-356
F&L pg. 369-371, bottom of 376-380, bottom of 385-393
Lindsey Kuper, Understanding the regression line with standard units
ATA Airlines, Inc. v. Federal Express Corp., 665 F.3d 882 (7th Cir., 2011), as edited
Post week 9 notes from last year

For Week 9 practice homework do Even Fibonacci numbers from Project Euler. Also, look at the hypothetical dataset mickel.csv, and use the techniques we've learned in class to come to some conclusion about whether discrimination is occurring in the provision of public benefits in this disability services agency context. We'll go over this assignment in class at an appropriate time.

Week 10 (April 6, 7, revised schedule)

Applications and reinforcement. This is a slow-down week: we will solidify our existing knowledge by thinking about a concrete use of statistics in law, namely determining disparate impact in Title VII cases.

Readings:

Stewart v. St. Louis
Ricci v. DeStefano excerpts
Excerpts from 29 C.F.R. 1607 for reference/skim, no need to carefully read the whole thing

No practice problems this week.

Week 11 (April 13, 14)

P-values, p-hacking, publication bias, the replication crisis in psychology, power and underpoweredness, multiple comparisons, and other terrible pitfalls of scientific research.
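
The multiple comparisons problem can be seen with a few lines of arithmetic. The sketch below assumes the tests are independent and each uses a 0.05 significance threshold (both are simplifying assumptions for illustration, and the function name is made up). It computes the chance of getting at least one false positive when no real effect exists.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n_tests independent
    tests, each run at significance level alpha, when no effect is real."""
    return 1 - (1 - alpha) ** n_tests

for k in (1, 5, 20):
    print(k, round(familywise_error_rate(0.05, k), 3))
# 1 0.05
# 5 0.226
# 20 0.642
```

So under these assumptions, a researcher who tries 20 different comparisons has roughly a 64% chance of finding something "significant" by luck alone.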

Readings:

P-Values and Bayes Rule
Power
Peter Norvig's blog post Warning Signs in Experimental Design and Interpretation
Statistics Gone Wrong. I'd like you to read that whole website eventually, but for the beginning of the week, you can focus on the sections on:
- Statistical Power and Underpowered Statistics
- The P Value and the Base Rate Fallacy
Fun reading: P-values explained with puppies

Discussion: what are the legal implications of scientific failures?

For Week 11 practice homework do Power Digit Sum from Project Euler. This will be the last of our practice homeworks, as I know that you're going to start freaking out about exams by now.

Week 12 (April 20, 21)

How regression analysis can go horribly wrong. Assumptions of regression. Failures of regression assumptions. Simpson's paradox.
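
Simpson's paradox is easiest to believe once you see it in numbers. The sketch below uses entirely hypothetical hiring counts for two groups, labeled X and Y, across two departments; the figures are invented to produce the effect. Group Y has the higher hiring rate within each department, yet the lower rate once the departments are pooled.

```python
# Hypothetical counts, invented for illustration: (hired, applicants).
data = {
    "Dept A": {"X": (80, 100), "Y": (17, 20)},
    "Dept B": {"X": (4, 20), "Y": (25, 100)},
}

def rate(hired, applicants):
    return hired / applicants

# Within each department, group Y is hired at a higher rate than group X.
for dept, groups in data.items():
    print(dept, {g: rate(*counts) for g, counts in groups.items()})

# Pooling the departments reverses the comparison.
overall = {}
for g in ("X", "Y"):
    hired = sum(data[d][g][0] for d in data)
    applicants = sum(data[d][g][1] for d in data)
    overall[g] = rate(hired, applicants)
print("Overall", overall)
```

The reversal happens because group Y mostly applied to the department that hires few people, which is exactly the kind of lurking variable a regression needs to account for.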

Readings:

Regressions gone wrong
Federal Judicial Center reference manual, reference guide on multiple regression, pp. 303-332
The following two blog posts/tutorials by "Statistics by Jim":
- Confounding Variables
- Classical Assumptions of OLS

Week 14 (April 27, 28)

Look at real-life expert witness reports, and at how arguments about quantitative methodology are used in court.

From statistics to machine learning: what is it that fancy data science people actually do with their time? Prediction vs inference. Algorithmic accountability. Discrimination by computer, and legal implications of statistical discrimination (intentional racial profiling and unintentional racial profiling).

Reading:

Example expert witness reports: excerpts from one plaintiff and one defendant report in ongoing litigation around Harvard admissions (a claim that they discriminate against Asian-Americans). On ICON. Note: these documents are a bit choppy, as I tried to edit the PDFs to limit it to a handful of points of contention by deleting pages, so ignore any discontinuities. Please come to class prepared to comment on one question in particular: should any data analysis include people admitted as legacies or athletes?
Prediction vs. Inference
Galit Shmueli, "To Explain or to Predict?", Statistical Science 2010. (skim)
Harini Suresh and John V. Guttag, "A Framework for Understanding Unintended Consequences of Machine Learning"

Optional, bonus (but HIGHLY recommended) reading:

Adam M. Lauretig and Bear F. Braumoeller, "Statistics and International Security," in The Oxford Handbook of International Security (2018)
Andrew D. Martin, Kevin M. Quinn, Theodore W. Ruger, and Pauline T. Kim. "Competing Approaches to Predicting Supreme Court Decision Making." Perspectives on Politics 2004

Final exam period

Consolidated Problem sets 3 and 4 due on Friday, May 8, during the exam period, at 5pm Central.

See last year's problem set 3, last year's problem set 4 and a makeup assignment the class did.