Syllabus
Introduction to Quantitative & Computational Legal Reasoning (LAW:8645)
Spring 2019; Monday + Tuesday 12:402:10; Classroom 125.
Professor: Paul Gowder (Office: 408. Email: [email protected] Phone: 3193843202.).
Assistant: Diana DeWalle (Office: 469. Email: [email protected] Phone: 3193359036.).
Course Summary
This course will review basic principles of probability, statistics, and computational reasoning (including elementary programming) for law students. Throughout, the emphasis will be on mathematically modest intuition, practical skills, and legal applications. No mathematical background beyond high school algebra will be assumed.
This course is not advised for students with substantial statistical or computational backgroundsit is designed as a beginner course. Nor will it prepare students to be competent empirical researchers or computer programmersthe goal is to give students the capacity to critically evaluate and understand statistical reasoning, and to use computational methods to do so (as well as in their legal practices more generally). Focus will be on breadth rather than depth, as well as legal applications.
Introduction to Quantitative and Computational Legal Reasoning is experimental. This syllabus is not a contract; I reserve the right to make radical changes in how the course operates throughout the semester, depending on how student learning progresses.
Course Materials

Course website: https://sociologicalgobbledygook.com/
Note that lessons are downloadable directly from this website, including (where lessons are in that format) Jupyter notebooks that you can execute on your own machine and/or on Azure, Google CoLab, etc., but also in PDF.
Readings
Texts
The main readings for this course will be drawn from Charles Severance, Python for Everybody, which is available for free online, and Michael Finkelstein & Bruce Levin, Statistics for Lawyers which should be free to download as PDF through our library's subscription. (You may have to be on the campus network to download it; you might also have to search for it through the library's directory.) We will also be using some excerpts from the Federal Judicial Center's Reference Manual on Scientific Evidence
In addition, we'll make use of some videos and exercises from DataCamp, which has free accounts available for educational purposes; I will explain how to get access on the first day.
We will also use some free instructional materials from the nonprofit organizations Software Carpentry and (maybe) Data Carpentry.
There will also be copious readings written by me available online on the publicfacing website, and some copyrighted stuff that I can't distribute publicly on ICON.
For the contents of this website, it's probably easiest to access them using the weekbyweek links below or the contentbased tags build into this website. There might be formatting glitches with some of the lessons, due to conversion between different file types and html. However, I'm slowly improving that as we go through the course, and, in addition, every lesson will have a link to a downloadable and printable PDF at the bottom which will ordinarily have cleaner formatting (except for long lines of code, which may be cut off in PDF but should be fine on web). Some lessons should also have downloadable Jupyter notebooks associated with them, which will also be linked at the bottom.
I am committed to only assigning resources which are free to students. However, the nature of this material is that sometimes one explanation will just "click" where another might not. So in addition to the assigned readings, I offer you this list of additional, nonfree, readings which you might consult for a different perspective on the materialor for deeper engagement and exploration.
Bonus Reading Suggestions: Statistics
I really like the Aspen textbook by Lawless, Robbennolt and Ulen, Empirical Methods in Law. It has very good clear explanations of a number of research methods topics, and is not overly mathy. If you want to dig deeper into stats and empirical research in law, I highly recommend it. I also recommend Lee Epstein & Andrew Martin, An Introduction to Empirical Legal Research.
If you want to do serious research on your own, you will need to move to more advanced texts, but the direction you go will depend on the particular kind of research you want to conduct. For expermental research, especially experimental research out in the world (like the kinds of things done by discrimination testers, about which we will talk), a classic text is Gerber & Green Field Experiments: Design, Analysis and Interpretation; if you are more interested in observational research, I really like Angrist & Pischke, Mastering Metrics. Both of those books are rathermore mathy than the others (or our class).
Charles Wheelan, Naked Statistics: Stripping the Dread from the Data, is a wellliked book on the other end of the spectrumit focuses on intuitive and nonmathy explanations of statistics topics. It's quite good in that respect, but I'm not fully comfortable recommending it for other reasons. The author thinks he's funnier than he actually is, and the book features a number of fairly tasteless, and in some cases offensive, jokes. If you can put up with that, however, the book is good at explaining stats in a clear way.
Some other books that might be of interest to you, though I haven't reviewed them as closely and hence can't clearly endorse, include:

Peter Bruce & Andrew Bruce, Practical Statistics for Data Scientists

Michael A. Bailey, Real Stats: Using Econometrics for Political Science and Public Policy

Uri Bram, Thinking Statistically
Bonus Reading Suggestions: Python Programming
My favorite introductory Python book (not free) is John Guttag, Introduction to Computation and Programming Using Python. This book is also the basis for a wonderful electronic course by almost the same name from MIT on EdX  and you can go through the course for free, and without buying the book. I really do think that course (and the second course in the same series) is an amazing way to learn Python, and programming in general.
Blessedly, there are a lot of good introductory Python programming books out there which are also available online for free. One of my favorites is Al Sweigart, Automate the Boring Stuff with Python. For more advanced (and nonfree) learning, I really love Luciano Ramalho's Fluent Python, although by the time you need that you should be looking at building fairly substantial programs.
It is better to use a Python book that is based on Python 3, not Python 2.
Bonus Reading Suggestions: General Learning
I highly recommend Barbara Oakley's book A Mind for Numbers, which is basically a selfhelp book on the psychology of learning difficult thingswhich can help you not just in mathy classes but in law school and other classes in general. There's an online course based on her book on Coursera, called Learning How to Learn; I've never looked at that course but everyone who has done so has raved about it.
How Class Will Go
This course is structured as a "flipped class": there won't be many (or possibly any) lectures (or Socratic interrogation). Rather, you will consume the talking kind of instruction outside of the classroom, primarily through readings and through some thirdparty videos. Classroom time will mostly be a lab format: I will demonstrate the practical usage of the things you've learned about outside of class, and (more commonly) assign exercises for you to carry out, with the opportunity to work together to figure them out and with me looming over your shoulder to help.
Please bring a computer to every class. Mac, Linux, or Windows computers will work best. Chromebooks and tablets will work less well, though we can get them to work if need be.
The coverage, pace, and workload in this class will be a continuing work in progress. Because this class isn't taught a lot in law schools, there is not much collective wisdom on how to do it successfully, and I expect to have to adapt the assignments and the pace to accommodate how readily the class takes to the material. So don't expect the assignments at the bottom of this document to be stable. They will change. Possibly lots. I haven't even written the assignments beyond a few weeks out, because I expect that they'd need to be radically changed. In other words, this course is experimentalI'll do my best to make it worth your while, but I disclaim all warranties about how well it will work this first time around.
Class Platforms
The nature of this class is such that we will not be able to limit ourselves just to resources provided by the University of Iowa. We will avail ourselves of a variety of other platforms through the semester, including:

DataCamp (NO LONGER: SEE NOTE BELOW)
NOTE ON DATACAMP
In April 2019, it came to light that a DataCamp executive sexually assaulted an employee in 2017; numerous instructors for DataCamp as well as community organizations at large have noted that the organization's response has been wholly inadequate. See this post and this one for more details. In light of these events, and in solidarity with the women who have noted the ways in which DataCamp's response to the incident replicate toxic patterns of corporate responses to women's complaints of workplace discrimination and misconduct, a number of members of the data science community have declined to support the use of DataCamp products. This seems to me to be the right choice. Consequently, I no longer endorse or recommend DataCamp material; future iterations of this course will replace all DataCamp assignments with alternative sources for the same information.
I'll walk you through getting access to these resources on the first day of class.
In addition, every session will be recorded on Panopto and available on ICON; we'll also use ICON to administer some quizzes (probably), turn in assignments, and distribute materials which (for copyright reasons, etc.) we're not allowed to distribute outside of the class. Also, you will use the discussion feature of ICON to share information and ask questions out of class.
This class is intended in part to produce resources which will be available to the legal profession at large in order to help your fellow lawyers understand code and stats as well; accordingly, many of the reading assignments will be to lessons posted on this website at sociologicalgobbledygook.com. Those assignments will also be available on the course GitHub repository; you will find it useful to get the assignments there in order to execute and mess around with the code yourself.
Evaluation
Evaluation will be primarily based on four problem sets. The first two will be computer programmingbased (with the second possibly including a probability problem or two), and will be worth 17.5% of the grade each. The third will be probability and statistics based and will be worth 25% of the grade. The fourth will be comprehensive, with emphasis on the statistics side, and will be worth 30% of the grade.
The weird fractions are to accommodate 10% of the grade which will be based on classroom participation and preparation, and which is meant to enforce the flipped classroom format. Students who get full credit for that 10% will complete the simple outofclass tasks which I will periodically assign, pass the easy inclass popquizzes which I will periodically announce, and participate in good faith in collective problemsolving in the classroom. (This is an effortbased 10%, not a performancebased 10%.)
Under the policy described in Student Handbook section B.3 ("The curve is not applicable in upperlevel seminars and other upper level classes in which a student's grade is based primarily on the student’s performance on graded skillsoriented tasks (including writing) other than a final exam."), this course will not be curved.
In order to ensure that students in this course aren’t disadvantaged by unfamiliarity with the format or a collection of toohard problem sets on which everyone struggles, there will be a floor for the distribution of grades for this course: at the lowest, the median grade for this course will be the official law school median of 3.3. In other words, you can’t do worse than the standard curve would otherwise dictate. But you can do better.
Collaboration
Problem sets should be your own work. You are allowed to discuss the general approach to problem sets with one another, but you are not allowed to show one another your math or code.
For example: "I solved that problem by writing a loop over the list of cases" is acceptable. "Look at this code I wrote" is not.
Students will be asked to agree to an honor code.
Collaboration on inclass tasks and on homework assignments that are not one of the official problem sets is highly encouraged and probably necessary.
Technology, Bugs, and Accommodations
This course will be technologically driven, obviously. Please let me know ASAP if there are any glitches of any kind.
Also, please contact me or the dean of students as soon as humanly possible if you need accommodations, so that these accommodations can be built into the tech. All course materials will be provided in formats that I believe are accessible (e.g., to screen readers), however, if I'm mistaken about their accessibility, please let me know and the problem will promptly be fixed.
Office Hours, Contacts, etc.
I will maintain office hours Mon., Tue. 1030am to noon, and 2:303:30pm. I'm also happy to make appointments at other times, and you're always free to drop by when my door is open. I'm very good at replying to email and very bad at checking telephone messages.
That being said, I very strongly encourage you to ask substantive questions in a way that will be accessible to your fellow students. This means using the copious time that will be made available in class time for that purpose, as well as making use of the discussion forum on ICON (in which I will very actively participate). If you have a question, it's almost certainly the case that several other people do too.
Some schedule notes
Monday, Jan 21 is Martin Luther King Day. No classes then, obviously. Spring Break is March 1624.
Learning outcomes
By the end of the course, you should be able to:

Write simple Python scripts to automate common tasks and explore data

Visualize data, and understand basic data visualizations

Understand common errors made in empirical research and quantitative claims, and identify those errors in published and expert reports.

Conduct basic statistical analyses.

Reason about probability and statistics at a sufficient level to be an informed consumer of quantitative claims.
Coverage by Week
The first few weeks will be spent on computation; subsequent weeks will be spent on data analysis and statistics. As we get further into the future, the below becomes more subject to change, obviously. (From week 3 onward, reading is still being added.)
I fully expect that we won't get to the material in the last few weeks, unless we somehow move at rocket speed. They're more aspirational topics. Everything past week 3 will probably be shuffled around several times as I see how fast the class progresses.
Week 1
Coverage: Basic ideas of programming, units of computation, functions and loops. Computational logic and legal logic, law as computation. We will frontload the quantity of reading a little heavily to get us started quickly, but it'll ease off as we move on to more conceptually difficult material. (Also, I know the reading seems like a lot, but it goes faster than cases in 1L year!) In particular, don't feel obliged to fully absorb everything from Python for Everybody on the first reading. Just read it quickly so you get a feel for the terrain, and then more carefully read the stuff posted on this site, then dig back into Python for Everybody to fill out the details.
For Monday, read:

the first Python lesson.

pages 155 of Python for Everybody.
In the first day of class, we will get everyone set up with the different services and installation options for the software we need in the course, and, if there's time, demonstrate some basic programming ideas and work through some exercises.
On Monday evening, clone the course assignments Github repository and complete the factorial exercise at the end of the first Python lesson notebook (basic_intro.ipynb); you don't need to turn it in, but let me know if you can't complete it; we'll look at people's solutions on Tuesday and use this as our test to make sure everyone is set up and functional.
For Tuesday, read:

Software Carpentry lesson on functions

Software Carpentry lesson on making choices
Week 2
Coverage: using Python to get access to other people's code, libraries. Accessing the filesystem and the internet from Python. Error handling. Strings.
Monday is a holiday (MLK day).
Readings:

(video and exercises) chapters 13 of DataCamp's Introduction to Python course. You don't need to do chapter 4 (about the Numpy library) right now, though you can if you want to. (UPDATE: no longer recommended, for reasons noted above; will be replaced before next semester.)

Python for Everybody pg. 55126
In class on Tuesday we worked through an example of accessing the Openstates API, that example is here.
Week 3
Coverage: Regular expressions. Simulation and why you might want to do it. A very light introduction to objectoriented programming.
Readings:

Automate the Boring Stuff With Python chapter 6 (strings)

Automate the Boring Stuff With Python chapter 7 (regular expressions)

Python for Everybody pp. 127140 and 171184 (chapters 11 and 14).
Problem set 1 due Friday, February 1, at 5pm Central time.
Week 4
Basic probability math. Bayes rule and conditional probability.
Focused legal applications: probabilistic causation in torts, junk science in criminal trials.
Readings:

the Arbital Guide to Bayes Rule (at the beginning, where it asks you to pick a level of depth, choose the full/deep presentation)

Chapter 3 (pp. 61100) of the Finkelstein and Levin Book. Direct link to F&L assignment via UI Library Proxy Server

After reading the F&L assignment read Abel and Baker Redux

(video) Peter Donnelly's TED talk How Statistics Fools Juries

NBC News 'We are going backward': How the justice system ignores science in the pursuit of convictions
Week 4 recap/further explanation of the probability issue we got stuck on
Week 5
Initial explorations into data with data visualization in Python. Basic properties of data, measures of central tendency, exploratory data analysis.
Reading:

Federal Judicial Center Manual on Scientific Evidence pp. 236240

Finkelstein & Levin chapter 1 (pp. 145)

Explore the different data visualizations available with From Data to Viz

(video and exercises) DataCamp Intermediate Python for Data Science chapter 2 (dictionaries and pandas) (UPDATE: no longer recommended, for reasons noted above; will be replaced before next semester.)
Week 6
Probability distributions, central limit theorem, hypothesis testing.
Reading:

Finkelstein & Levin, pp. 101126.

A Concrete Introduction to Probability using Python by Google's director of research,

these two excellent blog posts by a Google data scientist: Statistics for People in a Hurry and Never Start With a Hypothesis.
Problem set 2 due Friday, February 22, at 5pm Central time.
Week 7
Experiments, random assignment. Causation and correlation. Focused legal application: audit tests in discrimination cases.
Readings:

Devah Pager, The Use of Field Experiments for Studies of Employment Discrimination: Contributions, Critiques, and Directions for the Future, 609 Annals of the American Academy of Political and Social Science 104 (2007). We have access to this article via our library's subscription, this proxy link should work to download it.

The Open Introduction to Statistics pp. 1926. This book can be downloaded for free, but I will probably post an excerpt of the relevant parts on ICON (remind me!).
Note to students: I'd like you to fill out a quick threequestion midsemester survey sometime this week. It's anonymous, and I could really use your feedback to make sure we're on course in this experimental class.
Week 8
Focus week: statistical extrapolation and simulation in the law. Shonubi case.
Readings:

United States v. Shonubi (Edited version kindly supplied by Josh Fischman of UVA.).

Commentary on Shonubi, posted on ICON.
We'll spend this week catching up further, if necessary, discussing the Shonubi case, and replicating the data analysis used by experts in that case (approximatelywe don't have quite the identical dataset). As time permits, I'll introduce the basic concepts of linear regression for next week.
Week 9
Regression analysis. Linear regression. Statistical evidence of discrimination.
Readings:

Federal Judicial Center reference manual on scientific evidence, appendix to reference guide on multiple regression, pp. 333356.

F&L pg. 369371, bottom of 376380, bottom of 385393.

Lindsey Kuper, Understanding the regression line with standard units

ATA Airlines, Inc. v. Federal Express Corp., 665 F.3d 882 (7th Cir., 2011**, as edited.
Week 10
Applications and reinforcement. Slowdown week, solidify our existing knowledge by thinking about a concrete use of statistics in law: determining disparate impact in Title VII cases.
Readings:

Excerpts from 29 C.F.R. 1607 for reference/skim, no need to carefully read the whole thing.
Homework to be done before this week begins: look at the hypothetical dataset mickel.csv, and use the techniques we've learned in class to come to some conclusion about whether discrimination is occurring in the provision of public benefits in this disability services agency context. We'll go over this assignment in class at the end of week 9.
Problem set 3 due Friday, March 29, at 5pm (Now extended to the Monday after.)
Week 11
Pvalues, phacking, publication bias, the replication crisis in psychology, power and underpoweredness, multiple comparisons, and other terrible pitfalls of scientific research.
Readings:

Peter Norvig's blog post Warning Signs in Experimental Design and Interpretation.

Statistics Gone Wrong. (I'd like you to read that whole website eventually, but for the beginning of the week, you can focus on the sections on Statistical Power and Underpowered Statistics and The P Value and the Base Rate Fallacy.

Fun reading: Pvalues explained with puppies.
Discussion: what are the legal implications of scientific failures?
Week 12
How regression analysis can go horribly wrong. Assumptions of regression. Failures of regression assumptions. Simpson's paradox.
Readings:

Federal Judicial Center reference manual, reference guide on multiple regression, pp. 303332.

Blog posts/tutorials by "Statistics by Jim": Confounding Variables, [Classical Assumptions of OLS](https://statisticsbyjim.com/regression/olslinearregressionassumptions/**
Week 13
Focus week: statistics, probability and legal proof. Bayesian approaches.
This week will be seminarstyle. There will be some reading from academic articles, and Professor Sullivan will come and join us for Monday to talk through probability and proof. On Tuesday, we will continue the discussion, and perhaps do an exercise, as time permits.
Reading:
For Monday*

Jonah Gelbach, Estimation Evidence (posted on ICON), pages 131.

Sean Sullivan, A Likelihood Story: The Theory of Legal FactFinding p. 1137.
For Tuesday*

Kiel BrennanMarquez, Plausible Cause

The Bayesian New Statistics: Hypothesis testing, estimation, metaanalysis, and power analysis from a Bayesian perspective, 25 Psychonomic Bulletin and Review 178 (2018) (You may skip the sections entitled "Bayesian Hypothesis Test" and "Another example of frequentist and Bayesian approaches to hypothesis testing and estimation" from pp. 186190.

Optional bonus reading: I've added a more basiclevel explanation of some of the ideas in the above article to this website.
Week 14
Look at reallife expert witness reports, how arguments about quantitative methodology are used in court.
From statistics to machine learning: what is it that fancy data science people actually do with their time? Prediction vs inference. Algorithmic accountability. Discrimination by computer, and legal implications of statistical discrimination (intentional racial profiling and unintentional racial profiling).
Reading:

Example expert witness reports: excerpts from one plaintiff and one defendant report in ongoing litigation around Harvard admissions (a claim that they discriminate against AsianAmericans). On ICON. Note: these documents are a bit choppy, as I tried to edit the PDFs to limit it to a handful of points of contention by deleting pages, so ignore any discontinuities. Please come to class prepared to comment on one question in particular: should any data analysis include people admitted as legacies or athletes?

Galit Shmueli, "To Explain or to Predict?", Statistical Science 2010. (skim)

Harini Suresh and John V. Guttag, "A Framework for Understanding Unintended Consequences of Machine Learning"
Optional, bonus (but HIGHLY recommended) reading:

Adam M. Lauretig and Bear F. Braumoeller, "Statistics and International Security," in The Oxford Handbook of International Security (2018).

Andrew D. Martin, Kevin M. Quinn, Theodore W. Ruger, and Pauline T. Kim. "Competing Approaches to Predicting Supreme Court Decision Making." Perspectives on Politics 2004.
Final exam period
Problem set 4 and Makeup Assignment due (no standard final exam) on May 9, the last day of exams. Please turn these two assignments in together, in one file.