Problem Set 2, Spring 2020

This problem set is worth 17.5% of the grade in this course. It is due on Friday, March 6, at 5pm, via email to Diana Dewalle.

How to turn in this problem set: As before, please turn in all your code, and, if you have any prose to write, your prose (formatted as markdown cells), for all of the problems in a single Jupyter Notebook file, entitled lastname_ps1.ipynb (obviously with your last name), which you will turn in to the dropbox on ICON. The filename should be the only place with your name in it. Diana Dewalle will anonymize the files by replacing the name with a number before sending them to me to grade.

For each of the problems, you need not package all of your code in a single function. For example, you may write helper functions, import libraries outside of your function, etc. However, you must turn in a single Jupyter Notebook file with all the code that makes your functions work as well as the functions themselves. I expect to be able to run all the cells in the notebook you turn in, and, at the end, have a working function for each problem, so that I can call those functions with appropriate input and get the correct result. In all cases, your functions should return None if they receive inappropriate input.

HONOR CODE

Each student must agree to the following honor code; if you do not agree to this honor code, do not turn in this problem set and drop the course.

I will not solicit, receive, or provide any unauthorized assistance on this or any problem set in this course. I understand that unauthorized assistance includes seeing the work of any other student, including code, calculations, data analysis, or answers to problems. Students are permitted to discuss their general strategies for solving problems with one another, help one another understand lesson material or documentation for Python libraries, and the like, or other conversation short of sharing actual answers, calculations, data analysis, or code. Students are not permitted to seek or receive assistance or advice whatsoever from any person not enrolled in this course---this prohibition includes asking for help on public online forums such as stackoverflow.com, although students are free to search such websites and learn from preexisting discussions in which they are not participants.

If I have any doubt about the permissibility of any kind of assistance, I will seek a ruling from the professor. I will report any violations of this honor code of which I am aware. Violating this honor code is grounds for immediate disenrollment from this course, or other sanctions as provided for by College of Law and University policy on academic dishonesty.

Problem 1: Fun with APIs, continued (30 points)

Remember problem 3 from the previous pset? I'd like you to go back and use the Caselaw Access Project API again, only, this time, I'd like you to plot line charts of the following two time series, on the same chart:

  • The total number of the uses of the words "pork," "pig," "pigs," "hog," or "hogs" in the Iowa state courts within the CAP dataset, and

  • The total number of the uses of the words "corn," "sweetcorn," "maize," or "ethanol" in the same courts.

In other words, I'd like you to produce a chart with two lines: one is the number of uses of the pork words, the other is the number of uses of the corn words. On the y axis is number of uses of words; on the x axis is year.

Hint: Check out the last example chart for the 'plottyprint' library, which I have released. You may feel free to use this library, or to borrow some of the code from it.

Problem 2: The Inexpert Witness (70 points)

Harriet Hawkeye, the owner of a sports fan apparel store in Iowa City, is being prosecuted by the State of Iowa for sales tax fraud. The key piece of evidence is that the sales figures that Harriet reported to the state did not reliably increase on home football game days. There were 5 home games that year (note to students: I don't actually know or care how many football games there are, don't fight the hypo), and Harriet's sales didn't increase beyond her average range on any of them.

The prosecution calls Dr. Carl Cyclone, a professor of Economics at Iowa State, to testify, and Carl testifies as follows:

  • Based on his research, the average sales of sports fan apparel in Iowa increased substantially in almost every store in the state on home game days. Of 1000 store-homegame pairs in the state in the last year other than at Harriet's store (i.e., 1000 is the product of the number of stores and the number of home games per store, so 3 stores that each had 3 home games in their town would be 9 store-homegame pairs), 800 of them had a substantial increase in sales over the store in question's daily average.

  • Therefore, there's a 80% change that sales on any given day will increase substantially if there's a home game.

  • The probability of Harriet not experiencing an increase in sales for all five days is thus \(0.2^5 = 0.00032\), which is so vanishingly small that, in his expert opinion, Harriet must have actually had more sales than she reported on at least one home game day.

You represent Harriet.

Subquestion 1 (20 points): Describe in words what problematic mistake or assumption Carl makes.

Subquestion 2 (20 points): What kind of evidence could you look for to undermine Carl's argument?

Subquestion 3 (20 points): Write a simulation to illustrate the effects of correcting the problem that you identified in subquestion 1. I know that this subquestion is a little ambigious and open-ended, so here's all I can really say as a hint without giving away the answer: there is at least one, and possible several, parameter(s) that you could plausibly vary, which would change the ultimate probability calculation that Carl made. Your simulation should vary that parameter/those parameters and see what outcomes come out.

Subquestion 4 (10 points): Use an appropriate data visualization to illustrate the simulation you wrote in the previous subquestion.

links