Problem Set 1: Answers and Explanations

Problem 1: Your First Class

Write a class, called Citation, which takes the following required parameters: first_page (an integer), last_page (an integer), reporter (a string), and year (an integer), as well as the optional parameter name (a string). That class should have the method cite() which will print out a citation to the case, using all of the information it has.

For example, I should be able to create a citation in either of the following forms:

my_case1 = Citation(20, 15, "U.S.", 2050)

my_case2 = Citation(20, 15, "U.S.", 2050, name="Gowder v. Fictional Opponent")

and then if I call print(my_case2.cite()) I should get back Gowder v. Fictional Opponent, 20 U.S. 15 (2050).


This problem serves two roles: first, it is meant simply to reinforce the lesson that we had on object-oriented programming, and to make you write your own class; second, it will give you a class to use on the next problem. Here's my answer:

In [1]:
class Citation(object):
    def __init__(self, first_page, last_page, reporter, year, name = None):
        self.first_page = first_page
        self.last_page = last_page
        self.reporter = reporter
        self.year = year
        self.name = name
    def cite(self):
        # None is falsey, so this branch only runs when a name was supplied
        if self.name:
            return f'{self.name}, {self.first_page} {self.reporter} {self.last_page} ({self.year})'
        return f'{self.first_page} {self.reporter} {self.last_page} ({self.year})'
In [2]:
my_case1 = Citation(20, 15, "U.S.", 2050)

my_case2 = Citation(20, 15, "U.S.", 2050, name="Gowder v. Fictional Opponent")
In [3]:
print(my_case1.cite())
20 U.S. 15 (2050)
In [4]:
print(my_case2.cite())
Gowder v. Fictional Opponent, 20 U.S. 15 (2050)

A couple of quick notes on this code.

  1. There was a small mistake in the problem as written, as I said that your cite method should print a citation to the case, but then the examples I gave showed me calling the print function on the return value of that method. That was my fault; I will accept versions of the code that return a properly formatted string (as above), or versions that simply print it directly from the method.

  2. I used a feature of Python that you haven't seen before called "f-strings" to build up the formatted string. You are, of course, totally free to use string concatenation (adding up the bits of the string with plus symbols, like str(self.first_page) + ' ' + self.reporter + ' ' + str(self.last_page), etc.) or anything else. F-strings are just a little more attractive. This web page has a nice explanation of f-strings.

  3. The best way to make the name parameter optional is to give it a default value of None, as I did here. Then you can check to see if it's there (using the fact that None is falsey, that is, it evaluates to False in a conditional) and produce different strings depending on whether it is or not.

  4. My cite() method also uses a technique called "early return." Instead of an if-else structure, you use a bare if and return from inside it when the condition is true; because a return statement immediately exits the function and goes back to the rest of the code, the remainder of the function never runs in that case. If the condition is false, execution continues past the if and returns the alternate result. Again, this is totally optional; you're free to use an ordinary if-else.
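
For comparison, here is a minimal sketch of the same formatting logic written with string concatenation and an ordinary if-else, rather than f-strings and early return. The standalone function cite_with_concatenation() is a hypothetical name used only for this illustration; it is not part of the answer above.

```python
def cite_with_concatenation(first_page, last_page, reporter, year, name=None):
    # Numbers must be converted with str() before they can be concatenated.
    base = str(first_page) + ' ' + reporter + ' ' + str(last_page) + ' (' + str(year) + ')'
    # None is falsey, so the first branch runs only when a name was supplied.
    if name:
        result = name + ', ' + base
    else:
        result = base
    return result


print(cite_with_concatenation(20, 15, "U.S.", 2050))
print(cite_with_concatenation(20, 15, "U.S.", 2050, name="Gowder v. Fictional Opponent"))
```

Both styles should produce the same strings; which one you pick is a matter of taste.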

Problem 2: Extracting Citations

Write a function called scotus_finder() that can take a string with some unknown number of citations to the U.S. Supreme Court in it, and return a list of Citation objects as in the previous problem. Your function should be able to handle cases cited to either the U.S. reporter ("U.S.") or to the Supreme Court reporter ("S.Ct.") (you don't need to handle "L.Ed." or any of the weird old reporters). You do not need to be able to identify the name of the case. You may assume that citations are in the standard form, e.g., 123 S.Ct. 456 (2001).

For example,

my_string = "The best case in the world is Prince v. The End of the Century, 22 U.S. 50124 (1999), and I like it"

found_cases = scotus_finder(my_string)

print(found_cases[0].cite())

should print 22 U.S. 50124 (1999)


The goal of this problem is to get you comfortable with constructing regular expressions, as well as sorting through the mess of Python library documentation to figure out how to extract multiple matching citations etc.

The best way to approach this problem is to use a site like regex101 to incrementally build up your solution---write a sample paragraph with several citations of different formats, as well as some other random junky things like stray numbers that you don't want to accidentally identify. Then, first, figure out how to identify U.S. reporter cites, then figure out how to identify S.Ct. reporter cites, and so forth. Then figure out how to extract the individual bits of the citation.

Also, you don't need to use just one regular expression! You can have a separate regex for U.S. reporter citations and for S.Ct. citations. That might run a nanosecond or two slower, but who cares?
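
If you did want a single pattern, one possible sketch uses alternation to accept either reporter. The names combined_cite and test_text are made up for this illustration, and the test text includes a few stray numbers and a different reporter to check that they are not picked up.

```python
import re

# Hypothetical single pattern: (U\.S\.|S\.Ct\.) accepts either reporter.
combined_cite = r'(\d+) (U\.S\.|S\.Ct\.) (\d+) \((\d{4})\)'

test_text = ("See 22 U.S. 50124 (1999) and 20 S.Ct. 1234 (2020), "
             "but not 42 people, page 7, or 15 F.3d 100 (1994).")

# Each match exposes its capture groups as a tuple of strings.
found = [m.groups() for m in re.finditer(combined_cite, test_text)]
print(found)
```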

In [5]:
sample_paragraph = "The best case in the world is Prince v. The End of the Century, 22 U.S. 50124 (1999), and I like it.  The worst case in the world is Gowder v. Fictional Opponent, 20 U.S. 15 (2050), because Gowder lost, and as we all know, Gowder should never lose. Unless Fictional Opponent is a fictional cat. Another kind of case that could exist is Dean Washburn v. Gowder, 20 S.Ct. 1234 (2020), but the dean would never be mean enough to sue Gowder."

us_cite = r'(\d+) (U\.S\.) (\d+) \((\d{4})\)'
sct_cite = r'(\d+) (S\.Ct\.) (\d+) \((\d{4})\)'
In [6]:
import re
In [7]:
us_cites = re.finditer(us_cite, sample_paragraph)
In [8]:
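def make_citation_from_match(match):
    # Groups: 1 = number before the reporter, 2 = reporter, 3 = number after it, 4 = year.
    # Citation takes them in the order first_page, last_page, reporter, year.
    return Citation(int(match.group(1)), int(match.group(3)), match.group(2), int(match.group(4)))
In [9]:
def scotus_finder(paragraph):
    out = []
    us_matches = re.finditer(us_cite, paragraph)
    sct_matches = re.finditer(sct_cite, paragraph)
    for x in us_matches:
        out.append(make_citation_from_match(x))
    for x in sct_matches:
        out.append(make_citation_from_match(x))
    return out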
In [10]:
sample_results = scotus_finder(sample_paragraph)

The above is a totally correct answer (I didn't say that you had to print anything), but it would be useful to prove it. So let's just loop over the results and see what we got.

In [11]:
for x in sample_results:
    print(x.cite())
22 U.S. 50124 (1999)
20 U.S. 15 (2050)
20 S.Ct. 1234 (2020)

This answer leverages careful reading of the Python documentation on the re module. In particular, notice that it uses re.finditer() rather than re.findall(), because the former returns match objects while the latter returns only strings (or tuples of strings, when the pattern has several capture groups). Match objects let us use capture groups to extract the individual pieces we need to build the Citation object that we created in the previous problem.
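
A small sketch of the difference, on a made-up sentence: when the pattern contains several capture groups, re.findall() hands back tuples of strings, while re.finditer() hands back match objects that can be queried with group() and span().

```python
import re

pattern = r'(\d+) (U\.S\.) (\d+) \((\d{4})\)'
text = "Compare 22 U.S. 50124 (1999) with 20 U.S. 15 (2050)."

# findall() returns plain data: here, one tuple of strings per match.
print(re.findall(pattern, text))

# finditer() yields match objects, which also know where the match occurred.
for m in re.finditer(pattern, text):
    print(m.group(0), m.group(1), m.group(4), m.span())
```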

You may have noticed that the documentation for re.finditer() says it returns an "iterator" rather than a list. I hope you looked that up if it looked unfamiliar. For your information, however, an iterator is basically a list where individual items aren't loaded into memory until you try to access them. So you can still loop over an iterator just like a list, like I did above, but printing it and the like won't give you the contents.
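
Here is a quick sketch of that behavior, using a deliberately simple pattern on a made-up string. One extra wrinkle worth seeing for yourself: once you have looped over an iterator, it is used up.

```python
import re

matches = re.finditer(r'\d+', "20 U.S. 15 (2050)")

# Printing the iterator shows an object description, not the matches.
print(matches)

# Looping (here, inside a list comprehension) pulls the items out one at a time.
first_pass = [m.group(0) for m in matches]
print(first_pass)

# The iterator is now exhausted, so a second pass finds nothing.
second_pass = [m.group(0) for m in matches]
print(second_pass)
```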

You may also notice that I created a helper function, make_citation_from_match(), so that I didn't have to put all of the logic in one big complicated function. That's often a good idea, just to make things easier to read and to debug.

Problem 3: Fun with APIs

Using the Caselaw Access Project API, find answers to the following question; please show your code as well as your answer:

What is the citation for the most recent case in Iowa that uses the word "feline?"


This problem requires making use of the API, figuring out how to read its documentation and process its results.

Note that you don't even need an API key to access this data. If you look at the documentation, you can see that unregistered users can access everything except for full text cases from non-whitelisted jurisdictions, and we don't need full text...

In [12]:
import requests

endpoint = ''

felines = requests.get(endpoint)

results = felines.json()

Now, here, if we actually look at the results, the format is pretty easy to make sense of. We can actually eyeball the below and see that the correct answer is 461 N.W.2d 478 --- or, if you want to be all full-fledged, In the Interest of N.M.W., 461 N.W.2d 478 (1990).

Practice makes it easy to eyeball results like this and see how deep you need to drill down to get particular information, by the way, but take a look at our video and in-class example from Feb 10 if you need a more structured method.

But let's use code to get our answer. Looking at the results also gives us a clue as to how to figure out the most recent case: the dates are in the format YYYY-MM-DD.

In [13]:
results
{'count': 2,
 'next': None,
 'previous': None,
 'facets': {},
 'results': [{'id': 10603516,
   'url': '',
   'name': 'In the Interest of N.M.W., A Child. Appeal of B.W., Mother',
   'name_abbreviation': 'In the Interest of N.M.W.',
   'decision_date': '1990-08-30',
   'docket_number': 'No. 89-1620',
   'first_page': '478',
   'last_page': '483',
   'citations': [{'cite': '461 N.W.2d 478', 'type': 'official'}],
   'volume': {'volume_number': '461',
    'url': '',
    'barcode': '32044061417150'},
   'reporter': {'url': '',
    'full_name': 'North Western Reporter 2d',
    'id': 892},
   'court': {'name': 'Iowa Court of Appeals',
    'url': '',
    'id': 18945,
    'name_abbreviation': 'Iowa Ct. App.',
    'slug': 'iowa-ct-app'},
   'jurisdiction': {'slug': 'iowa',
    'id': 45,
    'name_long': 'Iowa',
    'url': '',
    'name': 'Iowa',
    'whitelisted': False},
   'frontend_url': '',
   'preview': ["the bathroom, the cats had defecated along the bathtub and some of N.M.W.’s clothing was stuck to the <em class='search_highlight'>feline</em>",
    "the bathroom, the cats had defecated along the bathtub and some of N.M.W.’s clothing was stuck to the <em class='search_highlight'>feline</em>",
    "home where the cats defecated along the bathtub where some of the child’s clothing was stuck to the <em class='search_highlight'>feline</em>"]},
  {'id': 4446653,
   'url': '',
   'name': 'State of Iowa, appellee, v. Thomas Zbornik, appellant',
   'name_abbreviation': 'State v. Zbornik',
   'decision_date': '1957-02-05',
   'docket_number': 'No. 49080',
   'first_page': '450',
   'last_page': '458',
   'citations': [{'cite': '248 Iowa 450', 'type': 'official'},
    {'cite': '80 N.W.2d 735', 'type': 'parallel'}],
   'volume': {'volume_number': '248',
    'url': '',
    'barcode': '32044078640489'},
   'reporter': {'url': '',
    'full_name': 'Iowa Reports',
    'id': 474},
   'court': {'name': 'Iowa Supreme Court',
    'url': '',
    'id': 9299,
    'name_abbreviation': 'Iowa',
    'slug': 'iowa'},
   'jurisdiction': {'slug': 'iowa',
    'id': 45,
    'name_long': 'Iowa',
    'url': '',
    'name': 'Iowa',
    'whitelisted': False},
   'frontend_url': '',
   'preview': ["There is no showing what was meant by the police “kitty” or in fact whether such a pseudo-<em class='search_highlight'>feline</em> existed"]}]}
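
To make the drilling-down concrete, here is a sketch that works on a hand-trimmed copy of the response above (only a few fields kept, so it runs without touching the API). The variable name trimmed is just for this illustration.

```python
# A hand-trimmed copy of the response shown above, keeping only a few fields.
trimmed = {
    'count': 2,
    'results': [
        {'name_abbreviation': 'In the Interest of N.M.W.',
         'decision_date': '1990-08-30',
         'citations': [{'cite': '461 N.W.2d 478', 'type': 'official'}]},
        {'name_abbreviation': 'State v. Zbornik',
         'decision_date': '1957-02-05',
         'citations': [{'cite': '248 Iowa 450', 'type': 'official'},
                       {'cite': '80 N.W.2d 735', 'type': 'parallel'}]},
    ],
}

# dict -> list of cases -> one case (dict) -> list of citations -> one citation (dict)
for case in trimmed['results']:
    print(case['decision_date'], case['name_abbreviation'], case['citations'][0]['cite'])
```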

There are lots of different ways we could figure out which case is most recent by code. For example, we could use the datetime library to parse these strings into Python's internal date representation and then compare by that. However, that's a lot of work, and we'd have to burn our eyeballs out looking at the terrible date format codes that all the programmers use and such. Meh. Let's just write a function to compare them as strings. After all, Python is pretty good at comparing strings, and we can do some experiments to see what the behavior is like:

In [14]:
"2020-01-02" > "2019-12-30"
True
In [15]:
"2020-01-02" > "2020-01-03"
False
In [16]:
"2020-01-02" < "2020-01-03"
True
In [17]:
"2020-01-02" < "2020-02-01"
True

That looks like correct behavior, right? What we're doing here (cheating horribly, in some cosmic sense) is making use of the apparent fact that Python's version of alphabetical order can also handle digits, and appears to count higher digits as higher. So when we compare two dates in the very convenient year-month-day format that the CAP API gives us, they already have a natural ordering in which more recent dates compare as greater. Yay!
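
The reason this works is that the dates are zero-padded and run from the largest unit to the smallest, so character-by-character comparison lines up with chronological order. The sketch below checks the string comparison against the standard library's datetime.date, and then shows how the trick would go wrong on a made-up date format without zero-padding (not something the CAP API returns, as far as the results above show).

```python
from datetime import date

pairs = [("2020-01-02", "2019-12-30"),
         ("2020-01-02", "2020-01-03"),
         ("2020-01-02", "2020-02-01")]

for a, b in pairs:
    as_strings = a > b
    # date.fromisoformat() parses YYYY-MM-DD strings (Python 3.7+).
    as_dates = date.fromisoformat(a) > date.fromisoformat(b)
    print(a, b, as_strings, as_dates)

# Without zero-padding the trick breaks: '9' sorts after '1' as a character,
# so September wrongly compares as later than October.
print("2020-9-01" > "2020-10-01")
```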

The other trick that we can use is that the Python list sorting functions have an optional parameter that allows you to take a list of some kind of complex data structure (like a dictionary) and sort it by the result of some function called on each item in that list. Usually, people use what are called "anonymous functions" or "lambda functions" to do this, and the example of sorting a list called student_tuples in the documentation demonstrates that technique. But, since we haven't talked about that Python feature, I'll just use a perfectly standard function to do it.
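
For the curious, here is roughly what the lambda version looks like. This is only a sketch on made-up records (the names and dates in cases are invented stand-ins for the dictionaries in the API results), not the code used for the answer.

```python
# Made-up records standing in for the dictionaries in the API results.
cases = [
    {'name': 'Case A', 'decision_date': '1990-08-30'},
    {'name': 'Case B', 'decision_date': '1957-02-05'},
    {'name': 'Case C', 'decision_date': '2001-11-19'},
]

# The lambda takes one item and returns the value to sort by.
by_date = sorted(cases, key=lambda case: case['decision_date'])
print([c['name'] for c in by_date])
```

A lambda is just a small function without a name, so it plays the same role here that a named function passed as key would.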

Remember, a sorted list starts with the lowest one, so since we want the most recent, we want the last item, indexed by -1 in a list.

In [18]:
def get_year(case):
    return case["decision_date"]

answer = sorted(results["results"], key=get_year)[-1]['citations'][0]['cite']
In [19]:
print(answer)
461 N.W.2d 478

The sharp-eyed among you, however, will have done even less work than this, because you'll have noticed that the CAP API allows you to get results that come out sorted anyway. (Go look at the documentation under "searching" if you don't believe me!)

Hence, the following is an alternate (and much more compact) way to get the same result:

In [20]:
concise_endpoint = ''
concise_answer = requests.get(concise_endpoint).json()['results'][0]['citations'][0]['cite']
In [21]:
print(concise_answer)
461 N.W.2d 478

Problem 4: Fun with APIs continued

Using the same API as before: How many total times has the word "pork" been used in cases from the Iowa state courts in the CAP database? (Note: not number of cases, I want number of uses of the word.)


As you know, I cancelled this problem. But actually, I shouldn't have. It's a lot easier than I let your fellow students convince me that it was!

When I wrote the problem, I had actually intended you to use the ngrams function of the CAP API, which... directly returns the answer you want, without even having to read individual cases, search for terms, etc.

I got a bunch of panicked students coming to see me who were trying to do full case searches, and were having trouble accessing full text, were receiving results that came in different chunks ("paginated," as API people sometimes say, where you have to use a "cursor" to get at the next one), etc. etc. And I had visions of having people have to figure out how to get elevated API privileges to get full texts, writing complicated functions to get the next page of results, etc. etc. etc. And so I cancelled the problem.

But then I sat down and decided to do it myself for this answer... only to have my memory triggered, and realize that the CAP API literally provides this functionality. Let's do it in just one line (after already importing the requests library), just to be egregiously fancy!

In [22]:
total_pork_utterances = sum([y['count'][0] for y in requests.get('').json()['results']['pork']['iowa']])

I swear I didn't deliberately look for something with the answer "420."

Anyway, a good homework assignment for yourself would be to try to understand that one really complicated line of code that I just ran. Look up "list comprehensions" in Python for a start.
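
As a head start, here is a sketch that unpacks the list comprehension into an ordinary loop. It runs on made-up data shaped the way the one-liner indexes it (a list of dictionaries whose 'count' entry has the number we want at index 0); the 'year' key and all the numbers are invented for illustration and are not the real API results.

```python
# Made-up rows: each has a 'count' sequence with the count at index 0.
fake_rows = [
    {'year': '1900', 'count': [3, 2]},
    {'year': '1950', 'count': [5, 4]},
    {'year': '2000', 'count': [1, 1]},
]

# List comprehension: build a list of the first elements, then add them up.
total_short = sum([y['count'][0] for y in fake_rows])

# The same thing written as an explicit loop.
total_long = 0
for y in fake_rows:
    total_long = total_long + y['count'][0]

print(total_short, total_long)
```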
