You see what we did there? We used one search to match both the dollar amount $1,280,012.05 and dollar amounts like 100.00 and 9.99, while not matching the 100 in the first example letter, which doesn't look like a dollar amount. Try doing that with Google or Control-F. (I think old versions of MS Word used to have something resembling regex capacity, but I can't find that in any online documentation anymore. Sorry, law review editors, but you might try saving into plain text formats in order to try tasks like searching for citations...)
Let's look at the code.
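import re

# the regular expression we'll pick apart below
pattern = r'\d*,?\d*,?\d{0,3}\.\d\d'
matches = []
for doc in docs:
    # keep every document where the pattern matches anywhere in the text
    if re.search(pattern, doc):
        matches.append(doc)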
Most of that code is pretty self-explanatory, but the one part that's new is the line where we define the pattern, which has this wild r'\d*,?\d*,?\d{0,3}\.\d\d' thing that sort of looks like Klingon. Unsurprisingly, that's a regular expression pattern.
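If you want to convince yourself that the pattern really behaves the way we claimed, here's a minimal sketch you can run on its own. The sample strings are just the amounts mentioned above, and the helper name matched_text is made up for illustration; it isn't part of the code we ran on the letters.

```python
import re

pattern = r'\d*,?\d*,?\d{0,3}\.\d\d'

def matched_text(text):
    """Return the part of text the pattern matched, or None if nothing matched."""
    m = re.search(pattern, text)
    return m.group() if m else None

# the amounts discussed above, as standalone strings
samples = ["$1,280,012.05", "100.00", "9.99", "100"]
results = {s: matched_text(s) for s in samples}

for sample, hit in results.items():
    print(repr(sample), "->", repr(hit))
```

Notice that the dollar sign itself isn't part of the match: the pattern only cares about digits, commas, and a decimal point followed by two digits, which is why a bare 100 gets skipped.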
One very useful way to build regular expressions is to use an application that lets you enter sample text and try out different regular expressions to see what they match. My favorite is the free webapp Regex101. The neat thing about Regex101 is that it lets you save your examples and share them (using the little hamburger menu icon thingy on the left). So I've created a saved version of this search with our sample text that shows exactly how it works. Check it out here!
But before we get to our complicated pattern, let's start simpler. Regular expressions are basically just amped-up search strings, and we know how to search, right? We can use ordinary strings as regexes too.
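Here's a quick sketch of that idea. The sentence is made up for illustration and just stands in for one of our documents. The last two lines show one thing to watch out for: a few characters, like the period, mean something special in a regex, and Python's re.escape will neutralize them if you really do want a literal search.

```python
import re

# a made-up sentence standing in for a document
text = "Please remit the sum of one hundred dollars."

# an ordinary word works fine as a pattern
found_hundred = bool(re.search("hundred", text))
found_thousand = bool(re.search("thousand", text))
print(found_hundred, found_thousand)

# caveat: "." is special (it matches any character), so "9.99" also matches "9x99"
loose = bool(re.search("9.99", "9x99"))
strict = bool(re.search(re.escape("9.99"), "9x99"))
print(loose, strict)
```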
First, let's abstract out our search into a function, then let's look at some simple examples.