There are several ways to think about the application example from yesterday's fake data. We saw a test where the null hypothesis was that the rate at which black renters are offered an application equals the overall application offering rate. We tested that hypothesis with a binomial test, and here's a slightly more filled-out version of that test.

In [1]:

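import pandas as pd
# Note: binom_test was deprecated in SciPy 1.7 and removed in 1.12;
# on newer versions, use binomtest(k, n, p).pvalue instead.
from scipy.stats import binom_test
df = pd.read_csv("classdata/simulated_housing_test.csv")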

In [5]:

# refresh our memory of the column names and what the values look like
df.head()

Out[5]:

	application	race	rent
0	1	white	526.0
1	1	white	514.0
2	1	white	512.0
3	1	white	485.0
4	1	white	505.0

In [6]:

# overall share of testers who were offered an application (application == 1)
general_prob_app = df.application.value_counts()[1] / len(df)

In [7]:

general_prob_app

Out[7]:

0.625

In [8]:

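# number of black testers who were offered an application
number_black_app_offered = df[df.race == 'black'].application.value_counts()[1]
# total number of black testers
number_black_testers = len(df[df.race == 'black'])
# two-sided exact binomial test against the overall offering rate
p = binom_test(number_black_app_offered, number_black_testers, general_prob_app)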

In [9]:

print(p)

0.11180948620251761
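
To make the p-value above less of a black box, here is a minimal sketch of what a two-sided exact binomial test computes, using only the standard library. It assumes the usual "no more likely than what we observed" definition, which I believe matches what SciPy does but have not checked against its source. The function names and the counts in the example call are hypothetical, made up for illustration, and are not taken from the class data.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_binom_p(k, n, p):
    """Add up the probability of every outcome no more likely than k."""
    observed = binom_pmf(k, n, p)
    # small relative tolerance so floating-point ties still count as ties
    cutoff = observed * (1 + 1e-7)
    probs = (binom_pmf(i, n, p) for i in range(n + 1))
    return sum(prob for prob in probs if prob <= cutoff)


# Hypothetical counts for illustration only (NOT from the class data):
# 10 offers out of 20 testers, tested against an overall rate of 0.625.
example_p = two_sided_binom_p(10, 20, 0.625)
print(example_p)
```

Read this way, the p of about 0.112 printed above says that, if black testers were offered applications at the overall rate, a count at least as unusual as the one we saw would turn up roughly 11% of the time. That is above the conventional 0.05 cutoff, so this test does not reject the null hypothesis.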

Sociological Gobbledygook

In-Class Notebook, Mar 3, 2020