Visualizing Lansing Traffic Stop Data


<p style="text-align: right;"> Brandon McIntyre </p>

<p style="text-align: right;"> Analysis of Lansing Traffic Stops

CMSE 402 SS20 </p>

Final Writeup

Data Used

Traffic Stop Data of Lansing Police Department (02/01/2016 - 02/13/2019): http://data-lansing.opendata.arcgis.com/datasets/32b48df10c674a7aace77071046ba272_0?selectedAttribute=Date

Census Data/ ACS Data: https://data.census.gov/cedsci/table?q=Lansing%20MI&g=1600000US2646000&hidePreview=false&tid=ACSDP5Y2018.DP05&vintage=2010&layer=VT_2018_160_00_PY_D1&cid=DP05_0001E

Note

Looking at the Lansing Police Department MATS Data Analysis for Year 17 (https://lansingmi.gov/DocumentCenter/View/6979/MATS-Months193-204-May-2018), it is mentioned that census data is not a fully accurate baseline, since driver demographics do not necessarily match census demographics. This makes using census data troublesome. This article (https://nij.ojp.gov/topics/articles/racial-profiling-and-traffic-stops) goes into more detail on the subject. It appears that for a proper comparison, one must look at the demographics of all licensed drivers. That information is only available at the state level, and only broken down by sex (https://www.fhwa.dot.gov/policyinformation/statistics/2016/dl1c.cfm), at least in the officially recognized datasets I could find. So, with that caveat, I will continue to use the ACS data, though it may not produce fully accurate results. I also tried to find more MATS traffic stop data from the Lansing department, but was unable to.

Upon more research since the writeup, I have also found traffic stop analyses for Year 18 (https://lansingmi.gov/ArchiveCenter/ViewFile/Item/655)
and Year 16 (https://www.lansingmi.gov/Archive/ViewFile/Item/603).

Interesting News since April 1st

Here are some interesting articles, all posted on 4/1/20, about this particular issue in the East Lansing Police Department. (They all say the same exact thing, some with visualizations that are not particularly good.) (https://www.lansingcitypulse.com/stories/mayor-east-lansing-police-over-stop-black-drivers-which-is-not-acceptable,14053) (https://www.lansingstatejournal.com/story/news/2020/04/01/east-lansing-wants-changes-after-study-shows-race-bias-police-stops/5102549002/) (https://eastlansinginfo.org/content/newly-released-data-show-racial-bias-east-lansing-policing) (https://statenews.com/article/2020/04/elpd-race-data-report-shows-african-americans-over-stopped-in-east-lansing?ct=content_open&cv=cbox_featured) (https://www.wilx.com/content/news/City-of-East-Lansing-releases-race-data-for-ELPD-officer-initiated-contacts-569281531.html)

The articles state that over the past two months there has been an “over-stop of black drivers”. This came about because “The East Lansing Police Department began gathering race data on officer-initiated contacts, like traffic stops, in February after a black 19-year-old accused an officer of using excessive force during an arrest. The study was requested by the City Council.” The study states, “Though black residents comprise 8% of East Lansing’s population, they accounted for 22% of police officer-initiated contact in February and March, according to a department study”.

Now this is particularly interesting given that the EL study is based on census data. The Lansing Police Department made it explicit that census data is not conclusive, because of the difference between census demographics and actual licensed drivers. The City Pulse article actually references the Lansing data that I am looking at now and says that similar trends appear in the Lansing traffic stop data as well.

This information is of particular interest because, honestly, I don’t know which side to believe. I feel the issue stems from the lack of demographic information on licensed drivers. Not having it allows the Lansing police to say there is no racial bias, while the EL police department says there is a bias and they want to correct it. Who is in the wrong and who is in the right?

I have tried to find the data that the EL study is based on, but all I can find are their weekly reports, which do not include race (https://www.cityofeastlansing.com/Archive.aspx?AMID=47). Furthermore, the annual report that the EL police department puts out does not include race in any manner (https://www.cityofeastlansing.com/DocumentCenter/View/910/ELPD-Annual-Report-2019-PDF?bidId=). This leads me to think that EL is not as experienced as the Lansing department when it comes to answering the question of racial bias.

Variable: Exploration and Plots

Questions

Libraries

import pandas as pd
import numpy as np
import pytz
import seaborn as sns
import matplotlib
import matplotlib.ticker as mtick
import matplotlib.pyplot as plt

Load in Data

Traffic Stop Data

The data did not come sorted, so it was sorted by time. The timestamps were also off by five hours, as they were given in GMT; they were converted to US/Eastern.

traffic = pd.read_csv("./Traffic_Stops.csv", parse_dates=["Date"])
traffic.sort_values("Date", axis=0, ascending=True, inplace=True)
traffic.reset_index(inplace=True,drop=True)
traffic["Date"] = traffic["Date"].dt.tz_convert(pytz.timezone('US/Eastern'))
traffic
Date Team_Area Street_Area Reason_for_Stop Race_Ethnicity Gender Age Search_Performed Search_Authority Discovery_If_Searched Result_of_Stop Officer_Badge Traffic_Crash Serial_Number ObjectId
0 2016-02-01 00:03:00-05:00 4 Jolly Equipment Violation White Male 23 None Citation 158 No 48071 1
1 2016-02-01 00:04:00-05:00 1 Oakland Equipment Violation Unknown Male 42 None Citation 225 No 48072 3
2 2016-02-01 00:12:00-05:00 3 Other Equipment Violation African-American/Black Male 31 None Citation 62 No 48073 4
3 2016-02-01 00:37:00-05:00 4 Oakland Moving Violation African-American/Black Female 21 None Citation 158 No 48074 7
4 2016-02-01 00:40:00-05:00 4 Cedar Moving Violation White Female 33 None Warning 124 No 48075 8
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
20644 2019-02-12 14:00:00-05:00 1 M.L. King Jr. Other African-American/Black Male 42 Driver Incident to Arrest Nothing Arrest 89 No 73879 20645
20645 2019-02-12 14:06:00-05:00 1 Saginaw Moving Violation White Male 33 None Warning 202 No 73875 20646
20646 2019-02-12 14:39:00-05:00 1 Other Registration Unknown Male 28 None Citation 202 No 73877 20647
20647 2019-02-12 16:02:00-05:00 2 Michigan Registration White Female 26 None Citation 245 No 73881 20648
20648 2019-02-12 19:36:00-05:00 2 Michigan Equipment Violation African-American/Black Female 23 None Nothing Warning 149 No 73883 20649

20649 rows × 15 columns

Add an “Hour” Column for later use

traffic["Hour"] = traffic["Date"].dt.hour
traffic.columns
Index(['Date', 'Team_Area', 'Street_Area', 'Reason_for_Stop', 'Race_Ethnicity',
       'Gender', 'Age', 'Search_Performed', 'Search_Authority',
       'Discovery_If_Searched', 'Result_of_Stop', 'Officer_Badge',
       'Traffic_Crash', 'Serial_Number', 'ObjectId', 'Hour'],
      dtype='object')

Subsets of MATS report’s Data

year16 = traffic[(traffic["Date"] >= '2016-02-12 00:00:00-05:00') & (traffic["Date"] <=
'2017-02-11 23:59:59-05:00')]
year17 = traffic[(traffic["Date"] >= '2017-02-12 00:00:00-05:00') & (traffic["Date"] <=
'2018-02-11 23:59:59-05:00')]
#Not sure why they skipped 02-12-2018?
year18 = traffic[(traffic["Date"] >= '2018-02-13 00:00:00-05:00') & (traffic["Date"] <=
'2019-02-12 23:59:59-05:00')]

Census Data

Another issue I have found with the census data is that the races are not cleanly separated. I cannot make a perfect mapping between the race values in the traffic stop data and the races reported on the census, because Hispanic/Latino is a separate section in the census, and the “Two or more races” section makes an exact split impossible. I will therefore add together only the single races plus Latino and leave out the multi-race counts. This will hopefully distribute the roughly 6% reporting more than one race evenly.

An Excel file is included (census_calc.xlsx) with all the numbers and calculations I used to get the percentages of each race and gender. What I load in is a CSV with the same values as the calculation spreadsheet, but with the formulas removed for easy loading into pandas.
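As a minimal sketch of that aggregation (the counts below are hypothetical; the real figures live in census_calc.xlsx), single-race counts are divided by the total population, with Hispanic/Latino kept as its own category and multi-race respondents left out:

```python
# Hypothetical illustration of the aggregation described above; the real
# numbers live in census_calc.xlsx, and these counts are made up.
def race_shares(single_race_counts, hispanic_count, total_population):
    """Turn single-race counts plus a separate Hispanic count into shares.

    Multi-race respondents are left out, so the shares sum to less than 1;
    the missing ~6% is assumed to spread roughly evenly across the groups.
    """
    shares = {race: n / total_population for race, n in single_race_counts.items()}
    shares["Hispanic"] = hispanic_count / total_population
    return shares

example = race_shares(
    {"White": 60_000, "Black": 23_000, "Native": 450, "Asian_Pacific": 4_000},
    hispanic_count=13_500,
    total_population=115_000,
)
```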

census = pd.read_csv("./ACS_data/census.csv")
census_2016 = census.iloc[0]
census_2017 = census.iloc[1]
census_2018 = census.iloc[2]
census_ALL = census.iloc[3]
census
Year Male Female White Black Native Asian_Pacific Hispanic
0 2016 0.479 0.521 0.593992 0.228385 0.004449 0.041471 0.131703
1 2017 0.475 0.525 0.562655 0.234408 0.003599 0.063806 0.135532
2 2018 0.462 0.538 0.576798 0.234277 0.003184 0.037995 0.147745
3 ALL 0.480 0.520 0.590836 0.230128 0.004314 0.039975 0.134747

Visualization Exploration of Data

Analysis of Traffic Stops Over Time

traffic.groupby("Hour").count()["Date"].plot(kind="bar")
<matplotlib.axes._subplots.AxesSubplot at 0x2907411c9c8>

png

Now looking at the MATS report for year 18 (February 13, 2018, through February 12, 2019).

year18.groupby(traffic["Date"].dt.hour).count()["Date"].plot(kind="bar")
<matplotlib.axes._subplots.AxesSubplot at 0x2907407e248>

png

MATS report plot for comparison from Year 18

Reason for Stop

traffic["Reason_for_Stop"].unique()
array(['Equipment Violation', 'Moving Violation', 'Other', 'Registration',
       'Investigative Stop'], dtype=object)
values, counts = np.unique(traffic["Reason_for_Stop"], return_counts=True)
fig, ax = plt.subplots()
ax.barh(values,counts)
<BarContainer object of 5 artists>

png

by_reason = traffic.groupby(by=["Hour","Reason_for_Stop"],as_index=False).count()
by_reason = by_reason.pivot(index="Hour",columns="Reason_for_Stop", values= "Date")
by_reason.plot(kind="bar",subplots=True,layout=(3,2),sharex=True, sharey=True,figsize=(10,7))
[traffic.groupby(by=["Hour"]).count()["Date"].plot(kind="bar", ax=ax,color="gray",alpha=0.3) for ax in plt.gcf().axes]
plt.tight_layout()
#plt.title("Reason for Traffic Stop by Hour")

png

Race/Ethnicity

traffic["Race_Ethnicity"].unique()
array(['White', 'Unknown', 'African-American/Black',
       'Asian-Pacific Islander', 'Hispanic', 'Native American'],
      dtype=object)
values, counts = np.unique(traffic["Race_Ethnicity"], return_counts=True)
fig, ax = plt.subplots()
ax.barh(values,counts)
<BarContainer object of 6 artists>

png

counts_race = "{}:{} , {}:{} , {}:{} , {}:{} , {}:{} , {}:{}".format(values[0],counts[0],values[1],counts[1],values[2],counts[2],values[3],counts[3],values[4],counts[4],values[5],counts[5])
print(counts_race)
African-American/Black:6728 , Asian-Pacific Islander:319 , Hispanic:1334 , Native American:11 , Unknown:2109 , White:10148
races_total = {}
j=0
for i in values:
    races_total[i] = counts[j]
    j = j+1
races_total
{'African-American/Black': 6728,
 'Asian-Pacific Islander': 319,
 'Hispanic': 1334,
 'Native American': 11,
 'Unknown': 2109,
 'White': 10148}
# Share of each race among stops where the race was recorded
total_known = len(traffic[traffic["Race_Ethnicity"] != "Unknown"])
race_percentage = {race: round(count / total_known, 6)
                   for race, count in zip(values, counts)
                   if race != "Unknown"}
race_percentage
{'African-American/Black': 0.362891,
 'Asian-Pacific Islander': 0.017206,
 'Hispanic': 0.071953,
 'Native American': 0.000593,
 'White': 0.547357}
by_race = traffic.groupby(by=["Hour","Race_Ethnicity"],as_index=False).count()
by_race = by_race.pivot(index="Hour",columns="Race_Ethnicity", values= "Date")
by_race.plot(kind="bar",subplots=True,layout=(3,2),sharex=True, sharey=True,figsize=(10,7))
[traffic.groupby(by=["Hour"]).count()["Date"].plot(kind="bar", ax=ax,color="gray",alpha=0.3) for ax in plt.gcf().axes]
plt.tight_layout()
#plt.title("Race by Traffic Stop by Hour")

png

Gender

traffic["Gender"].unique()
array(['Male', 'Female'], dtype=object)
values, counts = np.unique(traffic["Gender"], return_counts=True)
fig, ax = plt.subplots()
ax.barh(values,counts)
<BarContainer object of 2 artists>

png

by_gender = traffic.groupby(by=["Hour","Gender"],as_index=False).count()
by_gender = by_gender.pivot(index="Hour",columns="Gender", values= "Date")
by_gender.plot(kind="bar",subplots=True,layout=(1,2),sharex=True, sharey=True,figsize=(10,3))
[traffic.groupby(by=["Hour"]).count()["Date"].plot(kind="bar", ax=ax,color="gray",alpha=0.3) for ax in plt.gcf().axes]
plt.tight_layout()

png

Analysis of Age

Sorted_age_indx = np.argsort(traffic["Age"].values)
traffic["Age"][Sorted_age_indx].unique()
array([-7167, -7154, -6181,  -977,  -955,     0,     1,     2,    11,
          12,    13,    14,    15,    16,    17,    18,    19,    20,
          21,    22,    23,    24,    25,    26,    27,    28,    29,
          30,    31,    32,    33,    34,    35,    36,    37,    38,
          39,    40,    41,    42,    43,    44,    45,    46,    47,
          48,    49,    50,    51,    52,    53,    54,    55,    56,
          57,    58,    59,    60,    61,    62,    63,    64,    65,
          66,    67,    68,    69,    70,    71,    72,    73,    74,
          75,    76,    77,    78,    79,    80,    81,    82,    83,
          84,    85,    86,    87,    88,    89,    90,    91,    92,
          93,   101,   120,   121], dtype=int64)

OK, let’s assume that any age lower than 14 is incorrect, though it could conceivably be genuine. I will keep ages 14 and up, since in Michigan you can typically start driver’s training at 14 years and 9 months. There is also a jump from 93 to 101; there may truly be drivers older than 93, but I will exclude those values because they could be errors.

Age_corrected_filter = ((traffic["Age"] > 13) & (traffic["Age"] < 94))
traffic["Age"][Sorted_age_indx][Age_corrected_filter].unique()
array([14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
       31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
       48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64,
       65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81,
       82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93], dtype=int64)
traffic["Age"][Age_corrected_filter].hist(bins=80,width=1)
<matplotlib.axes._subplots.AxesSubplot at 0x29074c7bd48>

png


Search Performed

traffic['Search_Performed'].unique()
array(['None', 'Driver', 'Vehicle', nan, 'Passenger'], dtype=object)
search_p_clean = traffic[~traffic["Search_Performed"].isnull()]
# Will need to look into values, probably something to do with the nan value
values, counts = np.unique(search_p_clean["Search_Performed"], return_counts=True)
fig, ax = plt.subplots()
ax.barh(values,counts)
<BarContainer object of 4 artists>

png

by_door = search_p_clean.groupby(by=["Hour","Search_Performed"],as_index=False).count()
by_door = by_door.pivot(index="Hour",columns="Search_Performed", values= "Date")
by_door.plot(kind="bar",subplots=True,layout=(2,2),sharex=True, sharey=True,figsize=(10,5))
[search_p_clean.groupby(by=["Hour"]).count()["Date"].plot(kind="bar", ax=ax,color="gray",alpha=0.3) for ax in plt.gcf().axes]
plt.tight_layout()

png

search_p_positive_clean = search_p_clean[search_p_clean["Search_Performed"] != "None"]
by_door = search_p_positive_clean.groupby(by=["Hour","Search_Performed"],as_index=False).count()
by_door = by_door.pivot(index="Hour",columns="Search_Performed", values= "Date")
by_door.plot(kind="bar",subplots=True,layout=(2,2),sharex=True, sharey=True,figsize=(10,5))
[search_p_positive_clean.groupby(by=["Hour"]).count()["Date"].plot(kind="bar", ax=ax,color="gray",alpha=0.3) for ax in plt.gcf().axes]
plt.tight_layout()

png

Search Authority

traffic['Search_Authority'].unique()
array(['      ', 'Incident to Arrest', 'Terry Cursory', 'Consent',
       'Plain View', 'Tow Inventory', 'Parole/Probation'], dtype=object)
print("Total number not searched",len(traffic[traffic["Search_Authority"] == '      ']))
print("Total number searched",len(traffic[traffic["Search_Authority"] != '      ']))
Total number not searched 18748
Total number searched 1901
search_clean = traffic[traffic["Search_Authority"] != '      ']
values, counts = np.unique(search_clean["Search_Authority"], return_counts=True)
fig, ax = plt.subplots()
ax.barh(values,counts)
<BarContainer object of 6 artists>

png

by_search = search_clean.groupby(by=["Hour","Search_Authority"],as_index=False).count()
by_search = by_search.pivot(index="Hour",columns="Search_Authority", values= "Date")
by_search.plot(kind="bar",subplots=True,layout=(3,2),sharex=True, sharey=True,figsize=(10,7))
[search_clean.groupby(by=["Hour"]).count()["Date"].plot(kind="bar", ax=ax,color="gray",alpha=0.3) for ax in plt.gcf().axes]
plt.tight_layout()

png

Discovery If Searched

traffic['Discovery_If_Searched'].unique()
array(['       ', 'Nothing', 'Alcohol', 'Drugs', 'Drugs/ Alcohol', 'Cash',
       'Other Property', 'Drugs/ Cash', 'Weapon/ Drugs', 'Weapon',
       'Vehicle/ Drugs', 'Drugs/ Other Property', 'Cash/ Other Property',
       'Vehicle/ Alcohol/ Cash', 'Drugs/ Alcohol/ Cash/ Other Property',
       'Vehicle/ Alcohol', 'Vehicle', 'Weapon/ Drugs/ Alcohol',
       'Weapon/ Drugs/ Cash', 'Weapon/ Vehicle', 'Weapon/ Vehicle/ Cash',
       'Weapons/ Drugs/ Other Property', 'Weapon/ Drugs/ Other Property',
       'Vehicle/ Other Property', 'Vehicle/ Drugs/ Cash',
       'Vehicle/ Drugs/ Cash/ Other Property',
       'Weapon/ Vehicle/ Drugs/ Cash', 'Vehicle/ Cash/ Other Property',
       'Drugs/ Alcohol/ Cash', 'Weapon/ Other Property',
       'Alcohol/ Other Property', 'Alcohol/ Cash'], dtype=object)
print("Total number not searched",len(traffic[traffic["Discovery_If_Searched"] == '       ']))
print("Total number searched",len(traffic[traffic["Discovery_If_Searched"] != '       ']))
print("Total number searched, but nothing found",len(traffic[(traffic["Discovery_If_Searched"] != '       ') & (traffic["Discovery_If_Searched"] == 'Nothing')]))
print("Total number searched and discovery made",len(traffic[(traffic["Discovery_If_Searched"] != '       ') & (traffic["Discovery_If_Searched"] != 'Nothing')]))
Total number not searched 18512
Total number searched 2137
Total number searched, but nothing found 1547
Total number searched and discovery made 590
discovery_clean = traffic[traffic["Discovery_If_Searched"] != '       ']
values, counts = np.unique(discovery_clean['Discovery_If_Searched'], return_counts=True)
fig, ax = plt.subplots(figsize=(5,10))
ax.barh(values,counts)
<BarContainer object of 31 artists>

png

counts
array([  95,    1,    1,   20,   10,  216,   17,    1,    1,   12,   10,
       1547,   66,   41,    4,    1,    3,   14,    6,    1,    2,   36,
         19,    5,    2,    1,    1,    1,    1,    1,    1], dtype=int64)
by_discovery = discovery_clean.groupby(by=["Hour","Discovery_If_Searched"],as_index=False).count()
by_discovery = by_discovery.pivot(index="Hour",columns="Discovery_If_Searched", values= "Date")
by_discovery.plot(kind="bar",subplots=True,layout=(16,2),sharex=True, sharey=True,figsize=(10,40))
[discovery_clean.groupby(by=["Hour"]).count()["Date"].plot(kind="bar", ax=ax,color="gray",alpha=0.3) for ax in plt.gcf().axes]
plt.tight_layout()

png

discovery_clean_positive = traffic[(traffic["Discovery_If_Searched"] != '       ') & (traffic["Discovery_If_Searched"] != 'Nothing')]
values, counts = np.unique(discovery_clean_positive['Discovery_If_Searched'], return_counts=True)
fig, ax = plt.subplots(figsize=(5,10))
ax.barh(values,counts)
<BarContainer object of 30 artists>

png

by_discovery_positive = discovery_clean_positive.groupby(by=["Hour","Discovery_If_Searched"],as_index=False).count()
by_discovery_positive = by_discovery_positive.pivot(index="Hour",columns="Discovery_If_Searched", values= "Date")
by_discovery_positive.plot(kind="bar",subplots=True,layout=(16,2),sharex=True, sharey=True,figsize=(10,40))
[discovery_clean_positive.groupby(by=["Hour"]).count()["Date"].plot(kind="bar", ax=ax,color="gray",alpha=0.3) for ax in plt.gcf().axes]
plt.tight_layout()

png

Result of Stop

traffic['Result_of_Stop'].unique()
array(['Citation', 'Warning', 'Arrest', '    ', 'Report'], dtype=object)
values, counts = np.unique(traffic["Result_of_Stop"], return_counts=True)
fig, ax = plt.subplots()
ax.barh(values,counts)
<BarContainer object of 5 artists>

png

by_result = traffic.groupby(by=["Hour","Result_of_Stop"],as_index=False).count()
by_result = by_result.pivot(index="Hour",columns="Result_of_Stop", values= "Date")
by_result.plot(kind="bar",subplots=True,layout=(3,2),sharex=True, sharey=True,figsize=(10,7.5))
[traffic.groupby(by=["Hour"]).count()["Date"].plot(kind="bar", ax=ax,color="gray",alpha=0.3) for ax in plt.gcf().axes]
plt.tight_layout()

png

Officer Badge

traffic['Officer_Badge'].unique()
array([158, 225,  62, 124, 152,  71,  50, 112, 337, 334, 132, 169, 390,
        35,  87,  44,  96, 220, 165,  74,  49,  97, 154,  82,  86, 143,
       176,   2,  89, 391, 215, 160,  37,  85, 197, 185,  52,  59,  28,
       222, 305,  40, 212, 211, 110, 442, 153,  64,  22,  84, 376,   0,
       118,   8, 312,  36, 345, 378, 140,  63,  79, 100,  54, 177,   3,
       324, 339, 398,   7,  23, 319, 142, 344,  46, 128,  81, 317, 168,
       175,  21,  27, 201, 307,  67, 226, 164, 358, 146, 127, 351, 353,
       101, 347,  38, 355, 134, 230, 129, 309, 457,  47,  48, 326,  29,
        14,  66,  83, 188, 103,  94,  11, 136,  80,  34, 335, 122, 414,
       408, 181,  69,  75, 308,  12,  43,  57,  45, 108, 125, 145,  51,
        72, 117, 120, 182, 428, 354, 472, 245, 111, 123,  70, 130, 387,
        31,  53, 135,   6, 203, 106,  91, 139, 167,  90,  41,  99, 191,
       121, 196, 202, 131, 200,  76,  92,   4, 228,  16, 186, 126, 115,
       119, 151, 104, 150, 190, 149, 193, 137, 170, 148, 113], dtype=int64)
Sorted_badge_indx = np.argsort(traffic['Officer_Badge'].values)
traffic['Officer_Badge'][Sorted_badge_indx].unique()
array([  0,   2,   3,   4,   6,   7,   8,  11,  12,  14,  16,  21,  22,
        23,  27,  28,  29,  31,  34,  35,  36,  37,  38,  40,  41,  43,
        44,  45,  46,  47,  48,  49,  50,  51,  52,  53,  54,  57,  59,
        62,  63,  64,  66,  67,  69,  70,  71,  72,  74,  75,  76,  79,
        80,  81,  82,  83,  84,  85,  86,  87,  89,  90,  91,  92,  94,
        96,  97,  99, 100, 101, 103, 104, 106, 108, 110, 111, 112, 113,
       115, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128,
       129, 130, 131, 132, 134, 135, 136, 137, 139, 140, 142, 143, 145,
       146, 148, 149, 150, 151, 152, 153, 154, 158, 160, 164, 165, 167,
       168, 169, 170, 175, 176, 177, 181, 182, 185, 186, 188, 190, 191,
       193, 196, 197, 200, 201, 202, 203, 211, 212, 215, 220, 222, 225,
       226, 228, 230, 245, 305, 307, 308, 309, 312, 317, 319, 324, 326,
       334, 335, 337, 339, 344, 345, 347, 351, 353, 354, 355, 358, 376,
       378, 387, 390, 391, 398, 408, 414, 428, 442, 457, 472], dtype=int64)
# Why do some officers have so many stops?
values, counts = np.unique(traffic["Officer_Badge"], return_counts=True)
fig, ax = plt.subplots()
ax.barh(values,counts)
<BarContainer object of 180 artists>

png

# Officers that have made over 100 stops
officer_100 = values[counts > 100]
mask = traffic["Officer_Badge"].isin(officer_100)
officer_top = traffic[mask]

Questions

For individuals who were searched vs. not searched, is there a link to demographics? Is any demographic subjected to harsher punishment across traffic stop results?

Disclaimer

Due to the nature of the stops, and the fact that there can be multiple authorities for an arrest or a search, it is hard to tell whether values are erroneous, let alone correct them. For example, there are rows whose columns do not logically flow together (no search, but incident to arrest, something found, and no arrest). Thus no values will be purged, as purging would cause data loss that may not be warranted.

However, I will only treat a search as performed if the table says it was performed. I will not edit the table when the other columns suggest the answer should have been yes.
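Rather than purging, the logically inconsistent rows can at least be counted. A sketch, assuming the padded-space sentinels seen in the explorations above:

```python
import pandas as pd

# Sketch: count rows whose search columns do not logically flow together,
# e.g. no search performed but a search authority recorded. The raw data
# uses runs of spaces as "empty", so strip before comparing.
def count_search_inconsistencies(df):
    no_search = df["Search_Performed"].fillna("None").str.strip().eq("None")
    has_authority = df["Search_Authority"].str.strip().ne("")
    return int((no_search & has_authority).sum())
```

Calling `count_search_inconsistencies(traffic)` would give a sense of how widespread the problem is without deleting anything.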

This would also be best served by a combined visualization. I have not made it yet, but the idea is to compute a count for each column under the “Search” categories and make a large heat map with a diverging colormap centered on the census data.
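A minimal sketch of that planned heat map, using seaborn (already imported above) with placeholder shares; the real version would substitute the per-race search counts and the ACS census shares:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Placeholder shares; the real version would use per-race counts from each
# "Search" column divided by the column total, and the ACS census shares.
races = ["White", "African-American/Black", "Hispanic",
         "Asian-Pacific Islander", "Native American"]
stop_shares = pd.DataFrame(
    {"Searched": [0.31, 0.61, 0.07, 0.005, 0.001],
     "Not_Searched": [0.57, 0.34, 0.07, 0.018, 0.001]},
    index=races)
census_shares = pd.Series([0.59, 0.23, 0.13, 0.04, 0.004], index=races)

# Deviation from census: positive = over-represented relative to census.
deviation = stop_shares.sub(census_shares, axis=0)
ax = sns.heatmap(deviation, cmap="RdBu_r", center=0, annot=True)
ax.set_title("Deviation of search shares from census shares (sketch)")
plt.tight_layout()
```

Centering the diverging colormap at zero makes over- and under-representation relative to the census baseline immediately visible.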

traffic.columns
Index(['Date', 'Team_Area', 'Street_Area', 'Reason_for_Stop', 'Race_Ethnicity',
       'Gender', 'Age', 'Search_Performed', 'Search_Authority',
       'Discovery_If_Searched', 'Result_of_Stop', 'Officer_Badge',
       'Traffic_Crash', 'Serial_Number', 'ObjectId', 'Hour'],
      dtype='object')
print(counts_race)
African-American/Black:6728 , Asian-Pacific Islander:319 , Hispanic:1334 , Native American:11 , Unknown:2109 , White:10148

census_race = [census_ALL["White"],census_ALL["Black"],census_ALL["Hispanic"],census_ALL["Asian_Pacific"],census_ALL["Native"]]
census_race_mask = [census_race for i in range(5)]
census_race_mask
[[0.5908361910000001,
  0.230127567,
  0.13474726199999998,
  0.039975298,
  0.004313682],
 [0.5908361910000001,
  0.230127567,
  0.13474726199999998,
  0.039975298,
  0.004313682],
 [0.5908361910000001,
  0.230127567,
  0.13474726199999998,
  0.039975298,
  0.004313682],
 [0.5908361910000001,
  0.230127567,
  0.13474726199999998,
  0.039975298,
  0.004313682],
 [0.5908361910000001,
  0.230127567,
  0.13474726199999998,
  0.039975298,
  0.004313682]]
census_race_mask = [census_race for i in range(5)]
by_result = traffic.groupby(by=["Race_Ethnicity","Result_of_Stop"],as_index=False).count()
by_result = by_result.pivot(index="Race_Ethnicity",columns="Result_of_Stop", values= "Date")
by_result = by_result.drop(index="Unknown")
total_mask=by_result.sum().values
by_result = by_result.reindex(["White", "African-American/Black", "Hispanic","Asian-Pacific Islander","Native American"])
by_result = by_result.rename(columns={'    ':"None"})
by_result_comp = (by_result/total_mask)#-census_race_mask
by_result_comp
Result_of_Stop None Arrest Citation Report Warning
Race_Ethnicity
White 0.491525 0.340153 0.561575 0.411765 0.517007
African-American/Black 0.355932 0.570332 0.352537 0.470588 0.379906
Hispanic 0.135593 0.084399 0.068627 0.058824 0.082156
Asian-Pacific Islander NaN 0.005115 0.016911 0.058824 0.019623
Native American 0.016949 NaN 0.000351 NaN 0.001308
census_race_mask = [census_race for i in range(5)]
by_reason = traffic.groupby(by=["Race_Ethnicity","Reason_for_Stop"],as_index=False).count()
by_reason = by_reason.pivot(index="Race_Ethnicity",columns="Reason_for_Stop", values= "Date")
by_reason = by_reason.drop(index="Unknown")
total_mask=by_reason.sum().values
by_reason = by_reason.reindex(["White", "African-American/Black", "Hispanic","Asian-Pacific Islander","Native American"])
by_reason_comp = (by_reason/total_mask)#-census_race_mask
by_reason_comp#.plot()
Reason_for_Stop Equipment Violation Investigative Stop Moving Violation Other Registration
Race_Ethnicity
White 0.482554 0.520436 0.567977 0.372263 0.556291
African-American/Black 0.423077 0.408719 0.338536 0.558394 0.369915
Hispanic 0.079699 0.064033 0.072651 0.060219 0.064333
Asian-Pacific Islander 0.013878 0.006812 0.020282 0.005474 0.009461
Native American 0.000793 NaN 0.000555 0.003650 NaN

How they were searched is, I think, less important than the fact that they were searched at all.

census_race_mask = [census_race for i in range(7)]
by_authority = traffic.groupby(by=["Race_Ethnicity","Search_Authority"],as_index=False).count()
by_authority = by_authority.pivot(index="Race_Ethnicity",columns="Search_Authority", values= "Date")
by_authority = by_authority.drop(index="Unknown")
total_mask=by_authority.sum().values
by_authority = by_authority.reindex(["White", "African-American/Black", "Hispanic","Asian-Pacific Islander","Native American"])
by_authority_comp = (by_authority/total_mask)-census_race_mask
by_authority_comp#.plot()
Search_Authority (blank = no search) Consent Incident to Arrest Parole/Probation Plain View Terry Cursory Tow Inventory
Race_Ethnicity
White -0.020120 -0.262760 -0.266736 -0.340836 -0.322086 -0.344459 -0.313058
African-American/Black 0.108231 0.356623 0.372827 0.269872 0.419872 0.451032 0.380984
Hispanic -0.062874 -0.049574 -0.066419 0.115253 -0.065997 -0.062283 -0.060673
Asian-Pacific Islander -0.021517 NaN -0.036282 NaN -0.027475 NaN -0.002938
Native American -0.003720 NaN -0.003390 NaN NaN NaN NaN
search = by_authority[["Consent","Incident to Arrest","Parole/Probation","Plain View","Terry Cursory","Tow Inventory"]].sum(axis=1)
no_search = by_authority['      ']
A_search_nosearch = pd.DataFrame([no_search,search])
A_search_nosearch = A_search_nosearch.T.rename(columns={'      ': "Not_Searched","Unnamed 0": "Searched"})

total_mask=A_search_nosearch.sum().values
census_race_mask = [census_race for i in range(2)]
A_search_nosearch_comp = (A_search_nosearch/total_mask)#-census_race_mask
A_search_nosearch
Not_Searched Searched
Race_Ethnicity
White 9616.0 532.0
African-American/Black 5701.0 1027.0
Hispanic 1211.0 123.0
Asian-Pacific Islander 311.0 8.0
Native American 10.0 1.0

What was discovered is, I think, less important than whether anything was discovered at all.

traffic["Discovery_If_Searched"].unique()
array(['       ', 'Nothing', 'Alcohol', 'Drugs', 'Drugs/ Alcohol', 'Cash',
       'Other Property', 'Drugs/ Cash', 'Weapon/ Drugs', 'Weapon',
       'Vehicle/ Drugs', 'Drugs/ Other Property', 'Cash/ Other Property',
       'Vehicle/ Alcohol/ Cash', 'Drugs/ Alcohol/ Cash/ Other Property',
       'Vehicle/ Alcohol', 'Vehicle', 'Weapon/ Drugs/ Alcohol',
       'Weapon/ Drugs/ Cash', 'Weapon/ Vehicle', 'Weapon/ Vehicle/ Cash',
       'Weapons/ Drugs/ Other Property', 'Weapon/ Drugs/ Other Property',
       'Vehicle/ Other Property', 'Vehicle/ Drugs/ Cash',
       'Vehicle/ Drugs/ Cash/ Other Property',
       'Weapon/ Vehicle/ Drugs/ Cash', 'Vehicle/ Cash/ Other Property',
       'Drugs/ Alcohol/ Cash', 'Weapon/ Other Property',
       'Alcohol/ Other Property', 'Alcohol/ Cash'], dtype=object)
census_race_mask = [census_race for i in range(32)]
by_discovery = traffic.groupby(by=["Race_Ethnicity",'Discovery_If_Searched'],as_index=False).count()
by_discovery = by_discovery.pivot(index="Race_Ethnicity",columns='Discovery_If_Searched', values= "Date")
by_discovery = by_discovery.drop(index="Unknown")
total_mask=by_discovery.sum().values
by_discovery = by_discovery.reindex(["White", "African-American/Black", "Hispanic","Asian-Pacific Islander","Native American"])
by_discovery_comp = (by_discovery/total_mask)-census_race_mask
by_discovery_comp
Discovery_If_Searched Alcohol Alcohol/ Cash Alcohol/ Other Property Cash Cash/ Other Property Drugs Drugs/ Alcohol Drugs/ Alcohol/ Cash Drugs/ Alcohol/ Cash/ Other Property ... Weapon Weapon/ Drugs Weapon/ Drugs/ Alcohol Weapon/ Drugs/ Cash Weapon/ Drugs/ Other Property Weapon/ Other Property Weapon/ Vehicle Weapon/ Vehicle/ Cash Weapon/ Vehicle/ Drugs/ Cash Weapons/ Drugs/ Other Property
Race_Ethnicity
White -0.025718 -0.268997 0.409164 NaN -0.473189 -0.162265 -0.277705 -0.296719 NaN NaN ... -0.215836 -0.355542 -0.390836 0.409164 0.409164 0.409164 NaN 0.409164 NaN NaN
African-American/Black 0.113823 0.379068 NaN 0.769872 0.652225 0.341301 0.386034 0.358108 0.769872 0.769872 ... 0.238622 0.475755 0.369872 NaN NaN NaN 0.769872 NaN NaN 0.769872
Hispanic -0.062748 -0.077276 NaN NaN NaN NaN -0.064040 -0.017100 NaN NaN ... 0.021503 -0.075924 0.065253 NaN NaN NaN NaN NaN NaN NaN
Asian-Pacific Islander -0.021645 -0.028481 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN 0.960025 NaN
Native American -0.003713 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 32 columns

search = by_authority[["Consent","Incident to Arrest","Parole/Probation","Plain View","Terry Cursory","Tow Inventory"]].sum(axis=1)
discovery = by_discovery.drop(columns=['       ','Nothing']).sum(axis=1)
no_discovery = search - discovery
discovery_nodiscovery = pd.DataFrame([no_discovery,discovery])
discovery_nodiscovery = discovery_nodiscovery.T.rename(columns={0:"No_Discovery", 1:"Discovery"})

total_mask=discovery_nodiscovery.sum().values
census_race_mask = [census_race for i in range(2)]
discovery_nodiscovery_comp = (discovery_nodiscovery/total_mask)#-census_race_mask
discovery_nodiscovery
No_Discovery Discovery
Race_Ethnicity
White 366.0 166.0
African-American/Black 703.0 324.0
Hispanic 87.0 36.0
Asian-Pacific Islander 5.0 3.0
Native American 1.0 0.0
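One common way to read the table above is the "outcome test": compute each group's hit rate, the fraction of searches that discovered something. Roughly similar hit rates across groups would suggest searches are triggered by comparable levels of suspicion. A sketch using the counts printed above:

```python
import pandas as pd

# Counts copied from the No_Discovery / Discovery table above
df = pd.DataFrame(
    {"No_Discovery": [366.0, 703.0, 87.0, 5.0, 1.0],
     "Discovery":    [166.0, 324.0, 36.0, 3.0, 0.0]},
    index=["White", "African-American/Black", "Hispanic",
           "Asian-Pacific Islander", "Native American"])

# Hit rate = discoveries / total searches for each group
hit_rate = df["Discovery"] / df.sum(axis=1)
print(hit_rate.round(3))
```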
by_result_comp.T
Race_Ethnicity White African-American/Black Hispanic Asian-Pacific Islander Native American
Result_of_Stop
None 0.491525 0.355932 0.135593 NaN 0.016949
Arrest 0.340153 0.570332 0.084399 0.005115 NaN
Citation 0.561575 0.352537 0.068627 0.016911 0.000351
Report 0.411765 0.470588 0.058824 0.058824 NaN
Warning 0.517007 0.379906 0.082156 0.019623 0.001308
by_reason_comp.T
Race_Ethnicity White African-American/Black Hispanic Asian-Pacific Islander Native American
Reason_for_Stop
Equipment Violation 0.482554 0.423077 0.079699 0.013878 0.000793
Investigative Stop 0.520436 0.408719 0.064033 0.006812 NaN
Moving Violation 0.567977 0.338536 0.072651 0.020282 0.000555
Other 0.372263 0.558394 0.060219 0.005474 0.003650
Registration 0.556291 0.369915 0.064333 0.009461 NaN
binary_comp = pd.concat([A_search_nosearch_comp, discovery_nodiscovery_comp], axis=1)
binary_comp
Not_Searched Searched No_Discovery Discovery
Race_Ethnicity
White 0.570716 0.314607 0.314974 0.313800
African-American/Black 0.338358 0.607333 0.604991 0.612476
Hispanic 0.071874 0.072738 0.074871 0.068053
Asian-Pacific Islander 0.018458 0.004731 0.004303 0.005671
Native American 0.000594 0.000591 0.000861 0.000000
# split into 4 different arrays for spacing
race = ["White", "African-American/Black","Hispanic","Asian-Pacific Islander","Native American"]

results_name = by_result_comp.T.index.values
results_values = by_result_comp.T.values

reasons_name = by_reason_comp.T.index.values
reasons_values = by_reason_comp.T.values

binary_name = binary_comp.T.index.values
binary_value = binary_comp.T.values

#Got this code from https://matplotlib.org/3.1.0/gallery/images_contours_and_fields/image_annotated_heatmap.html#sphx-glr-gallery-images-contours-and-fields-image-annotated-heatmap-py
def heatmap(data, row_labels, col_labels, ax=None,colorbar=True, title= " ",
            cbar_kw={}, cbarlabel="", **kwargs):
    """
    Create a heatmap from a numpy array and two lists of labels.

    Parameters
    ----------
    data
        A 2D numpy array of shape (N, M).
    row_labels
        A list or array of length N with the labels for the rows.
    col_labels
        A list or array of length M with the labels for the columns.
    ax
        A `matplotlib.axes.Axes` instance to which the heatmap is plotted.  If
        not provided, use current axes or create a new one.  Optional.
    colorbar
        If True, draw a colorbar next to the heatmap.  Optional.
    title
        Text placed below the heatmap as an x-axis label.  Optional.
    cbar_kw
        A dictionary with arguments to `matplotlib.Figure.colorbar`.  Optional.
    cbarlabel
        The label for the colorbar.  Optional.
    **kwargs
        All other arguments are forwarded to `imshow`.
    """

    if not ax:
        ax = plt.gca()

    # Plot the heatmap
    im = ax.imshow(data, **kwargs)
    
    
    # Create colorbar
    if(colorbar == True):
        cbar = ax.figure.colorbar(im, ax=ax, **cbar_kw)
        cbar.ax.set_ylabel(cbarlabel, rotation=-90, va="bottom")
    ax.set_xlabel(title,labelpad =10, fontsize=25)
    # We want to show all ticks...
    ax.set_xticks(np.arange(data.shape[1]))
    ax.set_yticks(np.arange(data.shape[0]))
    # ... and label them with the respective list entries.
    ax.set_xticklabels(col_labels, fontsize=22)
    ax.set_yticklabels(row_labels, fontsize=22)

    # Let the horizontal axes labeling appear on top.
    ax.tick_params(top=True, bottom=False,
                   labeltop=True, labelbottom=False)

    # Rotate the tick labels and set their alignment.
    plt.setp(ax.get_xticklabels(), rotation=-30, ha="right",
             rotation_mode="anchor")

    # Turn spines off and create white grid.
    for edge, spine in ax.spines.items():
        spine.set_visible(False)

    ax.set_xticks(np.arange(data.shape[1]+1)-.5, minor=True)
    ax.set_yticks(np.arange(data.shape[0]+1)-.5, minor=True)
    ax.grid(which="minor", color="w", linestyle='-', linewidth=3)
    ax.tick_params(which="minor", bottom=False, left=False)
    
    
    if(colorbar == True):
        return im, cbar
    if(colorbar == False):
        return im


def annotate_heatmap(im, data=None, valfmt="{x:.2f}",
                     textcolors=["black", "white"],
                     threshold=None, **textkw):
    """
    A function to annotate a heatmap.

    Parameters
    ----------
    im
        The AxesImage to be labeled.
    data
        Data used to annotate.  If None, the image's data is used.  Optional.
    valfmt
        The format of the annotations inside the heatmap.  This should either
        use the string format method, e.g. "$ {x:.2f}", or be a
        `matplotlib.ticker.Formatter`.  Optional.
    textcolors
        A list or array of two color specifications.  The first is used for
        values below a threshold, the second for those above.  Optional.
    threshold
        Value in data units according to which the colors from textcolors are
        applied.  If None (the default) uses the middle of the colormap as
        separation.  Optional.
    **kwargs
        All other arguments are forwarded to each call to `text` used to create
        the text labels.
    """

    if not isinstance(data, (list, np.ndarray)):
        data = im.get_array()

    # Normalize the threshold to the images color range.
    if threshold is not None:
        threshold = im.norm(threshold)
    else:
        threshold = im.norm(data.max())/2.

    # Set default alignment to center, but allow it to be
    # overwritten by textkw.
    kw = dict(horizontalalignment="center",
              verticalalignment="center")
    kw.update(textkw)

    # Get the formatter in case a string is supplied
    if isinstance(valfmt, str):
        valfmt = matplotlib.ticker.StrMethodFormatter(valfmt)

    # Loop over the data and create a `Text` for each "pixel".
    # Change the text's color depending on the data.
    texts = []
    for i in range(data.shape[0]):
        for j in range(data.shape[1]):
            kw.update(color=textcolors[int(im.norm(data[i, j]) > threshold)])
            text = im.axes.text(j, i, valfmt(data[i, j], None), **kw)
            texts.append(text)

    return texts

fig, ((ax1,ax2,ax3)) = plt.subplots(1, 3, figsize=(22, 16))

im = heatmap(results_values, results_name, race,  ax=ax1,colorbar=False, title= "Result of Stop",
                   cmap='Greens', cbarlabel="Percent of Traffic Stops")
annotate_heatmap(im,fontsize=17)

im = heatmap(reasons_values, reasons_name, race,  ax=ax2, colorbar=False, title= "Reason for Stop",
                   cmap='Greens', cbarlabel="Percent of Traffic Stops")
annotate_heatmap(im,fontsize=17)

im = heatmap(binary_value, binary_name, race,  ax=ax3, colorbar=False, title= "Search Results",
                   cmap='Greens', cbarlabel="Percent of Traffic Stops")
annotate_heatmap(im,fontsize=17)


fig.subplots_adjust(right=0.8)
cbar_ax = fig.add_axes([0.08, .2, 0.9, 0.04])
cbar = fig.colorbar(im, cax=cbar_ax,orientation="horizontal")
cbar.set_label("Percent of Traffic Stops", rotation=0,size=25, labelpad = 50)
cbar.ax.tick_params(labelsize=18)

fig.tight_layout()
plt.savefig("./outcomes.png", dpi=400,bbox_inches="tight", pad_inches=0,transparent=True)
plt.show()
C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\colors.py:933: UserWarning: Warning: converting a masked element to nan.
  dtype = np.min_scalar_type(value)
C:\ProgramData\Anaconda3\lib\site-packages\numpy\ma\core.py:713: UserWarning: Warning: converting a masked element to nan.
  data = np.array(a, copy=False, subok=subok)
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:150: UserWarning: This figure includes Axes that are not compatible with tight_layout, so results might be incorrect.

png

Is there a trend by age in which certain demographics are pulled over?

traffic["Age"][Age_corrected_filter].hist(bins=80,width=1)
<matplotlib.axes._subplots.AxesSubplot at 0x29074ee3ec8>

png

traffic["Race_Ethnicity"].unique()
array(['White', 'Unknown', 'African-American/Black',
       'Asian-Pacific Islander', 'Hispanic', 'Native American'],
      dtype=object)
races = ['Native American','African-American/Black','Asian-Pacific Islander','Hispanic','White']

fig, ax = plt.subplots()

cmap = matplotlib.cm.get_cmap('Greens')
#styles = [(0, (3, 5, 1, 5, 1, 5)),'dashed', 'solid','dashdot',(0, (5, 5)),(0, (3, 5, 1, 5)),(0, (1, 1))]
colors = [cmap(2/7),cmap(3/7),cmap(4/7),cmap(5/7),cmap(6/7)]


# Iterate through the races
i=0
for race in races:
    # Subset to the race
    
    subset = traffic[traffic['Race_Ethnicity'] == race][Age_corrected_filter]
    
    
    # Draw the density plot
    sns.distplot(subset['Age'].dropna(), hist = False, kde = True, color = colors[i],
                 label = race,kde_kws={'linewidth':5})
    i = i +1

sns.distplot(traffic['Age'][Age_corrected_filter].dropna(), hist = False, kde = True, color = "black",
                 label = "All Races Together",kde_kws={'linewidth':3,'linestyle':'--','alpha':.5})

# Plot formatting
fig.set_size_inches(22.5, 14.5)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
plt.yticks([0.01,0.04], fontsize =35)
plt.xticks(np.arange(0,100,5),fontsize=35)
legend = plt.legend(title = 'Race',prop={'size': 35})
legend.get_title().set_fontsize('30')
plt.title('Age distribution of Traffic Stops by race \n (All races same sample size)', pad = 50,fontdict = 
          {'fontsize': 45,
            'fontweight' : 3,
             'verticalalignment': 'top',
             'horizontalalignment': "center"})
plt.xlabel('Age (years)', fontsize = 40, labelpad= 30)
plt.grid(axis='x',linestyle='--',alpha = 0.5)
plt.ylabel("Density of Traffic Stops", rotation=90,fontsize = 40,labelpad=30)
plt.savefig("./Age_by_race.png", dpi=400,bbox_inches="tight", pad_inches=0,transparent=True)
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:15: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  from ipykernel import kernelapp as app

png
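The repeated "Boolean Series key will be reindexed" warnings above come from applying `Age_corrected_filter` (built against the full frame) to an already-subset frame. Combining both conditions with `&` in a single indexing step avoids the warning. A toy sketch (the real `traffic` frame and `Age_corrected_filter` are assumed to follow this shape):

```python
import pandas as pd

# Toy stand-ins for the traffic frame and the age mask
traffic = pd.DataFrame({"Race_Ethnicity": ["White", "Hispanic", "White"],
                        "Age": [25, 40, 130]})
Age_corrected_filter = traffic["Age"].between(0, 110)

# Chained form (warns, because the mask is reindexed to the subset):
#   traffic[traffic["Race_Ethnicity"] == "White"][Age_corrected_filter]
# Single-step form, with both masks evaluated against the full frame:
subset = traffic[(traffic["Race_Ethnicity"] == "White") & Age_corrected_filter]
print(subset["Age"].tolist())
```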


Is there any officer that is pulling over a disproportionate percentage of individuals from a given demographic?

percents = [census_ALL["Black"],census_ALL["White"],census_ALL["Asian_Pacific"],census_ALL["Hispanic"],census_ALL["Native"]]
percents = np.array(percents)*100
by_officer = officer_top.groupby(by=["Race_Ethnicity","Officer_Badge"],as_index=False).count()
by_officer = by_officer.pivot(index="Race_Ethnicity",columns="Officer_Badge", values= "Date")
by_officer = by_officer.drop("Unknown", axis=0)
by_officer.reset_index()
# Convert each officer's column to percentages of that officer's total stops
by_officer = by_officer.div(by_officer.sum(axis=0), axis=1) * 100
officer_black = by_officer.T["African-American/Black"]
officer_asain = by_officer.T["Asian-Pacific Islander"]
officer_hispanic = by_officer.T["Hispanic"]
officer_native = by_officer.T["Native American"]
officer_white = by_officer.T["White"]
print("Number of officers surveyed {}".format(len(by_officer.T)))
Number of officers surveyed 43
fig, (axes1,axes2) = plt.subplots(2,1,figsize=(13,17))

axes1.violinplot(dataset = [officer_black.dropna().values,
                           officer_white.dropna().values] )

axes1.set_title("Percentage of Race Stopped by Individual Officers",fontsize = 30)
axes1.spines['top'].set_visible(False)
axes1.spines['right'].set_visible(False)
axes1.spines['left'].set_visible(False)
axes1.spines['bottom'].set_visible(False)
axes1.tick_params(axis='both', which='major', labelsize=25)
axes1.yaxis.set_major_formatter(mtick.PercentFormatter())
#axes1.set_ylabel('Percentage of Population Stopped',labelpad =10,fontsize=25)
axes1.set_xticks(np.arange(1,3,1))
axes1.set_xticklabels(["African-American/Black","White"],rotation=45)
axes1.scatter([1,2],percents[0:2],marker="D",s=100,color="Black",label = "Population Percentage \n from Census",alpha=0.5)
boxprops = dict(linestyle=' ', linewidth=0.1)
capprops=dict(linestyle=' ', linewidth=0.1)
medianprops=dict(linestyle='-', linewidth=2)
bp = by_officer.T[["African-American/Black","White"]].boxplot(figsize=(15,10),boxprops=boxprops,capprops=capprops,medianprops=medianprops,ax=axes1)
#axes1.legend(loc="upper left",fontsize=17)
axes1.grid(True,axis='y')
axes1.grid(False,axis='x')


axes2.violinplot(dataset = [officer_asain.dropna().values,
                           officer_hispanic.dropna().values,
                           officer_native.dropna().values] )

#axes2.set_title("Percentage of Race Stopped by Individual Officers",fontsize = 30)
axes2.spines['top'].set_visible(False)
axes2.spines['right'].set_visible(False)
axes2.spines['left'].set_visible(False)
axes2.spines['bottom'].set_visible(False)
axes2.tick_params(axis='both', which='major', labelsize=25)
axes2.yaxis.set_major_formatter(mtick.FormatStrFormatter('%.0f%%'))
#axes2.set_ylabel('Percentage of Population Stopped by individual officer (N = 43)',labelpad =10, fontsize=25)
fig.text(0, 0.5, "Percentage of Race Stopped in Individual Officer's Total Traffic Stops (N = 43)", va='center', rotation='vertical', fontsize=25)
axes2.set_xticks(np.arange(1,4,1))
axes2.set_xticklabels(["Asian-Pacific Islander","Hispanic","Native American"],rotation=45)
axes2.scatter([1,2,3],percents[2::],marker="D",s=100,color="Black",label = "Population Percentage \n from Census",alpha=0.5)
boxprops = dict(linestyle=' ', linewidth=0.1)
capprops=dict(linestyle=' ', linewidth=0.1)
medianprops=dict(linestyle='-', linewidth=2)
by_officer.T[["Asian-Pacific Islander","Hispanic","Native American"]].boxplot(figsize=(15,10),boxprops=boxprops,capprops=capprops,medianprops=medianprops,ax=axes2)
#xes2.grid(True,axis='y')
axes2.legend(loc="upper left",fontsize=17)
axes2.grid(False,axis='x')
plt.savefig("./Officer_Traffic_Race.png", dpi=400,bbox_inches="tight", pad_inches=0,transparent=True)
plt.show()

png

Are certain demographics stopped disproportionately at certain times of day?

The last plot is the one I ended up using. It can be represented in many different ways, but I feel comparing it to the census gives the most information and the clearest view of any patterns.

traffic_corrected = traffic[traffic['Race_Ethnicity'] != "Unknown"]
race_percentage
{'African-American/Black': 0.362891046386192,
 'Asian-Pacific Islander': 0.01720604099244876,
 'Hispanic': 0.07195253505933118,
 'Native American': 0.0005933117583603021,
 'White': 0.11375404530744336}
max_hour = traffic_corrected.groupby(by=["Hour"]).count()["Date"].values
white = abs(by_race["White"].values/max_hour)#-census_ALL["White"])
black = abs(by_race["African-American/Black"].values/max_hour)#-census_ALL["Black"])
asian = abs(by_race["Asian-Pacific Islander"].values/max_hour)#-census_ALL["Asian_Pacific"])
hispanic = abs(by_race["Hispanic"].values/max_hour)#-census_ALL["Hispanic"])
native = abs(by_race["Native American"].values/max_hour)#-census_ALL["Native"])

white_m = abs(white - race_percentage['White'])
black_m = abs(black - race_percentage['African-American/Black'])
asian_m = abs(asian - race_percentage['Asian-Pacific Islander'])
hispanic_m = abs(hispanic - race_percentage['Hispanic'])
native_m = abs(native - race_percentage['Native American'])


native_estimate_m=pd.DataFrame(native_m).fillna(method="bfill").values
asian_estimate_m=pd.DataFrame(asian_m).fillna(method="bfill").values
by_race["White"].values/max_hour
array([0.47021277, 0.4054878 , 0.46610169, 0.38461538, 0.41836735,
       0.31707317, 0.58706468, 0.61942959, 0.63086172, 0.60101243,
       0.56942102, 0.56384505, 0.5725938 , 0.55119215, 0.48765432,
       0.46764706, 0.48816029, 0.50173611, 0.50630631, 0.54291845,
       0.42950108, 0.44904815, 0.462     , 0.47295597])
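The hourly comparison divides each race's stop counts per hour by the total stops in that hour. The same share-per-hour table can be built directly from a groupby; a sketch with toy data (the column names match the dataset, the values are made up):

```python
import pandas as pd

# Toy stops: hour of the stop and the driver's recorded race
stops = pd.DataFrame({"Hour": [0, 0, 0, 1, 1],
                      "Race_Ethnicity": ["White", "White", "Hispanic",
                                         "White", "Hispanic"]})

# Count stops per (hour, race), then normalize each hour's row to shares
by_race = stops.groupby(["Hour", "Race_Ethnicity"]).size().unstack()
share = by_race.div(by_race.sum(axis=1), axis=0)
print(share.round(3))
```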
races = [black_m,native_m,asian_m,hispanic_m,white_m]

names=["African-American/Black","Native American","Asian-Pacific Islander","Hispanic","White"]

fig, ax = plt.subplots()
cmap = matplotlib.cm.get_cmap('Greens')
colors = [cmap(6/7),cmap(5/7),cmap(4/7),cmap(3/7),cmap(2/7)]

#plt.plot(np.arange(0,24,1),np.zeros(24),label="Population Percentage \n from Census",linestyle="--",color="Black",alpha=0.8)
plt.plot(np.arange(0,24,1),native_estimate_m*100,color=cmap(4/6),linestyle='--',linewidth=3)
#Plot the backfilled estimate for Asian-Pacific Islander
plt.plot(np.arange(0,24,1),asian_estimate_m*100,color=cmap(3/6), linestyle = "--", linewidth=3)


#styles = [(0, (3, 5, 1, 5, 1, 5)),'dashed', 'solid','dashdot',(0, (5, 5)),(0, (3, 5, 1, 5)),(0, (1, 1))]


# Iterate through the races
i=0
for race in races:
    # Plot this race's hourly deviation
    plt.plot(race*100,label = names[i], linewidth = 3,color = colors[i])
    i = i +1
    

# Plot formatting
fig.set_size_inches(18.5, 10.5)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.tick_params(axis='both', which='major', labelsize=30)
ax.yaxis.set_major_formatter(mtick.PercentFormatter())
plt.xticks(np.arange(0,24,1))
plt.legend(bbox_to_anchor=(0.5, -0.05),prop={'size': 16}, title = 'Race')
plt.title('Traffic Stops by Hour compared to race', pad = 50,fontdict = 
          {'fontsize': 35,
            'fontweight' : 3,
             'verticalalignment': 'top',
             'horizontalalignment': "right"})
plt.xlabel('Hours (24hr Time)', fontsize = 30,labelpad=50)
plt.grid(axis='x',linestyle='--',alpha = 0.5)
plt.ylabel("Difference from Mean Percentage", rotation=90,fontsize = 30,labelpad=50)





plt.savefig("./Race_per_hour_abs.png", dpi=400,bbox_inches="tight", pad_inches=0)

png

max_hour = traffic_corrected.groupby(by=["Hour"]).count()["Date"].values
white = abs(by_race["White"].values/max_hour)#-census_ALL["White"])
black = abs(by_race["African-American/Black"].values/max_hour)#-census_ALL["Black"])
asian = abs(by_race["Asian-Pacific Islander"].values/max_hour)#-census_ALL["Asian_Pacific"])
hispanic = abs(by_race["Hispanic"].values/max_hour)#-census_ALL["Hispanic"])
native = abs(by_race["Native American"].values/max_hour)#-census_ALL["Native"])

white_m = white - race_percentage['White']
black_m = black - race_percentage['African-American/Black']
asian_m = asian - race_percentage['Asian-Pacific Islander']
hispanic_m = hispanic - race_percentage['Hispanic']
native_m = native - race_percentage['Native American']


native_estimate_m=pd.DataFrame(native_m.copy()).fillna(method="bfill").values.flatten()
asian_estimate_m=pd.DataFrame(asian_m.copy()).fillna(method="bfill").values.flatten()
races = [black_m,native_m,asian_m,hispanic_m,white_m]

names=["African-American/Black","Native American","Asian-Pacific Islander","Hispanic","White"]

fig, ax = plt.subplots()
cmap = matplotlib.cm.get_cmap('Greens')
colors = [cmap(6/7),cmap(5/7),cmap(4/7),cmap(3/7),cmap(2/7)]

#plt.plot(np.arange(0,24,1),np.zeros(24),label="Population Percentage \n from Census",linestyle="--",color="Black",alpha=0.8)
plt.plot(np.arange(0,24,1),native_estimate_m*100,color=cmap(4/6),linestyle='--',linewidth=3)
#Plot the backfilled estimate for Asian-Pacific Islander
plt.plot(np.arange(0,24,1),asian_estimate_m*100,color=cmap(3/6), linestyle = "--", linewidth=3)


#styles = [(0, (3, 5, 1, 5, 1, 5)),'dashed', 'solid','dashdot',(0, (5, 5)),(0, (3, 5, 1, 5)),(0, (1, 1))]


# Iterate through the races
i=0
for race in races:
    # Plot this race's hourly deviation
    plt.plot(race*100,label = names[i], linewidth = 3,color = colors[i])
    i = i +1
    

# Plot formatting
fig.set_size_inches(18.5, 10.5)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.tick_params(axis='both', which='major', labelsize=20)
ax.yaxis.set_major_formatter(mtick.PercentFormatter())
plt.xticks(np.arange(0,24,1))
plt.legend(bbox_to_anchor=(1.00, 1.0),prop={'size': 16}, title = 'Race')
plt.title('Traffic Stops by Hour compared to race', pad = 50,fontdict = 
          {'fontsize': 30,
            'fontweight' : 3,
             'verticalalignment': 'top',
             'horizontalalignment': "right"})
plt.xlabel('Hours (24hr Time)', fontsize = 25,labelpad=50)
plt.grid(axis='x',linestyle='--',alpha = 0.5)
plt.ylabel("Difference from Mean Percentage", rotation=90,fontsize = 25,labelpad=50)





plt.savefig("./Race_per_hour.png", dpi=400,bbox_inches="tight", pad_inches=0)

png

asian = by_race["Asian-Pacific Islander"]/max_hour-census_ALL["Asian_Pacific"]
native = by_race["Native American"]/max_hour-census_ALL["Native"]
native_estimate=abs(native.fillna(method="bfill").values)
asian_estimate=abs(asian.fillna(method="bfill").values)

max_hour = traffic_corrected.groupby(by=["Hour"]).count()["Date"].values
white = abs(by_race["White"].values/max_hour-census_ALL["White"])
black = abs(by_race["African-American/Black"].values/max_hour-census_ALL["Black"])
asian = abs(by_race["Asian-Pacific Islander"].values/max_hour-census_ALL["Asian_Pacific"])
hispanic = abs(by_race["Hispanic"].values/max_hour-census_ALL["Hispanic"])
native = abs(by_race["Native American"].values/max_hour-census_ALL["Native"])

#white_m = abs(white - white.mean())
#black_m = abs(black - black.mean())
#asain_m = abs(asain - np.nanmean(asain))
#hispanic_m = abs(hispanic - hispanic.mean())
#native_m = abs(native - np.nanmean(native))


#native_estimate_m = abs(native_estimate - np.nanmean(native))
#asian_estimate_m = abs(asain_estimate - np.nanmean(asain))
races = [black,native,asian,hispanic,white]

names=["African-American/Black","Native American","Asian-Pacific Islander","Hispanic","White"]

fig, ax = plt.subplots()
cmap = matplotlib.cm.get_cmap('Greens')
colors = [cmap(6/7),cmap(5/7),cmap(4/7),cmap(3/7),cmap(2/7)]

#plt.plot(np.arange(0,24,1),np.zeros(24),label="Population Percentage \n from Census",linestyle="--",color="Black",alpha=0.8)
plt.plot(np.arange(0,24,1),native_estimate*100,color=cmap(4/6),linestyle='--',linewidth=3)
#Plot the backfilled estimate for Asian-Pacific Islander
plt.plot(np.arange(0,24,1),asian_estimate*100,color=cmap(3/6), linestyle = "--", linewidth=3)


#styles = [(0, (3, 5, 1, 5, 1, 5)),'dashed', 'solid','dashdot',(0, (5, 5)),(0, (3, 5, 1, 5)),(0, (1, 1))]


# Iterate through the races
i=0
for race in races:
    # Plot this race's hourly deviation
    plt.plot(race*100,label = names[i], linewidth = 3,color = colors[i])
    i = i +1
    

# Plot formatting
fig.set_size_inches(18.5, 10.5)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.tick_params(axis='both', which='major', labelsize=20)
ax.yaxis.set_major_formatter(mtick.PercentFormatter())
plt.xticks(np.arange(0,24,1))
plt.legend(bbox_to_anchor=(1.00, 1.0),prop={'size': 16}, title = 'Race')
plt.title('Traffic Stops by Hour compared to race', pad = 50,fontdict = 
          {'fontsize': 30,
            'fontweight' : 3,
             'verticalalignment': 'top',
             'horizontalalignment': "right"})
plt.xlabel('Hours (24hr Time)', fontsize = 25,labelpad=50)
plt.grid(axis='x',linestyle='--',alpha = 0.5)
plt.ylabel("Difference from Population Percentage", rotation=90,fontsize = 25,labelpad=50)





plt.savefig("./Race_per_hour_census_abs.png", dpi=400,bbox_inches="tight", pad_inches=0)

png
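For the sparse groups (Asian-Pacific Islander, Native American) some hours have no stops at all, leaving NaN gaps; `fillna(method="bfill")` carries the next observed hour's value backwards so the dashed estimate lines stay continuous. A toy example (note newer pandas deprecates the `method=` keyword in favor of `.bfill()`):

```python
import pandas as pd

# Hours with no stops for a sparse group appear as NaN;
# backfilling borrows the next observed hour's value
s = pd.Series([0.02, None, None, 0.05, 0.01])
filled = s.bfill()  # equivalent to fillna(method="bfill")
print(filled.tolist())
```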

asian = by_race["Asian-Pacific Islander"]/max_hour-census_ALL["Asian_Pacific"]
native = by_race["Native American"]/max_hour-census_ALL["Native"]
native_estimate=native.fillna(method="bfill").values
asian_estimate=asian.fillna(method="bfill").values

max_hour = traffic_corrected.groupby(by=["Hour"]).count()["Date"].values
white = by_race["White"].values/max_hour-census_ALL["White"]
black = by_race["African-American/Black"].values/max_hour-census_ALL["Black"]
asian = by_race["Asian-Pacific Islander"].values/max_hour-census_ALL["Asian_Pacific"]
hispanic = by_race["Hispanic"].values/max_hour-census_ALL["Hispanic"]
native = by_race["Native American"].values/max_hour-census_ALL["Native"]

#white_m = abs(white - white.mean())
#black_m = abs(black - black.mean())
#asain_m = abs(asain - np.nanmean(asain))
#hispanic_m = abs(hispanic - hispanic.mean())
#native_m = abs(native - np.nanmean(native))


#native_estimate_m = abs(native_estimate - np.nanmean(native))
#asian_estimate_m = abs(asain_estimate - np.nanmean(asain))
races = [black,native,asian,hispanic,white]

names=["African-American/Black","Native American","Asian-Pacific Islander","Hispanic","White"]

fig, ax = plt.subplots()
cmap = matplotlib.cm.get_cmap('Greens')
colors = [cmap(6/7),cmap(5/7),cmap(4/7),cmap(3/7),cmap(2/7)]

plt.plot(np.arange(0,24,1),np.zeros(24),label="Population Percentage \n from Census",linestyle="--",color="Black",alpha=0.8)
plt.plot(np.arange(0,24,1),native_estimate*100,color=cmap(4/6),linestyle='--',linewidth=3)
#Plot the backfilled estimate for Asian-Pacific Islander
plt.plot(np.arange(0,24,1),asian_estimate*100,color=cmap(3/6), linestyle = "--", linewidth=3)


#styles = [(0, (3, 5, 1, 5, 1, 5)),'dashed', 'solid','dashdot',(0, (5, 5)),(0, (3, 5, 1, 5)),(0, (1, 1))]


# Iterate through the races
i=0
for race in races:
    # Plot this race's hourly deviation
    plt.plot(race*100,label = names[i], linewidth = 6,color = colors[i])
    i = i +1
    

# Plot formatting
fig.set_size_inches(24.5, 10.5)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.tick_params(axis='both', which='major', labelsize=25)
ax.yaxis.set_major_formatter(mtick.PercentFormatter())
plt.xticks(np.arange(0,24,1))
legend = plt.legend(loc='upper center',bbox_to_anchor=(0.5, -0.25),prop={'size': 25}, title = 'Race',ncol=3)
legend.get_title().set_fontsize('25')
plt.title('Traffic Stops by Hour compared to race', pad = 50,fontdict = 
          {'fontsize': 40,
            'fontweight' : 3,
             'verticalalignment': 'top'})
plt.xlabel('Hours (24hr Time)', fontsize = 35,labelpad=50)
plt.grid(axis='x',linestyle='--',alpha = 0.5)
plt.ylabel("Difference from Population Percentage", rotation=90,fontsize = 35,labelpad=50)





plt.savefig("./Race_per_hour_census.png", dpi=400,bbox_inches="tight", pad_inches=0,transparent=True)

png