Visualizing Lansing Traffic Stop Data


<p style="text-align: right;"> Brandon McIntyre </p>

<p style="text-align: right;"> Analysis of Lansing Traffic Stops

CMSE 402 SS20 </p>

Final Writeup

Data Used

Traffic Stop Data of Lansing Police Department (02/01/2016 - 02/13/2019): http://data-lansing.opendata.arcgis.com/datasets/32b48df10c674a7aace77071046ba272_0?selectedAttribute=Date

Census Data/ ACS Data: https://data.census.gov/cedsci/table?q=Lansing%20MI&g=1600000US2646000&hidePreview=false&tid=ACSDP5Y2018.DP05&vintage=2010&layer=VT_2018_160_00_PY_D1&cid=DP05_0001E

Note

Looking at the Lansing Police Department MATS Data Analysis for Year 17 (https://lansingmi.gov/DocumentCenter/View/6979/MATS-Months193-204-May-2018), it is mentioned that census data is not a fully accurate baseline, since driver demographics do not necessarily match census demographics. This makes using census data troublesome. This article (https://nij.ojp.gov/topics/articles/racial-profiling-and-traffic-stops) goes into more detail on the subject. It appears that for a proper comparison, one must look at the demographics of all licensed drivers. That information is only available at the state level, and only broken down by sex (https://www.fhwa.dot.gov/policyinformation/statistics/2016/dl1c.cfm), at least in the officially recognized datasets I could find. So, with that caveat, I will continue to use the ACS data, though it may not produce fully accurate results. I also tried to find more MATS traffic stop data from the Lansing department, but was unable to.

Upon more research since the writeup, I have also found traffic stop analyses for Year 18 (https://lansingmi.gov/ArchiveCenter/ViewFile/Item/655)
and Year 16 (https://www.lansingmi.gov/Archive/ViewFile/Item/603).

Interesting News since April 1st

Here are some interesting articles, all posted on 4/1/20, about this particular issue in the East Lansing Police Department. (They all say the same exact thing, some with visualizations that are not particularly good.) (https://www.lansingcitypulse.com/stories/mayor-east-lansing-police-over-stop-black-drivers-which-is-not-acceptable,14053) (https://www.lansingstatejournal.com/story/news/2020/04/01/east-lansing-wants-changes-after-study-shows-race-bias-police-stops/5102549002/) (https://eastlansinginfo.org/content/newly-released-data-show-racial-bias-east-lansing-policing) (https://statenews.com/article/2020/04/elpd-race-data-report-shows-african-americans-over-stopped-in-east-lansing?ct=content_open&cv=cbox_featured) (https://www.wilx.com/content/news/City-of-East-Lansing-releases-race-data-for-ELPD-officer-initiated-contacts-569281531.html)

The articles state that over the past two months there has been an “over-stop of black drivers”. This came about because “The East Lansing Police Department began gathering race data on officer-initiated contacts, like traffic stops, in February after a black 19-year-old accused an officer of using excessive force during an arrest. The study was requested by the City Council.” The study states, “Though black residents comprise 8% of East Lansing’s population, they accounted for 22% of police officer-initiated contact in February and March, according to a department study”.

Now this is particularly interesting given that the EL study is based on census data. The Lansing Police Department made it explicit that census data is not conclusive, because of the difference between census demographics and actual licensed drivers. The City Pulse article actually references the Lansing data that I am looking at now and says that similar trends appear in the Lansing traffic stop data as well.

This information is of particular interest because, honestly, I don’t know which side to believe. I feel the issue stems from the lack of demographic information on licensed drivers. Not having it allows the Lansing police to say there is no racial bias, while the EL police department says there is a bias and they want to correct it. Who is in the wrong and who is in the right?

I have tried to find the data that the EL study is based on, but all I can find are their weekly reports, which do not include race (https://www.cityofeastlansing.com/Archive.aspx?AMID=47). Furthermore, the annual report that the EL police department puts out does not include race in any manner (https://www.cityofeastlansing.com/DocumentCenter/View/910/ELPD-Annual-Report-2019-PDF?bidId=). This leads me to think that EL is not as experienced as the Lansing department when it comes to answering the question of racial bias.

Variable: Exploration and Plots

Questions

Libraries

import pandas as pd
import numpy as np
import pytz
import seaborn as sns
import matplotlib
import matplotlib.ticker as mtick
import matplotlib.pyplot as plt

Load in Data

Traffic Stop Data

The data did not come sorted, so it was sorted by time. The timestamps were also off by five hours, as they were given in GMT; they were converted to US/Eastern.

traffic = pd.read_csv("./Traffic_Stops.csv", parse_dates=["Date"])
traffic.sort_values("Date", axis=0, ascending=True, inplace=True)
traffic.reset_index(inplace=True,drop=True)
traffic["Date"] = traffic["Date"].dt.tz_convert(pytz.timezone('US/Eastern'))
traffic
Date Team_Area Street_Area Reason_for_Stop Race_Ethnicity Gender Age Search_Performed Search_Authority Discovery_If_Searched Result_of_Stop Officer_Badge Traffic_Crash Serial_Number ObjectId
0 2016-02-01 00:03:00-05:00 4 Jolly Equipment Violation White Male 23 None Citation 158 No 48071 1
1 2016-02-01 00:04:00-05:00 1 Oakland Equipment Violation Unknown Male 42 None Citation 225 No 48072 3
2 2016-02-01 00:12:00-05:00 3 Other Equipment Violation African-American/Black Male 31 None Citation 62 No 48073 4
3 2016-02-01 00:37:00-05:00 4 Oakland Moving Violation African-American/Black Female 21 None Citation 158 No 48074 7
4 2016-02-01 00:40:00-05:00 4 Cedar Moving Violation White Female 33 None Warning 124 No 48075 8
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
20644 2019-02-12 14:00:00-05:00 1 M.L. King Jr. Other African-American/Black Male 42 Driver Incident to Arrest Nothing Arrest 89 No 73879 20645
20645 2019-02-12 14:06:00-05:00 1 Saginaw Moving Violation White Male 33 None Warning 202 No 73875 20646
20646 2019-02-12 14:39:00-05:00 1 Other Registration Unknown Male 28 None Citation 202 No 73877 20647
20647 2019-02-12 16:02:00-05:00 2 Michigan Registration White Female 26 None Citation 245 No 73881 20648
20648 2019-02-12 19:36:00-05:00 2 Michigan Equipment Violation African-American/Black Female 23 None Nothing Warning 149 No 73883 20649

20649 rows × 15 columns

Add an “Hour” Column for later use

traffic["Hour"] = traffic["Date"].dt.hour
traffic.columns
Index(['Date', 'Team_Area', 'Street_Area', 'Reason_for_Stop', 'Race_Ethnicity',
       'Gender', 'Age', 'Search_Performed', 'Search_Authority',
       'Discovery_If_Searched', 'Result_of_Stop', 'Officer_Badge',
       'Traffic_Crash', 'Serial_Number', 'ObjectId', 'Hour'],
      dtype='object')

Subsets of MATS report’s Data

year16 = traffic[(traffic["Date"] >= '2016-02-12 00:00:00-05:00') & (traffic["Date"] <=
'2017-02-11 23:59:59-05:00')]
year17 = traffic[(traffic["Date"] >= '2017-02-12 00:00:00-05:00') & (traffic["Date"] <=
'2018-02-11 23:59:59-05:00')]
#Not sure why they skipped 02-12-2018?
year18 = traffic[(traffic["Date"] >= '2018-02-13 00:00:00-05:00') & (traffic["Date"] <=
'2019-02-12 23:59:59-05:00')]

Census Data

Another issue I have found with the census data is that the races are not cleanly separated. I cannot make a perfect mapping between the race values in the traffic stop data and the races reported on the census, because Hispanic/Latino is a separate section in the census, and the “Two or more races” section makes an exact split impossible. I will therefore add together only the single races plus Latino and leave out the multi-race counts. This will hopefully distribute the roughly 6% reporting more than one race evenly.

An Excel file is included (census_calc.xlsx) with all the numbers and calculations I used to get the percentages of each race and gender. What I load in is a CSV with the same values as the calculation spreadsheet, but with the formulas removed for easy loading into pandas.
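As a minimal sketch of that aggregation (the counts below are hypothetical; the real figures live in census_calc.xlsx), single-race counts are divided by the total population, with Hispanic/Latino kept as its own category and multi-race respondents left out:

```python
# Hypothetical illustration of the aggregation described above; the real
# numbers live in census_calc.xlsx, and these counts are made up.
def race_shares(single_race_counts, hispanic_count, total_population):
    """Turn single-race counts plus a separate Hispanic count into shares.

    Multi-race respondents are left out, so the shares sum to less than 1;
    the missing ~6% is assumed to spread roughly evenly across the groups.
    """
    shares = {race: n / total_population for race, n in single_race_counts.items()}
    shares["Hispanic"] = hispanic_count / total_population
    return shares

example = race_shares(
    {"White": 60_000, "Black": 23_000, "Native": 450, "Asian_Pacific": 4_000},
    hispanic_count=13_500,
    total_population=115_000,
)
```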

census = pd.read_csv("./ACS_data/census.csv")
census_2016 = census.iloc[0]
census_2017 = census.iloc[1]
census_2018 = census.iloc[2]
census_ALL = census.iloc[3]
census
Year Male Female White Black Native Asian_Pacific Hispanic
0 2016 0.479 0.521 0.593992 0.228385 0.004449 0.041471 0.131703
1 2017 0.475 0.525 0.562655 0.234408 0.003599 0.063806 0.135532
2 2018 0.462 0.538 0.576798 0.234277 0.003184 0.037995 0.147745
3 ALL 0.480 0.520 0.590836 0.230128 0.004314 0.039975 0.134747

Visualization Exploration of Data

Analysis of Traffic Stops Over Time

traffic.groupby("Hour").count()["Date"].plot(kind="bar")
<matplotlib.axes._subplots.AxesSubplot at 0x2907411c9c8>

png

Now looking at the MATS report for year 18 (February 13, 2018, through February 12, 2019).

year18.groupby(traffic["Date"].dt.hour).count()["Date"].plot(kind="bar")
<matplotlib.axes._subplots.AxesSubplot at 0x2907407e248>

png

MATS report plot for comparison from Year 18

Reason for Stop

traffic["Reason_for_Stop"].unique()
array(['Equipment Violation', 'Moving Violation', 'Other', 'Registration',
       'Investigative Stop'], dtype=object)
values, counts = np.unique(traffic["Reason_for_Stop"], return_counts=True)
fig, ax = plt.subplots()
ax.barh(values,counts)
<BarContainer object of 5 artists>

png

by_reason = traffic.groupby(by=["Hour","Reason_for_Stop"],as_index=False).count()
by_reason = by_reason.pivot(index="Hour",columns="Reason_for_Stop", values= "Date")
by_reason.plot(kind="bar",subplots=True,layout=(3,2),sharex=True, sharey=True,figsize=(10,7))
[traffic.groupby(by=["Hour"]).count()["Date"].plot(kind="bar", ax=ax,color="gray",alpha=0.3) for ax in plt.gcf().axes]
plt.tight_layout()
#plt.title("Reason for Traffic Stop by Hour")

png

Race/Ethnicity

traffic["Race_Ethnicity"].unique()
array(['White', 'Unknown', 'African-American/Black',
       'Asian-Pacific Islander', 'Hispanic', 'Native American'],
      dtype=object)
values, counts = np.unique(traffic["Race_Ethnicity"], return_counts=True)
fig, ax = plt.subplots()
ax.barh(values,counts)
<BarContainer object of 6 artists>

png

counts_race = "{}:{} , {}:{} , {}:{} , {}:{} , {}:{} , {}:{}".format(values[0],counts[0],values[1],counts[1],values[2],counts[2],values[3],counts[3],values[4],counts[4],values[5],counts[5])
print(counts_race)
African-American/Black:6728 , Asian-Pacific Islander:319 , Hispanic:1334 , Native American:11 , Unknown:2109 , White:10148
races_total = {}
j=0
for i in values:
    races_total[i] = counts[j]
    j = j+1
races_total
{'African-American/Black': 6728,
 'Asian-Pacific Islander': 319,
 'Hispanic': 1334,
 'Native American': 11,
 'Unknown': 2109,
 'White': 10148}
# Share of each race among stops where the race was recorded
total_known = len(traffic[traffic["Race_Ethnicity"] != "Unknown"])
race_percentage = {race: round(count / total_known, 6)
                   for race, count in zip(values, counts)
                   if race != "Unknown"}
race_percentage
{'African-American/Black': 0.362891,
 'Asian-Pacific Islander': 0.017206,
 'Hispanic': 0.071953,
 'Native American': 0.000593,
 'White': 0.547357}
by_race = traffic.groupby(by=["Hour","Race_Ethnicity"],as_index=False).count()
by_race = by_race.pivot(index="Hour",columns="Race_Ethnicity", values= "Date")
by_race.plot(kind="bar",subplots=True,layout=(3,2),sharex=True, sharey=True,figsize=(10,7))
[traffic.groupby(by=["Hour"]).count()["Date"].plot(kind="bar", ax=ax,color="gray",alpha=0.3) for ax in plt.gcf().axes]
plt.tight_layout()
#plt.title("Race by Traffic Stop by Hour")

png

Gender

traffic["Gender"].unique()
array(['Male', 'Female'], dtype=object)
values, counts = np.unique(traffic["Gender"], return_counts=True)
fig, ax = plt.subplots()
ax.barh(values,counts)
<BarContainer object of 2 artists>

png

by_gender = traffic.groupby(by=["Hour","Gender"],as_index=False).count()
by_gender = by_gender.pivot(index="Hour",columns="Gender", values= "Date")
by_gender.plot(kind="bar",subplots=True,layout=(1,2),sharex=True, sharey=True,figsize=(10,3))
[traffic.groupby(by=["Hour"]).count()["Date"].plot(kind="bar", ax=ax,color="gray",alpha=0.3) for ax in plt.gcf().axes]
plt.tight_layout()

png

Analysis of Age

Sorted_age_indx = np.argsort(traffic["Age"].values)
traffic["Age"][Sorted_age_indx].unique()
array([-7167, -7154, -6181,  -977,  -955,     0,     1,     2,    11,
          12,    13,    14,    15,    16,    17,    18,    19,    20,
          21,    22,    23,    24,    25,    26,    27,    28,    29,
          30,    31,    32,    33,    34,    35,    36,    37,    38,
          39,    40,    41,    42,    43,    44,    45,    46,    47,
          48,    49,    50,    51,    52,    53,    54,    55,    56,
          57,    58,    59,    60,    61,    62,    63,    64,    65,
          66,    67,    68,    69,    70,    71,    72,    73,    74,
          75,    76,    77,    78,    79,    80,    81,    82,    83,
          84,    85,    86,    87,    88,    89,    90,    91,    92,
          93,   101,   120,   121], dtype=int64)

OK, let’s assume that any age lower than 14 is incorrect, though it could conceivably be genuine. I will keep ages 14 and up, since in Michigan you can typically start driver’s training at 14 years and 9 months. There is also a jump from 93 to 101; there may truly be drivers older than 93, but I will exclude those values because they could be errors.

Age_corrected_filter = ((traffic["Age"] > 13) & (traffic["Age"] < 94))
traffic["Age"][Sorted_age_indx][Age_corrected_filter].unique()
array([14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
       31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
       48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64,
       65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81,
       82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93], dtype=int64)
traffic["Age"][Age_corrected_filter].hist(bins=80,width=1)
<matplotlib.axes._subplots.AxesSubplot at 0x29074c7bd48>

png


Search Performed

traffic['Search_Performed'].unique()
array(['None', 'Driver', 'Vehicle', nan, 'Passenger'], dtype=object)
search_p_clean = traffic[~traffic["Search_Performed"].isnull()]
# Will need to look into values, probably something to do with the nan value
values, counts = np.unique(search_p_clean["Search_Performed"], return_counts=True)
fig, ax = plt.subplots()
ax.barh(values,counts)
<BarContainer object of 4 artists>

png

by_door = search_p_clean.groupby(by=["Hour","Search_Performed"],as_index=False).count()
by_door = by_door.pivot(index="Hour",columns="Search_Performed", values= "Date")
by_door.plot(kind="bar",subplots=True,layout=(2,2),sharex=True, sharey=True,figsize=(10,5))
[search_p_clean.groupby(by=["Hour"]).count()["Date"].plot(kind="bar", ax=ax,color="gray",alpha=0.3) for ax in plt.gcf().axes]
plt.tight_layout()

png

search_p_positive_clean = search_p_clean[search_p_clean["Search_Performed"] != "None"]
by_door = search_p_positive_clean.groupby(by=["Hour","Search_Performed"],as_index=False).count()
by_door = by_door.pivot(index="Hour",columns="Search_Performed", values= "Date")
by_door.plot(kind="bar",subplots=True,layout=(2,2),sharex=True, sharey=True,figsize=(10,5))
[search_p_positive_clean.groupby(by=["Hour"]).count()["Date"].plot(kind="bar", ax=ax,color="gray",alpha=0.3) for ax in plt.gcf().axes]
plt.tight_layout()

png

Search Authority

traffic['Search_Authority'].unique()
array(['      ', 'Incident to Arrest', 'Terry Cursory', 'Consent',
       'Plain View', 'Tow Inventory', 'Parole/Probation'], dtype=object)
print("Total number not searched",len(traffic[traffic["Search_Authority"] == '      ']))
print("Total number searched",len(traffic[traffic["Search_Authority"] != '      ']))
Total number not searched 18748
Total number searched 1901
search_clean = traffic[traffic["Search_Authority"] != '      ']
values, counts = np.unique(search_clean["Search_Authority"], return_counts=True)
fig, ax = plt.subplots()
ax.barh(values,counts)
<BarContainer object of 6 artists>

png

by_search = search_clean.groupby(by=["Hour","Search_Authority"],as_index=False).count()
by_search = by_search.pivot(index="Hour",columns="Search_Authority", values= "Date")
by_search.plot(kind="bar",subplots=True,layout=(3,2),sharex=True, sharey=True,figsize=(10,7))
[search_clean.groupby(by=["Hour"]).count()["Date"].plot(kind="bar", ax=ax,color="gray",alpha=0.3) for ax in plt.gcf().axes]
plt.tight_layout()

png

Discovery If Searched

traffic['Discovery_If_Searched'].unique()
array(['       ', 'Nothing', 'Alcohol', 'Drugs', 'Drugs/ Alcohol', 'Cash',
       'Other Property', 'Drugs/ Cash', 'Weapon/ Drugs', 'Weapon',
       'Vehicle/ Drugs', 'Drugs/ Other Property', 'Cash/ Other Property',
       'Vehicle/ Alcohol/ Cash', 'Drugs/ Alcohol/ Cash/ Other Property',
       'Vehicle/ Alcohol', 'Vehicle', 'Weapon/ Drugs/ Alcohol',
       'Weapon/ Drugs/ Cash', 'Weapon/ Vehicle', 'Weapon/ Vehicle/ Cash',
       'Weapons/ Drugs/ Other Property', 'Weapon/ Drugs/ Other Property',
       'Vehicle/ Other Property', 'Vehicle/ Drugs/ Cash',
       'Vehicle/ Drugs/ Cash/ Other Property',
       'Weapon/ Vehicle/ Drugs/ Cash', 'Vehicle/ Cash/ Other Property',
       'Drugs/ Alcohol/ Cash', 'Weapon/ Other Property',
       'Alcohol/ Other Property', 'Alcohol/ Cash'], dtype=object)
print("Total number not searched",len(traffic[traffic["Discovery_If_Searched"] == '       ']))
print("Total number searched",len(traffic[traffic["Discovery_If_Searched"] != '       ']))
print("Total number searched, but nothing found",len(traffic[(traffic["Discovery_If_Searched"] != '       ') & (traffic["Discovery_If_Searched"] == 'Nothing')]))
print("Total number searched and discovery made",len(traffic[(traffic["Discovery_If_Searched"] != '       ') & (traffic["Discovery_If_Searched"] != 'Nothing')]))
Total number not searched 18512
Total number searched 2137
Total number searched, but nothing found 1547
Total number searched and discovery made 590
discovery_clean = traffic[traffic["Discovery_If_Searched"] != '       ']
values, counts = np.unique(discovery_clean['Discovery_If_Searched'], return_counts=True)
fig, ax = plt.subplots(figsize=(5,10))
ax.barh(values,counts)
<BarContainer object of 31 artists>

png

counts
array([  95,    1,    1,   20,   10,  216,   17,    1,    1,   12,   10,
       1547,   66,   41,    4,    1,    3,   14,    6,    1,    2,   36,
         19,    5,    2,    1,    1,    1,    1,    1,    1], dtype=int64)
by_discovery = discovery_clean.groupby(by=["Hour","Discovery_If_Searched"],as_index=False).count()
by_discovery = by_discovery.pivot(index="Hour",columns="Discovery_If_Searched", values= "Date")
by_discovery.plot(kind="bar",subplots=True,layout=(16,2),sharex=True, sharey=True,figsize=(10,40))
[discovery_clean.groupby(by=["Hour"]).count()["Date"].plot(kind="bar", ax=ax,color="gray",alpha=0.3) for ax in plt.gcf().axes]
plt.tight_layout()

png

discovery_clean_positive = traffic[(traffic["Discovery_If_Searched"] != '       ') & (traffic["Discovery_If_Searched"] != 'Nothing')]
values, counts = np.unique(discovery_clean_positive['Discovery_If_Searched'], return_counts=True)
fig, ax = plt.subplots(figsize=(5,10))
ax.barh(values,counts)
<BarContainer object of 30 artists>

png

by_discovery_positive = discovery_clean_positive.groupby(by=["Hour","Discovery_If_Searched"],as_index=False).count()
by_discovery_positive = by_discovery_positive.pivot(index="Hour",columns="Discovery_If_Searched", values= "Date")
by_discovery_positive.plot(kind="bar",subplots=True,layout=(16,2),sharex=True, sharey=True,figsize=(10,40))
[discovery_clean_positive.groupby(by=["Hour"]).count()["Date"].plot(kind="bar", ax=ax,color="gray",alpha=0.3) for ax in plt.gcf().axes]
plt.tight_layout()

png

Result of Stop

traffic['Result_of_Stop'].unique()
array(['Citation', 'Warning', 'Arrest', '    ', 'Report'], dtype=object)
values, counts = np.unique(traffic["Result_of_Stop"], return_counts=True)
fig, ax = plt.subplots()
ax.barh(values,counts)
<BarContainer object of 5 artists>

png

by_result = traffic.groupby(by=["Hour","Result_of_Stop"],as_index=False).count()
by_result = by_result.pivot(index="Hour",columns="Result_of_Stop", values= "Date")
by_result.plot(kind="bar",subplots=True,layout=(3,2),sharex=True, sharey=True,figsize=(10,7.5))
[traffic.groupby(by=["Hour"]).count()["Date"].plot(kind="bar", ax=ax,color="gray",alpha=0.3) for ax in plt.gcf().axes]
plt.tight_layout()

png

Officer Badge

traffic['Officer_Badge'].unique()
array([158, 225,  62, 124, 152,  71,  50, 112, 337, 334, 132, 169, 390,
        35,  87,  44,  96, 220, 165,  74,  49,  97, 154,  82,  86, 143,
       176,   2,  89, 391, 215, 160,  37,  85, 197, 185,  52,  59,  28,
       222, 305,  40, 212, 211, 110, 442, 153,  64,  22,  84, 376,   0,
       118,   8, 312,  36, 345, 378, 140,  63,  79, 100,  54, 177,   3,
       324, 339, 398,   7,  23, 319, 142, 344,  46, 128,  81, 317, 168,
       175,  21,  27, 201, 307,  67, 226, 164, 358, 146, 127, 351, 353,
       101, 347,  38, 355, 134, 230, 129, 309, 457,  47,  48, 326,  29,
        14,  66,  83, 188, 103,  94,  11, 136,  80,  34, 335, 122, 414,
       408, 181,  69,  75, 308,  12,  43,  57,  45, 108, 125, 145,  51,
        72, 117, 120, 182, 428, 354, 472, 245, 111, 123,  70, 130, 387,
        31,  53, 135,   6, 203, 106,  91, 139, 167,  90,  41,  99, 191,
       121, 196, 202, 131, 200,  76,  92,   4, 228,  16, 186, 126, 115,
       119, 151, 104, 150, 190, 149, 193, 137, 170, 148, 113], dtype=int64)
Sorted_badge_indx = np.argsort(traffic['Officer_Badge'].values)
traffic['Officer_Badge'][Sorted_badge_indx].unique()
array([  0,   2,   3,   4,   6,   7,   8,  11,  12,  14,  16,  21,  22,
        23,  27,  28,  29,  31,  34,  35,  36,  37,  38,  40,  41,  43,
        44,  45,  46,  47,  48,  49,  50,  51,  52,  53,  54,  57,  59,
        62,  63,  64,  66,  67,  69,  70,  71,  72,  74,  75,  76,  79,
        80,  81,  82,  83,  84,  85,  86,  87,  89,  90,  91,  92,  94,
        96,  97,  99, 100, 101, 103, 104, 106, 108, 110, 111, 112, 113,
       115, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128,
       129, 130, 131, 132, 134, 135, 136, 137, 139, 140, 142, 143, 145,
       146, 148, 149, 150, 151, 152, 153, 154, 158, 160, 164, 165, 167,
       168, 169, 170, 175, 176, 177, 181, 182, 185, 186, 188, 190, 191,
       193, 196, 197, 200, 201, 202, 203, 211, 212, 215, 220, 222, 225,
       226, 228, 230, 245, 305, 307, 308, 309, 312, 317, 319, 324, 326,
       334, 335, 337, 339, 344, 345, 347, 351, 353, 354, 355, 358, 376,
       378, 387, 390, 391, 398, 408, 414, 428, 442, 457, 472], dtype=int64)
# Why do some officers have so many stops?
values, counts = np.unique(traffic["Officer_Badge"], return_counts=True)
fig, ax = plt.subplots()
ax.barh(values,counts)
<BarContainer object of 180 artists>

png

# Officers that have made over 100 stops
officer_100 = values[counts > 100]
mask = traffic["Officer_Badge"].isin(officer_100)
officer_top = traffic[mask]

Questions

For individuals who were searched vs. not searched, is there a link to demographics? Is any demographic subjected to harsher punishment across traffic stop results?

Disclaimer

Due to the nature of the stops, and the fact that there can be multiple authorities for an arrest or a search, it is hard to tell whether values are erroneous, let alone correct them. For example, there are rows whose columns do not logically flow together (no search, but incident to arrest, something found, and no arrest). Thus no values will be purged, as purging would cause data loss that may not be warranted.

However, I will only treat a search as performed if the table says it was performed. I will not edit the table when the other columns suggest the answer should have been yes.
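Rather than purging, the logically inconsistent rows can at least be counted. A sketch, assuming the padded-space sentinels seen in the explorations above:

```python
import pandas as pd

# Sketch: count rows whose search columns do not logically flow together,
# e.g. no search performed but a search authority recorded. The raw data
# uses runs of spaces as "empty", so strip before comparing.
def count_search_inconsistencies(df):
    no_search = df["Search_Performed"].fillna("None").str.strip().eq("None")
    has_authority = df["Search_Authority"].str.strip().ne("")
    return int((no_search & has_authority).sum())
```

Calling `count_search_inconsistencies(traffic)` would give a sense of how widespread the problem is without deleting anything.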

This would also be best served by a combined visualization. I have not made it yet, but the idea is to compute a count for each column under the “Search” categories and make a large heat map with a diverging colormap centered on the census data.
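A minimal sketch of that planned heat map, using seaborn (already imported above) with placeholder shares; the real version would substitute the per-race search counts and the ACS census shares:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Placeholder shares; the real version would use per-race counts from each
# "Search" column divided by the column total, and the ACS census shares.
races = ["White", "African-American/Black", "Hispanic",
         "Asian-Pacific Islander", "Native American"]
stop_shares = pd.DataFrame(
    {"Searched": [0.31, 0.61, 0.07, 0.005, 0.001],
     "Not_Searched": [0.57, 0.34, 0.07, 0.018, 0.001]},
    index=races)
census_shares = pd.Series([0.59, 0.23, 0.13, 0.04, 0.004], index=races)

# Deviation from census: positive = over-represented relative to census.
deviation = stop_shares.sub(census_shares, axis=0)
ax = sns.heatmap(deviation, cmap="RdBu_r", center=0, annot=True)
ax.set_title("Deviation of search shares from census shares (sketch)")
plt.tight_layout()
```

Centering the diverging colormap at zero makes over- and under-representation relative to the census baseline immediately visible.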

traffic.columns
Index(['Date', 'Team_Area', 'Street_Area', 'Reason_for_Stop', 'Race_Ethnicity',
       'Gender', 'Age', 'Search_Performed', 'Search_Authority',
       'Discovery_If_Searched', 'Result_of_Stop', 'Officer_Badge',
       'Traffic_Crash', 'Serial_Number', 'ObjectId', 'Hour'],
      dtype='object')
print(counts_race)
African-American/Black:6728 , Asian-Pacific Islander:319 , Hispanic:1334 , Native American:11 , Unknown:2109 , White:10148

census_race = [census_ALL["White"],census_ALL["Black"],census_ALL["Hispanic"],census_ALL["Asian_Pacific"],census_ALL["Native"]]
census_race_mask = [census_race for i in range(5)]
census_race_mask
[[0.5908361910000001,
  0.230127567,
  0.13474726199999998,
  0.039975298,
  0.004313682],
 [0.5908361910000001,
  0.230127567,
  0.13474726199999998,
  0.039975298,
  0.004313682],
 [0.5908361910000001,
  0.230127567,
  0.13474726199999998,
  0.039975298,
  0.004313682],
 [0.5908361910000001,
  0.230127567,
  0.13474726199999998,
  0.039975298,
  0.004313682],
 [0.5908361910000001,
  0.230127567,
  0.13474726199999998,
  0.039975298,
  0.004313682]]
census_race_mask = [census_race for i in range(5)]
by_result = traffic.groupby(by=["Race_Ethnicity","Result_of_Stop"],as_index=False).count()
by_result = by_result.pivot(index="Race_Ethnicity",columns="Result_of_Stop", values= "Date")
by_result = by_result.drop(index="Unknown")
total_mask=by_result.sum().values
by_result = by_result.reindex(["White", "African-American/Black", "Hispanic","Asian-Pacific Islander","Native American"])
by_result = by_result.rename(columns={'    ':"None"})
by_result_comp = (by_result/total_mask)#-census_race_mask
by_result_comp
Result_of_Stop None Arrest Citation Report Warning
Race_Ethnicity
White 0.491525 0.340153 0.561575 0.411765 0.517007
African-American/Black 0.355932 0.570332 0.352537 0.470588 0.379906
Hispanic 0.135593 0.084399 0.068627 0.058824 0.082156
Asian-Pacific Islander NaN 0.005115 0.016911 0.058824 0.019623
Native American 0.016949 NaN 0.000351 NaN 0.001308
census_race_mask = [census_race for i in range(5)]
by_reason = traffic.groupby(by=["Race_Ethnicity","Reason_for_Stop"],as_index=False).count()
by_reason = by_reason.pivot(index="Race_Ethnicity",columns="Reason_for_Stop", values= "Date")
by_reason = by_reason.drop(index="Unknown")
total_mask=by_reason.sum().values
by_reason = by_reason.reindex(["White", "African-American/Black", "Hispanic","Asian-Pacific Islander","Native American"])
by_reason_comp = (by_reason/total_mask)#-census_race_mask
by_reason_comp#.plot()
Reason_for_Stop Equipment Violation Investigative Stop Moving Violation Other Registration
Race_Ethnicity
White 0.482554 0.520436 0.567977 0.372263 0.556291
African-American/Black 0.423077 0.408719 0.338536 0.558394 0.369915
Hispanic 0.079699 0.064033 0.072651 0.060219 0.064333
Asian-Pacific Islander 0.013878 0.006812 0.020282 0.005474 0.009461
Native American 0.000793 NaN 0.000555 0.003650 NaN

How they were searched is, I think, less important than the fact that they were searched at all.

census_race_mask = [census_race for i in range(7)]
by_authority = traffic.groupby(by=["Race_Ethnicity","Search_Authority"],as_index=False).count()
by_authority = by_authority.pivot(index="Race_Ethnicity",columns="Search_Authority", values= "Date")
by_authority = by_authority.drop(index="Unknown")
total_mask=by_authority.sum().values
by_authority = by_authority.reindex(["White", "African-American/Black", "Hispanic","Asian-Pacific Islander","Native American"])
by_authority_comp = (by_authority/total_mask)-census_race_mask
by_authority_comp#.plot()
Search_Authority (blank = no search) Consent Incident to Arrest Parole/Probation Plain View Terry Cursory Tow Inventory
Race_Ethnicity
White -0.020120 -0.262760 -0.266736 -0.340836 -0.322086 -0.344459 -0.313058
African-American/Black 0.108231 0.356623 0.372827 0.269872 0.419872 0.451032 0.380984
Hispanic -0.062874 -0.049574 -0.066419 0.115253 -0.065997 -0.062283 -0.060673
Asian-Pacific Islander -0.021517 NaN -0.036282 NaN -0.027475 NaN -0.002938
Native American -0.003720 NaN -0.003390 NaN NaN NaN NaN
search = by_authority[["Consent","Incident to Arrest","Parole/Probation","Plain View","Terry Cursory","Tow Inventory"]].sum(axis=1)
no_search = by_authority['      ']
A_search_nosearch = pd.DataFrame([no_search,search])
A_search_nosearch = A_search_nosearch.T.rename(columns={'      ': "Not_Searched","Unnamed 0": "Searched"})

total_mask=A_search_nosearch.sum().values
census_race_mask = [census_race for i in range(2)]
A_search_nosearch_comp = (A_search_nosearch/total_mask)#-census_race_mask
A_search_nosearch
Not_Searched Searched
Race_Ethnicity
White 9616.0 532.0
African-American/Black 5701.0 1027.0
Hispanic 1211.0 123.0
Asian-Pacific Islander 311.0 8.0
Native American 10.0 1.0

What was discovered is, I think, less important than whether anything was discovered at all.

traffic["Discovery_If_Searched"].unique()
array(['       ', 'Nothing', 'Alcohol', 'Drugs', 'Drugs/ Alcohol', 'Cash',
       'Other Property', 'Drugs/ Cash', 'Weapon/ Drugs', 'Weapon',
       'Vehicle/ Drugs', 'Drugs/ Other Property', 'Cash/ Other Property',
       'Vehicle/ Alcohol/ Cash', 'Drugs/ Alcohol/ Cash/ Other Property',
       'Vehicle/ Alcohol', 'Vehicle', 'Weapon/ Drugs/ Alcohol',
       'Weapon/ Drugs/ Cash', 'Weapon/ Vehicle', 'Weapon/ Vehicle/ Cash',
       'Weapons/ Drugs/ Other Property', 'Weapon/ Drugs/ Other Property',
       'Vehicle/ Other Property', 'Vehicle/ Drugs/ Cash',
       'Vehicle/ Drugs/ Cash/ Other Property',
       'Weapon/ Vehicle/ Drugs/ Cash', 'Vehicle/ Cash/ Other Property',
       'Drugs/ Alcohol/ Cash', 'Weapon/ Other Property',
       'Alcohol/ Other Property', 'Alcohol/ Cash'], dtype=object)
census_race_mask = [census_race for i in range(32)]
by_discovery = traffic.groupby(by=["Race_Ethnicity",'Discovery_If_Searched'],as_index=False).count()
by_discovery = by_discovery.pivot(index="Race_Ethnicity",columns='Discovery_If_Searched', values= "Date")
by_discovery = by_discovery.drop(index="Unknown")
total_mask=by_discovery.sum().values
by_discovery = by_discovery.reindex(["White", "African-American/Black", "Hispanic","Asian-Pacific Islander","Native American"])
by_discovery_comp = (by_discovery/total_mask)-census_race_mask
by_discovery_comp
Discovery_If_Searched Alcohol Alcohol/ Cash Alcohol/ Other Property Cash Cash/ Other Property Drugs Drugs/ Alcohol Drugs/ Alcohol/ Cash Drugs/ Alcohol/ Cash/ Other Property ... Weapon Weapon/ Drugs Weapon/ Drugs/ Alcohol Weapon/ Drugs/ Cash Weapon/ Drugs/ Other Property Weapon/ Other Property Weapon/ Vehicle Weapon/ Vehicle/ Cash Weapon/ Vehicle/ Drugs/ Cash Weapons/ Drugs/ Other Property
Race_Ethnicity
White -0.025718 -0.268997 0.409164 NaN -0.473189 -0.162265 -0.277705 -0.296719 NaN NaN ... -0.215836 -0.355542 -0.390836 0.409164 0.409164 0.409164 NaN 0.409164 NaN NaN
African-American/Black 0.113823 0.379068 NaN 0.769872 0.652225 0.341301 0.386034 0.358108 0.769872 0.769872 ... 0.238622 0.475755 0.369872 NaN NaN NaN 0.769872 NaN NaN 0.769872
Hispanic -0.062748 -0.077276 NaN NaN NaN NaN -0.064040 -0.017100 NaN NaN ... 0.021503 -0.075924 0.065253 NaN NaN NaN NaN NaN NaN NaN
Asian-Pacific Islander -0.021645 -0.028481 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN 0.960025 NaN
Native American -0.003713 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 32 columns

search = by_authority[["Consent","Incident to Arrest","Parole/Probation","Plain View","Terry Cursory","Tow Inventory"]].sum(axis=1)
discovery = by_discovery.drop(columns=['       ','Nothing']).sum(axis=1)
no_discovery = search - discovery
discovery_nodiscovery = pd.DataFrame([no_discovery,discovery])
discovery_nodiscovery = discovery_nodiscovery.T.rename(columns={0:"No_Discovery", 1:"Discovery"})

total_mask=discovery_nodiscovery.sum().values
census_race_mask = [census_race for i in range(2)]
discovery_nodiscovery_comp = (discovery_nodiscovery/total_mask)#-census_race_mask
discovery_nodiscovery
No_Discovery Discovery
Race_Ethnicity
White 366.0 166.0
African-American/Black 703.0 324.0
Hispanic 87.0 36.0
Asian-Pacific Islander 5.0 3.0
Native American 1.0 0.0
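One common way to read the table above is the "outcome test": compute each group's hit rate, the fraction of searches that discovered something. Roughly similar hit rates across groups would suggest searches are triggered by comparable levels of suspicion. A sketch using the counts printed above:

```python
import pandas as pd

# Counts copied from the No_Discovery / Discovery table above
df = pd.DataFrame(
    {"No_Discovery": [366.0, 703.0, 87.0, 5.0, 1.0],
     "Discovery":    [166.0, 324.0, 36.0, 3.0, 0.0]},
    index=["White", "African-American/Black", "Hispanic",
           "Asian-Pacific Islander", "Native American"])

# Hit rate = discoveries / total searches for each group
hit_rate = df["Discovery"] / df.sum(axis=1)
print(hit_rate.round(3))
```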
by_result_comp.T
Race_Ethnicity White African-American/Black Hispanic Asian-Pacific Islander Native American
Result_of_Stop
None 0.491525 0.355932 0.135593 NaN 0.016949
Arrest 0.340153 0.570332 0.084399 0.005115 NaN
Citation 0.561575 0.352537 0.068627 0.016911 0.000351
Report 0.411765 0.470588 0.058824 0.058824 NaN
Warning 0.517007 0.379906 0.082156 0.019623 0.001308
by_reason_comp.T
Race_Ethnicity White African-American/Black Hispanic Asian-Pacific Islander Native American
Reason_for_Stop
Equipment Violation 0.482554 0.423077 0.079699 0.013878 0.000793
Investigative Stop 0.520436 0.408719 0.064033 0.006812 NaN
Moving Violation 0.567977 0.338536 0.072651 0.020282 0.000555
Other 0.372263 0.558394 0.060219 0.005474 0.003650
Registration 0.556291 0.369915 0.064333 0.009461 NaN
binary_comp = pd.concat([A_search_nosearch_comp, discovery_nodiscovery_comp], axis=1)
binary_comp
Not_Searched Searched No_Discovery Discovery
Race_Ethnicity
White 0.570716 0.314607 0.314974 0.313800
African-American/Black 0.338358 0.607333 0.604991 0.612476
Hispanic 0.071874 0.072738 0.074871 0.068053
Asian-Pacific Islander 0.018458 0.004731 0.004303 0.005671
Native American 0.000594 0.000591 0.000861 0.000000
# split into 4 different arrays for spacing
race = ["White", "African-American/Black","Hispanic","Asian-Pacific Islander","Native American"]

results_name = by_result_comp.T.index.values
results_values = by_result_comp.T.values

reasons_name = by_reason_comp.T.index.values
reasons_values = by_reason_comp.T.values

binary_name = binary_comp.T.index.values
binary_value = binary_comp.T.values

#Got this code from https://matplotlib.org/3.1.0/gallery/images_contours_and_fields/image_annotated_heatmap.html#sphx-glr-gallery-images-contours-and-fields-image-annotated-heatmap-py
def heatmap(data, row_labels, col_labels, ax=None,colorbar=True, title= " ",
            cbar_kw={}, cbarlabel="", **kwargs):
    """
    Create a heatmap from a numpy array and two lists of labels.

    Parameters
    ----------
    data
        A 2D numpy array of shape (N, M).
    row_labels
        A list or array of length N with the labels for the rows.
    col_labels
        A list or array of length M with the labels for the columns.
    ax
        A `matplotlib.axes.Axes` instance to which the heatmap is plotted.  If
        not provided, use current axes or create a new one.  Optional.
    colorbar
        If True, draw a colorbar next to the heatmap.  Optional.
    title
        Text placed below the heatmap as an x-axis label.  Optional.
    cbar_kw
        A dictionary with arguments to `matplotlib.Figure.colorbar`.  Optional.
    cbarlabel
        The label for the colorbar.  Optional.
    **kwargs
        All other arguments are forwarded to `imshow`.
    """

    if not ax:
        ax = plt.gca()

    # Plot the heatmap
    im = ax.imshow(data, **kwargs)
    
    
    # Create colorbar
    if(colorbar == True):
        cbar = ax.figure.colorbar(im, ax=ax, **cbar_kw)
        cbar.ax.set_ylabel(cbarlabel, rotation=-90, va="bottom")
    ax.set_xlabel(title,labelpad =10, fontsize=25)
    # We want to show all ticks...
    ax.set_xticks(np.arange(data.shape[1]))
    ax.set_yticks(np.arange(data.shape[0]))
    # ... and label them with the respective list entries.
    ax.set_xticklabels(col_labels, fontsize=22)
    ax.set_yticklabels(row_labels, fontsize=22)

    # Let the horizontal axes labeling appear on top.
    ax.tick_params(top=True, bottom=False,
                   labeltop=True, labelbottom=False)

    # Rotate the tick labels and set their alignment.
    plt.setp(ax.get_xticklabels(), rotation=-30, ha="right",
             rotation_mode="anchor")

    # Turn spines off and create white grid.
    for edge, spine in ax.spines.items():
        spine.set_visible(False)

    ax.set_xticks(np.arange(data.shape[1]+1)-.5, minor=True)
    ax.set_yticks(np.arange(data.shape[0]+1)-.5, minor=True)
    ax.grid(which="minor", color="w", linestyle='-', linewidth=3)
    ax.tick_params(which="minor", bottom=False, left=False)
    
    
    if(colorbar == True):
        return im, cbar
    if(colorbar == False):
        return im


def annotate_heatmap(im, data=None, valfmt="{x:.2f}",
                     textcolors=["black", "white"],
                     threshold=None, **textkw):
    """
    A function to annotate a heatmap.

    Parameters
    ----------
    im
        The AxesImage to be labeled.
    data
        Data used to annotate.  If None, the image's data is used.  Optional.
    valfmt
        The format of the annotations inside the heatmap.  This should either
        use the string format method, e.g. "$ {x:.2f}", or be a
        `matplotlib.ticker.Formatter`.  Optional.
    textcolors
        A list or array of two color specifications.  The first is used for
        values below a threshold, the second for those above.  Optional.
    threshold
        Value in data units according to which the colors from textcolors are
        applied.  If None (the default) uses the middle of the colormap as
        separation.  Optional.
    **kwargs
        All other arguments are forwarded to each call to `text` used to create
        the text labels.
    """

    if not isinstance(data, (list, np.ndarray)):
        data = im.get_array()

    # Normalize the threshold to the images color range.
    if threshold is not None:
        threshold = im.norm(threshold)
    else:
        threshold = im.norm(data.max())/2.

    # Set default alignment to center, but allow it to be
    # overwritten by textkw.
    kw = dict(horizontalalignment="center",
              verticalalignment="center")
    kw.update(textkw)

    # Get the formatter in case a string is supplied
    if isinstance(valfmt, str):
        valfmt = matplotlib.ticker.StrMethodFormatter(valfmt)

    # Loop over the data and create a `Text` for each "pixel".
    # Change the text's color depending on the data.
    texts = []
    for i in range(data.shape[0]):
        for j in range(data.shape[1]):
            kw.update(color=textcolors[int(im.norm(data[i, j]) > threshold)])
            text = im.axes.text(j, i, valfmt(data[i, j], None), **kw)
            texts.append(text)

    return texts

fig, ((ax1,ax2,ax3)) = plt.subplots(1, 3, figsize=(22, 16))

im = heatmap(results_values, results_name, race,  ax=ax1,colorbar=False, title= "Result of Stop",
                   cmap='Greens', cbarlabel="Percent of Traffic Stops")
annotate_heatmap(im,fontsize=17)

im = heatmap(reasons_values, reasons_name, race,  ax=ax2, colorbar=False, title= "Reason for Stop",
                   cmap='Greens', cbarlabel="Percent of Traffic Stops")
annotate_heatmap(im,fontsize=17)

im = heatmap(binary_value, binary_name, race,  ax=ax3, colorbar=False, title= "Search Results",
                   cmap='Greens', cbarlabel="Percent of Traffic Stops")
annotate_heatmap(im,fontsize=17)


fig.subplots_adjust(right=0.8)
cbar_ax = fig.add_axes([0.08, .2, 0.9, 0.04])
cbar = fig.colorbar(im, cax=cbar_ax,orientation="horizontal")
cbar.set_label("Percent of Traffic Stops", rotation=0,size=25, labelpad = 50)
cbar.ax.tick_params(labelsize=18)

fig.tight_layout()
plt.savefig("./outcomes.png", dpi=400,bbox_inches="tight", pad_inches=0,transparent=True)
plt.show()
C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\colors.py:933: UserWarning: Warning: converting a masked element to nan.
  dtype = np.min_scalar_type(value)
C:\ProgramData\Anaconda3\lib\site-packages\numpy\ma\core.py:713: UserWarning: Warning: converting a masked element to nan.
  data = np.array(a, copy=False, subok=subok)
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:150: UserWarning: This figure includes Axes that are not compatible with tight_layout, so results might be incorrect.

png

Is there a trend by age in which certain demographics are pulled over?

traffic["Age"][Age_corrected_filter].hist(bins=80,width=1)
<matplotlib.axes._subplots.AxesSubplot at 0x29074ee3ec8>

png

traffic["Race_Ethnicity"].unique()
array(['White', 'Unknown', 'African-American/Black',
       'Asian-Pacific Islander', 'Hispanic', 'Native American'],
      dtype=object)
races = ['Native American','African-American/Black','Asian-Pacific Islander','Hispanic','White']

fig, ax = plt.subplots()

cmap = matplotlib.cm.get_cmap('Greens')
#styles = [(0, (3, 5, 1, 5, 1, 5)),'dashed', 'solid','dashdot',(0, (5, 5)),(0, (3, 5, 1, 5)),(0, (1, 1))]
colors = [cmap(2/7),cmap(3/7),cmap(4/7),cmap(5/7),cmap(6/7)]


# Iterate through the races
i=0
for race in races:
    # Subset to the race
    
    subset = traffic[traffic['Race_Ethnicity'] == race][Age_corrected_filter]
    
    
    # Draw the density plot
    sns.distplot(subset['Age'].dropna(), hist = False, kde = True, color = colors[i],
                 label = race,kde_kws={'linewidth':5})
    i = i +1

sns.distplot(traffic['Age'][Age_corrected_filter].dropna(), hist = False, kde = True, color = "black",
                 label = "All Races Together",kde_kws={'linewidth':3,'linestyle':'--','alpha':.5})

# Plot formatting
fig.set_size_inches(22.5, 14.5)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
plt.yticks([0.01,0.04], fontsize =35)
plt.xticks(np.arange(0,100,5),fontsize=35)
legend = plt.legend(title = 'Race',prop={'size': 35})
legend.get_title().set_fontsize('30')
plt.title('Age distribution of Traffic Stops by race \n (All races same sample size)', pad = 50,fontdict = 
          {'fontsize': 45,
            'fontweight' : 3,
             'verticalalignment': 'top',
             'horizontalalignment': "center"})
plt.xlabel('Age (years)', fontsize = 40, labelpad= 30)
plt.grid(axis='x',linestyle='--',alpha = 0.5)
plt.ylabel("Density of Traffic Stops", rotation=90,fontsize = 40,labelpad=30)
plt.savefig("./Age_by_race.png", dpi=400,bbox_inches="tight", pad_inches=0,transparent=True)
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:15: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  from ipykernel import kernelapp as app

png
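The repeated "Boolean Series key will be reindexed" warnings above come from applying `Age_corrected_filter` (built against the full frame) to an already-subset frame. Combining both conditions with `&` in a single indexing step avoids the warning. A toy sketch (the real `traffic` frame and `Age_corrected_filter` are assumed to follow this shape):

```python
import pandas as pd

# Toy stand-ins for the traffic frame and the age mask
traffic = pd.DataFrame({"Race_Ethnicity": ["White", "Hispanic", "White"],
                        "Age": [25, 40, 130]})
Age_corrected_filter = traffic["Age"].between(0, 110)

# Chained form (warns, because the mask is reindexed to the subset):
#   traffic[traffic["Race_Ethnicity"] == "White"][Age_corrected_filter]
# Single-step form, with both masks evaluated against the full frame:
subset = traffic[(traffic["Race_Ethnicity"] == "White") & Age_corrected_filter]
print(subset["Age"].tolist())
```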


Is there any officer that is pulling over a disproportionate percentage of individuals from a given demographic?

percents = [census_ALL["Black"],census_ALL["White"],census_ALL["Asian_Pacific"],census_ALL["Hispanic"],census_ALL["Native"]]
percents = np.array(percents)*100
by_officer = officer_top.groupby(by=["Race_Ethnicity","Officer_Badge"],as_index=False).count()
by_officer = by_officer.pivot(index="Race_Ethnicity",columns="Officer_Badge", values= "Date")
by_officer = by_officer.drop("Unknown", axis=0)
by_officer.reset_index()
# Convert each officer's column to percentages of that officer's total stops
by_officer = by_officer.div(by_officer.sum(axis=0), axis=1) * 100
officer_black = by_officer.T["African-American/Black"]
officer_asain = by_officer.T["Asian-Pacific Islander"]
officer_hispanic = by_officer.T["Hispanic"]
officer_native = by_officer.T["Native American"]
officer_white = by_officer.T["White"]
print("Number of officers surveyed {}".format(len(by_officer.T)))
Number of officers surveyed 43
fig, (axes1,axes2) = plt.subplots(2,1,figsize=(13,17))

axes1.violinplot(dataset = [officer_black.dropna().values,
                           officer_white.dropna().values] )

axes1.set_title("Percentage of Race Stopped by Individual Officers",fontsize = 30)
axes1.spines['top'].set_visible(False)
axes1.spines['right'].set_visible(False)
axes1.spines['left'].set_visible(False)
axes1.spines['bottom'].set_visible(False)
axes1.tick_params(axis='both', which='major', labelsize=25)
axes1.yaxis.set_major_formatter(mtick.PercentFormatter())
#axes1.set_ylabel('Percentage of Population Stopped',labelpad =10,fontsize=25)
axes1.set_xticks(np.arange(1,3,1))
axes1.set_xticklabels(["African-American/Black","White"],rotation=45)
axes1.scatter([1,2],percents[0:2],marker="D",s=100,color="Black",label = "Population Percentage \n from Census",alpha=0.5)
boxprops = dict(linestyle=' ', linewidth=0.1)
capprops=dict(linestyle=' ', linewidth=0.1)
medianprops=dict(linestyle='-', linewidth=2)
bp = by_officer.T[["African-American/Black","White"]].boxplot(figsize=(15,10),boxprops=boxprops,capprops=capprops,medianprops=medianprops,ax=axes1)
#axes1.legend(loc="upper left",fontsize=17)
axes1.grid(True,axis='y')
axes1.grid(False,axis='x')


axes2.violinplot(dataset = [officer_asain.dropna().values,
                           officer_hispanic.dropna().values,
                           officer_native.dropna().values] )

#axes2.set_title("Percentage of Race Stopped by Individual Officers",fontsize = 30)
axes2.spines['top'].set_visible(False)
axes2.spines['right'].set_visible(False)
axes2.spines['left'].set_visible(False)
axes2.spines['bottom'].set_visible(False)
axes2.tick_params(axis='both', which='major', labelsize=25)
axes2.yaxis.set_major_formatter(mtick.FormatStrFormatter('%.0f%%'))
#axes2.set_ylabel('Percentage of Population Stopped by individual officer (N = 43)',labelpad =10, fontsize=25)
fig.text(0, 0.5, "Percentage of Race Stopped in Individual Officer's Total Traffic Stops (N = 43)", va='center', rotation='vertical', fontsize=25)
axes2.set_xticks(np.arange(1,4,1))
axes2.set_xticklabels(["Asian-Pacific Islander","Hispanic","Native American"],rotation=45)
axes2.scatter([1,2,3],percents[2::],marker="D",s=100,color="Black",label = "Population Percentage \n from Census",alpha=0.5)
boxprops = dict(linestyle=' ', linewidth=0.1)
capprops=dict(linestyle=' ', linewidth=0.1)
medianprops=dict(linestyle='-', linewidth=2)
by_officer.T[["Asian-Pacific Islander","Hispanic","Native American"]].boxplot(figsize=(15,10),boxprops=boxprops,capprops=capprops,medianprops=medianprops,ax=axes2)
#xes2.grid(True,axis='y')
axes2.legend(loc="upper left",fontsize=17)
axes2.grid(False,axis='x')
plt.savefig("./Officer_Traffic_Race.png", dpi=400,bbox_inches="tight", pad_inches=0,transparent=True)
plt.show()

png

Are certain demographics stopped disproportionately at certain times of day?

The last plot is the one I ended up using. It can be represented in many different ways, but I feel comparing it to the census gives the most information and the clearest view of any patterns.

traffic_corrected = traffic[traffic['Race_Ethnicity'] != "Unknown"]
race_percentage
{'African-American/Black': 0.362891046386192,
 'Asian-Pacific Islander': 0.01720604099244876,
 'Hispanic': 0.07195253505933118,
 'Native American': 0.0005933117583603021,
 'White': 0.11375404530744336}
max_hour = traffic_corrected.groupby(by=["Hour"]).count()["Date"].values
white = abs(by_race["White"].values/max_hour)#-census_ALL["White"])
black = abs(by_race["African-American/Black"].values/max_hour)#-census_ALL["Black"])
asian = abs(by_race["Asian-Pacific Islander"].values/max_hour)#-census_ALL["Asian_Pacific"])
hispanic = abs(by_race["Hispanic"].values/max_hour)#-census_ALL["Hispanic"])
native = abs(by_race["Native American"].values/max_hour)#-census_ALL["Native"])

white_m = abs(white - race_percentage['White'])
black_m = abs(black - race_percentage['African-American/Black'])
asian_m = abs(asian - race_percentage['Asian-Pacific Islander'])
hispanic_m = abs(hispanic - race_percentage['Hispanic'])
native_m = abs(native - race_percentage['Native American'])


native_estimate_m=pd.DataFrame(native_m).fillna(method="bfill").values
asian_estimate_m=pd.DataFrame(asian_m).fillna(method="bfill").values
by_race["White"].values/max_hour
array([0.47021277, 0.4054878 , 0.46610169, 0.38461538, 0.41836735,
       0.31707317, 0.58706468, 0.61942959, 0.63086172, 0.60101243,
       0.56942102, 0.56384505, 0.5725938 , 0.55119215, 0.48765432,
       0.46764706, 0.48816029, 0.50173611, 0.50630631, 0.54291845,
       0.42950108, 0.44904815, 0.462     , 0.47295597])
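The hourly comparison divides each race's stop counts per hour by the total stops in that hour. The same share-per-hour table can be built directly from a groupby; a sketch with toy data (the column names match the dataset, the values are made up):

```python
import pandas as pd

# Toy stops: hour of the stop and the driver's recorded race
stops = pd.DataFrame({"Hour": [0, 0, 0, 1, 1],
                      "Race_Ethnicity": ["White", "White", "Hispanic",
                                         "White", "Hispanic"]})

# Count stops per (hour, race), then normalize each hour's row to shares
by_race = stops.groupby(["Hour", "Race_Ethnicity"]).size().unstack()
share = by_race.div(by_race.sum(axis=1), axis=0)
print(share.round(3))
```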
races = [black_m,native_m,asian_m,hispanic_m,white_m]

names=["African-American/Black","Native American","Asian-Pacific Islander","Hispanic","White"]

fig, ax = plt.subplots()
cmap = matplotlib.cm.get_cmap('Greens')
colors = [cmap(6/7),cmap(5/7),cmap(4/7),cmap(3/7),cmap(2/7)]

#plt.plot(np.arange(0,24,1),np.zeros(24),label="Population Percentage \n from Census",linestyle="--",color="Black",alpha=0.8)
plt.plot(np.arange(0,24,1),native_estimate_m*100,color=cmap(4/6),linestyle='--',linewidth=3)
#Plot the backfilled estimate for Asian-Pacific Islander
plt.plot(np.arange(0,24,1),asian_estimate_m*100,color=cmap(3/6), linestyle = "--", linewidth=3)


#styles = [(0, (3, 5, 1, 5, 1, 5)),'dashed', 'solid','dashdot',(0, (5, 5)),(0, (3, 5, 1, 5)),(0, (1, 1))]


# Iterate through the races
i=0
for race in races:
    # Plot this race's hourly deviation
    plt.plot(race*100,label = names[i], linewidth = 3,color = colors[i])
    i = i +1
    

# Plot formatting
fig.set_size_inches(18.5, 10.5)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.tick_params(axis='both', which='major', labelsize=30)
ax.yaxis.set_major_formatter(mtick.PercentFormatter())
plt.xticks(np.arange(0,24,1))
plt.legend(bbox_to_anchor=(0.5, -0.05),prop={'size': 16}, title = 'Race')
plt.title('Traffic Stops by Hour compared to race', pad = 50,fontdict = 
          {'fontsize': 35,
            'fontweight' : 3,
             'verticalalignment': 'top',
             'horizontalalignment': "right"})
plt.xlabel('Hours (24hr Time)', fontsize = 30,labelpad=50)
plt.grid(axis='x',linestyle='--',alpha = 0.5)
plt.ylabel("Difference from Mean Percentage", rotation=90,fontsize = 30,labelpad=50)





plt.savefig("./Race_per_hour_abs.png", dpi=400,bbox_inches="tight", pad_inches=0)

png

max_hour = traffic_corrected.groupby(by=["Hour"]).count()["Date"].values
white = abs(by_race["White"].values/max_hour)#-census_ALL["White"])
black = abs(by_race["African-American/Black"].values/max_hour)#-census_ALL["Black"])
asian = abs(by_race["Asian-Pacific Islander"].values/max_hour)#-census_ALL["Asian_Pacific"])
hispanic = abs(by_race["Hispanic"].values/max_hour)#-census_ALL["Hispanic"])
native = abs(by_race["Native American"].values/max_hour)#-census_ALL["Native"])

white_m = white - race_percentage['White']
black_m = black - race_percentage['African-American/Black']
asian_m = asian - race_percentage['Asian-Pacific Islander']
hispanic_m = hispanic - race_percentage['Hispanic']
native_m = native - race_percentage['Native American']


native_estimate_m=pd.DataFrame(native_m.copy()).fillna(method="bfill").values.flatten()
asian_estimate_m=pd.DataFrame(asian_m.copy()).fillna(method="bfill").values.flatten()
races = [black_m,native_m,asian_m,hispanic_m,white_m]

names=["African-American/Black","Native American","Asian-Pacific Islander","Hispanic","White"]

fig, ax = plt.subplots()
cmap = matplotlib.cm.get_cmap('Greens')
colors = [cmap(6/7),cmap(5/7),cmap(4/7),cmap(3/7),cmap(2/7)]

#plt.plot(np.arange(0,24,1),np.zeros(24),label="Population Percentage \n from Census",linestyle="--",color="Black",alpha=0.8)
plt.plot(np.arange(0,24,1),native_estimate_m*100,color=cmap(4/6),linestyle='--',linewidth=3)
#Plot the backfilled estimate for Asian-Pacific Islander
plt.plot(np.arange(0,24,1),asian_estimate_m*100,color=cmap(3/6), linestyle = "--", linewidth=3)


#styles = [(0, (3, 5, 1, 5, 1, 5)),'dashed', 'solid','dashdot',(0, (5, 5)),(0, (3, 5, 1, 5)),(0, (1, 1))]


# Iterate through the races
i=0
for race in races:
    # Plot this race's hourly deviation
    plt.plot(race*100,label = names[i], linewidth = 3,color = colors[i])
    i = i +1
    

# Plot formatting
fig.set_size_inches(18.5, 10.5)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.tick_params(axis='both', which='major', labelsize=20)
ax.yaxis.set_major_formatter(mtick.PercentFormatter())
plt.xticks(np.arange(0,24,1))
plt.legend(bbox_to_anchor=(1.00, 1.0),prop={'size': 16}, title = 'Race')
plt.title('Traffic Stops by Hour compared to race', pad = 50,fontdict = 
          {'fontsize': 30,
            'fontweight' : 3,
             'verticalalignment': 'top',
             'horizontalalignment': "right"})
plt.xlabel('Hours (24hr Time)', fontsize = 25,labelpad=50)
plt.grid(axis='x',linestyle='--',alpha = 0.5)
plt.ylabel("Difference from Mean Percentage", rotation=90,fontsize = 25,labelpad=50)





plt.savefig("./Race_per_hour.png", dpi=400,bbox_inches="tight", pad_inches=0)

png

asian = by_race["Asian-Pacific Islander"]/max_hour-census_ALL["Asian_Pacific"]
native = by_race["Native American"]/max_hour-census_ALL["Native"]
native_estimate=abs(native.fillna(method="bfill").values)
asian_estimate=abs(asian.fillna(method="bfill").values)

max_hour = traffic_corrected.groupby(by=["Hour"]).count()["Date"].values
white = abs(by_race["White"].values/max_hour-census_ALL["White"])
black = abs(by_race["African-American/Black"].values/max_hour-census_ALL["Black"])
asian = abs(by_race["Asian-Pacific Islander"].values/max_hour-census_ALL["Asian_Pacific"])
hispanic = abs(by_race["Hispanic"].values/max_hour-census_ALL["Hispanic"])
native = abs(by_race["Native American"].values/max_hour-census_ALL["Native"])

#white_m = abs(white - white.mean())
#black_m = abs(black - black.mean())
#asain_m = abs(asain - np.nanmean(asain))
#hispanic_m = abs(hispanic - hispanic.mean())
#native_m = abs(native - np.nanmean(native))


#native_estimate_m = abs(native_estimate - np.nanmean(native))
#asian_estimate_m = abs(asain_estimate - np.nanmean(asain))
races = [black,native,asian,hispanic,white]

names=["African-American/Black","Native American","Asian-Pacific Islander","Hispanic","White"]

fig, ax = plt.subplots()
cmap = matplotlib.cm.get_cmap('Greens')
colors = [cmap(6/7),cmap(5/7),cmap(4/7),cmap(3/7),cmap(2/7)]

#plt.plot(np.arange(0,24,1),np.zeros(24),label="Population Percentage \n from Census",linestyle="--",color="Black",alpha=0.8)
plt.plot(np.arange(0,24,1),native_estimate*100,color=cmap(4/6),linestyle='--',linewidth=3)
#Plot the backfilled estimate for Asian-Pacific Islander
plt.plot(np.arange(0,24,1),asian_estimate*100,color=cmap(3/6), linestyle = "--", linewidth=3)


#styles = [(0, (3, 5, 1, 5, 1, 5)),'dashed', 'solid','dashdot',(0, (5, 5)),(0, (3, 5, 1, 5)),(0, (1, 1))]


# Iterate through the races
i=0
for race in races:
    # Plot this race's hourly deviation
    plt.plot(race*100,label = names[i], linewidth = 3,color = colors[i])
    i = i +1
    

# Plot formatting
fig.set_size_inches(18.5, 10.5)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.tick_params(axis='both', which='major', labelsize=20)
ax.yaxis.set_major_formatter(mtick.PercentFormatter())
plt.xticks(np.arange(0,24,1))
plt.legend(bbox_to_anchor=(1.00, 1.0),prop={'size': 16}, title = 'Race')
plt.title('Traffic Stops by Hour compared to race', pad = 50,fontdict = 
          {'fontsize': 30,
            'fontweight' : 3,
             'verticalalignment': 'top',
             'horizontalalignment': "right"})
plt.xlabel('Hours (24hr Time)', fontsize = 25,labelpad=50)
plt.grid(axis='x',linestyle='--',alpha = 0.5)
plt.ylabel("Difference from Population Percentage", rotation=90,fontsize = 25,labelpad=50)





plt.savefig("./Race_per_hour_census_abs.png", dpi=400,bbox_inches="tight", pad_inches=0)

png
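For the sparse groups (Asian-Pacific Islander, Native American) some hours have no stops at all, leaving NaN gaps; `fillna(method="bfill")` carries the next observed hour's value backwards so the dashed estimate lines stay continuous. A toy example (note newer pandas deprecates the `method=` keyword in favor of `.bfill()`):

```python
import pandas as pd

# Hours with no stops for a sparse group appear as NaN;
# backfilling borrows the next observed hour's value
s = pd.Series([0.02, None, None, 0.05, 0.01])
filled = s.bfill()  # equivalent to fillna(method="bfill")
print(filled.tolist())
```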

asian = by_race["Asian-Pacific Islander"]/max_hour-census_ALL["Asian_Pacific"]
native = by_race["Native American"]/max_hour-census_ALL["Native"]
native_estimate=native.fillna(method="bfill").values
asian_estimate=asian.fillna(method="bfill").values

max_hour = traffic_corrected.groupby(by=["Hour"]).count()["Date"].values
white = by_race["White"].values/max_hour-census_ALL["White"]
black = by_race["African-American/Black"].values/max_hour-census_ALL["Black"]
asian = by_race["Asian-Pacific Islander"].values/max_hour-census_ALL["Asian_Pacific"]
hispanic = by_race["Hispanic"].values/max_hour-census_ALL["Hispanic"]
native = by_race["Native American"].values/max_hour-census_ALL["Native"]

#white_m = abs(white - white.mean())
#black_m = abs(black - black.mean())
#asain_m = abs(asain - np.nanmean(asain))
#hispanic_m = abs(hispanic - hispanic.mean())
#native_m = abs(native - np.nanmean(native))


#native_estimate_m = abs(native_estimate - np.nanmean(native))
#asian_estimate_m = abs(asain_estimate - np.nanmean(asain))
races = [black,native,asian,hispanic,white]

names=["African-American/Black","Native American","Asian-Pacific Islander","Hispanic","White"]

fig, ax = plt.subplots()
cmap = matplotlib.cm.get_cmap('Greens')
colors = [cmap(6/7),cmap(5/7),cmap(4/7),cmap(3/7),cmap(2/7)]

plt.plot(np.arange(0,24,1),np.zeros(24),label="Population Percentage \n from Census",linestyle="--",color="Black",alpha=0.8)
plt.plot(np.arange(0,24,1),native_estimate*100,color=cmap(4/6),linestyle='--',linewidth=3)
#Plot the backfilled estimate for Asian-Pacific Islander
plt.plot(np.arange(0,24,1),asian_estimate*100,color=cmap(3/6), linestyle = "--", linewidth=3)


#styles = [(0, (3, 5, 1, 5, 1, 5)),'dashed', 'solid','dashdot',(0, (5, 5)),(0, (3, 5, 1, 5)),(0, (1, 1))]


# Iterate through the races
i=0
for race in races:
    # Plot this race's hourly deviation
    plt.plot(race*100,label = names[i], linewidth = 6,color = colors[i])
    i = i +1
    

# Plot formatting
fig.set_size_inches(24.5, 10.5)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)
ax.tick_params(axis='both', which='major', labelsize=25)
ax.yaxis.set_major_formatter(mtick.PercentFormatter())
plt.xticks(np.arange(0,24,1))
legend = plt.legend(loc='upper center',bbox_to_anchor=(0.5, -0.25),prop={'size': 25}, title = 'Race',ncol=3)
legend.get_title().set_fontsize('25')
plt.title('Traffic Stops by Hour compared to race', pad = 50,fontdict = 
          {'fontsize': 40,
            'fontweight' : 3,
             'verticalalignment': 'top'})
plt.xlabel('Hours (24hr Time)', fontsize = 35,labelpad=50)
plt.grid(axis='x',linestyle='--',alpha = 0.5)
plt.ylabel("Difference from Population Percentage", rotation=90,fontsize = 35,labelpad=50)





plt.savefig("./Race_per_hour_census.png", dpi=400,bbox_inches="tight", pad_inches=0,transparent=True)

png