New York City Evictions Analysis¶
Executive Summary¶
An analysis is conducted on eviction data in New York city from 2017 to 2020, compared to income data during that same period and population data from the 2020 census. For this project, the use case is defined as a non-profit in New York city providing support for eviction cases across the municipality. Such a non-profit would naturally need to gather data to determine where best to concentrate their efforts in the city; i.e. where the most evictions are per capita and what drives these trends. The project here is therefore set up as an analysis that such a non-profit might perform.
This analysis shows that the Bronx is the most heavily affected by eviction trends, at a prevalence per capita over twice that of the next two boroughs. A correlation analysis indicates that eviction rates during this period are influenced strongly by average household income by borough, and that tackling eviction trends across the city would require income assistance measures.
Workflow¶
The workflow of this analysis is as follows:
- Importing libraries and setting up the environment
- Importing and describing the data
- Cleaning and manipulating the data
- Joining the datasets
- Analyzing and visualizing the data
Data Description¶
The analysis uses the following datasets:
- Evictions Data: This dataset contains information about evictions in New York City, including the year, borough, and number of evictions.
- Income Data: This dataset provides information on household income in New York City, broken down by borough and year.
- Population Data: This dataset includes the population of each borough in New York City for the year 2020.
Analysis¶
The analysis includes the following results:
- Sum of evictions by borough and year
- Evictions per 1,000 people by borough and year
- Mean household income by borough and year
- Correlation between evictions per 1,000 people and household income
##########################################
#Step 0: Import libraries
##########################################
import os
import pandas as pd
from datetime import date
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
##########################################
#Step 1: Import data
##########################################
#Setting directory
os.chdir(r'C:\Users\jlenehan\OneDrive - Intel Corporation\Documents\0 - Data Science\Intro to Data Analytics\UCDPA_JohnLenehan\UCDPA_JohnLenehan')
##EVICTIONS DATA
##JSON METHOD
#Set limit to 100,000 to capture all entries in json file
NY_Evictions_gross = pd.read_json(r'https://data.cityofnewyork.us/resource/6z8x-wfk4.json?$limit=100000')
##POPULATION DATA
##JSON METHOD
NY_Pop_gross=pd.read_json(r'https://data.ny.gov/resource/krt9-ym2k.json?$limit=100000')
##INCOME DATA
##CSV METHOD
#income-location data taken from datausa.io
NY_Income_gross = pd.read_csv('Income by Location.csv')
##########################################
#Step 2: Describe data
##########################################
#Describe Evictions dataset
print(NY_Evictions_gross.columns)
print(NY_Evictions_gross.info())
print(NY_Evictions_gross.describe())
print(NY_Evictions_gross.shape)
print(NY_Evictions_gross.head())
print('\nNY Evictions Data - Unique Values:')
for x in NY_Evictions_gross.columns:
print(x+':')
print(NY_Evictions_gross[x].unique())
print(str(NY_Evictions_gross[x].nunique()) + str(' unique values'))
Index(['court_index_number', 'docket_number', 'eviction_address',
'eviction_apt_num', 'executed_date', 'marshal_first_name',
'marshal_last_name', 'residential_commercial_ind', 'borough',
'eviction_zip', 'ejectment', 'eviction_possession', 'latitude',
'longitude', 'community_board', 'council_district', 'census_tract',
'bin', 'bbl', 'nta'],
dtype='object')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 70319 entries, 0 to 70318
Data columns (total 20 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 court_index_number 70319 non-null object
1 docket_number 70319 non-null int64
2 eviction_address 70319 non-null object
3 eviction_apt_num 58577 non-null object
4 executed_date 70319 non-null object
5 marshal_first_name 70319 non-null object
6 marshal_last_name 70319 non-null object
7 residential_commercial_ind 70319 non-null object
8 borough 70319 non-null object
9 eviction_zip 70319 non-null int64
10 ejectment 70319 non-null object
11 eviction_possession 70319 non-null object
12 latitude 63559 non-null float64
13 longitude 63559 non-null float64
14 community_board 63559 non-null float64
15 council_district 63559 non-null float64
16 census_tract 63559 non-null float64
17 bin 63431 non-null float64
18 bbl 63431 non-null float64
19 nta 63559 non-null object
dtypes: float64(7), int64(2), object(11)
memory usage: 10.7+ MB
None
docket_number eviction_zip latitude longitude \
count 70319.000000 70319.000000 63559.000000 63559.000000
mean 152653.852543 10793.951350 40.751713 -73.908346
std 147183.125791 519.214106 0.091157 0.068434
min 1.000000 0.000000 40.499050 -74.251262
25% 60329.500000 10454.000000 40.672151 -73.945114
50% 92472.000000 10473.000000 40.752381 -73.910004
75% 288190.000000 11229.000000 40.836407 -73.871957
max 496987.000000 12221.000000 40.911588 -73.701430
community_board council_district census_tract bin \
count 63559.000000 63559.000000 63559.000000 6.343100e+04
mean 8.002722 23.823204 8423.911610 2.705469e+06
std 4.577468 13.759216 21538.302883 1.116481e+06
min 1.000000 1.000000 1.000000 1.000000e+06
25% 4.000000 13.000000 197.000000 2.010531e+06
50% 8.000000 19.000000 379.000000 3.000000e+06
75% 12.000000 36.000000 966.000000 3.337614e+06
max 18.000000 51.000000 157903.000000 5.171959e+06
bbl
count 6.343100e+04
mean 2.629100e+09
std 1.063481e+09
min 0.000000e+00
25% 2.028290e+09
50% 3.000368e+09
75% 3.072325e+09
max 5.080490e+09
(70319, 20)
court_index_number docket_number eviction_address eviction_apt_num \
0 52841/17 8271 300 EAST 64TH STREET 5-I
1 72193/17 13665 120 BEACH 26 STREET 1501
2 88688/16B 57195 1719 QUENTIN ROAD 5C
3 B043281/19 408391 50 EAST 172ND STREET 2W
4 68326/19 19697 94-06 34 ROAD B1
executed_date marshal_first_name marshal_last_name \
0 2017-10-03T00:00:00.000 George Essock, Jr.
1 2018-01-08T00:00:00.000 Edward Guida
2 2017-03-17T00:00:00.000 Justin Grossman
3 2020-01-08T00:00:00.000 Richard McCoy
4 2019-12-20T00:00:00.000 George Essock, Jr.
residential_commercial_ind borough eviction_zip ejectment \
0 Residential MANHATTAN 10065 Not an Ejectment
1 Residential QUEENS 11691 Not an Ejectment
2 Residential BROOKLYN 11229 Not an Ejectment
3 Residential BRONX 10452 Not an Ejectment
4 Residential QUEENS 11372 Not an Ejectment
eviction_possession latitude longitude community_board \
0 Possession 40.763578 -73.961713 8.0
1 Possession 40.593263 -73.758984 14.0
2 Possession 40.608505 -73.956033 15.0
3 Possession 40.841904 -73.915176 4.0
4 Possession NaN NaN NaN
council_district census_tract bin bbl \
0 5.0 110.0 1044388.0 1.014388e+09
1 31.0 99802.0 4595352.0 4.158178e+09
2 48.0 552.0 3182470.0 3.067800e+09
3 16.0 223.0 2008114.0 2.028440e+09
4 NaN NaN NaN NaN
nta
0 Lenox Hill-Roosevelt Island
1 Far Rockaway-Bayswater
2 Madison
3 West Concourse
4 NaN
NY Evictions Data - Unique Values:
court_index_number:
['52841/17' '72193/17' '88688/16B' ... 'B901861/17' '65897/17' '13689/16']
65399 unique values
docket_number:
[ 8271 13665 57195 ... 74810 8220 84968]
61328 unique values
eviction_address:
['300 EAST 64TH STREET' '120 BEACH 26 STREET' '1719 QUENTIN ROAD' ...
'402 OCEAN PARKWAY' '1270 FULTON ST COMMERICAL STORE' '888 PARK AVENUE']
42907 unique values
eviction_apt_num:
['5-I' '1501' '5C' ... '0C6' 'C56' 'SPACE #59']
6984 unique values
executed_date:
['2017-10-03T00:00:00.000' '2018-01-08T00:00:00.000'
'2017-03-17T00:00:00.000' ... '2021-12-13T00:00:00.000'
'2021-11-05T00:00:00.000' '2020-12-03T00:00:00.000']
1093 unique values
marshal_first_name:
['George' 'Edward' 'Justin' 'Richard' 'Ileana' 'Alfred' 'Henry' 'Maxine'
'Thomas' 'Darlene' 'Steven' 'Howard' 'Salavatore' 'Danny' 'Kenneth'
'Robert' 'Vadim' 'Gary' 'Charles' 'Bernard' 'Ronald' 'Frank' 'John'
'Bruce' 'Jeffrey' 'David']
26 unique values
marshal_last_name:
['Essock, Jr.' 'Guida' 'Grossman' 'McCoy' 'Rivera' 'Locascio' 'Daley'
'Capuano' 'Chevlowe' 'Bia' 'Barone' 'Essock' 'Powell' 'Schain' 'Giglio'
'Weinheim' 'Giachetta' 'Renzulli' 'Barbarovich' 'Rose' 'Marchisotto'
'Blake' 'Pazant' 'Siracusa' 'Villanueva' 'Kemp' 'Smith' 'Solimine'
'Pagnotta']
29 unique values
residential_commercial_ind:
['Residential' 'Commercial']
2 unique values
borough:
['MANHATTAN' 'QUEENS' 'BROOKLYN' 'BRONX' 'STATEN ISLAND']
5 unique values
eviction_zip:
[10065 11691 11229 10452 11372 10310 11213 11208 10031 11238 10457 11429
10467 10466 10453 11106 10039 11223 10030 10459 10461 11226 11236 10460
11692 11385 11373 11379 11212 11207 10454 11217 11355 11206 11203 10035
11225 10029 11420 11354 11221 11422 11220 11368 11377 11210 10462 11103
11205 10314 10128 10469 10303 11224 11211 11434 10032 10468 10002 10037
11204 10473 11233 11433 10304 10014 10475 11215 11237 11417 11358 10472
10026 10456 11209 11214 11367 11432 10306 10474 10465 10019 10301 10009
10455 11219 10458 11694 10034 11365 11412 10024 10470 11416 10305 11413
10003 10028 10001 10471 10463 10033 11418 11235 11361 11435 11357 11378
11375 10011 11104 10027 11356 10018 10451 11234 11374 10016 10013 11419
11421 11101 10017 10044 10040 11423 11230 10025 11369 11411 10302 11239
11232 10012 11370 10021 11218 10075 11360 11216 10022 10309 11249 11364
11415 10036 11222 11201 10038 11228 11414 11105 11366 11436 11102 10023
11693 11427 11426 10464 10307 10312 10005 11428 11040 10308 10006 10280
11363 11231 10010 11362 10069 10007 11698 0 10282 10004 11384 10152
11039 11424 11109 10170 10423 10167 11430 11266 11001 11004 10020 11242
10165 1000 11439 10177 10123 10162 11443 10101 10424 12221 10107 11514
10281 11619 11575 10168 10112 11335 10000 11359 11753 10335 10103 11306
11137 11272 10176 10118 11371 11017 11024 11202 11240 10155 10111]
227 unique values
ejectment:
['Not an Ejectment' 'Ejectment']
2 unique values
eviction_possession:
['Possession' 'Eviction' 'Unspecified']
3 unique values
latitude:
[40.763578 40.593263 40.608505 ... 40.629692 40.639224 40.698721]
30894 unique values
longitude:
[-73.961713 -73.758984 -73.956033 ... -73.758727 -73.98801 -74.152536]
30324 unique values
community_board:
[ 8. 14. 15. 4. nan 1. 5. 9. 2. 13. 12. 10. 11. 17. 18. 6. 16. 3.
7.]
18 unique values
council_district:
[ 5. 31. 48. 16. nan 49. 36. 42. 7. 35. 15. 27. 12. 14. 22. 9. 47. 17.
13. 40. 46. 18. 30. 25. 8. 20. 28. 38. 21. 26. 45. 11. 34. 41. 1. 44.
3. 39. 37. 32. 43. 24. 50. 4. 10. 6. 2. 29. 19. 33. 51. 23.]
51 unique values
census_tract:
[ 110. 99802. 552. ... 1129. 669. 1097.]
1293 unique values
bin:
[1044388. 4595352. 3182470. ... 1017796. 5027246. 3126163.]
31771 unique values
bbl:
[1.01438750e+09 4.15817750e+09 3.06780004e+09 ... 1.00871003e+09
5.01194011e+09 3.05374004e+09]
29752 unique values
nta:
['Lenox Hill-Roosevelt Island' 'Far Rockaway-Bayswater' 'Madison'
'West Concourse' nan 'West New Brighton-New Brighton-St. George'
'Crown Heights North' 'East New York' 'Manhattanville' 'Clinton Hill'
'Mount Hope' 'Queens Village' 'Williamsbridge-Olinville' 'Fordham South'
'Astoria' 'Central Harlem North-Polo Grounds' 'Bensonhurst East'
'Longwood' 'Pelham Parkway' 'Erasmus' 'Canarsie' 'West Farms-Bronx River'
'Hammels-Arverne-Edgemere' 'Belmont' 'Glendale' 'Elmhurst'
'Middle Village' 'Brownsville' 'Mott Haven-Port Morris'
'University Heights-Morris Heights' 'Claremont-Bathgate' 'Flushing'
'Bedford' 'Crotona Park East' 'East Flatbush-Farragut'
'East Harlem North' 'Crown Heights South' 'Highbridge' 'South Ozone Park'
'Stuyvesant Heights' 'Sunset Park East' 'East Tremont' 'Corona'
'Hunters Point-Sunnyside-West Maspeth' 'Flatlands'
'Prospect Lefferts Gardens-Wingate' 'Jackson Heights'
'Van Nest-Morris Park-Westchester Square' 'Norwood' 'Murray Hill'
'Fort Greene' "Mariner's Harbor-Arlington-Port Ivory-Graniteville"
'East Harlem South' 'Flatbush' 'Co-op City' 'Yorkville'
'Seagate-Coney Island' 'North Side-South Side' 'St. Albans'
'Hamilton Heights' 'Ocean Hill' 'Rugby-Remsen Village'
'Bedford Park-Fordham North' 'Chinatown'
'Soundview-Castle Hill-Clason Point-Harding Park' 'Kingsbridge Heights'
'Stapleton-Rosebank' 'West Village' 'Park Slope-Gowanus' 'Bushwick North'
'Ozone Park' 'Soundview-Bruckner' 'Central Harlem South'
'Morrisania-Melrose' 'Bay Ridge' 'Bensonhurst West'
'East Concourse-Concourse Village' 'Kew Gardens Hills' 'Jamaica'
'Lower East Side' 'New Dorp-Midland Beach' 'Hunts Point'
'Cypress Hills-City Line' 'Schuylerville-Throgs Neck-Edgewater Park'
'Midwood' 'Midtown-Midtown South'
'Sheepshead Bay-Gerritsen Beach-Manhattan Beach'
'Stuyvesant Town-Cooper Village' 'Melrose South-Mott Haven North'
'Borough Park' 'Westchester-Unionport'
'Breezy Point-Belle Harbor-Rockaway Park-Broad Channel'
'Marble Hill-Inwood' 'Pomonok-Flushing Heights-Hillcrest'
'Upper West Side' 'Auburndale' 'Van Cortlandt Village'
'Woodlawn-Wakefield' 'Woodhaven' 'Grasmere-Arrochar-Ft. Wadsworth'
'Springfield Gardens South-Brookville' 'Washington Heights South'
'East Village' 'North Riverdale-Fieldston-Riverdale' 'South Jamaica'
'Spuyten Duyvil-Kingsbridge' 'Washington Heights North' 'Kew Gardens'
'Bayside-Bayside Hills' 'Pelham Bay-Country Club-City Island'
'East New York (Pennsylvania Ave)' 'Briarwood-Jamaica Hills' 'Rosedale'
'Eastchester-Edenwald-Baychester' 'Whitestone' 'East Williamsburg'
'Richmond Hill' 'Maspeth' 'Homecrest' 'Forest Hills' 'Bath Beach'
'West Brighton' 'Bushwick South' 'Clinton' 'Prospect Heights'
'Parkchester' 'Hudson Yards-Chelsea-Flatiron-Union Square'
'College Point' 'Rego Park' 'Murray Hill-Kips Bay'
'SoHo-TriBeCa-Civic Center-Little Italy'
'Queensbridge-Ravenswood-Long Island City' 'Turtle Bay-East Midtown'
'DUMBO-Vinegar Hill-Downtown Brooklyn-Boerum Hill'
'Todt Hill-Emerson Hill-Heartland Village-Lighthouse Hill' 'Hollis'
'Laurelton' 'Georgetown-Marine Park-Bergen Beach-Mill Basin'
'park-cemetery-etc-Brooklyn' 'Gravesend' 'Cambria Heights' 'Gramercy'
'Elmhurst-Maspeth' 'Port Richmond' 'Starrett City'
'Upper East Side-Carnegie Hill' 'Oakwood-Oakwood Beach'
'Sunset Park West' 'New Brighton-Silver Lake' 'Kensington-Ocean Parkway'
'Ft. Totten-Bay Terrace-Clearview' 'Grymes Hill-Clifton-Fox Hills'
'Bronxdale' 'Great Kills' 'Brighton Beach' 'Oakland Gardens'
'East Elmhurst' 'North Corona' 'Ridgewood' 'Greenpoint' 'Baisley Park'
'Springfield Gardens North' 'Fresh Meadows-Utopia' 'East Flushing'
'Brooklyn Heights-Cobble Hill' 'Allerton-Pelham Gardens'
'Battery Park City-Lower Manhattan' 'Morningside Heights'
'Lindenwood-Howard Beach' 'Queensboro Hill'
'New Springville-Bloomfield-Travis' 'Old Astoria' 'Dyker Heights'
'Lincoln Square' 'Williamsburg' 'Bellerose'
'Charleston-Richmond Valley-Tottenville' 'Woodside'
"Annadale-Huguenot-Prince's Bay-Eltingville" 'park-cemetery-etc-Bronx'
'Steinway' 'Glen Oaks-Floral Park-New Hyde Park' 'Westerleigh'
'Ocean Parkway South' 'Douglas Manor-Douglaston-Little Neck'
'Carroll Gardens-Columbia Street-Red Hook'
'Old Town-Dongan Hills-South Beach' 'Jamaica Estates-Holliswood'
'Rossville-Woodrow' 'Arden Heights' 'Windsor Terrace'
'park-cemetery-etc-Queens']
191 unique values
#Describe Income dataset
print(NY_Income_gross.columns)
print(NY_Income_gross.info())
print(NY_Income_gross.describe())
print(NY_Income_gross.shape)
print(NY_Income_gross.head())
print('\nNY Income Data - Unique Values:')
for x in NY_Income_gross.columns:
print(x+':')
print(NY_Income_gross[x].unique())
print(str(NY_Income_gross[x].nunique()) + str(' unique values'))
Index(['ID Year', 'Year', 'ID Race', 'Race', 'Household Income by Race',
'Household Income by Race Moe', 'Geography', 'ID Geography'],
dtype='object')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16693 entries, 0 to 16692
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ID Year 16693 non-null int64
1 Year 16693 non-null int64
2 ID Race 16693 non-null int64
3 Race 16693 non-null object
4 Household Income by Race 16693 non-null int64
5 Household Income by Race Moe 16669 non-null float64
6 Geography 16693 non-null object
7 ID Geography 16693 non-null object
dtypes: float64(1), int64(4), object(3)
memory usage: 1.0+ MB
None
ID Year Year ID Race Household Income by Race \
count 16693.000000 16693.000000 16693.0 16693.000000
mean 2016.467681 2016.467681 0.0 64091.405020
std 2.279213 2.279213 0.0 32113.944803
min 2013.000000 2013.000000 0.0 2499.000000
25% 2014.000000 2014.000000 0.0 41759.000000
50% 2016.000000 2016.000000 0.0 59383.000000
75% 2018.000000 2018.000000 0.0 79898.000000
max 2020.000000 2020.000000 0.0 250001.000000
Household Income by Race Moe
count 16669.000000
mean 14334.611854
std 10210.043203
min 917.000000
25% 8075.000000
50% 12107.000000
75% 17689.000000
max 321985.000000
(16693, 8)
ID Year Year ID Race Race Household Income by Race \
0 2020 2020 0 Total 68000
1 2020 2020 0 Total 93155
2 2020 2020 0 Total 34766
3 2020 2020 0 Total 24474
4 2020 2020 0 Total 27196
Household Income by Race Moe Geography \
0 21984.0 Census Tract 2, Bronx County, NY
1 12263.0 Census Tract 4, Bronx County, NY
2 6118.0 Census Tract 16, Bronx County, NY
3 7684.0 Census Tract 23, Bronx County, NY
4 8483.0 Census Tract 25, Bronx County, NY
ID Geography
0 14000US36005000200
1 14000US36005000400
2 14000US36005001600
3 14000US36005002300
4 14000US36005002500
NY Income Data - Unique Values:
ID Year:
[2020 2019 2018 2017 2016 2015 2014 2013]
8 unique values
Year:
[2020 2019 2018 2017 2016 2015 2014 2013]
8 unique values
ID Race:
[0]
1 unique values
Race:
['Total']
1 unique values
Household Income by Race:
[68000 93155 34766 ... 27386 35375 43682]
13100 unique values
Household Income by Race Moe:
[21984. 12263. 6118. ... 16336. 9502. 14271.]
12130 unique values
Geography:
['Census Tract 2, Bronx County, NY' 'Census Tract 4, Bronx County, NY'
'Census Tract 16, Bronx County, NY' ...
'Census Tract 110, Bronx County, NY'
'Census Tract 641.02, Queens County, NY'
'Census Tract 228, Richmond County, NY']
2112 unique values
ID Geography:
['14000US36005000200' '14000US36005000400' '14000US36005001600' ...
'14000US36005011000' '14000US36081064102' '14000US36085022800']
2112 unique values
#Describe Population dataset
print(NY_Pop_gross.columns)
print(NY_Pop_gross.info())
print(NY_Pop_gross.describe())
print(NY_Pop_gross.shape)
print(NY_Pop_gross.head())
print('\nNY Population Data - Unique Values:')
for x in NY_Pop_gross.columns:
print(x+':')
print(NY_Pop_gross[x].unique())
print(str(NY_Pop_gross[x].nunique()) + str(' unique values'))
Index(['fips_code', 'geography', 'year', 'program_type', 'population'], dtype='object')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3654 entries, 0 to 3653
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 fips_code 3654 non-null int64
1 geography 3654 non-null object
2 year 3654 non-null int64
3 program_type 3654 non-null object
4 population 3654 non-null int64
dtypes: int64(3), object(2)
memory usage: 142.9+ KB
None
fips_code year population
count 3654.000000 3654.000000 3.654000e+03
mean 36061.015873 1995.448276 5.922912e+05
std 36.346603 15.238354 2.351561e+06
min 36000.000000 1970.000000 4.438000e+03
25% 36029.000000 1982.000000 5.150125e+04
50% 36061.000000 1995.500000 8.831400e+04
75% 36093.000000 2009.000000 2.523632e+05
max 36123.000000 2021.000000 2.020125e+07
(3654, 5)
fips_code geography year program_type \
0 36000 New York State 2021 Postcensal Population Estimate
1 36001 Albany County 2021 Postcensal Population Estimate
2 36003 Allegany County 2021 Postcensal Population Estimate
3 36005 Bronx County 2021 Postcensal Population Estimate
4 36007 Broome County 2021 Postcensal Population Estimate
population
0 19835913
1 313743
2 46106
3 1424948
4 197240
NY Population Data - Unique Values:
fips_code:
[36000 36001 36003 36005 36007 36009 36011 36013 36015 36017 36019 36021
36023 36025 36027 36029 36031 36033 36035 36037 36039 36041 36043 36045
36047 36049 36051 36053 36055 36057 36059 36061 36063 36065 36067 36069
36071 36073 36075 36077 36079 36081 36083 36085 36087 36089 36091 36093
36095 36097 36099 36101 36103 36105 36107 36109 36111 36113 36115 36117
36119 36121 36123]
63 unique values
geography:
['New York State' 'Albany County' 'Allegany County' 'Bronx County'
'Broome County' 'Cattaraugus County' 'Cayuga County' 'Chautauqua County'
'Chemung County' 'Chenango County' 'Clinton County' 'Columbia County'
'Cortland County' 'Delaware County' 'Dutchess County' 'Erie County'
'Essex County' 'Franklin County' 'Fulton County' 'Genesee County'
'Greene County' 'Hamilton County' 'Herkimer County' 'Jefferson County'
'Kings County' 'Lewis County' 'Livingston County' 'Madison County'
'Monroe County' 'Montgomery County' 'Nassau County' 'New York County'
'Niagara County' 'Oneida County' 'Onondaga County' 'Ontario County'
'Orange County' 'Orleans County' 'Oswego County' 'Otsego County'
'Putnam County' 'Queens County' 'Rensselaer County' 'Richmond County'
'Rockland County' 'St. Lawrence County' 'Saratoga County'
'Schenectady County' 'Schoharie County' 'Schuyler County' 'Seneca County'
'Steuben County' 'Suffolk County' 'Sullivan County' 'Tioga County'
'Tompkins County' 'Ulster County' 'Warren County' 'Washington County'
'Wayne County' 'Westchester County' 'Wyoming County' 'Yates County']
63 unique values
year:
[2021 2020 2019 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008
2007 2006 2005 2004 2003 2002 2001 2000 1999 1998 1997 1996 1995 1994
1993 1992 1991 1990 1989 1988 1987 1986 1985 1984 1983 1982 1981 1980
1979 1978 1977 1976 1975 1974 1973 1972 1971 1970]
52 unique values
program_type:
['Postcensal Population Estimate' 'Census Base Population'
'Intercensal Population Estimate']
3 unique values
population:
[19835913 313743 46106 ... 894227 38000 19978]
3606 unique values
##########################################
#Step 3: Clean + Manipulate data
##########################################
##Cleaning NY Evictions Data
#Adding year info
NY_Evictions_gross['executed_date']=pd.to_datetime(NY_Evictions_gross['executed_date'])
NY_Evictions_gross['year']=pd.DatetimeIndex(NY_Evictions_gross['executed_date']).year
#sanity check
print(NY_Evictions_gross.head())
#grouping evictions by year and borough
NY_Evictions = NY_Evictions_gross.groupby(['year','borough'], sort = [True,True])['eviction_zip'].count()
#resetting index on pivot table
NY_Evictions = NY_Evictions.reset_index()
#renaming column to sum_evictions
NY_Evictions.rename(columns = {'eviction_zip':'sum_evictions'}, inplace=True)
#print final dataset
print(NY_Evictions)
court_index_number docket_number eviction_address eviction_apt_num \
0 52841/17 8271 300 EAST 64TH STREET 5-I
1 72193/17 13665 120 BEACH 26 STREET 1501
2 88688/16B 57195 1719 QUENTIN ROAD 5C
3 B043281/19 408391 50 EAST 172ND STREET 2W
4 68326/19 19697 94-06 34 ROAD B1
executed_date marshal_first_name marshal_last_name \
0 2017-10-03 George Essock, Jr.
1 2018-01-08 Edward Guida
2 2017-03-17 Justin Grossman
3 2020-01-08 Richard McCoy
4 2019-12-20 George Essock, Jr.
residential_commercial_ind borough eviction_zip ... \
0 Residential MANHATTAN 10065 ...
1 Residential QUEENS 11691 ...
2 Residential BROOKLYN 11229 ...
3 Residential BRONX 10452 ...
4 Residential QUEENS 11372 ...
eviction_possession latitude longitude community_board \
0 Possession 40.763578 -73.961713 8.0
1 Possession 40.593263 -73.758984 14.0
2 Possession 40.608505 -73.956033 15.0
3 Possession 40.841904 -73.915176 4.0
4 Possession NaN NaN NaN
council_district census_tract bin bbl \
0 5.0 110.0 1044388.0 1.014388e+09
1 31.0 99802.0 4595352.0 4.158178e+09
2 48.0 552.0 3182470.0 3.067800e+09
3 16.0 223.0 2008114.0 2.028440e+09
4 NaN NaN NaN NaN
nta year
0 Lenox Hill-Roosevelt Island 2017
1 Far Rockaway-Bayswater 2018
2 Madison 2017
3 West Concourse 2020
4 NaN 2019
[5 rows x 21 columns]
year borough sum_evictions
0 2017 BRONX 7658
1 2017 BROOKLYN 6355
2 2017 MANHATTAN 3450
3 2017 QUEENS 4325
4 2017 STATEN ISLAND 734
5 2018 BRONX 7140
6 2018 BROOKLYN 6157
7 2018 MANHATTAN 3390
8 2018 QUEENS 4452
9 2018 STATEN ISLAND 691
10 2019 BRONX 6244
11 2019 BROOKLYN 5312
12 2019 MANHATTAN 2818
13 2019 QUEENS 3705
14 2019 STATEN ISLAND 636
15 2020 BRONX 1088
16 2020 BROOKLYN 1005
17 2020 MANHATTAN 521
18 2020 QUEENS 696
19 2020 STATEN ISLAND 112
20 2021 BRONX 29
21 2021 BROOKLYN 100
22 2021 MANHATTAN 68
23 2021 QUEENS 36
24 2021 STATEN ISLAND 35
25 2022 BRONX 748
26 2022 BROOKLYN 1358
27 2022 MANHATTAN 606
28 2022 QUEENS 657
29 2022 STATEN ISLAND 193
##Cleaning NY Income data
#if statement to avoid duplicates on reruns
if 'Borough' not in NY_Income_gross.columns:
#splitting geography column by ',' delimiter to get county column
NY_Income_geodata = NY_Income_gross.Geography.str.split(',',expand=True)
#merging new columns to income table by index
NY_Income_gross = NY_Income_gross.merge(NY_Income_geodata,
left_index=True,
right_index=True,
how='inner')
#Renaming columns
NY_Income_gross.rename(columns = {'Household Income by Race':'Household Income',
'Household Income by Race Moe':'Household Income MOE',
0:'Census Tract',1:'Borough',2:'State'},
inplace=True)
#Adjusting borough data to match evictions borough data format
NY_Income_gross['Borough']=NY_Income_gross['Borough'].replace([' Bronx County',
' Kings County',
' New York County',
' Queens County',
' Richmond County'],
['BRONX','BROOKLYN','MANHATTAN','QUEENS','STATEN ISLAND'])
#sanity check
print(NY_Income_gross.head())
#filtering relevant columns for analysis
NY_Income_gross = NY_Income_gross[['Borough','Year','Household Income','Household Income MOE']]
#getting mean household income and margin of error by year and borough
NY_Income = NY_Income_gross.groupby(['Year','Borough'],
sort = [True,True])['Household Income','Household Income MOE'].mean()
#rounding values to 2 decimal places
NY_Income = NY_Income.round(decimals=2)
#resetting index
NY_Income = NY_Income.reset_index()
#print final dataset
print(NY_Income)
ID Year Year ID Race Race Household Income Household Income MOE \
0 2020 2020 0 Total 68000 21984.0
1 2020 2020 0 Total 93155 12263.0
2 2020 2020 0 Total 34766 6118.0
3 2020 2020 0 Total 24474 7684.0
4 2020 2020 0 Total 27196 8483.0
Geography ID Geography Census Tract \
0 Census Tract 2, Bronx County, NY 14000US36005000200 Census Tract 2
1 Census Tract 4, Bronx County, NY 14000US36005000400 Census Tract 4
2 Census Tract 16, Bronx County, NY 14000US36005001600 Census Tract 16
3 Census Tract 23, Bronx County, NY 14000US36005002300 Census Tract 23
4 Census Tract 25, Bronx County, NY 14000US36005002500 Census Tract 25
Borough State
0 BRONX NY
1 BRONX NY
2 BRONX NY
3 BRONX NY
4 BRONX NY
Year Borough Household Income Household Income MOE
0 2013 BRONX 39525.86 10160.88
1 2013 BROOKLYN 52152.55 12350.03
2 2013 MANHATTAN 79878.71 18840.48
3 2013 QUEENS 61378.72 14376.66
4 2013 STATEN ISLAND 72802.25 13725.81
5 2014 BRONX 39249.36 10488.69
6 2014 BROOKLYN 53130.10 12067.54
7 2014 MANHATTAN 82404.63 19210.25
8 2014 QUEENS 61879.28 13986.73
9 2014 STATEN ISLAND 74558.15 14167.82
10 2015 BRONX 39311.20 8547.60
11 2015 BROOKLYN 54264.91 11949.44
12 2015 MANHATTAN 84555.19 18248.35
13 2015 QUEENS 62336.88 13189.87
14 2015 STATEN ISLAND 72021.23 13995.67
15 2016 BRONX 40375.79 8773.16
16 2016 BROOKLYN 56790.46 12290.36
17 2016 MANHATTAN 86799.51 19310.58
18 2016 QUEENS 64146.02 13204.83
19 2016 STATEN ISLAND 73412.28 14549.54
20 2017 BRONX 41688.32 9209.21
21 2017 BROOKLYN 59774.86 12981.86
22 2017 MANHATTAN 91715.70 19810.96
23 2017 QUEENS 66931.36 13756.62
24 2017 STATEN ISLAND 75298.24 14432.01
25 2018 BRONX 43354.89 9934.15
26 2018 BROOKLYN 63340.91 13751.50
27 2018 MANHATTAN 94790.56 20508.93
28 2018 QUEENS 70348.20 14291.14
29 2018 STATEN ISLAND 78057.55 16994.09
30 2019 BRONX 45628.97 10545.49
31 2019 BROOKLYN 67743.39 14953.65
32 2019 MANHATTAN 99467.44 21443.96
33 2019 QUEENS 73779.56 14984.65
34 2019 STATEN ISLAND 82968.25 17775.42
35 2020 BRONX 47602.21 13496.98
36 2020 BROOKLYN 70885.27 18509.92
37 2020 MANHATTAN 102472.50 26732.09
38 2020 QUEENS 77254.35 18586.95
39 2020 STATEN ISLAND 85251.05 21719.59
C:\Users\jlenehan\AppData\Local\Temp\ipykernel_5156\169235047.py:36: FutureWarning: Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead. NY_Income = NY_Income_gross.groupby(['Year','Borough'],
##Cleaning NY Pop data
#filtering data to only see census data from 2020, and ignore population estimates
NY_Pop_gross = NY_Pop_gross[(NY_Pop_gross['program_type']=='Census Base Population') & (NY_Pop_gross['year']==2020)]
if 'geography' in NY_Pop_gross.columns:
#filtering to 5 boroughs (counties) of New York city
NY_Pop_gross = NY_Pop_gross.loc[NY_Pop_gross['geography'].isin(['Bronx County',
'Kings County',
'New York County',
'Queens County',
'Richmond County'])]
#Adjusting borough data to match evictions borough data format
NY_Pop_gross['geography']=NY_Pop_gross['geography'].replace(['Bronx County',
'Kings County',
'New York County',
'Queens County',
'Richmond County'],
['BRONX','BROOKLYN','MANHATTAN','QUEENS','STATEN ISLAND'])
#renaming geography column to borough
NY_Pop_gross.rename(columns = {'geography':'borough'}, inplace=True)
#sanity check
print(NY_Pop_gross)
NY_Pop = NY_Pop_gross[['borough','year','population']]
NY_Pop.set_index('borough',inplace=True)
print(NY_Pop)
fips_code borough year program_type population
129 36005 BRONX 2020 Census Base Population 1472654
150 36047 BROOKLYN 2020 Census Base Population 2736074
157 36061 MANHATTAN 2020 Census Base Population 1694251
167 36081 QUEENS 2020 Census Base Population 2405464
169 36085 STATEN ISLAND 2020 Census Base Population 495747
year population
borough
BRONX 2020 1472654
BROOKLYN 2020 2736074
MANHATTAN 2020 1694251
QUEENS 2020 2405464
STATEN ISLAND 2020 495747
##########################################
#Step 4: Joining data
##########################################
##Joining income data to population data - left join
##2020 census populations will be taken as the
NY_Income_Pop = NY_Income.merge(NY_Pop['population'], how = 'left',
left_on = 'Borough', right_on = 'borough')
print(NY_Income_Pop)
Year Borough Household Income Household Income MOE population 0 2013 BRONX 39525.86 10160.88 1472654 1 2013 BROOKLYN 52152.55 12350.03 2736074 2 2013 MANHATTAN 79878.71 18840.48 1694251 3 2013 QUEENS 61378.72 14376.66 2405464 4 2013 STATEN ISLAND 72802.25 13725.81 495747 5 2014 BRONX 39249.36 10488.69 1472654 6 2014 BROOKLYN 53130.10 12067.54 2736074 7 2014 MANHATTAN 82404.63 19210.25 1694251 8 2014 QUEENS 61879.28 13986.73 2405464 9 2014 STATEN ISLAND 74558.15 14167.82 495747 10 2015 BRONX 39311.20 8547.60 1472654 11 2015 BROOKLYN 54264.91 11949.44 2736074 12 2015 MANHATTAN 84555.19 18248.35 1694251 13 2015 QUEENS 62336.88 13189.87 2405464 14 2015 STATEN ISLAND 72021.23 13995.67 495747 15 2016 BRONX 40375.79 8773.16 1472654 16 2016 BROOKLYN 56790.46 12290.36 2736074 17 2016 MANHATTAN 86799.51 19310.58 1694251 18 2016 QUEENS 64146.02 13204.83 2405464 19 2016 STATEN ISLAND 73412.28 14549.54 495747 20 2017 BRONX 41688.32 9209.21 1472654 21 2017 BROOKLYN 59774.86 12981.86 2736074 22 2017 MANHATTAN 91715.70 19810.96 1694251 23 2017 QUEENS 66931.36 13756.62 2405464 24 2017 STATEN ISLAND 75298.24 14432.01 495747 25 2018 BRONX 43354.89 9934.15 1472654 26 2018 BROOKLYN 63340.91 13751.50 2736074 27 2018 MANHATTAN 94790.56 20508.93 1694251 28 2018 QUEENS 70348.20 14291.14 2405464 29 2018 STATEN ISLAND 78057.55 16994.09 495747 30 2019 BRONX 45628.97 10545.49 1472654 31 2019 BROOKLYN 67743.39 14953.65 2736074 32 2019 MANHATTAN 99467.44 21443.96 1694251 33 2019 QUEENS 73779.56 14984.65 2405464 34 2019 STATEN ISLAND 82968.25 17775.42 495747 35 2020 BRONX 47602.21 13496.98 1472654 36 2020 BROOKLYN 70885.27 18509.92 2736074 37 2020 MANHATTAN 102472.50 26732.09 1694251 38 2020 QUEENS 77254.35 18586.95 2405464 39 2020 STATEN ISLAND 85251.05 21719.59 495747
##Joining datasets (NY_Merged dataframe)
##Joining eviction data to income-population data - left join
##2020 borough census populations will be taken as the same for all years
NY_Merged = NY_Evictions.merge(NY_Income_Pop, how = 'left',
left_on = ['year','borough'],
right_on = ['Year','Borough'])
#dropping missing values
NY_Merged.dropna(inplace=True)
#filtering out redundant columns
NY_Merged = NY_Merged[['year','borough','sum_evictions','Household Income','Household Income MOE','population']]
#Renaming columns
NY_Merged.rename(columns = {'year':'Year',
'borough':'Borough',
'sum_evictions':'Sum Evictions',
'population':'Population'},
inplace=True)
#Changing Year column to string
NY_Merged['Year']=NY_Merged['Year'].apply(str)
#printing final dataset
print(NY_Merged)
Year Borough Sum Evictions Household Income \
0 2017 BRONX 7658 41688.32
1 2017 BROOKLYN 6355 59774.86
2 2017 MANHATTAN 3450 91715.70
3 2017 QUEENS 4325 66931.36
4 2017 STATEN ISLAND 734 75298.24
5 2018 BRONX 7140 43354.89
6 2018 BROOKLYN 6157 63340.91
7 2018 MANHATTAN 3390 94790.56
8 2018 QUEENS 4452 70348.20
9 2018 STATEN ISLAND 691 78057.55
10 2019 BRONX 6244 45628.97
11 2019 BROOKLYN 5312 67743.39
12 2019 MANHATTAN 2818 99467.44
13 2019 QUEENS 3705 73779.56
14 2019 STATEN ISLAND 636 82968.25
15 2020 BRONX 1088 47602.21
16 2020 BROOKLYN 1005 70885.27
17 2020 MANHATTAN 521 102472.50
18 2020 QUEENS 696 77254.35
19 2020 STATEN ISLAND 112 85251.05
Household Income MOE Population
0 9209.21 1472654.0
1 12981.86 2736074.0
2 19810.96 1694251.0
3 13756.62 2405464.0
4 14432.01 495747.0
5 9934.15 1472654.0
6 13751.50 2736074.0
7 20508.93 1694251.0
8 14291.14 2405464.0
9 16994.09 495747.0
10 10545.49 1472654.0
11 14953.65 2736074.0
12 21443.96 1694251.0
13 14984.65 2405464.0
14 17775.42 495747.0
15 13496.98 1472654.0
16 18509.92 2736074.0
17 26732.09 1694251.0
18 18586.95 2405464.0
19 21719.59 495747.0
##Adding computed columns to merged dataset
#Evictions per 1,000 people
NY_Merged['Evictions per 1,000'] = 1000*(NY_Merged['Sum Evictions']/NY_Merged['Population'])
#Upper margin of Household Income
NY_Merged['Household Income - Upper Margin'] = NY_Merged['Household Income']+NY_Merged['Household Income MOE']
#Lower margin of Household Income
NY_Merged['Household Income - Lower Margin'] = NY_Merged['Household Income']-NY_Merged['Household Income MOE']
#sanity check
print(NY_Merged.head())
Year Borough Sum Evictions Household Income Household Income MOE \ 0 2017 BRONX 7658 41688.32 9209.21 1 2017 BROOKLYN 6355 59774.86 12981.86 2 2017 MANHATTAN 3450 91715.70 19810.96 3 2017 QUEENS 4325 66931.36 13756.62 4 2017 STATEN ISLAND 734 75298.24 14432.01 Population Evictions per 1,000 Household Income - Upper Margin \ 0 1472654.0 5.200135 50897.53 1 2736074.0 2.322671 72756.72 2 1694251.0 2.036298 111526.66 3 2405464.0 1.797990 80687.98 4 495747.0 1.480594 89730.25 Household Income - Lower Margin 0 32479.11 1 46793.00 2 71904.74 3 53174.74 4 60866.23
#show correlations in dataset
NY_Merged.corr()
| Sum Evictions | Household Income | Household Income MOE | Population | Evictions per 1,000 | Household Income - Upper Margin | Household Income - Lower Margin | |
|---|---|---|---|---|---|---|---|
| Sum Evictions | 1.000000 | -0.620283 | -0.708677 | 0.441593 | 0.864418 | -0.645123 | -0.574023 |
| Household Income | -0.620283 | 1.000000 | 0.926916 | -0.155193 | -0.648283 | 0.997217 | 0.993048 |
| Household Income MOE | -0.708677 | 0.926916 | 1.000000 | -0.142458 | -0.733238 | 0.952313 | 0.876298 |
| Population | 0.441593 | -0.155193 | -0.142458 | 1.000000 | -0.001535 | -0.154484 | -0.154551 |
| Evictions per 1,000 | 0.864418 | -0.648283 | -0.733238 | -0.001535 | 1.000000 | -0.672769 | -0.602266 |
| Household Income - Upper Margin | -0.645123 | 0.997217 | 0.952313 | -0.154484 | -0.672769 | 1.000000 | 0.981508 |
| Household Income - Lower Margin | -0.574023 | 0.993048 | 0.876298 | -0.154551 | -0.602266 | 0.981508 | 1.000000 |
##########################################
#Step 5: Plot data
##########################################
#Get evictions by borough, year
sns.lineplot(x=NY_Merged['Year'],y=NY_Merged['Sum Evictions'],hue=NY_Merged['Borough'])
plt.show()
#Show evictions per 1000 by borough, year
sns.lineplot(x=NY_Merged['Year'],y=NY_Merged['Evictions per 1,000'],hue=NY_Merged['Borough'])
plt.show()
#Show mean household income by borough, year
sns.relplot(x=NY_Merged['Year'],y=NY_Merged['Household Income'], kind='line',hue=NY_Merged['Borough'])
plt.show()
#Show evictions per 1,000 against household income
sns.relplot(x='Evictions per 1,000',
y='Household Income',
hue='Borough',
data=NY_Merged)
plt.show()
#define function for getting scatter plot of 2 variables and printing correlation coefficient
def plot_correlation(df,var1,var2,group=None):
sns.lmplot(x=var1,y=var2,hue=group,data=df,ci=0)
plt.show()
print(df[[var1,var2]].corr())
return
#get scatter of evictions per 1,000 vs mean household income
plot_correlation(NY_Merged,'Evictions per 1,000','Household Income')
#get scatter of evictions per 1,000 vs mean household income MOE
plot_correlation(NY_Merged,'Evictions per 1,000','Household Income MOE')
Evictions per 1,000 Household Income Evictions per 1,000 1.000000 -0.648283 Household Income -0.648283 1.000000
Evictions per 1,000 Household Income MOE Evictions per 1,000 1.000000 -0.733238 Household Income MOE -0.733238 1.000000
Insights¶
▪ The scatter plot shows the Bronx has the highest eviction trends in the city, followed by Brooklyn and Queens. Adjusting for population the discrepancy is more stark – in 2017 the Bronx had 5.2 evictions per 1,000 people, compared to 2.3 and 2.04 for Brooklyn and Queens respectively. This indicates that the Bronx is most heavily affected by evictions for the city overall, and therefore this is where the most effort should be put by a hypothetical non-profit.
▪ Further to this, the Bronx has more evictions per capita than the next 2 boroughs combined, for 2017 through 2019. Note that the dropoff in evictions for 2020 can be attributed to the Tenant Safe Harbor Act, passed during the pandemic to protect tenants from eviction. From this, it’s reasonable to assume that twice as much funding and resources should be spent in the Bronx compared to Brooklyn or Queens.
▪ The line plot of household income by year shows that the Bronx is the least affluent area of New York city; while the mean household income has improved over this period, it’s still much lower than the other boroughs. This would suggest the key issue is low income leading to high eviction rate.
▪ The scatter plot of household income against evictions per 1,000 shows the Bronx has the lowest mean household income along with the highest evictions per capita across the city. This further supports the argument that the eviction problem is income driven.
▪ A correlation analysis of household income and household income MOE against evictions per 1,000 also supports this argument; both are negatively correlated, with a -0.648 and -0.733 correlation respectively. This indicates that income support is needed to tackle eviction rates across the city.