Neighborhood Characteristics and Influenza Activities


NEW YORK CITY

NEIGHBORHOOD CHARACTERISTICS AND INFLUENZA ACTIVITIES 2016 - 2019

Exploring Urban Data with Machine Learning
Instructor: Boyeong Hong

XIYU CHEN xc2521@columbia.edu
HANZHANG YANG hy2510@columbia.edu
TING ZHANG tz2436@columbia.edu



CONTENT

01 Introduction
02 Research Question
03 Literature Review
04 Dataset
05 Methodological Approach
06 Analysis
07 Conclusion
08 Limitation
09 Bibliography


INTRODUCTION

23,000 - 61,000 deaths (CDC, 2020)

$87 billion (Molinari et al, 2007)

The recent coronavirus disease (COVID-19) outbreak in China, Italy, the United States, and other countries has urged researchers to ask how machine learning can equip academics, governments, developers, and the general public with a better understanding of urban data, so that cities and their residents can prepare for and mitigate pandemic hazards. Because detailed COVID-19 case data is lacking and the disease is highly contagious (Sanche et al., 2020), we chose influenza as our research object to examine the relationship between pandemic diseases and urban data. Each year, influenza illnesses in the U.S. lead to between 23,000 and 61,000 deaths (CDC, 2020) and cost an estimated $87 billion (Molinari et al., 2007). Currently, the NYC Department of Health and Mental Hygiene's Syndromic Surveillance program publishes monthly Influenza-like Illness (ILI) data. However, the dataset only covers relatively coarse-grained spatial units (ZIP Code equivalent areas) and thus tells only a blurry story of influenza activities in New York City. By building machine learning clustering and regression models, this research examines the relationship between neighborhood characteristics and influenza illness in New York City from 2016 to 2019 at the ZIP Code scale, and predicts the census tract scale ILI rate using nonlinear regression models.


Figure: Influenza-Like Illness (ILI) Overall Emergency Department Visit Rate Per 100 People by ZIP Code, 2016-2019. One map panel per year (2016, 2017, 2018, 2019); color scale from 0.5 to 3.5.


RESEARCH QUESTION

• What is the relationship between influenza illness and neighborhood characteristics?
• What are the principal neighborhood characteristics related to the influenza-like illness rate?
• How can fine-grained influenza illness activities be predicted from existing coarse-grained records?

Photo by Mark Lennihan - AP


LITERATURE REVIEW

Despite the importance of governments and public health agencies in mitigating the hazards created by contagious diseases, little attention has been paid to the data-driven study of urban pandemic diseases. Previous research on influenza activity prediction has generally employed epidemiological models (Chretien et al., 2014) or forecast the peak time, peak height, and magnitude of influenza activity during an outbreak (Nsoesie et al., 2014). Previous literature that applied machine learning models to influenza activities generally focuses on monitoring influenza through social media or emergency department reports (Aramaki et al., 2011; Pineda et al., 2015; Allen et al., 2016). However, little has been written about how to improve the spatial resolution of influenza activity estimates. Data are usually aggregated into larger geographical units because of privacy, confidentiality, and administrative concerns. Machine learning regression models could help improve the spatial resolution of urban data. In other fields of urban science, Kontokosta et al. (2018) combined machine learning and small area estimation to predict building-level waste generation from less granular sample data. Yang et al. (2016) compared network models at different spatial scales to forecast influenza outbreaks in New York City; they also suggested that the Influenza-like illness (ILI) data available from the NYC Department of Health and Mental Hygiene is a primary source for measuring influenza activities.

For neighborhood characteristics, studies have drawn attention to how lower neighborhood socioeconomic status adversely affects residents' physical functioning and individual health (Feldman & Steptoe, 2004). On the relation between mode of transportation to work and individual health, Mueller et al. (2015) concluded that active transportation (walking, cycling) provides substantial health benefits. A report on the influence of the 2007 pandemic flu in New York City found, in retrospective analysis, that minority ethnic groups were not well informed of the ongoing flu trend and the related response methods (Fuller et al., 2007).


DATASET

We use the Influenza-like illness (ILI) rate extracted from the NYC Department of Health and Mental Hygiene. The available data covers 2016 - 2019 at the ZIP Code equivalent spatial scale. To identify the relationship between the ILI rate and neighborhood characteristics as precisely as possible, we list a number of features that might affect the ILI rate. The features belong to the ten categories below: Health Insurance, Means Of Transportation, Travel Time, Age, Race, Education, Income, Health Facility Accessibility, Urban Form, and 311 Health Service Request. Thirty-five features are selected under these categories, as shown in the Variable table below. The neighborhood characteristic datasets we use include: 311 Service Request Data, MapPLUTO, 2013-2017 ACS 5-year Estimates, New York City Locations Providing Seasonal Flu Vaccinations, and Health Facility Map. The source, temporal coverage, and spatial granularity are shown below. The auxiliary datasets we use include, but are not limited to, the LION Single Line Street Map and 2010 Census Tracts from the Department of City Planning.


Datasets Used in This Research

Category | Variable
ILI Rate | ili_p100
Health Insurance | %HealthInsurance, %PublicInsurance
Means Of Transportation | %Drive Alone, %Carpool, %Public Transportation, %Taxicab, %Walk, %WorkAtHome
Travel Time | %lessthan30, %30_60, %morethan60
Age | %pop_age_under5, %pop_age_5_18, %pop_age_18_65, %pop_age_65over
Race | %Total_White, %Total_Black, %Total_Other
Household | %Households_with_children, Average_household_size, %households_public_assistance, %housing units_Owner_occupied
Education | %Population_less_than_college, %Unemployed_Population
Income | %households_income_less_25K, %households_income_25-75K, %households_income_75-150K, %households_income_over150K
Health Facility Accessibility | Facility_access
Urban Form | TreeDens, %GreenCover, %Walkup, %Com, %Res
311 Health Service Request | 311_p

Dataset | Source | Temporal Coverage | Spatial Granularity
Influenza-like illness Syndromic Surveillance Data | NYC Department of Health and Mental Hygiene | 2016 - 2019 | ZIP Code equivalent
2013-2017 ACS 5-year Estimates | United States Census Bureau | 2013 - 2017 | ZIP Code Tabulation Areas; Census Tract
Health Facility Map | New York State Department of Health | Updated Weekly | Address and Coordinates
MapPLUTO | NYC Department of City Planning | Updated Quarterly | BBL or Building Address
311 Service Request Data | NYC311 | 2010 - present | BBL or Coordinates


METHODOLOGICAL APPROACH

First, we extract the datasets from the different sources, then clean and aggregate the data. Because the influenza-like illness (ILI) Syndromic Surveillance Data is based on 134 adjoining ZIP Code equivalent areas, we first tried to aggregate all the other datasets to the ZIP Code level. The resulting R-squared was low and the model did not predict well because the number of observations was limited. We therefore decided to transfer all data, including the ILI rate data, to the census tract scale; NYC has 2168 census tracts, far more than the number of ZIP Code areas. To do so, we conduct a GIS proportional split and spatial join on the building-scale built environment data and the census-tract scale socio-economic data in ArcGIS. Then, we use Pandas DataFrames to combine the data. We also apply basic data transformations to keep the variables comparable. For example, we divide each feature by the population of its census tract so that the variable is not affected by tract population. Furthermore, to capture the spatial relationship of each neighborhood to the others, we use Network Analysis (LION dataset) to calculate the average distance from the centroid of each census tract to the 3 nearest health facilities, and take the reciprocal of the result to represent health facility accessibility.

For preliminary exploration, we visualize the ILI rate geographically and run a correlation analysis to find the basic patterns in our variables. After that, we run a clustering analysis to see how neighborhoods resemble one another throughout NYC. We use K-Means Clustering, Agglomerative Clustering, Gaussian Mixture Model (GMM), and DBSCAN clustering. Then, we set the ILI rate as the target variable and the neighborhood characteristics as the explanatory variables to build the regression models. We try 5 regression models to model the relationship between the neighborhood characteristics and the ILI rate: ordinary least squares (OLS) regression, Ridge regression, Lasso regression, decision tree, and random forest. Afterward, we use test data to validate the models and choose those that best fit our dataset and are suitable for future census tract scale ILI rate prediction (figure 02).
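The proportion and accessibility transformations described above can be sketched as follows. This is a minimal illustration under stated assumptions: the tract table, its column names, and the facility coordinates are hypothetical, and straight-line distance stands in for the network distance that the study computes on the LION street network.

```python
import numpy as np
import pandas as pd

# Hypothetical tract-level counts and centroids (illustrative values only).
tracts = pd.DataFrame({
    "tract_id": ["A", "B", "C"],
    "population": [4000, 2500, 0],
    "public_insurance_count": [1600, 500, 0],
    "x": [0.0, 3.0, 6.0],  # centroid coordinates in arbitrary units
    "y": [0.0, 4.0, 8.0],
})
# Hypothetical health facility coordinates.
facilities = np.array([[0.0, 1.0], [3.0, 0.0], [6.0, 8.5], [1.0, 1.0]])

# Proportion calculation: divide the count by the tract population.
# Zero-population tracts become NaN instead of raising a division error.
pop = tracts["population"].replace(0, np.nan)
tracts["%PublicInsurance"] = tracts["public_insurance_count"] / pop

# Accessibility: mean distance to the 3 nearest facilities, then the reciprocal.
# ASSUMPTION: Euclidean distance replaces the network distance used in the study.
centroids = tracts[["x", "y"]].to_numpy()
dist = np.linalg.norm(centroids[:, None, :] - facilities[None, :, :], axis=2)
nearest3 = np.sort(dist, axis=1)[:, :3]
tracts["Facility_access"] = 1.0 / nearest3.mean(axis=1)

print(tracts[["tract_id", "%PublicInsurance", "Facility_access"]])
```

Taking the reciprocal means that tracts closer to facilities receive larger accessibility values, which matches the direction of the Facility_access variable described in the text.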


Methodology Diagram

Data Cleaning
• Drop NA

Data Preparation
Transfer data into census tract scale
• GIS proportional split
• Spatial join
• Pandas DataFrame: group ZIP Code level data
Proportion calculation
• Divide each feature by the population of its census tract
Accessibility: network analysis
• Use Network Analysis (LION dataset) to calculate the average distance from the centroid of the census tract to the 3 nearest health facilities. Take the reciprocal to represent health facility accessibility

Preliminary Exploration
ILI rate geographic visualization
• Get the basic information on influenza-like illness distribution in NYC
Correlation analysis
• Show the correlation of features

Clustering Analysis
Build and test different clustering models to find the similarity among census tract neighborhoods
• K-Means Clustering
• Gaussian Mixture Model (GMM)
• Agglomerative Clustering
• DBSCAN Clustering

Regression Analysis
Build and test different regression models to find the relationship between various neighborhood characteristics and the ILI rate. Identify key features and forecast the future
• Ordinary Least Squares (OLS) Regression
• Ridge Regression
• Lasso Regression
• Decision Tree Regression
• Random Forest Regression


PRELIMINARY FINDINGS

Descriptive Statistics

Variable | Count | Mean | Std | Min | 25% | 50% | 75% | Max
ili_p100 | 8660 | 1.06 | 0.66 | 0 | 0.552 | 0.931 | 1.479 | 3.619
%HealthInsurance | 8660 | 0.886 | 0.136 | 0 | 0.871 | 0.913 | 0.942 | 1
%PublicInsurance | 8660 | 0.404 | 0.164 | 0 | 0.298 | 0.393 | 0.505 | 1
%Drive Alone | 8660 | 0.244 | 0.171 | 0 | 0.103 | 0.206 | 0.356 | 1
%Carpool | 8660 | 0.049 | 0.042 | 0 | 0.017 | 0.04 | 0.072 | 0.333
%Public Transportation | 8660 | 0.537 | 0.179 | 0 | 0.423 | 0.568 | 0.674 | 1
%Taxicab | 8660 | 0.007 | 0.015 | 0 | 0 | 0 | 0.009 | 0.157
%Walk | 8660 | 0.087 | 0.094 | 0 | 0.031 | 0.062 | 0.106 | 1
%WorkAtHome | 8660 | 0.039 | 0.046 | 0 | 0.013 | 0.03 | 0.053 | 1
%lessthan30 | 8660 | 0.286 | 0.138 | 0 | 0.203 | 0.262 | 0.34 | 1
%30_60 | 8660 | 0.416 | 0.141 | 0 | 0.329 | 0.406 | 0.511 | 1
%morethan60 | 8660 | 0.277 | 0.133 | 0 | 0.181 | 0.292 | 0.371 | 1
%pop_age_under5 | 8660 | 0.063 | 0.031 | 0 | 0.043 | 0.061 | 0.08 | 0.315
%pop_age_5_18 | 8660 | 0.142 | 0.059 | 0 | 0.108 | 0.143 | 0.178 | 0.449
%pop_age_18_65 | 8660 | 0.642 | 0.119 | 0 | 0.609 | 0.647 | 0.691 | 1
%pop_age_65over | 8660 | 0.136 | 0.073 | 0 | 0.092 | 0.127 | 0.169 | 1
%Total_White | 8660 | 0.421 | 0.3 | 0 | 0.134 | 0.383 | 0.7 | 1
%Total_Black | 8660 | 0.25 | 0.298 | 0 | 0.019 | 0.09 | 0.429 | 1
%Total_Other | 8660 | 0.31 | 0.221 | 0 | 0.132 | 0.255 | 0.472 | 1
%Households_with_children | 8660 | 0.315 | 0.127 | 0 | 0.233 | 0.324 | 0.4 | 1
Average_household_size | 8660 | 0.028 | 0.008 | 0 | 0.024 | 0.028 | 0.032 | 0.065
%Population_less_than_college | 8660 | 0.644 | 0.224 | 0 | 0.546 | 0.704 | 0.807 | 1
%Unemployed_Population | 8660 | 0.078 | 0.049 | 0 | 0.045 | 0.069 | 0.101 | 0.647
%households_income_less_25K | 8660 | 0.242 | 0.142 | 0 | 0.139 | 0.211 | 0.319 | 0.843
%households_income_25-75K | 8660 | 0.346 | 0.111 | 0 | 0.289 | 0.356 | 0.419 | 1
%households_income_75-150K | 8660 | 0.248 | 0.104 | 0 | 0.184 | 0.254 | 0.318 | 1
%households_income_over150K | 8660 | 0.142 | 0.127 | 0 | 0.053 | 0.109 | 0.191 | 1
%households_public_assistance | 8660 | 0.042 | 0.041 | 0 | 0.012 | 0.031 | 0.059 | 0.392
%housing units_Owner_occupied | 8660 | 0.365 | 0.258 | 0 | 0.15 | 0.326 | 0.56 | 1
Facility_access | 8660 | 0.529 | 0.395 | 0 | 0.277 | 0.438 | 0.666 | 6.169
TreeDens | 8660 | 2873 | 943 | 0 | 2334 | 2900 | 3509 | 5615
%GreenCover | 8660 | 0.271 | 0.146 | 0.009 | 0.17 | 0.238 | 0.344 | 0.918
%Walkup | 8660 | 0.587 | 0.346 | 0 | 0.27 | 0.661 | 0.924 | 1
%Com | 8660 | 0.24 | 0.214 | 0 | 0.099 | 0.169 | 0.297 | 1
%Res | 8660 | 0.718 | 0.217 | 0 | 0.658 | 0.784 | 0.861 | 1
311_p | 8660 | 0.124 | 1.106 | 0 | 0.046 | 0.07 | 0.102 | 44.5


In data preparation, we obtained a dataset with 35 variables describing neighborhood characteristics and 1 variable containing the ILI rate. It covers 2165 census tracts in New York City for the period from 2016 to 2019. By combining the ILI rate data from the 4 consecutive years, we developed a dataset with 8660 observations and 36 variables.
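The observation count follows from stacking one row per tract per year (2165 tracts × 4 years = 8660). A minimal pandas sketch of that stacking is shown here; the column names and constant values are placeholders, and attaching the same tract features to every year is an assumption rather than something the source states.

```python
import pandas as pd

n_tracts = 2165
years = [2016, 2017, 2018, 2019]

# Hypothetical static tract features (the study uses 35; two placeholders here).
features = pd.DataFrame({
    "tract_id": range(n_tracts),
    "feature_a": 0.5,
    "feature_b": 0.1,
})

# Hypothetical yearly ILI tables, one row per tract per year.
yearly = [
    pd.DataFrame({"tract_id": range(n_tracts), "year": y, "ili_p100": 1.0})
    for y in years
]

# Stack the four years, then attach the tract features to every yearly row.
# ASSUMPTION: the same feature values are reused for each year.
panel = pd.concat(yearly, ignore_index=True).merge(features, on="tract_id", how="left")
print(panel.shape)
```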

PRINCIPAL COMPONENTS ANALYSIS AND SPARSEPCA

Using principal components analysis (PCA), we reduced the dimensionality of the variables from 35 to 2, 10, 15, and 20 components. We compared these choices of the number of components to keep by the ratio of variance explained by the output components. PCA produced a limited result because it does not take into account any a-priori knowledge (Bai, 2007). Since the goal of this research is to facilitate urban decision making, we chose SparsePCA, a method that reconstructs the variables as a combination of sparse components that can be interpreted with human knowledge. We compared three SparsePCA parameter combinations: the number of sparse atoms to extract was fixed at 2, while alpha, which controls the sparsity, was set to 35, 20, and 15, respectively. The SparsePCA analysis with n=2 and alpha=20 gave the best result and identified the participation rate in public health insurance, the proportion of white population, the education rate, and household income as important features.
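The comparison described above can be sketched with scikit-learn's `PCA` and `SparsePCA`. The data here is synthetic and standardized, so the numbers it produces say nothing about the study's results; only the component counts (2, 10, 15, 20) and alpha values (35, 20, 15) come from the text.

```python
import numpy as np
from sklearn.decomposition import PCA, SparsePCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 35 neighborhood features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 35))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=200)  # one correlated pair of columns
X_std = StandardScaler().fit_transform(X)

# PCA: total share of variance retained for each number of components.
explained = {}
for k in (2, 10, 15, 20):
    pca = PCA(n_components=k).fit(X_std)
    explained[k] = float(pca.explained_variance_ratio_.sum())
    print("PCA", k, round(explained[k], 3))

# SparsePCA: 2 sparse components; a larger alpha pushes more loadings to zero.
nonzero = {}
for alpha in (35, 20, 15):
    spca = SparsePCA(n_components=2, alpha=alpha, random_state=0).fit(X_std)
    nonzero[alpha] = int(np.count_nonzero(spca.components_))
    print("SparsePCA", alpha, nonzero[alpha])
```

Counting the non-zero loadings per alpha is one simple way to see how sparsity changes; the features with non-zero loadings are the ones a reader would interpret.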



PRELIMINARY FINDINGS

CORRELATION TEST

The Pearson correlation coefficient between each pair of columns is presented in the correlation map. The correlation map suggests that a higher ILI rate is associated with a higher percentage of the population enrolled in public health insurance, a higher percentage of the population using public transportation to work, longer commute time to work, a higher percentage of teenagers, a higher percentage of non-White population, a low education level, and low household income. A lower ILI rate is associated with shorter commute time to work, a higher percentage of elderly population, a higher percentage of white population, high household income, a higher percentage of owner-occupied housing, and a higher percentage of green area in the neighborhood.
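A correlation table of this kind can be computed with `DataFrame.corr`. The sketch below uses synthetic data in which the target is deliberately constructed to rise with one feature and fall with another, so the signs it prints are built in by assumption and are not evidence about the real data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 500
income_low = rng.uniform(0, 1, n)
green = rng.uniform(0, 1, n)

df = pd.DataFrame({
    "%households_income_less_25K": income_low,
    "%GreenCover": green,
    # Synthetic target: rises with low income, falls with green cover.
    "ili_p100": 1.0 + 0.8 * income_low - 0.5 * green + rng.normal(0, 0.1, n),
})

# Pairwise Pearson coefficients between all columns.
corr = df.corr(method="pearson")

# Rank the features by their correlation with the ILI rate.
with_ili = corr["ili_p100"].drop("ili_p100").sort_values(ascending=False)
print(with_ili)
```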



ANALYSIS - CLUSTERING

To better understand how influenza activities affected New York City neighborhoods, we use machine learning clustering algorithms to identify clusters of neighborhoods with shared characteristics.

Figure panels: KMeans Clustering (n_clusters=4); Agglomerative Clustering (n_clusters=4).


MODEL EVALUATION - MODEL SELECTION

To choose the optimal clustering model, we compared K-Means Clustering, Agglomerative Clustering, Gaussian Mixture Model, and DBSCAN. The elbow method suggested that the optimal number of clusters is 4. In the comparison, K-Means clustering yielded the best result, dividing the 2165 census tracts into 4 clusters.
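The elbow method and the four-way comparison can be sketched with scikit-learn as below. The data is synthetic (four generated blobs), the range of k is an arbitrary choice, and the parameter values are the ones shown on the figure panels; how the authors judged the "best" result is not specified in the source, so no scoring rule is assumed here.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for standardized tract features (4 generated groups).
X, _ = make_blobs(n_samples=400, centers=4, n_features=5, random_state=0)
X = StandardScaler().fit_transform(X)

# Elbow method: within-cluster sum of squares (inertia) for each k.
inertia = {}
for k in range(1, 9):
    inertia[k] = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    print(k, round(inertia[k], 1))

# Fit the four algorithms with the parameter values named on the panels.
labels = {
    "kmeans": KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X),
    "agglomerative": AgglomerativeClustering(n_clusters=4).fit_predict(X),
    "gmm": GaussianMixture(n_components=2, random_state=0).fit_predict(X),
    "dbscan": DBSCAN(eps=5).fit_predict(X),  # a label of -1 would mark noise
}
for name, lab in labels.items():
    print(name, "clusters found:", len(set(lab.tolist())))
```

The elbow is read from where the inertia curve stops dropping sharply as k grows.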

Figure panels: Gaussian Mixture Clustering (n_components=2); DBSCAN Clustering (eps=5).


ANALYSIS - CLUSTERING

Figure: Kernel Density Plots of Standardized Features By Each Cluster. One panel per variable (ili_p100, followed by the 35 neighborhood features from %HealthInsurance through 311_p); x-axis from -3 to 3.

Figure: New York City Census Tract Clusters - KMeans

SHOWING THE K-MEANS RESULT

We chose K-Means clustering because the resulting cluster group 0 (in red) is highly collocated with census tracts experiencing high influenza activity. The visualization shows that neighborhoods with the following characteristics are among the areas most vulnerable to influenza activity: a high percentage of the population enrolled in public health insurance, a high percentage of households with children, a low education level, and low to medium household income (less than 75k).


ANALYSIS - CLUSTERING

Figure: Kernel Density Plots of Standardized Features By Each Cluster. One panel per variable (ili_p100, followed by the 35 neighborhood features from %HealthInsurance through 311_p); x-axis from -3 to 3.

Figure: New York City High Flu Rate Census Tract Clusters - KMeans

FURTHER CLUSTER ANALYSIS ON VULNERABLE NEIGHBORHOODS

We perform K-Means clustering again using only census tracts with an ILI rate above the city mean. It clusters these census tracts well by neighborhood characteristics and ILI rate: group 0 (in blue) has a higher percentage of teenagers in the population and lower household income, while group 1 (in red) has a higher percentage of elders in the population and higher household income.
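The second clustering pass can be sketched as a filter followed by a re-fit. The data below is synthetic, and `n_clusters=2` is an assumption inferred from the two groups named in the text; the source does not state the value used.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for tract-level data (values are not from the study).
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "ili_p100": rng.gamma(shape=2.5, scale=0.4, size=600),
    "%pop_age_5_18": rng.uniform(0.05, 0.30, 600),
    "%pop_age_65over": rng.uniform(0.05, 0.30, 600),
})

# Keep only tracts whose ILI rate is above the citywide mean.
city_mean = df["ili_p100"].mean()
high = df[df["ili_p100"] > city_mean].copy()

# Standardize the subset and cluster it again.
# ASSUMPTION: two clusters, matching the two groups described in the text.
X_high = StandardScaler().fit_transform(high)
high["cluster"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_high)

# Per-cluster means show how the groups differ on each feature.
print(high.groupby("cluster").mean())
```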



ANALYSIS - REGRESSION

MODEL EVALUATION - MODEL SELECTION

To find the optimal regression model, we compared the performance of 3 linear regression models: ordinary least squares (OLS) regression, Ridge regression, and Lasso regression; and 2 non-linear regression models: decision tree and random forest. While the Lasso regression failed to produce a meaningful result, the OLS and Ridge regressions returned performance scores (test set scores) of 0.46 and 0.458, respectively. The poor performance of the linear regression models turned us to non-linear regression models, which performed better on the prediction problem. Both the decision tree model and the random forest model have a test set score of about 0.85, which we consider good performance in predicting the ILI rate.
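The five-model comparison can be sketched with scikit-learn as follows. The data is synthetic with a deliberately non-linear target, the 75/25 split is an assumption, and using alpha=10 for both Ridge and Lasso is also an assumption (the slides show "Alpha=10" without tying it to one model). The tree depths are the ones reported for the study's models.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic non-linear target, so tree models have something to gain.
rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(1000, 6))
y = np.sin(6 * X[:, 0]) + (X[:, 1] > 0.5) * X[:, 2] + rng.normal(0, 0.05, 1000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "OLS": LinearRegression(),
    "Ridge": Ridge(alpha=10),   # ASSUMPTION: alpha value
    "Lasso": Lasso(alpha=10),   # ASSUMPTION: alpha value
    "Decision tree": DecisionTreeRegressor(max_depth=23, random_state=0),
    "Random forest": RandomForestRegressor(
        max_depth=19, n_estimators=100, random_state=0
    ),
}

# Fit each model and record R-squared scores plus test mean squared error.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = {
        "train_r2": model.score(X_train, y_train),
        "test_r2": model.score(X_test, y_test),
        "test_mse": mean_squared_error(y_test, model.predict(X_test)),
    }
    print(name, {k: round(v, 3) for k, v in results[name].items()})
```

On this synthetic data the Lasso penalty is large enough to shrink every coefficient to zero, which gives a training score of 0. That would be consistent with the Lasso scores reported in the study, although the source does not state the cause.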

LINEAR REGRESSION MODEL

Model | Training Set Score | Test Set Score | Mean Squared Error - Training | Mean Squared Error - Test
ORDINARY LEAST SQUARES (OLS) | 0.437 | 0.457 | 0.56 | 0.56
RIDGE | 0.437 | 0.458 | 0.56 | 0.56
LASSO | 0.0 | 0.0 | 0.99 | 1.04

Regularization parameter: Alpha=10

NON-LINEAR REGRESSION MODEL

Model | Parameters | Training Set Score | Test Set Score
DECISION TREE | max_depth=23 | 0.932 | 0.853
RANDOM FOREST | max_depth=19, n_estimators=100 | 0.929 | 0.857


OPTIMAL MODEL - DECISION TREE AND RANDOM FOREST

The maximum depths of the two non-linear models are tuned with a search that compares test set scores in order to achieve optimal model performance. The decision tree regressor performs best at a maximum tree depth of 23, and the random forest model performs best at a maximum tree depth of 19. The feature importances of both the decision tree and the random forest regression models indicate that the percentage of white population, the percentage of people enrolled in health insurance, and the percentage of the population commuting by public transportation have a significant impact on the ILI rate. The random forest model further indicates that high household income and education rate also contribute to the difference in ILI rate between census tracts.
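A depth search of this kind, followed by a feature importance readout, can be sketched as below. The data and feature names are synthetic placeholders, and the search range of 1 to 25 is an assumption; only the selection rule (best test set score) comes from the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: f0 and f1 drive the target, f2 and f3 are pure noise.
rng = np.random.default_rng(4)
feature_names = ["f0", "f1", "f2", "f3"]  # hypothetical feature names
X = rng.uniform(0, 1, size=(800, 4))
y = 2.0 * (X[:, 0] > 0.5) + X[:, 1] + rng.normal(0, 0.1, 800)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Search max_depth and keep the value with the best test set score.
scores = {}
for depth in range(1, 26):
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0)
    scores[depth] = tree.fit(X_train, y_train).score(X_test, y_test)
best_depth = max(scores, key=scores.get)

# Refit at the selected depth and read the impurity-based importances.
best_tree = DecisionTreeRegressor(max_depth=best_depth, random_state=0)
best_tree.fit(X_train, y_train)
importance = dict(zip(feature_names, best_tree.feature_importances_))
print(best_depth, {k: round(float(v), 3) for k, v in importance.items()})
```

Selecting a hyperparameter on the test set makes the reported test score optimistic; a separate validation split or cross-validation would be a more conservative variant of the same search.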

Figure: Feature importance for the DECISION TREE and RANDOM FOREST models, one bar per feature (%HealthInsurance through 311_p).


ANALYSIS - REGRESSION

Figure: Projected Average Influenza-Like Illness (ILI) Overall Emergency Department Visit Rate Per Year Per 100 People By Census Tract, 2016-2019

The random forest regression model produces a predicted ILI rate for each census tract. We compared the ILI rate derived from the NYC Syndromic Surveillance Data, which is at the ZIP Code equivalent area level, with the predicted ILI rate at the census tract level. The comparison suggests that (1) the prediction matches the original training data, and (2) the prediction, having a finer granularity, can identify differences in ILI rate between census tracts within the same ZIP Code area.
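One way to make this comparison concrete is to aggregate tract-level predictions back to ZIP areas and set them against the observed ZIP-level rates. The sketch below uses made-up values, a hypothetical tract-to-ZIP lookup, and population weighting; none of these are described in the source.

```python
import pandas as pd

# Hypothetical tract-level predictions with a tract-to-ZIP lookup.
pred = pd.DataFrame({
    "tract_id": ["t1", "t2", "t3", "t4"],
    "zip": ["10001", "10001", "10002", "10002"],
    "population": [3000, 1000, 2000, 2000],
    "ili_pred": [1.2, 0.8, 2.0, 1.0],
})
# Hypothetical observed ZIP-level rates.
observed = pd.DataFrame({"zip": ["10001", "10002"], "ili_p100": [1.1, 1.6]})

# Population-weighted mean of the tract predictions within each ZIP area.
# ASSUMPTION: population weighting; the source does not describe the method.
pred["weighted"] = pred["ili_pred"] * pred["population"]
by_zip = pred.groupby("zip")[["weighted", "population"]].sum()
by_zip["ili_pred_zip"] = by_zip["weighted"] / by_zip["population"]

# (1) Agreement with the ZIP-level data.
comparison = observed.merge(by_zip[["ili_pred_zip"]].reset_index(), on="zip")
comparison["abs_error"] = (comparison["ili_pred_zip"] - comparison["ili_p100"]).abs()

# (2) Within-ZIP spread: the detail that the finer granularity adds.
spread = pred.groupby("zip")["ili_pred"].agg(lambda s: s.max() - s.min())

print(comparison)
print(spread)
```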

Figure legend: Projected Average ILI ED Visit Rate Per Year Per 100 People, scale 0.0 to 3.0, mean marked.

Figure: Predicted Influenza-Like Illness (ILI) Overall Emergency Department Visit Rate Per Year Per 100 People By Census Tract. Legend: Predicted Average ILI ED Visit Rate Per Year Per 100 People, scale 0.0 to 3.0, mean marked.


CONCLUSION

The aim of this research is to develop a predictive model for influenza activities at the census tract level in New York City, to support better implementation of public health measures and interventions at a finer spatial scale. In developing the regression predictor with machine learning models, the study also identified important neighborhood characteristics related to influenza activities, which can further inform urban decision making and policy implementation.

The research was prompted by the COVID-19 outbreak in New York City. Comparing the COVID-19 case map with our predicted ILI rate map, we find some collocation between the activity of the two diseases. In the future, we believe the research topic can expand to other diseases by applying our small area estimation model.

Figure: COVID-19 Overall Confirmed Case Rate Per 100 People By ZIP Code, through June 7th, 2020; color scale from 1.0 to 4.0.

LIMITATION

While we can use the ZIP Code level ILI rate data to validate our predictions, the accuracy of our models cannot be verified at the census tract level because ILI rate data at that level is not published. We respect that health agencies want to protect patient privacy, and thus finer-grained data is withheld.

Another limitation of our study is that although contagious disease data is updated monthly, or even daily, on health agencies' websites, the neighborhood characteristics, especially those from the American Community Survey, are updated less frequently and may be out of date. Without a live feed of urban data, we cannot capture current socioeconomic conditions and urban activities.

Figure: Predicted Influenza-Like Illness (ILI) Overall Emergency Department Visit Rate Per Year Per 100 People By Census Tract; color scale from 0.0 to 3.0.


BIBLIOGRAPHY

Allen, C., Tsou, M. H., Aslam, A., Nagel, A., & Gawron, J. M. (2016). Applying GIS and machine learning methods to Twitter data for multiscale surveillance of influenza. PloS one, 11(7).

Aramaki, E., Maskawa, S., & Morita, M. (2011, July). Twitter catches the flu: detecting influenza epidemics using Twitter. In Proceedings of the conference on empirical methods in natural language processing (pp. 1568-1576). Association for Computational Linguistics.

Bai, W. (2007). Reading Notes on A Tutorial on Principal Component Analysis. http://www.doc.ic.ac.uk/~wbai/notes/Shlens-PCA/Shlens-PCA.html

Centers for Disease Control and Prevention, National Center for Immunization and Respiratory Diseases (NCIRD). (2020). Past Seasons Estimated Influenza Disease Burden. https://www.cdc.gov/flu/about/burden/past-seasons.html

Chretien, J. P., George, D., Shaman, J., Chitale, R. A., & McKenzie, F. E. (2014). Influenza forecasting in human populations: a scoping review. PloS one, 9(4).

Feldman, P. J., & Steptoe, A. (2004). How neighborhoods and physical functioning are related: the roles of neighborhood socioeconomic status, perceived neighborhood strain, and individual health risk factors. Annals of Behavioral Medicine, 27(2), 91-99.

Fuller, E. J., Abramson, D. M., & Sury, J. (2007). Unanticipated Consequences of a Pandemic Flu in New York City: A Neighborhood Focus Group Study.

Kontokosta, C. E., Hong, B., Johnson, N. E., & Starobin, D. (2018). Using machine learning and small area estimation to predict building-level municipal solid waste generation in cities. Computers, Environment and Urban Systems, 70, 151-162.

Molinari, N. A. M., Ortega-Sanchez, I. R., Messonnier, M. L., Thompson, W. W., Wortley, P. M., Weintraub, E., & Bridges, C. B. (2007). The annual impact of seasonal influenza in the US: measuring disease burden and costs. Vaccine, 25(27), 5086-5096.

Mueller, N., Rojas-Rueda, D., Cole-Hunter, T., De Nazelle, A., Dons, E., Gerike, R., ... & Nieuwenhuijsen, M. (2015). Health impact assessment of active transportation: a systematic review. Preventive medicine, 76, 103-114.

Nsoesie, E. O., Brownstein, J. S., Ramakrishnan, N., & Marathe, M. V. (2014). A systematic review of studies on forecasting the dynamics of influenza outbreaks. Influenza and other respiratory viruses, 8(3), 309-316.

Pineda, A. L., Ye, Y., Visweswaran, S., Cooper, G. F., Wagner, M. M., & Tsui, F. R. (2015). Comparison of machine learning classifiers for influenza detection from emergency department free-text reports. Journal of biomedical informatics, 58, 60-69.

Sanche, S., Lin, Y. T., Xu, C., Romero-Severson, E., Hengartner, N., & Ke, R. (2020). High contagiousness and rapid spread of severe acute respiratory syndrome coronavirus 2. Emerging Infectious Diseases, 2020 Jul. [May 5, 2020]. https://doi.org/10.3201/eid2607.200282

Yang, W., Olson, D. R., & Shaman, J. (2016). Forecasting influenza outbreaks in boroughs and neighborhoods of New York City. PLoS computational biology, 12(11).

DATA SOURCE

NYC 311. (2020). 311 Service Requests from 2010 to Present [Data file]. https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9

NYC Department of City Planning, Information Technology Division. (2020). MapPLUTO 20V1 (shoreline clipped) [Data file]. https://www1.nyc.gov/site/planning/data-maps/open-data/dwn-pluto-mappluto.page

NYC Department of Health and Mental Hygiene. (2020). Influenza-like illness Syndromic Surveillance Data [Data file]. https://a816-health.nyc.gov/hdi/epiquery/visualizations?PageType=tsi&PopulationSource=Syndromic&Topic=1&Subtopic=39&Indicator=Influenza-like%20illness%20(ILI)&Year=202

NYC Department of Health and Mental Hygiene. (2020). New York City Locations Providing Seasonal Flu Vaccinations [Data file]. https://data.cityofnewyork.us/Health/New-York-City-Locations-Providing-Seasonal-Flu-Vac/w9ei-idxz

New York State Department of Health. (2020). Health Facility Map [Data file]. https://health.data.ny.gov/Health/Health-Facility-Map/875v-tpc8

United States Census Bureau. (2018). 2013-2017 ACS 5-year Estimates [Data file]. https://www.census.gov/programs-surveys/acs/technical-documentation/table-and-geography-changes/2017/5-year.html


THANK YOU

XIYU CHEN xc2521@columbia.edu
HANZHANG YANG hy2510@columbia.edu
TING ZHANG tz2436@columbia.edu

