BROADBAND INTERNET ACCESS
Measuring Disparities between Two Counties in New York State
Juan Sebastián Moreno, Sebastian Salas, Zeineb Sellami, Hanzhang Yang
Average Download Speed (Mbps) in NYS . data source: Ookla (2021)
Table of Contents
The Promise, the Trouble & the Question
Definitions, Operationalizations, and Assumptions
The Benefits of Kriging
Exploring Social Vulnerability
Methodology and Findings
1. Getis-Ord GI*
2. Anselin Local Moran’s I (ALM)
3. Geographically Weighted Regression (GWR)
Impacts
Datasets, Bibliography, and Photos
THE PROMISE, THE TROUBLE & THE QUESTION
Over the past year, our reliance on virtual networks requiring Internet access has become ever more crucial for personal, professional, and educational purposes, to name a few. In the midst of a pandemic, the Internet provided a certain comfort in maintaining social ties through Zoom weddings, family dinners, and happy hours with friends. A stable connection allowed those whose employment shifted to “working from home” to continue holding staff and client meetings, creating an online office space. Internet accessibility became synonymous with accessibility to education for students of all ages. The online world mimicked public space, boardrooms, and classrooms, effectively replacing most of our everyday physical interactions. For more than a year, reliable connections to the Internet have allowed us to keep in touch, while establishing barriers to our contact with our former everyday lives.

We like to think of the Internet as the ultimate democratization of information and communication, and yet we cannot deny its exclusionary powers. Those who have unobstructed Internet access, proper hardware, and the possibility of remote work or education are fortunate enough to stay connected. Many of those less fortunate lost their jobs during the pandemic because they could not work remotely. Across the globe, education has been put on hold in school systems that do not have access to the resources needed for at-home learning.

Our research begins with the acknowledgment that Internet connectivity is a vital service. The Internet should be considered a public good or, more specifically, a public infrastructure delivered in the same way as water or power. As we will see, this is not the case.
Our project investigates the spatial factors behind differences in Internet access in New York State. Disparities in connectivity can be analyzed to understand how spatial segregation has begotten techno-spatial divides while reinforcing existing social rifts within localities. Our goal is to identify connectivity gaps in two rural counties as a way to understand disparities between regions in general. The methodologies applied throughout this report could be replicated at a larger scale, providing a broader understanding of these disparities at the state level and beyond. We selected Livingston and Ontario counties because, despite being contiguous and sharing a similar population density, they differ in accessibility to broadband Internet service. The number of Internet providers per census block group and the percentage of households with a broadband subscription vary significantly between the two.
What is the spatial relationship between broadband accessibility and social vulnerability in two rural counties in New York State pre-pandemic?
@avirichards | Unsplash.com
DEFINITIONS, OPERATIONALIZATIONS, AND ASSUMPTIONS
Broadband
“Broadband [...] refers to a high-speed connection to the Internet that is always available. [...] Any of the multiple technologies that deliver digital telecommunications to end users—including digital subscriber lines (DSL), cable, fiber, or satellite—can qualify as broadband.” (Tomer et al., 2017) For the purpose of our study, we focus on residential broadband connections.
Accessibility
For the purpose of this study, we define accessibility as a measure of availability and quality.
Availability: Number of Providers
Availability is defined as the number of broadband Internet providers per block group. The U.S. Federal Communications Commission (FCC) collects detailed data on broadband Internet access, including the names of unique providers and download speeds. Broadband providers “are required to file data with the FCC twice a year (Form 477).” For the purpose of this research we used Fixed Broadband Deployment Data from December 2019. The FCC data provides block-level information. Because all other indicators used in our research are aggregated at the block group level, we counted the number of unique providers per block group and mapped the result statewide.
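As an illustration of this aggregation step, the following is a minimal Python sketch and not the workflow used for this report. It relies on the fact that a 15-digit census block code begins with the 12-digit code of its block group; the function name, the provider names, and the sample block codes are hypothetical.

```python
from collections import defaultdict


def providers_per_block_group(records):
    """Count unique providers per census block group.

    `records` is an iterable of (block_geoid, provider_name) pairs, where
    block_geoid is the 15-digit census block code. Its first 12 digits
    identify the block group that contains the block.
    """
    providers = defaultdict(set)
    for block_geoid, provider_name in records:
        block_group = block_geoid[:12]
        # A set keeps each provider once, even if it serves several blocks.
        providers[block_group].add(provider_name)
    return {bg: len(names) for bg, names in providers.items()}


# Hypothetical rows; real Form 477 files carry many more columns.
sample = [
    ("360510301001000", "Provider A"),
    ("360510301001001", "Provider A"),
    ("360510301001001", "Provider B"),
    ("360690501002003", "Provider C"),
]
counts = providers_per_block_group(sample)
print(counts)
```

Counting names in a set rather than counting rows matters here, because the same provider typically appears in many blocks of one block group.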
Quality: Internet Speed
Quality is defined as the average broadband download speed. “Speed is the single most important metric of interest in characterizing the ‘quality’ of broadband service.” (Bauer et al., 2010:1)
To analyze the quality of Internet connection, we first downloaded the Speedtest by Ookla Global Fixed and Mobile Network Performance Maps. The dataset contains information regarding the average download speed, upload speed, and latency of Internet connections, and reflects the speedtest results from January 1, 2020 to April 1, 2020. For the purposes of our research we selected average download speeds as our main measure of Internet quality. The Ookla Speedtest is not without its limitations. The dataset only provides information where Internet subscribers have requested a speedtest. As such, data is unavailable for unpopulated areas or locations in which a speedtest was not requested. This explains the number of “blind spots” on the map.
A Note on Cost
While we recognize that cost is another important measure of accessibility, compiling the data needed to evaluate it was not possible given the time and scope of our semester-long research.
Site Selection
Based on Beynon et al. (2016), we understand that the urban/rural divide carries profound conceptual and methodological meaning for GIS-based measures and analyses, as well as for policy making. While the concept of the rural is conceptually important to our site selection, we did not treat it as an analytical construct and therefore did not apply indices to determine a degree of difference between our two sites and metropolitan areas such as Rochester, Buffalo, or New York City. Instead, we relied on the notable differences in population density to categorize Livingston and Ontario as predominantly rural areas.
[Figure: LOCATING PROVIDERS. Number of Internet providers per block group, with Ontario and Livingston counties labeled. Legend classes: 1 to 3, 4 to 5, 6, 7, 8 to 12.]
[Figure: AVERAGE DOWNLOAD SPEED in Mbps by tiles (zoom level = 16)*. Legend classes: 0 to 20, 21 to 50, 51 to 100, 101 to 200, 201 to 810.]
THE BENEFITS OF KRIGING
STEP 1 . Speedtest tiles within 5 miles of the two counties were selected.
In order to compensate for the lack of data and paint a broader picture of Internet quality in Livingston and Ontario, we performed a Kriging interpolation followed by a spatial join, thus creating a block group level average download speed dataset for the two counties. “Kriging is an advanced geostatistical procedure that generates an estimated surface from a scattered set of points with z-values. The Kriging tool fits a mathematical function to a specified number of points, or all points within a specified radius, to determine the output value for each location.” (ESRI)
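To make the idea behind the quoted definition concrete, here is a minimal sketch of ordinary kriging in plain Python. It is not the ESRI Kriging tool used for this report: the exponential semivariogram and its sill and range values are assumptions chosen for illustration (in practice they would be fitted to the data), and the tile coordinates and speeds are hypothetical.

```python
import math


def variogram(h, sill=1.0, rng=10.0):
    """Exponential semivariogram with zero nugget (an assumed model)."""
    return sill * (1.0 - math.exp(-3.0 * h / rng))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def krige(points, values, target):
    """Ordinary kriging estimate at `target` from sampled points."""
    n = len(points)
    # Kriging system: semivariances between samples, plus a Lagrange
    # row and column that force the weights to sum to one.
    a = [[variogram(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(math.dist(p, target)) for p in points] + [1.0]
    weights = solve(a, b)[:n]
    return sum(w * z for w, z in zip(weights, values)), weights


# Hypothetical tile centroids (x, y in miles) and average speeds (Mbps).
tiles = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (5.0, 5.0)]
speeds = [25.0, 110.0, 60.0, 180.0]
estimate, weights = krige(tiles, speeds, (2.0, 2.0))
print(round(estimate, 1), [round(w, 3) for w in weights])
```

Evaluating `krige` over a regular grid of target locations would produce an estimated surface comparable in spirit to the raster described in Step 2. With a zero nugget, the estimate at a sampled location reproduces the sampled value.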
STEP 2 . Using Kriging, we created a raster dataset of average Internet download speed covering the two counties.
* About the data, from the source: “Hundreds of millions of Speedtests are taken on the Ookla platform each month. In order to create a manageable dataset, the raw data is aggregated into tiles. The size of a data tile is defined as a function of “zoom level” (or “z”). At z=0, the size of a tile is the size of the whole world. At z=1, the tile is split in half vertically and horizontally, creating 4 tiles that cover the globe. This tile-splitting continues as zoom level increases, causing tiles to become exponentially smaller as we zoom into a given region. By this definition, tile sizes are actually some fraction of the width/height of Earth according to Web Mercator projection (EPSG:3857). As such, tile size varies slightly depending on latitude, but tile sizes can be estimated in meters. For the purposes of these layers, a zoom level of 16 (z=16) is used for the tiling. This equates to a tile that is approximately 610.8 meters by 610.8 meters at the equator (18 arcsecond blocks). The geometry of each tile is represented in WGS 84 (EPSG:4326) in the tile field.” (Ookla, 2021)
STEP 3 . A spatial join provides the average Internet download speed for each block group in the two counties.
EXPLORING SOCIAL VULNERABILITY
To gain a better understanding of the demographic make-up of the study areas, we first investigated social vulnerability indicators at the block group level. Drawing upon the CDC’s Social Vulnerability Index, we developed our own index composed of the following census variables:
1. Median Household Income
2. Educational Attainment: Age 25+ without a high school diploma
3. Children: Population age less than 18
4. Seniors: Population age 65 and above
5. Minority: Non-white Population
6. Unemployment
7. Household Broadband Subscription
8. Population Density: people per square mile
We understand that selecting social vulnerability indicators is a subjective task, and that adding or removing certain variables could lead to different results. We view our selection as a crossover between commonly accepted indicators and additional variables, such as levels of household broadband subscription, that are uniquely relevant to this particular study.
[Figure: eight block group maps of the social vulnerability variables in Livingston and Ontario counties:
INCOME: Median Household Income in thousands of dollars
EDUCATION: Percentage of People over the age of 25 without a High School Diploma
MINORITY: Percent of Non-White Households
CHILDREN: Percent of Households with Children under the age of 18
BROADBAND SUBSCRIBERS: Percent of Households with a Residential Broadband Subscription
SENIORS: Percentage of Households with Seniors over the age of 65
UNEMPLOYMENT: Percent of People Unemployed
POPULATION DENSITY: Population per square mile in each block group]
METHODOLOGY & FINDINGS
We employed three methods to understand household accessibility to broadband Internet in Ontario and Livingston. As explained in the previous sections, we defined accessibility in terms of availability (number of consumer broadband providers) and quality (average download speed) per block group. These methods aim to identify spatial clusters of high and low broadband accessibility, as well as the relationship between fixed Internet access and the social vulnerability variables defined earlier.
1. GETIS-ORD GI*
We conducted this test to identify statistically significant clusters of high and low values (hot and cold spots), determined by high and low z-scores, respectively. Large positive z-scores indicate clustering of high values of broadband providers and reported download speeds around specific block groups, while large negative z-scores indicate clustering of low values. We first ran this test for each county individually, and then for the two geographies combined. The results of the latter approach show hotspots of Internet providers and reported speeds in Ontario County, and analogous coldspots in Livingston County.
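For readers who want to see how such z-scores arise, the following minimal Python sketch computes the Getis-Ord Gi* statistic in its standard published form. It is an illustration and not the GIS tool used for this report: the provider counts are hypothetical, and the block groups are assumed to sit in a single row so that each one neighbors only the adjacent ones (binary weights, with each block group included in its own neighborhood).

```python
import math


def getis_ord_gi_star(values, weights):
    """Return the Gi* z-score of every feature.

    `weights[i][j]` is the spatial weight between features i and j;
    for Gi* the diagonal weights[i][i] is included (non-zero).
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for i in range(n):
        w = weights[i]
        sum_w = sum(w)
        sum_w2 = sum(x * x for x in w)
        # Local weighted sum compared with what the global mean predicts.
        numerator = sum(wj * xj for wj, xj in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


# Hypothetical provider counts for six block groups laid out in a row.
providers = [8, 7, 8, 2, 1, 2]
n = len(providers)
weights = [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(n)]
           for i in range(n)]
z = getis_ord_gi_star(providers, weights)
print([round(score, 2) for score in z])
```

In this toy layout the block groups in the middle of the high-value run receive a large positive z-score (a hot spot) and those in the middle of the low-value run a large negative one (a cold spot).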
2. ANSELIN LOCAL MORAN’S I
To contrast the results of the Getis-Ord GI* method, we employed the Anselin Local Moran’s I test to determine whether individual block groups were clustered around high and low values, and to identify potential spatial outliers. The Local Moran’s I method was used to check the previous hotspot findings and to find clusters and outliers among block groups with high or low numbers of broadband providers and reported download speeds.
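The cluster and outlier categories can be illustrated with a minimal Python sketch of the Local Moran's I statistic, using standardized values and row-standardized weights. This is a simplified illustration and not the GIS tool used for this report: the provider counts and the row-of-neighbors layout are hypothetical, and the permutation test that decides statistical significance is omitted.

```python
import math


def local_morans_i(values, neighbors):
    """Return (I_i, label) for every feature.

    `neighbors[i]` lists the indices adjacent to feature i. Labels follow
    the usual convention: HH and LL are clusters, HL is a high value
    among low neighbors, LH is a low value among high neighbors.
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    z = [(v - mean) / sd for v in values]
    results = []
    for i in range(n):
        # Spatial lag: average standardized value of the neighbors.
        lag = sum(z[j] for j in neighbors[i]) / len(neighbors[i])
        label = ("H" if z[i] > 0 else "L") + ("H" if lag > 0 else "L")
        results.append((z[i] * lag, label))
    return results


# Hypothetical provider counts for seven block groups laid out in a row.
providers = [8, 9, 8, 1, 2, 1, 9]
n = len(providers)
neighbors = [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]
for index, (local_i, label) in enumerate(local_morans_i(providers, neighbors)):
    print(index, round(local_i, 2), label)
```

A positive local I marks a block group that resembles its neighbors (a cluster), while a negative value marks one that differs from them (an outlier), such as the last block group in this example.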
Hot and cold spots of broadband accessibility, in terms of availability and quality, vary throughout Livingston and Ontario counties and illustrate an uneven spatial distribution of these two variables. The significance of the clustering of providers is notably high, and points to underlying differences in broadband access that can be examined further in terms of income, race, and educational attainment, among other social vulnerability variables.
[Figure: Getis-Ord GI* maps of NUMBER OF PROVIDERS and AVERAGE DOWNLOAD SPEED. Legend: Cold Spot and Hot Spot by level of significance (99%, 95%, 90%); n.s. = not significant.]
We obtained mixed results. For the number of providers, the Local Moran’s I appeared to confirm the hotspots and coldspots found with the Getis-Ord GI*, indicating a cluster of high numbers of providers in the east of our study area, in Ontario County, and significantly low numbers of providers in western Livingston County. However, the test for average download speed did not yield similar findings. Instead, the Local Moran’s I indicated a number of outliers in both counties.
[Figure: Anselin Local Moran’s I maps of NUMBER OF PROVIDERS and AVERAGE DOWNLOAD SPEED. Legend: High-High Cluster, High-Low Outlier, Low-High Outlier, Low-Low Cluster, Not Significant.]
3. GEOGRAPHICALLY WEIGHTED REGRESSION
Geographically weighted regression (GWR) is a method for exploring the spatial relationships between a series of independent variables (demographic data, in our case) and a dependent variable, in a model that estimates and predicts the influence of each variable on the observed outcome.
We used a GWR to model spatially varying relationships between broadband accessibility (number of providers and average download speed) and the following independent variables: Income, Education level, Minority, Children under the age of 18, Household Broadband Subscription, Seniors, Unemployment, and Population Density (people per square mile).
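The core idea of GWR, fitting a separate distance-weighted regression around each location, can be sketched in a few lines of Python. This is a simplified illustration with one independent variable and not the GIS tool used for this report: the Gaussian kernel, the bandwidth, the coordinates, and the data values are all assumptions made for the example.

```python
import math


def gwr_local_fits(coords, x, y, bandwidth):
    """Fit y = a + b*x around every location with Gaussian distance weights.

    Returns one (intercept, slope, local_r2) tuple per location.
    """
    results = []
    for ci in coords:
        # Nearby observations count more than distant ones.
        w = [math.exp(-0.5 * (math.dist(ci, cj) / bandwidth) ** 2)
             for cj in coords]
        sw = sum(w)
        xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
        syy = sum(wi * (yi - ym) ** 2 for wi, yi in zip(w, y))
        slope = sxy / sxx
        intercept = ym - slope * xm
        sse = sum(wi * (yi - intercept - slope * xi) ** 2
                  for wi, xi, yi in zip(w, x, y))
        results.append((intercept, slope, 1.0 - sse / syy))
    return results


# Hypothetical data: two groups of locations whose relationship between
# the predictor (x) and the outcome (y) runs in opposite directions.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0), (11, 0), (12, 0), (13, 0)]
x = [1, 2, 3, 4, 1, 2, 3, 4]
y = [3, 4, 5, 6, 6, 5, 4, 3]
fits = gwr_local_fits(coords, x, y, bandwidth=1.5)
for location, (a, b, r2) in zip(coords, fits):
    print(location, round(a, 2), round(b, 2), round(r2, 2))
```

A single global regression on these data would report almost no relationship, while the local fits recover a positive slope in one group and a negative slope in the other. Mapping the local R-squared values per location gives the kind of surface shown in the maps of this section.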
GWR . Number of Providers
Among the demographic and density variables we analyzed, population density has the highest spatial correlation with the number of broadband providers per block group. We also found a high correlation between educational attainment and the number of providers, especially in Livingston County. Considering the clustering of providers we diagnosed through the previous methods, we should look more carefully at the relationship between low levels of educational attainment, measured as the proportion of the adult population without a high school diploma, and decreased Internet accessibility.
[Figure: local R2 maps of the GWR with Number of Providers as the dependent variable, one panel per independent variable: Income, Education, Minority, Children, Broadband Subscribers, Seniors, Unemployment, and Population Density.]
GWR . Download Speed
The influence of educational attainment is illustrated with more clarity in the following set of maps. Educational levels have the highest correlation with the reported Internet download speeds. This connection could hint at overlapping forms of exclusion, which can only be exacerbated by the sudden transition we have experienced to online forms of learning and working.
However, the remaining local R-squared maps show that in block groups with low values, the GWR model performs poorly, and other variables need to be studied and included to refine it.
[Figure: local R2 maps of the GWR with Download Speed as the dependent variable, one panel per independent variable: Income, Education, Minority, Children, Broadband Subscribers, Seniors, Unemployment, and Population Density.]
IMPACTS
The Covid-19 pandemic has shown that Internet use and reliability are foundational not only to work, shopping, and play but also to learning. Lack of accessibility to this service in certain areas can impose a considerable burden on the education of kids and teenagers, and erase the gains of years of social policy that aimed to bridge pre-existing disparities. Among our main findings, we discovered that areas with low access to broadband providers are co-located with areas where a larger proportion of adults have no high school diploma, which could widen educational (and socioeconomic) gaps in the new hybrid education model the U.S. and the rest of the world are experiencing.

While these issues tend to be concentrated in lower-income households, they are not necessarily limited to them. As we observed in the results, population density has the highest spatial correlation with the number of broadband providers per block group. Consequently, residents of less dense areas, usually in rural zones of the state, can also have greater difficulty accessing reliable broadband connections. As broadband has become an essential service for economic mobility and social opportunities, low-income families living in less dense counties need urgent access to high-speed, reliable connections.

Further research could be aimed at understanding the affordability variable of Internet accessibility. Households living in poverty need not only fast connections and a range of providers to choose from, but also broadband service at an affordable price. Overlapping forms of social differentiation could further drive apart connectivity rates in the two counties we analyzed.
Our approach can allow policymakers to understand unserved markets and develop specific public policies to foster connectivity in specific geographic areas or for people who lack adequate connectivity. At the same time, our methodology highlighted several concerns about measuring accessibility, which is usually operationalized as a distance to a destination rather than as the possibility of connecting to a public service. We added a different angle to this concept, using the spatial distribution of Internet providers and the rate of connected households as ways to pin down the supply of and demand for broadband services. However, this research can be expanded with additional questions about the dispersion and density of households and population, using the conceptual dichotomy of ‘urban vs rural’ to determine the effects of density on Internet connectivity at a larger (e.g., statewide) scale. On a different scale (the neighborhood level, perhaps), issues of segregation may arise when some communities are compared to others. Understanding the nuances of these problems, both in their methodological development and in their spatial materiality, will be a crucial element of expanding infrastructure and encouraging a more equitable distribution of resources to connect communities.
@discoversavsat | Unsplash.com
DATASETS
Ookla. (2021). Speedtest® by Ookla® Global Fixed and Mobile Network Performance Maps. Based on analysis by Ookla of Speedtest Intelligence® data for January 1, 2020, to April 1, 2020. [shapefile] Provided by Ookla and retrieved May 2021 from: https://github.com/teamookla/ookla-open-data
U.S. Federal Communications Commission. (2021). Fixed Broadband Deployment Data: December 2019, New York State. [dataset] Retrieved April 2021 from: https://opendata.fcc.gov/Wireline/Fixed-Broadband-Deployment-Data-December-2019/whue-6pnt
U.S. Census Bureau. (2019). 2019 TIGER/Line Shapefiles: Block Groups. [shapefile] Retrieved April 2021 from: https://www.census.gov/cgi-bin/geo/shapefiles/index.php?year=2019&layergroup=Block+Groups
U.S. Census Bureau. (2021). 2015-2019 American Community Survey 5-year estimates. [dataset] Retrieved April 2021 from Social Explorer.

BIBLIOGRAPHY
Bauer, S., Clark, D. D., and Lehr, W. (2010) Understanding Broadband Speed Measurements. Massachusetts Institute of Technology (MIT). TPRC 2010. Available at SSRN: https://ssrn.com/abstract=1988332
Beynon, M.J., Crawley, A., and Munday, M. (2016) “Measuring and understanding the differences between urban and rural areas.” Environment and Planning B: Planning and Design, Vol. 43(6): 1136-1154.
Tomer, A., Kneebone, E., and Shivaram, R. (2017) Signs of digital distress: Mapping broadband availability and subscription in American neighborhoods. Brookings Institution. https://www.brookings.edu/research/signs-of-digital-distress-mapping-broadband-availability/
Advanced Spatial Analysis . Spring 2021
Professor Leah Meisterlin
Columbia University, Graduate School of Architecture, Planning and Preservation
MS Urban Planning
Juan Sebastián Moreno, Sebastian Salas, Zeineb Sellami, Hanzhang Yang