GRD Journals- Global Research and Development Journal for Engineering | Volume 5 | Issue 11 | October 2020 ISSN- 2455-5703
Analysis and Generation of Multiple Linear Regression for Residential Area of Vesu, Surat
Payal Zaveri, Assistant Professor, Department of Civil Engineering, SCET, Gujarat Technological University
Simerdeep Kaur Sood, Bachelor of Engineering, Department of Civil Engineering, SCET, Gujarat Technological University
Abstract Regression analysis is a statistical approach for finding the relationship between dependent and independent variables. The basis of regression analysis is correlation among the variables. Correlation is a statistical method for evaluating the strength of the relationships between variables: the higher the coefficient of correlation, the stronger the relationship. This paper studies trip generation, the factors affecting it, and the degree to which each individual factor affects the number of trips. Multiple linear regression relates one dependent variable to several independent variables. The data collection technique used is the ‘Household survey’, a procedure for collecting and analysing the general situation and specific characteristics of individual households or residential areas. The precision of the result depends on the sample size and the range of the sample; the larger the sample size, the more reliable the correlation.
Keywords- Correlation Analysis, Household Survey, Microsoft Excel, Multiple Regression Analysis, Trip Generation
I. INTRODUCTION To develop a city economically, it is important to develop good transportation infrastructure and to provide good communication facilities. This includes planned development of road connections within the city and railway connections for interstate travel. The first phase of developing a good transportation facility is planning, which includes data collection for the study area and an inventory. The second phase is to understand the pattern of trip generation, that is, the relation between the dependent variable and the various independent variables affecting trip generation. This analysis is done by generating a model that aids understanding and gives accurate results. A trip is a one-way movement of a person by a mechanical mode of transport. It has two ends: an origin, which is the starting point of the trip, and a destination, which is its end point. Trips are divided into two types: ‘Home based’ trips, where one end is the home, and ‘Non home based’ trips, where neither end is the home. [1] Before the start of the first phase, it is important to select a few factors affecting trip generation, on which the data collection survey will be based. These are called the independent variables. These variables are themselves independent, but trip generation depends on them. Factors affecting trip generation may include the number of family members, income of the family, age group, number of vehicles owned by the family, working hours and number of trips per day. [2] The second phase of planning a transportation system or facility starts with modeling. Correlation analysis is the first step in generating the model. It is the process of finding the independent variables most likely to affect the dependent variable, by measuring the strength of the relationship among the variables. [3] The higher the correlation coefficient, the stronger the relationship between the variables.
Generally, the values of the correlation coefficient lie between -1 and 1, with 1 indicating the strongest positive relationship between variables. After finding the variables with suitable correlation coefficients, the final step in the analysis is to perform a regression analysis. Multiple linear regression analysis is a statistical method of fitting a mathematical relationship between the dependent and independent variables. [4] [5] The model yields equations that estimate the number of trips generated from a selected set of influencing factors. Multiple linear regression analysis can be used easily by novices and students as well as by experts. This research paper is an original study conducted through field work, with data collected in residential areas and analysed using computer software. The study is based on data for the prevailing year collected through a household survey. Questionnaires were prepared to obtain the data most relevant to the regression analysis.
II. STUDY AREA Surat is the ninth largest urban agglomeration and eighth largest city in India. Besides being famous for its diamond and textile industries, Surat is the commercial and economic hub of South Gujarat. Surat is known as the ‘Diamond city’, ‘Silk city’ and ‘Green city’ of India. It is the administrative capital of Surat district. Surat is 284 km south of Gandhinagar, the capital of Gujarat, 265 km south of Ahmedabad and 289 km north of Mumbai. The Tapi River is the major river passing through the city and is responsible for its economic growth. The city has a coastline on the Arabian Sea. Surat has had GDP growth of 11.5% over seven fiscal years.
All rights reserved by www.grdjournals.com
Analysis and Generation of Multiple Linear Regression for Residential Area of Vesu, Surat (GRDJE/ Volume 5 / Issue 11 / 002)
The major intercity transportation infrastructure in Surat includes a railway station under the administrative control of the Western Railway Zone of the Indian Railways. Surat BRTS is a bus rapid transit system, an intracity network with 245 buses connecting major locations. Surat Airport is the second busiest airport in Gujarat. There are 83 bridges and four flyovers in operation, and many more are still under construction.
Fig. 1: Location of the study area on Google Maps [6]
Vesu is 5 km from the airport and 10 km from the railway station. It is a suburb in the South West Zone of Surat and is the newest and fastest developing area in terms of public transport infrastructure, residential areas, complexes and business parks. The latitude of Vesu is 21.1447217° and its longitude is 72.7717735°. The study area selected is a residential area that is well connected to commercial areas and is very near the airport and BRTS facilities. It lies in the midst of other residential apartments. Vesu is a relatively newly developed area of Surat that is continuously undergoing infrastructural transformation. This study will help future development to occur in a systematic and organized manner, which will later require less reconstruction and rehabilitation. It will also increase the efficiency of the area in terms of transportation.
III. DATA COLLECTION As discussed, the first step of planning is data collection. Among the many data collection methods, the ‘Household Survey’ is selected here. A household survey is the process of collecting data on the general characteristics of an area, usually a residential one. [2] It is one of the easiest and a moderately efficient type of data collection survey. The process involves filling in forms that record age, occupation, address/zone of residence, number of family members, vehicles owned, type of vehicle owned, mode of public transportation if chosen, working hours, frequently visited places and income of the family. [7] [8] These are the independent variables and will affect trip generation (the dependent variable). A data sheet is prepared after analysing the questionnaires. This sheet is then used in the regression analysis. The data sheet for this research paper is given in the appendix at the end of the paper.
IV. METHODOLOGY
Fig. 2: Flowchart of the Methodology
To start with the regression analysis, the following steps should be followed:
Step 1: Prepare a data table based on the household survey
Step 2: Install the ‘Analysis ToolPak’ in Microsoft Excel
Step 3: Correlation analysis in Excel
Step 4: Selection of ‘Independent Variables’
Step 5: Regression analysis in Excel
Step 6: Equation generation from the regression tables
Step 7: Selection of the best and most effective equations

The equations for the regression analysis take the following forms:
Y = A + B1 X1
Y = A + B1 X1 + B2 X2
Y = A + B1 X1 + B2 X2 + B3 X3
Here,
Y = Dependent variable = Trip generation
A = Intercept from the multiple linear regression table
B1, B2, B3 = Regression coefficients of the independent variables, taken from the table
X1, X2, X3 = Independent variables = Earning people, Working hours, Vehicles owned

The first step is the correlation analysis. The correlation coefficients for the different independent variables are given below:

A. Number of Vehicles and Trips
Table 1: Vehicles v/s Trips
| | Vehicles | Trips |
|---|---|---|
| Vehicles | 1 | |
| Trips | 0.89723 | 1 |

B. Number of Earning People and Trips
Table 2: Earning people v/s Trips
| | Earning people | No. of trips |
|---|---|---|
| Earning people | 1 | |
| No. of trips | 0.9583148 | 1 |

C. Number of Age Groups and Trips
Table 3: Age Group v/s Trips
| | Age group | Trips |
|---|---|---|
| Age group | 1 | |
| Trips | 0.2677264 | 1 |

D. Trip Duration (hours) and Trips
Table 4: Trip duration and Trips
| | Trip duration | Trips |
|---|---|---|
| Trip duration | 1 | |
| Trips | -0.1137283 | 1 |

E. Working Hours and Trips
Table 5: Working Hours and Trips
| | Working hours | Trips |
|---|---|---|
| Working hours | 1 | |
| Trips | 0.8701299 | 1 |

F. Number of Family Members and Trips
Table 6: Family members v/s Trips
| | Family members | Trips |
|---|---|---|
| Family members | 1 | |
| Trips | 0.8660714 | 1 |
In the above correlation analysis, the first variable in each table is the independent variable, which affects the selected dependent variable, trip generation. Based on the correlation coefficients, the three best independent variables selected for the further procedure are as follows:
Table 7: Selected independent variables
| Sr. no. | Independent Variable | Coefficient |
|---|---|---|
| 1 | Number of Earning people | 0.95 |
| 2 | Number of vehicles owned | 0.89 |
| 3 | Number of Working hours | 0.87 |
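The correlation step can also be reproduced outside Excel. The following is a minimal sketch, assuming the survey values are transcribed from Table 25 in the Appendix; the use of NumPy and the names `candidates`, `pearson_r` and `selected` are illustrative choices, not part of the original workflow.

```python
import numpy as np

# Household survey values transcribed from Table 25 (Appendix), one entry per household.
trips = np.array([6, 4, 6, 4, 5, 3, 2, 6, 6, 6], dtype=float)
candidates = {
    "Earning people": [3, 2, 3, 2, 3, 2, 1, 3, 3, 3],
    "Vehicles owned": [3, 2, 3, 2, 2, 1, 1, 2, 3, 3],
    "Working hours": [9, 7, 8, 8, 7, 6, 6, 9, 8, 9],
    "Family members": [6, 4, 6, 4, 5, 2, 1, 4, 4, 6],
    "Number of age groups": [3, 3, 4, 4, 3, 3, 4, 4, 4, 5],
}


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    return float(np.corrcoef(np.asarray(x, dtype=float), y)[0, 1])


# Correlation of every candidate variable with the number of trips.
correlations = {name: pearson_r(values, trips) for name, values in candidates.items()}

# Rank by strength of correlation and keep the three strongest variables.
ranked = sorted(correlations.items(), key=lambda item: abs(item[1]), reverse=True)
selected = [name for name, _ in ranked[:3]]

for name, r in ranked:
    print(f"{name:22s} r = {r:.4f}")
print("Selected:", selected)
```

Ranking by the absolute value of r is an assumption made here so that a strong negative correlation would also be considered; with this data set the three strongest correlations are all positive.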
V. RESULT AND ANALYSIS After selecting the independent variables, multiple linear regression analyses are carried out with one, two or three independent variables. [6]
A. Number of Earning People and Trips
The number of earning people may increase the number of trips, as there will be more trips between home and office and for other necessities. This analysis considers the following:
Independent variable = Number of earning people
Dependent variable = Number of trips
Table 8: Regression analysis
| Statistic | Value |
|---|---|
| Multiple R | 0.958314847 |
| R Square | 0.918367347 |
| Adjusted R Square | 0.908163265 |
| Standard Error | 0.447213595 |
| Observations | 10 |

Table 9: Coefficients for equations
| | Coefficients | Standard Error |
|---|---|---|
| Intercept (A) | 0.295918367 | 0.242011 |
| Earning people (B1) | 0.459183673 | 0.0484022 |
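The summary statistics reported in the regression tables are related to one another by standard formulas. As a worked check, assuming Excel's output follows the usual definitions, with n the number of observations and k the number of independent variables (both symbols introduced here for illustration), the values in Table 8 satisfy:

```latex
\[
\text{Multiple } R = \sqrt{R^2} = \sqrt{0.918367} \approx 0.958315
\]
\[
R^2_{\text{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - k - 1}
                 = 1 - (1 - 0.918367)\,\frac{10 - 1}{10 - 1 - 1}
                 = 1 - 0.081633 \times 1.125 \approx 0.908163
\]
```

The adjusted value penalizes additional independent variables, which is why it falls further below R Square as k grows for a fixed sample of 10 households.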
B. Number of Working Hours and Trips
The number of working hours may range from 6 to 11 hours in a day, depending on the number of people working and their occupation. This analysis considers the following:
Independent variable = Number of working hours
Dependent variable = Number of trips
Table 10: Regression analysis
| Statistic | Value |
|---|---|
| Multiple R | 0.87012987 |
| R Square | 0.757125991 |
| Adjusted R Square | 0.72676674 |
| Standard Error | 0.771389216 |
| Observations | 10 |

Table 11: Coefficients for equations
| | Coefficients | Standard Error |
|---|---|---|
| Intercept (A) | -3.727272727 | 1.7248787 |
| Working hours (B2) | 1.107438017 | 0.2217588 |
C. Number of Vehicles Owned and Trips
The number of vehicles may increase or decrease the number of trips. A person who owns more vehicles is less likely to use public transport. However, such a person may sometimes try to save money on fuel and use other means of transportation. This analysis considers the following:
Independent variable = Number of vehicles
Dependent variable = Number of trips
Table 12: Regression analysis
| Statistic | Value |
|---|---|
| Multiple R | 0.897234169 |
| R Square | 0.805029155 |
| Adjusted R Square | 0.780657799 |
| Standard Error | 0.691142946 |
| Observations | 10 |

Table 13: Coefficients for equations
| | Coefficients | Standard Error |
|---|---|---|
| Intercept (A) | 1.107142857 | 0.6786889 |
| Vehicles (B3) | 1.678571429 | 0.2920612 |
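A single-variable fit of this kind can be sketched with ordinary least squares, which is the method Excel's regression tool is understood to apply. The example below assumes the vehicle and trip columns of Table 25; the function name `fit_simple` is hypothetical.

```python
import numpy as np

# Vehicles owned (X3) and trips (Y) transcribed from Table 25 (Appendix).
vehicles = np.array([3, 2, 3, 2, 2, 1, 1, 2, 3, 3], dtype=float)
trips = np.array([6, 4, 6, 4, 5, 3, 2, 6, 6, 6], dtype=float)


def fit_simple(x, y):
    """Ordinary least squares fit of y = A + B * x; returns (A, B, R^2)."""
    x_dev = x - x.mean()
    y_dev = y - y.mean()
    slope = (x_dev @ y_dev) / (x_dev @ x_dev)
    intercept = y.mean() - slope * x.mean()
    residuals = y - (intercept + slope * x)
    r_squared = 1.0 - (residuals @ residuals) / (y_dev @ y_dev)
    return float(intercept), float(slope), float(r_squared)


intercept, slope, r_squared = fit_simple(vehicles, trips)
# For this data the fit is approximately A = 1.107, B3 = 1.679, R^2 = 0.805.
print(f"A = {intercept:.3f}, B3 = {slope:.3f}, R^2 = {r_squared:.3f}")
```

Note that the result depends on which column is treated as the dependent variable: swapping `vehicles` and `trips` in the call gives a different line with the same R Square.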
D. Number of Earning People, Working Hours and Trips
The number of people earning in the family and the working hours are directly related to each other and will also increase trip generation. This analysis considers the following:
Independent variables = Earning people, working hours
Dependent variable = Number of trips
Table 14: Regression analysis
| Statistic | Value |
|---|---|
| Multiple R | 0.98641472 |
| R Square | 0.973013999 |
| Adjusted R Square | 0.965303713 |
| Standard Error | 0.274883253 |
| Observations | 10 |
Table 15: Coefficients for equations
| | Coefficients | Standard Error |
|---|---|---|
| Intercept (A) | -2.272727273 | 0.6446584 |
| Earning people (B1) | 1.454545455 | 0.1943718 |
| Working hours (B2) | 0.446280992 | 0.1185351 |
E. Number of Earning People, Vehicles Owned and Trips
If the number of earning people is higher, there is a higher chance of the family owning more vehicles; hence the number of individual trips may increase while the number of group trips may decrease. Overall, trips may increase, as there will be more trips between home and office and for other necessities. This analysis considers the following:
Independent variables = Earning people, vehicles owned
Dependent variable = Number of trips
Table 16: Regression analysis
| Statistic | Value |
|---|---|
| Multiple R | 0.983504138 |
| R Square | 0.96728039 |
| Adjusted R Square | 0.957931931 |
| Standard Error | 0.302679545 |
| Observations | 10 |

Table 17: Coefficients for equations
| | Coefficients | Standard Error |
|---|---|---|
| Intercept (A) | -0.184782609 | 0.3693596 |
| Earning people (B1) | 1.391304348 | 0.2361474 |
| Vehicles (B3) | 0.684782609 | 0.2116876 |
F. Number of Working Hours, Vehicles Owned and Trips
Working hours and vehicles owned do not have much of a correlation with each other, but both affect trip generation. This analysis considers the following:
Independent variables = Working hours, vehicles owned
Dependent variable = Number of trips
Table 20: Regression analysis
| Statistic | Value |
|---|---|
| Multiple R | 0.932016134 |
| R Square | 0.868654073 |
| Adjusted R Square | 0.831126666 |
| Standard Error | 0.606439276 |
| Observations | 10 |

Table 21: Coefficients for equations
| | Coefficients | Standard Error |
|---|---|---|
| Intercept (A) | -1.636363636 | 1.6044875 |
| Vehicles (B3) | 1.045454545 | 0.4288173 |
| Working hours (B2) | 0.537190083 | 0.291725 |
G. Number of Earning People, Working Hours, Vehicles Owned and Trips
This analysis includes all three independent variables and the dependent variable. This analysis considers the following:
Independent variables = Earning people, vehicles and working hours
Dependent variable = Number of trips
Table 22: Regression analysis
| Statistic | Value |
|---|---|
| Multiple R | 0.993372087 |
| R Square | 0.986788104 |
| Adjusted R Square | 0.980182156 |
| Standard Error | 0.207747109 |
| Observations | 10 |

Table 23: Coefficients for equations
| | Coefficients | Standard Error |
|---|---|---|
| Intercept (A) | -1.636363636 | 0.5496472 |
| Working hours (B2) | 0.311294766 | 0.1045865 |
| Earning people (B1) | 1.242424242 | 0.1696248 |
| Vehicles (B3) | 0.424242424 | 0.1696248 |
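The three-variable fit can be sketched as a least-squares problem on a design matrix with a leading column of ones for the intercept. This is an illustrative reconstruction using NumPy's `numpy.linalg.lstsq`, assuming the Table 25 values; the helper name `fit_mlr` is hypothetical and is not how the original analysis was run.

```python
import numpy as np

# Survey columns transcribed from Table 25 (Appendix).
earning = [3, 2, 3, 2, 3, 2, 1, 3, 3, 3]   # X1
hours = [9, 7, 8, 8, 7, 6, 6, 9, 8, 9]     # X2
vehicles = [3, 2, 3, 2, 2, 1, 1, 2, 3, 3]  # X3
trips = np.array([6, 4, 6, 4, 5, 3, 2, 6, 6, 6], dtype=float)  # Y


def fit_mlr(predictors, y):
    """Least-squares fit of y = A + B1*X1 + ... + Bk*Xk.

    Returns the coefficient vector [A, B1, ..., Bk], R^2 and adjusted R^2.
    """
    columns = [np.ones(len(y))] + [np.asarray(p, dtype=float) for p in predictors]
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    ss_res = float(np.sum((y - fitted) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    n, k = design.shape[0], design.shape[1] - 1
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return coef, r2, adj_r2


coef, r2, adj_r2 = fit_mlr([earning, hours, vehicles], trips)
a, b1, b2, b3 = coef
print(f"Y = {a:.3f} + {b1:.3f} X1 + {b2:.3f} X2 + {b3:.3f} X3")
print(f"R^2 = {r2:.4f}, adjusted R^2 = {adj_r2:.4f}")
```

Passing a shorter list, for example `fit_mlr([earning, hours], trips)`, fits the corresponding two-variable model with the same helper.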
VI. RESULT AND DISCUSSION After the regression analysis in ‘Microsoft Excel’, an equation is formed for each combination of independent variables with the dependent variable. [9] [10] In the following equations,
Y = Number of Trips
X1 = Number of Earning people
X2 = Number of Working Hours
X3 = Number of Vehicles owned

Table 24: Generated equations
| Sr. No. | Conditions | Equation | r2 |
|---|---|---|---|
| 1 | Number of Earning people v/s Trips | Y = 0.296 + 0.459X1 | 0.918 |
| 2 | Number of Working hours v/s Trips | Y = -3.727 + 1.11X2 | 0.757 |
| 3 | Number of Vehicles owned v/s Trips | Y = 1.10 + 1.678X3 | 0.805 |
| 4 | Number of Earning people and Vehicles owned v/s Trips | Y = -0.1847 + 1.39X1 + 0.685X3 | 0.967 |
| 5 | Number of Earning people and Working hours v/s Trips | Y = -2.272 + 1.45X1 + 0.446X2 | 0.973 |
| 6 | Number of Vehicles and Working hours v/s Trips | Y = -1.636 + 1.04X3 + 0.537X2 | 0.869 |
| 7 | Number of Working hours, Earning people and Vehicles owned v/s Trips | Y = -1.636 + 1.24X1 + 0.311X2 + 0.424X3 | 0.987 |
There are two criteria to be taken into consideration for the final selection of the model best suited to estimating future trip generation:
– The coefficient of determination r2 should be greater than 0 and as close to 1 as possible.
– It is preferable not to have negative values of constants in the equation.
Based on the first criterion, the model selected is (7) Number of Working hours, Earning people and Vehicles owned v/s Trips:
Y = -1.636 + 1.24X1 + 0.311X2 + 0.424X3, r2 = 0.987
However, here the constant is negative. Hence the best suited model is (1) Earning people v/s Trips:
Y = 0.296 + 0.459X1, r2 = 0.918
The number of earning people is therefore taken as the most influential independent variable affecting trip generation.
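The two selection criteria can be expressed as a simple filter-then-rank rule. The sketch below assumes the intercepts and r2 values as tabulated in Table 24; the `Model` type and variable names are illustrative, and treating "no negative constants" as "intercept greater than or equal to zero" is an interpretation made here.

```python
from typing import NamedTuple


class Model(NamedTuple):
    number: int        # Sr. No. in Table 24
    label: str         # independent variables used
    intercept: float   # constant A of the equation
    r_squared: float   # r2 as tabulated


# Intercepts and r2 values as reported in Table 24.
models = [
    Model(1, "Earning people", 0.296, 0.918),
    Model(2, "Working hours", -3.727, 0.757),
    Model(3, "Vehicles owned", 1.10, 0.805),
    Model(4, "Earning people + Vehicles owned", -0.1847, 0.967),
    Model(5, "Earning people + Working hours", -2.272, 0.973),
    Model(6, "Vehicles owned + Working hours", -1.636, 0.869),
    Model(7, "Earning people + Working hours + Vehicles owned", -1.636, 0.987),
]

# Criterion 1 alone: highest r2.
best_fit = max(models, key=lambda m: m.r_squared)

# Criterion 2 first (no negative constant), then highest r2 among the remainder.
admissible = [m for m in models if m.intercept >= 0]
selected = max(admissible, key=lambda m: m.r_squared)

print("Highest r2:", best_fit.number, best_fit.label)
print("Selected  :", selected.number, selected.label)
```

Applying the constant criterion as a hard filter is one possible reading; a softer rule that merely prefers non-negative constants could rank the candidates differently.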
VII. LIMITATIONS One of the most important factors for the successful analysis of the model is the legitimacy of the data collected through the household survey. Because this data is provided by respondents, a certain percentage of human error must be allowed for. The factors affecting trips, and the number of factors considered, may be large and may change with individual preference, which will give different results for different studies. [11] [12] The sample size is also a limitation, as it is physically difficult to cover a large area. However, census data can be used in such a study if it is up to date.
VIII. CONCLUSION The conclusions of the survey work can be summarized in a few points.
– Sample size affects the precision and accuracy of the coefficients and the constants.
– The range of data should not be too wide, as this may result in faulty correlation analysis.
– The best suited model should satisfy the two criteria; if the equation is not satisfactory, another analysis and planning method can be selected.
The best model selected here has a correlation coefficient of 0.958 and has no negative constants. Also, the number of earning people largely affects home-based trips, as people travel between home and office individually and the number of trips increases.
APPENDIX The data collected from the questionnaire is compiled into a table called a data sheet, which is then used for correlation analysis and multiple linear regression analysis. Grouped data can be converted into numerical data by dividing it into smaller groups and counting the number of groups that apply to the individual entry. For instance, ages here are divided into the groups 0-16, 17-30, 31-45, 45-60, and 61 and above, depending on work efficiency and contribution.

Table 25: Data Collection from Household Survey
| Sr. No. | Family Members (Number) | Income Range (Lakhs) | Earning People (Number) | Vehicle Ownership (Number) | Trips (Number) | Trip Duration (Hours) | Age Group | Working Hours (Hours) | No. of Age Group |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 6 | 2.5 - 10 | 3 | 3 | 6 | 8am - 6pm | 18-50 | 9 | 3 |
| 2 | 4 | 2.5 - 10 | 2 | 2 | 4 | 9am - 6pm | 14-42 | 7 | 3 |
| 3 | 6 | > 10 | 3 | 3 | 6 | 7am - 5pm | 17-85 | 8 | 4 |
| 4 | 4 | 1 - 2.5 | 2 | 2 | 4 | 8am - 11pm | 13-50 | 8 | 4 |
| 5 | 5 | 2.5 - 5 | 3 | 2 | 5 | 8am - 11pm | 19-59 | 7 | 3 |
| 6 | 2 | > 2.5 | 2 | 1 | 3 | 7am - 5pm | 19-59 | 6 | 3 |
| 7 | 1 | 1 - 2.5 | 1 | 1 | 2 | 8am - 6pm | 13-50 | 6 | 4 |
| 8 | 4 | 2.5 - 5 | 3 | 2 | 6 | 7am - 5pm | 15-65 | 9 | 4 |
| 9 | 4 | 2.5 - 10 | 3 | 3 | 6 | 8am - 6pm | 15-54 | 8 | 4 |
| 10 | 6 | 2.5 - 5 | 3 | 3 | 6 | 9am - 6pm | 13-70 | 9 | 5 |
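Some survey entries, such as the trip duration windows, must be converted to numbers before they can enter the correlation analysis. The sketch below assumes that the duration in hours is simply the length of the stated time window; this is an interpretation, although it does reproduce the coefficient reported in Table 4. The helper names `to_24h` and `duration_hours` are hypothetical.

```python
import re

import numpy as np

# Trip duration windows and trips transcribed from Table 25, households 1 to 10.
windows = [
    "8am - 6pm", "9am - 6pm", "7am - 5pm", "8am - 11pm", "8am - 11pm",
    "7am - 5pm", "8am - 6pm", "7am - 5pm", "8am - 6pm", "9am - 6pm",
]
trips = np.array([6, 4, 6, 4, 5, 3, 2, 6, 6, 6], dtype=float)


def to_24h(token):
    """Convert a clock label such as '8am' or '11 pm' to an hour from 0 to 23."""
    match = re.fullmatch(r"(\d{1,2})\s*(am|pm)", token.strip().lower())
    if match is None:
        raise ValueError(f"Unrecognized time label: {token!r}")
    hour = int(match.group(1)) % 12  # 12am -> 0, 12pm -> 12 after the offset below
    return hour + (12 if match.group(2) == "pm" else 0)


def duration_hours(window):
    """Length in hours of a same-day window written as '<start> - <end>'."""
    start, end = window.split("-")
    return to_24h(end) - to_24h(start)


durations = np.array([duration_hours(w) for w in windows], dtype=float)
r = float(np.corrcoef(durations, trips)[0, 1])

print("Durations (hours):", durations.astype(int).tolist())
print(f"Correlation with trips: {r:.4f}")
```

The weak negative coefficient is consistent with trip duration not being among the selected independent variables.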
ACKNOWLEDGMENT I would like to express my gratitude to my mentor and guide, Prof. Payal Zaveri, for her guidance and encouragement throughout the research. I am extremely grateful for her words of wisdom and the support she provided. I take this opportunity to thank her for her counsel and remarks on earlier versions of the manuscript; any errors are my own and should not tarnish the reputation of the esteemed professor. I would also like to thank my family for their belief, support and assistance in finishing the paper within a limited time frame.
REFERENCES
[1] K. V. K. R. Tom V. Mathew, “Trip generation. Introduction to Transportation Engineering,” NPTEL India, 2007.
[2] L. Kadiyali, Traffic Engineering and Transport Planning, Ninth edition, India: Khanna Publishers, 1999.
[3] S. Senthilnathan, “Usefulness of correlation analysis,” New Zealand, 2019.
[4] T. B. & S. Fidell, Using Multivariate Statistics (Third Edition), New York: Harper Collins College Publishers, 1996.
[5] S. K. Mahak Dawr, “Developing Trip Generation Model Utilizing Multiple Regression Analysis Case Study,” International Journal of Innovative Research in Science, Engineering and Technology, vol. 6, no. 2, 2017.
[6] “Google Maps,” [Online]. Available: https://goo.gl/maps/rhopX18JaPahGk236.
[7] G. P. H. Wootton, “A Model for Trips Generated by Household,” Journal of Transport Economics and Policy, pp. 137-153, 1967.
[8] M. J. K. R. R. a. G. T. Ravi Gadepalli, “Multiple Classification Analysis for Trip Production Models Using Household Data: Case Study of Patna, India,” 2014.
[9] B. N. T. a. C. D. Ramesh B. Ranpise, “Assessment and MLR Modeling of Traffic Noise at Major Urban Roads of Residential and Commercial Areas of Surat City,” Springer Nature Singapore, 2020.
[10] F. A. W. A. C. Cameron, “An R-squared measure of goodness of fit for some common nonlinear regression models,” pp. 16-32, 1995.
[11] A. B. Asokan Mulayath Variyath, “Variable selection in multivariate multiple regression,” 2020.
[12] B. A. Sathiyaraj Rajendran, “Short term traffic prediction model for urban transportation using structure pattern and regression: an Indian context,” 2020.