e-ISSN: 2582-5208 International Research Journal of Modernization in Engineering Technology and Science Volume:02/Issue:12/December -2020
Impact Factor- 5.354
www.irjmets.com
PREDICTING STOCK MARKET USING MACHINE LEARNING ALGORITHMS
S. Vijayarani*1, E. Suganya*2, T. Jeevitha*3
*1Assistant Professor, Department of Computer Science, Bharathiar University, Coimbatore, India.
*2Ph.D Research Scholar, Department of Computer Science, Bharathiar University, Coimbatore, India.
*3M.Sc Student, Department of Computer Science, Bharathiar University, Coimbatore, India.
ABSTRACT
The main goal of this paper is to find the best model for forecasting stock market values. While reviewing the techniques and factors that must be taken into account, we found that methods such as Random Forest and Support Vector Machine have not been fully explored. This paper presents and evaluates a more feasible technique for predicting stock movement with higher accuracy. The most important input is the dataset of stock market prices from the previous year. The dataset was pre-processed and adjusted for analysis, so the paper also covers data pre-processing of the raw dataset. After pre-processing, we review the use of Random Forest and Support Vector Machine on the dataset and the results they produce. The proposed work also looks at the use of the prediction system in real-world settings and at issues with the accuracy of the predicted values. The paper additionally presents a machine learning model to predict the life span of a stock in a competitive market. Successful stock prediction would be a valuable asset for stock market institutions and would offer practical solutions to the problems that stockholders face.
Keywords: Random Forest Algorithm, Support Vector Machine, Stock Market Prediction.
I. INTRODUCTION
The stock market is essentially an aggregation of buyers and sellers of stock. A stock, in general, represents an ownership claim on a business by a particular individual or group of people. The attempt to determine the future value of the stock market is known as stock market prediction [3]. The prediction is expected to be robust, accurate and efficient. The system must work according to real-life scenarios and be well suited to real-world settings. It is also expected to take into account all the variables that might affect the stock's value and performance. There are various ways of implementing a prediction system, such as Fundamental Analysis, Technical Analysis, Machine Learning, Market Mimicry, and Time series aspect structuring. With the advancement of the digital era, prediction has moved into the technological realm. The most prominent and promising technique [4] involves the use of Artificial Neural Networks and Recurrent Neural Networks, which are essentially implementations of machine learning. Machine learning is a branch of artificial intelligence that enables a system to learn and improve from past experience without being explicitly programmed each time. Traditional prediction methods in machine learning use algorithms such as backpropagation, also known as backward propagation of errors. Lately, many researchers have been using more ensemble learning techniques. One such technique uses low prices and time delays [6] to predict future highs, while another uses lagged highs to predict future highs. These predictions were used to form stock prices [1]. The dataset of a stock market prediction model contains details such as the closing price, the opening price, the date, and other variables needed to predict the target variable, which is the price on a given day.
The techniques used to predict the stock market include time series forecasting together with technical analysis, machine learning modeling, and prediction of the variable stock market [2]. The main objective is to design a model that learns from market data using machine learning techniques and gauges future trends in stock value development. Stock market prediction tends to underperform when it is treated as a regression problem but performs well when it is treated as a classification problem. The SVM method plots every data item as a point in n-dimensional space (where n is the number of features available in the dataset), with the value of each feature being the value of a particular coordinate; classification is then performed by finding the hyperplane that clearly separates the two classes. Predictive methods such as the random forest technique are used for the same purpose. The random forest algorithm follows an ensemble learning approach for classification and regression [2]. It averages the results of trees built on different subsamples of the dataset, which increases predictive accuracy and reduces over-fitting.
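The averaging behaviour of a random forest can be illustrated with a minimal sketch. It assumes scikit-learn's RandomForestRegressor and a synthetic dataset; both are illustrative choices rather than the paper's data, and the names X, y and forest are hypothetical. For regression, the forest's prediction is the mean of the predictions of its individual trees, each of which is fitted on a bootstrap subsample.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data standing in for stock features (assumption, not the paper's dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=200)

# Each tree in the forest is fitted on a bootstrap subsample of the rows.
forest = RandomForestRegressor(n_estimators=25, random_state=0).fit(X, y)

# Average the per-tree predictions by hand and compare with the forest output.
per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)
forest_prediction = forest.predict(X)
print(np.allclose(manual_average, forest_prediction))
```

Averaging over many trees is what smooths out the variance of any single, possibly over-fitted, tree.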
II. PROBLEM DEFINITION
Stock market prediction is essentially defined as trying to determine the stock value and offering a robust idea that helps people understand and predict the market and stock prices. It is generally presented using quarterly financial ratios drawn from the dataset. Relying on a single dataset may therefore not be sufficient for prediction and can give an inaccurate result. For this reason, we consider machine learning with the integration of multiple datasets to predict the market and stock trends. Estimating stock prices will remain a problem unless a better stock market prediction algorithm is proposed. Predicting how the stock market will perform is very difficult [5]. Movement in the stock market is usually determined by the sentiments of thousands of investors. Stock market prediction requires the ability to predict the effect of recent events on investors. These events can be political, such as a statement by a political leader or news of a scam. They can also be international events, such as sharp movements in exchanges, in goods and services, and in commodities. All of these events affect corporate earnings, which in turn affect investor sentiment [7]. It is beyond the scope of almost all investors to predict these factors correctly and consistently. Together, these factors make stock price prediction very difficult. Once the right data is collected, it can be used to train a machine and generate a predictive result.
III. METHODOLOGY
This paper proposes the framework "Stock market price prediction", which predicts the stock market value using the random forest algorithm. In the proposed system, the machine is trained on various data points from the past in order to make a future prediction [8]. Data from the previous year's stocks were collected to train the model. Two machine learning libraries were mainly used to solve the problem. The first was NumPy, which was used to clean and manipulate the data and to get it into a form ready for analysis. The other was scikit-learn, which was used for the actual analysis and prediction. The dataset came from the previous year's stock market data collected from a public database available online; 80% of the data was used to train the machine and the remaining 20% to test it [12]. The basic approach of the supervised learning model is to learn the patterns and relationships in the data from the training set and then reproduce them for the test data. The Python pandas library was used for data handling, combining various datasets into a single data frame. The resulting data frame allowed us to prepare the data for feature extraction. The data frame features were the date and the closing price for a particular day. All of these features were used to train the random forest model and predict the target variable, which is the price on a given day. The proposed framework touches several areas of research, including data pre-processing and random forest.
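The data handling described above can be sketched as follows. This is a minimal illustration under stated assumptions: the two partial data frames, the column names Date and Close, and the derived DaysSinceStart feature are hypothetical stand-ins for the collected files, and the random (shuffled) split is also an assumption, since the paper does not say whether the 80/20 split was random or chronological.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Two hypothetical partial datasets standing in for the collected files.
dates = pd.date_range("2019-01-01", periods=100, freq="D")
rng = np.random.default_rng(1)
close = 100 + rng.normal(size=100).cumsum()
part_a = pd.DataFrame({"Date": dates[:60], "Close": close[:60]})
part_b = pd.DataFrame({"Date": dates[60:], "Close": close[60:]})

# Combine the datasets into one data frame and drop incomplete rows.
frame = pd.concat([part_a, part_b], ignore_index=True).dropna()

# Feature: the date, encoded as days since the first record; target: the closing price.
X = pd.DataFrame({"DaysSinceStart": (frame["Date"] - frame["Date"].min()).dt.days})
y = frame["Close"]

# 80 % of the rows train the model, the remaining 20 % test it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(len(X_train), len(X_test))
```

For time-ordered prices, passing shuffle=False to train_test_split keeps the test rows strictly after the training rows, which may be the more realistic setting.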
Fig.-1: System Architecture

1. Data Collection
Data collection is a fundamental module and the first step of the project. It deals with gathering the right dataset. The dataset used for market prediction has to be filtered on various aspects. Data collection also helps to enhance the dataset by adding more external data. Our data mainly consists of the previous year's stock prices. Initially, we analyze the Kaggle dataset and, depending on the accuracy obtained, use the model with the data to analyze the predictions accurately.

2. Data Pre-Processing
Data pre-processing is a part of data mining that involves transforming raw data into a more coherent format. Raw data is usually inconsistent or incomplete and often contains many errors. Pre-processing involves checking for missing values, looking for categorical values, splitting the dataset into training and test sets, and finally applying feature scaling to limit the range of the variables so that they can be compared on a common scale.

3. Training the Machine
The idea behind training the model is to start from some initial values with the dataset and then optimize the parameters of the model. Training includes cross-validation, which gives a well-grounded estimate of the model's performance using the training data. Training the machine amounts to feeding the data to the algorithm so that it can later be evaluated on the test data. The test sets are left untouched during training, so that the model is judged on data it has not already seen.

4. Data Scoring
The process of applying a predictive model to a set of data is referred to as scoring the data. The technique used to process the dataset is the Random Forest Algorithm.
Random Forest is an ensemble method that is typically used for both classification and regression. Based on the learning models, we obtain the results of interest. The final module thus describes how the output of the model can help to predict the probability of a stock rising or falling based on certain parameters [10]. It also shows the vulnerabilities of a particular stock or entity. User authentication is implemented to ensure that only authorized entities access the results.

5. Classification
Classification is an instance of supervised learning in which a set is analyzed and categorized based on a common attribute. From the values or data given, classification draws a conclusion about the observed value. If more than one input is given, classification tries to predict one or more outcomes for them. The classifiers used here for stock market prediction are the random forest classifier and the SVM classifier.

Random Forest Classifier
The random forest classifier is a type of ensemble classifier and also a supervised algorithm. It creates a set of decision trees, each of which yields a result. The basic approach of the random forest classifier is to aggregate the decisions of trees built on random subsets and to yield a final class based on the votes of those trees. The Random Forest Algorithm is used here for stock market prediction. It is regarded as one of the easiest to use and most flexible machine learning algorithms, and it gives good prediction accuracy. It is usually used in classification tasks. Because of the high volatility of the stock market, the prediction task is quite challenging. For stock market prediction we use a random forest classifier, which has the same hyperparameters as a decision tree [15]. The decision tool has a tree-like model.
It makes decisions based on possible outcomes, which include variables such as event outcome, resource cost, and utility. The random forest algorithm randomly selects different observations and features to build several decision trees and then aggregates the results of those trees. The data is split into partitions based on questions about a label or an attribute. The dataset used was the previous year's stock market data collected from a public database available online; 80% of the data was used to train the machine and the remaining 20% to test it. The basic approach of the supervised learning model is to learn the patterns and relationships in the data from the training set and then reproduce them for the test data.

Parameters
The parameters of the random forest classifier include n_estimators, the total number of decision trees, and other hyperparameters such as oob_score, which estimates the generalization accuracy of the random forest, and max_features, the number of features considered for the best split. min_weight_fraction_leaf is the minimum weighted fraction of the sum of weights of all input samples required to be at a leaf node [7]. Samples have equal weight when sample_weight is not provided.

SVM Classifier
The SVM classifier is a type of discriminative classifier. The SVM uses supervised learning, i.e., labeled training data. The output is a hyperplane that classifies new data. SVMs are supervised learning models with associated learning algorithms for both classification and regression. The main task of the support vector machine algorithm is to find a hyperplane in an N-dimensional space that distinctly classifies the data points. Here, N is the number of features. Between two classes of data points, there are many possible hyperplanes that could be chosen.
The goal of the algorithm is to find the plane with the maximum margin. The margin is the distance between the hyperplane and the nearest data points of the two classes. The benefit of maximizing the margin is that it provides some reinforcement, so that future data points can be classified more easily. Decision boundaries that help classify the data points are called hyperplanes. Data points are assigned to different classes according to their position relative to the hyperplane. The dimension of the hyperplane depends on the number of attributes: if the number of attributes is two, the hyperplane is just a line; if the number of attributes is three, the hyperplane is a two-dimensional plane.

Parameters
The tuning parameters of the SVM classifier are the kernel parameter, the gamma parameter and the regularization parameter.
• Kernels can be categorized as linear or polynomial, according to how the prediction line is calculated. With a linear kernel, the prediction for an input is calculated from the dot product between the input and the support vectors.
• The C parameter is the regularization parameter; it determines whether the accuracy of the model increases or decreases. The default value in scikit-learn is C=1.0. A lower C value tolerates more misclassification of training points.
• The gamma parameter measures the influence of a single training example on the model. A low value means that points far from the plausible margin are also taken into account, while a high value means that only points close to the plausible margin are considered.
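The pre-processing, training and tuning parameters described in this section can be sketched together as follows. This is a minimal illustration, not the configuration used in the paper: the synthetic two-class data, the RBF kernel, the parameter values and the five-fold setting are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic two-class data (assumption; stands in for the stock features).
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1, -1)

# Random forest: number of trees, out-of-bag score, features per split,
# and the minimum weighted fraction of samples required at a leaf.
forest = RandomForestClassifier(
    n_estimators=100,
    oob_score=True,
    max_features="sqrt",
    min_weight_fraction_leaf=0.0,
    random_state=0,
)

# SVM: feature scaling followed by an SVC with kernel, C and gamma set explicitly.
svm = make_pipeline(
    StandardScaler(),
    SVC(kernel="rbf", C=1.0, gamma="scale"),
)

# Five-fold cross-validation gives a performance estimate for each model.
forest_scores = cross_val_score(forest, X, y, cv=5)
svm_scores = cross_val_score(svm, X, y, cv=5)
print("random forest:", forest_scores.mean())
print("svm:", svm_scores.mean())

# The out-of-bag score becomes available after fitting the forest itself.
forest.fit(X, y)
print("oob score:", forest.oob_score_)
```

Putting the scaler inside the pipeline means it is refitted on each training fold, so no information from the held-out fold leaks into the scaling step.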
IV. EXPERIMENTAL RESULTS
The xlsx file contains the raw data on which our findings are based. There are eight attributes and 1236 instances that describe the rise and fall in stock prices. The attributes are date, open, high, low, last, close, total trade quantity, and turnover. The Open and Close columns represent the starting and final price at which the stock is traded on a particular day. High, Low and Last represent the maximum, minimum, and last price of the share for the day. Total Trade Quantity is the number of shares bought or sold in the day and Turnover (Lacs) is the turnover of the particular company on a given date.
Fig.-3.1: Stock dataset for Yahoo Finance
Fig.-3.2: Time series plot of GP
The plot shows the attribute "CLOSE PRICE" against "DATE". The figure that follows is the candlestick plot, generated using the attributes 'DATE', 'OPEN PRICE', 'HIGH', 'LOW' and 'CLOSE PRICE'.
Fig.-3.1: Candlestick plot
Historical prices
Historical prices are obtained from Yahoo Finance. Each trading date comprises the open price, close price, low price, high price, adjusted close price and volume traded on that day. The close price and the adjusted close price both describe the closing price of the stock on a given day [13]; the adjusted close price is additionally adjusted for dividends and splits. As in other studies, the adjusted close price is taken as the stock price.
Fig.-3.2: Histogram of CLOSEP-OPENP
Fig.-3.3: Histogram of HIGH-LOW
The two figures above are histograms of the difference between CLOSE PRICE and OPEN PRICE and of the difference between HIGH and LOW. They were plotted because we assume that the current closing and opening prices, along with the high and low prices of the stock during the past year, will influence the price of the stock in the future. Based on this reasoning, we devised the following logic: if today's CLOSE PRICE is greater than yesterday's CLOSE PRICE, assign the value 1 to DEX; otherwise assign the value -1 to DEX. Using the sklearn library, we import the SVC classifier and fit it with the training data. After training the model and running the test data through the trained model, the results are shown in the following table.

Table-4.1: Confusion Matrix

Class              Precision   Recall   F1-score
-1.0               0.79        0.93     0.86
1.0                0.87        0.68     0.75
Micro average      0.81        0.81     0.81
Macro average      0.83        0.77     0.76
Weighted average   0.85        0.79     0.78
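The labeling rule and the SVC experiment can be sketched as follows. This is a minimal illustration under stated assumptions: the prices are synthetic, the column names and the two derived features (CLOSE minus OPEN, HIGH minus LOW) are hypothetical choices based on the histograms, and the SVC is used with its default parameters, so the printed numbers will not match Table-4.1.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic daily prices standing in for the stock dataset (assumption).
rng = np.random.default_rng(7)
n_days = 400
close = 100 + rng.normal(size=n_days).cumsum()
open_ = close + rng.normal(scale=0.5, size=n_days)
high = np.maximum(open_, close) + rng.uniform(0.0, 1.0, size=n_days)
low = np.minimum(open_, close) - rng.uniform(0.0, 1.0, size=n_days)
prices = pd.DataFrame({"Open": open_, "High": high, "Low": low, "Close": close})

# Features: CLOSE - OPEN and HIGH - LOW, as in the two histograms.
features = pd.DataFrame({
    "CloseMinusOpen": prices["Close"] - prices["Open"],
    "HighMinusLow": prices["High"] - prices["Low"],
})

# DEX: 1 if today's close is greater than yesterday's close, otherwise -1.
dex = np.where(prices["Close"] > prices["Close"].shift(1), 1, -1)

# The first day has no previous close, so it is dropped.
X = features.iloc[1:]
y = dex[1:]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = SVC().fit(X_train, y_train)
report = classification_report(y_test, model.predict(X_test), zero_division=0)
print(report)
```

The report lists precision, recall and F1-score per class together with the averaged rows, which is the same layout as the table above.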
In addition, the same dataset is used to train another model. This model uses the Random Forest Classifier, which belongs to the ensemble family. After fitting the model with the data and running it against the prediction data, we find that it has an accuracy score of about 85%. To summarize, the accuracy of the SVC model on the test set is about 79%, whereas the accuracy of the random forest classifier is about 85%.

Table-4.2: Algorithm Accuracy

Algorithm                      Accuracy %
Support Vector Machine (SVM)   79.7 %
Random Forest Algorithm        84.8 %
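A comparison of this kind can be sketched by fitting both classifiers on the same training split and scoring them on the same test split. The data below are synthetic and both models use default parameters, which are assumptions made for the example, so the printed accuracies are illustrative only and are not the values reported in Table-4.2.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic features and -1/1 labels (assumption; not the paper's data).
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Support Vector Machine (SVM)": SVC(),
    "Random Forest Algorithm": RandomForestClassifier(random_state=0),
}

# Fit each model on the same training split and score it on the same test split.
accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = accuracy_score(y_test, model.predict(X_test))

for name, value in accuracy.items():
    print(f"{name}: {value * 100:.1f} %")
```

Using one shared split keeps the comparison fair, since both models are judged on exactly the same unseen rows.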
Fig.-3.4: Accuracy Measure (bar chart titled "ALGORITHM ACCURACY"; bars: SVM and RANDOM FOREST; x-axis: Machine Learning Algorithms; y-axis: %)
V. CONCLUSION
By measuring the accuracy of the different algorithms, we found that the most suitable algorithm for predicting the market price of a stock from various data points in the historical data is the random forest algorithm. The algorithm can be a great asset for brokers and investors investing money in the stock market, since it is trained on a large collection of historical data and was chosen after being tested on sample data. The project demonstrates a machine learning model that predicts stock prices with higher accuracy than previously implemented machine learning models.

FUTURE ENHANCEMENT
In the future, stock market prediction can be further improved by using a much larger dataset than the one used here. This would help to increase the accuracy of our prediction models. Furthermore, other machine learning models could be studied to check the accuracy they achieve. The future scope of this project includes adding more parameters and factors such as financial ratios, multiple instances, and so on. The more parameters are taken into account, the higher the accuracy is expected to be. The algorithm can also be applied to analyze the content of public comments and thereby identify patterns and relationships between the customer and the corporate employee. The use of traditional algorithms and data mining techniques can also help to predict the corporation's overall performance structure.
VI. REFERENCES
[1] Ashish Sharma, Dinesh Bhuriya, Upendra Singh, "Survey of Stock Market Prediction Using Machine Learning Approach", ICECA 2017.
[2] Loke K. S., "Impact of Financial Ratios and Technical Analysis on Stock Price Prediction Using Random Forests", IEEE, 2017.
[3] Xi Zhang, Siyu Qu, Jieyun Huang, Binxing Fang, Philip Yu, "Stock Market Prediction via Multi-Source Multiple Instance Learning", IEEE 2018.
[4] Vivek Kanade, Bhausaheb Devikar, Sayali Phadatare, Pranali Munde, Shubhangi Sonone, "Stock Market Prediction: Using Historical Data Analysis", IJARCSSE 2017.
[5] Sachin Sampat Patil, Prof. Kailash Patidar, Asst. Prof. Megha Jain, "A Survey on Stock Market Prediction Using SVM", IJCTET 2016.
[6] https://www.cs.princeton.edu/sites/default/files/uploads/Saahil_magde.pdf
[7] Hakob Grigoryan, "A Stock Market Prediction Method Based on Support Vector Machines (SVM) and Independent Component Analysis (ICA)", DSJ 2016.
[8] Raut Sushrut Deepak, Shinde Isha Uday, Dr. D. Malathi, "Machine Learning Approach In Stock Market Prediction", IJPAM 2017.
[9] Pei-Yuan Zhou, Keith C. C. Chan, Carol Xiaojuan Ou, "Corporate Communication Network and Stock Price Movements: Insights From Data Mining", IEEE 2018.
[10] K. Hiba Sadia, Aditya Sharma, Adarrsh Paul, Sarmistha Padhi, Saurav Sanyal, "Stock Market Prediction Using Machine Learning Algorithms", 2019.
[11] Sreelekshmy Selvin, Vinayakumar R, Gopalakrishnan E.A, Vijay Krishna Menon, Soman K.P, "Stock Price Prediction Using LSTM, RNN and CNN-Sliding Window Model", 2019.
[12] Manish Agrawal, Asif Ullah Khan, Piyush Kumar Shukla, "Stock Price Prediction using Technical Indicators: A Predictive Model using Optimal Deep Learning", 2019.
[13] Ishita Parmar, Navanshu Agarwal, Sheirsh Saxena, Ridam Arora, Shikhin Gupta, Himanshu Dhiman, Lokesh Chouhan, "Stock Market Prediction Using Machine Learning", 2019.
[14] Aparna Nayak, M. M. Manohara Pai and Radhika M. Pai, "Prediction Models for Indian Stock Market", 2016.
[15] Qasem A. Al-radaideh, Adel Abu Assaf, Eman Alnagi, "Predicting Stock Prices Using Data Mining Techniques".