ISSN 2394-3777 (Print) ISSN 2394-3785 (Online) Available online at www.ijartet.com
International Journal of Advanced Research Trends in Engineering and Technology (IJARTET) Vol. 4, Issue 12, December 2017
Large-scale Sentiment Analysis Using Hadoop Dasari Prasad1, G.N.V.G. Sirisha2, G. Mahesh3, G.V. Padma Raju4 P.G Student, CSE Department, SRKR Engineering College, Bhimavaram, India 1 Assistant Professor, CSE Department, SRKR Engineering College, Bhimavaram, India 2 Associate Professor, CSE Department, SRKR Engineering College, Bhimavaram, India 3 Professor, CSE Department, SRKR Engineering College, Bhimavaram, India 4 Email: {prasad.dasari11261, sirishagadiraju2, gadirajumahesh3, gvpadmaraju4}@gmail.com
Abstract: Sentiment analysis involves the usage of text analytics to identify and categorize the polarity of opinions expressed in a piece of text. Sentiment analysis analyzes the intension of a customer from a given feedback text. Supervised machine learning techniques are one of the popular methods for sentiment analysis. The accuracy of the algorithms increases with the increase in size of training data. Large volumes of user reviews are available online, to leverage them Hadoop-based Sentiment Analysis system is proposed in this paper. The proposed system applies Naïve Bayesian Classifier for detecting the polarity of users’ opinions. The system achieved 94% accuracy and 86% accuracy when tested on two datasets namely product review dataset and movie review dataset. These accuracies even without applying pre-processing steps like Parts of Speech tagging. Keywords: sentiment analysis, Hadoop, supervised machine learning, opinions, text analytics, naïve Bayesian Analysing the sentiment polarity of a piece of text has I. INTRODUCTION become the need of the day. From data granularity point of Sentiment analysis helps in identifying the writer’s View, there are three levels of sentiment analysis; document attitude towards an individual, organization, event, product level, sentence level and aspect (feature) level. There is not or topic is positive, negative or neutral. World Wide Web much difference between document level and sentence level has become a part of everyone’s life. More and more people sentiment analysis as sentences are nothing but short are using www for a number of tasks like information documents as in [1]. Sometimes, the writer expresses retrieval, e-commerce, social networking etc. it has enabled different views about different aspects (features) of same companies, organizations, political parties to easily reach product. their customers and citizens through online advertising, Human beings are social beings and it is natural for us to online campaigning. People are also using blogs, social ask the opinion of others before we make any choice. Earlier networking sites to share their opinions with friends, family generally people used to take the opinions of friends, family and society at large. but with the availability large volumes of opinion data online Compared to traditional media, broadcasting and in the form of reviews, ratings etc. we now rely on them. So, narrowcasting are much easier and cost effective with World it is also necessary to automatically identify the polarity of Wide Web. This is only one side of the coin; the other side large review datasets and consolidate them so that it of coin is that the fake news, negative opinions are also becomes easy for the users to use them. spreading at a faster rate through www. People are sharing A number of tools and techniques were developed in the their dislikes, dissatisfaction, and anger about a product, area of sentiment analysis over the past decade. Some of the event, individual or party through blog posts, reviews and examples for tools are face book insights, red opal etc. as in social media. If actions like immediate dialogue with [3]. Sentiment analysis methods are broadly classified into dissatisfied customers, machine learning methods, lexicon based methods and Compensation to product errors are taken the spread of hybrid methods as in [2]. Machine learning methods in turn negative sentiment can be reduced. Organizations and classified as supervised and unsupervised method. Lexicon governments are resorting to find the success of based approaches are classified as dictionary based approach product/scheme by analysing customers or citizen’s response and corpus based approaches. Hybrid approaches uses both in social media networks, reviews, and tweets etc. so, machine learning and lexicon based method. Among these
6 All Rights Reserved © 2017 IJARTET