CLEAR Journal, March 2017 Edition



CLEAR Journal (Computational Linguistics in Engineering And Research)
M.Tech Computational Linguistics, Dept. of Computer Science and Engineering,
Govt. Engineering College, Sreekrishnapuram, Palakkad-678633
www.simplegroups.in | simplequest.in@gmail.com

Chief Editor
Dr. Ajeesh Ramanujan
Assistant Professor, Dept. of Computer Science and Engineering,
Govt. Engineering College, Sreekrishnapuram, Palakkad-678633

Editors
Ayishathahira C H, Manjusha P D, Rahul M, Sreelakshmi K

Cover page and Layout
Rahul M

Editorial
News & Updates
CLEAR June 2017 Invitation
Last word

Cyber Bullying Detection
Nijila M, Sruthy K Bhaskaran

Sentiment Analysis: Amazon Product Review Data
Silpa K S

Analysing Human Activity from Mobile Phone Call Detail Records
Fathima Shirin A, Shabna Nasser, Fathima Riswana K, Fathima Shabana K

Analyzing Sentiments of Visual Contents
Uma E.S, Jenny George



Dear Readers! Greetings! This edition of CLEAR magazine contains articles on some trending topics: Sentiment Analysis of Amazon Product Review Data, Cyber Bullying Detection, Analysing Human Activity from Mobile Phone Call Detail Records, and Analyzing Sentiments of Visual Contents. In our last edition, we focused mainly on research done in the field of information retrieval, such as Semantically Similar Short Text Retrieval, Entity-centric Summarization, Automatic Summarization of Student Feedback and Detection of Review Spam. Our readers include a community of people who have shown a keen interest in natural language engineering. They have continuously encouraged and criticized all our efforts, and this has served as a catalyst for the entire CLEAR team. On this hopeful prospect, I proudly present this edition of CLEAR to the readers and look forward to your opinions and criticism. Best Regards, Dr. Ajeesh Ramanujan (Chief Editor)



Invento’17: Humanity through Technology

Invento’17, the national-level technical fest, was organized at GEC Sreekrishnapuram on the 10th, 11th and 12th of March 2017. The Honourable Education Minister of Kerala, Prof. C. Raveendranath, gave Invento’17 its momentous kick-off by launching its official logo on 29th October 2016.

Various departments hosted interesting events, workshops and exhibitions. The Department of Computer Science and Engineering organized competition events like Hackathon, Coding, Counter-Strike: Global Offensive, Digital Painting: Da Vinci, Quizy Puzzles, Enigma and Code Relay.



Cyber Bullying Detection

Nijila M1, Sruthy K Bhaskaran2
M.Tech Computational Linguistics
Government Engineering College, Sreekrishnapuram
nijilabhaskar11@gmail.com1, sruthybhaskaran714@gmail.com2

With the growth of social networking sites, online activity and messaging apps, cyber bullying is on the increase. Most apps and social networking sites are restricted to people aged 13 and over, and they state that bullying and abusive behaviours, including harassment, impersonation and identity theft, are banned and not allowed. However, results from our national bullying survey show that 91% of people who reported cyber bullying said that no action was taken. This can leave users feeling disbelieved and vulnerable, and knock their self-esteem.

There is an urgent need to study cyberbullying in terms of its detection, prevention and mitigation. Cyberbullying is defined as an aggressive, intentional act carried out by a group or individual, using electronic forms of contact, repeatedly and over time, against a victim who cannot easily defend himself or herself. Three methods are used to detect cyberbullying:

1) A context-based method which employs a supervised algorithm.

2) A method based on a Semantic-Enhanced Marginalized Denoising Auto-Encoder.

3) A method based on text-stream classification.

Cyberbullying Detection with User Context: Studies on automatic cyberbullying detection are few and are typically limited to individual comments, without taking context into account. This method shows that taking user context, such as a user's comment history and user characteristics, into account can considerably improve the performance of cyberbullying detection tools. The task is treated as a supervised classification task.

There are many video-sharing websites, such as Vimeo, Dailymotion and YouTube. Of these, YouTube is considered the king of video sharing. Through YouTube, users can share their moods, thoughts and ideas by sharing videos and posting comments on them. This makes it a platform that is open to bullying and therefore an appropriate platform for collecting datasets for cyberbullying studies. This approach employs a supervised learning method for detecting whether a particular comment is bullying or non-bullying.

The following three feature sets were used to train the cyberbullying classifier.


Content-Based Features: These features are based on the content of the comments themselves and are frequently used for sentiment analysis. The following features are included: 1) The number of profane words in the comment, based on a dictionary, normalized by the total number of words in the comment. The dictionary consists of 414 profane words, including acronyms and abbreviations; the majority are adjectives and nouns. 2) To detect comments which are personal and target a specific person, the normalized number of first- and second-person pronouns in the comment, based on a list of pronouns. 3) Profanity windows of different sizes (2 to 5 words): Boolean features which indicate whether a second-person pronoun is followed by a profane word within the size of the window. 4) To capture explicit emotions, the number of emoticons, normalized by the number of words. 5) To capture shouting in comments, the ratio of capital letters in the comment.

Cyberbullying Features: The second set of features aims at identifying frequent bullying topics such as minority races, religions and physical characteristics. It consists of: 1) the (normalized) number of cyberbullying words, based on a manually compiled dictionary, and 2) the length of the comment, in order to detect typically short bullying comments.

User-Based Features: To exploit information about the background of the users in the detection process, we looked at the history of users' activities in our dataset and used the averaged content-based features on each user's history to see whether there was a pattern of offensive language use. We checked the frequency of profanity in their previous comments.
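To make the feature set concrete, here is a minimal Python sketch of the content-based features. The word lists and emoticon pattern are small illustrative stand-ins for the dictionaries described above (the real profanity dictionary has 414 entries).

```python
import re

# Illustrative word lists; stand-ins for the paper's dictionaries.
PROFANE = {"idiot", "loser", "stupid"}
SECOND_PERSON = {"you", "your", "yours", "u"}
FIRST_PERSON = {"i", "me", "my", "mine", "we"}
EMOTICON = re.compile(r"[:;=]-?[()DP]")

def content_features(comment: str) -> dict:
    words = comment.split()
    lowered = [w.lower().strip(".,!?") for w in words]
    n = max(len(lowered), 1)

    # Boolean profanity windows: a second-person pronoun followed by a
    # profane word within a window of 2..5 tokens.
    windows = {}
    for size in range(2, 6):
        windows[f"window_{size}"] = any(
            lowered[i] in SECOND_PERSON
            and any(w in PROFANE for w in lowered[i + 1 : i + size])
            for i in range(len(lowered))
        )

    return {
        "profanity": sum(w in PROFANE for w in lowered) / n,
        "pronouns": sum(w in FIRST_PERSON | SECOND_PERSON for w in lowered) / n,
        **windows,
        "emoticons": len(EMOTICON.findall(comment)) / n,
        "capital_ratio": sum(c.isupper() for c in comment) / max(len(comment), 1),
    }

print(content_features("YOU are such an idiot :("))
```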

Based on a Semantic-Enhanced Marginalized Denoising Auto-Encoder:

Machine learning techniques make automatic detection of bullying messages in social media possible, and this could help to construct a healthy and safe social media environment. This method proposes a new representation learning approach to tackle the cyberbullying problem. The method, named Semantic-Enhanced Marginalized Denoising Auto-Encoder (smSDA), is developed via a semantic extension of the popular deep learning model, the stacked denoising auto-encoder. The semantic extension consists of semantic dropout noise and sparsity constraints, where the semantic dropout noise is designed based on domain knowledge and the word embedding technique. The proposed method is able to exploit the hidden feature structure of bullying information and learn a robust and discriminative representation of text.

Advantages:

1. The proposed method is able to exploit the hidden feature structure of bullying information.
2. It learns a robust and discriminative representation of text.
3. It makes automatic detection of bullying messages in social media networks possible.
4. This could help to construct a healthy and safe social media environment.
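For the curious, the building block that smSDA extends, the marginalized denoising auto-encoder (mDA), admits a compact closed-form solution. The numpy sketch below implements one mDA layer under plain uniform dropout noise; the semantic dropout noise and sparsity constraints that make smSDA "semantic-enhanced" are omitted, so this is an illustration of the core idea, not the full method.

```python
import numpy as np

def mda_layer(X, p=0.5):
    """One marginalized denoising auto-encoder layer (closed form).
    X is a d x n matrix (features x documents); p is the dropout
    (corruption) probability, marginalized out analytically."""
    d, n = X.shape
    Xb = np.vstack([X, np.ones((1, n))])             # append a bias row
    S = Xb @ Xb.T                                     # scatter matrix
    q = np.full(d + 1, 1.0 - p)                       # per-feature keep-probability
    q[-1] = 1.0                                       # the bias is never corrupted
    Q = S * np.outer(q, q)                            # E[x~ x~^T], off-diagonal terms
    np.fill_diagonal(Q, q * np.diag(S))               # ...with the corrected diagonal
    P = S[:d, :] * q                                  # E[x x~^T]
    W = P @ np.linalg.inv(Q + 1e-5 * np.eye(d + 1))   # reconstruction mapping
    return np.tanh(W @ Xb)                            # nonlinear hidden representation

# Toy bag-of-words matrix: 6 vocabulary terms x 10 comments
rng = np.random.default_rng(0)
H = mda_layer(rng.random((6, 10)), p=0.5)             # robust text representation
```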

Cyberbullying Detection based on Text-Stream Classification: This approach proposes a novel cyberbullying detection method for a text-streaming scenario, using a one-class classification technique to detect cyberbullying instances both effectively and automatically. The method devises a way to train the system using only a small set of positive training samples, from which the system automatically extracts reliable negative and stronger positive samples from the huge amount of unlabelled data. In addition, these investigations found that cyberbullying detection is a complex phenomenon because of the difference between cyberbullying, cyber-teasing and cyber-jokes. This distinction leads to false positive and false negative cases and hence diminishes the performance of the classifier, as cyberbullying is subject to personal feeling and interpretation. The method can be extended by incorporating a sophisticated user feedback system to filter out cyberbullying-like instances, such as cyber-teasing or cyber-joke cases, from real cyberbullying; this would improve the learning (discriminating) capability of the classifiers. More advanced bullying-specific features, including users' cyberbullying patterns, can be incorporated along with the baseline swear-keywords method in the session-based one-class ensemble learner scheme. It would also be interesting to look at the user groups, or circles, to which users belong. For instance, Facebook enables its users to create groups such as close friends, family and acquaintances and add friends to them, while Google allows people to create circles and add people to them. These groups indicate the relationship between users to some extent, which is helpful for detecting cyberbullying more accurately, since cyberbullying, also known as peer victimization, involves cyberbullies who may be friends, siblings, fellow students or people unknown to the victim.
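A minimal sketch of the one-class idea with scikit-learn's OneClassSVM: train only on known bullying comments and flag similar comments in the unlabelled stream. The bootstrapping of reliable negatives and stronger positives described above is omitted, and the toy comments are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import OneClassSVM

# Known bullying comments (positives) and an unlabelled stream; all invented.
positives = ["you are a pathetic loser", "nobody likes you, get lost loser"]
stream = ["great video, thanks for sharing!", "you are such a loser"]

vec = TfidfVectorizer()
X_pos = vec.fit_transform(positives)

# Train on positives only; +1 = resembles the training (bullying) class,
# -1 = outlier. A real system would use a much larger training set.
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.5).fit(X_pos)
print(list(zip(stream, clf.predict(vec.transform(stream)))))
```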

References

1. Dadvar, M., Trieschnigg, D., Ordelman, R. & de Jong, F. (2013). Improving cyberbullying detection with user context. In European Conference on Information Retrieval, pp. 693-696.



Sentiment Analysis: Amazon Product Review Data

Silpa K S
M.Tech Computational Linguistics
Government Engineering College, Sreekrishnapuram
silpasasidharan75@gmail.com

Sentiment analysis, also known as opinion mining, studies people's sentiments towards certain entities. Sentiment analysis of product review data has great importance: from a business perspective, it helps both the customer and the company that releases the product. Based on the reviews, customers can buy good products and the company can improve the quality of its product. The data used in this article is a set of product reviews collected from Amazon, rated on a star scale where the highest rating is 5 stars and the lowest is 1 star.

The fundamental problem of sentiment analysis is sentiment polarity categorization. The process consists of three phases, of which phase 2 and phase 3 are comparatively more complex than phase 1. In phase 1, the review data obtained from Amazon is analyzed. The next step is the extraction of sentiment sentences. A sentiment sentence is a sentence which contains at least one sentiment word, either positive or negative. Subjective sentences mainly contain sentiments, so these sentences are extracted from the dataset and pre-processed. Part-of-Speech (POS) tags are also assigned to the tokenized words. POS tagging is very useful in sentiment analysis for two reasons: 1) nouns and pronouns do not carry sentiment, and 2) it distinguishes words that can be used in different parts of speech. Sentiment words mainly fall into the categories of adjectives, adverbs and verbs.
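As an illustration, phase 1 can be sketched with NLTK: split a review into sentences, keep those containing a sentiment word, and POS-tag the survivors. The small sentiment lexicon is an assumption standing in for a full word list.

```python
import nltk
# One-time model downloads (assumed standard NLTK resources):
# nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")

# Toy sentiment lexicon; a stand-in for a full positive/negative word list.
SENTIMENT_WORDS = {"good", "great", "bad", "terrible", "love", "hate"}

review = "I love this phone. The box was brown. The battery is terrible."

# A sentiment sentence contains at least one sentiment word.
sentiment_sentences = [
    s for s in nltk.sent_tokenize(review)
    if any(w.lower() in SENTIMENT_WORDS for w in nltk.word_tokenize(s))
]

# POS-tag each sentiment sentence; adjectives (JJ*), adverbs (RB*) and
# verbs (VB*) are the usual carriers of sentiment.
for sent in sentiment_sentences:
    tags = nltk.pos_tag(nltk.word_tokenize(sent))
    print([(w, t) for w, t in tags if t[:2] in ("JJ", "RB", "VB")])
```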



In phase 2, an algorithm is proposed and implemented for negation phrase identification. The reason for a separate algorithm is that, with a negative prefix, the sentiment of a particular word is reversed. Examples of negative prefixes are no, not, nothing, etc. There are two kinds of such phrases: 1) Negation-of-Adjective (NOA) and 2) Negation-of-Verb (NOV).
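A hedged sketch of NOA/NOV extraction: the rule below pairs a negative prefix with an immediately following adjective or verb, which is a simplification of the article's algorithm.

```python
import nltk

NEGATION_PREFIXES = {"no", "not", "never", "nothing"}

def negation_phrases(sentence):
    """Pair each negative prefix with an immediately following adjective
    (NOA) or verb (NOV). A simplification: the real algorithm may look
    further than one token ahead."""
    tags = nltk.pos_tag(nltk.word_tokenize(sentence))
    noa, nov = [], []
    for (w1, _), (w2, t2) in zip(tags, tags[1:]):
        if w1.lower() in NEGATION_PREFIXES:
            if t2.startswith("JJ"):
                noa.append((w1, w2))     # e.g. "not good"
            elif t2.startswith("VB"):
                nov.append((w1, w2))     # e.g. "not work"
    return noa, nov

print(negation_phrases("The screen is not good and it does not work"))
```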

In the next step, a mathematical approach is proposed for sentiment score computation. Given a token t, its sentiment score (SS) is computed as a weighted average of the star ratings:

SS(t) = [ Σ_{i=1..5} i × γ(5,i) × Occurrence_i(t) ] / [ Σ_{i=1..5} γ(5,i) × Occurrence_i(t) ]

where Occurrence_i(t) is t's number of occurrences in i-star reviews, i = 1, ..., 5. Since 5-star reviews make up a majority of the entire dataset, the weight γ(5,i), the ratio of the number of 5-star reviews to the number of i-star reviews, is introduced. Consequently, every sentiment score falls into the interval [1, 5]. For positive word tokens, the median sentiment score should exceed 3; for negative word tokens, the median should be less than 3. Thereby the positive- and negative-polarity words can be identified.

Feature vector formation is an important step in sentiment polarity categorization. Sentiment tokens and sentiment scores are considered as the features. In order to train the classifiers, each entry of the training data needs to be transformed into a vector that contains those features, namely a feature vector. The problem with the feature vector is its dimensionality. The challenge is actually twofold: firstly, a vector should not contain an abundant amount (thousands or hundreds) of features or values; secondly, every vector should have the same number of dimensions in order to fit the classifiers. These problems can be rectified by introducing two binary strings (one for words and one for phrases), which represent each token's appearance: if the ith word (phrase) token appears, the word (phrase) string's ith bit is flipped from "0" to "1". Finally, instead of directly saving the flipped strings into a feature vector, a hash value of each string is computed using Python's built-in hash function and saved. Hence, a sentence-level feature vector has four elements in total: two hash values computed based on the flipped binary strings, an averaged sentiment score, and a ground truth label.
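To make the scoring concrete, here is a small Python sketch of the SS(t) formula above on an invented toy review set.

```python
from collections import Counter

def sentiment_score(token, reviews):
    """SS(t) from the formula above; `reviews` is a list of (star, text)."""
    stars = Counter(star for star, _ in reviews)
    occ = Counter(star for star, text in reviews if token in text.lower().split())
    # gamma(5, i): ratio of the 5-star review count to the i-star review count
    gamma = {i: stars[5] / stars[i] for i in range(1, 6) if stars[i]}
    num = sum(i * gamma[i] * occ[i] for i in gamma)
    den = sum(gamma[i] * occ[i] for i in gamma)
    return num / den if den else None     # always falls inside [1, 5]

reviews = [(5, "a great phone"), (5, "great battery"), (5, "great value"),
           (1, "great disappointment"), (1, "broken on arrival")]
print(sentiment_score("great", reviews))  # ~3.67 > 3, so "great" is positive
```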


Experiments for both sentence-level categorization and review-level categorization were performed. In sentence-level categorization, 200 feature vectors were formed based on the 200 manually labelled sentences; the Random Forest model gave the best performance. The experiments were also run on machine-labelled sentences: 2 million feature vectors (1 million with positive labels and 1 million with negative labels) were generated from 2 million machine-labelled sentences, and again the Random Forest model gave the best performance. In review-level categorization, 3 million feature vectors were formed. It can be clearly observed that the SVM model and the Naïve Bayesian model are nearly identical in terms of their performance, and both models are generally superior to the Random Forest model on all vector sets.

Methods: The classification models selected for categorization are: Naïve Bayesian, Random Forest, and Support Vector Machine.

Naïve Bayesian classifier: The Naïve Bayesian classifier works as follows. Suppose there exists a set of training data, D, in which each tuple is represented by an n-dimensional feature vector, X = (x1, x2, ..., xn), indicating n measurements made on the tuple from n attributes or features. Assume that there are m classes, C1, C2, ..., Cm. Given a tuple X, the classifier predicts that X belongs to Ci if and only if P(Ci|X) > P(Cj|X), where i, j ∈ [1, m] and i ≠ j. P(Ci|X) is computed as:

P(Ci|X) ∝ P(Ci) × ∏ P(Xk|Ci), for k = 1, ..., n

Random Forest: It is essentially an ensemble method based on bagging. The classifier works as follows. Given D, the classifier first creates k bootstrap samples of D, each denoted Di. A Di has the same number of tuples as D, sampled with replacement from D. Sampling with replacement means that some of the original tuples of D may not be included in Di, whereas others may occur more than once. The classifier then constructs a decision tree based on each Di. As a result, a "forest" consisting of k decision trees is formed. To classify an unknown tuple X, each tree returns its class prediction, counting as one vote. The final decision on X's class is assigned to the one that has the most votes.

Support Vector Machine: The support vector machine (SVM) is a method for the classification of both linear and nonlinear data. If the data is linearly separable, the SVM searches for the linear optimal separating hyperplane (the linear kernel), which is a decision boundary that separates data of one class from another. Mathematically, a separating hyperplane can be written as W·X + b = 0, where W is a weight vector, W = (w1, w2, ..., wn), X is a training tuple, and b is a scalar. In order to optimize the hyperplane, the problem essentially transforms into the minimization of ‖W‖, which is eventually computed as:

W = Σ αi yi Xi, for i = 1, ..., n

where the αi are numeric parameters and the yi are labels based on the support vectors Xi.

If the data is linearly inseparable, the SVM solves the problem by using kernel functions. The Gaussian Radial Basis Function (RBF) is taken as the kernel function here.
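The three models can be compared with scikit-learn, as sketched below on synthetic stand-ins for the hashed feature vectors; the numbers produced are illustrative only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Synthetic stand-ins for the feature vectors (two hash-derived values and
# an averaged sentiment score); the ground-truth label is kept separately.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 2] > 0).astype(int)     # toy labels driven by the score column

models = {
    "Naive Bayesian": GaussianNB(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM (RBF kernel)": SVC(kernel="rbf"),
}
for name, model in models.items():
    f1 = cross_val_score(model, X, y, cv=5, scoring="f1").mean()
    print(f"{name}: F1 = {f1:.3f}")
```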

Wibe

Wibe is a combination of Wikipedia and YouTube that saves the time spent switching over to YouTube and searching for videos relevant to the topic we were exploring on Wikipedia. Wibe is the product that increases the "Interest Quotient" of Wikipedia and YouTube content by combining the power of both in a single place. We can explore this combined power just by installing Wibe's Chrome extension, after which we can watch the relevant videos corresponding to the Wikipedia page. Wibe provides corresponding videos from YouTube for every heading and subheading of a Wikipedia article, so that we can get a better insight into every part of the article. Wibe also gives you a "Stick Tabs" feature, by which we can stick a video tab on top of the page while scrolling through the Wikipedia content.

What's New in Android O?

Starting with Android 7.0, Android can restrict certain activities an application wants to do while it's in the background. Android O builds on this beginning and places top priority on saving power and improving battery life without the user (that's us!) having to do anything or install anything. With Android O, Google is introducing new notification channels, grouping notifications together by their type. Platform support for autofill means better security and a powerful way for an application to store repetitive information. Android O also promotes fonts to a full resource type.


Analysing Human Activity from Mobile Phone Call Detail Records

Fathima Shirin A1, Shabna Nasser2, Fathima Riswana K3, Fathima Shabana K4
M.Tech Computational Linguistics
Government Engineering College
Fathimashirin94@gmail.com1, shabnanasser@gmail.com2, fathimariswana024@gmail.com3, fathimamkd1993@gmail.com4

Mobile phones or smartphones are rapidly becoming the central computing and communication device in people's lives. Importantly, today's smartphones are programmable and come with a growing set of cheap, powerful embedded sensors, such as an accelerometer, digital compass, gyroscope, GPS, microphone, and camera, which are enabling the emergence of personal, group, and community-scale sensing applications. The mobile device is a very convenient tool for understanding city dynamics, the environment and the behavioural patterns of the people within it. This kind of data can be very effective in understanding the relationship between people and their daily activities in the context of an urban setup. By analysing such data, it is possible to identify patterns and relations, revealing insightful information about the city, which in turn provides the authorities, service providers and citizens with a better way of understanding, decision-making, discovery and exploration of urban life. The analysis of human behaviour and mobility patterns is also a very important research topic in various fields such as geography, urban and transportation planning, the telecom sector, social science, and human psychology. With the increasing use of mobile devices, it is now possible to collect different data about the day-to-day activities of a user's personal life. CDR data is used to identify user activities in a layered approach, analysing and linking them with different facts like city dynamics, working patterns, mobility of citizens, the status of transportation, etc.

What is CDR data? It is necessary to know what CDR data contains, how the data is used here, and how and on what the information mining is performed. A call detail record (CDR) is a data record generated by telecommunication equipment like a telephone exchange or cell tower. These records are log files containing the details of a single instance of communication activity, such as voice calls, short messaging service (SMS) texts, and Internet and data services initiated by the phone user, processed by specific telecommunication equipment. Mobile phone service providers keep records of all outgoing communication activity of mobile devices for billing and other purposes. Every single entry of CDR data includes the following parameters: a random ID number of the phone, independent of the user, device and phone number; the exact time and date; the call duration; and the location, in latitude and longitude, of the cell tower that provided the network signal for the communication activity of the mobile device. The CDR data is stored in encrypted files to ensure the anonymity and privacy of the mobile device users.

The working principle of our model is that in each layer we use a set of procedures to find out different information or patterns about the user by mining the raw CDR data together with the information obtained in the previous layers. Each layer combines CDR data mining with the information derived in earlier layers to identify further activities in consecutive layers; as we proceed deeper into the layers, we obtain more detailed information using the facts obtained in the previous ones. Using location data and the frequency of calls in different locations, we can conveniently detect homes, working places, places of interest, etc. Similarly, information like the places of interest of citizens at different times of the day, on working days and on holidays is detectable using the CDR. In the subsequent layers, it is very convenient to find the traveling distance and area covered by users, to develop a mobility pattern for the people in a city. Users can also be classified on the basis of their mobility into different classes like frequent travellers, regular travellers, random travellers, etc. Using the information about homes and workplaces, regular working people can be distinguished from non-working people like housewives, retired and unemployed people. Further, the working people can be categorized into groups like regular workers, irregular workers, etc. Additional information, like the usual weekends and working days of the citizens, can also be discovered. The analytical model can identify residential and commercial areas, important stay locations of the citizens, and the population density in different areas of the city at various times of the day, to determine the city dynamics in multiple layers. The complexity of the discovered information increases with the number of layers used; the working of each layer is different, mining based on different attributes. The hierarchical layered approach is shown in Fig. 1.

In the first layer, the CDR data is used to get the usual stay locations of the users. For every user, the CDR data holds their locations with the dates and times of the call activity made. For each user, all the unique locations and the number of calls made from each unique location are identified; the key to the query is the unique ID of each individual. By sorting the unique locations on the basis of the number of calls made, the top stay locations of a user are found. The total number of calls gives an idea of a place of interest. In the next layer, once the home and workplace of the regular working group are found, this information is used to calculate the distance between home and working place, computed from the coordinates found in the first layer. The haversine formula is used to calculate the great-circle distance between the home and workplace from their longitudes and latitudes. The haversine formula is an equation important in navigation, giving great-circle distances between two points on a sphere from their longitudes and latitudes. It is a special case of a more general formula in spherical trigonometry, the law of haversines, relating the sides and angles of spherical triangles.
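A minimal Python implementation of the haversine computation; the coordinates below are hypothetical home and workplace cell-tower locations.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    R = 6371.0                                # mean Earth radius, km
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = (sin(dphi / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlam / 2) ** 2)
    return 2 * R * asin(sqrt(a))

# Hypothetical home and workplace cell-tower coordinates
print(f"{haversine_km(10.77, 76.35, 10.79, 76.65):.1f} km")
```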

Fig.1: The hierarchical approach

With locations as the key, calls at working hours and off-hours are identified; sorting the data gives an idea of the stay locations. For each of the stay locations, if a user made most of the calls during working hours, we can consider it his workplace. On the other hand, if the user made most of the calls from a stay location during off-hours or on the usual holidays, it is considered his home. When this method was applied to one month of CDR data, two groups of people were found. One group, the regular workers, has a certain call-activity pattern which enabled us to distinguish their home and workplace. The other group has no regular working pattern from which home and workplace could be distinguished; these are the irregular workers. A map marker is placed at each location from which a user has made one or more calls, with the home and workplace marked with different symbols, and beside every marker the number of calls made from that location is mentioned. From this visualization we get a complete picture of the mobility and city area covered by a single user, determined by the activity level at different times; here the query is based on the exact time given in the CDR. It is also possible to determine a user's other places of interest and the usual routes of traveling in the city. For understanding the city dynamics, the call activities of all users can be detected in any given time slot of the day. This gives an idea of the intensity of activity in the city during different periods of the day, as phone calls are directly related to the other regular activities of urban life. By applying this method to all regular workers in our CDR data, we found the distances they travel from home to workplace; based on this information, they are classified into groups to give an idea of the traveling pattern of the working people. From the information found in this layer, we can proceed to further layers to find more information. From their home and workplace information, we can find their working days and weekends, sleeping habits, frequently visited places, places of interest other than home and workplace, traveling pattern, mobility pattern, etc. By adding more layers we can investigate deeper into the behavioural and mobility patterns of users from CDR data. The main application of this model is to provide the concerned people with tools that help them in better decision-making, discovery, understanding and exploration of the city, and in research in the fields of geography, urban and transportation planning, the telecom sector, business, social science, human psychology, etc.
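The first two layers can be sketched with pandas: count calls per tower to get stay locations, then split calls into working hours and off-hours to label workplace and home. The CDR rows below are invented, and the 9-to-18 working window is an assumption.

```python
import pandas as pd

# Invented, anonymized CDR rows: user ID, call timestamp, serving tower
cdr = pd.DataFrame({
    "user": ["u1"] * 6,
    "time": pd.to_datetime([
        "2016-11-01 10:05", "2016-11-01 14:20", "2016-11-01 21:40",
        "2016-11-02 11:00", "2016-11-02 22:15", "2016-11-03 23:05",
    ]),
    "tower": ["A", "A", "B", "A", "B", "B"],
})

# Layer 1: top stay locations = towers sorted by call count per user
stay = cdr.groupby(["user", "tower"]).size().sort_values(ascending=False)

# Layer 2: split calls into working hours vs off-hours, then label the
# dominant working-hours tower as workplace and the off-hours one as home
working = cdr["time"].dt.hour.between(9, 18)
workplace = cdr[working].groupby("tower").size().idxmax()   # -> "A"
home = cdr[~working].groupby("tower").size().idxmax()       # -> "B"
print(stay, workplace, home, sep="\n")
```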

IBM Watson

IBM's Watson is at the forefront of a new era of computing: cognitive computing. It is a radically new kind of computing, very different from the programmable systems that preceded it. Watson and its cognitive capabilities mirror some of the key cognitive elements of human expertise: systems that reason about problems the way a human does. Just as humans become experts by going through a process of observation, evaluation and decision-making, cognitive systems like Watson use similar processes to reason about the information they read, and Watson can do this at massive speed and scale. When it comes to text, Watson does not just look for keyword matches and synonyms like a search engine. It breaks a sentence down grammatically, relationally and structurally, deciding meaning from the semantics of the material.



Analyzing Sentiments of Visual Contents

Uma E.S1, Jenny George2
M.Tech Computational Linguistics
Government Engineering College, Sreekrishnapuram
jennygeorge763@gmail.com1, umasankaranarayanan03@gmail.com2

Presenting information through text can be considered the traditional approach. Apart from this, using visual content to present topics has gained importance in many fields. A visual sentiment analysis framework can predict the sentiment of an image by analysing the image content. This is widely used on social media to express reviews of products and services. Video and image sentiments can strengthen the opinions in the content, which may create unpredictable effects, especially in social media communications. Understanding the sentiment expressed in visual content will also help in entertainment and educational applications. Sentiment analysis of such content is helpful for identifying user responses and can be used to predict political elections, measure economic indicators, and so on.

Using ANP: Adjective Noun Pairs (ANPs) are a visual representation that describes visual features by text pairs, such as "cloudy sky" or "colourful flowers". They are formed by merging low-level visual features with the detected mid-level objects and mapping them to a dictionary. They are very useful for detecting the emotions contained in images. ANPs combine the sentiment strength of adjectives with the detectability of nouns; examples are "beautiful car" and "happy dog". ANPs are discovered based on co-occurrence, since they do not directly express emotions. There are many challenges in modelling visual sentiment concepts. The important ones are: 1) Object-based concepts need to be localized, and it is very difficult to localize each concept because of the lack of bounding-box annotations for ANPs; if we model concepts using features of the whole image, we lose the specific characteristics of the objects. 2) Ambiguity: semantically related adjectives have similar visual reflections in images, so visual sentiment concept annotations are ambiguous; examples are "cute dog" and "adorable dog". An efficient way to resolve this ambiguity is to model the object first and then model the object-based sentiment attributes.

Detection of nouns and modelling of visual adjectives: Visual sentiment concepts are inconsistent across objects, and the features that effectively detect sentiment concepts differ from those used for object detection. In order to extract features, we need to first localize the object. Objects (nouns) are much easier to detect than visual sentiment attributes, and removing background interference leads to success in the difficult task of sentiment attribute classification. ANP classification is the second step in the hierarchy: the goal is to train classifiers for the different ANPs that contain the same noun. Multiple instances may come from the same training image, and some training images are discarded.
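A rough structural sketch of this two-stage hierarchy in Python. The object detector and per-noun adjective classifiers are hypothetical stand-ins with hard-coded outputs, shown only to make the detect-then-classify flow explicit.

```python
import numpy as np

def detect_objects(image):
    """Hypothetical object (noun) detector: returns (noun, box) pairs,
    where the box is (x1, y1, x2, y2). A stand-in for a real detector."""
    return [("dog", (30, 40, 180, 200))]

def crop(image, box):
    x1, y1, x2, y2 = box
    return image[y1:y2, x1:x2]   # keep only the object, drop the background

# One adjective classifier per noun, trained only on images containing
# that noun, so "cute" vs "scary" is decided among dog images only.
# The scores here are hard-coded placeholders for real classifier outputs.
adjective_classifiers = {
    "dog": lambda region: {"cute": 0.8, "scary": 0.1},
}

def detect_anps(image):
    anps = []
    for noun, box in detect_objects(image):
        region = crop(image, box)                 # step 1: localize the noun
        scores = adjective_classifiers[noun](region)  # step 2: classify the ANP
        best = max(scores, key=scores.get)
        anps.append((f"{best} {noun}", scores[best]))
    return anps

print(detect_anps(np.zeros((300, 300, 3))))   # -> [('cute dog', 0.8)]
```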

VSO: There is a corpus for sentiment analysis on visual content. It includes a Visual Sentiment Ontology (VSO) consisting of 3,244 adjective noun pairs (ANPs); SentiBank, a set of 1,200 trained visual concept detectors providing a mid-level representation of sentiment; associated training images acquired from Flickr; and a benchmark containing 603 photo tweets covering a diverse set of 21 topics.

The features are divided into two groups: low-level features and aesthetic features. The low-level features are extracted in an object-based manner from the whole image, from inside the object, and from the background only. The aesthetic features include the dark channel, sharpness, etc.

SentiBank: SentiBank is a large-scale visual sentiment ontology that includes 1,200 semantic concepts and corresponding automatic classifiers. Each concept is defined as an Adjective Noun Pair (ANP), made of an adjective strongly indicating emotions and a noun corresponding to objects or scenes that have a reasonable prospect of automatic detection. It enables several exciting applications, like robust prediction of affect (sentiment or emotion) in visual content; interactive exploration of large image datasets along the high-dimensional sentiment concept space using efficient visualization tools such as the emotion wheel and tree map; and a multimodal interface (incorporating novel sound and animation effects) for monitoring the sentiment concepts present in live social multimedia streams.

Visual sentiment analysis is a challenging and interesting problem. The analysis of emotion, affect and sentiment from visual content has become an exciting area in the multimedia community. It allows building new applications for brand monitoring, advertising and opinion mining.

References

[1] Stuti Jindal and Sanjay Singh, "Image Sentiment Analysis using Deep Convolutional Neural Networks with Domain Specific Fine Tuning", International Conference on Information Processing (ICIP), 2015.
[2] Jyoti Islam and Yanqing Zhang, "Visual Sentiment Analysis for Social Images Using Transfer Learning Approach", IEEE International Conferences on Big Data and Cloud Computing, 2016.



M.Tech Computational Linguistics
Dept. of Computer Science and Engg,
Govt. Engg. College, Sreekrishnapuram, Palakkad
www.simplegroups.in | simplequest.in@gmail.com

SIMPLE Groups
Students Innovations in Morphology, Phonology and Language Engineering

Article Invitation for CLEAR, June 2017: We are inviting thought-provoking articles, interesting dialogues and healthy debates on the multifaceted aspects of Computational Linguistics for the forthcoming issue of the CLEAR (Computational Linguistics in Engineering And Research) Journal, publishing in June 2017. The suggested areas of discussion are:

The articles may be sent to the Editor on or before 10th June 2017 through the email simplequest.in@gmail.com. For more details visit: www.simplegroups.in

Editor, CLEAR Journal
Representative, SIMPLE Groups



Hello world, With the increasing demand for social networking, cyber bullying is also on the rise, so the detection of cyber bullying is an area that demands immediate attention. Sentiment analysis has been an evergreen area since the kick-start of e-commerce: analysing the sentiment of product reviews on e-commerce websites like Amazon, Flipkart, etc. reveals a lot about emerging trends in the market. Smartphones have been ruling the market for nearly a decade, and the data generated by them can be effectively used for analysing the behaviour of a person; this can make a valuable contribution to various fields like crime detection. Visual media express a lot of sentiment and emotion, and the techniques for detecting these sentiments are still in their starting phase; they are of high importance because of the contribution they can make to fields like education, e-commerce, etc. This issue of CLEAR focuses on the research done on Sentiment Analysis, Cyber Bullying Detection, Analysing Human Activity from Mobile Phone Call Detail Records and Analyzing Sentiments of Visual Contents. The articles are penned with the hope of shedding some light on the various trending fields related to computational linguistics. CLEAR is grateful to all who have given their valuable time and effort to introduce reviving ideas. SIMPLE Groups invites more strivers in this field. Wish you all success in your future endeavors!

Sreelakshmi K
