Imperial Journal of Interdisciplinary Research (IJIR) Vol-3, Issue-2, 2017 ISSN: 2454-1362, http://www.onlinejournal.in
Social Set Analysis: A New Approach to Big Data Analysis
Ms. Bhagyodaya S. Aher, Ms. Kishori R. Nikam, Ms. Pratibha D. Pawar, Ms. Bhavini N. Naik, & Prof. D. S. Thosar
SVIT COE, Chincholi, Nashik.
Abstract: Nowadays, analytical approaches in Computational Social Science can be characterized by four dominant paradigms: text analysis (information extraction and classification), social network analysis, social complexity analysis, and social simulations. However, when it comes to organizational and societal units of analysis, there exists no approach to conceptualize, model, analyze, explain and predict social media interactions as individuals' associations with ideas, values, identities, etc. To address this limitation, based on the sociology of associations and the mathematics of set theory, this paper presents a new approach to big data analytics called Social Set Analysis. Social Set Analysis consists of a generative framework for philosophies of computational social science, a theory of social data, conceptual and formal models of social data, and an analytical framework for combining big social datasets with organizational and societal datasets. Three empirical studies of big social data are presented to illustrate and demonstrate Social Set Analysis in terms of fuzzy set-theoretical sentiment analysis, crisp set-theoretical interaction analysis and event-studies oriented set-theoretical visualizations. Implications for big data analytics, current limitations of the set-theoretical approach, and future directions are outlined.
Keywords: Big social data, formal models, Social set analysis, Big data visual analytics, New computational models for big social data.
1. Introduction In the approach considered in this paper, data mining learns relevant patterns from a numerical representation of the entire collection, and the patterns discovered are derived by analyzing the collection as a whole. The rule builder, on the other hand, relies only on personal experience and knowledge to formulate rules that will be useful for sentiment analysis. Because they approach the problem so differently, data mining and rule-based systems can complement one another. They can do this in two ways. First, unsupervised data mining can be used as a tool for the rule builder. Second, the supervised data mining model can be combined with the rule-based model in such a way that the strengths of each model are combined, and any possible mistakes made by one model can be corrected by the other.
Data Mining of the Text for the Rule Builder: The challenge for the rule builder is to devise and formulate rules that capture the sentiment contained in the collection. To do this, the rule builder must have some understanding of the content of the documents that are being categorized. For instance, in our movie review collection, are all the reviews about a specific movie, or are they about a specific genre of movies? If we know, we can save time by writing rules that are directed only to a particular movie or genre. On the other hand, if the reviews are about movies from many different genres, we must consider how that knowledge affects the rules we write. Otherwise, we might not capture the sentiment accurately. For instance, when discussing a horror movie, the statement "the scariest thing I have ever seen" is typically an indicator that the reviewer enjoyed the movie. But it could be a negative indicator if the reviewer was discussing a children's movie.
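The genre example above can be made concrete with a small sketch. The rule table, the genre labels and the function name `rule_sentiment` are hypothetical and chosen only for illustration; they are not taken from the system described in this paper.

```python
# Hypothetical genre-aware rule table: each phrase maps to a polarity per genre.
# The phrase and the genre names are illustrative assumptions only.
RULES = {
    "scariest thing i have ever seen": {"horror": +1, "children": -1},
}


def rule_sentiment(review: str, genre: str) -> int:
    """Return +1 (positive), -1 (negative) or 0 (no applicable rule fired)."""
    text = review.lower()
    total = 0
    for phrase, polarity_by_genre in RULES.items():
        if phrase in text:
            # A rule contributes only if it defines a polarity for this genre.
            total += polarity_by_genre.get(genre, 0)
    # Collapse the summed evidence to its sign.
    return (total > 0) - (total < 0)
```

With this table, the same sentence is scored as positive for a horror review and negative for a children's movie review, while a genre with no entry yields 0, meaning the rule builder has written no rule for that case.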
2. Methodology An important application of text analytics is to automatically characterize the sentiment of documents in a variety of domains as positive, negative or neither. In this project we explore the benefits of combining domain-specific linguistic rules with data mining methods to improve both the effectiveness of our models and the efficiency of the model builder. We assume that the user has basic computer knowledge and basic knowledge of using the Internet. The accuracy of the system depends on the input given by the user.
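One possible way to combine a rule-based label with the output of a supervised model is sketched below. The combination policy, the `margin` parameter and the function name `combine` are assumptions made for illustration; the paper does not specify how the two models are merged.

```python
def combine(rule_label: int, p_positive: float, margin: float = 0.2) -> str:
    """Merge a rule label (+1, -1, or 0 if no rule fired) with a model score.

    p_positive is the supervised model's estimated probability that the
    document is positive. The policy below is an assumed example, not the
    paper's method.
    """
    model_label = 1 if p_positive >= 0.5 else -1
    confident = abs(p_positive - 0.5) >= margin

    if rule_label == 0:
        # No rule fired: rely on the model, but only when it is confident.
        label = model_label if confident else 0
    elif confident and model_label != rule_label:
        # Confident disagreement: neither model is allowed to win outright.
        label = 0
    else:
        # The rule fired and the model agrees or is unsure: keep the rule.
        label = rule_label
    return {1: "positive", -1: "negative", 0: "neither"}[label]
```

Under this policy each model can correct the other: a fired rule settles documents on which the learned model is unsure, and a confident model prediction blocks a rule that points the opposite way.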
1) Non-Functional Requirements:
i. Performance Requirement
The server hosting this software should be in good working order and free of viruses so that the performance of the software is not degraded. Sufficient memory should be available to run the application and store the database.
ii. Safety Requirement
Application files should not be corrupted. The application should run properly to ensure the reliability of the product.
iii. Security Requirement
Access to the database or the main system should be protected from unauthorized users. Users' personal data should be protected from any misuse. Also, one company should not be able to obtain information about another company's important aspects.
speed optimization. Because the attributes and functional dependencies are stored in the same linked list, less memory is required, and the need for two linked lists is eliminated. Three types of input are accepted: the SUM normal tool is flexible as far as input is concerned, accepting an XML file, an ER diagram, or an existing database. A code generation facility is available as an extension: the tool will provide a UML editor that not only supports drawing UML diagrams but also generates the code for them.
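The idea of keeping attributes and functional dependencies in one linked list can be illustrated with a minimal sketch. The node layout, the class names `AttrNode` and `SchemaList`, and the attribute names in the usage note are hypothetical; the text does not describe the tool's actual data structure in this detail.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class AttrNode:
    """One node per attribute; its dependencies live in the same node (assumed layout)."""
    name: str
    determines: List[str] = field(default_factory=list)  # attributes this one determines
    next: Optional["AttrNode"] = None


class SchemaList:
    """Singly linked list holding attributes together with their dependencies."""

    def __init__(self) -> None:
        self.head: Optional[AttrNode] = None

    def insert(self, name: str, determines: Optional[List[str]] = None) -> None:
        # Constant-time insertion at the head of the list.
        self.head = AttrNode(name, list(determines or []), self.head)

    def __iter__(self) -> Iterator[AttrNode]:
        node = self.head
        while node is not None:
            yield node
            node = node.next

    def dependencies(self) -> List[Tuple[str, str]]:
        # Flatten the list into (determinant, dependent) pairs.
        return [(n.name, d) for n in self for d in n.determines]
```

For example, after `s = SchemaList()`, `s.insert("student_id", ["name", "dept"])` and `s.insert("dept", ["hod"])`, a single traversal of `s` yields both the attributes and every dependency, so no second list is needed.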
3. Implementation The system will accept input in three formats: ER diagrams, XML files, and existing databases. An ER diagram is first converted to XML format and then passed on to the XML parser. The XML parser parses the XML file, whether it is derived from the ER diagram or accepted directly as input. The parser extracts the information and passes it to the matrix builder. The matrix builder builds the entity matrix and passes it to the Normalizer. The matrix builder also accepts input in the form of an existing database and passes it to the Normalizer. The Normalizer normalizes the data and stores it.
4. Conclusion In this work, we studied how data mining learns relevant patterns from a numerical representation of the entire collection, with the discovered patterns derived by analyzing the collection as a whole. The rule builder, on the other hand, relies only on personal experience and knowledge to formulate rules that will be useful for sentiment analysis.
5. Acknowledgements This research work was supported by Prof. D. S. Thosar, SVIT COE Nashik. We thank him for guiding us and providing insight which greatly assisted our research work. We also thank Prof. S. M. Rokade, H.O.D., SVIT COE Nashik, for his constant motivation. We would also like to show our gratitude to Dr. Prof. S. A. Patil, Principal, SVIT COE Nashik, and thank him for his constant encouragement and support.
A. System Architecture This gives the gist of the entire system. It consists of a representation of the components of the system and how these components are interconnected. As the system uses only a single linked list, the time required to insert, update or delete any entity is lower, which results in speed optimization.