e-ISSN: 2582-5208 International Research Journal of Modernization in Engineering Technology and Science Volume:02/Issue:08/August-2020
Impact Factor- 5.354
www.irjmets.com
AUTOMATED CHATBOT IMPLEMENTED USING NATURAL LANGUAGE PROCESSING
Naveen S*1
*1Department of Computer Science, Anna University, Chennai, INDIA.
E-mail: naveenatt99@gmail.com
ABSTRACT
This paper presents a Chatbot that takes customer queries and returns an answer to each of them. Companies usually maintain a backend team to answer customers' questions, which is time-consuming and tedious work. The Chatbot was created to solve this problem. The frequently asked customer questions and their corresponding answers are stored in a text file. The model takes the customer's question as input, pre-processes it using Natural Language Processing techniques that include Tokenization, Lemmatization, and Stemming, computes the cosine similarity between the question and each stored answer, and assigns a score to each answer; the answer with the highest score is returned as the answer to the given question. The answer text file varies from company to company, since the questions depend on the products each company offers. The main purpose of the Chatbot is therefore to achieve high accuracy by providing a correct and satisfying answer to each customer question for a given company. This paper will be useful to multinational companies by providing a Chatbot model that outputs accurate and satisfying answers to the questions asked by their customers.
KEYWORDS: Natural Language Processing; Tokenization; Lemmatisation; Stemming; Cosine Similarity; Term Frequency-Inverse Document Frequency.
I. INTRODUCTION
When a customer buys a product from a company, they often have many questions about the details of the product. To handle these, companies hire a backend team to answer customers' queries. This is generally a hectic job that requires a large team. Hence, the motive is to provide a Chatbot that will answer the customers' questions [2]. This saves time and a large amount of money. Nowadays, companies are replacing their backend teams with Chatbots. Such a Chatbot can also be useful in fields such as academics, real estate, and marketing, where there are many queries to be answered. In this model, the answers to frequently asked questions are stored in a text file, and each answer is given a score based on the user's question [1]. The answer with the highest score is returned as the answer to the user's query. The main techniques used in the score calculation are Cosine Similarity and the TF-IDF approach. With these techniques, high-precision answers are provided to the user, which is the main goal of this Chatbot model [3].
II. SYSTEM ARCHITECTURE
Fig.1: Proposed Model for converting Answer Text Sheet to a pre-processed answer text
@International Research Journal of Modernization in Engineering, Technology and Science
[620]
Fig.2: Final Proposed Model of Chatbot using Natural Language Processing
III. PSEUDO CODE FOR THE MODEL
Algorithm for providing the score for the answers based on the customer's question:
1. Start
CLASS PROCESSING:
2. FUNCTION initialization():
   a. Start
   b. Open the Chatbot dataset in the read mode
   c. Assign the data read to a variable Raw
   d. Raw = Raw.lower()
   e. Store the sentence tokens in sen_tok
   f. Store the word tokens in word_tok
   g. Lemmatize the tokens and remove punctuation
3. FUNCTION LemTokens(): // Lemmatization
   a. Lemmatize the tokens
   b. END
4. FUNCTION stemTokens(): // Stemming
   a. Stem the tokens
   b. END
5. FUNCTION greeting(sentence):
   a. Create GREET_INPUTS = array("hello", "hi", "greetings", "sup", "hey")
   b. Create GREET_RESPONSES = array("hi", "hey", "hi there", "hello", "I am glad! You are talking to me")
   c. FOR word in the sentence:
      i. IF word in GREET_INPUTS:
         1. RETURN random choice in GREET_RESPONSES
   d. END
CLASS USER:
6. FUNCTION response(user_responses):
   a. APPEND user_responses to sen_tok
   b. CREATE a variable Tf-idVector to store the vector // TF-IDF Approach
   c. STORE the vector of sen_tok in Tf-idVector
   d. STORE the cosine similarity of the question and answer into VAL
   e. SORT the values in VAL and STORE it in FLAT // Cosine Similarity
   f. STORE the index value
   g. FIND the SCORE in FLAT[-2]
   h. IF SCORE == 0:
      i. Response = "I am sorry! I don't understand you"
      ii. ADD the unknown question into the question dataset
      iii. RETURN Response
   i. ELSE:
      i. Response = sen_tok[index]
      ii. RETURN Response
   j. END
7. FUNCTION Chatbot():
   a. CALL PROCESSING
   b. FUNCTION send():
      i. GET user_responses
      ii. IF user_responses != "bye":
         1. IF user_responses = "Thank You":
            a. END
      iii. ELSE:
         1. IF user_responses = GREETINGS():
            a. CALL greeting
         2. ELSE:
            a. CALL response(user_responses)
   c. END FUNCTION send()
   d. CREATE an executable file for the Chatbot
   e. RUN the user_responses on the executable file of Chatbot
   f. END
Fig.3: Final Output after implementing the above algorithm
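The greeting check and the dispatch step of the algorithm can be sketched in Python as follows. This is only an illustrative sketch: the nesting of the IF/ELSE branches in send() is ambiguous in the pseudocode, so the sketch assumes that both "bye" and "Thank You" end the conversation, and the helper name reply and its answer_fn parameter are hypothetical and not part of the original model.

```python
import random

# Greeting vocabulary taken from the pseudocode of the model.
GREET_INPUTS = ("hello", "hi", "greetings", "sup", "hey")
GREET_RESPONSES = ("hi", "hey", "hi there", "hello",
                   "I am glad! You are talking to me")


def greeting(sentence):
    """Return a random greeting if any word is a known greeting, else None."""
    for word in sentence.lower().split():
        # Strip trailing punctuation so that "Hello!" still matches "hello".
        if word.strip(".,!?") in GREET_INPUTS:
            return random.choice(GREET_RESPONSES)
    return None


def reply(user_response, answer_fn):
    """Dispatch one user message (hypothetical helper, not from the paper).

    Returns None when the conversation should end; otherwise returns a
    greeting or the result of answer_fn, which stands in for response().
    """
    text = user_response.strip().lower()
    if text in ("bye", "thank you"):  # assumption: both phrases end the chat
        return None
    greet = greeting(text)
    if greet is not None:
        return greet
    return answer_fn(user_response)
```

Passing the answering function as a parameter keeps the dispatch logic independent of how the scores are computed, so the scoring step can be swapped without touching the chat loop.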
IV. TOKENIZATION
NLP pre-processing techniques are generally used to decrease the processing time. In this Chatbot model [1], the most important NLP pre-processing technique is Tokenization. In this technique, the words in a given sentence are separated and stored in a list of words. Tokenization is applied to both the question and the answers: the tokens of the question are stored in one list and the tokens of the answers are stored in separate lists [5], so that the lists can be compared and scored easily. Tokenization helps to increase the accuracy of the score when the tokenized words of the question are compared with the tokenized words of the answers [4]. It also speeds up Lemmatisation and Stemming. All these pre-processing techniques are available in the Natural Language Toolkit (NLTK).
Fig.4: The above figure shows the tokenization into words
Fig.5: The above figure shows the tokenization into sentences
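As a rough illustration of the two tokenization steps, the sketch below splits a text into sentences and a sentence into words using only regular expressions from the Python standard library. It is a simplified stand-in, not the NLTK tokenizers used in the model, and the function names are chosen here only for the example; abbreviations such as "Dr." would be split incorrectly by this simple rule.

```python
import re


def sent_tokenize(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def word_tokenize(sentence):
    """Lower-case a sentence and return its word tokens, dropping punctuation."""
    # Keeps simple apostrophe forms such as "customer's" as one token.
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", sentence.lower())


answers = "The warranty is two years. Delivery takes five days!"
sen_tok = sent_tokenize(answers)                  # one entry per stored answer
word_tok = [word_tokenize(s) for s in sen_tok]    # one token list per answer
```

Keeping one token list per answer, as in the last line, matches the description of storing question tokens and answer tokens in separate lists for later comparison.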
V. LEMMATISATION AND STEMMING
To decrease the processing time further, lemmatization and stemming are also included in the model. Both techniques serve the same purpose: they reduce the number of distinct words in a given sentence by mapping each word to its root. For example, if a sentence has the words Playing, Play, and Played, these techniques treat all three as the single word Play, which decreases the number of words stored in the list and increases the processing speed. These techniques also find the root word for words that are adjectivally related. This pre-processing is used in all models where Natural Language Processing plays a major role. In this Chatbot model, it helps to increase the processing speed and decrease the processing time.
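The Playing, Play, Played example can be reproduced with a deliberately naive sketch. The suffix list, the small table of irregular forms, and the function names below are assumptions made only for illustration; a real pipeline would more likely rely on NLTK's PorterStemmer and WordNetLemmatizer, which handle far more cases than this toy version.

```python
# Toy suffix list, checked in order (illustrative only, not the Porter algorithm).
SUFFIXES = ("ing", "ed", "es", "s")

# Hypothetical lookup table for irregular forms that suffix stripping cannot handle.
IRREGULAR_LEMMAS = {"went": "go", "better": "good", "children": "child"}


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def naive_lemmatize(word):
    """Look the word up in the irregular table, else fall back to stemming."""
    return IRREGULAR_LEMMAS.get(word.lower(), naive_stem(word))


tokens = ["Playing", "Play", "Played"]
roots = [naive_stem(t) for t in tokens]   # all three collapse to one root
```

The difference between the two functions hints at why both techniques are used: stemming only cuts suffixes, while lemmatization can map an irregular form such as "better" to its dictionary form.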
VI. COSINE SIMILARITY AND TERM FREQUENCY-INVERSE DOCUMENT FREQUENCY (TF-IDF)
After pre-processing comes the most important part of the Chatbot model. Once the question and all the answers have been pre-processed into two separate lists, they need to be compared to obtain the scores. This is done using the TF-IDF approach and Cosine Similarity. The first step rescales the frequency of each word by how often it appears across all documents, so that words like "the", which are frequent in every document, are penalized. This approach is called Term Frequency-Inverse Document Frequency, or TF-IDF [6]. To generate a response from the Chatbot for an input question, Cosine Similarity is used. A function is defined that searches the user's question for one or more known tokens and returns one of several possible answers. With Cosine Similarity, a score is assigned to each answer based on the occurrence of the question's tokens in that answer: the more often the tokens of the question occur in an answer, the higher the score of that answer. After every answer has been scored, the answer with the highest score is displayed as the answer to the question.
Fig.6: Code for the cosine similarity and TF-IDF approach
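The scoring step can also be sketched without any external library. The version below is an assumption-laden illustration, not the code shown in Fig.6: it weights each term by its relative frequency in a document times log(N / df), where N is the number of documents and df the number of documents containing the term, and compares sparse vectors with the cosine of the angle between them. The pseudocode appends the question to the answers and reads the second-highest similarity (FLAT[-2]), because the highest one is the question matched with itself; this sketch instead scores only the answers and takes the maximum, which selects the same answer.

```python
import math
import re
from collections import Counter

FALLBACK = "I am sorry! I don't understand you"


def tokenize(text):
    """Lower-case the text and return its alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    token_lists = [tokenize(d) for d in documents]
    n_docs = len(token_lists)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def response(question, answers):
    """Return the best-scoring answer, or the fallback when no score is positive."""
    vectors = tfidf_vectors(answers + [question])
    query, answer_vectors = vectors[-1], vectors[:-1]
    scores = [cosine(query, v) for v in answer_vectors]
    if not scores:
        return FALLBACK
    best = max(range(len(scores)), key=scores.__getitem__)
    return answers[best] if scores[best] > 0.0 else FALLBACK
```

With this weighting, a term that occurs in every document receives a weight of log(1) = 0, which is the penalty for very common words described in the text. A question that shares no token with any stored answer scores 0 everywhere and triggers the fallback message, mirroring the SCORE == 0 branch of the pseudocode.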
VII. CONCLUSION
Technology has grown rapidly in recent years, and every field in the technology market is adapting to newer technology. Customer satisfaction plays a major role in a company's improvement, so outdated call centers need to be replaced by Chatbots that provide highly accurate answers and good customer satisfaction. This Chatbot reduces manual work and drastically reduces the cost a company spends on a call center or backend team. If a question cannot be answered, the answer dataset can be updated with the new question to increase accuracy. This Chatbot model will therefore be helpful in all fields that require complete customer satisfaction, leaving customers without any unanswered questions about the product.
ACKNOWLEDGEMENT
I would like to thank Dr. D. Indhumathi (Mentor), PSG College of Technology, for supporting my work, and the other faculty members and staff of the Department of Computer Science and Engineering, the students, and my colleagues who helped me in publishing my work.
VIII. REFERENCES
[1] Abu-Jbara, A., Ezra, J., and Radev, D. R., 2013. Purpose and polarity of citation: Towards NLP-based bibliometrics. In HLT-NAACL, Atlanta, Georgia, USA, Association for Computational Linguistics, pp. 596–606.
[2] Feldman, S. (1999). NLP Meets the Jabberwocky: Natural Language Processing in Information Retrieval. ONLINE-WESTON THEN WILTON-, 23, 62-73.
[3] Manning, C. D., & Schütze, H. (1999). Foundations of statistical natural language processing (Vol. 999). Cambridge: MIT Press.
[4] Menaka. Text Classification using Keyword Extraction Technique, Corpus ID: 212463857, June 2014.
[5] Saif M. Mohammad. 2020b. NLP Scholar: A dataset for examining the state of NLP research. In Proceedings of the 12th Language Resources and Evaluation Conference (LREC-2020), Marseille, France.
[6] X. Chen, H. Xie, F. Wang, Z. Liu, J. Xu, and T. Hao, “Natural Language Processing in Medical Research: A Bibliometric Analysis,” BMC Medical Informatics and Decision Making, vol. 18, supplement 1, no. 14, 2018.
[7] X. Schmitt, S. Kubler, J. Robert, M. Papadakis and Y. Le Traon, A Replicable Comparison Study of NER Software: StanfordNLP, NLTK, OpenNLP, SpaCy, Gate, 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS), Granada, Spain, 2019, pp. 338-343.