International Journal of Computer Trends and Technology (IJCTT) – volume 9 number 2– Mar 2014
A Survey of Preprocessing Method for Web Usage Mining Process
Harmit Kaur1, Hardeep Singh2
1(Department of CSE, Lovely Professional University, India)
2(Department of ECE, Lovely Professional University, India)
Abstract – The number of web applications is growing rapidly, and so is the number of their users. As the number of users increases, the size of the log files also increases. The information stored in log files cannot be used directly for analysis. Therefore, preprocessing of log files is necessary to improve the quality of the web usage mining process. Preprocessing of log data improves the performance of the other two steps, pattern discovery and pattern analysis. Preprocessing involves data cleaning, user identification, session identification and path completion. This paper surveys different preprocessing techniques and identifies the better techniques for improving performance.
Keywords – web usage mining, web server log, preprocessing of log file
1. Introduction
Data mining is the process of finding useful information in large databases. It is a process of knowledge discovery that uses different techniques to extract knowledge from a database. Web data mining is an application of data mining: it is the process of extracting information from the web. Web data mining is categorized into three types: web content mining, web structure mining and web usage mining [1]. Web content mining is the extraction of content from web sources; there are different web sources from which a user can get information. Web content mining is divided into two types, text mining and multimedia mining. Web structure mining is the process of discovering the link structure of the web. Many tools are available for retrieving information from web pages, but these tools ignore the valuable information contained in web links. Web usage mining is the process of extracting the behavior of users by applying data mining techniques. Web usage mining deals with information that is used to understand the behavior of users interacting with a web site. This information is used to improve the structure of the website, improve performance, and provide fast and reliable access to users. Web usage mining is divided into three phases: preprocessing of log data, pattern discovery and pattern analysis. Preprocessing of log files is a complex task, but it improves the quality of the other two steps, pattern discovery and pattern analysis. This paper is organized as follows. Section II reviews the literature on web usage mining. Sections III and IV describe the sources and attributes of log files. Section V explains log file formats. Section VI describes the web usage mining process. Section VII describes applications of web usage mining. The conclusion is given in Section VIII.
2. Literature review
Paper [1] describes the preprocessing techniques used in data cleaning, data filtering, path completion, user identification, session identification and web session clustering. The authors describe the different sources of log files, log file formats, preprocessing techniques, the algorithms applied and the data support for the data preprocessing phase, and survey the techniques used in that phase. In paper [2], web log data preprocessing is divided into the steps of log consolidation, data cleaning, user identification and transaction identification. Log consolidation is the first step in preprocessing, in which the logs from different servers are combined in one place for data cleaning. The next step is data cleaning, which is divided into two parts: the first is page element cleaning, in which files with the extensions .gif, .jpeg and .jpg are removed, and second
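As an illustration of the page element cleaning step attributed to [2] (removing requests for files with the extensions .gif, .jpeg and .jpg), the following is a minimal sketch and not the implementation used in the surveyed work. It assumes log entries in a Common Log Format style layout; the function names, the regular expression and the sample entries are hypothetical and were chosen only for this example.

```python
import re
from urllib.parse import urlsplit

# Extensions named in the text; a real deployment would likely extend this set.
IMAGE_EXTENSIONS = (".gif", ".jpeg", ".jpg")

# Matches the quoted request field of a Common Log Format style line,
# e.g. "GET /index.html HTTP/1.1". The log layout is an assumption.
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+)[^"]*"')


def is_page_element(log_line):
    """Return True if the line requests a file with an image extension."""
    match = REQUEST_RE.search(log_line)
    if match is None:
        return False  # unparseable lines are left for later cleaning steps
    # Ignore any query string and compare the extension case-insensitively.
    path = urlsplit(match.group(1)).path.lower()
    return path.endswith(IMAGE_EXTENSIONS)


def clean_page_elements(log_lines):
    """Drop image requests, keeping all other lines in their original order."""
    return [line for line in log_lines if not is_page_element(line)]


# Hypothetical sample entries, made up for illustration only.
sample_log = [
    '10.0.0.1 - - [01/Mar/2014:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1043',
    '10.0.0.1 - - [01/Mar/2014:10:00:01 +0000] "GET /img/logo.gif HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Mar/2014:10:00:05 +0000] "GET /photo.JPG?size=2 HTTP/1.1" 200 2048',
    '10.0.0.2 - - [01/Mar/2014:10:00:09 +0000] "GET /products.html HTTP/1.1" 200 900',
]

cleaned = clean_page_elements(sample_log)
for line in cleaned:
    print(line)
```

With the sample entries above, only the two requests for .html pages remain. Lines that cannot be parsed are kept on purpose, so that this step removes nothing other than the page elements it is meant to remove.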
ISSN: 2231-2803
http://www.ijcttjournal.org