
Mrs. V. SUJATHA et al. / (IJAEST) INTERNATIONAL JOURNAL OF ADVANCED ENGINEERING SCIENCES AND TECHNOLOGIES Vol No. 1, Issue No. 2, 112 - 117

AN APPROACH TO USER NAVIGATION PATTERN BASED ON ANT BASED CLUSTERING AND CLASSIFICATION USING DECISION TREES

Mrs. V. SUJATHA 1*, Dr. PUNITHAVALLI 2

1 Computer Science Department, CMS College of Science and Commerce, Coimbatore, India. sujatha.padmakumar@rediffmail.com

2 Computer Application Department, SNS Arts and Science College women's, Coimbatore, India. mpunitha_srcw@yahoo.co.in

Abstract: Web Usage Mining (WUM) is the automatic discovery of user access patterns from web servers. Organizations collect large volumes of data in their daily operations, generated automatically by web servers and collected in server access logs. WUM can also provide information on how to restructure a website to deliver its services more effectively. This paper presents how to mine the secondary data (web logs) derived from the users' interaction with the web pages during a certain period of Web sessions. First, an ant-based clustering algorithm is applied to pre-processed log files to extract frequent patterns, which are then displayed in an interpretable format. Second, a decision tree method is used to find and predict users' navigation behavior. Two types of approaches are used: the offline phase is based on ant-based clustering and the online phase is based on decision trees. The experimental results show that the approach can improve the quality of clustering for user navigation patterns in web usage mining systems. These results can be used for predicting users' next requests in huge web sites.

Keywords - Web usage mining, web mining, web log files, classification and navigation pattern

I. INTRODUCTION

The term web mining was coined by Etzioni in 1996 to signify the use of data mining techniques to automatically discover web documents and services, uncover general patterns on the web, and observe user behavior (viewing, bookmarking and browsing history). Web mining is the process of finding out what users are looking for on the Internet. Some users might be looking only at textual data, whereas others might be interested in multimedia data. Web mining is classified into three categories: web content mining, web structure mining and web usage mining.

Web usage mining focuses on techniques that could predict user behavior while the user interacts with the web. As noted above, the data mined in this category are the secondary data generated on the web as a result of this interaction. These data can vary widely, but they are generally classified into usage data that resides in the web client, in proxy servers and in web servers. The aim of understanding the navigation preferences of visitors is to enhance the quality of electronic commerce (e-commerce) services, to personalize Web portals, or to improve the Web structure and Web server performance. Web usage mining proceeds in three stages: the first stage is preprocessing, the next stage is pattern discovery and the last stage is pattern analysis.
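The following sketch shows what the first stage, preprocessing, can look like in practice. It is a minimal illustration, not the authors' implementation. It makes several assumptions that are not details from this paper:

- Each log line is a space-separated W3C extended-format record, with fields in the order listed for the university log in Section II.
- Images, stylesheets and scripts are treated as non-page requests.
- A user is identified by client IP plus user agent.
- A new session starts after 30 minutes of inactivity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical field order for one space-separated W3C extended log line,
# following the fields listed for the university log in Section II.
FIELDS = ["date", "time", "c_ip", "method", "uri_stem", "status",
          "bytes", "version", "host", "user_agent", "referrer"]

# Assumption: embedded resources are not page views, so they are cleaned out.
IGNORED_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
SESSION_TIMEOUT = timedelta(minutes=30)  # common heuristic, assumed here


@dataclass
class Request:
    when: datetime
    user: tuple  # (client IP, user agent) used as the user identifier
    page: str


def clean(lines):
    """Data cleaning: keep successful GET requests for pages only."""
    for line in lines:
        if line.startswith("#"):  # W3C directive lines such as #Fields
            continue
        rec = dict(zip(FIELDS, line.split()))
        if len(rec) < len(FIELDS):  # malformed or truncated line
            continue
        if rec["method"] != "GET" or not rec["status"].startswith("2"):
            continue
        if rec["uri_stem"].lower().endswith(IGNORED_EXTENSIONS):
            continue
        when = datetime.strptime(f"{rec['date']} {rec['time']}", "%Y-%m-%d %H:%M:%S")
        yield Request(when, (rec["c_ip"], rec["user_agent"]), rec["uri_stem"])


def sessionize(requests):
    """User and session identification: group clicks by user, then start a
    new session whenever the gap between two clicks exceeds the timeout."""
    by_user = {}
    for req in sorted(requests, key=lambda r: r.when):
        by_user.setdefault(req.user, []).append(req)
    sessions = []
    for reqs in by_user.values():
        current = [reqs[0].page]
        for prev, cur in zip(reqs, reqs[1:]):
            if cur.when - prev.when > SESSION_TIMEOUT:
                sessions.append(current)
                current = []
            current.append(cur.page)
        sessions.append(current)
    return sessions
```

Run `clean()` first and then `sessionize()` on a day's log. The result is one list of page URIs per session, which serves as the formatted input to pattern discovery.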

ISSN: 2230-7818

Page 112




II. WEB USAGE MINING ARCHITECTURE

Fig 1: General Architecture for Web Usage Mining

A. Preprocessing
Pre-processing "consists of converting the usage, content, and structure information contained in the various available data sources into the data abstractions necessary for pattern discovery". This step can be broken into at least four sub-steps: Data Cleaning, User Identification, Session Identification and Formatting. In the data cleaning step, unneeded data is deleted from the raw data in the web log files. At least two log file formats exist: the Common Log File format (CLF) and the Extended Log File format (see [16] for more details). Our university log file consists of these fields: Date, Time, client IP address, Method, URI stem, Protocol status, Bytes sent, Protocol version, Host, User Agent and Referrer.

B. Pattern Discovery
• Statistical Analysis, such as frequency analysis, mean, median, etc.
• Association Rules discover correlations among pages accessed together by a client.
• Clustering of users helps to discover groups of users with similar navigation patterns (to provide personalized Web content).
• Classification is the technique of mapping a data item into one of several predefined classes.
• Sequential Patterns extract frequently occurring inter-session patterns, such that the presence of a set of items is followed by another item in time order.
• Dependency Modeling determines whether there are any significant dependencies among the variables in the Web.

C. Pattern Analysis
Pattern Analysis is the final stage of WUM (Web Usage Mining), which involves the validation and interpretation of the mined patterns.
• Validation: to eliminate the irrelevant rules or patterns and to extract the interesting rules or patterns from the output of the pattern discovery process.
• Interpretation: the output of mining algorithms is mainly in mathematical form and is not suitable for direct human interpretation.

III. RELATED WORK
Identifying Web browsing strategies is a crucial step in Website design and evaluation, and requires approaches that provide information on both the extent of any particular type of user behavior and the motivations for such behavior [9]. Pattern discovery from web data is the key component of web mining, and it brings together algorithms and techniques from several research areas. Baraglia and Palmerini (2002) proposed a WUM system called SUGGEST that provides useful information to make web user navigation easier and to optimize web server performance. Liu and Keselj (2007) proposed the automatic classification of web user navigation patterns, with a novel approach to classifying user navigation patterns and predicting users' future requests. Mobasher (2003) presents a Web Personalizer system which provides dynamic recommendations, as a list of hypertext links, to users. Jespersen et al. (2002) [10] proposed a hybrid approach for analyzing visitor click sequences. Jalali et al. (2008a [7] and 2008b [8]) proposed a system for discovering user navigation patterns using a graph partitioning model. In their model, an undirected graph based on the connectivity between each pair of Web pages was considered, and weights were assigned to the edges of the graph. Dixit and Gadge (2010) [5] presented another user navigation pattern mining system based on graph partitioning. They presented an undirected graph based on the connectivity between Referrer and URI pages. They also gave a preprocessing method for the raw web log file and a formula for assigning weights to the edges of the undirected graph. Ant-based clustering, due to its flexibility and self-organization, has been applied in a variety of areas, from problems arising in e-commerce to circuit design, and from text mining to web mining (Jianbin et al., 2000). This section has reviewed the various works proposed in this area, with particular emphasis on web usage mining, clustering and classification. The present research work is another attempt to propose a hybrid system that uses clustering and classification methods to discover users' navigation patterns and analyze them from the server's web log file.

IV. METHODOLOGY
The refined web log files are given as input to the ant-based clustering algorithm to find the user behavior patterns. A classification method using decision trees is then applied to predict the user's next request in huge web sites. The hybrid system improves the quality of clustering for user navigation patterns in web usage mining systems.

Figure 2: Offline & Online phase

A. Offline phase of the architecture
This phase consists of two major modules: Data pretreatment and Navigation Patterns Mining. The phase starts with the primary Web-Log Preprocessing (data pretreatment), which extracts user navigation sessions from the dataset. A clustering algorithm is then used to mine navigational patterns.

B. Online phase of the architecture
During the online phase, the following happens when a new request arrives at the server. The requested URL and the session to which the user belongs are identified, the underlying knowledge base is updated, and a list of suggestions is appended to the requested page [6].

C. Prediction Engine
The main objective of the prediction engine in this part of the architecture is to classify user navigation patterns and predict users' future requests.

D. Ant-based Clustering
In the case of ant-based clustering and sorting, two related types of natural ant behavior are modeled. When clustering, ants gather items to form heaps. When sorting, ants discriminate between different kinds of items and spatially arrange them according to their properties. Lumer and Faieta proposed an ant-based data clustering algorithm (shown in Figure 3), which resembles the ant behavior described in [4].
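The pick-and-drop behavior described above can be sketched in Python in the style of Lumer and Faieta's rules. This is a sketch, not the algorithm of Figure 3. It rests on the following assumptions, none of which come from this paper:

- Each item is a user session, treated as a set of page ids.
- Dissimilarity d(i, j) is the Jaccard distance.
- The grid is toroidal.
- All parameter values are illustrative: grid size, neighborhood size s, scaling alpha, k1, k2, and the numbers of ants and steps.

The local density of an item i is f(i) = max(0, (1/s^2) Σ_j (1 - d(i, j)/alpha)), summed over the occupied neighboring cells. An unladen ant picks an item up with probability (k1/(k1 + f))^2. A laden ant drops its item with probability 2f if f < k2, and 1 otherwise.

```python
import random


def jaccard_distance(a, b):
    """Dissimilarity between two sessions viewed as sets of page ids."""
    a, b = set(a), set(b)
    return 1.0 - len(a & b) / len(a | b) if a | b else 0.0


def density(grid, pos, item, items, size, s=3, alpha=0.5):
    """Local density f(i) of `item` over the s x s neighborhood of `pos`."""
    x, y = pos
    r = s // 2
    total = 0.0
    for dx in range(-r, r + 1):
        for dy in range(-r, r + 1):
            if dx == 0 and dy == 0:
                continue  # the cell itself is not a neighbor
            other = grid.get(((x + dx) % size, (y + dy) % size))
            if other is not None:
                total += 1.0 - jaccard_distance(items[item], items[other]) / alpha
    return max(0.0, total / (s * s))


def p_pick(f, k1=0.1):
    return (k1 / (k1 + f)) ** 2


def p_drop(f, k2=0.15):
    return 2 * f if f < k2 else 1.0


def ant_cluster(items, size=20, n_ants=5, steps=5000, seed=0):
    """Scatter items on a toroidal grid and let ants pick them up and drop them.
    Returns a dict mapping grid cell -> item index."""
    rng = random.Random(seed)
    cells = rng.sample([(x, y) for x in range(size) for y in range(size)], len(items))
    grid = dict(zip(cells, range(len(items))))
    ants = [{"pos": (rng.randrange(size), rng.randrange(size)), "load": None}
            for _ in range(n_ants)]
    for _ in range(steps):
        for ant in ants:
            x, y = ant["pos"]  # random step to one of the 8 neighbors (or stay)
            ant["pos"] = ((x + rng.choice((-1, 0, 1))) % size,
                          (y + rng.choice((-1, 0, 1))) % size)
            here = grid.get(ant["pos"])
            if ant["load"] is None and here is not None:
                if rng.random() < p_pick(density(grid, ant["pos"], here, items, size)):
                    ant["load"] = grid.pop(ant["pos"])
            elif ant["load"] is not None and here is None:
                f = density(grid, ant["pos"], ant["load"], items, size)
                if rng.random() < p_drop(f):
                    grid[ant["pos"]] = ant["load"]
                    ant["load"] = None
    # Any ant still carrying an item drops it on the first free cell in row-major order.
    for ant in ants:
        if ant["load"] is not None:
            free = next((x, y) for x in range(size) for y in range(size)
                        if (x, y) not in grid)
            grid[free] = ant["load"]
            ant["load"] = None
    return grid
```

The sketch leaves out one step: reading clusters off the final grid, for example as groups of adjacent occupied cells. The grid must have at least as many cells as there are sessions.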

Figure 3: Ant based algorithm

E. Decision Trees
Decision trees are used for classification and prediction. They are a simple yet powerful way of representing knowledge. The models produced by decision trees take the form of a tree structure, in which a leaf node indicates the class of the examples. Instances are classified by sorting them down the tree from the root node to a leaf node.

Input: training samples, represented by discrete attributes; the set of candidate attributes, attribute-list.
Output: a decision tree
Method:
1. Create a node N;
2. If samples are all of the same class C, then return N as a leaf node labeled with the class C;
3. If attribute-list is empty, then return N as a leaf node labeled with the most common class in samples (majority voting);
4. Select test-attribute, the attribute in attribute-list with the highest information gain ratio;
5. Label node N with test-attribute;
6. For each known value ai of test-attribute:
7. Grow a branch from node N for the condition test-attribute = ai;
8. Let si be the set of samples in samples for which test-attribute = ai;
9. If si is empty, then
10. Attach a leaf labeled with the most common class in samples;
11. Else attach the node returned by generate_decision_tree(si, attribute-list minus test-attribute);
12. Return N.

Figure 4: Classification using decision trees

V. EXPERIMENTAL EVALUATION
To test the effectiveness of the proposed system, a server web log data file was obtained. The system was tested with data collected over 90 days. For ease of discussion, the experiments presented here are from one day, namely data collected on 29-12-2009. As described in Section II, preprocessing is conducted in four steps: (i) Cleaning, (ii) User Identification, (iii) Session Identification and (iv) Formatting.

Figure 5: clusters group
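The decision-tree procedure of Figure 4 can be sketched directly in Python. The sketch follows the figure's steps literally:

- Attributes are discrete.
- The test attribute is the one with the highest information gain ratio.
- A branch with no samples becomes a majority-class leaf.

Three details are assumptions made for the example, not taken from this paper: samples are dictionaries, the `values` argument lists every known value of each attribute, and leaves are plain class labels.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())


def gain_ratio(samples, labels, attr):
    """Information gain of splitting on attr, divided by its split information."""
    total = len(samples)
    groups = {}
    for s, y in zip(samples, labels):
        groups.setdefault(s[attr], []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / total * math.log2(len(g) / total) for g in groups.values())
    gain = entropy(labels) - remainder
    return gain / split_info if split_info > 0 else 0.0


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def generate_decision_tree(samples, labels, attribute_list, values):
    """Figure 4 for a non-empty sample set. A leaf is a class label; an inner
    node N is the tuple (test_attribute, {value: subtree})."""
    if len(set(labels)) == 1:                  # step 2: pure node
        return labels[0]
    if not attribute_list:                     # step 3: majority voting
        return majority(labels)
    test = max(attribute_list, key=lambda a: gain_ratio(samples, labels, a))  # step 4
    branches = {}
    for a_i in values[test]:                   # steps 6-7: one branch per known value
        idx = [k for k, s in enumerate(samples) if s[test] == a_i]  # step 8
        if not idx:                            # steps 9-10: empty branch
            branches[a_i] = majority(labels)
        else:                                  # step 11: recurse without test attribute
            branches[a_i] = generate_decision_tree(
                [samples[k] for k in idx], [labels[k] for k in idx],
                [a for a in attribute_list if a != test], values)
    return (test, branches)                    # steps 5 and 12


def classify(tree, sample):
    """Sort an instance down the tree from the root to a leaf."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[sample[attr]]
    return tree
```

As a hypothetical use, the attributes could be boolean flags such as "page P4 was visited" and the class labels the NP numbers of Figure 7. `classify(tree, features)` then returns the navigation pattern predicted for a new session.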


S.No.  IP Address        Unique Pages
1      116.68.91.110     {1, 15, 3, 8, 17}
2      117.204.97.156    {1, 6, 3, 11, 15, 17, 23}
3      118.94.8.197      {1, 2, 8, 6, 17}
4      119.27.62.254     {1, 4, 9, 11, 23}
5      121.242.52.2      {1, 8, 13, 17}
6      122.178.146.123   {1, 4, 11, 15}

Figure 6: Extracted navigation patterns

NP number  Navigational Pattern
1          (P1, P15, P3, P8, P17)
2          (P1, P6, P3, P11, P15, P17, P23)
3          (P1, P2, P8, P6, P17)
4          (P1, P4, P9, P11, P23)
5          (P1, P8, P13, P17)
6          (P1, P4, P11, P15)

Figure 7: Navigation pattern generated by clustering algorithm
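The extracted patterns could feed the online phase. The following Python sketch illustrates one way:

- It matches a user's partial session against the six navigation patterns of Figure 7.
- It then suggests pages that the best-matching pattern contains but the user has not yet visited.

Two choices here are assumptions for illustration: the matching rule (Jaccard overlap, with ties going to the lower NP number) and the number of suggestions k. This paper's online phase uses the decision-tree classifier, which this nearest-pattern stand-in does not reproduce.

```python
# Navigation patterns of Figure 7 (NP number -> ordered page ids).
PATTERNS = {
    1: [1, 15, 3, 8, 17],
    2: [1, 6, 3, 11, 15, 17, 23],
    3: [1, 2, 8, 6, 17],
    4: [1, 4, 9, 11, 23],
    5: [1, 8, 13, 17],
    6: [1, 4, 11, 15],
}


def best_pattern(session, patterns=PATTERNS):
    """NP number with the highest Jaccard overlap with the pages visited so far;
    ties go to the lower NP number."""
    visited = set(session)

    def score(np_id):
        pages = set(patterns[np_id])
        return len(visited & pages) / len(visited | pages)

    return max(sorted(patterns), key=score)


def suggest(session, patterns=PATTERNS, k=2):
    """Return (NP number, up to k unvisited pages of that pattern, in pattern order)."""
    np_id = best_pattern(session, patterns)
    visited = set(session)
    return np_id, [p for p in patterns[np_id] if p not in visited][:k]


print(suggest([1, 4]))  # (6, [11, 15])
```

A session that has visited P1 and P4 overlaps most with pattern 6: 2 shared pages out of 4, against 2 out of 5 for pattern 4. P11 and P15 are therefore suggested.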

A. Output

Figure 8: Effect of cleaning step on raw web log file

Figure 9: Interested user & non-interested user

VI. CONCLUSION

In this paper, a new method to extract navigational patterns from web logs has been presented. The work focused on grouping the frequently accessed patterns of interested users. It helps web site designers improve the performance of the site by giving preference to the patterns navigated by regular interested users. After clustering is completed, alignment processing is applied to the extracted sequences in each cluster to extract a representative for each cluster. A classification algorithm is used in the online phase to predict the user's future requests.

VII. REFERENCES

[1] Abraham, A. (2005) Natural Computation for Business Intelligence from Web Usage Mining, Proceedings of the Seventh International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC 2005), pp. 3-11.
[2] Baraglia, R. and Palmerini, P. (2002) SUGGEST: A web usage mining system, Proc. of IEEE Int'l Conf. on Information Technology: Coding and Computing, p. 282.
[3] Clark, L., Ting, I.H., Kimble, C., Wright, P. and Kudenko, D. (2006) Combining ethnographic and clickstream data to identify user Web browsing strategies, Information Research, Vol. 11, No. 2.


[4] Deneubourg, J.L., Goss, S., Franks, N., Sendova-Franks, A., Detrain, C. and Chretien, L. (1990) The Dynamics of Collective Sorting: Robot-Like Ants and Ant-Like Robots, From Animals to Animats: Proc. of the 1st Int. Conf. on Simulation of Adaptive Behaviour, pp. 356-363.
[5] Dixit, D. and Gadge, J. (2010) A New Approach for Clustering of Navigation Patterns of Online Users, International Journal of Engineering Science and Technology, Vol. 2, No. 6, pp. 1670-1676.
[6] Handl, J. and Meyer, B. (2002) Improved ant-based clustering and sorting in a document retrieval interface, Proceedings of the Seventh International Conference on Parallel Problem Solving from Nature, Vol. 2439 of LNCS, Springer-Verlag, Berlin, Germany, pp. 913-923.
[7] Jalali, M., Mustapha, M., Mamat, A. and Sulaiman, M.N.B. (2008a) A new clustering approach based on graph partitioning for navigation patterns mining, 19th International Conference on Pattern Recognition, pp. 1-4.
[8] Jalali, M., Mustapha, N., Mamat, A. and Sulaiman, N.B. (2008b) Web user navigation pattern mining approach based on graph partitioning algorithm, Journal of Theoretical and Applied Information Technology, pp. 1125-1131.
[9] Jalali, M., Mustapha, N., Sulaiman, N.B. and Mamat, A. (2008c) A web usage mining approach based on LCS algorithm in online predicting recommendation systems, 12th International Conference Information Visualization, IEEE Computer Society, pp. 302-307.
[10] Jespersen, S.E., Thorhauge, J. and Bach, T. (2002) A Hybrid Approach to Web Usage Mining, Data Warehousing and Knowledge Discovery, LNCS 2454, Y. Kambayashi, W. Winiwarter, M. Arikawa (Eds.), pp. 73-82.
