V.R.Phani Sridath* et al. / (IJAEST) INTERNATIONAL JOURNAL OF ADVANCED ENGINEERING SCIENCES AND TECHNOLOGIES Vol No. 7, Issue No. 1, 065 - 069
Visualization of Time-series Cluster Graphs using Hierarchical Clustering Technique
V.R. Phani Sridath, II Year M.Tech, sridath@gmail.com
Dr. Kudipudi Srinivas, Ph.D., Professor, kudipudi72@gmail.com
Dr. V. Srinivasa Rao, Ph.D., Professor and HOD, drvsrao9@gmail.com
Department of Computer Science and Engineering, V R Siddhartha Engineering College, Vijayawada 522007
Abstract- Organizations are capturing ever more data about their business environment. Most of this data is multiattribute (multidimensional) and temporal in nature. However, mining temporal relationships is typically a complex task. Time-series analysis mines sequences of continuous real-valued elements, is usually regression-based, and is an example of supervised learning. Existing systems that find trends in multiattribute transactional data rarely use hierarchical clustering techniques, which are particularly well suited to real-time updates.

We propose a cluster-based time-series representation of data, a system that implements the time-series cluster graph construct, which maps multiattribute time-series data to a two-dimensional directed graph that identifies trends in dominant data types over time. The proposed system identifies clusters in multiple time periods and identifies trends based on the similarities between clusters over time. Trend discovery may be better addressed using unsupervised learning techniques, because models of trends and specific relationships between variables may not be known. In this system, we use a dendrogram data structure to store and extract the cluster solutions generated by hierarchical clustering algorithms. The system gives the end user the ability to generate graphs from data and to adjust the graph parameters dynamically. Interaction techniques let the user change visual representations dynamically and can strengthen the user's perception of information.

1. INTRODUCTION
Business intelligence applications represent an important opportunity for data mining techniques to help firms gather and analyze information about their performance, customers, competitors, and business environment. Knowledge representation and data visualization [1] tools constitute one form of business intelligence technique that presents information to users in a manner that supports business decision-making processes. In this paper, we develop a new data analysis and visualization [1] technique that presents complex multiattribute temporal data in a cohesive graphical manner by building on well-established data mining methods. Business intelligence tools gain their strength by supporting decision-makers, and our technique helps users leverage their domain expertise to generate knowledge visualization diagrams from complex data and further customize them.

Organizations and firms are capturing increasingly more data, and this data is often transactional in nature, containing multiple attributes and some measure of time. For example, through their websites, e-commerce firms capture the clickstream and purchasing behavior of their customers, and manufacturing companies capture logistics data (e.g., the status of orders in production or shipping information). One common analysis task for firms is to determine whether trends exist in their transactional data. For example, a retailer may wish to know whether the types of its regular customers are changing over time, a financial institution may wish to determine whether the major types of credit card fraud transactions change over time, and a website administrator may wish to model changes in website visitors' behavior over time. Visualizing and analyzing this type of data can be extremely difficult because it can have numerous attributes (dimensions). Additionally, it is often desirable to aggregate over the temporal dimension (e.g., by day, month, quarter, or year) to match corporate reporting standards. The approach we take in this paper is to mine the data according to specific time periods and then compare the data mining results across time periods to discover similarities.
ISSN: 2230-7818
@ 2011 http://www.ijaest.iserp.org. All rights Reserved.

1.1 TEMPORAL CLUSTER GRAPHS
In this paper, we present a new data mining technique for identifying and visualizing trends in multiattribute temporal data. We build on both temporal data mining techniques and visual data exploration techniques and develop a tool that lets the user interact with the temporal cluster graph visualization. Temporal cluster graphs use hierarchical and graph-based techniques to explore temporal data and provide interactive filtering and zooming capabilities for visualization.

1.2 NEED AND IMPORTANCE OF THE PROJECT PROBLEM
1.2.1 Existing System
Existing time-series analysis techniques mine sequences of continuous real-valued elements and are often regression-based, relying on a prespecified model definition. Moreover, standard time-series analysis techniques are typically examples of supervised learning; in other words, they estimate the effects of a set of independent variables on a dependent variable. Existing systems rarely use hierarchical clustering, which is particularly well suited to real-time updates because the clustering process has to be performed only once to create a complete set of solutions (which also makes the zoom operation very efficient).

1.2.2 Limitations of Existing System
In a business intelligence context, trend discovery may be better addressed using unsupervised learning techniques, because models of trends and specific relationships between variables may not be known. Specifically, clustering is the unsupervised discovery of groups in a data set. No existing technique gives the end user the ability to generate graphs from data and adjust the graph parameters dynamically in order to analyze the graph and observe trends. Existing schemes are time-consuming, and with them it is difficult to observe trends in multiattribute data that is temporal in nature.

1.2.3 Proposed System
This system uses clusters identified in multiple time periods and identifies trends based on similarities between clusters over time. It is a clustering approach for discovering temporal patterns that builds on temporal clustering methods and complements existing temporal mining methods. The resulting graphs contain important information about the relative proportion of common transaction types across time periods, the similarities between common transaction types, and the trends in common transaction types over time. In this project, we use a dendrogram data structure to store and extract the cluster solutions generated by hierarchical clustering algorithms, and calculations are performed on tree data structures.

1.2.4 Advantages of Proposed System
The system gives the end user the ability to generate graphs from data and adjust the graph parameters dynamically. Visual data exploration is the process of presenting data in some visual form and allowing the human to interact with the data to create insightful representations. It typically follows the "overview first, zoom and filter, then details on demand" pattern. Interaction techniques let the user change visual representations dynamically and can strengthen the user's perception of information. Interactive filtering involves dynamically partitioning a data set into segments and focusing on interesting subsets, either by direct selection or by specifying subset properties. Interactive zooming is a common technique that gives the user a variable display of data at different levels of analysis.

2. METHODOLOGY
2.1 Preprocessing Phase (Offline)
1. Transformation of data from an Excel sheet.
The user supplies, in an Excel sheet, the temporal data set whose trends are to be analyzed, and the system imports the data set from the Excel sheet into a database.
2. Partitioning according to time periods.
In the preprocessing phase, the data set is partitioned based on time periods, and each partition is clustered using one of many traditional clustering techniques, such as a hierarchical approach. The clustering results for each partition are used to generate two data structures: the node list and the edge list. Creating these lists in the preprocessing phase allows more efficient (real-time) visualization of the output graphs. Based on these data structures, graph entities (nodes and edges) are generated and rendered as a temporal cluster graph in the system output window.
3. Dendrogram tree construction and extraction.

2.2 Interactive Analysis Phase (Online)
1. Visualization of clusters.
2. Construction of the graph.
3. Node filtering (applying the Alpha filter).
4. Edge filtering (applying the Beta filter).

3. GRAPH PARAMETERS
3.1 Visualization of Clusters
Visualization of clusters refers to the ability to dynamically change the size of the clustering solution in a data partition; this is the zoom feature. Temporal cluster graphs give the user control to adjust and visualize the clustering solution for each time partition in real time. For example, in a partition with 100 data points, changing k_i (the number of clusters in partition i) from 3 to 2 would recluster the data points from three clusters into two. The zoom feature allows users to apply their domain expertise by adjusting, in real time, the underlying clustering solution used to build a trend graph and to evaluate multiple trend views interactively.
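Because the clustering runs only once per partition, the zoom operation reduces to reading a different level of the stored dendrogram. The following is a minimal pure-Python sketch of that idea, not the system's actual implementation: it assumes Euclidean distance and average linkage (the paper specifies neither), and the helper names `agglomerate` and `cut_dendrogram` are hypothetical. The naive pair search is cubic per merge and is meant only for small partitions.

```python
from itertools import combinations
from math import dist


def agglomerate(points):
    """Run agglomerative clustering once and record every merge.

    Assumes Euclidean distance and average linkage. Leaves have ids
    0..n-1; each merge creates a new cluster id n, n+1, ...
    Returns the merge history as (left_id, right_id, new_id) tuples.
    """
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    next_id = len(points)

    def average_distance(pair):
        left, right = clusters[pair[0]], clusters[pair[1]]
        total = sum(dist(points[i], points[j]) for i in left for j in right)
        return total / (len(left) * len(right))

    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=average_distance)
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, next_id))
        next_id += 1
    return merges


def cut_dendrogram(merges, n, k):
    """Extract the k-cluster solution (zoom level k_i) from the merge history.

    Replaying the first n - k merges leaves exactly k clusters, so changing
    k never requires reclustering the partition.
    """
    members = {i: [i] for i in range(n)}
    for a, b, new_id in merges[: n - k]:
        members[new_id] = members.pop(a) + members.pop(b)
    return sorted(sorted(group) for group in members.values())


# Usage: one partition of five 2-D points, clustered once, viewed at k = 3 and k = 2.
partition = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
history = agglomerate(partition)
print(cut_dendrogram(history, len(partition), 3))  # [[0, 1], [2, 3], [4]]
print(cut_dendrogram(history, len(partition), 2))  # [[0, 1, 2, 3], [4]]
```

The merge history is the dendrogram in flat form, so every zoom level costs only a replay of at most n - 1 merges.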
3.2 Alpha Filter
It is possible that not all clusters will be large enough to be considered relevant to the analysis at hand. For example, in a data set of 2,000 data points, a cluster of size s = 2 (i.e., containing only two data points) would likely be spurious for many practical applications.

3.3 Beta Filter
In temporal cluster graphs, edges represent relationships between nodes (clusters) in adjacent time partitions. Since an edge is possible between any two nodes in adjacent partitions, it is desirable to limit the edges included in a graph to those incident to "very similar" nodes, thus representing a trend over time. Because what counts as "very similar" can be domain specific, we introduce a user-specified cross-period trend strength parameter that is used to filter out spurious edges based on their weight. An edge is included in the output graph if it meets two criteria: 1) the edge is incident to two nodes that are both included in the output graph (as determined by the clustering solution and the within-period trend strength, i.e., the Alpha filter), and 2) the edge weight is less than or equal to a threshold that depends on the cross-period trend strength (the Beta filter). The edge threshold is calculated by taking the average of the weights of all possible edges between the nodes in two adjacent data partitions.

4. RESULTS
The database table contains six fields: the name of the player, hits, runs, outs, walks, and the year. All fields are numeric except the name of the player, which is a string. A sample of the data set is shown in Table 1.

TABLE 1: CRICKET DATASET
NAME    SACHIN  RAHUL  STEVE  SEHWAG
HITS    420     400    380    465
RUNS    1290    1189   546    1498
OUTS    715     750    625    596
WALKS   300     351    321    365
YEAR    2001    2000   2001   2000

For the analysis, four attributes were used: hits, runs, outs, and walks. The data was partitioned into one-year subsets. The analysis of the partitioned data reveals some interesting trends in the cricket data. First, there is a strong trend over the years for average hitters: their performance has not changed much across the years covered by the data, as indicated by the very small distances between clusters in adjacent time partitions.
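The preprocessing steps and the Alpha filter can be sketched on the Table 1 sample as follows. This is a minimal sketch under stated assumptions, not the authors' code: the cluster labels passed to `build_nodes` are hypothetical (in the system they would come from cutting each partition's dendrogram), node centroids are taken as attribute means, and the Alpha rule used here (keep a node when its size is at least `alpha` times its partition size) is an assumption, since the paper does not give its formula.

```python
from collections import defaultdict
from statistics import fmean

# Sample rows from Table 1: (name, hits, runs, outs, walks, year).
ROWS = [
    ("SACHIN", 420, 1290, 715, 300, 2001),
    ("RAHUL", 400, 1189, 750, 351, 2000),
    ("STEVE", 380, 546, 625, 321, 2001),
    ("SEHWAG", 465, 1498, 596, 365, 2000),
]


def partition_by_year(rows):
    """Split the rows into one partition of attribute vectors per year."""
    partitions = defaultdict(list)
    for _name, hits, runs, outs, walks, year in rows:
        partitions[year].append((hits, runs, outs, walks))
    return dict(sorted(partitions.items()))


def build_nodes(period, points, labels):
    """Turn one partition's cluster labels into node-list entries."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    return [
        {
            "id": f"{period}-{label}",
            "size": len(members),
            # Centroid = mean of each attribute over the cluster members.
            "centroid": tuple(fmean(column) for column in zip(*members)),
        }
        for label, members in sorted(groups.items())
    ]


def alpha_filter(nodes, partition_size, alpha):
    """Keep nodes with size >= alpha * partition_size (assumed rule)."""
    return [node for node in nodes if node["size"] >= alpha * partition_size]


partitions = partition_by_year(ROWS)
nodes_2000 = build_nodes(2000, partitions[2000], [0, 0])  # hypothetical labels
nodes_2001 = build_nodes(2001, partitions[2001], [0, 1])  # hypothetical labels
print(nodes_2000[0]["centroid"])  # (432.5, 1343.5, 673.0, 358.0)
print(len(alpha_filter(nodes_2001, len(partitions[2001]), 0.3)))  # 2
```

Under this assumed rule, `alpha = 0` keeps every node, and raising `alpha` progressively hides clusters that are small relative to their partition.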
Figure 1: Trend graph that maps multiattribute temporal transactional data to a two-dimensional graph for trend analysis, with Alpha filter = 0 and Beta filter = 2
Figure 2: Trend graph that maps multiattribute temporal transactional data to a two-dimensional graph for trend analysis, with Alpha filter = 0 and Beta filter = 0.8
Figure 3: Trend graph that maps multiattribute temporal transactional data to a two-dimensional graph for trend analysis, with Alpha filter = 0.3 and Beta filter = 0.8
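The Beta (edge) filter that Figures 1-3 vary can be sketched in the same hedged way. The paper states that the edge threshold is the average weight of all possible edges between two adjacent partitions and that it depends on the cross-period trend strength; this sketch additionally assumes that an edge's weight is the Euclidean distance between cluster centroids and that the threshold is `beta` times that average. Both assumptions, and the names `build_edges` and `beta_filter`, are illustrative only.

```python
from itertools import product
from math import dist
from statistics import fmean


def build_edges(nodes_a, nodes_b):
    """All possible edges between two adjacent partitions.

    Assumed weight: Euclidean distance between cluster centroids,
    so a smaller weight means more similar clusters.
    """
    return [
        (a["id"], b["id"], dist(a["centroid"], b["centroid"]))
        for a, b in product(nodes_a, nodes_b)
    ]


def beta_filter(edges, kept_ids, beta):
    """Keep edges whose endpoints both survived node filtering and whose
    weight is <= beta * (average weight of all possible edges)."""
    threshold = beta * fmean(weight for _, _, weight in edges)
    return [
        (u, v, w)
        for u, v, w in edges
        if u in kept_ids and v in kept_ids and w <= threshold
    ]


# Usage with two hypothetical partitions of two clusters each.
period_1 = [
    {"id": "p1-0", "centroid": (0.0, 0.0)},
    {"id": "p1-1", "centroid": (10.0, 0.0)},
]
period_2 = [
    {"id": "p2-0", "centroid": (0.0, 1.0)},
    {"id": "p2-1", "centroid": (10.0, 2.0)},
]
edges = build_edges(period_1, period_2)
kept = {"p1-0", "p1-1", "p2-0", "p2-1"}
print([(u, v) for u, v, _ in beta_filter(edges, kept, 0.8)])  # [('p1-0', 'p2-0'), ('p1-1', 'p2-1')]
print(len(beta_filter(edges, kept, 2)))  # 4
```

The threshold is averaged over all possible edges, including those touching nodes the Alpha filter removed, while the endpoint check drops any edge whose node is hidden; under this assumed rule a larger `beta` admits more edges.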
5. CONCLUSION
By harnessing computational techniques of data mining, we have developed a new temporal clustering technique for discovering, analyzing, and visualizing trends in multiattribute temporal data. The proposed technique is versatile, and its implementation in our system gives the user significant data representation power. Domain experts can adjust parameters and clustering mechanisms to fine-tune trend graphs. We demonstrated that the implementation is scalable: the time required to adjust trend parameters is quite low even for larger data sets, which provides real-time visualization capabilities. Furthermore, the proposed temporal clustering analysis technique is applicable in many different data analysis contexts and can provide insights for analysts performing historical analyses and generating forecasts.

In this project, annual batting statistics were collected for each player in the cricket data set, and four attributes were used: hits, runs, outs, and walks. The data was partitioned into one-year subsets, and the analysis of the partitions reveals some interesting trends. First, there is a strong trend over the years for average hitters, whose performance has changed little, as indicated by the very small distances between clusters in adjacent time partitions. Another interesting trend is the periodic appearance of clusters of power hitters (i.e., hitters with significantly more runs). Additionally, subpar hitters are apparent in the early years of the data set but are either absorbed by other clusters or are less prevalent in later years.

In summary, the system visualizes trends in multiattribute transactional data that is temporal in nature.

6. REFERENCES
[1] J. Abello and J. Korn, "MGV: A System for Visualizing Massive Multidigraphs," IEEE Trans. Visualization and Computer Graphics, vol. 8, no. 1, pp. 21-38, Jan.-Mar. 2002.
[2] R. Agrawal, K.I. Lin, H.S. Sawhney, and K. Shim, "Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases," Proc. 21st Int'l Conf. Very Large Data Bases (VLDB '95), pp. 490-501, 1995.
[3] M.S. Aldenderfer and R.K. Blashfield, Cluster Analysis. Sage Publications, 1984.
[4] C.M. Antunes and A.L. Oliveira, "Temporal Data Mining: An Overview," Proc. ACM SIGKDD Workshop Data Mining, pp. 1-13, Aug. 2001.
[5] C. Apte, B. Liu, E. Pednault, and P. Smyth, "Business Applications of Data Mining," Comm. ACM, vol. 45, no. 8, pp. 49-53, 2002.
[6] G. Di Battista, P. Eades, R. Tamassia, and I.G. Tollis, Graph Drawing. Prentice Hall, 1999.
[7] B. Becker, R. Kohavi, and D. Sommerfield, "Visualizing the Simple Bayesian Classifier," Proc. ACM SIGKDD Workshop Issues on the Integration of Data Mining and Data Visualization, 1997.
[8] B. Bederson, "Pad++: Advances in Multiscale Interfaces," Proc. Conf. Human Factors in Computing Systems (CHI '94), p. 315, 1994.
[9] D.J. Berndt and J. Clifford, "Finding Patterns in Time Series: A Dynamic Programming Approach," Advances in Knowledge Discovery and Data Mining, pp. 229-248, 1995.
[10] J. Bertin, Semiology of Graphics: Diagrams, Networks, Maps, W.J. Berg, translator, Univ. of Wisconsin Press, 1983.
[11] C.G. Beshers and S.K. Feiner, "Visualizing n-Dimensional Virtual Worlds within n-Vision," Computer Graphics, vol. 24, no. 2, pp. 37-38, 1990.
[12] C.G. Beshers and S.K. Feiner, "AutoVisual: Rule-Based Design of Interactive Multivariate Visualizations," IEEE Computer Graphics and Applications, vol. 13, no. 4, pp. 41-49, 1993.