Paper id 26201438

Page 1

International Journal of Research in Advent Technology, Vol.2, No.6, June 2014 E-ISSN: 2321-9637

Analyzing the Materialized View Selection Cost Mrs. P.S. Gotmare1, Mr. A. Mohod2, Ms. M.J. Sawarkar3, Mrs. R. S. Ashtankar4 Information Technology1,4 , Computer Science & Engineering2, 3, P.J.L.C.E.1, 2, 3, 4 Email: priya.gotmare@gmail.com1 , mohod.ashish@gmail.com2 Abstract- Data warehouse store information collected from the heterogeneous, sovereign and distributed databases. Many applications require access to the distributed data warehouses. The distributed data warehouses are collected at the centralized data warehouse for the sec of maintenance. The materialized view can significantly improve the performance of distributed databases. Materialized view selection impacts on the efficiency as well as the total cost of establishing and running a data warehouse. Selecting views for materialization plays the crucial role. When join query are executed repeatedly could induce the traffic, affects the overall performance of the distributed system. A Materialized Query Table Advisor (MQTA) is often used to recommend and create Materialized view. Index Terms- data warehouse, materialized view, algorithm, MV 1. INTRODUCTION Data warehouse is the repository of large data from various data sources consist of set of materialized views. Materialized view is a technique which helps to answer user queries quickly, effectively and efficiently. Materialized views are the derived relations, which are stored as relations in the database. In materialized view recompilation of query is avoided due to which it is possible to improve the performance of query execution. In distributed environment where many heterogeneous nodes are used with different constraints on CPU,IO are used, where each node issue many different queries and update at different rate this is called distributed view selection problem[5]. When a base relation is update, all its dependant materialized views have to be updated in order to maintain the consistency and integrity of the database. Due to this it is difficult to find the optimal set of materialized views in complex, distributed scenario. The process of updating a materialized view in response to the changes in the base relation is called ‘View Maintenance’ that incurs a ‘View Maintenance Cost’. Due to the space and time constraints it is not possible to materialize all the views. This need to select appropriate set of vies to materialize for answering the queries was denoted view selection problem (VSP). This impact on the efficiency as well as the total cost of establishing and running the data warehouse. Materialized views associated with two types of cost special cost and query cost. Firstly the materialized view need storage space this leads to special cost, secondly when user’s query make use of materialized view leads to query cost. Three factors must be considered while selecting materialized view that is query cost, maintenance cost and special cost [1]. Therefore, to select an appropriate set of a view is the major target that diminishes the entire query response time and also maintains the selected views. So, many literatures try

to make the sum of that cost minimal [2-4] Materialization of all possible views is not recommended due to memory space and time constraints [1]. Now a days disk space is very cheap due to this the factor that prevents us from materializing all the views in the data warehouse is not the space constraint but fast answers for queries. the real constraining factor is the response time that reflects users’ needs for fast answers. MV selection for complex distributed scenario is difficult because of the fact that the distributed view selection problem is known to be NP-hard [3], due to the fact that the distributed DBMS consisting of nodes with different above mentioned resource constraints, this results in non monotonic cost model and hence the greedy algorithm which propose MVs based on optimal solution cannot be applied. The number of possible views to materialize grows exponentially with the number of computer nodes and queries, and with the numbers of columns, joins predicates, grouping clauses and tables referenced in each query. Due to the huge solution space, brute-force strategies, e.g., backtracking, as well as biology inspired solutions like ant colony optimization or genetic algorithms cannot be applied directly[5]. The distributed view selection it is important to obtain good MVs even thought the underlined cost model is simplified or inaccurate. In different business scenario guaranteed quality of services is required and some machines may be designed for specific queries. The view selection must allow exchange of model. Materialized view selection consists of three optimization problems, i.e., query optimization, multiple query optimization, and materialized view selection. The layout of the paper is as follows. In section 2, we address related work. Section 3, presents a comparative study of the various research works explored in the previous section. Lastly, we conclude in section 4 and section 5 will provide the references.

79


International Journal of Research in Advent Technology, Vol.2, No.6, June 2014 E-ISSN: 2321-9637 2. RELATED WORK Data warehouse is a large repository of data used for querying and analysis due to which it is crucial to identify and store the most appropriate set of materialized views in the warehouse. Gupta and Mumick [6] proposed an inverted-tree greedy algorithm and an A*-heuristic to reduce the maintenance cost incurred with materialized view selection and claimed that the A*-heuristic can guarantee to reach an optimal solution. [8] Two approaches are being proposed in this work for multi query optimization. Both approaches consider a cost model that combines query and maintenance costs of views. In first approach, three algorithms are proposed (1) The IRVSA (Incremental Recomputation strategy View Selection Algorithm, (2)The IVSA Incremental strategy View Selection Algorithm, and (3) The RVSA Recomputation strategy View Selection Algorithm. The second approach presents a strategy for searching the view to reduce the search space. It propose the IMDVSA (Incremental strategy Materialized Descendants View Selection Algorithm) both the approaches achieves the multi query optimization by using an MVPP ((Multi-View Processing Plan) as a search space. A view is added in the list of materialized views by calculating the total cost which has two components: The total query cost of all queries and the total maintenance cost of all materialized views. These two costs depend upon the frequency of querying and frequency of updating views respectively. Calculations are done to find as to whether the addition of view to set of materialized view has decreased the total cost. If it results in reducing the total cost then a positive benefit results, and the view should be materialized. The view maintenance optimization is achieved through incremental and recomputation strategies in the cost model. In the IRVSA algorithm of first strategy, the views are ordered in a descending order according to their benefit when not materialized, and inserted into the list. The views are then picked from list one at a time. The benefit of the view in the list is calculated and views of the interested benefit are considered for materialization. IMDVSA Algorithm This algorithm presents a simple search strategy that reduces the search space when incremental update strategy is used. It is based on a fact that when a view is materialized , materialize its descendants. Thus the descendants of a materialized view are removed from the search space. It uses certain views which are used to update other views. Several simulation experiments were carried out to study the

performance of the proposed algorithms compared to each other. [7] an hybrid evolutionary algorithm which combine deterministic and randomized algorithms is proposed to optimizing global processing plans and materialized view selection. The data model considered is based on selection-projection-join (SPJ) model. It considers the issues 1) query optimization; 2) multiple query optimization; 3) materialized view selection. Query optimization finds the query with lowest processing cost, multiple query optimizations, multiple join query processing plan is constructed using directed acyclic graph (DAG). This paper adopts a holistic approach to materialized view selection and considers local processing plans, global processing plans, and materialized view selection in an integrated framework and algorithm. The design of the algorithm is a two level structure which creates many global processing. The higher level algorithm searches for good global processing plans from local processing plans based on queries. The lower level algorithm selects the best set of materialized views with the minimal total cost for a particular global processing plan. Evolutionary Algorithms This algorithm is suited for large and complex problems where little prior knowledge is available experiment is carried out which shows that the hybrid evolutionary algorithm delivers better performance than both the evolutionary algorithm and heuristics used alone in terms of the minimal query and maintenance cost and the evaluation cost to obtain the minimal cost. [9] this work addresses the issue of the maintenancecost view-selection Problem.i.e. to select a set of materialized views under certain resource constraints for the purpose of minimizing the total query processing cost. A new constrained evolutionary algorithm has been proposed. Constraints are incorporated into the algorithm through a stochastic ranking procedure. No penalty functions are used. New Stochastic Ranking Evolutionary Algorithm Similar to other algorithms this algorithm also uses both crossover and mutation. In this work the crossover operator used is uniform crossover operator It exchanges the information of two chromosomes to generate two new chromosomes. The mutation operator we use is similar to the most usually used one, this algorithm differs from other evolutionary algorithm is that it uses the modified stochastic rank procedure. It is used for ranking the union of new and old individuals. [1] in this paper the author have developed the algorithms to select a set of views to materialize in data warehouse in order to minimize the total view

80


International Journal of Research in Advent Technology, Vol.2, No.6, June 2014 E-ISSN: 2321-9637 maintenance cost under the constraint of a given query response time. They named this problem as query cost view selection problem (QC_VSP). To solve this QC_VSP problem author define and constructed View selection cost graph (VSCG). In order to carry out the storage and computation of view effectively from this VSCG a view node matrix (VNM) is constructed. Author gave three algorithms for solving QC_VSP; namely Greedy Dynamic Improved Algorithm (GR_VSP), genetic algorithm (GA_VSP) and a improved synthesis algorithm (SAGA_VSP). They do the experiments to show that the algorithm they adapted produce optimal solution. The author combines the genetic algorithm with the simulated annealing algorithm and solve QC_VSP problem in the paper. The SAGA_VSP algorithm proposed can produce better solution. The approximate optimization solution produced by combining algorithms is stable. Simulation is adopted. The results show that this algorithm works better in practical cases. They implemented their algorithms and a performance study of the algorithms shows that the proposed algorithm delivers an optimal solution. Finally, they discuss the observed behavior of the algorithms. They also identify some important issues for future investigations. By carrying out experiment and comparing above mentioned approaches they attain the following conclusions. (1)So as to QC_VSP problems could be solved by applying random algorithm. In the paper, they adopt genetic algorithm and the synthesis algorithm which is composed by genetic algorithm and simulated annealing algorithm and obtain a approximate optimum materialized views. (2) Random algorithm is better than traditional greedy algorithm in VSP, whatever is the quality of solution and spatial cost. The solutions of synthesis algorithm are better than that of single random algorithm.

(3) The parameters could be adjusted to obtain balance between the quality and performing time of solution in the random method. This method is suitable to solve the large-scale problems. It will be valuable tool in data warehouse exploitation to perform materialized view selection in the random algorithm. 3. PROBLEM SPECIFICATION To avoid accessing the original data sources and increase the efficiency of the queries posed to a DW, Based on a set of frequently asked DW queries, select a set of views to materialize so that the total query and maintenance cost is minimized under certain constraints these constraints like disk space and maintenance cost. Maintenance cost consider that how long the views must be updated. The overall considered scenario is shown in the figure [10].

Figure1: space and time reduction scheme Hence, view selection problem is formally defined as a process of identifying and selecting a group of materialized views that are most closely-associated to user-defined requirements in the form of queries in order to minimize the query response time, maintenance cost and query processing time under certain resource constraints [1].

Author

Technique

Issue Addressed

Proposed work

Pros

Cons

tools

Noha A.R. Yousri, Khalil M.Ahmed, Nagwa M. EI -Makky ( 2005) [8]

Materialized view selection

View selection under disk space & maintenance cost

Decrease time complexity

IMDVSA Uses only incremental update

Not addressed

Chuan Zhang, Xin Yao, and Jian Yang (2001)[7]

An hybrid evolutionary algorithm

Materialized view selection

IRVSA incremental Recomputation strategy View selection algorithm + IMDVSA (Incremental Strategy Materialized Descendants view selection algorithm). An abstract framework with hybrid evolutionary algorithm

Reduced query and view selection cost

Not addresses

SUN OS 5.5.& C language

81


International Journal of Research in Advent Technology, Vol.2, No.6, June 2014 E-ISSN: 2321-9637

Jeffrey Xu Yu, Xin Yao, ChiHon Choi, and Gang Gou[9]

Select materialized views

View selection under disk space & maintenance cost

Stochastic Ranking Evolutionary Algorithm

Minimize total query processing cost for OLAP queries

Not address ed

Lijuan Zhou1,Min Xu ,Qian Shi ,Zhongxiao Hao(2008) [1]

View selection & maintenance cost

View selection under disk space & maintenance cost constraints.

Comparison of Greedy Algorithm (GR_VSP) + genetic algorithm. (GA_VSP) + synthesis algorithm (SAGA_VSP), for QC_VSP (Query cost_ view selection problem)

reducing search space minimizing maintenance cost

Single random algorith m

Sun Blade/ 1000 workstation with a 750 M Hz UltraSPAR C-III, Solaris 2.8 Not addressed

4. CONCLUSION Materialized view query and maintance cost being an important parameters for materialized view selection. The various methods for materialized view selection cost have been analysed by considering the diffrent parameters. Comparitive analysis of the methods have been done and arranged in a tabular format. REFERENCES [1] Lijuan Zhou, Min Xu, Qian Shi, Zhongxiao Hao. Research on Materialized Views Technology in Data Warehouse. Proceeding of Knowledge Acquisition and Modeling Workshop, 2008. KAM Workshop 2008. IEEE International Symposium on, 21-22 Dec. 2008. [2] Shukla, A., Deshpande, P.M., and Naughton, J.F. Materializes View Selection for Multidimensional Datasets. In Proc. 24th VLDB Conference, pp. 488-499, August 1998. [3] H. Gupta. “Selection of Views to Materialize in a Data Warehouse”. Proceedings of International Conference on Database Theory, Athens, Greece 1997. [4] C. Zhang, X. Yao, and J. Yang. “An evolutionary Approach to Materialized View Selection in a Data Warehouse Environment”. IEEE Transactions on Systems, Man and Cybernetics, vol. 31, no.3, pp. 282–293, 2001. [5] Leonardo Weiss F. Chaves, erik Bhchman,Fabian hueske, Klemens bohm. “Towards Materialized View Selection for Distributed Databases”. ACM EDBT 2009,saint Petersburg Russia. [6] H. Gupta and I. S. Mumick, “Selection of views to materialize under a maintenance cost

constraint,” in Proc. 7th Int. Conf. Database Theory, 1999, pp. 453–470. [7] Chuan Zhang, Xin Yao, Senior Member, IEEE, and Jian Yang “An Evolutionary Approach to Materialized Views Selection in a Data Warehouse Environment” IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART C: APPLICATIONS AND REVIEWS, VOL. 31, NO. 3, AUGUST 2001. [8] Noha A. R. Y ousri Khalil M. Ahmed Nagwa M. El-Makky, “Algorithms for Selecting Materialized Views in a Data Warehouse”,IEEE,2005. [9] Jeffrey Xu Yu, Xin Yao, Fellow, IEEE, Chi-Hon Choi, and Gang Gou,” Materialized View Selection as Constrained Evolutionary Optimization”,IEEE,2003. [10] T.Nalini, , Dr. A.Kumaravel, Dr.K.Rangarajan, “A comparative study analysis of materialized view for selection cost”, International Journal of Computer Science & Engineering Survey (IJCSES) Vol.3, No.1, February 2012.

82


Turn static files into dynamic content formats.

Create a flipbook
Issuu converts static files into: digital portfolios, online yearbooks, online catalogs, digital photo albums and more. Sign up and create your flipbook.