Paper id 26201438


International Journal of Research in Advent Technology, Vol.2, No.6, June 2014 E-ISSN: 2321-9637

Analyzing the Materialized View Selection Cost

Mrs. P.S. Gotmare1, Mr. A. Mohod2, Ms. M.J. Sawarkar3, Mrs. R. S. Ashtankar4
Information Technology1,4, Computer Science & Engineering2,3, P.J.L.C.E.1,2,3,4
Email: priya.gotmare@gmail.com1, mohod.ashish@gmail.com2

Abstract- A data warehouse stores information collected from heterogeneous, autonomous and distributed databases. Many applications require access to distributed data warehouses. The distributed data warehouses are consolidated at a centralized data warehouse for the sake of maintenance. Materialized views can significantly improve the performance of distributed databases. Materialized view selection affects both the efficiency and the total cost of establishing and running a data warehouse, so selecting views for materialization plays a crucial role. Join queries that are executed repeatedly can induce network traffic and degrade the overall performance of the distributed system. A Materialized Query Table Advisor (MQTA) is often used to recommend and create materialized views.

Index Terms- data warehouse, materialized view, algorithm, MV

1. INTRODUCTION

A data warehouse is a repository of large volumes of data drawn from various data sources, and it consists of a set of materialized views. A materialized view is a technique that helps to answer user queries quickly, effectively and efficiently. Materialized views are derived relations that are stored as relations in the database. With a materialized view, recomputation of the query is avoided, which improves the performance of query execution. In a distributed environment, many heterogeneous nodes with different resource constraints (CPU, I/O) are used, and each node issues many different queries and updates at different rates; selecting the views to materialize in this setting is called the distributed view selection problem [5]. When a base relation is updated, all its dependent materialized views have to be updated in order to maintain the consistency and integrity of the database.
Because of this, it is difficult to find the optimal set of materialized views in a complex, distributed scenario. The process of updating a materialized view in response to changes in the base relation is called 'view maintenance', and it incurs a 'view maintenance cost'. Due to space and time constraints, it is not possible to materialize all views. The need to select an appropriate set of views to materialize for answering queries is known as the view selection problem (VSP). This selection affects the efficiency as well as the total cost of establishing and running the data warehouse. Materialized views are associated with two types of cost: spatial cost and query cost. First, a materialized view needs storage space, which leads to the spatial cost; second, answering a user's query from a materialized view incurs the query cost. Three factors must be considered when selecting materialized views: query cost, maintenance cost and spatial cost [1]. The main goal is therefore to select a set of views that minimizes the total query response time together with the cost of maintaining the selected views, and many works in the literature try to minimize the sum of these costs [2-4].

Materialization of all possible views is not recommended due to memory space and time constraints [1]. Nowadays disk space is very cheap, so the factor that prevents us from materializing all views in the data warehouse is not the space constraint; the real constraining factor is the response time, which reflects users' need for fast answers. MV selection in a complex distributed scenario is difficult because the distributed view selection problem is known to be NP-hard [3]. Moreover, because the distributed DBMS consists of nodes with the different resource constraints mentioned above, the cost model is non-monotonic, and hence greedy algorithms that propose MVs based on an optimal solution cannot be applied. The number of possible views to materialize grows exponentially with the number of computer nodes and queries, and with the number of columns, join predicates, grouping clauses and tables referenced in each query. Due to the huge solution space, brute-force strategies, e.g., backtracking, as well as biology-inspired solutions such as ant colony optimization or genetic algorithms, cannot be applied directly [5]. In distributed view selection it is important to obtain good MVs even though the underlying cost model is simplified or inaccurate. In different business scenarios a guaranteed quality of service is required, and some machines may be designed for specific queries. View selection must therefore allow the cost model to be exchanged. Materialized view selection consists of three optimization problems, i.e., query optimization, multiple query optimization, and materialized view selection.

The layout of the paper is as follows. In section 2, we address related work. Section 3 presents a comparative study of the various research works explored in the previous section. Lastly, we conclude in section 4, and section 5 provides the references.
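As a rough illustration of the cost factors discussed in this introduction, the following Python sketch evaluates a tiny, single-node instance of the view selection problem by exhaustive enumeration. Everything in it is an assumption made for illustration and is not taken from the paper: the view names, sizes, maintenance costs, query frequencies and per-view query costs are invented, storage is treated as a budget rather than as a term in the sum, and the objective is simply query cost plus maintenance cost. The enumeration visits all 2^n subsets of the n candidate views, which shows why exhaustive search does not scale.

```python
from itertools import combinations

# Hypothetical candidate views (names and numbers are invented for illustration).
VIEWS = {
    "v_sales_by_region": {"size": 40, "maintenance": 6},
    "v_sales_by_month": {"size": 30, "maintenance": 4},
    "v_top_customers": {"size": 50, "maintenance": 9},
}

# Hypothetical workload: each query has a frequency, a cost when answered from
# the base relations, and a cost when answered from a given materialized view.
QUERIES = [
    {"freq": 10, "base_cost": 100, "view_cost": {"v_sales_by_region": 10}},
    {"freq": 5, "base_cost": 80,
     "view_cost": {"v_sales_by_month": 8, "v_sales_by_region": 40}},
    {"freq": 2, "base_cost": 120, "view_cost": {"v_top_customers": 15}},
]


def query_cost(selected):
    """Frequency-weighted cost of the workload, using the cheapest available plan."""
    total = 0
    for q in QUERIES:
        options = [q["base_cost"]]
        options += [c for v, c in q["view_cost"].items() if v in selected]
        total += q["freq"] * min(options)
    return total


def maintenance_cost(selected):
    """Cost of keeping the selected views consistent with the base relations."""
    return sum(VIEWS[v]["maintenance"] for v in selected)


def storage_cost(selected):
    """Space occupied by the selected views."""
    return sum(VIEWS[v]["size"] for v in selected)


def total_cost(selected):
    """Assumed objective: query cost plus maintenance cost."""
    return query_cost(selected) + maintenance_cost(selected)


def best_selection(space_budget):
    """Try every subset of views; return ((cost, views), subsets_examined)."""
    names = sorted(VIEWS)
    best = (total_cost(frozenset()), frozenset())
    examined = 0
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            examined += 1
            chosen = frozenset(subset)
            if storage_cost(chosen) > space_budget:
                continue  # violates the storage budget
            cost = total_cost(chosen)
            if cost < best[0]:
                best = (cost, chosen)
    return best, examined


if __name__ == "__main__":
    (cost, chosen), examined = best_selection(space_budget=80)
    print(sorted(chosen), cost, examined)
```

With these invented numbers and a storage budget of 80, the search examines all 2^3 = 8 subsets and selects the two sales views with a total cost of 390, compared with 1640 when nothing is materialized. Adding candidate views doubles the number of subsets each time, and in the distributed setting each view would additionally have to be assigned to a node.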
