Data warehouse architectures - I

Page 1

Data Warehouse Architectures - I Sabir Asadullaev, Execitive IT Architect, SWG IBM EE/A 19.10.2009 http://www.ibm.com/developerworks/ru/library/sabir/axd_1/index.html

Abstract This paper starts a series of three articles on data warehousing (DW) architectures and their predecessors. The abundance of various approaches, methods and recommendations makes a mess of concepts, advantages and drawbacks, limitations and applicability of specific architecture solutions. The first article is concerned with the evolution of OLAP role understanding, with DW architecture components, with virtual DW and independent data marts. The second article considers the Centralized DW (CDW) with ETL (Extract, Transform, Load), CDW with ELT (Extract, Load, Transform), CDW with operational data store, and extended model with data marts. The third article discusses centralized ETL with parallel DW and data marts; DW with intermediate application data marts, DW with integration bus, and the recommended DW architecture.

OLAP and OLTP Any transactional system usually contains two types of tables. One of them is responsible for quick transactions. For example, ticket sale system has to ensure reliable exchange of short messages between the system and a large number of ticket agents. Indeed, entered and printed information that deals with passengers’ names, flight dates, flight number, seat, flight destination, can be estimated to be around 1000 bytes. Thus, passenger service requires fast processing of short records. Another type of tables contains a summary of sales for a specified time, by directions, and by categories of passengers. These tables are used by analysts and financial specialists in the end of month, quarter, or year, when company’s financial results are needed. And if the number of analysts is ten times smaller than the number of ticket agents, the volume of data required for analysis exceeds an average transaction size by several orders of magnitude. Not surprisingly, execution of analytical queries increases the system’s response time to ticket availability request. Creating a system with a reserve of computational power can mitigate the negative impact of analytical processing load on transactional activity, but leads to a significant cost increase of the required software and hardware, due to the fact that excess processing capacity remains unused most of the time. The second factor that led to the separation of analytical and transactional systems is different requirements that are applied by analytical and transactional systems to computing systems. The OLAP story begins in 1993 when an article "Providing OLAP (online analytical processing) to users - analysts" was published [1]. Initially, it appeared that the separation of transactional and analytical systems (OLTP - OLAP) is sufficient. However, it soon became clear that OLAP systems cope badly with the task of mediator between various data sources (including transactional systems) and client applications. It became clear that analytical data storage environment is required. Initially shared databases were proposed for this role, which implied the need to copy and store the original data from data sources. This idea turned out to be quite unviable, as transactional systems were commonly developed without a unified plan, and thus they contained conflicting and inconsistent information.

5


Turn static files into dynamic content formats.

Create a flipbook
Issuu converts static files into: digital portfolios, online yearbooks, online catalogs, digital photo albums and more. Sign up and create your flipbook.
Data warehouse architectures - I by Sabir Asadullaev - Issuu