DATA WAREHOUSE
What is a Data Warehouse ď ľ
A Data Warehouse can be defined as a centralized, consistent data store or Decision Support System (OLAP), for the end business users for analysis, prediction and decision making in their business operations.
ď ľ
Data from various enterprise-wide application/transactional source systems (OLTP), are extracted, cleansed, integrated, transformed and loaded in the Data Warehouse.
Benefits of data warehouse ď ľ
A data warehouse helps to integrate data and store them historically so that we can analyze different aspects of business including, performance analysis, trend, prediction etc. over a given time frame and use the result of our analysis to improve the efficiency of business processes.
ď ľ
Data warehouses are built to facilitate reporting on different key business processes of an organization, known as KPI.
ď ľ
Data warehouse can be further used for trend prediction, forecasts, pattern recognition etc.
Properties of a Data Warehouse
Subject Oriented - Data Warehouse is designed based on the major subjects areas of the business, rather than the major application areas of the enterprise. The ultimate necessity is to store decision-support data rather than application-oriented data.
Integrated - The data in Data Warehouse comes from different enterprise-wide application source systems. The source data is often inconsistent in nature. During the Integration process, source data must be made consistent in Data Warehouse so that the data becomes homogeneous and uniform.
Time-Variant - The source data in the Data Warehouse is only accurate and valid at some point in time or over some time interval.
Non-Volatile - Data in Data Warehouse is not updated or replaced in real time but is refreshed from operational systems on a regular basis. New data is always added incrementally integrating it with the previous data set.
Schematic View of a data warehouse
ETL process ETL (Extract, Transform and Load) is a process in data warehousing responsible for pulling data out of the source systems and placing it into a data warehouse. ETL involves the following tasks: ď ľ
extracting the data from source systems (SAP, ERP, other operational systems), data from different source systems is converted into one consolidated data warehouse format which is ready for transformation processing.
continued‌.
Transforming the data may involve the following tasks:
Applying business rules
Cleaning (e.g., mapping NULL to 0 or "Male" to "M" and "Female" to "F" etc.),
Filtering (e.g., selecting only certain columns to load)
Splitting a column into multiple columns and vice versa
Joining together data from multiple sources
Transposing rows and columns
Applying any kind of simple or complex data validation
loading the data into a data warehouse or data repository other reporting applications
ETL layer is mostly responsible for 2 types of treatments on the data: ď ľ
Data Integration - process of combining heterogeneous data sources in to a single queriable schema so as to get an unified view of these data.
ď ľ
Qualitative Treatment - so that the validity and quality of the data can be checked (and if required corrected) before loading to the data warehouse.
THANK YOU