The IT Perspective of Business Intelligence

Page 1

The IT Perspective: Data Warehousing, Management, and Analytical Structures


Objectives • Explain the basics of: Master Data Management Data Warehousing ETL OLAP/Multidimensional Data



SQL Services – Why? • Install only the ones you need • Which? – Integration Services • Get your data from the world outside (ETL)

– Analysis Services • Cubes, Data Mining, support for PowerPivot on SharePoint

– Reporting Services • DIY Report Builder and traditional “big” reports

– Master Data Services • Quality of critical master data (cities, colours, customers)

– Database Engine • Data warehouse and OLTP relational storage


Master Data Management


MDM • Ensures consistency of data across all organisational uses • Impacts overall data quality • Processes and tools for: – Collection, aggregation, matching, distribution, and persistence of master data • Consistently

– Related to Federated Data Management

• Key to MDM: Modelling


Why MDM? It’s About Evolution of Enterprise Architecture


MDM Processes

• Batched Acquisition from Staging Tables • Members, Attributes, Parent-Child Relationships • SQL Integration Services

Import & Integration

Modeling • Versioning Changes • Auditing • Compliance • Tracking of Instances

• Subscription Views • Export to: • Operational Systems • Data Warehouses • BI Analytics • Reporting Tools

Export & Subscription


Microsoft Master Data Services SQL 2008 R2 Enterprise, Datacenter, Developer

• Tools: – Master Data Manager • Primary tool for managing your master data

– MDS Configuration Manager • IT Pro tool

– MDS Web Service • For developers wanting to extend MDS

• Concepts: – – – – – – – –

Models Entities Attributes Members Hierarchies Collections Versions Database


Modelling Master Data • Model organises data at highest level – Allowing versioning of changes to data

• There are typically four categories of models: – – – –

People (Customers, Staff) Places (Geographies, Cities, Countries) Things (Products) Concepts (Accounts, Behaviours, Transactions)


Example: Product MDM Model Product (model)

Product (entity)

Name (freeform attr)

Code (freeform attr)

Subcategory (domainbased attr)

StandardCost (free-form attr)

Name (freeform attr)

Code (freeform attr)

Category (domainbased attr)

Name (freeform attr)

ListPrice (free-form attr)

Code (freeform attr)

Photo (file attr)


Data Warehouse


Rich Connectivity Data Providers ODBC SQL Server

SAP

NetWeaver BI SQL Server

Integration Services

DB2

MySAP

SQL Server

Report Server Models

Teradata

XML

SQL Server

Data Mining Models

SQL Server

Analysis Services

OLE DB

Hyperion Essbase

Oracle


Star Schema


Star Schema Benefits • Simple, not-so-normalised model • High-performance queries – Especially with Star Join Query Optimisation

• Mature and widely supported • Low-maintenance


Snowflake Dimension Tables • Define hierarchies using multiple dimension tables • Support fact tables with varying granularity • Simplify consolidation of heterogeneous data

Potential for slower query performance in relational reporting No difference in performance in Analysis Services database


Fact Table Fundamentals • Collection of measurements associated with a specific business process • Specific column types – Foreign keys to dimensions – Measures – numeric and additive – Metadata and lineage

• Consistent granularity – the most atomic level by which the facts can be defined


Fact Table Examples

Reseller sales data by: •Product •Order Date •Reseller •Employee •Sales Territory

Sales quota data by: •Employee •Time

Quarter Grain Day Grain


Date Dimension Table • Most common dimension used in analysis (aka Time dimension) • Use consistently with all facts • Useful common attributes – Year, Quarter, Month, Day – Time series analysis support – Navigation and summarisation enabled with hierarchies, such as calendar or fiscal

• Single table design (typically not snowflake design) Tip: Format the key of the dimension as yyyymmdd (e.g. 20100115) to make it readily understandable


Parent-Child Hierarchy • A dimension that contains a parent attribute • A parent attribute describes a self-referencing relationship, or a self-join, within a dimension table • Common examples – Organisational charts – General Ledger structures – Bill of Materials


Parent-Child Hierarchy Example

Brian Amy

Stacia

Shu

Stephen

Syed

Michael

JosĂŠ

Peter


Slowly Changing Dimensions • Maintain historical context as dimension data changes • Three common ways (there are more): – Type 1: Overwrite the existing dimension record – Type 2: Insert a new ‘versioned’ dimension record – Type 3: Track limited history with attributes


Integration and ETL


Let’s do ETL with SSIS • SQL Server Integration • SSIS object model • Two distinct runtime

Services (SSIS) service

engines:

– Control flow – Data flow

• 32-bit and 64-bit

editions


The Package • The basic unit of work, deployment, and execution • An organised collection of: – – – – – –

Connection managers Control flow components Data flow components Variables Event handlers Configurations

• Can be designed graphically or built programmatically • Saved in XML format to the file system or SQL Server


Control Flow • Control flow is a process-oriented workflow engine • A package contains a single control flow • Control flow elements – – – –

Containers Tasks Precedence constraints Variables


Data Flow • The Data Flow Task – Performs traditional ETL and more – Fast and scalable • Data Flow Components – Extract data from Sources – Load data into Destinations – Modify data with Transformations • Service Paths – Connect data flow components – Create the pipeline


demos Using SQL Server Integration Services Objectives: Gain an overview understanding of UI and approach of Integration Services Appreciate additional capability and functionality


demos What did we see? How to build up SSIS tasks How to deploy SSIS packages and deliver comprehensive ETL


OLAP/Multidimensional Data


Cube = Unified Dimensional Model • Multidimensional data • Combination of measures and dimensions as one conceptual model – Measures are sourced from fact tables – Dimensions are sourced from dimension tables


Dimensions • Members from tables/views in a data source view (based on a Data Warehouse) • Contain attributes matching dimension columns • Organise attributes as hierarchies – One All level and one leaf level – User hierarchies are multi-level combinations of attributes – Can be placed in display folders

• Used for slicing and dicing by attribute


Hierarchies • Benefits – View of data at different levels of summarisation – Path to drill down or drill up

• Implementation – Denormalised star schema dimension – Normalised snowflake dimension – Self-referencing relationship


Hierarchy • • • •

Defined in Analysis Services Ordered collection of attributes into levels Navigation path through dimensional space Very important to get right! Customers by Geography

Customers by Demographics

Country Marital State Gender City Customer Customer


Measure Group • Group of measures with same dimensionality • Analogous to a fact table • Cube can contain more than one measure group – E.g. Sales, Inventory, Finance

• Defined by dimension relationships


Measure Group Measure Group

Dimension

Sales

Inventory

Customers

X

Products

X

X

Time

X

X

Promotions

X

Warehouse

Finance

X

X

Department

X

Account

X

Scenario

X


Dimension Relationships • Define interaction between dimensions and measure groups • Relationship types – – – – –

Regular Reference Fact (Degenerate) Many-to-many Data mining


Calculations • Expressions evaluated at query time for values that cannot be stored in fact table • Types of calculations – Calculated members – Named sets – Scoped assignments

• Calculations are defined using MDX MDX = MultiDimensional EXpressions


demos

Using BIDS to Review Dimension Design Cube Design and Functionality Objectives: Introduction to Business Intelligence Development Studio Illustrate Cube concepts and design, editing and deployment of Analysis Services


demos What did we see? How to design and build SSAS databases How to deploy SSAS databases


Summary • As a platform for enterprise Business Intelligence you should consider four services: – – – –

Data Warehouse (can be relational) Process for Data Management (MDS) Process for Data Integration (ETL) Analysis (OLAP, Data Mining, Columnar)

= SQL Server 2008 R2


Š 2010 Microsoft Corporation & Content and Code Ltd. All rights reserved. The information herein is for informational purposes only and represents the opinions and views of Content and Code. The material presented is not certain and may vary based on several factors. Microsoft makes no warranties, express, implied or statutory, as to the information in this presentation. Š 2010 Microsoft Corp. Some slides contain quotations from copyrighted materials by other authors, as individually attributed or as already covered by Microsoft Copyright ownerships. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Content and Code as of the date of this presentation. Because Content and Code & Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft and Content and Code cannot guarantee the accuracy of any information provided after the date of this presentation. Content and Code no warranties, express, implied or statutory, as to the information in this presentation. E&OE.


Turn static files into dynamic content formats.

Create a flipbook
Issuu converts static files into: digital portfolios, online yearbooks, online catalogs, digital photo albums and more. Sign up and create your flipbook.