The IT Perspective: Data Warehousing, Management, and Analytical Structures
Objectives • Explain the basics of: Master Data Management Data Warehousing ETL OLAP/Multidimensional Data
SQL Services – Why? • Install only the ones you need • Which? – Integration Services • Get your data from the world outside (ETL)
– Analysis Services • Cubes, Data Mining, support for PowerPivot on SharePoint
– Reporting Services • DIY Report Builder and traditional “big” reports
– Master Data Services • Quality of critical master data (cities, colours, customers)
– Database Engine • Data warehouse and OLTP relational storage
Master Data Management
MDM • Ensures consistency of data across all organisational uses • Impacts overall data quality • Processes and tools for: – Collection, aggregation, matching, distribution, and persistence of master data • Consistently
– Related to Federated Data Management
• Key to MDM: Modelling
Why MDM? It’s About Evolution of Enterprise Architecture
MDM Processes
• Batched Acquisition from Staging Tables • Members, Attributes, Parent-Child Relationships • SQL Integration Services
Import & Integration
Modeling • Versioning Changes • Auditing • Compliance • Tracking of Instances
• Subscription Views • Export to: • Operational Systems • Data Warehouses • BI Analytics • Reporting Tools
Export & Subscription
Microsoft Master Data Services SQL 2008 R2 Enterprise, Datacenter, Developer
• Tools: – Master Data Manager • Primary tool for managing your master data
– MDS Configuration Manager • IT Pro tool
– MDS Web Service • For developers wanting to extend MDS
• Concepts: – – – – – – – –
Models Entities Attributes Members Hierarchies Collections Versions Database
Modelling Master Data • Model organises data at highest level – Allowing versioning of changes to data
• There are typically four categories of models: – – – –
People (Customers, Staff) Places (Geographies, Cities, Countries) Things (Products) Concepts (Accounts, Behaviours, Transactions)
Example: Product MDM Model Product (model)
Product (entity)
Name (freeform attr)
Code (freeform attr)
Subcategory (domainbased attr)
StandardCost (free-form attr)
Name (freeform attr)
Code (freeform attr)
Category (domainbased attr)
Name (freeform attr)
ListPrice (free-form attr)
Code (freeform attr)
Photo (file attr)
Data Warehouse
Rich Connectivity Data Providers ODBC SQL Server
SAP
NetWeaver BI SQL Server
Integration Services
DB2
MySAP
SQL Server
Report Server Models
Teradata
XML
SQL Server
Data Mining Models
SQL Server
Analysis Services
OLE DB
Hyperion Essbase
Oracle
Star Schema
Star Schema Benefits • Simple, not-so-normalised model • High-performance queries – Especially with Star Join Query Optimisation
• Mature and widely supported • Low-maintenance
Snowflake Dimension Tables • Define hierarchies using multiple dimension tables • Support fact tables with varying granularity • Simplify consolidation of heterogeneous data
Potential for slower query performance in relational reporting No difference in performance in Analysis Services database
Fact Table Fundamentals • Collection of measurements associated with a specific business process • Specific column types – Foreign keys to dimensions – Measures – numeric and additive – Metadata and lineage
• Consistent granularity – the most atomic level by which the facts can be defined
Fact Table Examples
Reseller sales data by: •Product •Order Date •Reseller •Employee •Sales Territory
Sales quota data by: •Employee •Time
Quarter Grain Day Grain
Date Dimension Table • Most common dimension used in analysis (aka Time dimension) • Use consistently with all facts • Useful common attributes – Year, Quarter, Month, Day – Time series analysis support – Navigation and summarisation enabled with hierarchies, such as calendar or fiscal
• Single table design (typically not snowflake design) Tip: Format the key of the dimension as yyyymmdd (e.g. 20100115) to make it readily understandable
Parent-Child Hierarchy • A dimension that contains a parent attribute • A parent attribute describes a self-referencing relationship, or a self-join, within a dimension table • Common examples – Organisational charts – General Ledger structures – Bill of Materials
Parent-Child Hierarchy Example
Brian Amy
Stacia
Shu
Stephen
Syed
Michael
JosĂŠ
Peter
Slowly Changing Dimensions • Maintain historical context as dimension data changes • Three common ways (there are more): – Type 1: Overwrite the existing dimension record – Type 2: Insert a new ‘versioned’ dimension record – Type 3: Track limited history with attributes
Integration and ETL
Let’s do ETL with SSIS • SQL Server Integration • SSIS object model • Two distinct runtime
Services (SSIS) service
engines:
– Control flow – Data flow
• 32-bit and 64-bit
editions
The Package • The basic unit of work, deployment, and execution • An organised collection of: – – – – – –
Connection managers Control flow components Data flow components Variables Event handlers Configurations
• Can be designed graphically or built programmatically • Saved in XML format to the file system or SQL Server
Control Flow • Control flow is a process-oriented workflow engine • A package contains a single control flow • Control flow elements – – – –
Containers Tasks Precedence constraints Variables
Data Flow • The Data Flow Task – Performs traditional ETL and more – Fast and scalable • Data Flow Components – Extract data from Sources – Load data into Destinations – Modify data with Transformations • Service Paths – Connect data flow components – Create the pipeline
demos Using SQL Server Integration Services Objectives: Gain an overview understanding of UI and approach of Integration Services Appreciate additional capability and functionality
demos What did we see? How to build up SSIS tasks How to deploy SSIS packages and deliver comprehensive ETL
OLAP/Multidimensional Data
Cube = Unified Dimensional Model • Multidimensional data • Combination of measures and dimensions as one conceptual model – Measures are sourced from fact tables – Dimensions are sourced from dimension tables
Dimensions • Members from tables/views in a data source view (based on a Data Warehouse) • Contain attributes matching dimension columns • Organise attributes as hierarchies – One All level and one leaf level – User hierarchies are multi-level combinations of attributes – Can be placed in display folders
• Used for slicing and dicing by attribute
Hierarchies • Benefits – View of data at different levels of summarisation – Path to drill down or drill up
• Implementation – Denormalised star schema dimension – Normalised snowflake dimension – Self-referencing relationship
Hierarchy • • • •
Defined in Analysis Services Ordered collection of attributes into levels Navigation path through dimensional space Very important to get right! Customers by Geography
Customers by Demographics
Country Marital State Gender City Customer Customer
Measure Group • Group of measures with same dimensionality • Analogous to a fact table • Cube can contain more than one measure group – E.g. Sales, Inventory, Finance
• Defined by dimension relationships
Measure Group Measure Group
Dimension
Sales
Inventory
Customers
X
Products
X
X
Time
X
X
Promotions
X
Warehouse
Finance
X
X
Department
X
Account
X
Scenario
X
Dimension Relationships • Define interaction between dimensions and measure groups • Relationship types – – – – –
Regular Reference Fact (Degenerate) Many-to-many Data mining
Calculations • Expressions evaluated at query time for values that cannot be stored in fact table • Types of calculations – Calculated members – Named sets – Scoped assignments
• Calculations are defined using MDX MDX = MultiDimensional EXpressions
demos
Using BIDS to Review Dimension Design Cube Design and Functionality Objectives: Introduction to Business Intelligence Development Studio Illustrate Cube concepts and design, editing and deployment of Analysis Services
demos What did we see? How to design and build SSAS databases How to deploy SSAS databases
Summary • As a platform for enterprise Business Intelligence you should consider four services: – – – –
Data Warehouse (can be relational) Process for Data Management (MDS) Process for Data Integration (ETL) Analysis (OLAP, Data Mining, Columnar)
= SQL Server 2008 R2
Š 2010 Microsoft Corporation & Content and Code Ltd. All rights reserved. The information herein is for informational purposes only and represents the opinions and views of Content and Code. The material presented is not certain and may vary based on several factors. Microsoft makes no warranties, express, implied or statutory, as to the information in this presentation. Š 2010 Microsoft Corp. Some slides contain quotations from copyrighted materials by other authors, as individually attributed or as already covered by Microsoft Copyright ownerships. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Content and Code as of the date of this presentation. Because Content and Code & Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft and Content and Code cannot guarantee the accuracy of any information provided after the date of this presentation. Content and Code no warranties, express, implied or statutory, as to the information in this presentation. E&OE.