Big Data for Enterprise: Managing Data and Values


Tarun Sukhani NetCom Learning www.netcomlearning.com | info@netcomlearning.com | (888) 563 8266

©1998-2018 NetCom Learning


Agenda
• Data & Information
• What is a Database Management System?
• File Management Systems
• Distribution Strategies for Databases
• Data Management Framework
• Key Supporting Data Management Components to Big Data
• Data Governance Council Roles and Responsibilities
• ETL
• Data Cleansing
• Overview of Big Data and Analytics
• Data Lake
• Hadoop & Its Role
• IoT and real-time data
• Modern Data Warehouse


Data and Information

DATA: Facts concerning people, objects, events or other entities. Databases store data.

INFORMATION: Data presented in a form suitable for interpretation. Data is converted into information by programs and queries. Data may be stored in files or in databases. Neither one stores information.

KNOWLEDGE: Insights into appropriate actions based on interpreted data.
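
As a rough illustration of a query turning data into information, the sketch below stores a few raw sales rows and summarizes them by region. The `sales` table, its columns and the figures are hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical raw facts: each row is "data".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 200.0)],
)

# The query presents the same data in a form suitable for interpretation.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

The database holds only the three rows; the regional totals exist only as the result of the query.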



Using a DBMS

Data
• Database Design
• Metadata

DBMS Engine

Access
• Direct access
• Host language

Data Management



Basic Principles

DATABASE: A shared collection of interrelated data designed to meet the varied information needs of an organization.

DATABASE MANAGEMENT SYSTEM: A collection of programs to create and maintain a database.
• Define
• Construct
• Manipulate



Advantages of Database Processing
• More information from same data
• Shared data
• Balancing conflicts among users
• Controlled redundancy
• Consistency
• Integrity
• Security
• Increased productivity
• Data independence



Disadvantages of Database Processing
• Increased size
• Increased complexity
  • More expensive personnel
• Increased impact of failure
• Difficulty of recovery
• Cost
  • Especially server and mainframe systems



Objectives of the DBMS Approach
• SELF-DESCRIBING
• DATA INDEPENDENCE
• MULTIPLE VIEWS
• MULTIPLE USERS



What is a Database Management System?

• Data Files
• Directory
• Access Engine
• Utility Programs



Database

DATA METADATA ACCESS ENGINE UTILITIES



Files and Databases

Metadata: “Data about data”
• Description of fields
• Display and format instructions
• Structure of files and tables
• Security and access rules
• Triggers and operational rules
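
One way to see metadata in practice is to ask a database to describe its own tables. The sketch below uses SQLite's `PRAGMA table_info`, which returns one row per column; the `student` table is a hypothetical example, not something from the course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL, gpa REAL)"
)

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
columns = conn.execute("PRAGMA table_info(student)").fetchall()
field_descriptions = [(name, col_type) for _, name, col_type, *_ in columns]
print(field_descriptions)
```

No student rows were inserted, yet the database can still report the field names and types, because the description is stored separately from the data.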



History of Database Management

• File Management Systems
• Hierarchical Model: IBM “Information Management System (IMS)”, 1966
• Network Model: Charles Bachman’s “Integrated Data Store (IDS)”, 1965; Conference on Data Systems Languages / Data Base Task Group (CODASYL/DBTG), 1971
• Relational Model: E.F. Codd, 1970



File Management Systems

File management systems provided facilities to extract data and share files, but did not implement any way to connect records in one file to those in another. Relationships had to be implemented in application code.
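
A minimal sketch of what "relationships in application code" means: two independent files, with the program itself matching records between them. The file contents and field names are hypothetical, and in-memory text buffers stand in for files on disk.

```python
import csv
import io

# Two independent "files"; nothing in the file system links them.
students_file = io.StringIO("student_id,name\n1,Ana\n2,Ben\n")
enrollments_file = io.StringIO("student_id,section\n1,DB101\n2,DB101\n1,BI200\n")

# The program builds its own lookup and matches the records itself.
names = {row["student_id"]: row["name"] for row in csv.DictReader(students_file)}
roster = [
    (names[row["student_id"]], row["section"])
    for row in csv.DictReader(enrollments_file)
]
print(roster)
```

Every program that needs the same relationship has to repeat this matching logic, which is the limitation described above.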



Database vs File Systems

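[Diagram: FILE SYSTEM vs DATABASE. In the file system, Program 1, Program 2 and Program 3 each carry their own metadata and work against the data directly. In the database, Program 1, Program 2 and Program 3 share a single set of metadata and data.]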



Structured Databases

Relationships were implemented by physical pointers (called “sets”), which allowed records in different files to be connected. Hierarchical databases allow only one parent set; networks allow several. These permit efficient processing, but the sets must be constructed on data entry and cannot be rearranged later.



Relational Models

Relational models implement relationships with matched data values in related files (called primary and foreign keys). Any attributes can be matched. The connection is established at retrieval, so interconnections can be developed as needed.
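
The sketch below shows a relationship being established at retrieval time by matching a foreign key to a primary key. The `college` and `instructor` tables echo the file names used in the hierarchy and network slides, but the columns and rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE college (college_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE instructor (
        instructor_id INTEGER PRIMARY KEY,
        name TEXT,
        college_id INTEGER REFERENCES college(college_id)
    );
    INSERT INTO college VALUES (1, 'Engineering'), (2, 'Business');
    INSERT INTO instructor VALUES (10, 'Lee', 1), (11, 'Diaz', 2);
    """
)

# No pointer was stored at data entry; the join matches key values on demand.
pairs = conn.execute(
    """
    SELECT i.name, c.name
    FROM instructor AS i
    JOIN college AS c ON i.college_id = c.college_id
    ORDER BY i.instructor_id
    """
).fetchall()
print(pairs)
```

A different query could join the same tables on any other pair of comparable attributes without restructuring the stored data.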



Hierarchy

[Diagram: hierarchy of the SECTION, STUDENT, INSTRUCTOR and COLLEGE files; COLLEGE appears twice]

Each file can have only one parent. To implement a second “parent” (COLLEGE) we have to implement a shadow copy.



Network

[Diagram: network of the SECTION, STUDENT, INSTRUCTOR and COLLEGE files]

Each file can have several parents. Both SECTION and COLLEGE are “parent” files.



Relational



Relational Terminology
• Entity
  • Person, place, thing or event about which we wish to keep data
• Attribute
  • Property of an entity
• Relationship
  • An association among entities (entity records)



Distribution Strategies for Databases

Centralized Data and Processing: Dumb terminal with "screen scraping".

Intelligent Terminal: Data and processing centralized; data preparation and display on remote devices.

Distributed Logic: Data storage distributed; processed at the optimal location. A version of parallel processing.

Client Server: Data (usually departmental) maintained on a server. Subsetting occurs on the server, processing on client machines.

Distributed Database: Data distributed among different locations; processing accesses data wherever it is located. Data may be replicated or partitioned.
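
To make the difference between partitioned and replicated data concrete, here is a small sketch in which each site is modelled as a plain dictionary. The site names, the modulo placement rule and the customer records are all assumptions; real distributed databases use more elaborate placement and consistency mechanisms.

```python
# Three hypothetical sites, each modelled as a dict of key -> record.
SITES = ["site_a", "site_b", "site_c"]

def partition(records, sites):
    """Place each record at exactly one site, chosen from its integer key."""
    placement = {site: {} for site in sites}
    for key, value in records.items():
        site = sites[key % len(sites)]
        placement[site][key] = value
    return placement

def replicate(records, sites):
    """Place a full copy of every record at every site."""
    return {site: dict(records) for site in sites}

customers = {1: "Ana", 2: "Ben", 3: "Chu", 4: "Dee"}
partitioned = partition(customers, SITES)
replicated = replicate(customers, SITES)
print(partitioned)
print(replicated)
```

Partitioning stores each record once, so a lookup must go to the owning site; replication answers any lookup locally but every update has to reach every copy.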



Data Management

Designing and managing information in a database environment requires:
• Understanding the principles of data modeling in system design.
• Using SQL for data manipulation.
• Understanding the concepts of managing data in a database environment.



Information System Modeling Approaches

PROCESS MODELING: The traditional method of designing systems by following the changes to data flows.

DATA MODELING: An approach to system development that specifies the file structure that conforms to the things important to the organization.

PROTOTYPING: An iterative approach that focuses on building small operating

OBJECT MODELING (Event driven design): Defines objects that contain data and associated processing rules encapsulated together.



Data Management Framework
• Holistic approach to understand the information needs of the enterprise & its stakeholders
• Consistency for planning & process development
• 10 major functional areas, including governance
• Aligns data with business strategy (above) and technology (below)
• Takes into account the data lifecycle – creation through destruction
• Internationally recognized through Data Management Association International (DAMA)



Key Supporting Data Management Components to Big Data
• Data Governance – Exercise of authority and controls over the management of data assets. Policies, processes, standards, definitions, metrics.
  • Councils, stewards, trustees roles and responsibilities defined
• Data Architecture – Defines data requirements, guides integration and control of data assets, aligns data investments with business strategy. Part of an overall enterprise architecture framework.
  • Enterprise data models, definitions, and taxonomies
  • Enterprise data delivery
• Master Data Management – Control over master data values to enable consistent, contextual use across systems of the most accurate, timely and relevant version of truth about essential business entities.
• Meta Data Management – Descriptive tags about data, concepts, and connections between data and concepts.
  • Business, technical, process, and stewardship
• Data Security – Planning, development, and execution of security policies and procedures to provide proper authentication, authorization


Key Questions to Drive Business Value from Data
• What business opportunity/problem are we trying to solve?
• How do we integrate the right data together?
• What questions do we need to answer to solve the problem?
• How do we manage the quality of the data?
• What data do we need to answer the questions?
• What data do we have?
• How can data help differentiate us in the market?
• What data is IP for us? Revenue generating for us?
• What data does this relate to (master data)?
• Do we have all the data about this (person, event, thing, etc.)?
• What are the permissible purposes of the data? (compliance, regulatory environment)
• Who is allowed to access the data? Use this data?



Data Management Maturity in a Social Business

Partial Source: Social Business by Design, Dion Hinchcliffe


Data Governance Council Roles and Responsibilities

DGC

DGP

Task Forces & Tiger Teams

Lines-of-Business



Data Governance Operational View



ETL

The process of updating the data warehouse.



Two Data Warehousing Strategies
• Enterprise-wide warehouse, top down, the Inmon methodology
• Data mart, bottom up, the Kimball methodology
• When properly executed, both result in an enterprise-wide data warehouse



The Data Mart Strategy
• The most common approach
• Begins with a single mart and architected marts are added over time for more subject areas
• Relatively inexpensive and easy to implement
• Can be used as a proof of concept for data warehousing
• Can perpetuate the “silos of information” problem
• Can postpone difficult decisions and activities
• Requires an overall integration plan



The Enterprise-wide Strategy

• A comprehensive warehouse is built initially
• An initial dependent data mart is built using a subset of the data in the warehouse
• Additional data marts are built using subsets of the data in the warehouse
• Like all complex projects, it is expensive, time consuming, and prone to failure
• When successful, it results in an integrated, scalable warehouse



Data Sources and Types

• Primarily from legacy, operational systems
• Almost exclusively numerical data at the present time
• External data may be included, often purchased from third-party sources
• Technology exists for storing unstructured data, and this is expected to become more important over time



Extraction, Transformation, and Loading (ETL) Processes

• The “plumbing” work of data warehousing
• Data are moved from source to target databases
• A very costly, time consuming part of data warehousing



Recent Development: More Frequent Updates
• Updates can be done in bulk and trickle modes
• Business requirements, such as trading partner access to a Web site, require current data
• For international firms, there is no good time to load the warehouse



Recent Development: Clickstream Data

• Results from clicks at web sites
• A dialog manager handles user interactions. An ODS (operational data store in the data staging area) helps to custom tailor the dialog
• The clickstream data is filtered and parsed and sent to a data warehouse where it is analyzed
• Software is available to analyze the clickstream data



Data Extraction

• Often performed by COBOL routines (not recommended because of high program maintenance and no automatically generated meta data)
• Sometimes source data is copied to the target database using the replication capabilities of a standard RDBMS (not recommended because of “dirty data” in the source systems)
• Increasingly performed by specialized ETL software



Sample ETL Tools

• Teradata Warehouse Builder from Teradata
• DataStage from Ascential Software
• SAS System from SAS Institute
• PowerMart/PowerCenter from Informatica
• Sagent Solution from Sagent Software
• Hummingbird Genio Suite from Hummingbird Communications



Reasons for “Dirty” Data
• Dummy Values
• Absence of Data
• Multipurpose Fields
• Cryptic Data
• Contradicting Data
• Inappropriate Use of Address Lines
• Violation of Business Rules
• Reused Primary Keys, Non-Unique Identifiers
• Data Integration Problems



Data Cleansing

• Source systems contain “dirty data” that must be cleansed
• ETL software contains rudimentary data cleansing capabilities
• Specialized data cleansing software is often used. Important for performing name and address correction and householding functions
• Leading data cleansing vendors include Vality (Integrity), Harte-Hanks (Trillium), and Firstlogic (i.d.Centric)



Steps in Data Cleansing

• Parsing
• Correcting
• Standardizing
• Matching
• Consolidating



Parsing

• Parsing locates and identifies individual data elements in the source files and then isolates these data elements in the target files.
• Examples include parsing the first, middle, and last name; street number and street name; and city and state.
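
The parsing step can be sketched with two small helpers that isolate name parts and street parts. The function names and the simple splitting rules are assumptions for illustration; commercial cleansing tools handle far more formats than this.

```python
import re

def parse_name(full_name):
    """Split 'First [Middle ...] Last' into separate elements."""
    parts = full_name.split()
    return {
        "first": parts[0],
        "middle": " ".join(parts[1:-1]),
        "last": parts[-1],
    }

def parse_street(line):
    """Split '123 Main Street' into street number and street name."""
    match = re.match(r"\s*(\d+)\s+(.+)", line)
    if match is None:
        return {"number": "", "name": line.strip()}
    return {"number": match.group(1), "name": match.group(2).strip()}

print(parse_name("Beth Christine Parker"))
print(parse_street("12 Elm St"))
```

The sketch assumes at least a first and a last name are present; a single-word input would be reported as both first and last name.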



Correcting

• Corrects parsed individual data components using sophisticated data algorithms and secondary data sources.
• Examples include replacing a vanity address and adding a zip code.



Standardizing
• Standardizing applies conversion routines to transform data into its preferred (and consistent) format using both standard and custom business rules.
• Examples include adding a pre name, replacing a nickname, and using a preferred street name.
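
A minimal sketch of standardizing with lookup tables: nicknames are replaced by a preferred form and street suffixes are expanded. Both tables are tiny hypothetical stand-ins for the business rules a real project would maintain.

```python
# Hypothetical conversion tables standing in for custom business rules.
NICKNAMES = {"bill": "William", "beth": "Elizabeth", "bob": "Robert"}
STREET_SUFFIXES = {"st": "Street", "st.": "Street", "ave": "Avenue", "ave.": "Avenue"}

def standardize_first_name(name):
    """Replace a known nickname; otherwise just normalize capitalization."""
    cleaned = name.strip()
    return NICKNAMES.get(cleaned.lower(), cleaned.title())

def standardize_street(street):
    """Expand an abbreviated street suffix to its preferred spelling."""
    words = street.split()
    if words and words[-1].lower() in STREET_SUFFIXES:
        words[-1] = STREET_SUFFIXES[words[-1].lower()]
    return " ".join(words)

print(standardize_first_name("Bill"))
print(standardize_street("12 Elm St."))
```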



Matching
• Searching and matching records within and across the parsed, corrected and standardized data based on predefined business rules to eliminate duplications.
• Examples include identifying similar names and addresses.
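
One simple way to identify similar names and addresses is a string-similarity score with a cutoff. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; the 0.85 threshold and the sample records are assumptions, and production matching usually adds blocking and field-specific rules.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a, b):
    """Return a ratio between 0 and 1 describing how alike two strings are."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_matches(records, threshold=0.85):
    """Return index pairs whose name + address strings are similar enough."""
    keys = [f"{r['name']} {r['address']}" for r in records]
    return [
        (i, j)
        for i, j in combinations(range(len(keys)), 2)
        if similarity(keys[i], keys[j]) >= threshold
    ]

records = [
    {"name": "Elizabeth Parker", "address": "12 Elm Street"},
    {"name": "Elizabeth Parkr", "address": "12 Elm Street"},
    {"name": "Robert Jones", "address": "400 Oak Avenue"},
]
print(find_matches(records))
```

Comparing every pair is quadratic in the number of records, so this approach only suits small batches as written.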



Consolidating
• Analyzing and identifying relationships between matched records and consolidating/merging them into ONE representation.
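
Consolidation can be sketched as merging a group of matched records into one, here using a "first non-empty value wins" rule. That survivorship rule and the sample fields are assumptions; real projects often prefer the most recent or most trusted source instead.

```python
def consolidate(matched_records):
    """Merge matched records, keeping the first non-empty value per field."""
    merged = {}
    for record in matched_records:
        for field, value in record.items():
            if value and not merged.get(field):
                merged[field] = value
    return merged

duplicates = [
    {"name": "Elizabeth Parker", "phone": "", "zip": "30301"},
    {"name": "Elizabeth Parker", "phone": "555-0100", "zip": ""},
]
golden = consolidate(duplicates)
print(golden)
```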



Data Staging
• Often used as an interim step between data extraction and later steps
• Accumulates data from asynchronous sources using native interfaces, flat files, FTP sessions, or other processes
• At a predefined cutoff time, data in the staging file is transformed and loaded to the warehouse
• There is usually no end user access to the staging file
• An operational data store may be used for data staging



Data Transformation

• Transforms the data in accordance with the business rules and standards that have been established
• Examples include: format changes, deduplication, splitting up fields, replacement of codes, derived values, and aggregates
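
The transformation types listed above can be shown together in one short sketch. The row layout, the status code table and the date format are all hypothetical; the point is only to show each kind of rule once.

```python
from collections import defaultdict
from datetime import datetime

STATUS_CODES = {"A": "Active", "I": "Inactive"}  # hypothetical code table

def transform(rows):
    """Apply the example rules to source rows and return cleaned rows."""
    seen, cleaned = set(), []
    for row in rows:
        if row["order_id"] in seen:          # deduplication
            continue
        seen.add(row["order_id"])
        city, state = (part.strip() for part in row["city_state"].split(","))  # splitting a field
        cleaned.append({
            "order_id": row["order_id"],
            # format change: MM/DD/YYYY -> ISO 8601
            "order_date": datetime.strptime(row["order_date"], "%m/%d/%Y").date().isoformat(),
            "city": city,
            "state": state,
            "status": STATUS_CODES[row["status"]],          # replacement of codes
            "total": row["quantity"] * row["unit_price"],   # derived value
        })
    return cleaned

def total_by_state(cleaned):
    """Aggregate the derived totals per state."""
    totals = defaultdict(float)
    for row in cleaned:
        totals[row["state"]] += row["total"]
    return dict(totals)

source_rows = [
    {"order_id": 1, "order_date": "03/15/2018", "city_state": "Austin, TX",
     "status": "A", "quantity": 2, "unit_price": 10.0},
    {"order_id": 1, "order_date": "03/15/2018", "city_state": "Austin, TX",
     "status": "A", "quantity": 2, "unit_price": 10.0},
    {"order_id": 2, "order_date": "03/16/2018", "city_state": "Dallas, TX",
     "status": "I", "quantity": 1, "unit_price": 5.0},
]
cleaned_rows = transform(source_rows)
print(cleaned_rows)
print(total_by_state(cleaned_rows))
```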



Data Loading
• Data are physically moved to the data warehouse
• The loading takes place within a “load window”
• The trend is to near-real-time updates of the data warehouse as the warehouse is increasingly used for operational applications



Meta Data

• Data about data
• Needed by both information technology personnel and users
• IT personnel need to know data sources and targets; database, table and column names; refresh schedules; data usage measures; etc.
• Users need to know entity/attribute definitions; reports/query tools available; report distribution information; help desk contact information, etc.



Recent Development: Meta Data Integration

• A growing realization that meta data is critical to data warehousing success
• Progress is being made on getting vendors to agree on standards and to incorporate the sharing of meta data among their tools
• Vendors like Microsoft, Computer Associates, and Oracle have entered the meta data marketplace with significant product offerings



Overview of Big Data and Analytics

What differentiates today’s thriving organizations?

Data.


What is Big Data, really?

Data in all forms & sizes is being generated faster than ever before

Capture & combine it for new insights & better, faster decisions



Collect any data

Harness the growing and changing nature of data

Structured

Unstructured

Streaming

The challenge is combining transactional data stored in relational databases with less structured data.

Big Data = All Data

Get the right information to the right people at the right time in the right format.


An illustration of the velocity of data created

Kalakota, R. (2012, October 22). Sizing “Mobile + Social” Big Data Stats. Retrieved from http://practicalanalytics.wordpress.com/



The three V’s



Technology innovation accelerates value

[Figure: Value plotted against Innovation. Labels on the chart: Transactional systems, Operational reporting, Ad hoc analysis, Complex implementations, Siloed data, Spreadmarts, ETL, OLAP, Enterprise data warehouse, Dashboards, In-memory, Hadoop, Machine learning, Internet of Things, Any data]


Discover and connect

Answering new questions

Value



Put data to work for everyone in your organization

Inspire innovation

Accelerate decision-making Learn from & share insights



Embrace Big Data across your business

• Marketing: Build deeper customer relationships
• Finance: Impact your company’s bottom line
• Sales: Improve revenue performance
• HR: Maximize employee engagement

[Figure: sample dashboards titled “Units Sold, Discounts, and Profit before Tax”, “XT2000 Status List”, “Revenue and Target by Region” and “Departments Headcount”]


The Data Divide

70% of data generated by customers

80% of data stored

3% prepared for analysis

0.5% being analyzed

<0.5% being operationalized

IDC says that right now, about 22% of data is useful. By 2020 that number will climb to 37%.


Major Fail

Gartner: “Through 2017, 60% of big-data projects will fail to go beyond piloting and experimentation”

Paradigm4: 76% of those who have used Hadoop or Apache Spark complained of significant limitations


Analytics Solution

• Capture and integrate data from multiple internal and external sources
• Derive insight from data with rich, interactive dashboards and reports using the tools you know
• Put insight into action to increase efficiency and constituent satisfaction


Advanced Analytics Defined



Analytics Example



The end result of Big Data - Icing on the cake



Use Cases

Data Analytics is needed everywhere

Legal discovery and document archiving

Social network analysis

Traffic flow optimization

Recommendation engines

Churn analysis

Location-based tracking & services

Oil & Gas exploration

Weather forecasting for business planning

Healthcare outcomes

Personalized Insurance

Fraud detection

Life sciences research

Advertising analysis

Equipment monitoring

Pricing Analysis

Smart meter monitoring

Intelligence Gathering

IT infrastructure & Web App optimization



Personalized Insurance

Insurance companies can help (and some have already started helping) their customers with truly personalized insurance plans tailored to their needs and risks.

Insurance companies can collect real-time data from in-car sensors and combine it with geolocation and in-house systems. With information such as distance and speed, they can provide personalized insurance offers based on driving amount, risk, and other factors, for a truly personalized plan that may often save drivers money.

Personalized policies can reduce costs & better meet customer needs

$1,600/yr.: US national avg. car insurance premium


Recommendation Engines

Retailers can use customer purchase & rating information to serve recommendations to current customers, based on similarities across many dimensions.

The vast and ever-growing amount of customer purchase, rating and click data can all be collected and managed with a Hadoop-based solution, to pinpoint preferences based on purchase history and demographics and to serve useful and compelling cross-sell and up-sell recommendations.

Significantly improve up-sell and cross-sell opportunities

158: Items sold/second by Amazon.com on 11/29/2010 (Cyber Monday)


Pricing Analysis

Retailers can use customer past purchase, preference, and demographic information to serve real-time custom pricing and instant discounts when the customer is near the store.

Retailers, whether large, small, online or in-store, can improve margins with more detailed pricing analysis. When a customer is in range of a transaction (either in the store, online or perhaps passing by), retailers can make personalized offers, real-time price quotes, or other frequent-buyer perks to help bring more customers to the store and improve repeat business.

Significantly improve sales and customer satisfaction

up to 30%: Additional price Mac users accepted for travel from Orbitz


Using Big data to determine the best train schedules



Data Lake

What is a data lake?

A storage repository, usually Hadoop, that holds a vast amount of raw data in its native format until it is needed.
• A place to store unlimited amounts of data in any format inexpensively
• Allows collection of data that you may or may not use later: “just in case”
• A way to describe any large data pool in which the schema and data requirements are not defined until the data is queried: “just in time” or “schema on read”
• Complements EDW and can be seen as a data source for the EDW – capturing all data but only passing relevant data to the EDW
• Frees up expensive EDW resources (storage and processing), especially for data refinement
• Allows for data exploration to be performed without waiting for the EDW team to model and load the data
• Some processing is better done on Hadoop than ETL tools like SSIS
• Also called bit bucket, staging area, landing zone or enterprise data hub (Cloudera)
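
“Schema on read” can be illustrated without any Hadoop tooling: raw records are kept as they arrived, and a schema is applied only when someone asks a question. The sketch below uses JSON lines held in a list; the record layouts and the `read_with_schema` helper are hypothetical.

```python
import json

# Raw events stored as-is; the two lines do not share a schema.
raw_lines = [
    '{"device": "t-1", "temp_c": 21.5, "ts": "2018-01-01T00:00:00"}',
    '{"user": "ana", "page": "/home", "ts": "2018-01-01T00:00:05"}',
]

def read_with_schema(lines, fields):
    """Apply a schema at query time: keep records that have every requested field."""
    for line in lines:
        record = json.loads(line)
        if all(field in record for field in fields):
            yield {field: record[field] for field in fields}

temperatures = list(read_with_schema(raw_lines, ["device", "temp_c"]))
page_views = list(read_with_schema(raw_lines, ["user", "page"]))
print(temperatures)
print(page_views)
```

Nothing had to be modelled before the data was stored; each reader decides which fields matter at the moment of the query.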



Traditional Approaches

Current state of a data warehouse

DATA SOURCES (OLTP, ERP, CRM, LOB)
• Well manicured, often relational sources
• Known and expected data volume and formats
• Little to no change

ETL (with MONITORING AND TELEMETRY)
• Complex, rigid transformations
• Required extensive monitoring
• Transformed historical into read structures

DATA WAREHOUSE
• Star schemas, views, other read-optimized structures

BI AND ANALYTICS
• Emailed, centrally stored Excel reports and dashboards
• Flat, canned or multi-dimensional access to historical data
• Many reports, multiple versions of the truth
• 24 to 48h delay


Traditional Approaches

Current state of a data warehouse

[Diagram: the same pipeline (DATA SOURCES: OLTP, ERP, CRM, LOB; ETL with MONITORING AND TELEMETRY; DATA WAREHOUSE with star schemas, views, other read-optimized structures; BI AND ANALYTICS with emailed, centrally stored Excel reports and dashboards), now under pressure from INCREASING DATA VOLUME, NON-RELATIONAL DATA, INCREASE IN TIME and STALE REPORTING]

Data sources
• Increase in variety of data sources
• Increase in data volume
• Increase in types of data
• Pressure on the ingestion engine

ETL
• Complex, rigid transformations can no longer keep pace
• Monitoring is abandoned
• Delay in data, inability to transform volumes, or react to new sources
• Repair, adjust and redesign ETL

BI and analytics
• Reports become invalid or unusable
• Delay in preserved reports increases
• Users begin to “innovate” to relieve starvation


New Approaches

[Diagram labels, in flow order: DATA SOURCES (OLTP, ERP, CRM, LOB, NON-RELATIONAL DATA, FUTURE DATA SOURCES); EXTRACT AND LOAD; DATA LAKE; DATA REFINERY PROCESS (TRANSFORM ON READ) and OTHER REFINERY PROCESSES; DATA WAREHOUSE (star schemas, views, other read-optimized structures); BI AND ANALYTICS (discover and consume predictive analytics, data sets and other reports)]

Data Lake Transformation (ELT not ETL)

Data sources
• All data sources are considered
• Leverages the power of on-prem technologies and the cloud for storage and capture
• Native formats, streaming data, big data

Extract and load
• Extract and load, no/minimal transform
• Storage of data in near-native format
• Orchestration becomes possible
• Streaming data accommodation becomes possible

Data refinery
• Transform relevant data into data sets
• Refineries transform data on read
• Produce curated data sets to integrate with traditional warehouses
• Users discover published data sets/services using familiar tools


Hadoop and its role

What is Hadoop?
• Distributed, scalable system on commodity HW
• Composed of a few parts:
  • HDFS – Distributed file system
  • MapReduce – Programming model
  • Other tools: Hive, Pig, SQOOP, HCatalog, HBase, Flume, Mahout, YARN, Tez, Spark, Stinger, Oozie, ZooKeeper, Storm
• Main players are Hortonworks, Cloudera, MapR
• WARNING: Hadoop, while ideal for processing huge volumes of data, is inadequate for analyzing that data in real time (companies do batch analytics instead)

[Diagram: Hadoop stack. Group labels: DATA SERVICES, OPERATIONAL SERVICES, Core Services, LOAD & EXTRACT. Components shown: Ambari, Flume, Pig, Oozie, Sqoop, Falcon, HBase, Hive & HCatalog, MapReduce, YARN, HDFS, WebHDFS, NFS. Below them, a Hadoop cluster of compute & storage nodes.]

Hadoop clusters provide scale-out storage and distributed data processing on commodity hardware
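
The MapReduce programming model mentioned above can be sketched in plain Python with a word count. This is not the Hadoop API: the three functions below only imitate the map, shuffle and reduce phases in a single process, and the sample documents are made up.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

documents = ["big data big value", "data lake"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

On a cluster, the map calls run in parallel next to the data blocks and the reduce calls run in parallel per key, which is what lets the same model scale out on commodity hardware.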



Hortonworks Data Platform 2.2

Simply put, Hortonworks ties all the open source products together (20)


The real cost of Hadoop

http://www.wintercorp.com/tcod-report/


Use cases using Hadoop and a DW in combination

Bringing islands of Hadoop data together
• Archiving data warehouse data to Hadoop (move) (Hadoop as cold storage)
• Exporting relational data to Hadoop (copy) (Hadoop as backup/DR, analysis, cloud use)
• Importing Hadoop data into data warehouse (copy) (Hadoop as staging area, sandbox, Data Lake)


IoT and real-time data

What is the Internet of Things?

Things

Connectivity

Data

Analytics

IoT = sensor-acquired data


What is the Internet of Things (IoT)?

Internet-connected devices that can perceive the environment in some way, share their data, and communicate with you. IoT is just a catch-all term for ways of using machine-generated data to create something useful.
- Has its own processor and sensor to collect information
- Examples: heart monitoring implants, biochip transponders on farm animals, automobiles with built-in sensors, field operation devices that assist firefighters in search and rescue
- Excludes computers, tablets, and smartphones
- But it is in the sphere of business intelligence that IoT will really make a difference

Cool possibilities
- When a milk carton is almost empty, it will ping you when you are near a store
- An alarm clock that signals your coffee maker to start brewing when you wake up
- An embedded chip that monitors your vital signs and notifies a medical provider if they exceed a limit

Gartner: 10 billion devices connected to the internet today, 26B by 2020

At some point in the future, nearly every manmade object will contain a device that transmits data!
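
The vital-signs possibility above can be sketched as a check applied to each reading as it arrives, rather than in a later batch. This is a minimal Python illustration only: the `Reading` type, the `monitor` function, the `notify` callback, and the 120 bpm limit are hypothetical choices for the example, not values from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Reading:
    device_id: str
    heart_rate_bpm: int


def monitor(readings: Iterable[Reading], limit_bpm: int,
            notify: Callable[[Reading], None]) -> int:
    """Check each reading as it arrives and notify when it exceeds the limit.

    Returns the number of notifications sent.
    """
    sent = 0
    for reading in readings:  # in practice this would be a live stream
        if reading.heart_rate_bpm > limit_bpm:
            notify(reading)
            sent += 1
    return sent


alerts: List[Reading] = []
stream = [Reading("implant-1", 72), Reading("implant-1", 131), Reading("implant-1", 80)]
sent = monitor(stream, limit_bpm=120, notify=alerts.append)
print(sent, alerts)
```

Here `alerts.append` stands in for whatever channel would actually reach a medical provider.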


Modern Data Warehouse

Think about future needs:

• Increasing data volumes
• Real-time performance
• New data sources and types
• Cloud-born data
• Multi-platform solution
• Hybrid architecture


Modern Data Warehouse Defined



Modern Data Warehouse The Dream



The Reality



Federated Querying

Other names: Data virtualization, logical data warehouse, data federation, virtual database, and decentralized data warehouse.

A model in which a single query retrieves and combines data from multiple data sources where it sits, so there is no need to run ETL or to learn more than one retrieval technology.
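
As a rough illustration of one query spanning more than one source, the sketch below uses Python's built-in `sqlite3` module and SQLite's `ATTACH DATABASE` statement to join tables that live in two separate databases. Both databases, the table names, and the rows are made up for the example. A real federated engine reaches heterogeneous systems (for instance a warehouse and a Hadoop cluster), which this sketch does not attempt.

```python
import sqlite3

# The main connection stands in for one source (a hypothetical "warehouse").
conn = sqlite3.connect(":memory:")

# A second, separate in-memory database stands in for another source.
# It is attached before any rows are written, so no transaction is open yet.
conn.execute("ATTACH DATABASE ':memory:' AS clicks_db")

conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE clicks_db.clicks (customer_id INTEGER, page TEXT)")

conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
conn.executemany(
    "INSERT INTO clicks_db.clicks VALUES (?, ?)",
    [(1, "/home"), (1, "/pricing"), (2, "/home")],
)

# One query combines data from both sources; nothing is copied into a staging table first.
rows = conn.execute(
    """
    SELECT c.name, COUNT(*) AS clicks
    FROM customers AS c
    JOIN clicks_db.clicks AS k ON k.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(rows)
conn.close()
```

The caller writes a single SELECT in one dialect; where each table physically lives is hidden behind the schema prefix.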



Federated Querying

[Figure: a single "Select…" statement passes through one query model and returns one result set. Sources shown span relational and non-relational data; labels in the diagram: EDW, SQL Server, DB2, Oracle, Windows Azure HDInsight, Cloudera CDH, Hortonworks HDP, Linux, MongoDB.]


DW and the Cloud

Can I use the cloud with my DW?

• Public and private cloud
• Cloud-born data vs on-prem born data
• Transfer cost from/to cloud and on-prem
• Sensitive data on-prem, non-sensitive in cloud
• Look at hybrid solutions


TDWI Best Practices Report (2015)



SMP vs MPP

SMP - Symmetric Multiprocessing

• Multiple CPUs used to complete individual processes simultaneously
• All CPUs share the same memory, disks, and network controllers (scale-up)
• All SQL Server implementations up until now have been SMP
• Mostly, the solution is housed on a shared SAN

MPP - Massively Parallel Processing

• Uses many separate CPUs running in parallel to execute a single program
• Shared Nothing: Each CPU has its own memory and disk (scale-out)
• Segments communicate using high-speed network between nodes
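
The shared-nothing idea can be sketched in a few lines: rows are distributed across nodes, each node aggregates only its own partition, and a coordinator merges the partial results. The code below is a single-process Python illustration, not an MPP engine. The node count, the `distribute`, `local_sum`, and `merge` helpers, and the sample sales rows are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (region, sales_amount)


def distribute(rows: List[Row], node_count: int) -> List[List[Row]]:
    """Assign each row to a node by a deterministic hash of its key (scale-out)."""
    partitions: List[List[Row]] = [[] for _ in range(node_count)]
    for region, amount in rows:
        node = sum(region.encode()) % node_count  # simple stable hash for the sketch
        partitions[node].append((region, amount))
    return partitions


def local_sum(partition: List[Row]) -> Counter:
    """Work done on one node, using only that node's own data."""
    totals: Counter = Counter()
    for region, amount in partition:
        totals[region] += amount
    return totals


def merge(partials: List[Counter]) -> Dict[str, int]:
    """Coordinator step: combine the per-node partial results."""
    combined: Counter = Counter()
    for partial in partials:
        combined.update(partial)  # Counter.update adds counts together
    return dict(combined)


rows = [("east", 10), ("west", 5), ("east", 7), ("north", 3)]
partitions = distribute(rows, node_count=3)
result = merge([local_sum(p) for p in partitions])
print(result)
```

In this sketch the `local_sum` calls run one after another; on an MPP system each would run on its own node at the same time, with only the small partial totals crossing the network.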



DW SCALABILITY SPIDER CHART

The spiderweb depicts important attributes to consider when evaluating Data Warehousing options. Big Data support is the newest dimension.

• SMP – Tunable in one dimension at the cost of other dimensions
• MPP – Multidimensional scalability

[Figure: spider chart comparing SMP and MPP along the axes "Data Volume", "Mixed Workload", "Query Complexity", "Query Concurrency", "Data Freshness", "Query Freedom", "Schema Sophistication", and "Query Data Volume".]


Recorded Webinar Video

To watch the recorded webinar video for live demos, please access the link: https://goo.gl/rPrjZf



About NetCom Learning



Recommended Courses & Marketing Assets

Courses:
» 20778: Analyzing Data with Power BI - Class scheduled on Jan 7
» 20775: Performing Data Engineering on Microsoft HD Insight - Class scheduled on Jan 14
» Tableau Desktop Level 1: Introduction - Class scheduled on Jan 21
» GL660 - Hadoop For Systems Administrators - Class scheduled on Feb 4

Marketing Assets:
• Blog - Top AI, Big Data, & Analytics Trends to Follow in 2018
• Whitepaper - Curtailing the Talent Gap in Data Science


• Top Reasons to Master Agile Scrum and its Benefits
• Clean Architecture: Patterns, Practices, and Principles
• CEH: Understanding Ethical Hacking
• SQL Server 2017: Application Development Best Practices


Promotions

The year 2018 is coming to an end, but learning is a continuous process! Build your own, your team's, or your department's skills with the best training courses of 2018-19. With a range of Cloud, Security, Networking, Data & AI, Design & Multimedia, Business Application, Application Development, and Business Process training at limited-time prices, you can gain in-demand skills while saving significantly on training costs.




THANK YOU !!!
