Jose Hernandez · Director of Analytics · Dunn Solutions
2017
Today’s Agenda
• Introduction to Dunn Solutions Group
• What is a Data Lake?
• You Need a Data Lake
• Q&A
Dunn Solutions Delivers Velocity to Businesses
Dunn Solutions is a digital commerce and business transformation consultancy focused on delivering velocity to our clients. Velocity is achieved by the combination of both speed and direction. Dunn Solutions helps our clients achieve speed by automating business processes and direction using advanced analytics. Our teams align with organizations to optimize their unique processes and help them discover the most profitable routes to business success.
Dunn Solutions is a full-service IT consulting firm founded in 1988
Minneapolis: Delivery · Training
Chicago: Delivery
Raleigh, NC: Delivery · Training
Bangalore, India: Delivery
Practice Areas
Solutions · Application Development · Analytics · Training
• Data Lakes
• Portals
• IoT
• Certified SAP, Liferay, Microsoft
• e-Commerce & Content Managed Websites
• Predictive Analytics
• Accountable Care Orgs (ACOs)
• Machine Learning
• Corporate Legal
• e-Commerce
• Higher Education
• Classroom, Onsite, Computer Based & Virtual
• Optical Shop
• Mobile App Development
• Custom App Development
• Search Engine Optimization
• Analytics
• Cloud - BI Platforms
• DW & Data Integration
• Mentoring & Custom Training
Frameworks
Selected Clients
Partnerships
Analytics Practice
Business Intelligence
• KPIs and Metrics
• Dashboards
• Exploration and Visualization
• Ad Hoc Analysis & Reporting
Big Data
• Hadoop, Hive, Sqoop, Spark
• NoSQL
• MapReduce
Business Analytics, Data Integration
• Data Mining
• Predictive Analytics
• Prescriptive Analytics
• R, AzureML
Data Repositories
• Data Lakes
• Columnar
• In-memory
• EIM (Data Integration & Data Quality)
• Dimensional Modeling
Analytics Services in the Cloud
Analytics Services
• Develop Forecasting Models
• Productionizing Predictive Models
• Retail Analytics
• Machine Learning
Migration Services
• Migrate your Data Warehouse to the Cloud with Azure and AWS
• Migrate SAP BusinessObjects deployments
Big Data Services
• Data Lakes
• Big Data
• Integration with Data Warehouses
Data Warehousing Services
• Full Lifecycle Data Warehouse Development
• Extend Data Warehouse to the Cloud
• Massive Data Warehouses in the Cloud
• Snowflake
Microsoft Azure Consulting Services
Azure HDInsight
Azure Training Partner
Azure Machine Learning
Azure SQL Data Warehouse
Azure Stream Analytics
Azure Data Lake
Azure Event Hubs
Amazon Web Services Consulting
Amazon EMR
AWS IoT
Amazon DynamoDB
Amazon Kinesis Firehose
AWS Lambda
Amazon Redshift
Amazon Machine Learning
Dunn Solutions Global Delivery Model
People
• U.S. based management of teams and client communications
• All resources interviewed and approved by DSG leadership
• Right Model/Right Project: U.S. only, U.S. and India, or India only (EMEA clients)
Process
• Mature and proven
• Phased approach
• Project sensitive
• Software Engineering methodology
• Certified Quality Processes
Technology
• Current technology awareness
• Risk awareness
Jose Hernandez, Director of Analytics
Warning! Today’s data consumer is very demanding, and rightly so! 80% of consumers need KPIs and operational data – The data warehouse is ideal for them.
10-15% of consumers do more analysis; they use the data warehouse as a source, but dive back into source systems to get more data.
The rest of the consumers do very deep data analysis – this includes data scientists. They are voracious data consumers and data creators! (IT can’t keep up with them)
Savvy Data Consumers’ Needs
Access to information…
• What information?
A: any, all, even data not thought of
• When?
A: anytime, now would be great
• How much?
A: all of it, as much as there is
Analyze the data…
• What tools?
A: whatever tool is needed (lots of great tools are available, both commercial and open source)
• What kind of data?
A: all kinds
Traditional Data Storage and Management Challenges
The demand for data has never been greater!
• Business users rely heavily on IT
• IT controls access to the data
• Accessing data across sources is very challenging
• Schema on write*
What about the enterprise data warehouse?
• Does not provide just-in-time data
• Requires lots of lead time
• Limited to the “required” KPI data
*Sorry I could not avoid this terminology, more in a bit…
The Data Lake Provides Relief
What is a Data Lake?
The Data Lake is about democratization of information.
It provides your organization a cost-effective way to store information for later processing.
It lets your information consumers and researchers focus on finding the next big thing, not wasting time finding the data.
For the techies in the crowd… A Data Lake is a storage repository that holds a vast amount of raw data in its native format, including structured, semi-structured, and unstructured data. It also provides compute power to work the data.
Origin of the Data Lake
James Dixon of Pentaho is credited with coining the phrase “Data Lake”.
Dixon’s analogy…
Think of the data mart as bottled water: cleansed, packaged and delivered for your consumption.
The data lake is a man-made reservoir of water in its natural state, with no processing.
Purpose of the Data Lake
Feed the data-starved users
• Make it easy to consume and combine
• Deliver the data just-in-time
Store all kinds of data (whether you have a specific need today or not), and lots of it
• Worry about how it’s going to be used later (schema on read)
Provide boundless playgrounds
• To store data
• To process data
Warning! The Data Lake does not replace the Enterprise Data Warehouse!
Comparing the Data Lake to the Data Warehouse
Data Lake
• Stores everything
• Unprocessed / RAW
• Unstructured, semi-structured, structured
• Democratization of data
• Shared data stewardship
• Provides compute power
Data Warehouse
• Data focuses on Business Processes
• Highly processed & massaged
• Tabular & structured
• Lots of effort on design & build
• Optimized for data retrieval
• Highly governed
It’s Not Just About Data Storage
Storing and accessing data is only part of the Data Lake’s purpose.
The Data Lake must also provide the ability to:
• Massively process data (usually in place)
• Process and combine structured, semi-structured and unstructured data
• Grow and shrink in both storage and compute power as needed
• Onboard data very fast
• Perform advanced analytics (massively process data)
Supporting Top-Down and Bottom-Up
Data Warehouses use the Top-Down approach
• From generalized principles (known to be true) to a specific conclusion
• Descriptive
Data Lakes use the Bottom-Up approach
• From specific instances to a generalized conclusion
• Predictive
What Does a Data Lake Look Like?
Filling the Data Lake
Types of data
• Structured Data
• Semi-structured Data
• Unstructured Data
No schema is applied at load time
Data loads very fast
The Data Lake is infinitely deep and can hold all data
Consuming from the Data Lake
Supports many uses
• Data Exploration
• Staging for the Data Warehouse
• Data enrichment
• Predictive analytics
• Mixing disparate data
• Apply schema on demand (on read)
• Processing massive amounts of data
• Sandboxes for experimentation
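To make "apply schema on demand (on read)" concrete, here is a minimal Python sketch. It assumes the lake holds raw JSON lines that were stored without any schema; the sample records, the field names and the read_with_schema helper are all hypothetical and only illustrate the idea.

```python
import json
from datetime import date

# Hypothetical raw events, landed exactly as produced: no schema was enforced at load time.
RAW_LINES = [
    '{"order_id": "1001", "amount": "19.99", "ordered_on": "2017-03-01", "channel": "web"}',
    '{"order_id": "1002", "amount": "5.00", "ordered_on": "2017-03-02"}',
    '{"order_id": "1003", "amount": "42.50", "ordered_on": "2017-03-02", "coupon": "SPRING"}',
]

# The schema belongs to the reader, not to the storage: field name -> converter.
ORDER_SCHEMA = {
    "order_id": int,
    "amount": float,
    "ordered_on": date.fromisoformat,
}

def read_with_schema(lines, schema):
    """Parse raw JSON lines and project them onto the reader's schema."""
    for line in lines:
        record = json.loads(line)
        # Fields outside the schema are ignored; missing fields become None.
        yield {
            name: convert(record[name]) if name in record else None
            for name, convert in schema.items()
        }

# One consumer reads the raw data as typed orders...
orders = list(read_with_schema(RAW_LINES, ORDER_SCHEMA))
total = sum(o["amount"] for o in orders)

# ...while another applies a different schema to the very same raw lines.
channels = list(read_with_schema(RAW_LINES, {"order_id": str, "channel": str}))
```

Because nothing was decided at load time, the second consumer can ask for the channel field without anyone reloading or remodeling the data.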
Warning! Don’t let your Data Lake turn into a data swamp! It’s not the Wild, Wild West. Governance is still needed.
Data consumers must also be citizen data stewards.
Include metadata (data about your data)
Don’t contaminate the Data Lake with bad data (get it from trusted sources)
Data Lakes hold all data; however, set and enforce boundaries.
Have a vision for your data lake; know what it will be used for.
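One way to keep the lake from becoming a swamp is to record metadata for every dataset and to refuse data from unknown sources. The sketch below is only an illustration in plain Python: the TRUSTED_SOURCES list, the DatasetEntry fields and the Catalog class are hypothetical names, not part of any particular product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical allow-list of source systems considered trustworthy.
TRUSTED_SOURCES = {"erp", "crm", "web_logs"}

@dataclass
class DatasetEntry:
    """Metadata (data about your data) recorded for each dataset in the lake."""
    name: str
    source: str
    owner: str  # the data steward responsible for this dataset
    description: str
    loaded_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

class Catalog:
    def __init__(self):
        self._entries = {}

    def register(self, entry):
        """Refuse datasets from untrusted sources or without a description."""
        if entry.source not in TRUSTED_SOURCES:
            raise ValueError(f"untrusted source: {entry.source!r}")
        if not entry.description.strip():
            raise ValueError("a description is required")
        self._entries[entry.name] = entry

    def find(self, keyword):
        """Return the names of datasets whose description mentions the keyword."""
        return sorted(
            e.name for e in self._entries.values()
            if keyword.lower() in e.description.lower()
        )

catalog = Catalog()
catalog.register(
    DatasetEntry("orders_raw", "erp", "sales-ops", "Raw order lines exported nightly")
)
```

With entries like these, a consumer can search the catalog (for example catalog.find("order")) instead of wading through files to discover what the lake contains.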
Security and Governance
Access and Security
• It’s a data playground, but even playgrounds have rules
• Not all the data should be available to all users (confidential information must be protected)
• Is the data sensitive in nature? Are there laws governing the data that require encryption?
Data Quality
• If the data is of poor quality, don’t put it in your data lake
• Trust the source
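The rule that not all data should be available to all users can be sketched as a simple field-level filter. This is a minimal illustration that assumes sensitivity labels per field and clearances per role; the labels, the roles and the visible_record helper are hypothetical.

```python
# Hypothetical sensitivity label for each field and clearance for each role.
FIELD_SENSITIVITY = {
    "customer_id": "internal",
    "email": "confidential",
    "amount": "internal",
}
ROLE_CLEARANCE = {
    "analyst": {"internal"},
    "steward": {"internal", "confidential"},
}

def visible_record(record, role):
    """Return only the fields the given role is cleared to see."""
    allowed = ROLE_CLEARANCE.get(role, set())
    return {
        name: value
        for name, value in record.items()
        # Unlabeled fields are treated as confidential by default.
        if FIELD_SENSITIVITY.get(name, "confidential") in allowed
    }

row = {"customer_id": 7, "email": "a@example.com", "amount": 10.0}
analyst_view = visible_record(row, "analyst")
steward_view = visible_record(row, "steward")
```

Treating unlabeled fields as confidential is a deliberate choice here: new data stays protected until a steward classifies it.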
Voracious Data Consumers Must Be Served!
Getting back to the 10% of users that need all the data: the Data Scientists
Your organization’s success and survival depend on:
• Innovation
• Efficiency
• Finding the next big thing
• Getting (and keeping) an edge
The data scientists and data analysts give you the ability to do this. The data lake supports:
• Predictive Analytics
• Prescriptive Analytics
• Machine Learning
• Experimentation (A/B Testing)
• Qualitative data analysis – help steer strategic decisions
How Does a Data Lake Complement the EDW?
Your enterprise data warehouse is home to historical data and metrics that feed your KPIs and PIs, based on your business processes. It does this by extracting, transforming and loading the data required to support your “known” KPIs and metrics.
What if you determined that some data element was needed to provide a KPI you should have been tracking? You would add that to your data warehouse and start populating it from that point forward.
“Too bad. I wish I had thought of this sooner; there are some historical trends that I would have been able to identify.”
Give Super Powers to your Data Warehouse!
Imagine you could go back in time! In the previous scenario you did not have the historical data because:
a. It was not being captured because it wasn’t considered
b. The EDW staging area is transient and typically only goes back for a short period of time
The Data Lake would have given your data warehouse the ability to go back in time!
The data lake can serve as a great staging area for your EDW. It can store transactional data from the beginning of time:
a. Letting you go back in time and reconstruct your EDW to incorporate the information you did not consider
b. Allowing you to rebuild your EDW from day one in the event of a catastrophic failure
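The "go back in time" idea can be sketched in a few lines of Python. The example assumes the lake has kept every raw transaction, including a channel field that the warehouse never loaded; the sample data and the backfill_revenue_by_channel function are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw transactions retained in the lake since day one.
# Assume the warehouse originally loaded only order_id, day and amount.
LAKE_TRANSACTIONS = [
    {"order_id": 1, "day": "2016-11-03", "amount": 120.0, "channel": "store"},
    {"order_id": 2, "day": "2016-11-20", "amount": 80.0, "channel": "web"},
    {"order_id": 3, "day": "2016-12-05", "amount": 50.0, "channel": "web"},
    {"order_id": 4, "day": "2017-01-15", "amount": 200.0, "channel": "store"},
]

def backfill_revenue_by_channel(transactions):
    """Rebuild a monthly revenue-by-channel table from the full raw history."""
    totals = defaultdict(float)
    for tx in transactions:
        month = tx["day"][:7]  # keep the "YYYY-MM" prefix of the date
        totals[(month, tx["channel"])] += tx["amount"]
    return dict(totals)

# The new KPI gets its complete history, not just data from today forward.
history = backfill_revenue_by_channel(LAKE_TRANSACTIONS)
```

The same replay over the raw history is what would let the warehouse be rebuilt from day one after a catastrophic failure.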
Warning! Deploying a data lake is very expensive and challenging. So don’t!
Do It in the Cloud!
• Easily Scales
• Pay for what you use
• Reliable & Trusted
• Supporting Tools
Delight Your Data Consumers!
You’re wondering whether the Data Lake can help you with your data-starved consumers. The simple answer is yes. You don’t have to start huge (that’s the beauty of cloud-based data lakes). We can get you started immediately. Your data consumers will be very happy.
Contact us info@dunnsolutions.com
Questions & Answers
Watch for more webinars featuring how Data Scientists “do their thing” with Data Lakes in the cloud!
Jose Hernandez · Director of Analytics · Dunn Solutions