Big Data: What Is It and Why Is It Important for the Financial Sector? September 2012
Big Data: Massive Data Growth in the Last 5 Years
80% is typically not in traditional enterprise data warehouses
• Digital is the primary driver of new data
• 80% of this new digital data is complex to analyze in its raw structure
• Digital data is growing at 62% annually vs. structured data at 22%
[Chart labels: Complex, Unstructured; Relational]
Confidential and proprietary. Copyright © 2011 Teradata Corporation.
Source: An IDC White Paper. As the Economy Contracts, the Digital Universe Expands. May 2009.
New Forms of Data
Extending Big Data Beyond the EDW

Aster Data: Big Data elements
• Raw formats: Lengthy text strings, binary, blobs, social graphs
• Rapid updates, data refreshes: Online click stream, stock orders, social connections/friends
• High volume: Embedded processing to eliminate data movement

Examples
• Long strings of encoded page clicks, sessions, and actions
• Entry points to a website tracked by cookie strings
• Social connections
• e.g. One stock order split into 100s of transactions over days/weeks
• e.g. ACH transactions, Service/Customer Support records, insurance claims
• Wide tables with highly descriptive textual strings
New Analytics Are Needed to Gain Big Data Insights
Data Size with Multi-Structure Forms Requires New Analytic Approaches

Big Data Analytics
• Deliver Path, Pattern Matching, Time Series & Graph Analysis
• Iterative Discovery
• Use of SQL & Non-SQL Techniques (MapReduce)
“CIOs face significant challenges in addressing the issues surrounding big data…
New technologies and applications are emerging …and should be investigated.”
Source: CEO Advisory: ‘Big Data’ Equals Big Opportunity, Gartner, 31 March 2011.
What is Big Data?
• Big Data = Large scale (data volume) analytics
  - MPP SQL databases have delivered large scale analytics for over a decade. Teradata has been the leader in large scale SQL analytics, with over 16 customers with a petabyte or more of data.
• Big Data = Emerging new data types
  - New multi-structured data types with unknown relationships that require processing of data regardless of size to discover insights. Examples include web logs, sensor networks, social networks, text.
• Big Data = New (non-SQL) analytics
  - New analytic frameworks that provide parallel processing on semi-structured data, leveraging the power of MapReduce (programmatic languages: Java, Python, Perl, C, C++)
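To make the MapReduce idea above concrete, the following is a minimal single-process sketch in Python, one of the languages the slide lists. It is not Aster's or Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and a real framework would run the map and reduce steps in parallel across nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one log line."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework does between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle, and reduce in sequence over semi-structured text."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

Because each record is mapped independently and each key is reduced independently, both steps can be distributed, which is what makes the approach suit large volumes of semi-structured data.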
Big Data Challenges are More Than Data Size
The Four Axes of Big Data
“CIOs face significant challenges in addressing the issues surrounding big data… New technologies and applications are emerging (examples include Hadoop and MapReduce) and should be investigated to understand their potential value.”
Source: CEO Advisory: ‘Big Data’ Equals Big Opportunity, Gartner, 31 March 2011.
Ease of Development and Reuse
Analytic Foundation: 50+ out-of-the-box modules
Business-ready SQL-MapReduce Functions

Path Analysis: Discover patterns in rows of sequential data
• nPath: complex sequential analysis for time series analysis and behavioral pattern analysis
• Sessionization: identifies sessions from time series data in a single pass over the data
• Attribution: operator to help ad networks and websites to distribute “credit”

Statistical Analysis: High-performance processing of common statistical calculations
• Histogram: function to provide capability of generating
• Decision Trees: Native implementation of parallel random forests
• Approximate percentiles and distinct counts: calculate percentiles and counts within specific variance
• Regression: performs linear or logistic regression between an output variable and a set of input variables
• Correlation: calculation that characterizes the strength of the relation between different columns
• Averages: calculate moving, weighted, exponential or volume-weighted averages over a window of data

Relational Analysis: Discover important relationships among data
• Graph analysis: finds shortest path from a distinct node to all other nodes in a graph
• Tokenization: splits strings into individual words to assist text processing
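As an illustration of what a sessionization function does, here is a small Python sketch. It is not the Aster implementation. The 30-minute inactivity timeout and the name `sessionize` are assumptions made for the example.

```python
from datetime import datetime, timedelta


def sessionize(events, timeout=timedelta(minutes=30)):
    """Assign a per-user session number to each (user_id, timestamp) event.

    A new session starts whenever the gap since the user's previous event
    exceeds `timeout` (an assumed value, not taken from the slides).
    After sorting, the events are scanned exactly once.
    """
    last_seen = {}   # user_id -> timestamp of the previous event
    session_no = {}  # user_id -> current session number
    result = []
    for user, ts in sorted(events):
        previous = last_seen.get(user)
        if previous is None or ts - previous > timeout:
            session_no[user] = session_no.get(user, -1) + 1
        last_seen[user] = ts
        result.append((user, ts, session_no[user]))
    return result
```

The single scan after sorting mirrors the "single pass over the data" property the slide mentions.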
Ease of Development and Reuse
Analytic Foundation: 50+ out-of-the-box modules
SQL-MapReduce Analytic Functions

Text Analysis: Derive patterns in textual data
• Text Processing: counts occurrences of words, identifies roots, & tracks relative positions of words & multi-word phrases
• Text Partition: analyzes text data over multiple rows
• Levenshtein Distance: computes the distance between two words

Cluster Analysis: Discover natural groupings of data points
• k-Means: clusters data into a specified number of groupings
• Canopy: partitions data into overlapping subsets within which k-means is performed
• Minhash: buckets high-dimensional items for cluster analysis
• Basket analysis: creates configurable groupings of related items from transaction records in a single pass
• Collaborative Filter: predicts the interests of a user by collecting interest information from many users

Data Transformation: Transform data for more advanced analysis
• Unpack: extracts nested data for further analysis
• Multicase: case statement that supports row match for multiple cases
• Pack: compresses multi-column data into a single column
• Antiselect: returns all columns except for specified column
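The Levenshtein distance listed above is the minimum number of single-character insertions, deletions, and substitutions needed to turn one word into another. A minimal Python sketch of the standard dynamic-programming computation follows. It illustrates the metric only and does not reflect how the packaged function is implemented.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance between two words using two rolling rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter word in the inner loop
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if char_a == char_b else 1)
            current.append(min(previous[j] + 1,      # deletion
                               current[j - 1] + 1,   # insertion
                               substitution))
        previous = current
    return previous[-1]
```

In a fraud setting a small distance between two names or addresses can hint that two profiles belong to the same person.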
Enterprise Discovery Architecture
[Diagram labels]
• Data Sources: OLTP DBMS’s, Structured Data, Multi-Structured Data, Non-relational Data
• Integration: ETL
• Platforms: Aster Discovery Platform, Teradata IDW
• Discovery Apps: Fraud Discovery, Customer Discovery, Business Insight Discovery
• Tools: SAS In-DB Modeling, R In-DB, BI Tools
• Users: Data Scientist, SAS Analyst, R Analyst, Business Analyst
Financial POC Data Sets Analyzed
Events Preceding Account Closure
SELECT *
FROM nPath (
    ON (…)
    PARTITION BY sba_id
    ORDER BY datestamp
    MODE (NONOVERLAPPING)
    PATTERN ('(OTHER_EVENT|FEE_EVENT)+')
    SYMBOLS (
        event LIKE '%REVERSE FEE%' AS FEE_EVENT,
        event NOT LIKE '%REVERSE FEE%' AS OTHER_EVENT
    )
    RESULT (…)
) n;
Closed Accounts: Fee reversal seems to be a “Signal”
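For readers without access to an Aster system, the logic of the nPath query on this slide can be approximated in plain Python. This is a sketch under assumptions, not the nPath implementation: rows are taken to be `(sba_id, datestamp, event)` tuples with sortable datestamps, each event is mapped to a one-letter symbol (`F` for FEE_EVENT, `O` for OTHER_EVENT), and a regular expression stands in for the PATTERN clause. The function names are hypothetical.

```python
import re
from collections import defaultdict


def to_symbol(event: str) -> str:
    """Mirror the SYMBOLS clause: 'F' = FEE_EVENT, 'O' = OTHER_EVENT."""
    return "F" if "REVERSE FEE" in event else "O"


def match_paths(rows, pattern=r"[OF]+"):
    """Return {sba_id: [matched symbol sequences]}.

    Grouping by account mirrors PARTITION BY sba_id, sorting mirrors
    ORDER BY datestamp, and re.finditer yields non-overlapping matches,
    mirroring MODE (NONOVERLAPPING).
    """
    by_account = defaultdict(list)
    for sba_id, datestamp, event in rows:
        by_account[sba_id].append((datestamp, event))
    matches = {}
    for sba_id, events in by_account.items():
        events.sort(key=lambda pair: pair[0])
        symbols = "".join(to_symbol(event) for _, event in events)
        matches[sba_id] = [m.group(0) for m in re.finditer(pattern, symbols)]
    return matches
```

Passing a narrower pattern such as `r"F+O"` would isolate runs of fee-reversal events followed by one other event, which is the kind of variation an analyst might try when exploring what precedes a closure.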
Aster in Retail Banking: “Last Mile” Marketing
Cross-Channel Customer Interactions

Challenge
• Know the “last mile” of a decision
• Data mining tools predict probability but do not identify the “last mile”

With Aster
• SQL-MapReduce listens and predicts the “last mile”
  - Identifies all interaction patterns prior to acquisition or attrition

Business Impact
• 10-300x less effort to pinpoint a customer in the “last mile”

Data set: 17,000 Customers, 1 Month
• 34,000 Branch Visits
• 25,000 ATM Sessions
• 5,000 Call Center Sessions
• 43,000 E-mails
• 92,000 Online Sessions
Aster MapReduce: Understanding the “Last Mile”
• Jan 5: Reverse Fee Request
• Jan 7: Request Made Again
• Jan 10: Request Made Again
• Jan 15: Request Made Again
• Jan 20: Account Closed

What if I knew that this customer was likely to leave? I could…
• Apologize
• Offer an explanation
• Reverse the $5 fee

“It takes 3x more to acquire a customer than to retain one”
Aster Removes the “Last Mile” Technical Challenge
Teradata Aster MapReduce Platform

Completes the customer profile with digital data
• Adds web, social, call center data
• Stitches rows together by customer in a time-ordered view

custID    channel1          …    channeln
10001     Online Banking    …    Account Close
20001     Call Center       …    Branch Visit

Scans all customer record patterns in a single pass
• No need to define patterns in advance
• Fully parallelized for SQL-MapReduce performance

Summarizes output for business exploration
• Rank orders the most popular paths and yet represents the long tail too

Total # of Customers    channel1             …    channeln
35                      Online Banking       …    Account Close
26                      Bank Branch Visit    …    Account Close
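The stitch-and-rank steps described on this slide can be sketched in a few lines of Python. This is an illustration only: the input shape `(cust_id, timestamp, channel)` and the function name `rank_paths` are assumptions, and the platform itself performs this work in parallel with SQL-MapReduce.

```python
from collections import Counter, defaultdict


def rank_paths(interactions):
    """Stitch each customer's interactions into a time-ordered path,
    then count how many customers share each path.

    Returns a list of (path, customer_count), most popular path first.
    Rare paths (the long tail) stay in the result with small counts.
    """
    by_customer = defaultdict(list)
    for cust_id, timestamp, channel in interactions:
        by_customer[cust_id].append((timestamp, channel))
    paths = Counter(
        tuple(channel for _, channel in sorted(events))
        for events in by_customer.values()
    )
    return paths.most_common()
```

No pattern is specified in advance: every observed path is counted, and the ranking surfaces the common ones.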
Visualizing Aster nPath Analysis
[Figures: Bump Chart, Funnel Chart, Color Chart, Time Plot]
Big Data Business Impact: Example Use Cases

• Digital Marketing Optimization: Analysis of user behavior, intent, and actions across search, ad media and web properties to increase the ROI for digital media marketing efforts.
• Social Network and Relationship Analysis: Uncover deep social relationships and interactions hidden in raw transaction data, online behavior, and social networks in order to gain behavioral insights, target influencer marketing, and analyze virality within the social network.
• Fraud Detection and Prevention: On-the-fly analysis of transactions, interactions, and systems to detect, block, and prevent malicious users, networks, and programs engaged in fraud.
• Machine Data Analysis: Analysis of sensor, location, and machine-to-machine communications to optimize operational efficiencies.
New Kinds of Analysis

Graph Analysis: Social and Relationship Analysis
Uncover deep social relationships and interactions hidden in raw transaction data, online behavior, and social networks that can be used for behavioral analysis, influencer marketing, virality analysis, crowd sourcing, and similar applications.
[Diagram labels: Person, Social Link, Direct Relationship, Indirect Relationship]

Pattern Matching Analysis: Discover patterns in rows of sequential data
Analysis of user behavior, intent, and actions across search, ad media and web properties to create an interaction map of user behavior across digital media assets and drive increased ROI for digital media marketing efforts.
• Weblogs {user, page, time}: Click 1, Click 2, Click 3, Click 4
• Smart Meters {device, value, time}: Reading 1, Reading 2, Reading 3, Reading 4
• Sales Transactions {user, product, time}: Purchase 1, Purchase 2, Purchase 3, Purchase 4
• Stock Tick Data {stock, price, time}: Tick 1, Tick 2, Tick 3, Tick 4
• Call Data Records {user, number, time}: Call 1, Call 2, Call 3, Call 4, Call 5
New: Aster MapReduce Appliance
• High Performance Analytics
  - Powerful solution for Big Data Analytics using patented SQL-MapReduce framework
  - Massively parallel processing architecture optimizes performance
• Appliance Solution
  - Purpose-built integrated hardware / software solution
  - Nodes, software, storage, and networking in a single rack
  - 8 nodes per cabinet, scalable to 6 cabinets and over 200 TB of customer data
  - Delivered ready to run at a competitive price point
  - Leading edge Intel processors for fast scans and performance
• Enterprise Ready
  - Integrated with Teradata Warehouse to expand analytical capabilities
  - ODBC and JDBC support for major business intelligence, visualization, and ETL tools
  - Native Hadoop connectivity
  - Management tools for monitoring system health
The Broad Business Impact of Fraud
• Direct revenue impact
  - Cost of uncovering fraud
  - Cost of correcting fraud
  - Cost of regulatory penalties
• Damage to customer satisfaction
  - Customers’ decisions influenced by perceived risk of fraud
  - Fraud creates significant indirect impact on revenue
• Risk of regulatory penalties
  - Consumer protection regulations
  - Privacy regulations
Source: Ponemon Institute, “Consumers’ Reaction to Online Fraud”, April 2011
Challenges in Addressing Fraud
• Timeliness of detection critical
  - Time to detection determines cost of correction
• Fraud continues to adapt and evolve
  - Techniques rapidly change to adapt to new monitoring techniques
• Technology enables new means of fraud
  - Automated gaming bots
  - Online identity threat

Consumer Concerns
• 42% of survey respondents believe they have been victims of online fraud
• 57% of respondents do not believe online companies are taking enough precautions to protect them against online fraud
Source: Ponemon Institute, “Consumers’ Reaction to Online Fraud”, April 2011
Implementing Solutions for Fraud Detection
[Diagram components: Real-time processing engine; Aster Data Analytic Platform; Teradata Integrated Data Warehouse; Fraud Models]
• Rules for real-time monitoring, updated based on analysis
• Exploration and investigation of data to identify relationships indicative of likely fraud
• Models in data warehouse updated for ongoing scoring processes

Multi-structured Data
• Web logs
• Text fields
• …

Relational Data
• Transactions
• Means of payment
• Customer profile

Source data: Transactions, Payments, Customer Records, Returns, …
Bringing Together Multi-Structured Data for Fraud Detection and Prevention
Data sources brought together in Aster Data:
• Social media data
• Payment records
• Location information
• ACH transaction data
• CRM records
• Stock trade transactions
• Audit records
• Account activity
• Purchase history
• Web log data
• Purchase records
• User profile information
• Claim forms
• Adjuster notes
Performing Rich Analytics to Detect Fraud
• Identify suspect data
  - False identities, multiple profiles, invalid credit cards, …
• Identify suspect relationships
  - Collusion, transaction structure, money transfers, …
• Identify suspect patterns
  - Order velocity, purchasing behavior, claims submissions, …
All at massive scale across multiple data sources and types

Example Analytic Tools
• Graph and network analysis
• Pattern & time series analysis
  - Web Logs {user, page, time}: Click 1, Click 2, Click 3, Click 4
  - Insurance Claims {user, payee, date}: Claim 1, Claim 2, Claim 3, Claim 4
  - Transactions {user, product, time}: Purchase 1, Purchase 2, Purchase 3, Purchase 4
  - Stock Tick Data {stock, price, user, time}: Trade 1, Trade 2, Trade 3, Trade 4
  - Returns {customer, SKU, price, date}: Return 1, Return 2, Return 3, Return 4
Using Pattern Detection to Identify Fraud
• Manual sampling approaches insufficient
  - Find only small sample of fraud
  - Highly inefficient approach
• Automation of fraud detection critical to improving detection
  - Need to identify unusual patterns indicative of likely fraud
  - Need to rapidly evolve algorithms to improve accuracy and catch new types of fraud
• Requires unique capabilities
  - SQL approach requires knowing pattern in advance
  - SQL approach requires highly inefficient multiple data scans & self-joins to find patterns

Event Pattern Detection
[Diagram: a sequence of events]
Example events:
• Purchase
• Return
• Claim submission
• Game play
• Securities trade
Pattern Analysis with Aster Data
Uncovering suspicious patterns in sequences of events

Aster Data Capabilities
- nPath pre-packaged SQL-MapReduce function for finding sequences of events
- Identify patterns across diverse types of events and interactions
- Find all patterns that connect specified events

Benefits
- Pattern detection via a single pass over the data for rapid results
- Allows you to understand any trend that needs to be analyzed over a continuous period of time
- Easily modify analysis without the complex code and significant changes required by SQL approaches

Examples:
• Identify significant changes in risk scores
• Find unusual patterns in stock trading
• Uncover suspicious sequences in online games

[Figure: nPath Pattern Analysis]
Graph Analysis for Fraud Detection & Prevention
• Identify complex networks of relationships
  - Users to transactions
  - Transactions to means of payment
  - Identities to individuals
  - …
• Fraud identified by graph relationships
  - Clusters of connections
  - Patterns of activity between connections
• Understand impact of fraud
  - Trace flow of money and goods
  - Identify users impacted by fraud
[Figure: Examples]
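One simple form of the "clusters of connections" idea is to link users who share a means of payment and then find the connected groups. The Python sketch below shows that under stated assumptions: the input is a list of `(user, payment_instrument)` pairs, and the name `linked_user_clusters` is hypothetical. It is a generic graph traversal, not the Aster graph functions.

```python
from collections import defaultdict


def linked_user_clusters(user_payment_pairs):
    """Group users connected, directly or indirectly, by shared payment means.

    Builds a bipartite graph (users on one side, payment instruments on the
    other) and collects its connected components with a depth-first search.
    """
    graph = defaultdict(set)
    for user, payment in user_payment_pairs:
        user_node, payment_node = ("user", user), ("payment", payment)
        graph[user_node].add(payment_node)
        graph[payment_node].add(user_node)

    seen, clusters = set(), []
    for node in graph:
        if node in seen or node[0] != "user":
            continue
        seen.add(node)
        stack, members = [node], set()
        while stack:
            current = stack.pop()
            if current[0] == "user":
                members.add(current[1])
            for neighbour in graph[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(members)
    return clusters
```

A cluster containing many users is not proof of fraud, but it is a natural starting point for the investigation the slide describes.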
Pattern Match Analysis: SQL-MapReduce for Fraud Detection & Prevention
Analyzing Play Patterns for Fraudulent Activity

Business Goal
• Detect and disrupt various types of fraud, like collusion

Analytic Application in Aster Data
• Identify collusion targeted at money laundering schemes
• Monitor play between any two players who are “known” to one another; data is captured in binary format to keep up with play
• Tag as unusual a pattern of play where player 1 loses an unusual number of times to player 2 across game sessions; nPath performs advanced path and pattern matching

Business Impact
• Revenue protection: 115x faster fraud & pattern detection
• Site integrity: increase player count & market share
• Brand trust by reducing fraud: increases revenue/player
• From 1 week to trace fraud to 15 minutes using SQL-MapReduce
• Found 5 new fraud patterns that were previously not detectable using Java & SQL-based analytics
• Full granular detail: from 1,200 hands/second analyzed to 140,000 hands/second
• 60x faster queries: 90 minutes to 90 seconds

Other Aster Data Applications at Full Tilt Poker
• Follow-the-Money fraud tracing
• Player dashboard
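The collusion check described on this slide, one player losing an unusual number of times to another, can be sketched as a pairwise count. The Python below is illustrative only: the input shape `(winner, loser)`, the thresholds `min_games` and `loss_ratio`, and the function name are all assumptions, and the production analysis uses nPath over binary game data.

```python
from collections import Counter


def lopsided_pairs(results, min_games=10, loss_ratio=0.8):
    """Flag player pairs where one side loses a suspiciously large share.

    results: iterable of (winner, loser) tuples, one per head-to-head game.
    Returns a list of (loser, winner, losses, games) for every pair that has
    played at least `min_games` games with the loser losing at least
    `loss_ratio` of them. Both thresholds are hypothetical.
    """
    wins = Counter(results)  # (winner, loser) -> number of games won
    flagged = []
    for (winner, loser), losses in wins.items():
        games = losses + wins.get((loser, winner), 0)
        if games >= min_games and losses / games >= loss_ratio:
            flagged.append((loser, winner, losses, games))
    return flagged
```

In practice the thresholds would be tuned against known cases, since skilled players also beat weaker ones repeatedly.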
Fraud Analytics: Detecting and Preventing Fraud in Online Retail Transactions
Multi-Dimensional Analysis for Fraud Detection
• Provider of cloud-based fraud detection solutions for e-commerce
• Utilize wide variety of data to build a “contextual score” that helps identify fraudulent users
• Adaptable rules and scoring enable rapid, agile evolution as fraud evolves

Business Goal
• Offer uniquely accurate and adaptable fraud detection solutions for e-commerce

Aster Data Unique Value
• Support interactive exploration of data to discover and characterize fraud patterns

Business Impact
• Detect fraud: rapidly identify transactions and interactions that are likely to be fraudulent
• Prevent fraud: block fraudulent activity before it occurs based on advanced analysis
• Focus on real users: understand which users and interactions represent genuine customers and prospects worth focusing on for conversion
Example: e-Commerce Transaction Fraud
Uncover fraudulent users and transactions

Challenges
- Massive volumes of data about users, devices, and activity to monitor
- Need to find complex patterns and relationships in data
- Need for graph and path analysis to understand data
- Need frequent analysis to rapidly evolve detection rules and algorithms

Aster Data Value
- Ability to store and process massive volumes of diverse multi-structured data
- Massive scalability accelerates exploring and testing patterns on large data sets
- Power and flexibility of capabilities for pattern analysis and graph processing simplifies detection of complex patterns

Examples
• Card-not-present fraud: identify transaction patterns indicative of credit card fraud
• Returns fraud: uncover patterns that are highly correlated with fraudulent returns
• Collusive bidding: detect collusive bidding behavior on auction sites by detecting suspicious patterns of activity
Example: Claims Fraud
Identify patterns in claims and payments that indicate fraud

Challenges
• Massive volumes of claims information
• Claims records include both relational and multi-structured data
• Unable to effectively combat fraud by manual sampling

Aster Data Value
• Ability to process and analyze multi-structured and relational data together
• Massively scalable processing of data and analytics
• Flexible, powerful tools to enable pattern analysis, text processing, and graph analysis

Examples
• Insurance claims fraud: identify patterns and relationships indicative of claims likely to be fraudulent
• Medical claims fraud: identify networks of interactions and patterns indicative of fraud
• Payment fraud: find patterns that uncover fraudulent payment schemes
Example: Gaming Fraud
Uncover collusion, gaming bots, money laundering

Challenges
• Massive volumes of game play data from massive numbers of users and gaming sessions
• Diverse multi-structured data formats (encoded game play data, text data, relational data, …)
• Constantly evolving fraud techniques
• Legal requirements for traceability and monitoring

Aster Data Value
• Ability to load, transform, and process diverse data
• Rich tools for pattern analysis on massive volumes of data
• Capabilities to enable identification and monitoring of complex networks of relationships

Examples
• Collusion detection: uncover networks of conspirators colluding to defraud other players
• Botnet detection: detect machine players in games based on playing patterns
• Anti-money-laundering: track flow of money in money laundering through gaming
Consumer Financial Services Example: Online Fraud Analysis

Monitor online consumer behavior
• Monitor log-in and navigation behavior online
• Evaluating click stream transactions often includes analysis of non-relational log files

Rule out false negatives and positives
• Over 40 known click stream fraud patterns that can be detected, e.g. frequent normal paths to creating a wire transfer (creation of transfer without checking balance)
• Identifying fraudulent activity often requires looking for patterns in behavior, e.g. good user logged in but in same time window a stalker also logs in and tries 3 different wire transfers of differing amounts before succeeding

Replay user sessions
• Once fraudulent activity is detected, the support team often requires the replaying of the activity to discuss the issue with the customer

Characteristics
• Raw web log data
• Complex pattern analysis
• Clustering analysis
Capital Markets Example: Near Real-Time Fraud Pattern Matching for Trader Surveillance

Identify suspicious trading activity
• Use time-series based pattern matching to identify unusual patterns of trade activity intra-day
• Potential patterns include front-running, market manipulation, non-compliant positions that expose the firm to undue risk

Combine trading patterns with diverse data
• Introspect communications over e-mail and chat channels for corroborating evidence of trade misbehavior

Streamline investigations with iterative, hypothesis-driven query interface
• Save valuable time by conducting ad-hoc analysis of trades, cases, alerts, account data

Characteristics
• High rate of new data generation
• Granular data
• Simultaneous load & query
Healthcare Payor Example: Enhanced Fraud Detection
Analyzing claim, demographic and web data
• Business Goal
  - Proactively identify fraud at first notice of loss: fraud rings, collusion or falsified claims
  - Fraud identification continues to evolve for improved effectiveness
• Solution
  - Pattern analysis
    • Understand relationships among parties (physicians, consumers, organizations), locations, time of filing, frequency and circumstances
    • Detect potential for computer-generated claims
  - Graph analysis of cohort networks
  - Use MapReduce to structure social media for additional insight
• Business impact
  - Identify national fraud rings
  - Limit or reduce loss caused by fraudulent claims

Joint Differentiators
• Reduce complexity of analysis with MapReduce
• Pattern detection and nPath analysis against physician, consumer, claim, channel and location data
• Geospatial Analysis

Key Characteristics: Granular Channel Data
[Diagram labels: Raw web logs, Sessionization, nPath Pattern Matching]
Example: Healthcare Fraud Waste & Abuse
Identify fraud patterns to minimize false positives

Challenges
• Interesting data is highly granular
• Patterns cannot be identified unless they are part of the software program; human analysis can minimize the false positives

With MapReduce
• Timely and unique identification of unusual billing patterns
• Enables more efficient analysis by putting the tools in the hands of the Special Investigative Unit (SIU)

Example: Claims Data

Billing Prov    Service Date    MembID    Serv Loc
123             1/1/2010        10001     Orlando, FL
135             1/1/2010        10001     Miami, FL
123             1/1/2010        10001     Orlando, FL
135             1/15/2010       10001     Miami, FL
123             1/1/2010        10001     New York, NY
234             12/24/2010      10002     Orlando, FL
345             12/24/2010      10003     Miami, FL

Impact
• Early detection minimizes payments of fraudulent claims (estimates range from $125 - $800 billion in losses)
• Statistics show that $11 is saved on every $1 spent on fighting fraud*
* Source: healthcare-informatics.com; Identifying Fraud, Barry Johnson, DDS
MapReduce simply identifies “Fraud Patterns”
ALL interaction patterns evaluated in a single pass

Prepares multi-structured data
• Stitches rows together by customer in a time-ordered view

Scans all records to produce a complete set of paths
• No need to define patterns in advance
• Fully parallelized for top performance using MapReduce where SQL falls down

Summarize output for business exploration
• Rank order the most popular paths and yet represent the long tail too

[Diagram: claims columns Billing Prov, Service Date, Member ID, Service Loc feed the MapReduce Platform]

Step 1: Pivot data via nPath SQL-MapReduce

Billing Prov    Service Loc1    …    Service Locn    date1       …    daten
123             Orlando, FL     …    New York, NY    1/1/2010    …    1/1/2010
135             Miami, FL       …    Miami, FL       1/1/2010    …    1/15/2010

Step 2: Run nPath SQL-MapReduce Java Logic

Billing Prov    Service Location
123             More than one service location
135             Single service location
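The two steps above can be imitated in Python on the sample claims rows from the earlier slide. This is a sketch, not the SQL-MapReduce Java logic. The rule applied here is an assumption: a provider is flagged when it bills the same member on the same service date from more than one location. The names `CLAIMS` and `classify_providers` are hypothetical.

```python
from collections import defaultdict

# Sample rows from the claims slide:
# (billing provider, service date, member ID, service location)
CLAIMS = [
    (123, "1/1/2010", 10001, "Orlando, FL"),
    (135, "1/1/2010", 10001, "Miami, FL"),
    (123, "1/1/2010", 10001, "Orlando, FL"),
    (135, "1/15/2010", 10001, "Miami, FL"),
    (123, "1/1/2010", 10001, "New York, NY"),
    (234, "12/24/2010", 10002, "Orlando, FL"),
    (345, "12/24/2010", 10003, "Miami, FL"),
]


def classify_providers(claims):
    """Label each billing provider by how many locations it billed from
    for the same member on the same service date (assumed rule)."""
    # Step 1: pivot, collecting the set of locations per provider/member/date.
    locations = defaultdict(set)
    for provider, service_date, member, location in claims:
        locations[(provider, member, service_date)].add(location)
    # Step 2: flag a provider if any single member/date has several locations.
    flagged = {}
    for (provider, _member, _date), locs in locations.items():
        flagged[provider] = flagged.get(provider, False) or len(locs) > 1
    return {
        provider: "More than one service location" if multiple
        else "Single service location"
        for provider, multiple in flagged.items()
    }
```

On the sample rows this labels provider 123 as billing from more than one location and provider 135 as billing from a single location, consistent with the Step 2 table.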
Teradata Aster MapReduce Platform Advantages

1. Faster
   Combining MapReduce with RDBMS for the best of both worlds
   • Faster exploration of both multi-structured and structured data

2. Easier to Use - Investigative Analytics at Scale
   SQL and beyond, SQL-MapReduce framework + pre-built analytics, visual IDE
   • Useable by any SQL-savvy analyst or BI toolset

3. Easier manageability and ecosystem fit
   Enterprise-class manageability, extensive ecosystem integration
   • Plugs into existing IT investments without specialized skill sets

4. Lower total cost of ownership
   Better performance and ecosystem support = less hardware & expensive staff
   • Leverage what you have and don’t over-engineer the problem