http://www.analytics-magazine.org
Driving Better Business Decisions
November/December 2014 • Brought to you by INFORMS
Analytics vs. Fraud • Big data takes a big bite out of crime • Real-time fraud detection in the cloud
ALSO INSIDE: Goal-driven analytics • Decision analysis survey • Healthcare analytics
Executive Edge: Durjoy Patranabish, senior V.P. at Blueocean Market Intelligence, on unleashing big data via machine learning
Inside Story
Fighting fraud with analytics

A day doesn't seem to go by without a new report of fraud via stolen identity and misappropriated credit card numbers, Internet and phone scams, good, old-fashioned employee embezzling and officialdom corruption, you name it. Is the world really crawling with fraudsters? Perhaps so. According to the Report to the Nations on Occupational Fraud and Abuse – a 2014 global fraud study – the typical organization loses 5 percent of revenues each year to fraud, which, if applied to the 2013 estimated gross world product, translates to a potential projected global fraud loss of nearly $3.7 trillion. That's some serious malfeasance. The report also notes that 22 percent of fraud cases result in losses of at least $1 million, and many of the victims – individuals and organizations, large and small – never fully recover or, in the case of some companies, go out of business. Two articles in this issue of Analytics take a closer look at the enormous worldwide problem of fraud and explain how big data, analytics and pattern recognition are effective tools in curbing the $3.7 trillion crime. In their article "Employing big data and analytics to reduce fraud," Drew Carter and Stephanie Anderson of AlixPartners point out that fraud doesn't play favorites; it's a multi-industry problem, noting
that retail, transportation, manufacturing and telecom are all prone to fraud, along, of course, with the banking and financial sectors. Carter and Anderson go on to spell out the keys to employing analytics for proactive fraud monitoring. Warn Carter and Anderson: "Sinister schemes one can't even imagine are happening because no one knows to look for them. Once they are uncovered and observed, their patterns can be 'built into' rules engines." Meanwhile, in his article "Real-time fraud detection in the cloud," Saurabh Tandon of Mu Sigma explores how his company built a fraud detection framework with up to 250 unique variables pertaining to the demographic and financial history of a financial client's customers. Writes Tandon: "A cloud-based ecosystem can enable users to build an application that detects, in real time, fraudulent customers based on their demographic information and prior financial history." Analytics alone can't stop the worldwide crime spree, but it's clearly entered the anti-fraud fight, and more and more organizations have seen that it packs a powerful punch.
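As a rough illustration of the pattern Carter and Anderson describe – sinister schemes that, once uncovered, have their patterns codified into rules engines – here is a minimal, hypothetical Python sketch. The rule names, transaction fields and thresholds are invented for illustration, not drawn from either article:

```python
# Hypothetical rules-engine sketch: each uncovered fraud pattern becomes
# a named predicate that screens incoming transactions in real time.
RULES = [
    ("rapid_fire", lambda t: t["txns_last_hour"] > 10),
    ("geo_mismatch", lambda t: t["card_country"] != t["ip_country"]),
    ("big_ticket_new_account",
     lambda t: t["amount"] > 5000 and t["account_age_days"] < 30),
]

def screen(txn):
    """Return the names of every rule this transaction trips."""
    return [name for name, check in RULES if check(txn)]

txn = {"txns_last_hour": 12, "card_country": "US", "ip_country": "RO",
       "amount": 6200.0, "account_age_days": 4}
flags = screen(txn)
if flags:
    print("Hold for review:", ", ".join(flags))
```

In practice, hand-written rules like these run alongside statistical models – Tandon's framework, for instance, scores up to 250 variables per customer, far more than any rule set could cover by hand.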
– Peter Horner, editor
peter.horner@mail.informs.org
Contents
Features

28 Goal-driven analytics
Big buzzkill: size and success don't correlate. Big data needs advanced analytics, but analytics does not need big data. By Eric A. King

40 Big data, analytics fight fraud
Fraud doesn't play favorites; it's a multi-industry problem. How to employ analytics for effective, proactive fraud monitoring. By Drew Carter and Stephanie Anderson

48 Real-time fraud detection in the cloud
Detecting fraud among online banking customers in near real time by running a combination of learning algorithms on a data set. By Saurabh Tandon

54 Corporate profile: AOE
Analytics Operations Engineering, Inc. applies advanced quantitative methods to solve challenging operations problems. By Mitchell Burman and Lauren Berk

62 Software survey: Decision analysis
Decision tools continue to evolve, providing analysts with more horsepower to transform vast amounts of information into better decision-making. By William M. Patchak
Register for a free subscription: http://analytics.informs.org

INFORMS Board of Directors
President: Stephen M. Robinson, University of Wisconsin-Madison
President-Elect: L. Robin Keller, University of California, Irvine
Past President: Anne G. Robinson, Verizon Wireless
Secretary: Brian Denton, University of Michigan
Treasurer: Nicholas G. Hall, Ohio State University
Vice President-Meetings: William "Bill" Klimack, Chevron
Vice President-Publications: Eric Johnson, Dartmouth College
Vice President-Sections and Societies: Paul Messinger, CAP, University of Alberta
Vice President-Information Technology: Bjarni Kristjansson, Maximal Software
Vice President-Practice Activities: Jonathan Owen, CAP, General Motors
Vice President-International Activities: Grace Lin, Institute for Information Industry
Vice President-Membership and Professional Recognition: Ozlem Ergun, Georgia Tech
Vice President-Education: Joel Sokol, Georgia Tech
Vice President-Marketing, Communications and Outreach: E. Andrew "Andy" Boyd, University of Houston
Vice President-Chapters/Fora: David Hunt, Oliver Wyman
Departments
2 Inside Story
8 Executive Edge
12 Analyze This!
16 INFORMS Initiatives
20 Forum
24 Healthcare Analytics
68 Conference Preview
70 Five-Minute Analyst
74 Thinking Analytically
Analytics (ISSN 1938-1697) is published six times a year by the Institute for Operations Research and the Management Sciences (INFORMS), the largest membership society in the world dedicated to the analytics profession. For a free subscription, register at http://analytics.informs.org. Address other correspondence to the editor, Peter Horner, peter.horner@mail.informs.org. The opinions expressed in Analytics are those of the authors, and do not necessarily reflect the opinions of INFORMS, its officers, Lionheart Publishing Inc. or the editorial staff of Analytics. Analytics copyright ©2014 by the Institute for Operations Research and the Management Sciences. All rights reserved.
INFORMS Offices
www.informs.org • Tel: 1-800-4INFORMS
Executive Director: Melissa Moore
Meetings Director: Laura Payne
Communications Director: Barry List
Headquarters: INFORMS (Maryland), 5521 Research Park Drive, Suite 200, Catonsville, MD 21228
Tel.: 443.757.3500 • E-mail: informs@informs.org
Analytics Editorial and Advertising
Lionheart Publishing Inc., 506 Roswell Street, Suite 220, Marietta, GA 30060 USA
Tel.: 770.431.0867 • Fax: 770.432.6969
President & Advertising Sales: John Llewellyn, john.llewellyn@mail.informs.org, Tel.: 770.431.0867, ext. 209
Editor: Peter R. Horner, peter.horner@mail.informs.org, Tel.: 770.587.3172
Assistant Editor: Donna Brooks, donna.brooks@mail.informs.org
Art Director: Jim McDonald, jim.mcdonald@mail.informs.org, Tel.: 770.431.0867, ext. 223
Advertising Sales: Sharon Baker, sharon.baker@mail.informs.org, Tel.: 813.852.9942
Executive Edge
Machine learning unleashes big data potential
The art of putting fragmented, often disconnected data sources together to generate actionable insights for the enterprise.
By Durjoy Patranabish and Sukhda Dhal
Big data has no doubt created a big business buzz, and organizations and thought leaders are constantly talking about it, yet many critics note that the widespread application of big data has not matched the hype. Yes, big data helps unveil millions of facts about consumer behavior and trends. Leveraging emerging big data sources and types to gain a much more complete understanding of customer behavior – what makes them tick, why they buy, how they prefer to shop, why they switch, what they'll buy next, what factors lead them to recommend a company to others – is strategic for virtually every company. But have data science organizations built the capabilities to truly harness big data? It's clear that traditional predictive analytical models will be unable to work on big data, as these modeling tools need human intelligence to work across the data sets. They definitely make the analysis robust and quick, but only for structured data sets. Big data, however, is mostly generated via unstructured formats such as images, comments on portals,
telephonic conversations, e-mail communications, videos and the like. This creates a maze of data that cannot be easily handled with traditional models, resulting in a waste of time and human effort. So what can tame big data and put it to good use?

Machine Learning

Applying machine learning algorithms to big data is the art of putting all fragmented and often disconnected data sources together to generate actionable insights for the enterprise. To gain that 360-degree view of the customer, organizations need to be able to leverage internal and external sources of information to assess customer sentiment. As more and more organizations step out of the traditional boundaries of the enterprise to understand the impact of the environment on their business, the number of data sources keeps multiplying. Social media channels, websites, automatic sensors at the workplace and robotics, for instance, are producing a plethora of structured, unstructured and semi-structured data. Machine learning weaves together the two budding trends of 2014 – real-time data collection and automation of business processes. Bringing in the computational power, machine learning runs at machine scale. The number
of variables and factors that can be taken into consideration by this methodology is unlimited. Machine learning brings in the capability to cover data from varied channels, such as social media, websites, automatic sensors at the workplace and robotics. The job of the data scientist here becomes to oversee what types of variables enter the models, adjust model parameters to get better fits and finally interpret the content of models for decision-makers.

How and When to Introduce Machine Learning

Machine learning is ideally suited to the complexity of dealing with disparate data sources and the huge variety of variables and amounts of data involved. The more data fed to an ML system, the more it can learn, resulting in higher quality insights. Keep in mind that big data can only unfold incremental insights. The Pareto 80-20 rule applies here as well, as 80 percent of the details one would need for business come from internal and transactional data. Using big data is only viable for organizations that have matured along the data utilization curve. Once business intelligence and predictive analytics have been achieved, only then does it make sense to move toward big data.
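To make the machine-scale coverage described above concrete, the sketch below pools one unstructured channel (free-text customer comments) with one structured field (monthly spend) in a single scikit-learn model. The data, field names and churn objective are invented stand-ins for the social, web and sensor sources the authors name, not an example taken from them:

```python
# Sketch: machine learning across varied channels - free text plus a
# structured numeric field - fitted as one model. All data are invented.
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

comments = ["love the new plan", "billing error again", "great support",
            "switching providers soon", "fast delivery", "app keeps crashing"]
monthly_spend = np.array([[120.0], [80.0], [150.0], [20.0], [90.0], [40.0]])
churned = np.array([0, 1, 0, 1, 0, 1])        # the business objective

text_features = TfidfVectorizer().fit_transform(comments)  # unstructured
X = hstack([text_features, csr_matrix(monthly_spend)])     # + structured
model = LogisticRegression().fit(X, churned)
print(model.predict(X))  # in production: score fresh data from each channel
```

The data scientist's job described above wraps around code like this: deciding which channels and variables enter X, adjusting model parameters for better fits, and interpreting the fitted model for decision-makers.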
Organizations need to prepare themselves with adequate knowledge resources and skilled data scientists who are adept not only at building statistical models, but also at using cutting-edge programming to apply machine learning. Business experts first need to ask themselves these questions: What use case will big data be applied to? What insights are needed from it? Premature application of big data techniques, either without requisite
expertise or without the knowledge of the business case to be solved, will result in the waste of human and capital resources. ❙

Durjoy Patranabish is senior vice president of Big Data Analytics at Blueocean Market Intelligence. He has more than 17 years of experience in IT services, KPO and analytics services, and BPO and back-office services for global brands and regional leaders. Sukhda Dhal is a consultant, Big Data Analytics, at Blueocean Market Intelligence. She has more than four years of experience in business consulting, analytics and technology services at blue chip firms and startups.
Analyze This!
An analytics professor/practitioner looks at 50

"Time, time, time, see what's become of me. While I looked around for my possibilities, I was so hard to please."
— "A Hazy Shade of Winter," by Paul Simon and Art Garfunkel [1]
By Vijay Mehrotra
By the time you read this, I will be 50 years old. The Big Five Oh. Indeed, I am now officially eligible to join the American Association of Retired Persons. The thing is, I'm not retired. Not even close. In fact, between my day job as a business school analytics teacher, an odd collection of research projects and various other analytics-related commitments, these days I feel like I'm working on a wider variety of projects than ever! In approaching this milestone birthday, I have had a chance to reflect on a variety of matters personal and professional. Here, in no particular order, are some of my thoughts upon arriving at the half-century mark.

Update on analytics survey. In early 2013, I asked readers of this column to complete an online survey to support a research project that Accenture's Jeanne Harris and I were doing. Many of you graciously took the time to respond affirmatively to this request, and thus ended up taking our rather lengthy survey. Survey respondents were asked to answer questions about a wide variety of topics, including job titles, educational background, organizational structure and
company culture, and the types of data, software and mathematical tools utilized in their work. Other volunteers participated in a series of focus groups with us to provide us with additional, more detailed input for our research. Well, given that we were studying the world of people who work with large amounts of data, it seems somehow appropriate that we found ourselves with a whole lot of data to analyze. But over the past year or so, we managed to test our long list of research hypotheses and
come up with some interesting findings. One discovery was that there were in fact many significant differences between those with the job title "data scientist" and those with more traditional job titles such as business analyst, statistician, industrial engineer and six-sigma black belt. (We also discovered that there were a startling number of job titles and organizational structures that our data-centric survey respondents fell under.) While some of the findings were not
surprising (e.g., "data scientists tend to work with larger data sets integrated across more sources"), there were some interesting insights that emerged (e.g., "data scientists are far more likely to use prototypes to garner support for their projects" and "data scientists are much more likely to be focused on helping their organizations develop a unified view of their customers"). Anyone interested in seeing a summary of these findings should feel free to contact me via e-mail (vmehrotra@usfca.edu). As part of this project, we also examined best practices for managing data scientists. Our findings in this area are presented in a paper entitled "Getting Value from Your Data Scientists" that was recently published in the MIT Sloan Management Review. The paper can be accessed online. Feel free to send me an e-mail with your thoughts, reactions and comments.

San Francisco Giants. As I write this, my beloved San Francisco Giants are playing in the World Series, trying for their third championship in the last five years. Like the rest of the orange-clad Giants faithful, I am ecstatic at this year's post-season success, but I must confess to also being a bit surprised, for this year's team won only 88 of 162 games (the lowest of any team that qualified for this year's post-season). Moreover, these Giants finished a distant six games behind the Los Angeles Dodgers, their perennial rivals who once again captured the National League Western Division championship. Worse yet, the Giants struggled down the stretch, winning just six of their final 15 games and barely qualifying for the playoffs.
Yet the Giants have thrived once again in the post-season and are, at the time of this writing, just three games away from winning the 2014 World Series. Though I am surprised, Jonah Keri and Neil Paine are not. Their recent article [2] on fivethirtyeight.com states that after analyzing a great deal of historical data, they find that a team's late season winning percentage is not a significant predictor of post-season success. The Major League Baseball playoffs, it seems, are (at least statistically) a whole new season. Let's go Giants!

Learning to translate. I have been a university faculty member for the past 11 years. Prior to that, I spent 11 years in industry after finishing graduate school. On the occasion of my 50th birthday, I find that symmetry to be both amusingly coincidental and oddly appropriate, as I feel as though I've been straddling the line between industry and academia for all of my adult life. Since becoming a professor, I have continued to work with startup companies in a variety of roles. When considering whether or not to get involved with a company, I typically ask myself three questions:
• Does this company have a reasonably high probability of getting funded, growing and/or ultimately becoming successful?
• Can I add value to this company by helping them with the technical problems and/or business problems that it is likely to face?
• Will working with this company give me a chance to learn something valuable that I can share with my students and colleagues?

I recently agreed to serve as an advisor to an exciting new start-up in Silicon Valley. My primary responsibilities are to serve as a sounding board for their lone data scientist and to provide a bridge between this data scientist and the company's executive team. This role in some form or another is an increasingly common one. As Anil Kaul, CEO of AbsolutData, observed during one of our research focus groups, "We are starting to see a significant increase in the demand for high-level 'translators' within data science project teams." Somehow it feels like I've been preparing for this role all my life.

Vijay Mehrotra (vmehrotra@usfca.edu) is a professor in the Department of Business Analytics and Information Systems at the University of San Francisco's School of Management. He is also a longtime member of INFORMS.

REFERENCES
1. http://www.youtube.com/watch?v=bnZdlhUDEJo
2. http://fivethirtyeight.com/datalab/the-as-tailspin-might-not-matter-once-the-playoffs-start/
INFORMS Initiatives
Webinar series preps candidates for CAP exam
By Scott Nestler, CAP
The question asked most often about the INFORMS (Institute for Operations Research and the Management Sciences) Certified Analytics Professional (CAP®) program is: "How do I prepare for the exam?" To help people prepare and allay their fears of an unknown exam, the CAP program volunteers and staff have written and published a Candidate Handbook with sample questions, a CAP Study Guide that outlines topics for each domain and an introductory webinar. All are posted on the INFORMS website at www.informs.org/certification; the introductory webinar is also posted on BrightTALK at https://www.brighttalk.com/webcast/11147/109485. The CAP Board recognizes that preparation is individualized; some people have had more or different education, others have had more or different experiences. Others like to have many, many foundational texts that they can pore over to see if they still recall the information from their educational program. One blanket preparation guide will not suffice for all.
In an attempt to provide more information for those who wish it, the CAP Board offers a series of webinars designed to help candidates prepare for the exam. The webinar series is designed for those who are curious about the program, those who want to know what's in the exam, those who feel the need for a little preparation and for those who just want a little more information. The webinar series is tentatively scheduled to begin in December 2014 and run through February 2015. The
first webinar will be devoted to an overall description of the CAP program; subsequent webinars will be delivered every other week and will focus on specific domains of practice as identified in the job/ task analysis upon which the program is based. The first webinar will focus on problem framing and will cover Domain 1 (business problem framing) and Domain 2 (analytics problem framing). The next two webinars will focus on Domain 3 (data) and Domain 4 (methodology and approach selection), respectively, while
the last in the series will cover Domains 5, 6 and 7 (model building, deployment and lifecycle management, respectively). The tentative schedule is:
• Introduction – Dec. 3
• Domains 1 and 2 – Dec. 17
• Domain 3 – Jan. 7, 2015
• Domain 4 – Jan. 21, 2015
• Domains 5, 6 and 7 – Feb. 4, 2015

INFORMS is working with BrightTALK™ (www.brighttalk.com), a company that provides webinars and presentations for professionals and their communities. The CAP Board is providing the webinars to the analytics community; they will be available free on demand after their original, live presentation. During the live presentation, questions can be sent to the presenters for answers at the end of the webinar or for answers by staff after the conclusion of the webinar. Candidates for CAP can view the webinars as they need – to freshen their knowledge of specific portions of the job/task analysis, to review examination content or simply to find out more about some area of analytics. In addition, the CAP Board hopes to provide other topical and timely information on analytics through the same medium. For more information, visit the Certified Analytics Professional website at www.informs.org/certification or contact certification@informs.org with any questions.

Scott Nestler is the chairperson of the Analytics Certification Board and a longtime member of INFORMS.
Forum
Your data already knows what you don't
The marriage of two "natural resources" – hydrocarbons and data – will transform unconventional oil development.
By Atanu Basu, Daniel Mohan and Marc Marshall
Known-knowns, known-unknowns and unknown-unknowns. Donald Rumsfeld's notable turn of phrase is an apt characterization of where we are with unconventional oil development today. Shale operators in Eagle Ford (South Texas), Permian (West Texas), Bakken (Upper Midwest) and other places have transformed the United States into an energy superpower by profitably extracting oil and gas from tight rocks that weren't commercially viable even a few years ago. With that backdrop, unconventional oil development today is punctuated by significant performance variations among operators with contiguous acreage positions and meager estimated ultimate recovery (EUR) rates. Unless performance keeps improving, any fluctuation in commodity prices can send shockwaves through the oil patches around the country, as we have seen happen with natural gas. How do we gain ground on the vexing "unknowns" to tilt the inherent risks involved in shale oil development in our favor?

Standing on the shoulders of giants

Geoscience (geoscientists are the giants of the energy industry) is finally getting a shot in the arm from data science, especially from Google-like technologies
that are already at work in the oil patch. Leading the charge is prescriptive analytics, which can "prescribe" optimum recipes for drilling, completing and producing wells to maximize an asset's value at every point during its operational lifetime. The premise of prescriptive analytics is to take in all data – Figure 1 shows examples of shale data sets – and use the data to predict and prescribe how to make better wells using information from the past wells and subsurface characteristics of undrilled acreage.

[Figure 1: Examples of shale data sets. The figure groups sources into images, videos, sounds, texts and numbers, ranging from 2D/3D/4D seismic, microseismic, well logs, mud logs and offset logs to downhole camera monitoring of fluid flow, time-based image sequences of acoustic and EM fracture monitoring, distributed acoustic sensing (DAS) via fiber-optic sensors, completion procedures and results, core analysis, production data, past and present drilling engineering notes, and artificial lift data.]

While today's sophisticated operators and energy services companies are adept at analyzing each of these data
sets separately, prescriptive analytics technology is unique in that it processes these structured and unstructured data sets together, and does so continually. Since reservoir conditions are anything but static, the machine learns from new streams of data and updates its "prescriptions" when the data sets signal the need for a recalibration. This adaptive environment compresses learning curves, enabling better decisions faster, with less risk – and much less capital.

Questions worth answering

Let's "begin with the end in mind" – as the late Dr. Stephen Covey used to say
– and understand the outcomes made possible by prescriptive analytics, using data sets most operators already have on hand and/or routinely collect in the course of normal operations. Some examples:

Planning
• Which reservoir, drilling, completion and production variables have the greatest impact on production?
• How closely should we space wells? Do we have stage overlap? Formation containment?
• Does the order in which we treat and/or produce adjacent wells matter? Why?

Production
• Which stages and clusters were treated effectively? Treated as expected? Why?
• Which stages are producing? Producing as expected? Which are not? Why?
• How should a well be produced to maximize its lifetime value?

Secondary Recovery, EOR
• When should artificial lift be introduced in the lifecycle of a well to maximize estimated ultimate recovery (EUR)?
• When should enhanced oil recovery (EOR) be introduced in the lifecycle of a well in order to maximize EUR?
Does EOR result in higher recovery rates, or are recoveries simply accelerated?
• What is the incremental return on investment of EOR? Where is the point of diminishing returns?

You don't have to know as long as your data does

Operators can start using – and reaping benefits from – data-driven prescriptions immediately, even if the underlying causalities are not fully understood. Think about this for a minute. We use Google to find restaurants or plan a route that avoids traffic congestion without completely understanding how or why the engine's algorithms produced the suggestions they did. Does that lack of understanding make the results less useful? Of course not. The same holds true for prescriptive analytics. The technology makes an immediate impact while the geoscientists, in parallel, strive to understand the physics behind the predictions and prescriptions the software derived from an operator's data – lots and lots of data of all types. ❙

Atanu Basu (atanu.basu@ayata.com) is president and CEO of Ayata, a prescriptive analytics software company based in Houston, Texas, and a member of INFORMS. Daniel Mohan is senior vice president of sales & marketing, and Marc Marshall is senior vice president of engineering at Ayata. A version of this article appeared in Hart Energy's blog site, E&P. Reprinted with permission.
Healthcare Analytics
2014: Year of many challenges
By Rajib Ghosh
2014 has been quite a remarkable year for the healthcare industry. We have seen various ups and downs, from the healthcare.gov fiasco to the surge of investment in the digital health technology space. A recent report published by StartUp Health shows that $5 billion has been invested in digital health technology during the first three quarters of 2014, which exceeds the total investment for all of 2013. Apple released its HealthKit product, and Google announced Google Fit – both are harbingers for personal data-level analytics in the cloud. Both companies are moving aggressively forward in the healthcare technology space, and I predict that their competition will produce great outcomes for patients as consumers. Healthcare analytics is slowly taking center stage. As of this writing, $381 million was invested in healthcare-related analytics and big data in 2014. A new study by Johns Hopkins published in Academic Medicine found that analytics is critical to the success of accountable care organizations (ACO) [1]. At the recently concluded Healthcare Analytics Summit organized by Health Catalyst, senior leaders from Geisinger Health Systems, Cleveland Clinic and Texas Children's Hospital confirmed that meaningful analytics is crucial to ensure delivery of quality care and improve the patient experience, while maintaining a positive bottom line.
I expect that by 2017, analytics will drive success for ACOs across the board. That’s good news for analytics professionals and product companies. The needle has started to move at last.
Cost and return on investment of data analytics solutions pose significant barriers to adoption within many organizations.
Pioneer ACO Data Shows Challenges Ahead
After a long wait, the Centers for Medicare and Medicaid Services (CMS) recently published quality and financial data reported from the Pioneer Accountable Care Organization (ACO) program. CMS started the program in 2012 to improve quality and health outcomes of patients by aligning payment incentives for providers. Of the 32 original Pioneer ACOs who participated in the program, 60 percent posted savings in the first year (2012) while 40 percent posted losses. The worrisome thing is that by the second year about 50 percent of the initial participants had dropped off the list. The cost incurred in the program by those who dropped off was much higher than their savings. However, CMS reported that in general the program saved Medicare $96 million in a two-year span. That is a good start, but clearly ACOs have mountains to climb in terms of data interoperability and data analytics. A recent survey of 62 ACOs by Premiere and eHealth Initiative found that 83 percent of ACOs are facing challenges in integrating analytics into their workflow.
What is Holding Back Transformation?
CMS designed the ACO model to improve care by sharing data among multiple stakeholders. ACOs are a cornerstone of the Affordable Care Act (ACA). The idea is that by focusing on the holistic picture of a patient as they move through the healthcare system, providers can not only prevent wasteful duplication of diagnostic tests and procedures, but can also deliver the right care at the right time to the right patient. Such targeted intervention in turn will help in the prevention of unnecessary emergency room visits and costly hospital readmissions. Patients do not want to spend time in the hospital, so reduced hospitalizations and better health for individuals will drive up patient satisfaction. This is the "triple aim" that the ACA wants the healthcare industry to move toward. The majority of ACOs, however, are far from achieving that goal. They have not found much success in improving key performance indicators such as patient safety, cost containment, efficiency and patient satisfaction [2]. Geisinger Health System, one of the leading physician-led healthcare systems in the country, mined its huge data sets, and it has found that an inverse
correlation exists between cost and quality in healthcare. According to them, as the costs of providing healthcare services decrease, the quality of care improves … and vice versa. Better patient outcomes are determined by the coordination of care and timely intervention for the right patient. However, to do that effectively, organizations need to share data, re-engineer workflows, work collaboratively and embrace analytical tools that can provide them with actionable clinical insights in real time, preferably at the point of care. Unfortunately, most organizations are stuck on the first fundamental step, i.e., sharing data. This is probably the biggest disappointment for the industry during the last few years.

There is Hope, However

The Office of the National Coordinator for Health IT is taking a keen interest in improving interoperability among various health IT vendors. To that end, the organization unveiled a 10-year plan in June for building a robust interoperable health IT ecosystem. According to ONC, an interoperability roadmap will become available within the next three years that will be based on scaling current health information exchanges (HIE) across various vendor platforms. Regardless of what political view we subscribe to, it would be unwise to believe
that without significant government intervention such a roadmap can be implemented. Are we going to see a similar carrot-and-stick approach as we have seen in the "meaningful use" program for electronic health record systems? No matter how significant the challenge, there is no doubt that the time is ripe for healthcare analytics. CB Insights predicted that this area is poised to attract close to $1 billion in investment by the end of 2014. Irrational exuberance or just the tip of the iceberg? You decide.

Rajib Ghosh (rghosh@hotmail.com) is an independent consultant and business advisor with 20 years of technology experience in various industry verticals where he had senior level management roles in software engineering, program management, product management and business and strategy development. Ghosh spent a decade in the U.S. healthcare industry as part of a global ecosystem of medical device manufacturers, medical software companies and telehealth and telemedicine solution providers. He's held senior positions at Hill-Rom, Solta Medical and Bosch Healthcare. His recent work interest includes public health and the field of IT-enabled sustainable healthcare delivery in the United States as well as emerging nations. Follow Ghosh on twitter @ghosh_r.

REFERENCES
1. Scott A. Berkowitz, MD, MBA, and Jennifer J. Jahira, 2014, "Accountable Care Organization Readiness and Academic Medical Centers," Academic Medicine, September 2014, Vol. 89, No. 9, pp. 1,210-1,215.
2. Premiere and eHealth Initiative, "The Landscape of Accountable Care and Connected Health: Results from the 2014 National Survey of Accountable Care Organizations."
Big Data Buzzkill
Goal-driven analytics
Big data needs advanced analytics, but analytics does not need big data.
By Eric A. King
Thanks, big data! Now we're even more data-rich … yet remain information-poor. After staggering investments motivated by an overabundance of buzz and hype, big data has yet to produce cases that reveal substantial verified return. Organizations are becoming harder pressed to show value, but they're not sure where or how to draw it.
Professor Dan Ariely of Duke University relates big data to teenage sex: “Everyone talks about it; no one really knows how to do it,” he says. “Everyone thinks everyone else is doing it; so everyone claims they’re doing it.” In an article from the December 2013 Harvard Business Review by Jeanne W. Ross, Cynthia M. Beath and Anne Quaadgras, the very title suggests that “You May Not w w w. i n f o r m s . o r g
Need Big Data After All." The authors rightfully argue that even before big data, most companies did not make productive use of valuable information already at hand. So, jumping ahead to big data is like attempting to operate a jet fighter before gaining proficiency in a sedan. It's no wonder big data is proceeding rapidly into the third stage of Gartner's Technology Hype Cycle [1]: (dark scary voice) the "trough of disillusionment." The practice of big data overall has its merit and will not go away. It requires that
organizations actively think about how to accommodate rapidly increasing volumes and varieties of data. Yet, only the companies that successfully implement predictive analytics and effectively act upon the value-laden information mined from their large data stores will enjoy early and sizable returns. In fact, advanced analytics is arguably the only way in which bottom-line accountable and residual payback from big data will be extracted. The standard big data practice of collecting, storing, transporting, connecting,
organizing, extracting and even visualizing rapid streams of data is essentially a cost center activity. Only when content of value is operationalized into active decisioning and measured for impact will big data's liability be converted into an intelligence asset. Big data's recovery up the Hype Cycle [1] "Slope of Enlightenment" will come in the form of actionable analytics for automated decision-making at the operational level and proactive recommendations at the strategic level.

Size and Success Don't Correlate

Big data enthusiasts are finding that the more data they collect, the harder it becomes to understand just what the data is telling them. And most practitioners are surprised to learn how little data is required to build a highly effective goal-driven model. It's not a matter of having a lot of data, but a valid sampling of data to support the target objective. For advanced analytics, it is far more important for a database to be wide with attributes or variables than long in transactions. Thanks to big data innovations, more variables are being collected than ever before. In fact, data dictionaries are starting to be turned on their side to allow vertical scrolling through a growing number of attributes.
Only variables that have no relationship to the target objective should be excluded. A development model will automatically rank the limited set of variables that have predictive value toward the objective. The remainder can be eliminated from the final model and potentially from the analytic sandbox. Only enough transactional data to adequately represent the solution space for the application at hand is required to develop the model. There are standard rules of thumb, based on the final number of attributes or dimensionality of the final model, that suggest the number of records or transactions needed to derive the train, test and validation data sets for model development. Most times, this range is from 5,000 to 250,000 records – a mere quark in the vast universe of big data. But without a use plan for data, companies feel at risk if they do not harvest all possible data. This digital hoarding overwhelms analysis and motivates strategies for deriving streamlined analytic sandboxes. The sandboxes draw targeted data for goal-driven model development from the vast stores of useless "dark data." One other consideration toward limiting data for more streamlined analytics is to start with available structured data. In most organizations, structured data holds far more predictive value and requires far less preparation labor than open text.
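To put rough numbers on those rules of thumb, here is a minimal Python sketch of the sampling arithmetic: 25,000 synthetic records – squarely inside the 5,000-to-250,000 range – split into train, test and validation sets, with the development model itself ranking which candidate variables carry predictive value. The 60/20/20 split, the data and the importance cutoff are illustrative assumptions, not prescriptions from the article:

```python
# Sketch: a goal-driven model built from a modest, valid sample rather
# than "big" data. Split ratios and cutoff are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=25_000, n_features=60,
                           n_informative=10, random_state=1)

# 60/20/20 train/test/validation
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=1)
X_test, X_valid, y_test, y_valid = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=1)

model = RandomForestClassifier(n_estimators=200, random_state=1)
model.fit(X_train, y_train)
print(f"test accuracy:       {model.score(X_test, y_test):.3f}")
print(f"validation accuracy: {model.score(X_valid, y_valid):.3f}")

# Variables with no relationship to the target rank near zero and can be
# dropped from the final model - and from the analytic sandbox.
weak = int((model.feature_importances_ < 0.005).sum())
print(f"{weak} of 60 candidate variables contribute almost nothing")
```

The same ranking, run continually, is what keeps the analytic sandbox lean as new attributes arrive.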
Why jump straight into drilling sideways for limited resources if there are pressure wells to tap at the surface?

Big Data 2.0 Must Progress to Analytics 3.0

The International Institute for Analytics and Thomas Davenport rightfully relegate big data to a 2.0 version on its analytic maturity model, behind the purpose-driven Analytics 3.0 [2]. The timeline denotes traditional analytics as the first stage, big data as the second and actionable analytics as the third. Organizations that progress quickly to Analytics 3.0 to combine traditional analytics, machine learning, big data, goal-driven strategy and embedded predictive decisioning at the operational level will become leaders that achieve measurable returns. Yet analytically, most organizations are working on the wrong end of the problem. Instead of taking a strategic, goal-driven approach, they are proceeding with a technology focus. They are hiring "data scientists" and extending their technical capability to perform more sophisticated analyses. This approach will fail or fall short at the business level for a host of strategic reasons. They may build technically superior models that conform well to artificial metrics. But their optimized models won't align with business objectives,
achieve overall performance metrics, integrate with the operational environment, gain adoption by users, integrate effectively into operations or be translated in terms that leadership can apply. The analytic industry is attempting to define “data scientist” as a dynamic analytic practitioner who holds advanced analytic skills, vast IT experience and managerial soft skills to oversee analytic processes at the project level. Not only is this superstar mix of technical skill and leadership personality extremely rare, but the term “data scientist” itself faces multiple challenges. On one hand, many amateur business practitioners are loosely donning the label, diluting its reputation. On the other, “scientist” suggests a formal discipline and deep vertical experience along with a research component. This is probably the most fitting definition for “data scientist.” Yet those technical skills alone won’t achieve Analytics 3.0 objectives. These technology-driven formal data scientists typically view strategic assessment and project planning as theoretical fluff. They jump directly into the trees with little regard to the forest – delighting in writing increasingly complex code, creating ever more sophisticated algorithms, and then wondering why business leaders don’t implement their findings more readily. If these trends continue, then the w w w. i n f o r m s . o r g
title “data scientist” will only live up to its label of a theoretical quantitative specialist and fail to have strategic or even operational impact. The majority of companies don’t realize that common business practitioners can leverage modern predictive modeling software that encapsulates the complexity of machine learning to quickly build “more than adequate models.” This can be done in conjunction with an analytic support team. Beyond IT and the business owner, the team should include
strategic oversight from a seasoned senior consultant who can collaboratively develop an overarching optimized modeling process. The resulting process will follow the blueprints developed from the information amassed in the assessment. The process will not only ensure optimal deployment of the model, but roadmap and tailor all actions from operating within the sandbox, to data preparation, model development, deployment, validation, reporting and model lifecycle management.
In the end, Analytics 3.0 will lead organizations to shift their thinking from tactics and technology to strategy and measured impact.

Strategic Implementation is Imperative

Most leadership today does not realize how expensive it is in the long run to insist upon immediate results and instant payback. Instead of investing time to design and build a modeling factory, they choose to manufacture each new analytic product as a custom job and demand delivery within nearly impossible timeframes. With big data and analytics, industry typically devalues comprehensive assessment and tailored project design – opting for immediate summaries or projections. Organizations continue to draw little value from disparate, ad hoc analyses that produce some nice-to-know insights, but fall short of driving goal-driven decisions that translate impact back to leadership.

It is a common practice to request case studies to evaluate vendors and technology. But the practice is misguided, as each implementation is highly situational and based on a multitude of contributors. Just because 10 similar organizations realized substantial gains does not mean that your team is even at the starting line. Case summaries convey very little about
process and project design issues that are critical to achieving overall project success. They are simply indirect justifications that the technology can actually generate substantial returns in the right situations.

For analytics to arrive at tangible and residual value, many questions need to be addressed in advance of implementation. A number of public-domain, industry-standard processes outline specific strategic phases, tasks and issues to be examined before even exploring any data. Yet the vast majority of practitioners fail to reference them and instead jump headlong into the data. They are unaware that the most critical pitfalls relate to a lack of soft skills – not analytic limitations. Following are just a few strategic considerations that sound obvious, yet are rarely addressed prior to model development or process implementation:

• Buy-in: Is leadership onboard? Are they motivated or ambivalent? Do they view analytics as an esoteric and theoretical function? Or have they heard enough industry buzz to be seriously concerned about their analytic maturity and vulnerability?

• Team capability: Do team members appreciate the importance of strategic implementation and project design effort? Will they understand why project definition is imperative to project success
– and why building a structure without sound design and blueprints is likely to fail?

• Politics: What is the make-up of the overall team that will contribute to or be impacted by analytics? Has each member been interviewed for their role, experience, objectives, motivations and concerns? Are there competing interests? Threats? For example, are traditional statisticians firmly resistant to shifting their mindset to a more strategic and agile model-building focus?
Likewise, without the oversight of experienced modelers, will egregious errors be made in data preparation and results interpretation?

• Alignment of objectives: Has each team member impacted by the analytic process been asked about their individual goals? Have you drilled at least two or three levels deep? Often the surface issue expressed is not the true underlying concern.

• Performance projections: Have baselines for current performance been
established along with target performance and its impact on operations? Without the baseline and target, how will success of the analytic initiative be defined, measured and interpreted?

• Ability to affect: Does the organization have the willingness and wherewithal to carry out potential model recommendations? If not, we haven’t passed the “so what” test. It is far less expensive to determine in advance of a modeling objective that the organization “can’t handle the truth.”

• Decision culture: Does your company drive more from general leadership experience and feel, or from evidence-based decisioning? If the former, is leadership open to letting go of one handlebar and allowing a pilot to compete in a series of A/B tests?

• Cost of status quo: Referencing back to “Buy-in” at the top of the list, considering the ultimate cost of doing nothing is often what gets leadership off the fence. Leadership need not be analytically literate to appreciate that supporting costly big data initiatives does not make sense unless a more capable and purposeful analytic practice is prepared to leverage it.

The information amassed from these and many other strategic and tactical considerations is used to prepare a highly
tailored analytic project design. The resulting process supports agile model development by functional managers and business practitioners. This is the engine required to generate measurable benefit from big data.

Goal-Driven Analytics Will Justify Big Data

Until leadership grants analytic teams the six to eight weeks needed to assess and design tailored analytic processes that will rapidly produce analytic models to support specific business targets, data analysis will continue to be a theoretical practice that produces little more than interesting insights and isolated low-value remedies. The vast majority of companies will remain analytically immature and dysfunctional. This creates a significant competitive opportunity for those who invest in formal strategic assessment and design. Here are the primary takeaways:

1. Don’t wait for big data to stand up. It’s a journey and not a destination. Analytics can start bringing value at any stage of a big data implementation and even help justify further big data investment.

2. Get trained. Seek a vendor-neutral trainer that not only provides methods
and tactics, but has a specific focus on project-level strategic implementation.

3. Conduct a comprehensive assessment. It is infinitely more effective to select the most viable and valuable modeling project after having surveyed leadership, team members, resources and the environment than to perform great work on a doomed initiative or start sifting for insights without a performance target.

4. Conduct an underground pilot. If the initial results fall short, that’s part of the overall discovery process. Shift and cycle again. If they exceed, then market to leadership and expand.

5. Maintain ongoing strategic oversight. Seek the guidance of a seasoned mentor. This consultant will have the experience to anticipate hurdles, overcome elusive pitfalls and provide a low-risk/high-reward roadmap to greater returns in a shorter time frame.

Without a formal and comprehensive assessment performed by a senior strategic analytic consultant, organizations will continue to perform analysis for the sake of analysis. The results of this practice will uncover some discoveries of interest that rarely align with business objectives or translate to impact. Instead, goal-directed analysis driven by a methodical assessment and tailored project design lifts a specific business
objective by a measurable margin. Of course, this is what translates well for leadership and puts data productively to work, whether big or small. ❙

Eric A. King is the president and founder of The Modeling Agency, LLC, an advanced analytics training and consulting company providing strategic guidance and impactful results for the data-rich yet information-poor. King is a co-presenter of a monthly live, interactive analytics webinar entitled “Data Mining: Failure to Launch.” He may be reached at (281) 667-4200 x210 or eric@the-modeling-agency.com.
Acknowledgements

The author thanks Carla Gentry of Analytical Solution for granting permission to use a slight variation of a fantastic blog phrase as the title of this article. Also, cheers to Professor Dan Ariely of Duke University for permission to quote his hilariously accurate teenage-sex analogy for big data. Gratitude is extended to the International Institute for Analytics (IIA) and Tom Davenport for permission to reference Analytics 3.0 and related IIA material in this article and in TMA courseware with attribution. And finally, a gracious nod to Sandra Hendren, senior consultant at The Modeling Agency, for her review and brilliant edits.

REFERENCES

1. Gartner, Inc., “Hype Cycles 2013 Research Report,” Gartner Technology Research, 2013 (https://www.gartner.com/doc/2571624).
2. Thomas H. Davenport, “Analytics 3.0,” Harvard Business Review, December 2013 (http://hbr.org/2013/12/analytics-30).
Multi-industry problem
Employing big data and analytics to reduce fraud

By Drew Carter and Stephanie Anderson
Even a cursory Internet search of fraud crimes delivers a multitude of results: the Little League secretary siphoning off a few thousand dollars, the trader known as the London Whale losing more than $6.2 billion for JPMorgan Chase, and hackers gaining access to customer information at major retailers and international banks. Fraud is a multi-industry problem. Banking and credit are the ones that most frequently come to mind for the average
person. However, retail, transportation and manufacturing are also prone to fraud. In fact, it would be difficult to name an industry impervious to it.

Take the telecommunications industry, for example. According to the FTC, telecom fraud accounted for 34 percent of its fraud complaints in 2012, up from 20 percent in 2010. Verizon estimates that fraud costs the industry $4 billion a year. In telecommunications, fraud is most frequently focused in three areas:

• Defrauding telecommunication companies
• Defrauding telecommunication subscribers
• Schemes conducted over the telephone
While fraud is prevalent everywhere, its identification is not simple. There are two types of fraud schemes: “known” and “unknown.” Known fraud schemes are easier to identify. They are the scenarios where fraud has been identified in the past. Rules engines can be established in computer systems to look for specific patterns of behavior. For instance, one can look for transactions of a certain amount – say, more than $1,000 – between employees inside a company. However, similar to ever-evolving e-mail spam, the fraudsters are always devising new methods that can remain undetected for some period.

Unknown fraud schemes, especially new ones, may continue for years without detection until they are uncovered in an investigation or a company’s deep dive into costs and profitability variances during an economic downturn. These are the sinister schemes one can’t even imagine are happening because no one knows to look for them. Once they are uncovered and observed, their patterns can be “built into” rules-engines within a few days or weeks.
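The article doesn’t show an implementation, but a known-scheme rules engine can be as simple as a list of named predicates applied to each transaction. Below is a minimal, hypothetical Python sketch: the $1,000 internal-transfer rule is the article’s own example, while the data layout and the second rule are illustrative assumptions.

```python
# Hypothetical sketch of a rules engine for "known" fraud schemes.
from dataclasses import dataclass

@dataclass
class Transaction:
    amount: float
    sender: str
    receiver: str
    internal: bool  # True when both parties are employees

# Each rule pairs a label with a predicate over a transaction.
RULES = [
    ("large internal transfer", lambda t: t.internal and t.amount > 1000),
    ("self-payment", lambda t: t.sender == t.receiver),  # illustrative
]

def flag(txn: Transaction) -> list:
    """Return the labels of every rule the transaction trips."""
    return [label for label, rule in RULES if rule(txn)]

print(flag(Transaction(2500.0, "emp_17", "emp_42", internal=True)))
# -> ['large internal transfer']
```

Once a newly uncovered scheme has been observed, encoding it is a one-line addition to the rule list – which is why known patterns can be built into rules engines within days or weeks.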
Screen shots of the AlixPartners “World Platform” anti-corruption toolset.
Change from Reactive to Proactive

Fraud prevention efforts are primarily spurred by reactive investigations and penalties. Few companies truly engage in proactive fraud monitoring. The majority of thought leadership in proactive monitoring has emerged from the financial services space. With millions of dollars (or more) at risk at the click of
a mouse button, financial services companies have a clear incentive to actively monitor for fraud.

One area the industry is monitoring is “bust-out” fraud, or first-party fraud, in which the thief applies for a line of credit (credit card, etc.), behaves well, increases the credit line and then disappears, leaving a large balance delinquent. This type of scheme is estimated to cost more than $1.5 billion a year in losses, according to Credit Risk International. A recent bust-out fraud cost Southern California banks at least $15 million. The scheme, which involved 15 people, allegedly started in February 2010 and ran until October 2013. According to the FBI, it included:
• “Processors” who fabricated or hired others to make fictitious checks for the purpose of conducting bust-outs;
• “Brokers” who solicited people with legitimate bank accounts; these would lend their accounts to be busted out in exchange for a fee; and
• “Runners” or “washers” who allegedly deposited fictitious checks into, then withdrew funds from, the account to be busted out.

A few years ago a criminal group of more than 700 people cost U.S. banks over $80 million in losses. The most common scheme involved fraudulent loan applications that misstated how long the applicant had been employed
and grossly exaggerated yearly salaries. Via online applications, the culprits received credit cards with sizeable credit limits. Often, these people also received cash advances on the card. Shortly after the cash advances, they sent the issuing bank a check, frequently for slightly more than the outstanding balance. Although the check was returned for insufficient funds, the fraudulent payment caused the bank to temporarily increase credit lines. By the time the fraud was discovered, the bank was out tens of thousands of dollars per fraud incident.

Despite the large potential losses, however, even the most sophisticated operators are losing ground to fraud.
What’s Needed to Succeed?

Reactive fraud prevention will always be a handicapped method to prevent losses (and, often, embarrassing public events). Proactive fraud monitoring using advanced analytics, including big data, is required to adapt to the growing threat of fraud. What exactly is big data? We define it by the “4 Vs”:

• Volume. Originally described as the size of data versus processing capability, volume today is typically measured simply by the size of the data alone. This year, “big” volume might be 25 terabytes (TB); by next year, 250 TB. For comparison, it’s estimated that a jet engine in a Boeing plane generates 20 TB of data for every hour of operation; on one Atlantic crossing, a four-engine jet can create 640 TB of data.

• Velocity. This is the frequency of generation and capture of batch, near-time and real-time streams of data. A world of real-time promotional offers (where offers are generated at the moment of interaction) requires lightning-fast processing and feedback loops so that things like promotional campaigns can match geolocations, click streams, sentiments and purchase histories. For instance, online-ad technology can operate at 50 to 450 milliseconds (ms) and
high-frequency stock-trading platforms operate at less than 60 ms for transatlantic round-trip transactions.

• Variety. Data no longer fits into neat structures that happily reside in a traditional “database.” The proliferation in the variety of data sources (radio-frequency identification, sensors, social networking, mobile devices, etc.) and types (geospatial, etc.) – coupled with traditional sources (documents, click-stream sets, etc.) – conspires to generate a veritable fur ball. Add unstructured data to the mix, and things get even more complicated.

• Virality. This is the speed at which data gets spread from person to person, whether by voice, image or machine. Social networks and the data they generate have created a new dimension of measurement: “going viral.”

The monetization of data assets is about understanding factors old and new, and how they work together – not necessarily about capturing, storing or reporting on every piece of information passing near the orbit of a company. It’s about knowing what matters, discarding the rest, and focusing on the “important bits.”

To come full circle, employing analytics for proactive fraud monitoring requires:

Organizing around the data

Companies often address their big data challenges and opportunities by
directing a talented IT person to “own” the program at hand. This tactic typically fails. To develop a true data-insights approach to business, an organization must treat data as an asset. And that means the whole company must be structured to access, interpret and act based on insights drawn from the data, focusing on:

• Robust internal data sets (organized, cleaned and ready for analysis)
• External data (often from a combination of free and paid sources) that provides insight into fraudsters’ behaviors
(such as applications for multiple lines of credit) – often a signal of coming malfeasance.

Agreeing that the business “drives this data”

Big data projects must be driven by the company’s core business in a way that makes it user-friendly, not by taking a “build it and they will come and figure it out” approach. The business begins by determining the key performance areas that are crucial to manage or monitor.
That, in turn, determines the kinds of data required and the kinds of analysis needed to find the insights lurking in the data. For anti-fraud efforts, the business can guide data needs by identifying:

• Already-known fraud scenarios – these will provide an initial data set to begin monitoring, as well as a basis for monitoring algorithms.

• Building up sensitivities to unknown scenarios – while, of course, unknown risks are by definition unknown, companies can identify areas where the effect of fraud would be especially negative. An increase in product prices paid by certain customers, for example, may indicate procurement kickbacks or provide funding for covering up other undesirable behaviors, such as bribing government officials to obtain government contracts, permits and licensing or to overlook illegal or non-compliant activities. Data analytics can help in monitoring these scenarios once desired business processes are defined and reporting dashboards are developed.

Sophisticated Analytics

Analytics, in this environment, does not mean just a spreadsheet. It means such things as advanced methods of pattern identification, designed and operated by experienced analytics and fraud professionals. Pattern recognition
is a science of its own, but it is hardly new. For instance, the Fibonacci sequence was made famous by the Italian mathematician Leonardo Bonacci, aka “Fibonacci,” in his 1202 book, “Liber Abaci.” Advanced practitioners today are using pattern recognition methods to establish relationships in fields as diverse as baseball and healthcare. Analytics has even reached the level of sophistication to create original works of art. David Cope, a musician and computer scientist, has developed a program called “Emily Howell” that can create original works of music seen by many critics as being on par with those of the world’s greatest musicians.

When tackled by experienced professionals, these efforts should deliver:

• Accurate insights – the “confusion matrix” is a standard tool to measure accuracy. It is used to identify type 1 (I said you were fraudulent, but you were actually a safe transaction) and type 2 (I said you were a safe transaction, but you were actually fraudulent) errors. The best monitoring provides a balance of missing only a few bad scenarios while not calling too many scenarios into question. (A minimal example follows this list.)

• Timely insights – as required by the nature of the business.

• Simple access – delivery of fraud warnings in clear, easy-to-understand language and processes.
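To make the confusion-matrix bullet concrete, here is a minimal, hypothetical Python sketch; the labels and predictions are made-up stand-ins, not data from the article.

```python
# Minimal sketch: measuring a fraud monitor with a confusion matrix.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 0, 1, 0, 0, 1])  # 1 = fraudulent, 0 = safe
y_pred = np.array([0, 1, 1, 0, 0, 0, 0, 1])  # the monitor's output

# ravel() unpacks the 2x2 matrix: rows = actual, columns = predicted.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"type 1 errors (safe flagged as fraud): {fp}")
print(f"type 2 errors (fraud passed as safe):  {fn}")
```

Balancing those two counts – catching most fraud without drowning investigators in false alarms – is exactly the trade-off described above.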
Conclusion

With ever-increasing fraud instances and ever more complex fraud scenarios, proactive monitoring for bad actors and bad scenarios is emerging as a required capability for companies around the world. No company can afford the direct (monetary) and indirect (customer perception) losses associated with fraud incidents. It can take many years for companies to recover from these situations, and companies
that are not taking the proper precautions face increasingly stiff penalties. Although proactive solutions are available, and include the use of big data and analytics, they are not simple and require expert guidance. ❙

Drew Carter (dcarter@alixpartners.com) is an applied analytics expert and Stephanie Anderson (sanderson@alixpartners.com) is an expert in fraud compliance and forensic accounting. They are managing directors of AlixPartners, LLP (www.alixpartners.com), a global business advisory firm and an industry leader in proactive monitoring for compliance and fraud.
Online banking customers
Real-time fraud detection in the cloud

By Saurabh Tandon
This article explores how to detect fraud among online banking customers in near real time by running a combination of learning algorithms on a data set that includes customer transactions and demographic data. The article also explores how the “cloud environment” can be used to deploy these fraud detection algorithms in short order to meet computational demands at a fraction of the cost it otherwise takes in setting up traditional data centers and acquiring and configuring new hardware and networks.
Real-time decision-making is becoming increasingly valuable with the advancement of data collection and analytical techniques. Due to the increase in data processing speeds, the classical data warehousing model is moving toward a real-time model. Cloud-based platforms enable the rapid development and deployment of applications, thereby reducing the lag between data acquisition and actionable insight. Put differently, the creation-to-consumption cycle is becoming shorter, which enables corporations to experiment and iterate with their business
hypotheses much more quickly. Some examples of such applications include:

• A product company getting real-time feedback for its new releases using data from social media, post-product launch.
• Real-time recommendations for food and entertainment based on a customer’s location.
• Traffic signal operations based on real-time information on traffic volumes.
• E-commerce websites and credit firms detecting customer transactions as authentic or fraudulent in real time.
• Providing more targeted coupons based on customers’ recent purchases and location.

From a technology architecture perspective, a cloud-based ecosystem can enable users to build an application that detects, in real time, fraudulent customers based on their demographic information and prior financial history. Multiple algorithms help detect fraud, and the output is aggregated to improve prediction accuracy.

But Why Use the Cloud?

A system that allows the development of applications capable of churning out results in real time needs multiple services running in tandem and is highly resource intensive. By deploying the system in the
cloud, maintenance and load balancing of the system can be handled efficiently and cost effectively. In fact, most cloud systems function as “pay as you go” and only charge the user for actual usage vs. maintenance and monitoring costs. “Intelligent” cloud systems also provide recommendations to users to dial up/down the resources available to run the fraud detection algorithms without worrying about the data-engineering layer.

Since multiple algorithms are run on the same data to enable fraud detection, a real-time agent paradigm is needed to run them. An agent is an autonomous entity that may expect inputs and send outputs after performing a set of instructions. In a real-time system, these agents are wired together with directed connections to form an agency. An agent typically has one of two behaviors: cyclic or triggered. Cyclic agents, as the name suggests, run continuously in a loop and do not need any input. These are usually the first agents in an agency and are used for streaming data to the agency by connecting to an external real-time data source. In short, their tasks are “well-defined and repetitive.” A triggered agent, on the other hand, runs every time it receives a message from a cyclic agent or another triggered agent. The “message” defines the function that the triggered agent needs to perform. Together, these agents allow multiple tasks to be handled in parallel to enable faster data processing.
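The article describes the agent paradigm without showing code; the following is a minimal, hypothetical Python sketch of one cyclic agent feeding one triggered agent through a queue. The names, the dummy records and the use of threads are illustrative assumptions, not details of Mu Sigma’s framework.

```python
# Hypothetical sketch of the cyclic/triggered agent pattern.
import queue
import threading
import time

messages = queue.Queue()

def cyclic_agent():
    """Runs in a loop with no input, streaming data into the agency.
    A real agent would connect to an external real-time source."""
    for txn_id in range(5):
        messages.put({"id": txn_id, "amount": 100.0 * txn_id})
        time.sleep(0.1)
    messages.put(None)  # sentinel: stream finished

def triggered_agent():
    """Runs each time a message arrives from the cyclic agent."""
    while True:
        msg = messages.get()
        if msg is None:
            break
        # A real agent would score the transaction with a fraud model.
        print(f"scoring transaction {msg['id']} (amount {msg['amount']})")

threading.Thread(target=cyclic_agent).start()
triggered_agent()
```

Wiring several triggered scorers to the same stream is what lets the framework run multiple algorithms in parallel.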
The above approach combines the strengths and synergies of both cloud computing and machine learning algorithms, giving a small company or even a startup – one unlikely to have the specialized staff and infrastructure for such a computationally intensive approach – the ability to build a system that makes decisions based on historical transactions.

Creating the Analytical Data Set

For the specific use case of fraud detection for financial transactions, consider the following work that Mu Sigma did with a client in the financial services industry. The data set used to build the application comprised various customer demographic variables and financial information, such as age, residential address, office address, income type, prior shopping history, income, bankruptcy filing status, etc. What’s predicted is a binary variable (whether the transaction is fraudulent or not). In all, about 250 unique variables pertaining to the demographic and financial history of the customers were considered. To reduce the number of variables for modeling, techniques such as random forests were used to understand the significance of the variables and their relative importance. A cutoff was used to select a subset of this variable list that could be used to test a financial transaction as fraudulent or not with an acceptable level of accuracy.
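The article names the technique but not the tooling; a minimal sketch of this variable-screening step, assuming scikit-learn and a pandas DataFrame with already-numeric predictors and a binary fraud label, might look as follows. The file name, column names and 0.01 cutoff are illustrative assumptions.

```python
# Hypothetical sketch: screening ~250 candidate variables with a
# random forest and keeping those above an importance cutoff.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("transactions.csv")          # assumed input file
X, y = df.drop(columns="fraud"), df["fraud"]  # "fraud" is the 0/1 label

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# Rank variables by importance and keep those above the cutoff.
importance = pd.Series(forest.feature_importances_, index=X.columns)
selected = importance[importance > 0.01].sort_values(ascending=False)
print(f"kept {len(selected)} of {X.shape[1]} variables")
```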
Algorithms to Detect Fraud

The analytical data set defined above is analyzed via a combination of techniques such as logistic regression (LR), self-organizing maps (SOM) and support vector machines (SVM). Perhaps the most easily understood of these is logistic regression, which assigns a probabilistic score to each financial transaction for its likelihood of being fraudulent. It does so as a function of the variables identified above as important predictors of fraud. SOMs are fascinating, unsupervised learning algorithms that look for patterns across transactions and then “self-organize” these transactions into fraudulent and not-so-fraudulent segments. As the volume of transactions increases, so does the accuracy of the self-organization of these transactions. Compared to SOMs, SVMs are supervised learning techniques generally used for classifying data, trained, for example, on a data set that includes verified fraudulent and non-fraudulent transactions. The intersection of LR, SOMs and SVMs ensures that the past is studied, the present is analyzed in real time, and learnings from both are fed back into the fraud detection framework to make it better and more accurate over time.
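To make the aggregation step concrete, here is a minimal, hypothetical sketch that averages fraud probabilities from two of the three techniques named above (logistic regression and an SVM); the SOM step is omitted for brevity, and the synthetic data, equal weighting and 0.5 threshold are illustrative assumptions.

```python
# Hypothetical sketch: aggregating two classifiers' fraud scores.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 10))    # stand-in training data
y_train = rng.integers(0, 2, size=500)  # 1 = fraudulent
X_stream = rng.normal(size=(5, 10))     # stand-in "streamed" rows

lr = LogisticRegression().fit(X_train, y_train)
svm = SVC(probability=True).fit(X_train, y_train)

# Average the two probability scores, then apply a decision threshold.
score = (lr.predict_proba(X_stream)[:, 1]
         + svm.predict_proba(X_stream)[:, 1]) / 2
print(score > 0.5)  # True = classified as fraudulent
```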
Figure 1: Accuracy of fraud detection. (The chart plots accuracy, roughly 70 percent to 82 percent, against the ratio of non-fraudulent to fraudulent transactions: 10:1, 5:1 and 4:1.)
Results and Model Validation

In this case, the models were trained on 70 percent of the transaction data, with the remainder streamed to the agency framework discussed above to simulate real-time financial transactions. Under-sampling of the modeling data set was done to bring the ratio of non-fraudulent to fraudulent transactions down to 10:1 (the original ratio was 20:1). The final output of the agency is the classification of the streaming input transactions as fraudulent or not. Since the value of the predicted variable is already known for this data, it helps us gauge the accuracy of the aggregated model, as shown in Figure 1.
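A minimal, hypothetical sketch of that under-sampling step, assuming a pandas DataFrame with a binary fraud column; the 10:1 target comes from the article, while the file and column names are illustrative assumptions.

```python
# Hypothetical sketch: under-sampling legitimate transactions to 10:1.
import pandas as pd

df = pd.read_csv("transactions.csv")  # assumed input file
fraud = df[df["fraud"] == 1]
legit = df[df["fraud"] == 0]

# Keep every fraud case; sample 10 legitimate rows per fraud case
# (assumes the raw data has at least that many), then shuffle.
legit_sample = legit.sample(n=10 * len(fraud), random_state=0)
balanced = pd.concat([fraud, legit_sample]).sample(frac=1, random_state=0)
print(balanced["fraud"].value_counts())
```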
Conclusion

Fraud detection can be improved by running an ensemble of algorithms in parallel and aggregating the predictions in real time. This entire end-to-end application can be designed and deployed in days, depending on the complexity of the data, the variables to be considered and the algorithmic sophistication desired. Deploying it in the cloud makes it horizontally scalable, owing to effective load balancing and hardware maintenance. It also provides higher data security and makes the system fault tolerant by making processes mobile. This combination of a real-time application development system and cloud-based computing enables even non-technical teams to rapidly deploy applications. ❙

Saurabh Tandon is a senior manager with Mu Sigma (http://www.mu-sigma.com/). He has over a decade of experience working in analytics across various domains, including banking, financial services and insurance. Tandon holds an MBA in strategy and finance from the Kellogg School of Management and a master’s in quantitative finance from the Stuart Graduate School of Business.
Corporate Profile
Analytics Operations Engineering, Inc.

Consulting firm applies advanced quantitative methods to solve challenging operations problems.

By Mitchell Burman and Lauren Berk
As you exit an eighth-floor elevator, Boston’s Financial District transforms into what feels like a university research lab: offices adorned with academic textbooks surround young employees receiving scholarly advice from senior mentors. Their efforts to crack the code of a new algorithm resemble the graduate research process. This is neither a university nor a research lab, however. Analytics Operations Engineering (AOE) is a consulting firm passionate about bridging the gap between Ph.D.-level theory and the practical application of advanced analytics. While other firms are just beginning to
realize the importance of data visualization in what they call analytics, AOE has been applying optimization, control theory, stochastic analysis, simulation and data-mining models to business challenges in operations, logistics, and marketing for more than 20 years. AOE’s projects focus on reducing working capital, improving the customer experience and increasing profits for clients. Its consultants work on assignments that range from providing services to direct clients, to supporting projects at large management-consulting firms, to portfolio work for multiple private equity groups in their efforts to cut costs w w w. i n f o r m s . o r g
and improve the balance sheets at their portfolio companies. By conducting sophisticated analyses at the most detailed level, AOE leverages big data to identify profitable improvements with the highest confidence and precision. In addition to delivering insights and analyses, AOE often produces custom-made software solutions for clients to use in their strategic, tactical and operational decision-making processes.

Members of the AOE Consultant and Analyst team, including CEO Tom Svrcek (front row, fourth from right), founder Mitchell Burman (back row, eighth from right) and co-founder Jim Schor (front row, second from right).

Company History

Mitchell Burman and James Schor founded AOE in 1994 while they were graduate students under the guidance of
Dr. Stanley Gershwin at MIT, working together on real-time scheduling projects for manufacturers including Johnson & Johnson, Boeing and Hewlett-Packard. These projects included building a capacity-evaluation model that ran in seconds instead of months for the HP inkjet-printer division. This allowed Burman and his HP associates to identify bottlenecks that were disrupting production and to propose changes to sharply improve production with little added cost. With the demand for printers exploding, their solution resulted in $280 million in increased sales, and the project was a finalist for the 1997 Edelman Prize. This n o v e m b e r / d e c e m b e r 2 014
|
55
Cor porate Profi le
experience led Burman, Schor and Gershwin to create AOE. By selecting top consultants with diverse academic and business backgrounds while rapidly building its clientele, AOE became a company whose achievements led to it being named to Inc. Magazine’s list of America’s fastest-growing companies in 2002. While starting with an emphasis on improving supply chains, leaning production systems and solving routing problems, the firm eventually expanded its focus to other business issues that could be addressed with rigorous quantitative methods. Unlike boutique firms that specialize in specific verticals, AOE consultants apply their techniques to a wide variety of problems and industries. In recent years, AOE has built up its expertise in big data by generating value in areas such as customer segmentation and retention, promotion optimization, cross-selling, pricing strategy and revenue management. Regardless of the application, AOE’s projects remain focused on reducing working capital, improving the customer experience and increasing profits for its clients.

Examples of Success

Data mining for customer segmentation in retail. A big-box retail chain with terabytes of customer-purchase data wanted to know whether its marketing was helping the company’s profitability. AOE consultants spent six weeks analyzing three years’ worth of customer data and found that regular discounts were needlessly shrinking the retailer’s profit margins on items their customers would have purchased anyway.
AOE consultants segmented the customers, defining 12 different behavioral patterns. By looking at customers over time, they also were able to determine that certain types of customers who started with one category of product were most likely to buy a predictable second category of product. AOE proposed that the retailer give customers coupons for specific product categories that they didn’t usually buy. By crafting attractive discounts for new purchases in “next best product” categories, they preserved margin on the products the customer was already buying and increased revenues on products the customer may not have bought otherwise. AOE managed the pilot program and demonstrated that margins improved by 8 percent, resulting in an overall increase in profit.

Custom-made tools for managing manufacturing operations. A semiconductor firm was unable to predict the impact of accepting new orders at a major fabrication plant. It was unclear whether filling a new order would cause bottlenecks at certain machines and delay shipments for other high-priority customers. AOE consultants developed a customized simulation tool for the plant that quickly ran through thousands of production simulations, including set-up times, run times, stochastic downtimes, even strategic options such as buying new machines. The simulations showed the impact of new orders on the deliveries of existing orders. The tool, now regularly used by the factory managers, has boosted throughput while improving customer satisfaction.
Tools for pricing. When a private-equity firm bought a national retail apparel chain with hundreds of mall-based stores, it wanted to determine the optimal discounting strategy to move unsold merchandise during the fashion season. The company had ample data on chain-wide sales of each item, and set new 10 percent or 20 percent discounts every four weeks using a standard industry software package.

AOE consultants modeled the retailer’s sales over time, adjusting for special factors such as holiday/seasonal promotions. The model showed sharply varying patterns for different items. For example, branded T-shirts were highly responsive to price cuts, while socks showed little sensitivity. AOE used the model to study two years of individual store transactions around the country. It became clear that price cuts in tiny suburbs of New York and Washington had a much lower impact than they did in Midwest malls. Based on AOE’s analysis, the chain decided to run smaller, 5 percent markdowns in stores where sensitivity was low and other strategies in areas with greater sensitivities. AOE built a price-optimization model that sets strategy for individual stores and product categories, based on price elasticity and store-specific inventories. The model
and corresponding software allow the company to increase its gross margins by 3 percent to 4 percent compared to the methods it was using previously.

Analytics Operations Engineering Today

AOE recently promoted Tom Svrcek to the leadership role of CEO. It continues to grow, hiring consultants and analysts directly from top academic programs, as well as individuals with strong academic backgrounds and extensive professional experience; most have Ph.D.s in mathematics or operations research and have taught at prestigious institutions including MIT, The Wharton School and Columbia University. They have written multiple books and articles on supply chain, data mining and systems engineering and have several times been finalists for a number of awards in O.R., including the Edelman Prize and the Wagner Prize.

In 2009, AOE began an analyst-cultivation program by recruiting graduates with bachelor’s degrees in mathematics and related fields from schools such as Harvard, MIT, Princeton and Yale. AOE believes part of its mission should be to attract young talent to the field, and to demonstrate that there are interesting and rewarding careers outside of Wall Street and traditional consulting. After
two or three years with the firm, analysts are encouraged to further their education or careers in industry and consulting. Because of the training and experience they received at AOE, the first few graduating classes of analysts have found opportunities at companies including Google, Facebook, Rue La La and Whole Foods, and have been accepted into graduate programs in operations research at U.C. Berkeley, Carnegie Mellon, MIT and NYU. Dimitris Bertsimas, program director of MIT’s Operations Research Center (ORC), said, “We have admitted two AOE analysts this year to the ORC’s Ph.D. program. Given their strong academic credentials and practical modeling experience, we expect them to thrive in our program.”

Within the last year, AOE has also created its own internal incubation laboratory to encourage its people to invest in more broadly commercializing the company’s two decades’ worth of existing intellectual property. Using internal resources, consultants with entrepreneurial drive can build commercial models and tools that leverage the firm’s existing IP and receive additional compensation if the concept is a commercial success. The lab benefits consultants by providing an additional outlet for their intellectual curiosity – with the potential
upside of additional income – while maintaining the security, stability and intellectual challenge of its core business.

Future

As it celebrates its 20th anniversary, AOE continues to thrive while applying the most sophisticated analytic techniques to help businesses improve both their top and bottom lines. In the era of big data, its consultants produce analyses that move far beyond the techniques on which operations research was grounded. Customers have always sought cost-cutting techniques, but they increasingly seek analytic solutions to aid decision-making. Consultants at AOE combine business knowledge, communication skills and unparalleled technical leadership to quickly deliver the tools clients need to succeed in the 21st century. ❙

Mitchell Burman founded Analytics Operations Engineering in 1994 and serves as its president, providing the strategic direction of the company. He received a Ph.D. and a master’s degree in operations from the Massachusetts Institute of Technology. Lauren Berk is an analyst at AOE and holds a bachelor’s degree in mathematics from Yale University. Both are members of INFORMS.
Software Survey
Decision analysis

Decision tools continue to evolve, providing analysts with more horsepower to transform increasingly vast amounts of information into decision advantage.
By William M. Patchak

As the U.S. Men’s National Soccer Team’s dreams for World Cup victory ended this summer in a close loss to Belgium, pundits debated what areas of improvement were most needed for the team’s future success. “They need more possession time with the ball,” one stated. “Yes, but it has to be possession with purpose,” another countered.

In fact, this statement probably rings true for many of today’s decision professionals. After all, the world is full of information. An individual’s access to information continues to increase beyond what few could have ever imagined. Already, there
is great anticipation for wearable digital devices capable of streaming it directly to our wrists or eyeglasses. At the same time, headlines read of fierce debates over the benefits vs. privacy risks of companies collecting it on their customers. Yet for all of the attention paid to devices and methods for acquiring “big data,” so little focus is placed on the most important challenge – the seam between data and the decisions it can enable. How does one go beyond simply “possessing” the data? How does one leverage the vast and varied amount available to provide meaningful decision support? In the last decision analysis survey in 2012,
I touched on how decision analysts have a wide range of software tools available to support their mission “[to evaluate] complex alternatives in the light of uncertainty, value preferences and risk preferences,” as Dennis Buede defined it in the first survey 21 years ago [1]. This year’s list of decision analysis software packages reinforces how decision tools continue to evolve and provide analysts with more and more horsepower to transform the increasingly vast amounts of available information into decision advantage.

The Survey

In terms of its approach and collection methods, this year’s survey did not stray from previous iterations. An online questionnaire was provided to vendors based
Survey directory & data: To view the directory of decision analysis software vendors, software products and survey results, click here.
on previous participation or the staff’s knowledge of a new product. Vendors who did not receive the original questionnaire can still provide their software’s information to the survey results online by contacting Patton McGinley (patton@lionhrtpub.com). Just as in 2012 and previous years, this publication provides the vendor responses verbatim and does not intend for the results to imply quality or cost effectiveness. Rather, the list serves to raise awareness of the variety of tools available.
|
63
de cis io n a na lys i s
In all, this year’s survey features 38 software packages from a total of 21 vendors, with some vendors listing multiple tools or multiple versions of the same tool. Eleven vendors from 2012 did not participate this year, but seven new vendors have joined the response list. And while some software packages are listed for the first time, many from the 2012 survey have returned, albeit with some new features.

2014 Results

As was the case with previous editions of the survey, this year’s results (see sidebar) reflect a diverse group of vendors and prices. Along with the United States, companies from the United Kingdom, Sweden, Belgium, Finland and Canada are represented. Meanwhile, prices for the software packages range from under $20 to several thousand dollars, depending on the type of license and the nature of the package. And as in previous survey editions, use examples range throughout commercial and government industries, including energy, finance, healthcare and defense.

Focusing on updated features from 2012, many vendors report improvements to user interfaces in addition to new technical functions such as additional probability distributions and
interfaces with Microsoft products (e.g., Excel). Regarding a topic highlighted in the 2012 introductory article, this year’s list features six new Web implementations: three as new features from returning packages and three from packages submitting to the survey for the first time. While not all software tools will (or should) offer Web implementations, the change is worth noting because it may indicate a trend likely to continue in the future.

In both 2010 and 2012, the topics of “built-in coaching” and classroom vs. online training were discussed. This year’s proportion of software packages offering online training increased by 10 percentage points (from 45 percent to 55 percent), with a corresponding 13-percentage-point decrease in classroom training. While some of this change is due to certain vendors not returning from 2012, several packages now claim to offer online training for the first time. Indeed, the decision analysis community may soon find the will and capability to provide what Don Buckshaw wondered would be possible in his analysis of the 2010 survey: built-in coaching that allows “a novice [to be] confident that their models are producing sensible results.” [2] Beyond such noticeable swings as training options, the small number of entries and changing group of respondents
over the years make it impossible to perform any kind of reliable statistical analysis on the data set. More importantly, to attempt such would be bad science in a survey and article highlighting the need for good science. However, certain high-level observations can be made of this year’s respondent group compared to their predecessors. While the total number of software packages did decrease, there were several features that now make up a larger percentage of the
respondent pool than before. They include multiple stakeholder collaboration (71 percent in 2014), risk preference (66 percent in 2014) and selecting a best option using multiple competing objectives (89 percent in 2014). These three attributes are worth flagging because, being associated largely with the “soft skills� of eliciting and prioritizing objectives and risk preferences, they can be overlooked in a world where data itself is often viewed as the end solution.
Bridging the Gap between Data and Decisions

Indeed, these three decision support features – stakeholder group collaboration, risk preferences and multiple objective analysis – are just some of the techniques that help to bridge the gap between information (i.e., data) and informed decision-making. A quick survey of recent literature indicates that the need to do so is readily apparent. According to research by the Economist Intelligence Unit (EIU) and PricewaterhouseCoopers (PWC), "experience and intuition, and data and analysis, are not mutually exclusive. The challenge for business is how best to marry the two…even the largest data set cannot be relied upon to make an effective big decision without human involvement." [3] The same study also found that executives were skeptical of how data and analytics can assist big decisions, especially with regard to emerging markets. In fact, these "big decisions" (i.e., more strategic level problems) are where decision-makers themselves are often unclear of their risk preferences and where data insights alone may not lead to a clear choice of alternatives for meeting their objectives. As opposed to operational level decisions that can be informed more directly by descriptive types of data analytics, these strategic
problems often require decision professionals to meet their customers halfway between the data and the decision. They require an analyst not only to accurately interpret the data available, but also to demonstrate how it can illuminate a customer's understanding of their own preferences and objectives, which until that point were not readily apparent. So what is the decision analysis community to do in the challenging environment of strategic decision-making? This survey's list of software products provides a great starting point. Ultimately, however, software cannot do it alone – decision professionals bear the final responsibility. As strategic consultant Dhiraj Rajaram explained in his October 2013 article in Analytics magazine: "Leveraging data effectively to enable better decisions requires more than just data sciences... In the real world, however, not all business problems are clearly defined. Many of these problems start off muddy. To help solve them, one needs to understand and appreciate the business context. It requires an interdisciplinary approach consisting of several different skills: business, applied math, technology and behavioral sciences." [4]
Rajaram outlines the scope of skills that decision professionals must draw upon: knowing how to use available software tools, understanding a customer's industry and preferences, and interacting with the customers themselves. By bringing this vast and varied array of attributes to the problem, analysts can provide true added value in helping customers find new ways to leverage available data.

Until Next Time

Decision-makers today continue to face a range of decision types: from operational to strategic, from evidence-driven to those that require the marriage of evidence with possibilities-based analysis. As in previous years, the software packages listed in the 2014 survey largely reflect this range in uses and offer decision professionals a true spectrum of toolsets to provide their clients with decision advantage. Both where the analysis of data itself can lead to decision insight and where input from decision-makers themselves must play a role, software continues to evolve to meet the needs of decision professionals. Already there are indications that data-driven decision support has made inroads, albeit slowly. According to Harvard Business Review in December 2013, "those that consistently use data to guide their decision making are few and far
between. The exceptions, companies… [with] a culture of evidence-based decision making." [5] The challenge remains for decision professionals to expand the culture of evidence-based decision-making to more strategic applications, and to help bring the mind-sets and preferences of key decision-makers to meet the data. Where hard problems persist, this year's list of software packages provides a sample of tools available to help decision professionals bridge that continuing gap between data and insightful decisions.

William M. Patchak (bill@analyticsolutionsgroup.net) is an analyst with Analytic Solutions Group, a management consulting firm specializing in data analytics and visualization, systems architecture and systems engineering, decision analysis and operations research, and modeling and simulation. He is a member of INFORMS.

REFERENCES
1. Buede, Dennis, "Decision Analysis Software: Aiding the Development of Insight," OR/MS Today, April 1993.
2. Buckshaw, Don, "Decision Analysis Software Survey," OR/MS Today, October 2010.
3. Economist Intelligence Unit and PricewaterhouseCoopers, "Gut & gigabytes: Capitalising on the art & science in decision-making," PricewaterhouseCoopers, September 2014.
4. Rajaram, Dhiraj, "Why some data scientists should really be called decision scientists," Analytics magazine, October 2013.
5. Ross, Jeanne W., Cynthia M. Beath, and Anne Quaadgras, "You May Not Need Big Data After All," Harvard Business Review, December 2013.
Conference Preview
WSC 2014: Exploring big data through simulation

WSC has been the premier international forum for disseminating recent advances in the field of system simulation for more than 40 years.
By Stephen J. Buckley
The Winter Simulation Conference (WSC) has been the premier international forum for disseminating recent advances in the field of system simulation for more than 40 years, with the principal focus being discrete-event simulation and combined discrete-continuous simulation. In addition to a technical program of unsurpassed scope and high quality, WSC provides the central meeting place for simulation researchers, practitioners and vendors working in all disciplines and in industrial, governmental, military, service and academic sectors. WSC 2014 will be held Dec. 7-10 in Savannah, Ga., at the Westin Savannah Harbor Golf Resort & Spa and the adjacent Savannah International Trade & Convention Center. The appeal of simulation is its relevance to a diverse range of interests. WSC has always reflected this diversity, and WSC 2014 aligns with and expands upon this tradition. For those more inclined to the academic aspects of simulation, the conference offers tracks in modeling methodology, analysis methodology, simulation-based optimization, hybrid simulation and agent-based simulation. For those more inclined to the application of simulation, tracks include healthcare, manufacturing, logistics and supply chain
management, military applications, business process modeling, project management and construction, homeland security and emergency response, environmental and sustainability applications, and networks and communications. The Modeling and Analysis of Semiconductor Manufacturing (MASM) is a conference-within-a-conference featuring a series of sessions focused on the semiconductor field. The Industrial Case Studies track affords industrial practitioners the opportunity to present their best practices to the simulation community. The Simulation Education track presents approaches to teaching simulation at education levels ranging from K-12 to graduate and professional workforce levels. Finally, WSC provides a comprehensive suite of introductory and advanced tutorials presented by prominent individuals in the field, along with a lively poster session, Ph.D. colloquium, a new attendee orientation and a distinguished speaker lunchtime program. The theme for WSC 2014, "Exploring Big Data Through Simulation," is timely and relevant. The explosion of data throughout the world has created both opportunities and challenges to business and technical communities. In this conference, presenters will discuss how simulation can help. In addition to special tracks on big data simulation
and decision-making and scientific applications, conference keynote speaker Robert Roser, head of scientific computing at Fermi National Accelerator Laboratory in Batavia, Ill., and one of the world's leading experts on experimental particle physics, will speak about the recently discovered Higgs boson particle and the role of simulation in the discovery. The military keynote speaker is Greg Tackett, director of the Ballistic Missile Defense Evaluation Directorate (BMDED) and the Ballistic Missile Defense System Operational Test Agency (BMDS OTA), U.S. Army Test and Evaluation Command, Redstone Arsenal, Alabama. The WSC is designed for professionals at all levels of experience across broad ranges of interest. The extensive cadre of exhibitors and vendor presentations, the meetings of various professional societies and user groups, along with the various social gatherings, give all attendees the opportunity to get acquainted with each other and to become involved in the ever-expanding activities of the international simulation community. ❙

Stephen J. Buckley is a research staff member at the IBM Thomas J. Watson Research Center in Yorktown Heights, N.Y., and general chair of WSC 2014. He is a member of the INFORMS Simulation Society.
Five-Minute Analyst
Bicycle counters

Bicycle counter in Rosslyn, Va., along the Arlington Loop. This sign is one of several bike counters installed by Arlington to monitor bicycle traffic.
By Harrison Schramm, CAP
As longtime readers and friends know, I like bicycles almost as much as I like analysis, and I frequently think about both on a long ride. I cycle for fitness, for fun and for transportation – I have been a bicycle commuter for almost 15 years now. For these reasons, I was overjoyed to discover that the city of Arlington, Va., has installed pedestrian/bicycle counters along the 'Arlington Loop' [1] and, even better, the data is freely available on the Internet [2]. The good folks at Bike Arlington have already done some very nice analyses of the data on their website, and it is really cool that they are using trail usage data to determine how to invest in future trails. The Arlington data set is particularly nice because it includes daily weather information in the same portal. As with most analytic tasks, the hard part is not the analysis itself, but rather importing and cleaning the data. I used MS Excel 2013 to pull the data from the Web via XML, and with minimal cleaning, the data was ready for analysis.
Figure 1: Daily bicycle transits by date. This graph shows a nice trend of more cycling in the summer months and less cycling in the winter. There are two outliers: June 1 and Sept. 8.
The 2012 data was "cleaner" than the 2013 data, so that is what is used here. Unlike many data sets, this one also includes historical weather information. While that doesn't sound like a big deal, it makes analysis much easier. It is natural to ask if the weather, as measured by daily average outside air temperature, has an effect on cyclists.
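The author did this analysis in Excel; for readers who prefer a scriptable route, a minimal Python sketch of the same temperature regression follows. It assumes the counter and weather data have been exported to a CSV with columns date, transits and avg_temp_f (hypothetical names; the Bike Arlington portal's actual field names may differ).

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical export of the Bike Arlington data: one row per day.
df = pd.read_csv("arlington_counts_2012.csv", parse_dates=["date"])

# Ordinary least squares: daily transits as a function of average temperature.
X = sm.add_constant(df["avg_temp_f"])   # intercept plus temperature
model = sm.OLS(df["transits"], X).fit()

# The slope estimates how many additional riders each extra degree brings.
print(model.params)    # const, avg_temp_f
print(model.pvalues)
```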
We can also use this data to think about trail utilization during the week as opposed to the weekend. This is interesting because in major cities, bicycle trails are not just for recreation but are also used by a large number of commuters for work. Here, the "WEEKDAY()" function in Excel was handy for identifying weekdays vs. weekends. We have chosen to compare the behavior of cyclists during "winter" (January/February) and "summer" (June/July).
Figure 2: Daily bicycle transits as a function of temperature. Each rise in daily average temperature of 1 degree Fahrenheit translates to approximately 10 additional riders (regression p-value effectively zero, p < 0.001).
Figure 3: Boxplots of winter and summer daily transits, weekday vs. weekend (panels: Winter Weekday, Winter Weekend, Summer Weekday, Summer Weekend; vertical axis: daily transits, 0 to 1,500). Winter days have, on average, fewer transits on weekends than weekdays. In the summer, this trend is reversed. The winter weekday and weekend behavior is similar (p = .66), but one could argue that the summer weekend vs. weekday behavior is different (p = .056).
Figure 3 may suggest that commuters are the major contributors to trail usage in the winter, and "sport" riders are the major contributors in the summer months. Conversely, it may be that the winter riders have made the investment in proper winter "kit" because they have to, and use the same kit on the weekends to ride. In conclusion, this is a rich data set; our analysis here has only scratched the surface, and we hope that some of you will take an interest in it as well.
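For readers who want to reproduce the Figure 3 comparison themselves, here is a rough sketch reusing the hypothetical CSV from the regression sketch above. Boxplots like these are easy to build in R (the author's choice, as noted below); this version uses Python and matplotlib instead.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("arlington_counts_2012.csv", parse_dates=["date"])

# Weekend flag (Saturday/Sunday) and the two seasons used in the article.
df["weekend"] = df["date"].dt.dayofweek >= 5
df["season"] = df["date"].dt.month.map(
    lambda m: "winter" if m in (1, 2) else ("summer" if m in (6, 7) else None)
)
subset = df.dropna(subset=["season"])

# Four groups, mirroring Figure 3.
groups = [
    subset[(subset["season"] == s) & (subset["weekend"] == w)]["transits"]
    for s in ("winter", "summer")
    for w in (False, True)
]
plt.boxplot(groups, labels=["Winter Wkday", "Winter Wkend",
                            "Summer Wkday", "Summer Wkend"])
plt.ylabel("Daily transits")
plt.show()
```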
A note on software: Longtime readers will note that I sometimes use Excel, sometimes use R and sometimes use both in the same article. While both have their strengths, I found parsing XML data to be easier in Excel, and boxplots to be easier to build in R. ❙

Harrison Schramm (harrison.schramm@gmail.com) is an operations research professional in the Washington, D.C., area. He is a member of INFORMS and a Certified Analytics Professional (CAP).

NOTES & REFERENCES
1. For a map see: http://www.bikearlington.com/tasks/sites/bike/assets/File/Arlington-Loop.jpg.
2. See http://www.bikearlington.com/pages/biking-in-arlington/counting-bikes-to-plan-for-bikes/data-for-developers/. From here, you can create an XML query and pull the data into your favorite analysis package.
For information on classifieds, contact Patrick Yario at p.yario@jobtarget.com.
Classifieds

Department of Industrial and Systems Engineering, University of Minnesota
Faculty Opening

The Department of Industrial and Systems Engineering at the University of Minnesota invites applications for a tenured or tenure-track faculty position starting in Fall 2015. Applicants at all ranks will be considered. We seek candidates with a strong methodological foundation in Operations Research and Industrial Engineering, and a demonstrated interest in applications including, but not limited to: business analytics, energy and the environment, healthcare and medical applications, transportation and logistics, supply chain management, financial engineering, service operations, quality and reliability. Applicants should also have a strong commitment to teaching, to mentoring graduate students, and to developing and maintaining an active program of sponsored research. Applicants must hold a Ph.D., or expect to complete their degree before Fall 2015, in Industrial Engineering, Operations Research, Operations Management or a closely related discipline. Senior applicants should have an outstanding track record of research and teaching accomplishments. The University of Minnesota is located in the heart of the vibrant Minneapolis-St. Paul metropolitan area, which is consistently rated as one of America's best places to live and is home to many leading companies. The Department of Industrial and Systems Engineering is the newest department within the College of Science and Engineering at the University of Minnesota and is growing rapidly. Additional information about the department can be found at www.isye.umn.edu. Applicants are encouraged to apply by November 15, 2014. Review of applications will begin immediately and will continue until the position is filled. Applicants interested in meeting with current Industrial and Systems Engineering faculty members at the 2014 INFORMS Conference in San Francisco should apply by October 19, 2014. Additional information and application instructions can be found at www.isye.umn.edu. Candidates may contact the chair of the search committee at isyesrch@umn.edu. The University of Minnesota is an equal opportunity educator and employer.
The University of Michigan
Department of Industrial and Operations Engineering
Faculty Positions

The Department of Industrial and Operations Engineering at the University of Michigan invites applications and nominations for faculty positions beginning September 2015.
We seek outstanding candidates for faculty positions in all areas of operations research -- methodological and applied -- with a particular focus on (1) analytics, and (2) risk management. Individuals at all ranks are encouraged to apply.
Candidates must have a Ph.D. and must demonstrate a strong commitment to high-quality research and evidence of teaching potential. Candidates for Associate or Full Professor should have a commensurate record of research publications and are expected to provide organizational and research leadership, develop sources of external funding, build relationships with industry, and interact with faculty colleagues. Candidates should provide (i) a current C.V., (ii) a list of references, and one-page summary statements describing: (iii) career teaching plans; (iv) research plans, and (v) course (teaching) evaluations for candidates with prior teaching experience. Candidates should have their references send recommendations to us directly at IOEFacultySearch@umich.edu. The deadline for ensuring full consideration of an application is October 31, 2014, but the positions will remain open and applications may still be considered, at the discretion of the hiring committee, until appointments are made. We seek candidates who will provide inspiration and leadership in research and actively contribute to teaching. We are especially interested in candidates who can contribute, through their research, teaching and/or service, to the diversity and excellence of the academic community. The University of Michigan is responsive to the needs of dual career families. Please submit your application to the following:
Web: http://ioe.engin.umich.edu/people/fac/fac_search/
If you have any questions regarding the web application submittal process or other inquiries, please contact Gwendolyn Brown at gjbrown@umich.edu or (734) 763-1332. The University of Michigan is a non-discriminatory, affirmative action employer.
Thinking Analytically
Figure 1: Four fighters enter the fray; only one can win. Who is it?
Fighters!
By John Toczek

John Toczek is the senior director of Decision Support and Analytics for ARAMARK Corporation in the Global Operational Excellence group. He earned a bachelor of science degree in chemical engineering at Drexel University (1996) and a master's degree in operations research from Virginia Commonwealth University (2005). He is a member of INFORMS.
Four different fighters are having an all-out battle to determine who among them is the strongest. Figure 1 shows the four fighters: Allan, Barry, Charles and Dan. Each fighter has varying attack and health abilities. At the start of the battle, they have differing health points: Allan has 10, Barry has 12, Charles has 16 and Dan has 18. Also, each fighter has differing attack points: Allan has four, Barry has three, Charles has two and Dan has one. The battle takes place over multiple rounds, each round consisting of a single attack. In each round, one random attacker and one random defender are chosen. When the attacker attacks a defender, the defender loses health points in the amount equivalent to the attacker's attack points. For example, if Allan is the attacker and Barry is the defender, Barry would lose four health points. The fighters continue to randomly attack and defend in subsequent rounds until there is only one fighter left, who is then declared the winner. A fighter is removed from the battle when his health points reach zero (or less).

Question: Which fighter is most likely to win the battle? Send your answer to puzzlor@gmail.com by Jan. 15, 2015. The winner, chosen randomly from correct answers, will receive a $25 Amazon Gift Card. Past questions can be found at puzzlor.com. ❙
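The battle mechanics lend themselves to quick Monte Carlo experimentation. The sketch below is one possible reading of the rules (it assumes the attacker is drawn uniformly at random from the surviving fighters and the defender uniformly from the remaining survivors, details the puzzle statement leaves implicit); running it estimates each fighter's win probability, and we leave the actual answer to the reader.

```python
import random
from collections import Counter

# Fighter stats from the puzzle: name -> (starting health, attack points)
FIGHTERS = {"Allan": (10, 4), "Barry": (12, 3), "Charles": (16, 2), "Dan": (18, 1)}

def battle(rng):
    """Simulate one battle and return the winner's name."""
    health = {name: hp for name, (hp, _) in FIGHTERS.items()}
    while len(health) > 1:
        attacker = rng.choice(sorted(health))
        defender = rng.choice(sorted(n for n in health if n != attacker))
        health[defender] -= FIGHTERS[attacker][1]   # lose the attacker's attack points
        if health[defender] <= 0:
            del health[defender]                    # removed at zero (or less) health
    return next(iter(health))

rng = random.Random(2014)
trials = 100_000
wins = Counter(battle(rng) for _ in range(trials))
for name, count in wins.most_common():
    print(f"{name}: {count / trials:.3f}")
```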
GENERAL ALGEBRAIC MODELING SYSTEM
High-Level Modeling

The General Algebraic Modeling System (GAMS) is a high-level modeling system for mathematical programming problems. GAMS is tailored for complex, large-scale modeling applications, and allows you to build large maintainable models that can be adapted quickly to new situations. Models are fully portable from one computer platform to another. The GAMS Integrated Development Environment supports editing, debugging, solving models and viewing data.
State-of-the-Art Solvers

GAMS incorporates all major commercial and academic state-of-the-art solution technologies for a broad range of problem types.
Tommasino-Rao Input Output Balance Software (TRIOBAL)

TRIOBAL is an easy-to-use tool for didactic purposes that combines Microsoft Excel and GAMS to implement an iterative procedure for supply and demand balancing (the RAS method) introduced by Richard Stone. It was developed at the Italian National Agency for New Technologies, Energy and Sustainable Economic Development (ENEA) and teaches students how to work with the input-output matrix of a country. The tool was tested using Italian economic data and can be applied to data from other countries as well. The model and the Excel interface are included in the GAMS model library as part of every GAMS distribution (data utilities models, triobal). This open access to the model makes it easy to experiment with the
application, and even to extend it (e.g. measuring economic activity through a Social Accounting Matrix).
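At its core, the RAS procedure is simply alternating row and column scaling until the matrix reproduces the target margins. The sketch below is a generic Python illustration of that loop, not TRIOBAL's own GAMS code, and the numbers are invented stand-ins for real input-output data.

```python
import numpy as np

def ras(a0, u, v, tol=1e-9, max_iter=1000):
    """Scale matrix a0 so its row sums match u and its column sums match v."""
    a = a0.astype(float).copy()
    for _ in range(max_iter):
        a *= (u / a.sum(axis=1))[:, None]   # row scaling step ("R")
        a *= (v / a.sum(axis=0))[None, :]   # column scaling step ("S")
        if np.abs(a.sum(axis=1) - u).max() < tol:
            return a
    raise RuntimeError("RAS did not converge")

# Invented 2x2 example: base-year flows and hypothetical update-year margins.
a0 = np.array([[10.0, 5.0],
               [4.0, 6.0]])
u = np.array([18.0, 12.0])   # target row sums
v = np.array([16.0, 14.0])   # target column sums
print(ras(a0, u, v).round(3))
```

The targets must be consistent (u and v summing to the same total) for the loop to converge; TRIOBAL's value is in wrapping this kind of procedure with the Excel data preparation and GAMS bookkeeping described below.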
The TRIOBAL workflow proceeds in three stages. First, the user prepares the inputs: the technical coefficient matrix at the base year (coe.xlsx), the flows matrix at the update year (flows.xlsx) and the production vector at the update year (production.xlsx). Next, each .xlsx file is read and converted to a corresponding .gdx file. Finally, GAMS calculates u, v and A1, runs the RAS procedure and stores the results in an .xlsx file (results.xlsx).
For more information about this application, please contact Marco.Rao@enea.it or Cristina.Tommasino@enea.it.
sales@gams.com
www.gams.com