Solver International - July 2017


JULY 2017 WWW.SOLVER-INTERNATIONAL.COM

TUTORIAL: Monte Carlo Simulation
CASE STUDY: Transportation
TECHNOLOGY: Analytics in the Cloud


Solver International

Educating and Empowering the Business Analyst

Solver-International.com provides immediate information, exclusive articles, and updated news for businesses and academics. The website also offers daily and weekly access for everyone seeking the latest tools to improve their analytic capabilities.


Welcome to Analytic Solver® Cloud-based Data and Text Mining that Integrates with Excel

Everything in Predictive and Prescriptive Analytics Everywhere You Want, from Concept to Deployment. The Analytic Solver® suite makes powerful forecasting, data mining and text mining software available in your web browser (cloud-based software as a service), and in Microsoft Excel. And you can easily create models in our RASON® language for server, web and mobile apps.

Full-Power Data Mining and Predictive Analytics. It’s all point-and-click: Text mining, latent semantic analysis, feature selection, principal components and clustering; exponential smoothing and ARIMA for forecasting; multiple regression, logistic regression, k-nearest neighbors, discriminant analysis, naïve Bayes, and ensembles of trees and neural networks for prediction; and association rules for affinity analysis.

Simulation/Risk Analysis, Powerful Optimization. Analytic Solver is also a full-power, point-and-click tool for Monte Carlo simulation and risk analysis, with 50 distributions, 50 statistics and risk measures, rank-order and copula correlation, distribution fitting, and charts and graphs. And it has full-power, point-and-click optimization, with large-scale linear and mixed-integer programming, nonlinear and simulation optimization, stochastic programming and robust optimization.

Find Out More, Start Your Free Trial Now. In your browser, in Excel, or in Visual Studio, Analytic Solver comes with everything you need: Wizards, Help, User Guides, 90 examples, even online training courses. Visit www.solver.com to learn more or ask questions, and visit analyticsolver.com to register and start a free trial – in the cloud, on your desktop, or both!

Tel 775 831 0300 • Fax 775 831 0314 • info@solver.com


July 2017 • Volume 1, Number 1 • www.solver-international.com

CONTENTS

Cover Story

40 Analytics in the Cloud
“Analytics in the Cloud” is a hot topic these days in news articles, blog posts, and vendor webinars. But what does this mean for you? What exactly is “the Cloud,” anyway? What are its advantages and drawbacks for analytic modelers?

Tutorial

14 Risk Analysis and Monte Carlo Simulation
Risk analysis is the systematic study of uncertainties and risks, while Monte Carlo simulation is a powerful quantitative tool often used in risk analysis. By Dan Fylstra

Executive Interview

28 Ellie Fields—Tableau Software
How can a company engage people with data, beyond simply doing analytics—reaching people who need to use data in their daily life but don’t need to author a dashboard?

Case Study

22 Moving Vehicles and Bids Efficiently
Finding the best supplier, at the right price, is a common problem in business. Using the best analytics to get a solution is no problem.

Case Study

36 Excel: Big Data Tool?
Can you work with Big Data in Excel? From the barrage of recent news, white papers, and sales calls about Big Data, you would think not. Better think again.

COLUMNS & DEPARTMENTS
4 Off the Top
6 Impact Analytix
10 Guest Columnist
48 INFORMS Society News
50 Glossary


Off the Top

Tom Inglesby, editor tom@solver-international.com

Big Enough Big Data

In 2014—ages ago in internet time—Hugh J. Watson of the Department of MIS at the University of Georgia explained, “From an evolutionary perspective, big data is not new. A major reason for creating data warehouses in the 1990s was to store large amounts of data. Back then, a terabyte was considered big data.” In fact, Teradata, a leading data warehousing vendor, has more than 35 customers, such as Walmart and Verizon, with data warehouses over a petabyte in size. Even four years ago, eBay captured a terabyte of data per minute and maintained over 40 petabytes, the most of any company in the world.

In 2012, President Obama’s campaign was seeking ways to overcome the Democrats’ serious losses in the 2010 mid-term election. One approach was to analyze voters’ data, big data, in order to find where the Obama Coalition from 2008 had gone. Dan Wagner was hired as the “targeting director” for the Democratic National Committee (DNC) in January of 2009, and he became responsible for collecting voter information and analyzing it to help the committee approach individual voters by direct mail and phone. According to Sasha Issenberg, writing in MIT Technology Review in 2012, Wagner appreciated that the raw material he was feeding into his statistical models amounted to a series of surveys on voters’ attitudes and preferences. He asked the DNC’s technology department to develop software that could turn that information into tables, and he called the result Survey Manager.

Issenberg reports that, by the 2012 election, Chris Wegrzyn, a database applications developer, had become the DNC’s lead targeting developer and oversaw a series of acquisitions, all intended to free the party from the traditional dependence on outside vendors. The committee installed a Siemens Enterprise System phone-dialing unit that could put out 1.2 million calls a day to survey voters’ opinions. Later, party leaders signed off on a $280,000 license to use Vertica software from Hewlett-Packard to allow their servers to access not only the party’s 180-million-person voter file but all the data about volunteers, donors, and those who had interacted with Obama online. That wasn’t the beginning of Big Data in elections, but it turned out to be the defining moment for this generation.

By 2016, Big Data had become a common term, used by millions and with twice as many definitions. However, Jason Shultz wrote for Surefire Data Solutions, “… big data should not be a scapegoat to hold any electoral woes or aggressions. Data is a tool, and it needs to be used properly. In the future, data of all sizes needs to be looked at more objectively, to give citizens a clear view of the possibilities. We all also need to remember that ‘data is not a substitute for innovation.’ Winning takes not only the right tools, but strategy.”

And next up: the 2018 mid-terms! Si

Comments are welcomed at www.solver-international.com or by e-mail.


SOLVER INTERNATIONAL DIGITAL MAGAZINE A JOINT VENTURE BETWEEN LIONHEART PUBLISHING, INC. AND FRONTLINE SYSTEMS, INC.

SOLVER INTERNATIONAL ADVERTISING AND EDITORIAL OFFICE Send all advertising submissions for Solver International to: Lionheart Publishing Inc., 1635 Old 41 Hwy., Suite 112-361, Kennesaw, GA 30152 USA Tel.: 888.303.5639 • Fax: 770.432.6969 Email: lpi@lionhrtpub.com URL: www.lionheartpub.com

PRESIDENT John Llewellyn, ext. 209 llewellyn@lionhrtpub.com Direct: 404.918.3275

EDITOR Tom Inglesby tom@solver-international.com Direct: 760.529.9437

NEWS SUBMISSIONS editor@solver-international.com

ART DIRECTOR Alan Brubaker, ext. 218 albrubaker@lionhrtpub.com

ONLINE PROJECTS MANAGER Patton McGinley, ext. 214 patton@lionhrtpub.com

ASSISTANT ONLINE PROJECTS MANAGER Leslie Proctor, ext. 228 leslie@lionhrtpub.com

ADVERTISING SALES MANAGERS Sharon Baker sharon@lionhrtpub.com Direct: 813.852.9942 Aileen Kronke aileen@lionhrtpub.com Direct: 678.293.5201

REPRINTS & SUBSCRIPTIONS Kelly Millwood, ext. 215 kelly@lionhrtpub.com

FRONTLINE SYSTEMS, INC. P. O. Box 4288, Incline Village, NV 89450 www.solver.com

Solver International is published bimonthly by Lionheart Publishing, Inc. in cooperation with Frontline Systems, Inc. Deadlines for contributions: Manuscripts and news items should arrive no later than three weeks prior to the first day of the month of publication. Address correspondence to: Editor, Solver International, Lionheart Publishing, Inc., 1635 Old 41 Hwy., Suite 112-361, Kennesaw, GA 30152. The opinions expressed in Solver International are those of the authors, and do not necessarily reflect the opinions of Lionheart Publishing Inc., Frontline Systems, Inc. or the editorial staff of Solver International. All rights reserved.



Welcome to Analytic Solver® Cloud-based Simulation Modeling that Integrates with Excel

Everything in Predictive and Prescriptive Analytics Everywhere You Want, from Concept to Deployment. The Analytic Solver® suite makes the fastest Monte Carlo simulation and risk analysis software available in your web browser (cloud-based software as a service), and in Microsoft Excel. And you can easily create models in our RASON® language for server, web and mobile apps.

Comprehensive Risk and Decision Analysis Tools. Use a point-and-click Distribution Wizard, 50 probability distributions, automatic distribution fitting, compound distributions, rank-order correlation and three types of copulas; 50 statistics, risk measures and Six Sigma functions; easy multiple parameterized simulations, decision trees, and a wide array of charts and graphs.

Optimization, Forecasting, Data and Text Mining. Analytic Solver is also a full-power, point-and-click tool for conventional and stochastic optimization, with powerful linear and mixed-integer programming, nonlinear optimization, simulation optimization, stochastic programming and robust optimization. And it’s a full-power tool for forecasting, data mining and text mining, from time series methods to classification and regression trees, neural networks and more, with access to SQL databases and Spark Big Data clusters.

Find Out More, Start Your Free Trial Now. In your browser, in Excel, or in Visual Studio, Analytic Solver comes with everything you need: Wizards, Help, User Guides, 90 examples, even online training courses. Visit www.solver.com to learn more or ask questions, and visit analyticsolver.com to register and start a free trial – in the cloud, on your desktop, or both!

Tel 775 831 0300 • Fax 775 831 0314 • info@solver.com


Impact Analytix

Jen Underwood

A New World of Data

We live in an incredible era of extremely rapid, disruptive innovation. Globalization, accelerated technology change, infinite cloud scale, ubiquitous connectivity, and an internet of smart things powered by artificial intelligence are enabling a fourth industrial revolution — the digital transformation. Successful organizations are masters of data. A culture of analytics permeates today’s most advanced companies. To fully create a culture of analytics, an organization must bring together its two greatest assets: its people and its data.



Thus, if you can extract intelligence from massive volumes of data, you should enjoy unprecedented levels of opportunity in the realm of digital transformation. Here are a few key technologies to understand as we modernize analytics for a new world of data.

INTERNET OF THINGS (IOT)
Digital transformation is fueling the growing maturity and affordability of edge technologies that can communicate with the internet. According to Gartner estimates, by 2020 there will be more than 26 billion connected devices. From vehicles, appliances, machines, cellphones, and wearable devices to just about anything else you can think of, intelligent things powered by data will compute, communicate, sense, and respond. As more business processes and decisions get automated, scalable data storage, secure digital data lifecycle management, solid metadata management, and enhanced data quality procedures will rise in importance for efficiently sharing or monetizing data. IoT drives demand for big data analytics to uncover hidden patterns, unknown correlations, and other useful information.



In big data analytics, advanced analytical techniques such as deep learning are used with large, diverse data sets of structured, unstructured, and streaming data ranging from terabytes to zettabytes. Unstructured data sources do not fit in traditional data warehouses. Thus, a new ecosystem of big data analytics technologies has been developed to ingest, process, store, and analyze unpredictable volumes, velocities, and varieties of data.

HADOOP FOR BIG DATA ANALYTICS
Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power, and the ability to handle extreme volumes of concurrent tasks or jobs. In a modern analytics architecture, Hadoop provides low-cost storage and data archival for offloading old historical data from the data warehouse into online cold storage. It is also used for IoT, data science, and unstructured analytics use cases. Within the Hadoop framework, there is a plethora of related technologies for loading, organizing, and querying data. Here is a list of the most popular ones today:
• Apache Spark – open-source cluster computing framework with highly performant in-memory analytics and a growing number of related projects
• Apache Kafka – a distributed streaming platform for building real-time data pipelines and streaming apps
• MapReduce – a parallel processing software framework that takes inputs, partitions them into smaller problems, and distributes them to worker nodes
• Hive – a data warehousing and SQL-like query language
• Hadoop Distributed File System (HDFS) – the scalable system that stores data across multiple machines without prior organization
• YARN (Yet Another Resource Negotiator) – provides resource management for the processes running on Hadoop
• Ambari – a web interface for managing Hadoop services and components
• Cassandra – a distributed database system
• Flume – software for streaming data into HDFS
• HBase – a non-relational, distributed database that runs on top of Hadoop
• HCatalog – a table and storage management layer
• Oozie – a Hadoop job scheduler



• Pig – a platform for manipulating data stored in HDFS
• Solr – a scalable search tool
• Sqoop – moves data between Hadoop and relational databases
• Zookeeper – an application that coordinates distributed processing

Notably, over the past two years Apache Spark has moved from being a component of the Hadoop ecosystem to a big data analytics platform of choice. It is growing faster than Hadoop. Spark provides dramatically increased data processing speed compared to Hadoop. Spark itself has many related projects, including the core Apache Spark runtime, Spark SQL, Spark Streaming, MLlib, ML, and GraphX. It is now the largest big data open source project, with 1,000-plus contributors from more than 250 organizations.
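To give a feel for why analysts gravitate to Spark, here is a minimal sketch using PySpark, Spark’s Python API. The file name, column names, and aggregation below are invented for illustration; the point is simply that a SQL-like rollup is expressed in a few lines and executed in parallel across a cluster.

```python
# Minimal PySpark sketch: read a CSV and aggregate it in parallel.
# The file name "sales.csv" and its columns are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sales-rollup").getOrCreate()

# Spark reads the file in partitions and spreads the work across executors.
sales = spark.read.csv("sales.csv", header=True, inferSchema=True)

# A SQL-like aggregation expressed through the DataFrame API.
totals = sales.groupBy("region").sum("amount")
totals.show()

spark.stop()
```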


The distributed and rapid storage of just about anything, unstructured or structured, into the Hadoop Distributed File System (HDFS), combined with the capability of contemporary analytics tools to query that data in massively parallel fashion and in a timely manner for analysis, is appealing, powerful, and driving the data platform modernization movement. While the Hadoop ecosystem continues to rapidly improve, exponential volumes of data from digital devices are just starting to overwhelm traditional data architectures.

OPEN SOURCE ANALYTICS
Another market force that is changing analytics landscapes globally has been the shift by most vendors to embrace Open Source projects such as Hadoop, Apache Spark, R, and Python. Competition drives innovation and pushes businesses forward. However, harnessing the collective genius of a worldwide community of analytics developers is priceless. After Open Source was adopted by major vendors that saw opportunities to monetize it by providing better tools, security, maintenance, and support, the adoption risks that historically held Open Source back were reduced. That is one of the many reasons why Open Source is literally everywhere these days.

CLOUD AND HYBRID ANALYTICS
Although most analytics applications today still use older data warehouse and OLAP technologies on-premises, the pace of the cloud shift is significantly increasing. Internet infrastructure is getting better and is almost invisible in mature markets. Cloud fears are subsiding as more organizations witness the triumphs of early adopters. Instant, easy cloud solutions continue to win the hearts and minds of non-technical users. Cloud also accelerates time to market, allowing for innovation at faster speeds than ever before. IoT-inspired cloud streaming analytics, cloud data warehouse, and cloud data lake technology is exceptionally simple, fast, and cost-effective to spin up, scale up, or even scale down versus investing in multi-million-dollar hardware purchases that are almost immediately outdated. Cloud analytics plug-and-play, point-and-click, no-code designs along with pre-packaged templates empower far more organizations to enjoy sophisticated, highly scalable, advanced analytics solutions that used to be extremely complex to build in-house.

As more groups leverage cloud in digitalization strategies, the center of data gravity shifts, changing where analytics takes place. Analytics in the digital era often spans data residing on-premises and in the cloud. To ease hybrid analytics complexity, we are seeing novel cross-domain solutions being introduced. For example, databases today have optional storage locations on-premises or in virtual external tables in the cloud. A transformed class of data-as-a-service architectures, data virtualization, and logical data warehouses that enable data analysis without moving data is currently evolving. Lastly, a new generation of intelligent, enterprise data catalogs is expanding to allow analytics professionals to manage metadata, improve data quality, run sophisticated data searches, and get smart data usage-based recommendations powered by machine learning.

PREPARING FOR THE DIGITAL FUTURE
The exponential growth of data, digitization, and internet connectivity is the “backbone” of the Fourth Industrial Revolution. It has the potential to propel societies forward, enable innovative business models, and help governments. Digitization doesn’t just enable what we do, it transforms it — not only business models, but also policy and social norms. Like many changes in our lifetime, digital transformation is a blue ocean of opportunity to reinvent. There are also significant risks to mitigate as disruptive business models change the game.

We are just beginning to see a future world where basic data visualization and data analysis that have been the foundations of analytics will be partially automated with savvy smart data discovery. Cognitive intelligence and deep learning technologies will take on countless tasks humans historically performed. To prepare, analytics professionals will need to think digital, think big, and enjoy diving into state-of-the-art analytics technologies to pave the path forward. Si

Jen Underwood is Founder and Principal of Impact Analytix, LLC. Impact Analytix is a boutique integrated product research, consulting, technical marketing, and creative digital media agency led by experienced hands-on practitioners. Jen can be tweeted at @idigdata.

Stand Out

from the overcrowded field of Data Scientists. Attend the most comprehensive and concentrated live certificate training available for analytics at the enterprise level. Lead your organization to achieve measurable returns in data science with predictive analytics.

…and watch your career take off! Contact a Training Advisor Now


+1 (281) 667-4200 Opt 3 the-modeling-agency.com/si



Guest Columnist

Eric Torkia, MASc

Simulation Models? Here’s how and why

Perception is reality. That was something said to me many years ago by an executive at SAP. I have pondered this statement and find that it is true and applicable in most situations. In a world of constant information, we are now aware of events much more quickly and visibly. Risk perception is a highly personal process of decision-making, based on an individual’s frame of reference developed over a lifetime, among many other factors. (Brown, 2014)

Perceived vs. Actual Probabilities: What was a safer mode of transport after 9/11?

Why I think planes are dangerous:
• Imminent terror attack
• I don’t control who the passengers are, and if an accident happens I am helpless at 30,000 feet.

Why I think cars are safer than planes:
• I am safe from terror because no one will get in the car I don’t trust.
• I am in full control of my environment.

Why planes actually are safer:
• 1 in 767,303 chance of dying this year in a plane, or 1 in 9,737 over an entire lifetime.
• Large infrastructure for security and safety.

Why cars are actually more dangerous:
• 1 in 8,938 chance of dying this year in a car, or 1 in 113 over an entire lifetime.
• Bad weather conditions
• Bad drivers



According to Makridakis (Makridakis, Hogarth, & Gaba, 2009), a good example of misperception of risk is that 1,700 more people died on the road in 2002 than in 2001 because they elected to drive instead of taking a plane. Terrorism or not, the risk of death is much lower when you take a plane instead of the car – therefore the more people who took cars to avoid terrorists, the more people were at risk of dying. The obvious motivation for this was that the news media and the 24-hour news cycle made a highly tragic but rare event like 9/11 feel imminent. The numbers don’t bear the fear out (in fact, driving is about 100 times more dangerous), and that is the point of simulation: building a model can help you figure out whether a risk is real or is being distorted by our personal biases.

WHAT EXACTLY IS A MODEL?
When we think about reality, what are we really doing? We are creating models in our minds. Models are visual, written, or mathematical abstractions of reality used to explain and analyze a problem or phenomenon. The simplest example of an abstract mathematical model that we can all relate to is: PROFIT = REVENUE – EXPENSES.

To demonstrate that profit is an abstract concept, consider that even though you know exactly what it [profit] means, you will never find a pile of profit in nature, nor can you step in it on a spring day in the park. Below is an illustration depicting the relationship between the “real” world and simulation.

TYPES AND FLAVORS OF MODELS
Various model types and flavors exist. We shall cover several important classifications, all important to the disciplines of decision and risk analysis.

Deterministic vs. Stochastic (Probabilistic)
A deterministic model is one where you can calculate the output precisely given a specific set of inputs.

For example, if Revenue is $3 million and Expenses are $2 million, then per our model [Profit = Revenue – Expenses] our Profit is $1 million – no uncertainty about that.
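To make the deterministic-versus-stochastic distinction concrete, here is a minimal sketch in Python (a language this column does not otherwise use). The deterministic version returns the single $1 million answer; the stochastic version replaces the two inputs with probability distributions whose ranges are invented purely for illustration.

```python
# Deterministic model: one set of inputs, one answer.
revenue, expenses = 3_000_000, 2_000_000
profit = revenue - expenses          # exactly $1 million, no uncertainty

# Stochastic (probabilistic) version of the same model: the inputs become
# probability distributions (the ranges below are made up for illustration),
# and the output becomes a distribution of possible profits.
import random
import statistics

profits = []
for _ in range(10_000):
    rev = random.triangular(2_500_000, 3_500_000, 3_000_000)   # uncertain revenue
    exp = random.triangular(1_800_000, 2_400_000, 2_000_000)   # uncertain expenses
    profits.append(rev - exp)

profits.sort()
print(f"mean profit {statistics.mean(profits):,.0f}")
print(f"5th pct     {profits[int(0.05 * len(profits))]:,.0f}")
print(f"95th pct    {profits[int(0.95 * len(profits))]:,.0f}")
```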

Quantitative vs. Qualitative
As the names imply, quantitative analysis looks at quantifiable values (hard numbers), while qualitative models seek to take abstract concepts, such as experiential data, and translate them into numbers we can monitor or manage, e.g., a customer satisfaction index. The primary tools for qualitative data are surveys, interviews, and testimonies, while historical data is the basis for most quantitative analysis.

Symbolic vs. Numerical
This is perhaps an old-school distinction to make, but here it is: Symbolic models are based on algebraic modeling using symbolic form. That means you distill the situation into equations and manipulate them algebraically to solve for x (say). Numerical models also require an equation, but it can be far simpler. You put numbers or scenarios in on one side and record the output on the other, using the brute force of modern computers. Then, by looking at the aggregate outputs, you can see which variables are correlated, as well as get the “area under the curve” at specific confidence intervals. Trying to do this with symbolic methods typically yields pages of algebra that we cannot easily understand.

Descriptive vs. Predictive vs. Prescriptive
Descriptive Models are the basis for all further modeling. Descriptive Models are simply a description in mathematical form, usually in deterministic terms.


They describe what was. A good example is an accounting report, such as an income statement or perhaps a sales report, which gives you the state at a specific point in time with clear and understood calculations. Predictive Models come in several flavors such as forecasting, data mining, and machine learning models. Machine learning usually requires lots of historical data: Starting from a standard model form, such as a linear or logistic regression, tree or neural network model, they adjust and “fit” various model parameters to the observed data, until the model’s output on the historical data closely matches the historical outcomes. If a model structure is unknown, machine learning can give clues to the modeler of unobserved underlying relationships. For example, when analyzing credit risk, you may consider that age, annual income, and marital status are good predictors


for collecting a loan. Certain combinations of these predictors will dictate the loan has a good chance of being fully collected while others would predict default. Simulation models lie on the boundary between predictive models and prescriptive models, and include some elements of both. Simulation is used when we don’t have historical data on the whole process – for example, we haven’t built the new assembly line yet – but we may have data on some elements of the process, such as processing times at assembly stages and varying demand for end products. Through simulation, you can incorporate uncertainty in the inputs and assess their impacts on the outcomes, also providing insight into key influencers on the target outcome. Prescriptive Models also come in several flavors, but unlike predictive models, their output is a decision or action to be taken.


Prescriptive modeling also starts with a descriptive model that calculates certain results, but includes further logic to reach a decision. For example, decision trees, multi-attribute decision matrices, or business rule systems may be used to “compute” a decision. For situations involving many resource allocation decisions, mathematical optimization is used. The model includes decision variables that may specify yes/no outcomes such as “build or don’t build the new plant,” or amounts of resources such as “we need x square feet of floor space and y production line workers.” Most optimization models at present are deterministic. For example, they might assume constant demand for products, and focus on allocating resources to efficiently produce them. But it’s possible to combine simulation modeling and optimization modeling, to find the best decisions in the presence of uncertainty. Depending on the form of the resulting model, we may be able to apply fast methods

that yield “known optimal” outcomes, such as stochastic linear programming, or we might have to fall back to brute-force methods that yield only “better, but not proven optimal” outcomes, such as simulation optimization. This approach allows us to take uncertainty into account and optimize our desired measure across the full range of possible outcomes. Software to do all of this is not only available, but increasingly easy to use.
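As a rough illustration of the brute-force flavor of simulation optimization, the sketch below evaluates a handful of candidate decisions by simulating each one and keeping the best average result. The inventory-style model, its prices, and its demand distribution are all assumptions invented for this example, not a prescription.

```python
import random
import statistics

# Simulation optimization, brute-force style: for each candidate decision,
# run a batch of Monte Carlo trials and keep the decision with the best average.
# The demand distribution, prices, and costs below are made up for illustration.
PRICE, COST, SALVAGE = 12.0, 7.0, 2.0

def simulated_profit(order_qty, trials=2_000):
    profits = []
    for _ in range(trials):
        demand = max(0.0, random.gauss(1_000, 250))
        sold = min(order_qty, demand)
        unsold = order_qty - sold
        profits.append(sold * PRICE + unsold * SALVAGE - order_qty * COST)
    return statistics.mean(profits)

candidates = range(600, 1600, 100)
best = max(candidates, key=simulated_profit)
print("best order quantity (better, but not proven optimal):", best)
```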


What industry needs is more business analysts who have learned how to use these powerful methods, and can apply them to practical business problems. I hope you are one of them – or you soon will be! Si

Eric Torkia, MASc, is Executive Partner for Analytics Practice at Technology Partnerz Ltd., St-Lambert, Quebec, Canada. Technology Partnerz is an established reseller for analytics tools and services.




TUTORIAL

RISK ANALYSIS AND MONTE CARLO SIMULATION

Risk analysis is the systematic study of uncertainties and risks, while Monte Carlo simulation is a powerful quantitative tool often used in risk analysis.

BY DAN FYLSTRA



Uncertainty and risk are issues that virtually every business analyst must deal with, sooner or later. The consequences of not properly estimating and dealing with risk can be devastating. The 2008-2009 financial meltdown – with its many bankruptcies, homes lost to foreclosure, and stock market losses – began with inadequate estimation of risk in bonds that were backed by subprime mortgages. But every year, there are many less-publicized instances where unexpected (or unplanned for) risks bring an end to business ventures and individual careers.


There’s a positive side to uncertainty and risk, as well: Almost every business venture involves some degree of risk-taking. Properly estimating, and planning for the upside is just as important as doing so for the downside. Risk analysis is the systematic study of uncertainties and risks we encounter in business, engineering, public policy, and many other areas. Monte Carlo simulation is a powerful quantitative tool often used in risk analysis.

ARE UNCERTAINTY AND RISK DIFFERENT?
Uncertainty is an intrinsic feature of some parts of nature – it is the same for all observers. But risk is specific to a person or company – it is not the same for all observers. The possibility of rain tomorrow is uncertain for everyone; but the risk of getting wet is specific to me if (a) I intend to go outdoors and (b) I view getting wet as undesirable. The possibility that stock A will decline in price tomorrow is an uncertainty for both you and me; but if you own the stock long and I do not, it is a risk only for you. If I have sold the stock short, a decline in price is a desirable outcome for me.

Many, but not all, risks involve choices. By taking some action, we may deliberately expose ourselves to risk – normally because we expect a gain that more than compensates us for bearing the risk. If you and I come to a bridge across a canyon that we want to cross, and we notice signs of weakness in its structure, there is uncertainty about whether the bridge can hold our weight, independent of our actions.


If I choose to walk across the bridge to reach the other side, and you choose to stay where you are, I will bear the risk that the bridge will not hold my weight, but you will not. Most business and investment decisions are choices that involve “taking a calculated risk” – and risk analysis can give us better ways to make the calculation.

HOW TO DEAL WITH RISK
If the stakes are high enough, we can and should deal with risk explicitly, with the aid of a quantitative model. As humans, we have heuristics or “rules of thumb” for dealing with risk, but these don’t serve us very well in many business and public policy situations. In fact, much research shows that we have cognitive biases, such as overweighting the most recent adverse event and projecting current good or bad outcomes too far into the future, that work against our desire to make the best decisions. Quantitative risk analysis can help us escape these biases and make better decisions.

It helps to recognize up front that when uncertainty is a large factor, the best decision does not always lead to the best outcome. The “luck of the draw” may still go against us. Risk analysis can help us analyze, document, and communicate to senior decision-makers and stakeholders the extent of uncertainty, the limits of our knowledge, and the reasons for taking a course of action.


WHAT-IF MODELS
The advent of spreadsheets made it easy for business analysts to “play what-if:” Starting with a quantitative model of a business situation in Excel or Google Sheets, it’s easy to change a number in an input cell or parameter, and see the effects ripple through the calculations of outcomes. If you’re reading this magazine, you’ve almost certainly done “what-if analysis” to explore various alternatives, perhaps including a “best case,” “worst case,” and “expected case.”

But trouble arises when the actual outcome is substantially worse than our “worst case” estimate – and it isn’t so great when the outcome is far better than our “best case” estimate, either. This often happens when there are many input parameters: Our “what-if analysis” exercises only a few values for each, and we never manage to exercise all the possible combinations of values for all the parameters. It doesn’t help that our brains aren’t very good at estimating statistical quantities, so we tend to rely on shortcuts that can turn out quite wrong.

SIMULATION SOFTWARE: THE NEXT STEP
Simulation software, properly used, is a relatively easy way to overcome the drawbacks of conventional what-if analysis. We use the computer to do two things that we aren’t very good at doing ourselves:
1. Instead of a few what-if scenarios done by hand, the software runs thousands or tens of thousands of what-if scenarios, and collects and summarizes the results (using statistics and charts).
2. Instead of arbitrarily choosing input values by hand, the software makes sure that all the combinations of input parameters are tested, and values for each parameter cover the full range.
This sounds simple, but it’s very effective. There’s just one problem:

If there are more than a few input parameters, and the values of those parameters cover a wide range, the number of what-if scenarios needed to be comprehensive is too great, even for today’s fast computers. For example, if we have just 10 suppliers, and the quantities of parts they supply have just 10 different values, there are 10¹⁰ or 10 billion possible scenarios. Even an automated run of 1,000 or 10,000 scenarios doesn’t come close. What can we do?

The Monte Carlo method was invented by scientists working on the atomic bomb in the 1940s. It was named for the city in Monaco famed for its casinos and games of chance. They were trying to model the behavior of a complex process (neutron diffusion). They had access to one of the earliest computers – MANIAC – but their models involved so many inputs or “dimensions” that running all the scenarios was prohibitively slow. However, they realized that if they randomly chose representative values for each of the inputs, ran the scenario, saved the results, repeated this process, and then statistically summarized all their results, the statistics from a limited number of runs would quite rapidly “converge” to the true values they would get by actually running all the possible scenarios. Solving this problem was a major “win” for the United States, and accelerated the end of World War II.

Since that time, Monte Carlo methods have been applied to an incredibly diverse range of problems in science, engineering, and finance – and business applications in virtually every industry. Monte Carlo simulation is a natural match for what-if analysis in a spreadsheet.
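A small Python sketch (not from the article’s spreadsheet) shows the idea. Enumerating every combination of 10 values for 10 suppliers means 10 billion scenarios, but a few thousand randomly sampled scenarios already give stable summary statistics. The quantities and the “total supply” output below are invented placeholders for a real model.

```python
import random
import statistics

quantities = list(range(100, 1100, 100))   # 10 possible supply quantities per supplier
n_suppliers = 10

# Exhaustive what-if analysis would need every combination of 10 values
# for 10 suppliers: 10**10 = 10 billion scenarios.
print(len(quantities) ** n_suppliers)

# Monte Carlo instead: sample a few thousand scenarios at random and let the
# summary statistics converge toward what full enumeration would report.
totals = []
for _ in range(10_000):
    scenario = [random.choice(quantities) for _ in range(n_suppliers)]
    totals.append(sum(scenario))           # stand-in for a real model's output

print(statistics.mean(totals), statistics.stdev(totals))
```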


Three histogram charts of the possible values of an input parameter.


RANDOMLY CHOOSING REPRESENTATIVE VALUES
Now for the hard part (not really that hard): How do we randomly choose representative values, using a computer? This is the heart of the Monte Carlo method, and it’s where we need some probability and statistics, and knowledge of the business situation or process that we’re trying to model.

Choosing randomly is the easier part: In the external world, if there were only two possible values, we might use a coin toss, or if there were many, we might spin a roulette wheel. The analog in software is a (pseudo) random number generator or RNG – like the RAND() function in Excel. This is just an algorithm that returns an “unpredictable” value every time it is called, always falling in a range (between 0 and 1 for RAND()). The values we get from a (pseudo) random number generator are effectively “random” for our purposes, but they aren’t truly unpredictable – after all, they are generated by an algorithm. The RNG’s key property is that, over millions of function calls, the values it returns are “equidistributed” over the range specified.

To ensure that the values randomly chosen are representative of the actual input parameter, we need some knowledge of the behavior of the process underlying that parameter. Here are three histogram charts of the possible values of an input parameter. The first two are probably familiar. On the top is a Uniform probability distribution, where all the values between 0 and 1 are equally likely to occur. This is the distribution of values returned by the RAND() function.


In the middle is a Normal probability distribution, the most common distribution found in nature, business and the economy. Note that, unlike the Uniform distribution, the Normal distribution is unbounded – there is a small chance of very large or very small/negative values. On the bottom is an Exponential probability distribution, which is commonly used to model failure rates of equipment or components over time. It reflects the fact that most failures occur early. Note that it has a lower bound of 0, but no strict upper bound.

Our task as business analysts is to choose a probability distribution that fits the actual behavior of the process underlying our input parameter. Most distributions have their own input parameters you can use to closely fit the values in the distribution to the values of the process. Software such as @RISK from Palisade, ModelRisk from Vose Software, and Analytic Solver Simulation from Frontline Systems (sponsors of this magazine) offers you many – 50 or more – options for probability distributions.
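In a general-purpose language, the same generator functions are a one-liner each. The sketch below uses Python’s standard random module as the analog of RAND() and of the distribution generator functions; the means, standard deviations, and rates shown are arbitrary examples, not fitted values.

```python
import random

# Uniform between 0 and 1 -- the software analog of Excel's RAND().
u = random.random()

# Normal (unbounded): mean and standard deviation would be fitted to the process;
# the numbers here are arbitrary examples.
demand = random.gauss(10_000, 1_500)

# Exponential (bounded below by 0): rate = 1 / mean time between failures.
time_to_failure = random.expovariate(1 / 400.0)

print(u, demand, time_to_failure)
```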

WHAT HAPPENS IN A MONTE CARLO SIMULATION
Given a random number generator and appropriate probability distributions for the uncertain input parameters, what happens when you run a Monte Carlo simulation is pretty simple: Under software control, the computer does 1,000 or 10,000 “what-if” scenario calculations – one such calculation is called a Monte Carlo “trial.” On each trial, the software uses the RNG to randomly choose a “sample” value for each input parameter, respecting the relative frequencies of its probability distribution. For example, for a Normal distribution, values near the peak of the curve will be sampled more frequently. If you’ve specified correlations, it modifies these values to respect the correlations. Then the model is calculated, and values for outputs you’ve specified are saved. It’s as simple as that!
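Stripped of the spreadsheet, a trial loop looks something like the following Python sketch. The two-input profit model and its distributions are stand-ins invented for illustration – the real work is choosing distributions that fit your situation – but the sample-calculate-save-summarize cycle is the loop described above.

```python
import random
import statistics

def one_trial():
    # Sample each uncertain input from its distribution (values are illustrative).
    demand = random.gauss(50_000, 12_000)            # units sold
    unit_margin = random.triangular(8.0, 14.0, 11.0) # dollars per unit
    fixed_cost = 380_000
    return demand * unit_margin - fixed_cost         # the model's output for this trial

outcomes = sorted(one_trial() for _ in range(10_000))

# The same summaries the simulation software reports: mean, spread, percentiles.
print("mean    ", statistics.mean(outcomes))
print("5th pct ", outcomes[int(0.05 * len(outcomes))])
print("95th pct", outcomes[int(0.95 * len(outcomes))])
```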

Histogram: Net Present Value of future cash flows from a project.


At the end of the simulation run, you have results from 1,000 or 10,000 “what-if” scenarios. You can step through them one at a time, and inspect the results (on the spreadsheet, if you’re using one), but it’s generally easier to look at statistics and charts to analyze all the results at once. For example, here’s a histogram of one calculated result, the Net Present Value of future cash flows from a project that involves developing and marketing a new product. The chart shows a wide range of outcomes, with an average (mean) outcome of $117 million. But the full range of outcomes (in the Statistics on the right) is from negative $77.5 million to positive $295 million! And from the percentiles, we see a 5 percent chance of a negative NPV of $7.3 million or more, and a 5 percent chance of a positive NPV of $224 million or more.

Another view, below, from this same chart shows us how sensitive the outcome is to certain input parameters, using “moment correlation,” one of three available presentations. $C$31 is customer demand in Year 1, $C$32 is customer demand in Year 2, and $C$35 is marketing and sales cost in Year 1 (treated as uncertain in this model). This is often called a Tornado chart, because it ranks the input parameters by their impact on the outcome, and displays them in ranked order. On the right, we are displaying Percentiles instead of summary Statistics.

How did we construct this model? We started from a standard “what-if” model, then replaced the constant values in three input cells with “generator functions” for probability distributions. We also selected the output cell for Net Present Value, as an outcome we wanted to see from the simulation.

This chart shows how sensitive the outcome is to certain input parameters.


We can pretty much always go from a what-if model to a Monte Carlo simulation model in a similar way. A chart, page 21, shows our chosen distribution – a “truncated” Normal distribution, which excludes certain extreme values – for customer demand in Year 2.

STEPS TO BUILD A MONTE CARLO SIMULATION MODEL
If you have a good “what-if” model for the business situation, the steps involved in creating a Monte Carlo simulation model for that situation are straightforward:
• Identify the input parameters that you cannot predict or control. Different software may call these “inputs,” “forecasts,” or “uncertain variables” (Analytic Solver’s term). For these parameters (input cells in a spreadsheet), you will replace fixed numbers with a “generator function” based on a specific probability distribution.
• Choose a probability distribution for each of these input parameters. If you have historical data for the input parameter, you can use “distribution fitting” software (included in most products) to quickly see which distributions best fit the data, and automatically fit the distribution parameters to the data. Software makes it easy to place the generator function for this distribution into the input parameter cell.
• If appropriate, define correlations between these input parameters. Sometimes, you know that two or more uncertain input parameters are related to each other, even though they aren’t predictable. Using tools such as rank-order correlation or copulas, which modify the behavior of the generator functions in a simulation, you can take this into account (a simple sketch follows this list).
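Here is a small Python sketch of those steps outside a spreadsheet: a fixed number becomes a generator function, and a second, related input is handled either with an exact formula or with a simple correlated-sampling construction (a bivariate-normal device standing in for the rank-order correlation and copula tools mentioned above). All of the distribution parameters are invented for illustration.

```python
import random

# Steps 1 & 2: an input you cannot predict gets a "generator function" in place
# of a fixed number (distribution and parameters are illustrative).
def year1_demand():
    return random.gauss(40_000, 8_000)

# Step 3: two ways to relate a second uncertain input to the first.
def year2_demand_exact(d1):
    # If the relationship is known exactly, just use a formula -- no correlation needed.
    return 1.1 * d1

def year2_demand_correlated(d1, rho=0.7, mu=44_000, sigma=9_000):
    # If the relationship is real but imperfect, sample a correlated value:
    # part of the variation follows Year 1 demand, the rest is independent noise.
    z1 = (d1 - 40_000) / 8_000
    z2 = rho * z1 + (1 - rho ** 2) ** 0.5 * random.gauss(0, 1)
    return mu + sigma * z2

d1 = year1_demand()
print(d1, year2_demand_exact(d1), year2_demand_correlated(d1))
```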


A truncated Normal distribution for customer demand in Year 2.

Thinking about the first step, if you can predict the value of an input parameter, it’s really a constant in the model. But if you have a prediction that’s only an estimate in a range – or within ‘confidence intervals’ – then it should be replaced with a generator function. If you can control the value, this parameter is really a decision variable that can be used later in simulation-based what-if analysis, or in simulation optimization.

Thinking about the third step: If you know the exact relationship between input parameter A and input parameter B (say that B = 2*A), you can just define a probability distribution for A, and use a formula to calculate B. Correlation methods are intended for cases where you know there is a relationship, but the exact form of that relationship is uncertain. For example, airline stocks tend to rise when oil stocks fall, because both are influenced by the price of crude oil and jet fuel – but the relationship is far from exact.

SO, WHY DO THIS?
We’ve seen that it really isn’t very difficult to go from a good what-if model to a Monte Carlo simulation model. The main effort is focusing on the real uncertainties in the business situation, and how you can accurately model them. The software does all the work to analyze thousands of what-if scenarios, and it gives you a new visual perspective on all of them at once.

We also saw at the beginning of this tutorial that uncertainty and risk are present in virtually every business situation, and indeed most life situations – and the consequence of not estimating risk properly, and taking steps to mitigate it, can mean an early career end, or even a business failure. That’s just the negative side – risk analysis can also show you that there’s more upside than you ever imagined. With the effort so modest and the payoff so great, the answer should be obvious: Monte Carlo simulation should be a frequently-used tool in every business analyst’s toolkit. Si

Dan Fylstra is President of Frontline Systems, Incline Village, Nev.




CASE STUDY

MOVING VEHICLES AND BIDS EFFICIENTLY

Finding the best supplier, at the right price, is a common problem in business. Using the best analytics to get a solution is no problem.






Whether in government or the private sector, choosing the right vendors is an essential part of doing business. In many cases, it’s not just a matter of finding the low bidder or the supplier closest to the project; it is a matter of many variables coming together at the right time, cost, and location, and the decision-maker needs to do priority assignment and risk analysis. In the case of the School District of Philadelphia, several vendors will normally compete for contracts during the bidding process, and it is up to the department administrator to make a careful and impartial decision. With many variables to weigh in the decision, optimization software can be called upon to solve the relationships between the variables and find the optimal solution. This method is both impartial and faster than human analysis, but it often requires advanced knowledge of analysis methodology, not always found in every department.

Charles Lowitz, the fiscal coordinator for the transportation office of the School District at the time, understood this. He looked for software that could be applied to his Microsoft Excel contract model.


Using Excel had been convenient for the district since it was already in place to collect information about the vendors. Lowitz was able to use its formulas to express relationships in his business model, simplifying the early stages of the project.

The School District of Philadelphia operates about 40 percent of the school bus routes itself, but needs to contract out the rest to meet its obligation to transport students in the sixth largest district in the country. Ultimately, vendors would be selected by experience and capacity, but the District needed to determine the optimal mix of vendors—each supplier had unique limitations in how many buses they would supply, what routes they would serve, and at what cost. Approximately 725 bus routes were on the table to be contracted out to private vendors.


It was Lowitz’s responsibility to come up with a way to maximize the return on investment and improve the process by which the routes were awarded. In previous years, the process had been done by figuring out, by hand, which contracts would make the most sense given the constraints of budget and time. The school district would then need to figure out a way to distribute bus routes between the bids. For various external reasons, the district found that it was preferable to keep 30 to 40 percent of the routes in-house with existing buses and then to outsource the rest of the fleet. The transportation department had to find companies willing to take on the rest of the routes within the constraints of the model. In the end, the analytic power of Premium Solver Platform (now known as Analytic Solver Optimization) from Frontline Systems in Incline Village, Nev., was employed to find the partnerships that would be most beneficial, from a financial and operational perspective.

RFP REQUIREMENTS
The main variables associated with the vendors chosen were:
• Cost: The district had to determine which route sets to award to which vendors at the lowest possible cost. Some vendors had a minimum number of routes they would bid on, at a certain cost, and if they weren’t awarded that minimum, their prices would increase.
• Vendor capabilities: The district needed to make sure a vendor could accomplish what it put in its bid. If a company only had 50 buses, then the school district could not feasibly award them a contract that included more than 50 routes.

• Vendor reliance: The district needed to make sure the company was reliable, but at the same time, they didn’t want to rely on too few businesses. Some school districts award all their contracts to one vendor, and that can quickly turn into a mess if that vendor were to fold, experience financial difficulties, or run into serious equipment problems with its fleet.

Previous experience with the district, financial stability, and exhibited business acumen were also factors taken into consideration when awarding these contracts. Vendors were allowed to bid on any number of routes – from one or two to all of them – but the district stipulated that it would not award more than 300 routes to any one vendor, to protect its own interests.

“In our contract awards process, we are required to have a manual process to verify the solution suggested by the optimization software,” Lowitz explains.

In previous years, determining which contracts would make the most sense was done by hand.


“Fortunately, it all worked out, and I don’t think there would have been an assurance that it was the optimal solution without the software. The strategy employed by our procurement office necessitated having to have a product like Premium Solver Platform to come up with the right answers.”

THE OPTIMIZATION MODEL
During the first year Lowitz used the software, there were 16 requests for proposals submitted to the district. Using an optimization model created in Excel, he tracked which of these vendors would offer the school district the best opportunity. The optimization model he created took into consideration all of the variables that made up the bid and the needs of the district. The final product included a surprising 1,552 binary integer variables. The binary variables expressed the values of 1 for yes and 0 for no. After the model was ready to run, it only required minutes to gather the necessary data to plug into the program and determine the best contract placement. Lowitz used a standard linear programming model, one that best fit the number of integer variables to be analyzed. There were approximately 100 route sets total, grouped by geographical region in the school district, and there could be anywhere from two to 15 routes per set. When all the analysis was done, 12 of the 16 vendors were awarded contracts. The size of the contracts reached both ends of the spectrum, with one vendor taking four routes and another getting almost 300 routes.
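For readers who want to see the shape of such a model, here is a hedged, toy-scale sketch of a route-set award model in Python using the open-source PuLP library – not the Excel and Premium Solver Platform setup the district actually used. The vendors, bids, route counts, and capacity limits are all made up; the structure (binary award variables, one winner per route set, and a cap on routes per vendor) is the point. Vendor minimums, the subtlety discussed next, would add one more binary variable per vendor.

```python
# Toy sketch of a binary route-set award model; all data below is hypothetical.
import pulp

route_sets = {"NE": 8, "NW": 12, "S": 6}                  # routes per set
bids = {("A", "NE"): 40_000, ("A", "NW"): 65_000,         # price bid per route set
        ("B", "NE"): 42_000, ("B", "S"): 30_000,
        ("C", "NW"): 60_000, ("C", "S"): 33_000}
max_routes = {"A": 15, "B": 10, "C": 300}                  # fleet capacity / district cap

model = pulp.LpProblem("bus_contracts", pulp.LpMinimize)

# x[v, s] = 1 if route set s is awarded to vendor v, else 0.
x = pulp.LpVariable.dicts("award", bids.keys(), cat="Binary")

# Objective: total contract cost.
model += pulp.lpSum(cost * x[key] for key, cost in bids.items())

# Every route set goes to exactly one bidding vendor.
for s in route_sets:
    model += pulp.lpSum(x[(v, s2)] for (v, s2) in bids if s2 == s) == 1

# No vendor is awarded more routes than it can run (or than the district allows).
vendors = {v for (v, _) in bids}
for v in vendors:
    model += pulp.lpSum(route_sets[s] * x[(v, s)] for (v2, s) in bids if v2 == v) <= max_routes[v]

model.solve()
for key, var in x.items():
    if var.value() == 1:
        print("award", key)
```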

A standard linear programming model fit the number of variables to be analyzed.


When he required help with entering minimums into Premium Solver, Lowitz discovered that customer support for the platform was readily available. The analytics experts at Frontline Systems were able to help him solve the problem. “I had one issue that I had to call Frontline about, and they were great with support,” he reports. It became clear that the idea of entering a minimum into the model was a bit tricky. For example, a company might indicate it would only accept a contract for a minimum of 14 routes. Realistically, 14 isn’t the minimum that vendor could be awarded—if the district didn’t award any routes to that vendor, which was possible, then the actual minimum would be zero. “Once we got over that, the only hiccup in the model, the rest fell into place,” Lowitz recalls. “It may have taken a day to construct the model, then just minutes to come up with the optimal solution.”

In the end, the School District of Philadelphia was able to award an optimized number of contracts to privately owned bus companies without resorting to handwritten notes and untrustworthy trial-and-error strategies. By implementing Premium Solver Platform analytic tools, creating a model with the proper variables, and then running the program, the school district saved both time and money. “I can’t say enough about it,” Lowitz says. “It had been a long time since I had had to use an optimization product, so I was rusty. But the software was very intuitive, making it very easy to come up with the solution. It was relatively quick to put the model together.” Si



Solver International, the bi-monthly digital publication covering prescriptive and predictive analytics, will notify you of coming events, news and views from industry experts, and so much more.


SUBSCRIBE TODAY to get notices by e-mail of information you can use to boost your career and improve the success of your company or academic institution.

VISIT OUR WEBSITE TO SIGN UP!

Solver-International.com



EXECUTIVE INTERVIEW

ELLIE FIELDS—TABLEAU SOFTWARE

How can a company engage people with data, beyond simply doing analytics—reaching people who need to use data in their daily life but don’t need to author a dashboard?

Ellie Fields is a Senior Director of Product Development at Tableau Software, where she manages strategy and execution of the capabilities that help Tableau customers scale their analytics deployment, including mobile, collaboration, data-driven alerting, and more. She leads the “self-service at scale” development team, and has been at the company for more than eight years in a variety of different product roles. We caught up with her while she was on one of her many trips, evangelizing on data and business intelligence.

Solver International: You were with Tableau early on, right?

Ellie Fields: I was with Tableau pretty early, when it was under 100 people, and I’ve seen the company grow a lot.

SI: You have a BS in engineering from Rice and an MBA from The Farm, Stanford Graduate School of Business. When you were there, did they do anything with systems like Tableau? Or was that before the MBA program focused so much on analytics?

Fields: Now they do a little bit with it, but no, the data industry wasn’t there. I’ve been a data geek my whole life


and in college I was an engineer. I used a tool called MATLAB, which is still around, a wonderful tool but it requires scripting. When I got out of school and went into the finance industry, I used Excel—because everyone wanted to be able to see what was happening. There was a critical collaboration element that kept people from using more sophisticated tools like MATLAB. And then I went to Microsoft. That was after business school, and I had to use Cubes. It wasn’t until I got to Tableau that I felt like I had come to how you should work with data, which is this visual, fluid, drag-and-drop, almost touching-your-data kind of experience. When I was in business school, Tableau was still very, very young. The way you worked with data was with Excel, and the Business Intelligence (BI) industry thought that a separate group of people would build the dashboards for you. You would put in requests and wait to get a dashboard back, and then tell them what else you wanted or how they could fix it, and they’d send it back to you. You’d have this process, back and forth over a period of weeks, if not months, and finally—maybe—you’d get a dashboard that would be useful to you. Tableau really disrupted that by saying, “Hey, anybody who cares about any data should be able to work with that data.”

SI: Now we’ve got dashboards and visualization tools; everything seems to be aimed at bringing out the information from the data.

Fields: I would cast it slightly differently. There are a lot of chart tools and ways to visualize your data. I think what's interesting about data is when you can actually think with data. Sometimes you write a document to think through a subject; sometimes you draw a diagram on a whiteboard. With the right tools, you can actually think with data, too. So you do more than thinking, "Oh, I have this chart or this dashboard, and it is my output, my answer." There's so much data in today's world and so many ways to look at it, really what you want to do is have an interactive experience, where you can question and answer—maybe by yourself, maybe with others—and you can go through a thinking process with data. That's what people are seeking when they look for a data tool, a spreadsheet, or business analytics. A lot of times, they end up with a tool that does charts, and they get a bar chart of their sales or what have you, and they think, "Well, that's nice, but I still haven't had a chance to ask and answer questions of my data."

SI: You want to be getting information out of the data, something that you can use, not just looking at numbers.

Fields: Like any thinking process, it's not always linear. Just looking at the endpoint doesn't always get you what you want from it.

SI: Modern tools allow collaboration. People can refine their data and look at things from a different point of view, based on what their success points are, and then combine them with input from others on the same team.

Fields: There are a lot of different ways to do that. I think of that as data conversations, and people use Tableau for that all the time, in a variety of different ways, whether it be simply sitting at the same screen or working together remotely. There are salespeople, for example, who have Tableau on a tablet, and they collaborate using data, building a shared data story with their customers. We're building a feature right now that allows you to collaborate with data right inside a spreadsheet. There's a set of comments that you can add, and you can snapshot the data as you add those. If you add a comment, I can see what you were looking at, and I can, moreover, go in and interact with the data after that, to continue the thread of thinking or look at it a different way.

Data conversations are a real thing. They're happening now. One interesting, large-scale social data experiment is Makeover Monday, which some people in the UK run. I find it fascinating. What they do every week is they pick a data set, and it could be on anything—sports, the environment, politics, anything—and they put that data set out to the community. People take it on Sunday and Monday—Makeover Monday—and create different looks at that data to try and understand it in different ways. You can see the conversation happening. It usually develops on Twitter or on people's blogs, and they say, "Oh, that's interesting, you found this after you did that. I looked at it this way, and I found this additional thing." It's a way to have different people collaborating on the same data. You see that happening in companies, too, but of course you don't always get the visibility to it from outside of the company.

SI: Does it matter how large a company is to benefit from this type of a conversation?

Fields: I think any two people could benefit from it. At Tableau, we have independent consultants who use Tableau, and they benefit just by themselves. Our customer base goes from one-person companies all the way up to some of the world's largest organizations. Again, if you look at it as thinking with data, or talking with data, those concepts are useful at any scale.

SI: Some of the things that our magazine looks at are forecasting, data mining, prescriptive analytics, simulation, risk analysis, and optimization. While there are a lot of software tools for all of this, how does Tableau fit in?

Fields: We think of Tableau as part of an ecosystem of data. There are a lot of new ways to store data; there are also a lot of different ways to analyze data. Tableau is the best tool to "think with data," to ask and answer questions of your data, and to do that at scale, in an enterprise way, across a big or small organization. But there are also tools that do very specialized things. For example, R (see Glossary, Page 50) is a very widely adopted tool that does statistical analysis really, really well. Python is taking hold, as well. At Tableau, we want to solve most of the common business cases. To do that, we have some forecasting algorithms in the product. For people who are doing hard-core forecasting all day long, they're likely to adopt a tool like R, and we allow you to reach out to R and to Python, and bring those results into Tableau to visualize them and do some interactive analysis on them.

So really, we take an ecosystem approach. There are some tools that are good for some things; other tools are better for other things. We want to solve the business use case. If most business users need to do things like simple forecasting or data analysis, connecting to different data, we want to be a tool that they can use extensively. But when you get into data science and some of the more advanced practices, we see people bringing their results into Tableau and using our tools cooperatively.

SI: In the 1980s and '90s, interactivity and integration of software products was the Holy Grail. Everybody wanted to be able to do it, and nobody could figure out how to do it.

Fields: Integrating tools has been hard in the software industry. At Tableau, we use Open Standards, like ODBC, to connect to applications where we don't have a native connection. It's always hard to integrate enterprise systems, but today there are a lot of good ways to do it, too.

SI: The term business intelligence, BI, is sometimes used rather indiscriminately. Companies are at different levels of maturity in getting their data together to be able to create intelligence and make it available as needed. What is your perspective on that? What industries do you think are the most advanced in BI?

Fields: When I think about analytics maturity, I think about how widely analytics is used in a company. Is it a semi-priesthood of a few people who are allowed to use data? Are there generalized data skills throughout the organization? Do people have access to data, and are they using it in a forward-looking way to make decisions? As opposed to what were our sales last year, it's how do we grow our sales next year. And those, to me, are the elements of analytic maturity.

A lot of times, you see that in very fast-moving industries. For example, the tech industry tends to change very, very quickly, so there's a high value for tools that help them navigate that change. That industry is pretty far along in terms of data. The healthcare industry is trying to use data. They have a lot of restrictions, but there are a lot of healthcare companies trying to use data in a very strategic way. Retail is definitely one of the leaders in terms of using data and being smart about it, and using that data to influence their business, rather than using it as a scorecard. So those are three off the top of my head.

SI: Data is everywhere. It's generated by almost every type of device that we have, from cell phones to Xbox to your refrigerator. How can you zero in on what you need?

Fields: That’s a question that a lot of people are grappling with now. Machine learning is going to be very important in terms of helping us catch exceptions. For example, you have strings of sensor data coming in. You may not need to handle that data most of the time, but a machine can tell you when things are off by a certain standard deviation, and you can get some automated reporting on that. That’s one way.

A lot of big data just needs to be monitored. The key is to figure out what we need in order to get more answers, and what we just need to monitor. If you're trying to understand root causes or get into correlations or important patterns, that might be a time when a human being dives into that data and starts looking around and working with the big data. But from an operational perspective, I think the key is to figure out what's worth watching, and use computers to help us do that.

SI: The IoT, the Internet of Things, is going to generate tons of data that's just going to be floating around out there. We're going to have to find a way to figure out what we want to monitor and what we want to deal with.

Fields: Right. And which ones we want to keep. I mean, if you've got thermostat data from a building that you own, or your own house, it may be that the last two or three days of data is all you want to keep. You don't necessarily need to go back years and warehouse all that data. Some of the streaming data is worth keeping around, and some of it might just go back into the ether.

SI: Users have many data tools they use, including Tableau and Excel. Which one is the best? Obviously, you have a bias here—but is there a love-hate relationship between Tableau and the others in the industry, or do you work together?

Fields: We definitely work together with some of them, especially the data science-oriented tools. When people are trying to do advanced analysis, they want an open ecosystem. We do have competitors that we don't necessarily work with. We are a partner of Microsoft, so we both compete with them and partner with them. When you look at any tool, you need to ask, "What are the things I want to do with this, and what am I trying to get out of it?" If you want to do a quick calculation of your grocery costs, maybe you throw that into Excel; but if you can get the data in a systematic way, then maybe Tableau is a better tool because you can connect to the system. It's really finding and using the right tool at the right time. Again, what we're trying to do with Tableau is let business users answer their questions across an enterprise in a very self-service and scalable way. The data scientists, on the other hand, may end up using very sophisticated tools that they create themselves in Python.

SI: Going back to our earlier discussion regarding MBA programs, our audience includes many MBA students and instructors who are teaching analytics to MBA students. They expect to get into business—some of them are in business and taking night school or online classes, for that matter—and they need to know something about data and analytics, but at the same time, they don't need to be experts at it. What advice would you give them? What's essential that they need to learn?

Fields: I think they should be looking at three big buckets. The first is, before you start diving in and getting hung up on how many different data sources you have, try to spend a few minutes framing up the questions and what you think the important things are. There is so much data out there, so much you could do with it, that very quickly, you can get lost if you don't have an intent, a way of working with data. That's number one: apply good old-fashioned strategic thinking.

The second is getting conversant with some tools and understanding the basic structure of data is helpful. I wouldn't get too hung up on any one tool or type of analysis for an MBA, but being able to understand data, understand ways that people are analyzing data, the different patterns, and being able to pick up a tool like Tableau and work the data is really powerful. You can answer questions quickly. You can get some insight. You can really get a leg up by understanding your business better.

Then the third thing is to not underestimate creating change with that data. A lot of times, what people say is, "Oh, I did this analysis. I have the answer. Let's all go do this now." They forget that a lot of working with data is communication. It is helping people with changes, helping people see things a different way. One thing that can be useful to effect change is to have a conversation based on data, like we were saying earlier, a data conversation. You invite people into the analysis. You let them play with the data, understand the data, look at their own hypotheses, so that you can have a much richer interaction. It's a very different model of making change than just saying, "I have the answer and I'm going to talk at you until I convince you that I have the answer." We see entire companies that have adopted this kind of self-service analytics and data conversation approach, and they are really changing their culture to a culture that's much more curious, in many ways much more egalitarian, because everyone has access to data and has the right to have a theory about the data. You end up with these very rich conversations about data, versus just pre-canned answers.

SI: Companies now can get into data analysis without even having software on site; they have it in the cloud. What is the position of Tableau as far as the cloud is concerned?

Fields: The cloud is clearly important. We have a cloud product in Tableau Server that can sit on cloud services, like AWS (Amazon Web Services). Tableau Online is a fully hosted cloud service. We're moving a lot of our analytical capacity from Tableau Desktop into an authoring tool in the cloud, and we connect to a lot of cloud data sources. Our fundamental philosophy, at least at this stage, is that some companies want their systems to be on premise; some want to be in the cloud. Most are on a journey somewhere in between. Some companies say no cloud, totally on premise. They can use Tableau. Some are trying to be 100 percent cloud. Our job is not to tell companies where they should be on their cloud journey; our job is to help them do analytics wherever they are.

We like to think of Tableau as kind of the Switzerland of data. Imagine a company with multiple data sources. Some are cloud, some are not. Some are transitioning to the cloud. With Tableau, you can simply connect to that new cloud data source when it comes online, and you're still doing analytics in the same familiar tool you were using before. You're just connected to a new cloud data source. That's one thing that makes it easy for companies to transition to the cloud, because they can continue to access that data within their Tableau system.

SI: How much does a good analyst need to know about big data, the data sets that are really big, bigger than an SQL database can handle? Is it necessary to have a special tool to work with this, or is it something you have to deal with in a different way?

Fields: I think the fundamental concepts are still the same. It depends on the data source, and how it's ingested, and how clean it is, and how much special handling it needs. From the analyst's point of view, a lot of times their company or their data team will ingest big data and make it available to them, for example as a data source in Tableau Server. And in that case, there's not really that much more an analyst needs to know. Clearly, you can find a lot more—usually, in a very large data set, there's just more to work with. But you don't necessarily need to approach it differently, if you've got your data performing enough for your analysis. You probably need to adjust your techniques, depending on the size of the data. Working with big data can be a little bit different, but again, a lot of that is in the data pipeline. If that data pipeline is solid, the analyst ought to be able to work with very, very large data. We have business analysts who use Tableau and work with data that's in Hadoop clusters or in Teradata or in Google BigQuery, and they're able to work with it quite fluidly, even though it's very large data.

SI: While a lot of people use Tableau, many haven't really looked at it or any other data visualization tool. What would you say to the non-users of visualization? Why should they learn data visualization, and specifically Tableau?

Fields: I think you’re hard pressed to find a job these days that doesn’t somehow touch data. Even if you’re working on a warehouse floor, there’s data about how the goods are moving around the warehouse. There’s data about how people move through a store. There’s data about everything. The mission of Tableau is to help people see and understand their data. You don’t necessarily need to be interested in data visualization. You don’t have to be a data geek. You just need to care about your business and care about answers. We see educators, whether it be principals or teachers, working with data about students to try and help their students achieve better. They don’t really care about data. They don’t have to care about data. They just need to use the data to do their job. I think Tableau is the best tool to let you interactively work with your data, reach out to any kind of data you want and have that conversation with your data, without having to be a specialist with it. Si


CASE STUDY



EXCEL: BIG DATA TOOL?

Can you work with Big Data in Excel? From the barrage of recent news, white papers, and sales calls about Big Data, you would think not. Better think again.

Photo Courtesy of 123rf.com | © dolgachov

If you’ve had the experience of waiting in line to board a plane at your local airport only to have it announced that the departure would be late because the incoming flight was delayed, join a very big club. Flight on-time performance of the major airlines has been a growing problem as more aircraft are in the sky, more people are in the boarding queue, and more security causes slowdowns throughout the process. Finding the causes and relief from them requires exploring a lot of data, which is known as Big Data.


A common theme expressed in multiple white papers has been that widely used spreadsheets can't handle Big Data and advanced analytics. Companies need to move to new tools—specifically those that the vendors with the white papers offer. Implicitly, the benefits outweigh the expense and steep learning curve. But what if you can work with Big Data in a program like Excel? It turns out that you actually can, using the classic tool of statistically representative sampling. A blog post from Frontline Systems shows how you can do this with XLMiner, also known as Analytic Solver Data Mining. The project in question studies the airline data sets used in an online tutorial by HortonWorks, one of the best-known Big Data firms. This 120 million record data set covers all commercial flights within the United States from October 1987 to April 2008 – 29 commercial airlines and 3,376 airports, including 3.2 million cancelled flights and 25 million flights at least 15 minutes late.

DATA BY THE MILLION
The post shows how you can quickly summarize results across 120 million records using point and click—instead of writing lambda function code in Python—by performing an aggregation query against the data in Frontline's Apache Spark Big Data cluster on Amazon Web Services. The authors were able to obtain the average delays for 341 airports, aggregated over the 22-year period, as an Excel data table. Result? Farmington, N.M., seems to have the longest delays. That data table was used to create a visualization, using Microsoft Power Map.

Where the HortonWorks tutorial simplified matters by restricting the data to flights originating from Chicago's O'Hare airport, Frontline's analysis covers all 3,376 airports using a simple menu selection: Get Data – Big Data – Sample. This allows you to draw a statistically representative random sample of about 100,000 records from the Apache Spark cluster. Like HortonWorks, the post partitions the data into a training set from 2007 flights, and a validation set from 2008 flights.

Flight delay times, by airport, visualized with Microsoft’s Power Map for Excel.
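For readers who prefer code to menus, the two steps described above (an aggregation across all records, then a manageable random sample) look roughly like this in PySpark. The file path is a placeholder, and the column names Origin and DepDelay follow the public airline on-time data set but should be treated as illustrative:

# Rough PySpark equivalent of the point-and-click steps in the article:
# average delay by airport across all rows, then a ~100,000-record sample.
# The path and column names are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("airline_delays").getOrCreate()
flights = spark.read.csv("s3://your-bucket/airline/*.csv", header=True, inferSchema=True)

# Average departure delay per origin airport, computed on the cluster.
avg_delay = (flights.groupBy("Origin")
                    .agg(F.avg("DepDelay").alias("avg_dep_delay"))
                    .orderBy(F.desc("avg_dep_delay")))
avg_delay.show(10)

# Statistically representative random sample of roughly 100,000 records.
fraction = 100_000 / flights.count()
sample = flights.sample(withReplacement=False, fraction=fraction, seed=42)
sample_pdf = sample.toPandas()   # small enough to analyze locally, e.g. alongside Excel

The aggregation runs on the cluster; only the per-airport averages and the small sample come back to the desktop, which is what makes the Excel-side analysis practical.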


Travelers and airlines incur great costs when flights are delayed or cancelled. (Photo Courtesy of 123rf.com | © dolgachov)

LOGISTIC REGRESSION
Applying logistic regression over 100,000 records to obtain a binary classifier, using data about each flight to predict whether or not it was delayed, takes a fraction of a second in XLMiner. Comparing results with Iteration 1 of the HortonWorks study, the XLMiner model has essentially equivalent Recall (ratio of true positives, 0.64) and Accuracy (59 percent), even though it used less than 0.1 percent of the total data in the data set.

The post goes further, using Feature Selection to ask, "Are all the variables in the airline data set really important? Which ones provide useful information about the possible delay of a flight?" A quick visualization shows that the scheduled departure and scheduled arrival times have the strongest correspondence with departure delays, according to the Welch test. This leads to a model that confirms frequent airline travelers' anecdotal experience: time of day really matters. An 8-variable logistic regression model has Recall=0.64 and Accuracy=59 percent, but a simple 1-variable model using time of day has Recall=0.63 and Accuracy=59 percent.
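The same sampled-data experiment is easy to approximate with open-source tools. A minimal sketch with scikit-learn, assuming the roughly 100,000 sampled records are already in a pandas DataFrame with a 0/1 "delayed" label (the column names here are placeholders, not the data set's actual variable list):

# Minimal sketch: fit a binary delay classifier on the sampled records and
# report Recall and Accuracy, as the article does. Column names are placeholders;
# train on 2007 flights, validate on 2008 flights.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score

sample = pd.read_csv("flights_sample.csv")          # ~100,000 sampled records
features = ["sched_dep_time", "sched_arr_time", "distance", "day_of_week"]

train = sample[sample["year"] == 2007]
valid = sample[sample["year"] == 2008]

model = LogisticRegression(max_iter=1000)
model.fit(train[features], train["delayed"])

pred = model.predict(valid[features])
print("Recall:  ", round(recall_score(valid["delayed"], pred), 2))
print("Accuracy:", round(accuracy_score(valid["delayed"], pred), 2))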

While there are obviously some drawbacks to pushing Excel or any other program beyond its design parameters, exploring those parameters with care can show how to use common systems in uncommon ways. Don't think you can't handle Big Data until you try. Si


COVER FEATURE

ANALYTICS IN THE CLOUD

BY DAN FYLSTRA


"Analytics in the Cloud" is a hot topic these days in news articles, blog posts, and vendor webinars. But what does this mean for you? What exactly is "the Cloud," anyway? What are its advantages and drawbacks for analytic modelers?

WHAT IS "THE CLOUD"?
"The Cloud" refers to computing services that you can use remotely over the internet, usually – but not always – through a web browser. The term "cloud" in computer networking dates back to at least the mid-1990s, but was popularized in 2006 when Amazon introduced its "Elastic Compute Cloud" service. Inside "the cloud," there's a vast array of computers, memory storage, communication channels and more – but you typically don't have to deal with any of that complexity to use a cloud-based service.

When you run analytics software "on premise," you are using your own, or your company's, computer hardware, network lines, electricity, physical space, and technicians who must maintain that equipment. When you run analytics software "in the cloud," all those elements are provided as part of the service – though your keyboard input, display output, and some computing still happens on your own computer.

You are most likely to use specific software applications that are "hosted in the cloud" – that's called SaaS, or "Software as a Service." In a 2017 survey from Okta, the top three general-purpose SaaS offerings were Office 365, Salesforce.com, and Box.com – we'll discuss analytic SaaS offerings later. These applications run on "virtual servers" (more on that later) in a public cloud service – referred to as "PaaS" (Platform as a Service) or "IaaS" (Infrastructure as a Service) offerings. In a 2017 survey from Synergy Research, Amazon Web Services was the market leader, followed by Microsoft Azure, Google Cloud Platform, and IBM Cloud.

"The Cloud" includes Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS) offerings.

CLOUD BENEFITS AND DRAWBACKS
Cloud computing is having a huge impact. For most companies, it's a lot of trouble to purchase, install, and maintain computer equipment – they would rather focus on their own business. Because cloud computing services are effectively "rented" by the hour, day, month, or year, what was once a "capital investment" is now an "operating expense" for many firms. The major public cloud providers reap economies of scale by operating millions of server computers, and competition means that many of these economies are "passed through" to cloud users.

The Microsoft Azure data center in Chicago has 162 “cargo containers,” each one housing up to 2,500 physical servers. Each server can run eight or more virtual servers or cloud services.



Information security – once thought of as a major drawback of cloud computing, since it relies heavily on the internet – is now emerging as a benefit, when public cloud services are realistically compared to on-premise information systems. In eight major security breaches in 2013-2015, suffered by leading firms such as Target, Home Depot, JP Morgan, and eBay as well as the federal government, every breach occurred in on-premise data centers – not one in a public cloud. Even the CIA has been using a private version of the Amazon cloud for the last three years.

This doesn't mean you can ever take security for granted! The public cloud services provide leading information security features, but you and your application provider have to use them. Using easily-guessed passwords, or the same login and password on multiple systems, makes you the weak link in information security. Paying attention to the "padlock" icon in your browser, using https: (SSL or Secure Sockets Layer encryption) instead of http: (no encryption), not opening "phishing" emails, and keeping your own antivirus software up to date are other, everyday measures you need to take.

Availability is another possible drawback – or benefit – of cloud computing. When internet access is "down," or if your cloud service provider needs to perform maintenance, you cannot use a public cloud service. But this must be compared to "downtime" on your own equipment, and availability when you are traveling. All in all, cloud computing is a "better idea," and its impact continues to grow.

HOW DOES IT WORK?
Cloud computing became feasible because of two key technologies – high-speed data communications and the internet, and virtualization of computing hardware – plus industry standards. People over 50 can probably remember terminals and modems that operated at "1200 baud," about 120 characters per second; today's data flows 1,000 to 10,000 times faster, even for consumers. IT professionals can remember when a physical computer could run only one operating system, and one or two applications at a time; today, a physical computer, which occupies less than an inch vertically in a "rack," can run eight to 16 "virtual computers," each with its own operating system and applications.

Virtualization means that your analytic problem – say data mining, optimization, or simulation – can be run in a remote data center, using only a "slice" of the CPU time and memory on a physical computer – and high-speed communications means that results can be brought back to you in seconds. But your own desktop, laptop, or tablet's processor and memory still play an important role in using a cloud application. Your web browser is a sophisticated piece of software that interprets HTML (Hypertext Markup Language), CSS (Cascading Style Sheets), and JS (JavaScript programming code) – cloud applications rely heavily on its capabilities. Many people are running old browser versions, and even more are running 32-bit versions of Chrome, Firefox, or Internet Explorer. Your cloud application can run out of memory in the browser, or run slowly because of it. Upgrading is free, and it pays to make sure you have the latest 64-bit version of your web browser.

WHERE IS THE DATA?
Applications of all kinds – but especially analytics applications – make use of models and data, and both must be accessible to the analytics software, wherever it is running – on your own computer or server, or in the cloud. If your model and data are on your own desktop PC, you can either run the analytics software on that same PC, or transfer them (for example via file upload) to a cloud-based application. High-speed data communications has made this an easy, everyday experience for many users – who hasn't used Office 365, Google Docs, Box.com, or Dropbox.com? Excel workbooks and CSV (Comma Separated Value) files are very common ways to store modest-size data sets. And increasingly, large databases and other data sources are maintained online ("in the cloud"), so it's as easy – or easier – to access such data in a cloud application as in a desktop application. This is true even of "Big Data" – data sets too large to fit in a single traditional database. For example, Frontline Systems, sponsor of this magazine, operates an Apache Spark Big Data cluster on Amazon Web Services, and currently offers free use of this cluster to universities teaching analytics using Frontline's other tools.
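To make the point concrete: with a typical data-analysis library, the code that reads a modest-size CSV file is identical whether the file sits on your own disk or behind an https URL; only the path changes. Both paths below are placeholders.

import pandas as pd

# The same call works for a local file or for one hosted "in the cloud";
# only the path string changes. Both paths are illustrative placeholders.
local_df = pd.read_csv("C:/data/flights_sample.csv")
cloud_df = pd.read_csv("https://example.com/data/flights_sample.csv")

print(local_df.shape, cloud_df.shape)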

BUSINESS INTELLIGENCE IN THE CLOUD
Analytics models often use data found in BI or "business intelligence" systems, or in "data warehouses." These systems collect, and usually summarize, data drawn from day-to-day "transactional" database systems. They may offer a relational, tabular, or multidimensional "view" of the data. In the BI world, it is common to speak of "analytics," but this usually means "slicing and dicing" and "drilling down into" data – the analysis is usually limited to "sum and group by." To clarify this, Gartner has begun referring to mathematical methods as "advanced analytics."

BI systems were traditionally run on on-premise servers, but increasingly they are moving to the cloud. Amazon Web Services offers a highly scalable relational database called Amazon Redshift, and recently introduced a BI service called Amazon QuickSight. Microsoft offers Power BI, a cloud-based service that works with desktop Excel and Power BI Designer. Tableau offers a cloud service called Tableau Online that complements its Tableau Server and Tableau Desktop products. All of these tools can "slice and dice" and "drill down into" data, and create sophisticated visualizations and dashboards.

ADVANCED ANALYTICS IN THE CLOUD
With cloud computing services, data increasingly hosted in the cloud, and BI and data visualization tools available in the cloud, it makes sense that analytics tools – for forecasting, data mining, simulation and risk analysis, decision analysis, and mathematical optimization – should move to the cloud as well. Partly because they are more compute-intensive than BI or general office applications, advanced analytics tools have remained "on-premise" for somewhat longer. But the shift is clearly underway.

Microsoft offers a popular cloud-based service called Azure ML for data mining and machine learning. IBM's offerings include "BigInsights on Cloud" and "IBM Analytics for Apache Spark." SAS Institute offers SAS Cloud Analytics, and FICO Inc. offers FICO Analytic Cloud to its large customers.

Frontline's AnalyticSolver.com offers forecasting and data mining, simulation and risk analysis, and mathematical optimization tools that also work with desktop Excel. (Photo Courtesy of Frontline Systems)


Frontline Systems, sponsor of this magazine, operates AnalyticSolver.com, a cloud-based SaaS offering that is hosted on Microsoft Azure. It includes tools for forecasting, data mining and text mining; Monte Carlo simulation and risk analysis; decision tree analysis; and conventional and stochastic optimization.

DEVELOPING CLOUD-BASED APPLICATIONS
You might be wondering: How are cloud-based applications developed? What about mobile apps? Can an application developed for a desktop or laptop run in the cloud, or on a mobile device? A complete answer would require a full-length article, but in brief: Developers need new skills to create cloud and mobile apps – but those skills can be learned, and modern developer tools ease the way.

Where a desktop or server application typically uses callable libraries through an API (Application Programming Interface), a cloud-based app typically uses services through a REST (Representational State Transfer) API. (Older apps use a SOAP – Simple Object Access Protocol – API.) Cloud-based APIs for analytics are relatively new, but Microsoft and others offer REST APIs for machine learning, and Frontline Systems' Rason.com service is a developer portal offering data mining, simulation and optimization REST APIs.
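As a rough illustration of the REST pattern (this is not the documented RASON or Azure ML interface; the endpoint, payload, and field names below are hypothetical), a client posts a model over HTTPS and reads the results back as JSON:

import requests

# Hypothetical analytics-service endpoint and API key -- illustrative only,
# not the documented interface of any particular vendor.
ENDPOINT = "https://analytics.example.com/v1/optimize"
API_KEY = "your-api-key-here"

payload = {
    "model": "maximize 3x + 2y; x + y <= 10; x >= 0; y >= 0",
    "format": "text",
}

# A REST API is just HTTP: POST the model, get JSON back.
response = requests.post(
    ENDPOINT,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
response.raise_for_status()

result = response.json()
print(result)   # e.g. {"status": "optimal", "objective": 30.0, ...}

Because the exchange is plain HTTPS and JSON, the same kind of call works from a web page, a mobile app, or a server-side script.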

WHAT ABOUT DESKTOP ANALYTIC SOFTWARE?
The advent of cloud-based analytics doesn't mean that desktop-based analytics is going away – after all, modern desktops and laptops are more powerful and capable than ever, and you certainly can use them to create and solve analytic models. But you don't have to choose – you can use the best of both.

Chances are good – especially if you work in a large company – that you're already using both desktop Microsoft Office and cloud-based Office 365. If so, you've already seen that it's easy to work on the same Excel workbooks, Word documents, or PowerPoint presentations in both environments. If you use Google Sheets, you know that you can upload and download Excel workbooks from/to the desktop. Tableau Desktop and Tableau Online, and Power BI Designer and PowerBI.com, also inter-operate easily. Frontline Systems' AnalyticSolver.com cloud service is designed to work easily with its desktop Analytic Solver software for Microsoft Excel – the same forecasting and data mining, simulation and risk analysis, and optimization models work in both versions.

WHAT'S NEXT?
Analytics in the cloud isn't just coming, it's here. If you're reading this magazine, there's a decent chance that you've heard about or tried Solver, Risk Solver, or XLMiner Analysis ToolPak for Excel Online or Google Sheets, XLMiner.com, or AnalyticSolver.com – more than 300,000 users had tried them when this article was written, and the total is rising every day. You can count on cloud-based analytic software getting more and more capable, and desktop software doing the same. So by all means, use them! If you haven't already, now's the time to add cloud-based analytics to your arsenal of tools. Si



Welcome to Analytic Solver ® Cloud-based Optimization Modeling that Integrates with Excel

Everything in Predictive and Prescriptive Analytics Everywhere You Want, from Concept to Deployment. The Analytic Solver® suite makes the world’s best optimization software available in your web browser (cloud-based software as a service), and in Microsoft Excel. And you can easily create models in our RASON® language for your server, web and mobile apps.

Linear Programming to Stochastic Optimization. It's all point-and-click: Fast, large-scale linear, quadratic and mixed-integer programming, conic, nonlinear, non-smooth and global optimization. Easily incorporate uncertainty and solve with simulation optimization, stochastic programming, and robust optimization.

Monte Carlo Simulation, Data and Text Mining. Analytic Solver is also a full-power, point-and-click tool for Monte Carlo simulation and decision analysis, with a Distribution Wizard, 50 distributions, 50 statistics and risk measures, and a wide array of charts and graphs.

And it's a full-power, point-and-click tool for forecasting, data mining and text mining, from time series methods to classification and regression trees, neural networks and association rules – complete with access to SQL databases, Apache Spark Big Data clusters, and text document folders.

Find Out More, Start Your Free Trial Now. In your browser, in Excel, or in Visual Studio, Analytic Solver comes with everything you need: Wizards, Help, User Guides, 90 examples, even online training courses. Visit www.solver.com to learn more or ask questions, and visit analyticsolver.com to register and start a free trial – in the cloud, on your desktop, or both!

Tel 775 831 0300 • Fax 775 831 0314 • info@solver.com


Society News

INFORMS

Polly Mitchell-Guthrie

Thinking aCAP
Analytics students: Start your career on the road to success with aCAP certification

With an increasing number of industries relying on analytics to better understand and harness the power of their data, the need for talented analytics professionals is increasing rapidly. By next year alone, the United States will be facing a shortage of nearly 190,000 data scientists. At the same time, a growing number of universities across the country are launching or expanding their analytics programs, and more and more students are preparing to enter this growing field.

For emerging analytics professionals who lack the real-world experience necessary to distinguish themselves from their peers in an increasingly competitive job market, the Associate Certified Analytics Professional (aCAP™) program serves employers as a trusted, independent verification of an individual's analytics knowledge and abilities. In addition, as the first step towards the full Certified Analytics Professional (CAP®) certification, it demonstrates an early dedication to staying abreast of current best practices through continued learning opportunities.

With more than 200 companies in 20 countries employing professionals with a CAP certification, the program is a world-renowned certification, recognized by organizations both large and small. Currently, 20 percent of Fortune 100 companies employ at least one analytics professional with a CAP certification.

Managed by INFORMS, the leading international association for operations research and analytics professionals, the CAP program was launched following a job task analysis conducted by a group of subject matter experts selected from around the world. Built upon seven unique analytics areas of practice, the program combines education, skill and experience with an analytics code of ethics and an in-depth exam that ensures anyone with a CAP certification represents the highest standards in the analytics field.


The aCAP certification serves as a bridge for young professionals who are transitioning from their university experience to successful employment opportunities, where they will accumulate the necessary experience for a full CAP. Because the aCAP adheres to most of the same rigorous requirements as the CAP certification (minus the additional years of related work experience and the demonstration of soft skills that comes with experience), potential employers can be assured that anyone with the aCAP certification will be a valuable addition to their organization. For these young professionals, the requirements of the aCAP program will ensure continued growth and education on the most up-to-date best practices in the analytics field. Si

Polly Mitchell-Guthrie is Director of Analytical Consulting Services at the University of North Carolina Health Care System and past chair of the Analytics Certification Board overseeing the CAP and aCAP programs.



ASSOCIATE CERTIFIED ANALYTICS PROFESSIONAL Analyze How aCAP Can Help Launch Your Analytics Career

The Associate Certified Analytics Professional (aCAP) is an entry-level analytics professional who is educated in the analytics process but may not have practice experience. This prestigious program begins a career pathway leading to the CAP designation.

www.certifiedanalytics.org


Terms of the Trade
Glossary

Ambari
A web interface for managing Hadoop services and components

Apache Kafka
A distributed streaming platform for building real-time data pipelines and streaming apps

Apache Spark
Open-source cluster computing framework with highly performant in-memory analytics and a growing number of related projects

Cassandra
A distributed database system

Cubes
A cube is a set of related measures and dimensions that is used to analyze data. A measure is a transactional value or measurement that a user may want to aggregate; measures are sourced from columns in one or more source tables, and are grouped into measure groups. A dimension is a group of attributes that represent an area of interest related to the measures in the cube, and which are used to analyze those measures. The attributes within each dimension can be organized into hierarchies to provide paths for analysis.

Flume
Software for streaming data into HDFS

Google BigQuery
BigQuery is Google's fully managed, petabyte-scale, enterprise data warehouse for analytics. BigQuery is serverless; there is no infrastructure to manage and no database administrator needed.

Hadoop
The Apache Hadoop software library is a framework that allows the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

Hadoop Distributed File System (HDFS)
The scalable system that stores data across multiple machines without prior organization

HBase
A non-relational, distributed database that runs on top of Hadoop

HCatalog
A table and storage management layer

Hive
A data warehousing and SQL-like query language

MapReduce
A parallel processing software framework that takes inputs, partitions them into smaller problems, and distributes them to worker nodes

ODBC
ODBC stands for Open Database Connectivity, a connection method to data sources

Oozie
A Hadoop job scheduler

Pig
A platform for manipulating data stored in HDFS

Python
Python is a high-level programming language for general-purpose programming. Python emphasizes code readability and a syntax which allows programmers to express concepts in fewer lines of code than might be used in languages such as C++ or Java.

R
R is a language and environment for statistical computing and graphics. It is a GNU project similar to the S language and environment. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.

Solr
A scalable search tool

Sqoop
Moves data between Hadoop and relational databases

Welch's Test
Welch's Test for Unequal Variances (also called Welch's t-test, Welch's adjusted T, or the unequal variances t-test) is used to see if two sample means are significantly different. The null hypothesis for the test is that the means are equal; the alternate hypothesis is that the means are not equal.

YARN
YARN (Yet Another Resource Negotiator) provides resource management for the processes running on Hadoop

Zookeeper
An application that coordinates distributed processing


SUBSCRIBE to receive our Monthly e-News and bi-monthly Digital Publication!

solver International
Solver-International.com

