SPECIAL EDITION
rediscoveringBI
Radiant Advisors Publication
BI DECISION MAKERS: THEIR SECRET LIVES
Stephen Swoyer
DATA ANALYSTS: THE NEXT C-LEVEL LEADERS?
Ted Cuzzillo
WHO IS THE DATA SCIENTIST?
Lindy Ryan
BI LEADERSHIP RECHARGED
01 JAN 2014 ISSUE 11
Today's BI environment is about imagining innovative ways to lead BI
FEATURES
04 The Secret Lives of BI Decision-Makers – Stephen Swoyer
07 Who Is the Data Scientist? – Lindy Ryan
21 Leadership Roundtable
32 Can Data Analysts Be C-Level Leaders? – Ted Cuzzillo

SPONSORS
10 From Rearview Mirror to Predictive Analytics – Ashish Gupta
13 Are You Thinking About Hadoop All Wrong? – Stefan Groschupf
17 Let's Not Screw This Up – Michael Whitehead
25 Predict Everything – Simon Arkell
29 Win with Data Virtualization – Bob Eve
FROM THE EDITOR

Today's business intelligence environment is about imagining innovative ways to approach – to lead – BI.

From the emerging technologies in the marketplace, to the thought leadership of industry experts and the fresh perspectives of the in-the-trenches BI community, this year's Special Edition of rediscoveringBI focuses on BI leadership: how leaders in our industry are pioneering the next era of business intelligence. Articles in this issue are bylined by well-known industry writers and leaders within the vendor community, and range from data science and discovery, to the personality of the data scientist, to a peek into the secret lives of BI decision-makers. We've also gathered our sponsors into a leadership roundtable, where each answers one very important question: what is the most important factor in BI leadership in 2014, and how are they leading that change?

At this time of year, it's great to reflect on what you've accomplished in the previous year, and to give thanks to everyone you've had the opportunity to work alongside, collaborate with, and grow with. We've seen a tremendous response to rediscoveringBI this year, and we are sincerely grateful to our contributors and readers for helping make rediscoveringBI an industry-leading publication.

I would also like to take a moment to recognize and thank each of this year's Special Edition sponsors for their support and contributions to the issue: Cisco, Actian, WhereScape, Predixion Software, and Datameer.

On behalf of myself, our writers, and our design team, thank you!

Lindy Ryan
Editor in Chief, rediscoveringBI
Research Director, Data Discovery & Visualization, Radiant Advisors
lindy.ryan@radiantadvisors.com

Contributors:
Stephen Swoyer – stephen.swoyer@gmail.com
Ted Cuzzillo – ted@datadoodle.com
Bob Eve, Cisco – boeve@cisco.com
Ashish Gupta, Actian – Ashish.gupta@actian.com
Stefan Groschupf, Datameer – sgroschupf@datameer.com
Michael Whitehead, WhereScape – mikew@wherescape.com
Simon Arkell, Predixion Software – sarkell@predixionsoftware.com

Art Director: Brendan Ferguson, Radiant Advisors – brendan.ferguson@radiantadvisors.com

For More Information: info@radiantadvisors.com
THE SECRET LIVES OF BI DECISION-MAKERS
Stephen Swoyer | Distinguished Writer, Radiant Advisors
WHEN WE THINK about leadership, we can't help but project ourselves into a Walter Mitty-like role. In case this doesn't ring any bells, it's an allusion to "The Secret Life of Walter Mitty," an oft-anthologized story penned by James Thurber in 1939. Thurber's tale has its titular character daydreaming his way through life, chiefly because (from Walter Mitty's perspective) his daydream-scape – wherein he assumes the identity of an Air Force pilot, among other characters – seems preferable to the routine of his workaday existence.
The parallel to leadership has precisely to do with the problem of routine: i.e., the boring. Sample a dozen books on "leadership" and you'll read very little about boredom. Instead, you'll be captivated or intimidated (and sometimes simply inundated) by tales of challenges met, competitors bested, orthodoxies or conventional wisdoms outstripped, difficult decisions made, and so on. Books on leadership are (almost by definition) exercises in Mittyesque storytelling: nobody wants to read (much less to publish) a boring book on leadership, after all.

Thus the problem: leadership in business is preoccupied with the Cult of the Challenge; would-be leaders are encouraged to welcome (or to "seek out") new challenges and to take their measure – i.e., as leaders, as decision-makers – by "mastering" them. In this regard, the metric of the difficult decision point, or DDP, assumes a kind of critical importance. The more DDPs one racks up, the bolder, the more decisive, the more imaginative one's leadership!

But what if the most important component of leadership is the routine? What if a definition of leadership that gives disproportionate weight to the anomalous – e.g., to critical "challenges" or to existential "decision points" – is actually unhelpful? What if – like the daydreaming
of Walter Mitty – the Cult of the Challenge is a way for us to distract ourselves from the dull, boring, and uninspiring, but (for all that) inescapably critical, aspects of business management?

This last question has especial salience with respect to business intelligence (BI) leadership, especially in the context of a BI industry besotted by big data, advanced analytics, and other breaking technology trends. The urgency attending big data, especially, all but demands a DDP-like response. If you don't have a big data strategy, fuhgeddaboudit: you risk consigning your company to obsolescence – or (worse still) to the fate of a footnote buried deep in the back pages of an SEC filing. ("On May 30th, 2017, we acquired the assets of the former Acme Anvils Inc., a defunct provider of steel and iron props for the film and cartoon industries. Acme went into receivership because it missed the big data and CGI waves.")
Threat Inflation, Big Data-Style

Earlier this year, Michael Whitehead, CEO of agile data integration (DI) specialist WhereScape Inc., offered an exasperated assessment of big data hype. "We haven't solved the basic problems yet," he told me, citing the long-standing issues – e.g., poor usability, limited uptake, and notorious inflexibility – that plague BI tools and BI programs. "What's happening at the edge is all very exciting, but at the core it still takes far too long to build a basic data warehouse that is changeable and just works," he argued. Whitehead conceded that he saw promise in big data – but qualified his concession with a caustic to-be-sure. "[Big data is] an excuse to ignore those [historical] problems with BI [tools and programs] and focus on something else. Something new. While we're doing that, we can tell ourselves that this time it will be different. Big data [technologies] will fix these problems."

What's encouraging is that BI decision-makers are staying the course. In spite of a barrage of big data-themed marketing, CIO action items for 2013 look... superficially similar to those of 2007. Such was the counter-intuitive finding of this year's edition of "The Successful BI Survey," which is published annually by BIScorecard, a business intelligence and information management consultancy. "Big data is ... way down [on] the list [of priorities] for most [respondents]," Cindi Howson, a principal with BIScorecard.com, told me last October. Instead, she said, BI decision-makers are focusing on dashboards and self-service BI, as well as on mobile BI efforts. In addition, Howson and BIScorecard flagged an ongoing focus on integrating data sources and eliminating silos. (This last is a seemingly never-ending effort, thanks to the irrepressibility of spreadmarts and rogue information sources, to say nothing of the inevitability of mergers, acquisitions, and consolidations. In other words: the inescapable conditions of both business and business processes work to contest it.)

In short, BI decision-makers are (at least with respect to the terms outlined above) acting leader-y. Holding fast, standing firm, digging in – pick the metaphor of your choice – in the face of a full-on, big data-style bombardment. Of course, the dashboards they're building (or which they plan to build) aren't anything
like the dashboards of 2007 – they're altogether more interactive, use more helpful (or intuitive: i.e., "disclosive") visualizations, and likewise incorporate more sophisticated analytics (fed by or running against a more diverse complement of data sources) than did their predecessors. Ditto for their self-service BI efforts: self-service isn't anything new, but – with big BI vendors such as Information Builders Inc. (IBI), IBM Corp., Microsoft Corp., MicroStrategy Inc., Oracle Corp., SAP BusinessObjects, and SAS Institute Inc. now shipping self-service BI discovery tools – it's starting to become a lot more compelling. So, too, is mobile BI: QlikTech Inc., for example, started employing a mobile-first development philosophy last year. Other vendors have followed.

There's legitimate reason for optimism in these developments. There are also grounds for a sobering reality check. Even though BI vendors are delivering improved (i.e., substantively usable) tools, and even though decision-makers not only haven't succumbed to big data hysteria but (to their collective credit) seem committed to getting their BI houses in order, a critical metric of BI success – viz., adoption – continues to lag. More to the point, BI adoption hasn't changed. Not one iota. There simply hasn't been any improvement...

Continued on P35
WHO IS THE DATA SCIENTIST?
Lindy Ryan | Research Director, Data Discovery & Visualization, Radiant Advisors
DATA SCIENTISTS have been called the sexiest new job in business intelligence (BI). They are the stuff of dozens of articles, infographics and, on occasion, the beneficiary of a zinger or two about ponytails and hoodies. In a dueling keynote at SPARK! Austin, Dr. Robin Bloor put on a pair of thick-rimmed glasses and morphed into a data scientist in front of my eyes. At Teradata PARTNERS in October, Neil Raden called them new versions of old quants. They've been firmly denied any of Jill Dyche's affections according to her blog post Why I Won't Sleep with a Data Scientist. And, my personal favorite: industry writer Stephen Swoyer called them unicorns – fantastical, silver-blooded beasts that are impossible to catch.
The list of these emerging data scientist urban legends could go on. My point is this: everyone who's anyone has something to say about a "data scientist." But for all their allegorical appeal, what – or who – is a data scientist? I talked to three, and a few other folks in the industry, and here is what I've found.

The Skillset of Data Scientist, Ph.D.

One often-disputed characteristic of the data scientist is their educational background and skillset, and there are some essentialities to being a good data scientist. They must be of an analytical and exploratory mindset; they must have a good understanding of how to do research; they should possess statistical skills and be comfortable handling diverse data; they should be clear, effective communicators with the ability to interact across multiple business levels; and, finally, they must have a thorough understanding of their business context. But do they need a Ph.D. – or, to put it another way, should our data unicorns be limited to Data Scientist, Ph.D.?

Dr. Alexander Borek of IBM – who has a Ph.D. in Engineering and Data Management himself – says that a Ph.D. student with statistical skills is typically a good candidate for a data scientist role – so long as he or she is willing to engage with the business context. Further, Boulder-based data scientist Dr. Nathan Halko – whose Ph.D. is in Applied Mathematics – says that math teaches you the ability to abstract problems and dive into data without fear. And, while a business background is important, it doesn't give a data scientist the skillset to execute a data problem: Nathan says that some of his contributions within his organization have been ideas he's hacked together over the weekend because he has the ability to execute – turning those ideas into solutions that can be delivered to others. Perhaps data people simply can more easily understand what the business needs than a businessperson can understand what the data is capable of?

Either way, competency in mathematics and statistics is unanimously important for a data scientist – perhaps more important than a business background. Yet, there's also a common sentiment that it's (typically) easier to interface with someone who has a business background, as opposed to a data scientist. And that's part of a business-education skillset: clear, effective communication delivered in a simple format that
business executives expect – and that lacks mysterious data jargon. Equally as important for the successful data scientist is the ability to translate and engage between both business and IT, and to have a firm understanding of the business context in which they operate.

Why They Are/Aren't Unicorns

The two competencies are complementary, even if they are imbalanced. But there's a third skillset of these elusive data scientists that's a little more... intangible. At this month's Big Analytics Roadshow in New York, a data science panel – comprised of data executives from Teradata, Comcast, Tableau, and Radiant Advisors – had much to say about the perfect data scientist resume (though it was noted that the days of typing up a resume are "so twentieth century"). They also had quite a bit to say about what's not on the resume.

Teradata's SVP of Global Product Deployment and Strategy Tasso Argyros said that you don't need a degree that says Data Science to be a data scientist any more than you need an MBA – you need a foundational understanding of these concepts, yes, but, more important, you need an eagerness to explore and discover within data. The characteristic that really seems to set the data scientist unicorn apart from the data user herd is their personality. A true data scientist possesses what John O'Brien of Radiant Advisors called a "suite of hidden skills," including things like innovative thinking, the readiness to take risks and play with data, and a thirst to explore the unknown – and he looks to see how these skills are embedded within the blend of education and experience.

Even in their own self-descriptions, data scientists do seem to echo those same characteristics. Siva Yannapu of Blue Cross Blue Shield noted that data scientists are out-of-the-box thinkers; Nathan Halko described the data scientist as willing to have their hand – or hoof – in the metaphorical cookie jar (of data). Being a data scientist isn't about checking off a list of qualifications and adding buzzwords to your resume. It's about becoming a data scientist – having the eagerness and hunger to dig deep inside data and find value. That's aspiration, and it's an intrinsic characteristic not taught in any program.

Sexy By Association

As far as I can tell, my little herd of unicorns doesn't find themselves all that sexy – or rare, even. One said it's all the new data that's really sexy, not the guy in the glasses tinkering with it, thereby making the data scientist merely Sexy By Association. What they do think is that the role of a data scientist is a very interesting one that is intellectually challenging and can have a huge impact on the success of the business. Data scientists, according to Alex, are generating a quick and solid return on investment for most businesses, and they will soon become a solid component of any larger business organization. Hence, it is a good future to bet on.

Sure, it's sexy and maybe a little (or a lot) elusive, but we've got to start thinking of data science more broadly – not as a particular set of technologies or skills, but as those people who have a set of characteristics: curiosity, intelligence, and the ability to communicate insights and assumptions. Those naturally inquisitive people already living and breathing the data within our businesses are every bit as much data scientists, even if they don't have the fancy title. One thing is for certain: data scientists come in as many colors as the rainbows that their fantastical counterparts dance upon. But data scientists, no matter how sexy or rare they are, aren't the only source of discovery within the organization, especially with the rapid increase of self-sufficient discovery tools allowing everyday business users to explore their own data. If you define the data scientist community as a set of skills, you're missing out on a ton of people that already exist in your organization – and that can contribute a ton of value, too.

A Special Thank You to Dr. Nathan Halko of SpotRight, Dr. Alexander Borek of IBM, and Siva Yannapu of Blue Cross Blue Shield for contributing your thoughts and insights. You guys are unicorns in my book (except Nathan, who thinks unicorns have wings and is therefore a pegasus).
FROM REARVIEW MIRROR TO PREDICTIVE ANALYTICS
Ashish Gupta | CMO and SVP Business Development, Actian
The Age of Data Is Upon Us!

The state of data as we have known it is forever changed. Never again will data be static – it is born digitally and flowing constantly. The data tap has been turned on and it will never be turned off. It's likely you've already been affected by some of the key hallmarks of this new era:

• Utilization of massive amounts of data providing context and insights in real time
• Data born in and residing on the cloud, leveraging amazing elasticity across the value chain
• Social-informed decision-making that utilizes information from a variety of sources at record speed

In the Age of Data, the old technology and tired approach to business analytics and intelligence we've depended on (and paid handsomely for) for years won't cut it. Traditional stack players are at a significant disadvantage as they are stuck in the "Innovator's Dilemma." Times of market upheaval introduce a new era with a new set of winners. Those that don't or can't change are left behind. Backward-looking BI is not adequate for the new era. Forward-looking analytics demand what Radiant Advisors calls the "modern data architecture" to scale and deliver the flexibility that enables customers to unlock the business value of their data.
The Age of Data is a time when the leaders of the past show fading likelihood of a viable future while the leaders of tomorrow gain a solid foothold. The interregnum between old and new is both dynamic and challenging, a modern-day Wild West. As the dust settles, the winners in this new era will become very apparent. How will these winners beat the competition? Certainly not by looking to the past. They'll win by leveraging data analytics like a crystal ball to anticipate future developments and take action for transformative competitive advantage or risk avoidance.

Analytics and BI are fast evolving from the domain of high science and deep technologists to mainstream adoption everywhere you turn. The data is at hand, and now businesses for the first time have the opportunity for line-of-business users to:

• Discover what can't be seen with human observation to act on a short-lived opportunity, or avoid a fast-emerging threat
• Predict what will happen rather than peering into a rear-view mirror
• Prescribe the optimal action, promotion, or offer for an organization and its customers

Several times in my career, I've had the good fortune of being on the right side of the equation when there is a tectonic shift in the industry caused
by radical changes in customer needs met by new vendors with a more modern solution. In such periods of change, legacy players are ousted from their leader position because they are unable to innovate. When PBX legacy vendors refused to replace their proprietary hardware and software stacks, they quickly found themselves losing to software-based solutions, like Microsoft Lync or cloud offerings. Proprietary hardware-based videoconferencing services were overthrown by newer, more modern offerings. Interestingly, in each case, pure-play software companies are heavily favored in the interregnum. Actian has placed bets on a completely modern set of technology assets all designed for the opportunities presented to us in these dynamic times. Our mission is to deliver Business Analytics Optimization solutions that irreversibly shift the price/performance curve beyond the reach of traditional legacy stack players stuck in the Innovator’s Dilemma. We aim to
democratize analytics and help our customers get a leg up on their competition, retain customers, detect fraud, etc. by helping them leverage their data to take action today based on predictions of the outcomes of tomorrow.

Winning in the Age of Data

The opportunities afforded to us in the Age of Data are nearly limitless. The challenges are also very real. Organizations will face a new set of obstacles, and so will the technology they leverage. Is your current analytics solution capable of helping you manage the following challenges?

• Time is the new gold standard. You can't produce more of it. You can only increase the speed at which things happen.
• Noise makes signals difficult to hear. When everyone says or does the same thing, it's just noise. The truth is that most companies doing analytics all do the same exact things with the same old data. They ask the same questions (and not always the right ones!). They query the same samples of data at the same intervals. They never win using analytics, because they never innovate.
• The only constant is change. The market will change, the business will change, technology will change and people will change. Companies, people and technology that do not ride the wave of change will be crushed by it.

How can organizations both large and small successfully pivot their businesses to evolve beyond a backward-looking dashboard approach to outcompeting their peers with predictive analytics? All organizations should be able to take action on the insights gained from their data, and they should be able to do it while that information still matters.
Actian's approach is aligned with the needs of a marketplace that demands simplicity and compelling price-performance. Our end-to-end platform gives organizations the ability to connect, prepare, optimize, and analyze data on a cloud-ready platform that also runs natively on Hadoop. The real kicker is that organizations can do all of this on commodity hardware at a fraction of the cost legacy vendors could ever imagine.

The Promise of the Age of Data

Data is growing, and it is moving very quickly. And so is the ecosystem that supports data analytics. We've spent the last few years defining the size of big data and the basic frameworks needed to manage it. We've determined that Hadoop, while still in its infancy, is extremely valuable as a "data lake," and is the perfect data source for most BI projects. We've also all agreed that the Hadoop ecosystem of today needs some work before we see more organizations move their big data
initiatives from sandbox to production. We're fast approaching a time when more big data applications are built on a Hadoop platform, customized by industry and business needs. Predictive analytics initiatives will soon move beyond the realm of deep data science and IT, and into the hands of lines of business to prompt and even automate meaningful business action. Soon, we'll see Hadoop interacting with all other platforms, as a data services hub in the midst of a rich ecosystem of data flows. Workloads will gravitate to the most suitable platform and the compute process will occur where the data lives. Gone are the days when we wasted time, money, and compute power moving the data to the process.

Actian's approach to big data analytics is well aligned with the next phase of big data maturity and beyond. Our platform is purpose-built to accelerate analytics and support the most promising opportunities presented by emerging data types and forward-looking analytics at scale. Actian provides an end-to-end, advanced, "next generation" architecture to create and sustain business value. We help organizations combat the challenges of time constraints, innovation imperatives, and constant change in the Age of Data to:
Deliver the Fastest Time to Analytic Value. Actian Accelerates:
• Business response times to real-time
• Cost savings with 200X price performance
• Analytic iteration by 100X or more
• New analytic application delivery to minutes

Drive More Innovation Faster. Actian Helps You Innovate With:
• No boundaries for the business
• No limits on massive amounts of diverse data
• No constraints on analytics or analysts

Adapt to Change Faster. Actian Helps You Adapt:
• Instantly to changes in the business and market
• Rapidly to the massive influx of new data
• Intelligently to the constant mixture of new workloads
• Immediately with rapid analytic deployment

Actian's analytics platform is structured to help customers of all sizes – not just the massive enterprise with the massive budget – turn data into business value. With Actian, customers have gone from seeing business intelligence as an unattainable goal, to an implementable strategy that sits right in the palm of their hands.
ARE YOU THINKING ABOUT HADOOP ALL WRONG?
Stefan Groschupf | CEO, Datameer
WE ARE AT A technological crossroads.

Forty years ago, when databases first came into play, hardware was by far the bigger cost over human capital or time. The traditional 3-tier architecture of first needing to extract and transform data before loading it into a data warehouse, and then putting a business intelligence (BI) tool on top of that, was the best we could do given the limitations of proprietary hardware that was extremely expensive to scale. And, while this approach worked very well – and still does to this day – the fact is that business needs today go above and beyond what traditional databases are capable of doing. Today, human capital and time are the far bigger expense over hardware, and businesses in general have less and less time to make decisions.
Yet, with the exponential increase in data complexity, the time it takes to get data integrated and analyzed in traditional systems is increasing. This leaves businesses with traditional systems stuck with accepting old, incomplete data to inform decisions, or a reliance on gut feelings. Think about this: TDWI says the average change cycle to add a new data source to a data warehouse is 18 months. I don't know a single department that could possibly wait 18 months for an answer. This isn't about teaching an old dog new tricks; it's about letting RDBMSs continue to work on the traditional transaction-based use cases they were built for, but then bringing in new systems that were purpose-built for big data workloads.

Enter Hadoop

Moore's Law is what paved the way for Hadoop, a linearly scalable storage and compute platform that is optimal for data analytic workloads. This brings to the table a schema-on-read approach as opposed to the traditional schema-on-write with ETL. And it's this fact – that ETL is no longer needed – that opens up big data analytics to solving new business use cases that traditional systems simply can't. There's no longer a prohibitive 18-month change cycle.

Let me be clear. Potential cost-savings aside, the most immediate benefit your business can realize from implementing Hadoop with a self-service big data analytics tool like Datameer is a significant time-savings when it comes to integrating data. This, again, is thanks to the fact that Hadoop is linearly scalable
on commodity hardware and does not require a data model to be created before data is stored.

The basic concept is this: use a self-service big data analytics tool, like Datameer, to integrate any data, all data – structured, semi-structured, unstructured, big or small: all of it – in Hadoop, in its raw format. Call it a data lake, a data reservoir, or whatever you will; let it be your central repository for raw data. Once you have all your data integrated – and remember, you can easily add a new data source anytime one crops up – you begin your analysis by simply building "views" or "lenses" on your data with Datameer to find the insights that matter to your business. Think of big data analytics on top of Hadoop as 3D printing. The raw data is your raw material, and just like it doesn't matter what you want to print, it doesn't matter what kind of analysis you want to perform – your data stays raw and is pushed through a template you build in Datameer to unveil the insights you're looking for.

Don't Get Stung By Hive

One thing I want to make very clear is that Hadoop is not "just another data source," and any BI tool that simply connects to Hadoop as a data source is severely limiting Hadoop's potential benefit to your business. In fact, if a BI tool is your only interface to Hadoop, you're leaving a lot on the table and minimizing your ROI from implementing Hadoop in the first place. If you truly have a big data use case, involving structured and unstructured data, you need a tool that is purpose-built for Hadoop.
Traditional BI tools that connect to Hadoop usually do so through Hive, a data warehouse infrastructure built on top of Hadoop that allows for querying and analysis of structured data only. Like structured data stores used in traditional BI, Hive requires tables and schemas that are then queried via a SQL-like language. This approach carries the same limitations as many existing systems, in that the questions that can be explored are limited to those that have data in the Hive schema, rather than the full raw data that can be analyzed with Datameer. Forcing data into a schema with Hive negates the flexibility that Hadoop provides.
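To make the schema-on-write versus schema-on-read contrast concrete, here is a minimal sketch in plain Python rather than Hive, Datameer, or any vendor API; the field names and records are hypothetical. The "warehouse" path rejects anything that doesn't fit a predeclared table, while the "raw" path stores records as-is and applies structure only when a question is asked.

```python
import json
from collections import Counter

# Schema-on-write (warehouse/Hive style): the table structure is declared before
# any data is loaded, so a record with an unexpected field fails at load time.
CUSTOMER_COLUMNS = ("customer_id", "country", "signup_date")

def load_into_table(record: dict) -> dict:
    unexpected = set(record) - set(CUSTOMER_COLUMNS)
    if unexpected:
        raise ValueError(f"schema change required for fields: {unexpected}")
    return {col: record.get(col) for col in CUSTOMER_COLUMNS}

# Schema-on-read (data lake style): raw records are kept as-is (e.g., JSON lines
# on HDFS) and interpreted only when a specific analysis runs.
raw_events = [
    '{"customer_id": 1, "country": "US", "referrer": "adwords"}',
    '{"customer_id": 2, "country": "DE"}',
    '{"customer_id": 3, "country": "US", "referrer": "email", "plan": "pro"}',
]

def signups_by_referrer(raw_lines):
    counts = Counter()
    for line in raw_lines:
        event = json.loads(line)  # structure applied at read time
        counts[event.get("referrer", "unknown")] += 1
    return counts

# load_into_table(json.loads(raw_events[2]))  # would fail: "plan" is not in the schema
print(signups_by_referrer(raw_events))  # adwords, email, and unknown each counted once
```

The point of the sketch is simply that a new field costs nothing on the schema-on-read path, whereas it forces a schema change on the schema-on-write path.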
In short, if you've invested in Hadoop, and you want to be able to have a business user build an analysis on data stored in Hadoop, that's great, but the BI tool is limiting them to structured data only. This is why using a self-service tool that works directly with MapReduce and HDFS, like Datameer, is critical.

Ultimately, if you want to bring Hadoop in to complement your existing data warehouse, it all comes down to your particular business use case(s). Let me illustrate what's possible by bringing Hadoop to the table with three different examples.

Sales Funnel Optimization

A leading software security company used Datameer and Hadoop to integrate and analyze all of their customer acquisition data to understand and measure how people move through a sales funnel to eventually become a customer. This meant bringing together data sources that are housed in several different systems, including search engine advertising, marketing automation and email nurturing, and CRM systems. The level of integration – including trying to join structured and unstructured data – was extremely time consuming and cost prohibitive with their traditional systems.

Datameer's 55 pre-built data connectors enabled the company to quickly load data from Google AdWords, web logs, logs from a content delivery network, Marketo, JSON from product logs, and a CRM system – all within less than a week. After the initial load, Datameer was set up to load data on an hourly basis. From there, business users joined all the data together using Datameer's spreadsheet user interface, and started to build analyses using Datameer's pre-built analytic functions.

Through the analysis, they identified the bottlenecks in the conversion process, which enabled them to triple customer conversion and increase revenue by $20 million within six months.

Predictive Maintenance

While a lot of big data use cases are about making money, using data to optimize production is a
great way to save time and money. One study found that auto-manufacturing executives have estimated the cost of production downtime ranges anywhere from $22,000 per minute to $50,000 per minute. One of the leading global auto manufacturers used Hadoop and Datameer to combine unstructured and structured data from Programmable Logic Controllers (PLC) and proprietary factory and maintenance ticketing systems. The PLC devices housed detailed robot data, including the temperature of components when the robot broke down. By pulling together and analyzing temperature and vibration sensor log files with maintenance history, the manufacturer was able to understand why certain robots broke down in the past. With this knowledge, the manufacturer was able to create a robot maintenance schedule to identify and service robots before failure occurred, which resulted in lowering its factory outage time by 15 percent.

Competitive Pricing

Agility is a must when it comes to making rapid pricing decisions based on competitive data. Using a traditional BI tool and data warehouse, a major retailer's IT team struggled to prepare the necessary data in a timely fashion, plus
the cost of expanding the existing data warehousing systems proved to be prohibitive. The team needed a single hub of all competitive pricing information for all lines of business that was flexible and could handle the variety and volume of new data coming in. The team also required a single datastore as the entry point and hub for all enterprise data assets that could also feed other decision systems. Using Datameer, the team was able to load raw data of all different sizes and formats into Hadoop daily, and then cleanse and transform the data using pre-built functionality in Datameer. Datameer and Hadoop then fed lower-capacity data warehouses like Netezza, Oracle, and Teradata. By bringing all the data together, the retailer was able to compare product
pricing with competitive stores, test hypotheses, and gain competitive insights.

The Bottom Line

When leveraged properly as the powerful storage and compute platform that it is, Hadoop brings an absolutely unprecedented amount of flexibility to data analytics. With Datameer on top, Hadoop's no-ETL and schema-on-read approach means not only an end to the 18-month change cycle; it also grants business users the flexibility to finally interact with their data on a truly iterative basis. Only then are businesses able to ask and answer questions they've never been able to ask before, allowing them to make data-driven decisions that drive their business forward.
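As a rough, generic illustration of the predictive maintenance pattern described above – not the manufacturer's or Datameer's actual implementation, and with hypothetical field names, data, and thresholds – the sketch below joins sensor readings to maintenance tickets and derives a crude "service before failure" temperature threshold.

```python
from datetime import datetime, timedelta

# Hypothetical PLC sensor log: (robot_id, timestamp, temperature in Celsius)
sensor_log = [
    ("robot-7",  datetime(2014, 1, 3, 9, 0),  82.0),
    ("robot-7",  datetime(2014, 1, 3, 9, 30), 96.5),
    ("robot-12", datetime(2014, 1, 3, 9, 0),  71.2),
]

# Hypothetical maintenance tickets: (robot_id, breakdown_time)
tickets = [("robot-7", datetime(2014, 1, 3, 10, 0))]

def readings_before_breakdowns(log, tickets, window=timedelta(hours=2)):
    """Join sensor readings to the breakdowns they preceded, within a time window."""
    joined = []
    for robot_id, breakdown_time in tickets:
        for log_robot, ts, temp in log:
            if log_robot == robot_id and breakdown_time - window <= ts <= breakdown_time:
                joined.append((robot_id, ts, temp))
    return joined

def failure_temperature_threshold(joined, margin=5.0):
    """Derive a crude service threshold from historical pre-failure temperatures."""
    temps = [temp for _, _, temp in joined]
    return min(temps) - margin if temps else None

pre_failure = readings_before_breakdowns(sensor_log, tickets)
threshold = failure_temperature_threshold(pre_failure)
print(f"schedule service when temperature exceeds ~{threshold:.1f} C")
```

At production scale this kind of join and thresholding would run across the full sensor and ticket history rather than a handful of in-memory records, but the shape of the analysis is the same.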
LET'S NOT SCREW THIS UP
Michael Whitehead | CEO, WhereScape

IF YOU'RE AT ALL familiar with who I am and what I do, you'd probably identify me as a big data skeptic. In the present instance, however, I've come to praise big data.
Big data matters because it has cachet. Just about everybody – from the person in the street to the C-level executive on high – has heard of it. The executives are alert to DM and to DM-related issues in ways they haven’t been for almost two decades. They’re interested in what’s going on. The practical effect of this is that IT and DM are relevant again.
I see this as a critical second chance. Fifteen years ago, we in DM botched the decision support revolution. In an atmosphere of excitement, optimism, and critical C-level buy-in, we set the bar too high, neglected to manage expectations, and categorically failed to follow through. We delivered inflexible data warehouse systems. We developed unusable business intelligence (BI) tools. We forced organizations to change their business processes to suit our own product agendas. And in the nascent Age of Big Data, we’re gearing up to do it all over again.
Little (Interposing) Boxes, All the Same

The premise of decision support was actually grounded in a promise: if companies embedded reporting and analytic insights into their business processes, they could use timely information to enrich, and in some cases to drive, decision making. In spite of early success, we didn't make a good-faith effort to deliver on this promise. An example is the data warehouse, which still takes too long to build, still costs too much to change, and is still too hard to manage. This isn't because the data warehouse model is broken or outmoded; it's because the way we insist on building and managing data warehouses is wildly unrealistic.
WhereScape customers such as Delta Community Credit Union, the largest credit union in Georgia, are making good on both the premise and the promise of decision support. They're using WhereScape's solutions to simplify and automate the development and management of their data warehouse environments. The combination of WhereScape 3D (a data discovery tool) and WhereScape RED (a data warehouse automation and management environment) enables them to automate the practices of warehouse prototyping, design, creation, management, ongoing optimization, and even refactoring. By eliminating hand-coding and automating the interactions and hand-offs between systems, customers like Delta Community Credit Union have been able to increase productivity and accelerate the delivery of reporting and analytic applications. In most cases, they've managed to significantly reduce costs, too.

This is stuff the rest of the industry could and should be doing. Basically, it entails a conceptual shift – from a product-centric to a process-centric orientation. Picture a process flow diagram, with its requisite boxes and arrows. The Platonic Ideal of this diagram would have as few interposing boxes as possible. In practice, of course, this never, ever happens. There are many good reasons for this, but one of the most important is that software vendors target the product, not the process: they pursue a strategy whereby they attempt to insert or implant a product as one of several interposing boxes in a process flow. In effect, they design themselves into a process.

Think of WhereScape RED as prophylaxis. RED doesn't have its own tool-specific language, doesn't require its own special server, and doesn't mandate the use of a dedicated middle tier. Simply put, RED doesn't "impregnate" a process: there's no "WhereScape" box in a process flow; there are, instead, direct arrows between and among sources and targets. WhereScape RED leverages open interfaces like SQL, ODBC, and XML to automate and orchestrate dataflows between systems; it automates the work of an ELT tool by pushing transformations down into a target system. It also leverages tool- or product-specific capabilities when and where they're available. In Teradata environments, for example, WhereScape RED will use Teradata Parallel Transporter to accelerate warehouse loading. In some specific cases, of course, it's faster and more efficient to exploit ANSI-standard SQL in place of vendor-specific tools. WhereScape RED is smart enough to use the right tool for the job. WhereScape designs and builds its products with interconnectedness – and by this I mean the need for interoperability and exchange between "boxes," interposing or otherwise – foremost in mind.

Gimme a Break(down)

The DM industry's response to big data has been more of the same-old, same-old. In most cases, this means a bouillabaisse of proprietary, stack-centric big data "solutions," self-serving technological or architectural prescriptions, and not-yet-ready-for-prime-time front-end tools. A few vendors even dangle that most tantalizing of tchotchkes: Big Data-in-a-Box!

The thing is, there's something altogether unprecedented about "big data:" it outstrips the conventional containers into which we like to segment or bin IT technologies. Big data is inescapably multi-disciplinary: it presupposes interconnectedness – interoperability and exchange: commerce – between and among domains. It is holistic in scope in precisely the way that data management is not. From a product perspective, then, a big data-aware tool must operate in a context in which problems, practices, and processes are by definition multi-disciplinary. This means that no product is completely self-sufficient; no product is isolated, siloed, or alone: as Parson Donne might have put it, seen through the lens of big data, no product is an island entire of itself.

This doesn't preclude the development of big data-oriented products that target very specific use cases, nor that of generalized "big data-themed" products which address process-, domain-, or function-specific practices. Nor does it invalidate an entire class of products as in some sense "pre-Big Data." And it doesn't necessarily preclude the development of a stack-ish, platform-like offering that consolidates multiple tools or products into a kind of omnibus "big data" product. But this isn't what the IBMs, Oracles, and SAPs – to say nothing of the Clouderas, MapRs, or even Hortonworkses – are doing. Instead, they're developing and marketing "Big Data-in-a-Platform" products. There's the IBM "Big Data Platform," which comes with its own distribution of Hadoop. Teradata has its "Unified Data Architecture." SAP has, well, HANA-über-alles. Meanwhile, Oracle, Dell, and Microsoft are all pushing their own, platform-specific spins on big data. And don't forget the Hadoop platform players: Cloudera and MapR continue to pursue increasingly proprietary Hadoop strategies; Datameer and Platfora, to pitch Hadoop-based stacks as lock-stock replacements for traditional decision support systems.

The one thing each of these "solutions" has in common is a product-centric model: each aims to insert or implant itself – as an interposing box – into a process. Each interposing box introduces latency and increases complexity and fragility. What's more, each interposing box has its own infrastructure. This includes, crucially, its own vendor-specific support staff with its own esoteric knowledge-base. At best, this means recruiting armies of Java or Pig Latin programmers, or training-up DBAs and SQL programmers in the intricacies of HQL. It means figuring out kludges or hacks to compensate for the primitivity of HCatalog, Hadoop's still-gestating metadata catalog. At worst, this means investing significant amounts of time and money to develop platform-specific knowledge-bases.

WhereScape was founded chiefly to address this dysfunction. With 3D and RED, we've focused on automating the practices and processes that support and enable a data warehouse environment, such as scoping, warehouse creation, ongoing management, and periodic refactoring. RED even automates the creation and management of warehouse documentation, diagrams, and lineage information. It does this by completely eliminating hand-coding. This means no hand-coding in SQL – nor in esoteric, tool-specific languages – and no manual scripting of complex and extremely brittle jobflows. RED takes care of building native database objects, automatically documents them, and schedules them to be loaded into a target warehouse. It leverages open standards like ANSI SQL, ODBC, and XML, along with database-specific SQL variants, optimized database loaders, and other available in-database facilities. RED even exploits operating system-specific features in areas such as scheduling, scripting, and caching. In other words, WhereScape doesn't have its own infrastructure. RED and 3D speak the languages and accommodate the idiosyncrasies of the OLTP systems, warehouse platforms, analytic databases, NoSQL or big data repositories, BI tools, and all of the other "boxes" that collectively comprise an information ecosystem.

In Praise of True Pluralism

In this way, WhereScape products target the disconnects between otherwise isolated systems in a process. These are the points at which a process flow breaks down. Breakdown of this kind is the inevitable consequence of a product-focused development and marketing strategy. By the looks of it, we're going to
see lots of breakdown in the big data-scape. The thing is, it categorically doesn't have to be this way! Think of the big data-scape as a kind of free trade zone in which "trade" is analogous to process: i.e., data moves from box to box, with minimal restriction or interference. Without, that is, platform-specific embargoes; without, in other words, inessential interposing boxes.

Automation is the answer. Not automation for its own sake, but automation as integral to process flow: automation that eliminates breakdown, increases responsiveness, lowers costs, and – most important – empowers IT to focus on value creation. Automation that frees IT to address the nice-to-have action items which – owing to complexity, low priority, or hypotheticity – otherwise won't get done. Automation such as that achieved by Delta Community Credit Union, which uses WhereScape to deliver new reporting and analytic applications in days or weeks, as distinct from months.

Yep: Delta Community Credit Union has zero tolerance for inessential interposing boxes. What about you?
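As a generic sketch of the transformation pushdown Whitehead describes – ELT rather than ETL – the example below generates SQL and lets the target database execute it in place instead of pulling rows into a middle tier. It is illustrative only: SQLite stands in for a real warehouse platform, the table and column names are made up, and this is not WhereScape RED's actual mechanism.

```python
import sqlite3

# Stand-in for a target warehouse platform; a real deployment would connect over
# ODBC/JDBC to Teradata, Oracle, etc. Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stage_orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE dim_customer_sales (customer_id INTEGER, total_amount REAL);
    INSERT INTO stage_orders VALUES (1, 100, 25.0), (2, 100, 75.0), (3, 200, 40.0);
""")

def pushdown_transform(connection, source, target):
    # ELT: generate a SQL statement and let the target database run the
    # transformation where the data already lives, rather than fetching rows
    # into a separate transformation engine. (Names are trusted, illustrative input.)
    sql = (
        f"INSERT INTO {target} (customer_id, total_amount) "
        f"SELECT customer_id, SUM(amount) FROM {source} GROUP BY customer_id"
    )
    connection.execute(sql)
    connection.commit()

pushdown_transform(conn, "stage_orders", "dim_customer_sales")
print(conn.execute("SELECT * FROM dim_customer_sales ORDER BY customer_id").fetchall())
# -> [(100, 100.0), (200, 40.0)]
```

The design point being illustrated is that the "transformation box" disappears from the process flow: only SQL travels between systems, not data.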
5 INDUSTRY LEADERS GIVE US THEIR VIEWS
LEADERSHIP ROUNDTABLE
What do you feel is the most important factor in BI leadership in 2014, and how are you leading that change?
LEADERS LEAD! This means getting in front of important trends and then pulling your organization forward.
Over the past several years, you led a tectonic shift from IT-based BI with limited analytics to business-driven BI and widespread analytics propelled by self-service technology. The result has been greater insight with better performance. Congratulations on a job well done. What's next?

The data explosion is upon us, with organizations overwhelmed by machine-generated data, mobile, cloud, and social data, and more beyond their traditional enterprise systems. You cannot force-fit traditional ETL and data warehouse-based data integration onto
today's distributed data environment. 2014 is the year for BI leaders to shift their focus from self-service BI and analytics to self-service data integration so your organization can take advantage of this amazing opportunity. The roadmap is simple. Use data virtualization to integrate the new data as well as "layer over legacy enterprise data cow paths." Next, provide the business with a self-service data directory on top. And then sell, sell, sell until this new approach mainstreams. Good luck. Your organization is counting on you!

Cisco | Bob Eve, Product Marketing DV Business Unit
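As a toy illustration of the data virtualization idea Eve describes – not Cisco's product or API, and with made-up source names and fields – the sketch below presents a "virtual view" that federates two sources at query time instead of copying them into a warehouse first.

```python
# Two hypothetical live sources: one legacy warehouse table, one cloud system.
warehouse_customers = [
    {"customer_id": 100, "name": "Acme Anvils", "segment": "enterprise"},
]
cloud_support_tickets = [
    {"customer_id": 100, "ticket_id": "T-1", "status": "open"},
    {"customer_id": 200, "ticket_id": "T-2", "status": "closed"},
]

class VirtualView:
    """Presents joined data from several sources without materializing a copy."""

    def __init__(self, *sources_with_keys):
        self.sources_with_keys = sources_with_keys  # (records, join_key) pairs

    def query(self, **filters):
        # Pull matching rows from each source on demand and merge them by key.
        merged = {}
        for records, key in self.sources_with_keys:
            for row in records:
                merged.setdefault(row[key], {}).update(row)
        return [r for r in merged.values()
                if all(r.get(k) == v for k, v in filters.items())]

customer_360 = VirtualView((warehouse_customers, "customer_id"),
                           (cloud_support_tickets, "customer_id"))
print(customer_360.query(status="open"))
# -> one merged record combining warehouse and cloud-ticket fields for customer 100
```

A self-service data directory, in this framing, is simply a catalog of such views that business users can discover and query without knowing where the underlying data physically lives.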
REALIZING HOW LITTLE TIME businesses have to make decisions these days, shortening the time it takes to get to insight is, without a doubt, the most valuable thing a BI or big data analytics tool can bring to the table. Datameer takes this charge very seriously by omitting altogether one of the most time-consuming and overly restrictive pieces of the traditional BI puzzle: ETL. Because Datameer uses Hadoop's HDFS as its data store, and it has more than 50 built-in connections to the most popular structured, semi-structured, and unstructured data stores, getting data into storage is a schema-free process that takes minutes instead of weeks. On top of that, during the ingest process, Datameer has a proprietary algorithm that pulls a highly representative sample of the data. That way, business users can work directly with the sample data in an iterative manner to build their full analysis and then run the entire big data set through the template, rather than waiting for the batch processing to return results after each transformation. Together, these advancements mean that businesses can finally count on being able to make data-driven decisions on demand.

Datameer | Stefan Groschupf, CEO
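Datameer's sampling algorithm is proprietary, but as a rough, generic illustration of the workflow described above – iterate on a representative sample, then run the finished analysis over everything – here is a single-pass reservoir sampling sketch in Python with made-up records.

```python
import random

def reservoir_sample(stream, k, seed=42):
    """Keep a uniform random sample of k records from a stream of unknown size."""
    rng = random.Random(seed)
    sample = []
    for i, record in enumerate(stream):
        if i < k:
            sample.append(record)
        else:
            j = rng.randint(0, i)
            if j < k:
                sample[j] = record
    return sample

def analysis(records):
    """Stand-in for the 'template' an analyst builds interactively on the sample."""
    amounts = [r["amount"] for r in records]
    return sum(amounts) / len(amounts)

full_dataset = ({"order_id": i, "amount": float(i % 50)} for i in range(100_000))
sample = reservoir_sample(full_dataset, k=1_000)
print("iterate on the sample:", analysis(sample))

# Once the analysis looks right, run the same template over the full data set.
print("run on everything:",
      analysis({"order_id": i, "amount": float(i % 50)} for i in range(100_000)))
```

The payoff is the turnaround time: each iteration touches only the sample, and the expensive full-data run happens once, at the end.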
IN 2014, WE WILL witness a shift in the very definition of BI as industry innovators leverage data to anticipate future developments and take action for transformative competitive advantage or risk avoidance – a heads-up display instead of a rearview mirror. Analytics and BI are fast evolving from the sole domain of high science and deep technologists to mainstream adoption. The data is at hand, and now line-of-business users – not just data scientists – have the opportunity to predict what is going to happen; prescribe the optimal action, promotion or offer for an organization and its customers; and discover what can't be seen with human observation to act on a short-lived opportunity – or avoid a fast-emerging threat.

Actian offers a completely modern platform designed to surface opportunities that give the competitive edge in increasingly dynamic times. We deliver solutions that irreversibly shift the price/performance curve beyond the reach of traditional legacy stack players. We aim to democratize analytics and help our customers get a leg up on their competition, retain customers, detect fraud, etc. by helping them leverage their data to take action now based on accurate predictions of tomorrow.

Actian | Ashish Gupta, CMO and SVP Business Development
AT WHERESCAPE, WE BELIEVE data warehouses cost too much to build and are too hard to change. We've built our business model around this. Unfortunately, it isn't just data warehouses that are sub-optimal. Wherever the interests of business and IT intersect, you'll find less than optimal solutions. What's wrong with data warehousing, what's wrong with BI – what's wrong with how IT and the line of business relate and interact – has to do with what always gets prioritized in project management: i.e., an obsession with minimizing risk – not maximizing outcomes.

In most cases, the success of a project isn't the foremost priority. The first priority, instead, is not-screwing-up. If project planning departs from conventional or established practice, it becomes vulnerable to blow-back; if something goes wrong, the slightest divergence from orthodoxy is going to be questioned and prosecuted. In this context, it becomes "irresponsible" to innovate or to think differently. The thing is, the way we're doing stuff, the "best practices" and "methodologies" and "strategies" that won't get us fired (and which permit us to cover our posteriors) just. aren't. working.

In 2014, WhereScape is going to push back against this. We're going to encourage our customers to put potential project outcomes and risk on equal footing. Part of this involves reframing what we mean by "risk." Instead of approaching projects in a way that prioritizes risk avoidance – e.g., "Is it worth the risk?" – we must instead prioritize high-value outcomes: "Can we risk not doing this?"

WhereScape | Michael Whitehead, CEO
AT PREDIXION, OUR VISION is to put the right information in the hands of the right people at the right time. We want our customers and partners to Predict Everything™. That's our leadership agenda for 2014 and beyond. Believe you me, it's a tall order.

In most cases, Classical Analytics doesn't get much beyond the Information Elite: data scientists, statisticians, business analysts, and the like. From this perspective, workers, technicians, nurses, and other employees in the field are an afterthought. These are precisely the people who can have the biggest impact! The problem of reaching them is what we at Predixion call the Last Mile of Analytics. It's the unpaved, bumpy, poorly mapped gap stretching between prediction in the abstract – i.e., the discovery and refinement of insights – and prediction in practice, which is what it means to make predictive insights available to in situ employees. Reaching this vast constituency of people is the most important and the hardest part of analytics. At Predixion, it's our reason for being. Competitively, it's a no-brainer for us because nobody else is really trying to do it.

Our ability to push prediction far beyond the enterprise – our demonstrable success in "paving" the proverbial last mile of analytics – is our trump card. But I'd happily give up that trump card if it meant the industry were shifting its emphasis away from thinking small and – at long last! – starting to grapple with the problem of pushing information out to people where they work, act, and make decisions to Predict Everything.

Predixion Software | Simon Arkell, CEO
big data summit
MAY 11-13, 2014 | DALLAS, TX
WWW.BIGDATASUMMIT.US

THE DEFINITIVE EVENT AND MEETING PLACE FOR ENTERPRISE SENIOR EXECUTIVES CURRENTLY STRATEGIZING THEIR NEXT BIG DATA PROJECTS RETURNS!

Leading topics to be discussed include:
• The Impact of Big Data on the Internet of Things
• Big Data in the Cloud
• The Ethics of Big Data: Balancing the Risk and the Innovation
• The Key to Big Data Security

"I've been to 7-8 events in Big Data and have never found one as productive as this. The delegation was so high and diverse." – Asheesh Mangle, VP of Sales & Business Development, Impetus

DESIGNED IN COLLABORATION WITH THE BIG DATA EXECUTIVE BOARD

For further information please contact:
Jason Cenamor, Director-Technology Summits, North America
+1 312.374.0831
jason.cenamor@cdmmedia.com
PREDICT EVERYTHING
Simon Arkell | CEO, Predixion Software
AT PREDIXION, we champion rapid time-to-value for predictive analytic insights. By "rapid time-to-value," we mean doing instead of planning or preparing to do.
In almost all cases, users of our flagship product, Predixion Insight, are getting value from predictive analytics within weeks. Do not misunderstand me: at Predixion, we believe in getting things right. Our customers continuously refine and improve the quality of their analytics. The features and innovations we’ve built into Predixion Insight make it easy for them to do just this. These same features and innovations also make it possible to democratize prediction. I’m talking about a fundamentally different way of producing and disseminating predictive insights. Customers like the Carolinas Healthcare System, Kaiser Permanente, and GE (to name just a few) use Predixion to push insights out to where they can have the greatest impact: to nurses in the hospital intensive care unit (ICU), to directors and coordinators in hospital admissions offices, to airline technicians on the tarmac – even to men and women laboring in oil fields.
This is possible because Predixion Insight provides a framework for end-to-end prediction. This might sound overly ambitious, but it isn't. With Insight, we didn't try to build our own algorithms, and we didn't undertake to write thousands of lines of code to address the complexities of NoSQL analytics. We didn't have to! Smart people at Microsoft (in the SQL Server division), at EMC (in the Greenplum database division), and in the Hadoop community had already done so. As a small company, Predixion had to pick and choose what to prioritize. This meant leveraging open standards and common, in-database resources whenever they were available. For example, Insight utilizes the native data mining algorithms and analytic functions that ship with SQL Server, Greenplum's MPP database, the open source R statistics environment, and Mahout, the predictive analytic stack for Hadoop. These platforms offer built-in functions for text mining, numerical analysis, and NoSQL analytics, along with routines for data mining and prediction. Under its covers, Predixion Insight leverages connectivity standards such as ODBC and JDBC, in addition to the data preparation and data transformation feature set of SQL Server Integration Services (SSIS).

Not that we aren't doing Very Smart things on our own. With version 3.1 of Predixion Insight, we introduced our Machine Learning Semantic Model (MLSM). The MLSM is designed to make data preparation reusable and portable. An MLSM "package" is effectively a self-contained predictive application; it includes all of the logic and data transformations required to build a predictive model. MLSM packages can be shared, collaboratively enhanced, or easily adapted to support new applications. In other words, the MLSM lets you push powerful, portable predictive applications to the outside environment – i.e., from the business campus or home office out into the field. This is uniquely powerful. It's a complete departure from the "classical model" – think SAS or SPSS – with its critical dependence on PhDs, data scientists, statisticians, and other power users. This is analogous to how books used to be laboriously hand-copied prior to the invention of the printing press. Both models place responsibility for the production and transmission of insight in the hands of a highly skilled few: monks and friars, statisticians and data scientists. The problem with the SAS/SPSS model is that it cannot be extended to support both the production and the mass dissemination of predictive insights. Thanks to the MLSM, the Predixion model can.

Don't take my word for it. When GE invested in Predixion, they did so with the expectation that we could help them enable the "Industrial Internet." Imagine developing a fully updatable predictive app for an ultrasound machine and then pushing it out to 10,000 devices in the field such that predictions can be delivered to doctors at the point of care in real time. What if you didn't have to wait for hours or days to get your MRI results? What if a clinician could be given a probability score that a certain disease state is present while the MRI is being performed? What if you could instantly update all of these apps (via the cloud), and what if the results of all connected devices could be aggregated to provide a macro view of the patient's condition? The possibilities are endless.

This isn't snake oil, black magic, or marketing blather. As I've explained, we at Predixion leveraged the legwork of many extremely smart people. This enabled us to focus on other hard stuff – like the problem of bringing predictive insights to the people via the wizard-driven, self-service design and modeling environment we expose as part of Predixion Insight. Insight enables the equivalent of a what-you-see-is-what-you-get (WYSIWYG) design experience for predictive analytics. If you're a highly technical user, you can bypass its wizards and jump right in. However, even the most gifted of technicians will appreciate the conveniences we've built into Insight – e.g., detailed statistical breakdowns on stuff like variable distributions and correlation relationships, along with other, even more esoteric stats.
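For illustration only: the snippet below sketches the sort of per-variable distribution and correlation summaries such a design environment can surface. It uses pandas as a generic stand-in, and the file and column handling are hypothetical; nothing here is Predixion Insight's actual interface.

```python
# Illustrative sketch: generic distribution and correlation summaries.
# pandas is a stand-in; the CSV path is hypothetical.
import pandas as pd

df = pd.read_csv("icu_admissions.csv")  # hypothetical extract

# Per-variable distribution breakdown (counts, means, quartiles, etc.)
print(df.describe(include="all"))

# Pairwise correlations among the numeric variables
print(df.select_dtypes("number").corr())
```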
Because it's a WYSIWYG-like design tool, Insight can be used productively by non-technical people, too. Subject matter experts and other smart, savvy users might not know much about predictive modeling, but they do know an awful lot about their business domains. Insight's wizards and self-service capabilities can guide the user through the process of creating a predictive model, automate the selection of algorithms and variables, and even automatically create and score a model.

We also innovated around the end-user consumption paradigm that we call the "last mile of predictive analytics." This is a term that Predixion first coined and which others have since used. It speaks to the problem of enabling on-site, in-the-field, in situ employees to make real-time decisions on the basis of predictive analytic insights. I'm talking about people who don't know what an algorithm is – nor should they. Think of the nurse in the ICU, the sales person in the waiting room, the technician on the tarmac – the people who ultimately interpret and act on predictive insights.

The "classical model" of predictive analytics isn't just unwieldy, it's pernicious: it gives disproportionate emphasis to a small (if critical) part of the predictive analytic process – namely, planning, preparation, and development – at the expense of the whole. As I've tried to point out, it all but ignores the myriad ways in which insights are used and acted upon by people in the real world: by nurses, doctors, and veterinarians; by retail sales clerks, delivery truck drivers, and plant managers; by social workers, rescue and aid workers, and community organizers; by engineers, scientific researchers, and academics; by individuals in occupations or avocations of every kind.
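To ground the wizard-and-package idea in something concrete, here is a minimal sketch of automated model building plus a portable, self-contained artifact. It uses scikit-learn and joblib purely as stand-ins; it is not Predixion's MLSM format or API, and the data set and column names are hypothetical.

```python
# Illustrative sketch only: bundle data transformations and a model into one
# portable object, fit and score it, and save it as a shareable artifact.
# This approximates the "self-contained package" idea; it is not the MLSM.
import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.read_csv("readmissions.csv")  # hypothetical training extract
X, y = df.drop(columns=["readmitted"]), df["readmitted"]

# All of the preparation logic travels with the model itself
prep = ColumnTransformer([
    ("num", StandardScaler(), ["age", "length_of_stay"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["ward", "diagnosis"]),
])
package = Pipeline([("prep", prep), ("clf", LogisticRegression(max_iter=500))])

# Automated fit-and-score, the sort of step a wizard-driven tool hides
print("AUC:", cross_val_score(package, X, y, cv=5, scoring="roc_auc").mean())

package.fit(X, y)
joblib.dump(package, "readmission_package.joblib")  # shareable, reusable artifact
```

The point of the sketch is portability: everything needed to reproduce a prediction rides along in one object that can be handed to another team or pushed to another environment.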
Get Real, Get Started

This isn't surprising. Our industry has this bad habit of focusing on planning and preparation and of giving short shrift to outcomes. Think back to CRM, that infamous example of dot.com-era excess. CRM was unwieldy. CRM cost, on average, millions of dollars to implement. And CRM was 60 percent unsuccessful. No, that isn't a typo: 60 percent of CRM projects failed. Does that surprise you? CRM technology was notoriously complex, hard to install, and even harder to use. In spite of this, Siebel was the 800-pound gorilla of CRM. Until Salesforce.com came along, that is. Salesforce.com offered easy-to-use, subscription-priced CRM software that didn't need installing – and promptly ate Siebel's lunch. I had the pleasure of dining recently with Marc Benioff, Salesforce.com's founder and CEO, and was blown away by his passion, irreverence, and vision. He took on the 800-pound Siebel gorilla and won.
The fact is that we at Predixion developed Insight in the same conditions as Benioff and his team developed Salesforce.com – and in response to basically the same challenges. Even today, our two visions are strikingly similar: Salesforce.com's recent proclamation of the "Internet of Customers" is based on the idea that there's a flesh-and-blood customer behind every device or connection in the Internet-of-Things. We live this each and every day at Predixion. The predictive apps our customers build and deploy communicate with and consume data from a myriad of connected devices or endpoints, as well as very large data sets in Hadoop, Greenplum, or SQL Server. They're already plugged into and interacting in an Industrial Internet-of-Things.

You Americans don't like to be confrontational and don't like to name your competitors – but I do. As we say in Australia, "let's call a spade a f*&^ing shovel." SAS is the dominant force. They're famous, they have huge customers, and they charge a lot of money for technology they built 35 years ago. But SAS isn't unassailable. For starters, they don't offer flexible pricing and they don't – they can't – lead with cloud services. And let's face it, their technology is only useful and usable if you have a PhD. SAS and IBM (thanks to Big Blue's acquisition of SPSS) are our only real competitors. But whether they realize it or not, they're both perfectly and dramatically poised for massive disruption.

A little-known secret is that Predixion beats them in the market. Where we see them, we outsell them, because we have easier-to-use software that we offer as a service with flexible pricing, and we appeal to the people whose business problems actually get solved in real time with the predictions. We are demystifying a space whose complexity SAS made a big business out of, and we think this will get even easier over time as their users die of old age. The truth is that young, up-and-coming developers don't want to use legacy products like SAS; instead, they want RESTful cloud services like Predixion, or open source technologies like "R" – which (by the by) Predixion Insight can package and automate for general consumption.

If the ranks of SAS and SPSS users are thinning out; if developers and architects – to say nothing of successive generations of graduates produced by university undergraduate and MBA programs – prefer cloud apps and open source tools; if SAS and IBM can offer little in the way of pricing relief and flexible application delivery options – need I say more? Really? The writing is on the wall. Think Siebel. Think Dinosaur. The age is upon us. Predict Everything.
WIN with DATA VIRTUALIZATION
Bob Eve | Director, Product Marketing, Data Virtualization Business Unit, Cisco
“In the struggle for survival, the fittest win out at the expense of their rivals because they succeed in adapting themselves best to their environment.” - Charles Darwin
THIS INSIGHT FROM Charles Darwin's The Origin of Species in 1859 captures the essence of every organization's need to continuously adapt. This is especially true when it comes to data. Today, businesses are challenged to quickly take advantage of the myriad data-driven opportunities available, including:
• Big data
• Analytics
• Cloud computing
• Self-service
• Mobility
Businesses that successfully leverage their data will be the leaders. Those who don't will fall behind. Will you be a leader or an "also-ran"?

Data Virtualization Today

Businesses understand data's competitive value: better data leads to better insights, which leads to faster and more impactful execution. It's a winning formula, with time of the essence.

"On average, organizations take 7.8 weeks to add a new data source to their data warehouses." - TDWI

This formula is the primary driver for data virtualization (DV). Data-driven businesses can no longer wait for months-long data warehousing efforts to bear fruit. Further, IT has now recognized that consolidating all data in traditional data warehouses -- or endlessly replicating data sets via marts and feeds -- are losing strategies. By empowering the business with data faster and less expensively, data virtualization has risen to fill this void.

Data virtualization provides business with logical business views of their data. This data is virtually consolidated in real-time from sources across the internal and extended enterprise, so it is easier to set up and change. Data virtualization supports traditional use cases such as BI, and new use cases such as mobility and analytics. Data virtualization addresses the proliferation of traditional and new big data sources, plus the movement of data to the cloud. And data virtualization complements traditional data warehousing and replication, unleashing your data by flexibly taking advantage of earlier storage and integration investments.

Adoption Accelerates

Forrester estimates data virtualization's overall market size at $12B by 2017, growing at roughly 22% annually. For 2013, Gartner, TDWI, and TechTarget's BI Leadership Group report approximately 20-27% adoption overall, plus another 33% having data virtualization "under consideration." These surveys report that better business insights, increased agility, and cost savings are the primary drivers of adoption. Many of these initial adopters have barely scratched the surface of their planned deployments. In his October 2013 Data Virtualization Day speech to over 350 professionals from over 130 organizations, Forrester's Noel Yuhanna identified six areas where initial adopters were expanding:
• Developing enterprise-wide data virtualization strategies.
• Broadening their focus to include unstructured data and semi-structured data.
• Moving beyond read-only use cases.
• Expanding integration with external data sources.
• Expanding integration to the cloud (private and public).
• Some are extending the framework to support transactional activities.

To meet these evolving requirements, Cisco is expanding its market-leading Cisco Data Virtualization Suite on several critical dimensions.

Unprecedented Scale Requires Unprecedented Innovation

Data virtualization adoption has moved from project to enterprise scale, and now beyond, with global deployments spanning tens of thousands of users and hundreds of internal and external data sources. Supporting these volumes -- across such a widely distributed source and consumer landscape -- is forcing today's data virtualization vendors to innovate in unprecedented ways.

One example is performance at scale. As sources and consumers disperse beyond the enterprise data center, data access and integration occur via the Internet's wide-area networks -- rather than via local high-speed networks. To meet this performance-at-scale demand, Composite Software, recently acquired by Cisco, is combining Composite's query optimization algorithms and techniques with Cisco's network optimization algorithms and technology to provide a unique and powerful 1+1=3 solution.

Self-Service to the Fore

Over the past five years, the concept of business self-service has revolutionized the business intelligence market. Bursting on the scene and expanding into thousands of enterprises, Qliktech, Tableau, and Spotfire brought BI tools directly to the business user.
“Using a data virtualization technique is: number one, much QUICKER time to market; number two, much more COST EFFECTIVE; and three, gives us much MORE AGILITY in actually changing new models of customer programs.” - Martin Brodbeck, CTO Pearson Education
Armed with these tools (as well as ubiquitous Excel), business users no longer want IT to provide reports. They want IT to provide the data, and they will use their self-service BI tools to do the rest. To meet this need, the Cisco Data Virtualization Suite will soon include a business directory that empowers business users with friendly and secure access to curated data sets. This self-service data integration resets the contract between business and IT, enabling IT to better serve the business's data needs with greater agility, lower costs, and consistent control.

Analytics and the Data Scientist Shortage

Analytics present a huge business opportunity … if you have the team and technology to take advantage! Unfortunately, because the typical analysis project requires so many skills, there is a shortage of data scientists who can do the work. How can we simplify the work to accelerate the outputs of these specialists as well as open the door to a broader set of analysts? The typical analysis process includes six major stages, implemented in an iterative manner (a rough sketch in code follows the list):
1. Find the data
2. Access the data
3. Build a sandbox for the data
4. Build the analytic model
5. Statistically analyze the results
6. Develop and communicate the business insight
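As a rough illustration of how a virtualized data layer can collapse steps 1 through 3 into a single query, the sketch below reads one logical view over ODBC and moves straight to analysis. The DSN, schema, and view names are hypothetical placeholders, not an actual Cisco Data Virtualization endpoint.

```python
# Illustrative sketch: steps 1-3 (find, access, sandbox) collapse to one query
# when the data is already published as a virtualized view. Names are hypothetical.
import pandas as pd
import pyodbc

conn = pyodbc.connect("DSN=DV_SERVER")  # hypothetical ODBC data source

# One logical view stands in for several underlying physical sources
sandbox = pd.read_sql(
    "SELECT customer_id, region, churned, total_spend "
    "FROM analytics.customer_churn_view",
    conn,
)

# Steps 4-6 begin immediately: model, analyze, and communicate
print(sandbox.groupby("region")["churned"].mean())
```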
Interestingly, analysts typically spend more than half their time and effort assembling the data (steps 1-3) needed to perform the analytics, leaving less time for actual analysis (steps 4-6). And the rise of big data has made the data steps more challenging as data sources and types proliferate. Cisco recently announced Cisco Collage, a data sandboxing extension to the Cisco Data Virtualization Suite that takes on this challenge directly, providing the fastest possible path to analytic data.

The Fittest Adapt

With data the new basis of competition, adopting new data integration methods is a must for survival. Innovative organizations understand this opportunity and are already using data virtualization to get ahead of their competition. Data virtualization vendors are also pushing the innovation envelope to increase users' business advantage. Market-leader Cisco is heading the charge, adapting its offerings to support key challenges, including unprecedented scale, self-service business empowerment, and analytic data sandboxing. What is your plan to adapt to today's fast-changing technology environment? Will you stick to the status quo and hope to survive? Or will you win out at the expense of your rivals, joining the fittest by adopting data virtualization?
CAN DATA ANALYSTS BE C-LEVEL LEADERS?
Ted Cuzzillo | Writer
DATA SCIENTISTS ought to be on a march toward permanent seats in the C-suite about now. At least that's what I've thought.
On paper, their broad, deep knowledge of organizations makes them attractive candidates. They know how to wrap business stories around data, to suggest practical action, and to collaborate with decision makers. Jill Dyche, SAS vice president of best practices and a longtime, first-hand observer of such things, says, "If that's not leadership, what is?"
But actual experience seems to vary. Many incumbent leaders -- those who could bless the data-scientist-to-leader path -- instead see specialists as too narrow to see the big picture, and sometimes even too weird to put up with for long. Call it the Analytics Gap. Stanford University contributing professor Blake Johnson (another first-hand observer of data analysts) suggests that the gap may explain business intelligence's failure to penetrate past the estimated five percent of potential users.

One retired technology executive and longtime acquaintance of mine gives a glimpse of executive sentiment: "[Data scientists] are staff folks who, notwithstanding their well-meant efforts, really haven't a clue about how businesses, organizations, or people operate. Data analysts are looking for the certainty, or at least the illusion of certainty, that numbers provide." To lead well, he says, you have to understand you're dealing with a messy and uncertain world. He cites the daily body count during the Vietnam War. By that KPI, the U.S. was winning, and it helped prolong the war.

Does such overconfidence in data necessarily come with the data analyst? Probably not, or at least no more so than overconfidence in one's "instinct" comes with the executive. Executives' "guts" are sometimes flat wrong, such as when a data scientist's personal style is different from the buttoned-down style that to them signals mature reason.

An insurance company executive, who also asked not to be identified, sees the disconnect first hand all the time. His chief data scientist is "terrific at what he does," he says, even "brilliant." The company is lucky to have him. But he never sees the inside of the executive suite. "He is a quirky, quirky guy," says the executive, "and he is a super powerful dude in what he's doing for us." But upper executives would judge him harshly. "There would be a terrible outcome."

"These [analysts], including me," he says, "we're different." The brilliant data scientist -- who as a hobby builds computers from video cards because "there's a lot of power in video cards" -- sometimes makes a bad impression in person. He wanders off topic and,
for example, has been known to digress into world politics and odd analogies. "When I hear something come out his mouth that's a little off color," the boss says, "I chalk it up to his brilliance." The company's top executives, on the other hand, are "a harsh crowd." On the other hand, that crowd needs what it needs, and sometimes that's just to be pointed in the right direction.

Part of the problem, at least, is how data analysts have been trained, says Stephen McDaniel, co-founder of Freakalytics, a training company and publisher. He's trained many data analysts and C-level executives over the years and is the co-author of several books, most recently The Accidental Analyst. "Analysts are very literal," McDaniel says. "Part of it is in how they communicate." Often, analysts respond to questions with full-blown analysis when all the business people wanted was a point in the right direction. McDaniel compares it to asking directions while driving. When you ask a passerby how to get from here to there, "do you want the whole, detailed description? Will you wait half an hour to hear it? No. You want the person to point you in the right direction."

But that's not been the way data analysis has been taught. "The more I teach," he says, "the more I am convinced how wrong-headed the traditional approach to data analysis has been." Many data analysts will gasp at the thought. They're skilled at the exhaustive analysis and all the rigor that goes with it. But the rock-hard belief in the-way-things-are-done is just as poisonous as a leader's belief that data analysts can't see past the numbers.

"It's a matter of managing expectations on both sides," says Jill Dyche. Most analysts need coaching. Many of them overestimate executives' interest in their methods. An instruction before a meeting like "Tell them the what and the why, not the how" might go a long way. Executives would get more of what they need and analysts would feel more appreciated.

"I've seen executives chafe in meetings with analysts." Analysts often leave feeling deflated, assuming that executives didn't appreciate their rigor. "They're often right." In her experience, she says, the problem often has less to do with the analysts and more to do with the intermediaries. "They are the ones grating over the lack of polish and long-windedness."

So why can't the data scientist and the leader go at it face to face? In fact, that's far preferable, according to David Botkin, vice president of business intelligence for Disney Interactive. He spoke in March at a forum led by Blake Johnson that was held at the Department of Management Science and Engineering at Stanford University, "Inside the Data and Analytics-Driven Organization." Organizations should avoid the two common "request and response" processes, he observed. In one process, data scientists try to answer leaders' questions but do so without knowing the context, the needs, or the opportunities. In the other, business people ask for data without knowing all that's available. Far better than either of these is simple, direct interaction along with self-service analytics.

If there's anything we know from even a brief observation of business, it's that every business's culture varies. The "quirky, quirky guy" who's just weird in one place is the quirky, quirky, and revered leader in another. Those deemed "without a clue" in one place seem to know it all in another. Sure, data analysts can be leaders. Sooner or later some analyst will become famous for breaking through to the top floor. Asked in interviews for the secret, that analyst will give the enigmatic answer: "I just wanted to be CEO." Meanwhile, the other analysts will be having too much fun just being data analysts. That's the way they want to lead.
Continued from P6 – that's "any" as in none whatsoever – in BI adoption. This is in spite of more interactive, visual, and compelling dashboards; more self-serviceable and inviting BI tools; and more mobile-optimized BI applications. To wit: BIScorecard's "Successful BI Survey" found that BI adoption in 2012 mirrored that of 2005; in both cases, less than one-quarter of potential information consumers were regularly using or interacting with BI.

Re-imagining Leadership

It's possible that BI simply can't be pushed out to, or taken up by, more than one-quarter of potential users in any organization. This seems unlikely, however. It's more likely that BI adoption is a hard problem. Perhaps not an intractable problem, but an exceptionally tough nut to crack, for all of that. On paper, BI adoption seems like the kind of challenge that screams for leader-y intervention. That it remains a challenge; that -- in spite of the best efforts, determined decision-making, and imaginative thinking of leaders everywhere -- BI adoption has instead languished, is perplexing.

Perhaps it would be helpful to distinguish between "challenges" and "intractabilities." Over time, challenges can become intractabilities; however, the latter do not cease to be challenges – even though we no longer choose to frame them as such. Instead, the allure of the intractable as a challenge – i.e., its appeal to the human imagination – does diminish; far from celebrating the intractable as an ongoing or unmet challenge, we instead wish that it would simply...vanish. Intractabilities embarrass us; they make us uncomfortable.

Today, plenty of brilliant and imaginative people are working to address intractabilities such as poverty and income inequality. They're unlikely ever to "surmount" or "overcome" these challenges. The best that they can hope for is to meet them – if not head-on (as a text on business leadership might insist), then at an angle: obliquely. In this context, brilliant, daring, audacious, and impactful leadership can take the form of "winning" an equilibrium or of moving the proverbial needle. In other words, leaders in the fight against these intractabilities are those who make a difference, however statistically significant; they aren't always clearly or indisputably "overcoming" challenges.

Let's look at another example: in the United States, energy policy is often framed as a choice between "dependence" and "independence." (An alternative is to see energy policy as a choice between sustainable development – via renewable sources – and unsustainable "dependence" on nonrenewable resources, such as fossil fuels. Once again, the axis is dependent/independent.) Some folks contend that by extracting energy from untapped petroleum deposits (such as shale oil and tar sands), by radically expanding existing offshore drilling efforts, and by pushing for a resumption of drilling in the Arctic National Wildlife Refuge (ANWR), we can eliminate our dependence on foreign energy sources.

There's an analogy to BI here, believe it or not. Today, for example, some folks argue that improved data visualization and self-service capabilities, used in combination with new and emerging big data technologies, can help BI to lick its intractable adoption problem. Both cases make use of a drastic oversimplification to frame a complex problem – viz., an intractability – as a clearly defined "challenge;" both likewise have recourse to a technology silver bullet – let's call it a deus ex techne – to address this oversimplified challenge.

The intractable aspect of energy policy doesn't have to do with a dependence on non-renewable resources from non-domestic suppliers; nor with the environmental effects – demonstrable or merely hypothetical – of this policy; nor with the degree to which this policy might be vulnerable to geopolitical disruption or instability. (Little enough is said about it, but an Iranian move to menace the Straits of Hormuz could effectively cripple the international oil trade.) Instead, it has entirely to do with the fact that people disagree about all of these things. It has to do, then, with the people and process problems – e.g., with issues of self-interest, prestige, power, and so on – that are at the core of all disagreements.

To oversimplify a complex problem by condensing it into a neatly framed challenge or into a spectrum of challenges is unhelpful. It ignores the human and organizational problems that are at the root of any and every intractability. What's more, it encourages polarization between and among stakeholders: the more we oversimplify an issue, the more we polarize it. Highly polarized issues can't be easily or satisfactorily negotiated; they can, conversely, be conclusively and irrefragably decided. They also tend to look great on a resume or CV.

For example, a classic response to the intractability of BI adoption is to mandate BI usage. Organizations standardize on a single BI stack, with centralization of all decision support data; prescribe the use of one-size-fits-all tools -- or of tools which presume a single, controlled interaction model; anoint the data warehouse as the end-all, be-all of data access; sanction access to the data warehouse only for select tools, for a suite of tools, or for a certain privileged group of users; identify and mercilessly extirpate any or all rogue BI practices, such as databases, BI discovery tools, and spreadmarts. All of these are variations on the top-down strategy we've been employing -- and doubling down on -- for years. How's that working out for us?
The success of the bring-your-own-device (BYOD) and BI discovery trends – to say nothing of the stubborn persistence of the spreadsheet – demonstrates that BI usage simply cannot be mandated. Not now. Probably not ever. If a putative "user" feels frustrated with a BI tool, she's going to go out-of-band: she's going to look for an alternative. If a person associates the use of a BI tool or the requirements of a BI standardization effort with a diminution of power or prestige (for herself, for her department, for her privileged or at any rate accustomed role in a process), she's going to fight it. You can "mandate" her use of the BI system, but you can't make her willingly or productively use it.

This is a critical distinction. An efficacious approach to BI leadership involves what might be called "decisive" or "difficult" negotiating, as distinct from Walter Mitty-esque decision-making. Real and effective BI leadership takes the form of arbitration or brokering, as well as of pragmatic consensus-building. It might indeed involve the use of (or alignment with) decision-making power – not as a means to impose one's will but (instead) to bring resistant or uncooperative constituencies to heel: to drive or urge consensus, however grudgingly. An effective leader needs to be as much of a Jane Addams-type – i.e., a prototypical social worker – as a Napoleon- or Genghis Khan-type. Unfortunately, it is far easier to daydream about being Jack Welch than about being Jane Addams.

BI leadership in this context is the stuff of routine, tedium, frustration, and exasperation. It's a recipe for boredom – if one loses sight of what's at stake. Or if one is given to Walter Mitty-esque daydreaming.
RESEARCH ... ADVISE ... DEVELOP

Radiant Advisors is a strategic advisory and research firm that networks with industry experts to deliver innovative thought-leadership, cutting-edge publications and events, and in-depth industry research.
rediscoveringBI | Follow us on Twitter! @radiantadvisors