Radiant Advisors Publication
rediscoveringBI
THE BIG DATA HONEYMOON: OVER ALREADY?
BI’S BIG QUESTION: HAS THE BUBBLE BURST?
BIG DATA VS. DATA MANAGEMENT: A ZERO-SUM SCENARIO
BI AND BIG DATA: BRINGING THEM TOGETHER
APRIL 2013, ISSUE 7
AFTER THE BIG DATA PARTY
rediscoveringBI
April 2013, Issue 7
SPOTLIGHT
[P4] The Honeymoon is Over for Big Data
Big data, it turns out, means precisely nothing and imprecisely anything you want it to mean.
[By Dr. Barry Devlin]
FEATURES
[P8] Has the Big Data Bubble Burst?
The BI industry is abuzz with one new question: is big data done?
[By Krish Krishnan]
[P10] Twilight of the (DM) Idols
Big data is already a disruptive force: at once democratizing, reconfiguring, and destructive.
[By Stephen Swoyer]
[P16] Bringing BI and Big Data Together
Three things that make a “big” difference when implementing big data.
[By John O’Brien]
EDITOR’S PICK
[P7] Big Data Revolution
Are we becoming no more than sentient founts of data? Mayer-Schönberger and Cukier put the pulse back in the Big Data Conversation.
[By Lindy Ryan]
SIDEBAR
[P13] Big Data, Big Impact
Two use cases for big data are having a Big Impact, at least from a data management perspective.
[P15] A Kludge Too Far?
[By Stephen Swoyer]
2 • rediscoveringBI Magazine • #rediscoveringBI
FROM THE EDITOR
“Big data” is being routinely paired with proportionally “big” descriptors: innovative, revolutionary, and even (in this editor’s humble opinion, the mother-of-all-big-descriptors) transformative. With the inexorable momentum of cyber-journalism keeping it affixed atop industry headlines, big data has indeed earned itself quite the reputation, complete with a stalwart following of industry pundits, papers, and conferences. In fact, the whole big-data-thing has mutated into a sort of larger-than-life caricature of promise and possibility – and, of course, power.
Let’s face it: big data is the Incredible Hulk of BI -- gargantuan, brilliant, and, yes, sometimes even a bit hyper-aggressive. For many of the mild-mannered, Bruce Banner-esque data analysts out there, big data is, for better or worse, the remarkably regenerative, impulsive alter ego of the industry, eager to show with brute force just how much we can really do with all our data – what the tangible extent of all that big data power is, so to speak. Yet, as with the debut of Stan Lee’s destructive antihero in The Incredible Hulk #1, in early 2013 we haven’t yet begun to see what big data can really do. Not even close. In this month’s edition of rediscoveringBI, authors Dr. Barry Devlin, Krish Krishnan, Stephen Swoyer, and John O’Brien each explore a different facet of that very construct, asking whether the big data bubble has actually burst, or whether its honeymoon phase is simply over. How much is just hype? And how much is only a precursor to what we’ll continue to see buzzing around and reinventing the industry?
What’s next for BI’s Incredible Hulk antihero, Big Data?
Lindy Ryan
Editor in Chief
Editor in Chief: Lindy Ryan, lindy.ryan@radiantadvisors.com
Contributor: Dr. Barry Devlin, barry@9sight.com
Contributor: Krish Krishnan, rkrish1124@yahoo.com
Contributor: John O’Brien, john.obrien@radiantadvisors.com
Distinguished Writer: Stephen Swoyer, stephen.swoyer@gmail.com
Art Director: Brendan Ferguson, brendan.ferguson@radiantadvisors.com
For More Information: info@radiantadvisors.com
Radiant Advisors
OPINION
LETTERS TO THE EDITOR
On: Time for an Architectural Reckoning (rediscoveringBI, March 2013, Issue 6)
Evolution vs. Revolution
This is an excellent, thought-provoking article. I believe that you are correct in the assertion that an Architectural Reckoning is underway. In fact, I believe it has been underway for at least 10 years. To focus on technology in general and Hadoop in particular is, however, to miss the point. The Reckoning is being driven by the intersection of business needs and technology advances. Both sides can be summed up as “faster and smarter” – and they are mutually reinforcing. I call this the “biz-tech ecosystem.” On the technology side, Hadoop is and will be part of it. So will a range of other data management technologies, including relational databases, for sure. And I believe that the various approaches – column, in-memory, and more – will be combined into a hybrid approach more powerful than any RDBMS we have today. And we will need that, because I am certain that the Data Warehouse – in a new, more circumscribed role, but central to consistency and reliability for information that must be of high quality – will continue to thrive. (And many thanks for the historical positioning of Paul’s and my paper from 1988!) I call this new role “Core Business Information.”
As you also pointed out, it’s not just about data management. What is happening is also changing application development, as well as process and business modeling and implementation. Collaborative and social computing are also vital components of the mix. So, yes, an inter-disciplinary approach will be needed – not just within IT but across the business-IT divide.
We are also in something of a positive feedback loop – and as anyone who has ever put a microphone in front of speakers knows, the result rapidly becomes very unpleasant. So, we do need to step back from the hype of big data and recognize the dangers as well as the opportunities.
My bottom line: yes, we are in a time of Architectural Reckoning (is this the same as a Paradigm Shift?) but continuity of thinking and a mindset of evolution rather than revolution are vital. I’m trying to capture this in my long-awaited (by me, anyway) second book.
- Dr. Barry Devlin (Editor’s Note: See The Honeymoon is Over for Big Data by Dr. Barry Devlin in this month’s issue)
Augmentation of Traditional DWs
I totally agree with Claudia that “not all analytics now belong inside the BI architecture” and that we are in a “very disruptive period of a lot of new technologies flooding in to business intelligence.” I also am not actually that far away from the position Scott Davis takes: I agree that “Hadoop is a hugely transformational technology.” I just think that for the short- to medium-term, Hadoop et al. are going to augment, rather than replace, traditional data warehouses. Will Hadoop replace a traditional data warehouse database in the long term? Only if it adds a lot of database-like features, and then the argument becomes a lot less interesting – something akin to the “Will Ingres/Informix/Sybase replace Oracle?” debate of yesteryear.
My main concern is how customers are going to embrace this new data landscape, rather than whether they are going to. How are organizations going to build a data landscape that includes Teradata, Aster, and Hadoop? How are they going to manage Analysis Services cubes and a smattering of legacy Oracle data warehouses?
Data warehouses currently take too long to build and are too hard to change. The new architectural changes are going to make things worse, not better.
Yes, WhereScape does have a stake in the game – although not in the status quo. Regardless of the platform, design, and technology, the need to deliver quickly without compromise remains the same. Who wants to manually build out a multiple-platform data warehouse? A data warehouse automation environment (such as WhereScape RED) helps simplify the approach, and I believe it is a key piece of the new architecture.
- Michael Whitehead (Editor’s Note: Michael Whitehead is the CEO and Founder of WhereScape)
Have something to say? Send your letters to the editor at lindy.ryan@radiantadvisors.com
Upcoming Webinar: Inside Analysis
MODERN DATA PLATFORMS
Inside Analysis with Dr. Robin Bloor and John O’Brien, hosted by Eric Kavanagh
APRIL 17, 3:00PM CST
Agile and flexible -- those might well be the mantras of Modern Data Platforms. As organizations look to harness the latest advances in analytics and integration technologies, the focus turns quite sharply to architecture: the right data platform can empower companies to harness everything from Big Data to real-time, all without sacrificing data quality and governance. Register for this free Webcast to catch a preview of SPARK!: Modern Data Platforms, a three-day seminar series to be held in Austin, TX, from April 29 to May 1. The seminar will feature a tag-team of experts from Radiant Advisors and The Bloor Group, who will provide detailed instruction on the range of activities associated with modernizing and evolving robust data platforms. John O’Brien of Radiant will focus on Rediscovering BI, while Dr. Robin Bloor of The Bloor Group will discuss the Event-Driven Architecture. Attendees of the Webcast will receive a discount code for $150 off the in-person seminar. Follow the conversation #sparkevent
http://www.bigdatabootcamp.net
May 21-22, New York Hilton
Don’t miss Radiant Advisors’ John O’Brien as he keynotes the upcoming Big Data Boot Camp. John will offer perspective into the dynamics and current issues being encountered in today’s Big Data analytic implementations, as well as the most important and strategic technologies currently emerging to meet the needs of the “Big Data Paradigm.” Join John and other Big Data experts as they converge upon New York, and be sure to save an extra $100 off the early bird rate by using this link. Early bird registration ends April 19.
SPOTLIGHT
THE HONEYMOON IS OVER FOR BIG DATA
[The term big data has passed its use-by date]
DR. BARRY DEVLIN
BIG DATA IS tumbling into the “Trough of Disillusionment,” according to Gartner’s Svetlana Sicular. If you fear that this means the end of the road for big data, see Mark Beyer’s (co-lead for Gartner big data research) remedial education on the meaning of Gartner’s Hype Cycle curve, although they might have chosen a less alarmist phrase! Let me put it another way: the big data honeymoon is over. Let’s quickly review the history of the romance before looking to the future of the relationship.
For commercial computing, big data “dating” really began in the mid-2000s, when technical people in the burgeoning web business began to consider new ways to handle the exploding amounts and types of data generated by web usage. Before then, big data had been the dream -- or nightmare, actually -- of the scientific community where, from genetics to astrophysics, instrumentation was spewing data. In early 2008, the commercial romance of big data really began to get serious when Hadoop, the yellow-elephant poster child of big data, was accepted as a top-level Apache project. The marketing paparazzi began stalking the couple soon after and, true to paparazzi nature, have been publishing a stream of outrageous claims and pictures ever since. By 2012, a shotgun wedding with business was hastily arranged. By then the gloss had begun to wear off, and the honeymoon was washed out in a brief trip to Atlantic City at the height of a super storm.
Enough Of The Past: Let’s Look Forward!
Big data does offer real and realizable business benefits, but there is one major issue: what actually is big data? The “volume, variety, and velocity” nomenclature, which Doug Laney traces to his 2001 Meta Group research note, is useful shorthand at best. In reality, each attribute raises the question of how far along its scale data must be in order to be called big -- how vast, how disparate, how fast? Furthermore, what combination of these three factors should be used in making a call? Big data, it turns out, means precisely nothing and imprecisely anything the Mad Men want it to mean. And, with the various additional “v-words” vaunted by vendors, the value vanishes. (Oops, I veered into the v-v-verge there!)
The extent of this terminology problem was made clear in a big data survey conducted last fall by EMA and me. Participants were those who declared they were investigating or implementing big data projects, yet almost a third of respondents classed the data source for their projects as “process-mediated data” -- data originating from traditional operational systems. My conclusion: the term big data has passed its use-by date. Big data and “small data” are conceptually one and the same: just data, all data. Or, to be more semantically correct, all information, as I’ll explain in a new book later this year. (Editor’s Note: Business Unintelligence: Via Analytics, Big Data and Collaboration to Innovative Business Insight will be published in Q3 2013 by Technics Publications.)
To be clear, I don’t consider that big data has taken us into a dead end. Rather, it has usefully exposed the fact that our traditional business intelligence (BI) view of the information available to and used by business is woefully inadequate. It has caused me to revisit many underlying assumptions about information, and I now see that there exist three domains of information that future business intelligence/analytics must handle, as shown in the accompanying figure: human-sourced information, process-mediated data, and machine-generated data. These domains are fundamentally different in their usage characteristics and in the demands they place on technology. The terms are largely self-explanatory, but more information can be found in my white paper. (See Barry Devlin’s The Big Data Zoo - Taming the Beasts: The need for an integrated platform for enterprise information.) The bottom line is that we need a new architecture for information -- all of it and its entire life cycle in business.
The Biz-Tech Ecosystem
Both challenges and opportunities emerge as we shift the view from IT to business. The biggest challenge in the big data/analytics scene is the alleged dearth of so-called “data scientists.” How different are data scientists from the power users we’ve known in BI for decades? Arguably, the only substantive difference is deep statistical skill. The other characteristics mentioned -- data munging, business acumen, and storytelling -- are all common to power users. Statistics, however, is a very specialized skill whose application should, in principle, be tightly supervised to ensure it is valid and proper. The phrase “lies, damned lies, and statistics” indicates the problem: statistics are far too easy to misuse -- deliberately or otherwise.
Moreover, we seem to have blindly accepted an assertion that the exponential growth in data volumes implies a similar growth in hidden nuggets of useful business knowledge. This is unlikely to be true, as most of the good examples of business value coming from big data illustrate: real value emerges from a new type or new combination of data, while growth in volumes leads to incremental increases in value, at best.
These challenges aside, a focus on novel (big) data use does drive opportunities for new businesses, business models, or, simply, ways to compete. A useful, cross-industry categorization (courtesy of IBM) of these opportunities is:
• Big Data Exploration: analyze “big data” to identify new business opportunities
• Enhanced 360° View of the Customer: incorporate human-sourced information sources, such as call center logs and social media, into traditional CRM approaches
• Security and Intelligence Extension: lower risk, detect fraud, and monitor cyber security in real-time, machine-generated data
• Operations Analysis: analyze and use machine-generated data to drive immediate business results
• Data Warehouse Augmentation: increase operational efficiency by integrating big data with BI
This focus on (big) data is but the latest stage in the evolution of what I call the biz-tech ecosystem -- the symbiotic relationship between business and IT that drives all successful, modern businesses. Every business advance worth mentioning in the past twenty years has had technology, and almost always information technology, at its core. On the other hand, many of the advances in IT have been driven by business demands. The relative roles of business and IT people may change as the process evolves, but that process is set to continue. And at its heart are the collection, creation, and use of information, as opposed to data -- big or small -- as mandatory, core competencies of modern business.
Dr. Barry Devlin is Founder and Principal of 9sight Consulting, and is among the foremost authorities on business insight and big data. He is a widely respected analyst, consultant, lecturer, and author.
EDITOR’S PICK
BIG DATA
LINDY RYAN
WHILE IT’S INARGUABLE that the phenomenon known as “big data” is rapidly reknitting the very fabric of our lives, what we are just now beginning to see and to understand – to appreciate – is how.
Yet, so often our conversations about big data focus on these “how’s” in the abstract – on its benefits, potentials, and opportunities, and likewise, its risks, challenges, and implications – that we overlook the simpler, more primordial question: what’s not changing?
It’s a simple question that requires a simple answer. Us. Sure, we can assert that we’re becoming more data-dependent. We generate more data: last month, social media giant Twitter blogged that its more than 200 million active users generate over 400 million tweets per day.1 We consume more data: a now-outdated University of California report calculated that American households collectively consumed 3.6 zettabytes of information in 2008.2 Are we – the data-generating organisms that we are – becoming no more than sentient founts of data?
In Big Data: A Revolution That Will Transform How We Live, Work, and Think, authors Viktor Mayer-Schönberger and Kenneth Cukier effectively put the pulse back in the Big Data Conversation: “big data is not an ice-cold world of algorithms and automatons...we [must] carve out a place for the human: to reserve space for intuition, common sense, and serendipity to ensure that they are not crowded out by data and machine-made answers.”
In our brief email exchange, Mayer-Schönberger elaborated a bit more on this idea. “[We] try to understand the (human) dimension between input and output,” he noted. “Not through the jargon-laden sociology of big data, but through what we believe is the flesh and blood of big data as it is done right now.”
With the elegance of an Oxford University professor and The Economist’s data editor – Mayer-Schönberger and Cukier, respectively – Big Data’s authors remind us that it is our human traits of “creativity, intuition, and intellectual ambition” that should be fostered in this brave new world of big data. That the inevitable “messiness” of big data can be directly correlated to the inherent “messiness” of being human. And, most important, that the evolution of big data as a resource and tool derives from (is a function of) the distinctly human capacities of instinct, accident, and error, which manifest, even if unpredictably, in greatness. In that greatness is progress.
That – progress – is the intrinsic value of big data. It’s what’s so compelling about Big Data (both the book and the thing itself): it’s not always about the inputs or outputs, but the space – or, what Mayer-Schönberger calls the “black box” – of the in-between.
Lindy Ryan is Editor in Chief of Radiant Advisors.
Big Data is available on Amazon and the Radiant Advisors eBookshelf www.radiantadvisors.com/ebookshelf
1 http://blog.twitter.com/2013/03/celebrating-twitter7.html
2 How Much Information? http://hmi.ucsd.edu/howmuchinfo.php
FEATURES
HAS THE BIG DATA BUBBLE BURST?
[The BI industry is abuzz with one new question: is big data done?]
KRISH KRISHNAN
RECENT ARTICLES IN leading business publications, a hype-cycle presentation by Gartner, and a number of blogs have all startled the world of big data by asking one “big” question: are we done? Did the big data bubble burst even quicker than the “dot com” bubble? Has the big data bubble burst? The answer is: not really. If anything, the market for infrastructure is booming, with more vendors distributing commercial versions of open source software (like Hadoop and NoSQL). We are seeing the evolution of new consulting practices focused on analytics and – perhaps most important – traditional database vendors have all either embraced or announced support for big data platforms. So, what is the basis of this notion of failure or disappointment around the big data space?
The Promised Land
In 2004, Google’s publication of its MapReduce paper, which built on its earlier description of the Google File System, started a flurry of activity building platforms aimed at solving scalability problems. One of these projects was “Nutch,” an open source, parallel search engine. The team at Nutch succeeded in building the infrastructure that attracted Yahoo to sponsor and incubate the project under a new name: Hadoop. Spun off from Nutch as a separate Apache open source project in 2006, Hadoop quickly gained a reputation as the panacea for all data scalability problems. Since then it has become a viable platform for large-scale computing needs and has been adopted as a data storage and processing platform at many companies across the world. Subsequently, the last four years have also seen the evolution of NoSQL databases and multiple other technologies on the Hadoop framework.
The Reality
Hadoop’s early adopters did not fully understand the complexities of the platform until they began implementing the technology, and this lack of understanding has inevitably spurred a sense of failure (or disappointment). Among the potential gaps not clearly understood by adopters:
One size does not fit all: Big data technologies were developed to solve the problems of extreme scalability and sustained performance. While these technologies have certainly overcome the traditional limitations of database-oriented data processing, the same techniques cannot simply be extended to every problem in the same realm.
MapReduce skill availability: To effectively use most of the big data platforms, one has to be able to write some amount of MapReduce code; however, this is an area where skills are evolving and (still) scarce.
Programming dependence: Many corporations are unable to adjust to the idea of having teams design and develop code (or data processing) – much like application software development. Standardization of programming techniques for big data is still maturing.
Business case: Most early adopters did not have a robust business case, or, in many cases, the right business case to implement on these platforms. The lack of an end-state solution -- or usage and ROI expectations -- has led to longer development and implementation cycles.
Hype: Continued hype about the technology has caused unrest amongst executives, line of business owners, IT, and business users, leading to frequent misunderstanding of the platform’s capabilities as well as incorrect ROI or TCO expectations.
But wait: it is not “all over” when we talk about big data; rather, we have come to the point in time where the reality of the platform – and how to drive its adoption within corporations – has started settling down. The big data bubble is alive and well; in fact, it’s even progressing in the right direction.
How to Integrate Big Data
As corporations begin to see beyond the hype of big data, everyone from the executive sponsor to the implementation team is beginning to recognize the need to lay a better foundation for integrating big data. There are a few subtle yet invaluable pointers in this process:
1. Build the business case and keep it simple
2. Create a data discovery environment that can be used by line of business experts
3. Identify the data and patterns that are needed to create a robust foundation for analytics
4. Create the initial analytics based on the data discovery
5. Visualize the data in a mash-up platform using semantic data integration techniques
6. Get the business users to use the outcomes
7. Gain user adoption
8. Create a roadmap for the larger program
While the overall process of big data integration seems closely aligned to the integration of any other project, there are key differences that can define the success of the big data bubble in your corporation: data discovery, data analysis, and data visualization. These three integral pillars will clearly identify the basis of how to implement big data and monetize such an exercise.
The Future
Several technology providers have announced their support of big data platforms, including Datastax (Cassandra), Intel, Microsoft, EMC and HP (Hadoop), 10Gen (MongoDB), and Cray (YARC Graph Analytics DB). These vendors -- along with existing vendors -- will undoubtedly continue to provide more options and solution platforms for deploying and integrating big data technologies within the enterprise platform.
The big data bubble has not burst; it is still only beginning and will reach various levels of maturity over the following years. There are many layers of complexity and intricacy that need to be defined and formalized, but this is where the evolution and opportunities exist.
Krish Krishnan is a globally recognized expert in the strategy, architecture, and implementation of big data. His new book Data Warehousing in the Age of Big Data will be released in August 2013.
[Big Data Vs. Data Management]
TWILIGHT OF THE (DM) IDOLS
STEPHEN SWOYER
SOME IN THE INDUSTRY ARE already writing epitaphs for big data. Others – a prominent market watcher comes to mind – argue that big data, like so many technologies or trends before it, is simply conforming to well-established patterns: following a period of hype, it’s undergoing a correction. It’s regressing toward a mean. That was fast.
This doesn’t concern us. Big data is an epistemic shift. It’s going to transform how we know and understand – how we perceive – the world. What’s meant by the term “big data” is a force for destabilizing and reordering existing configurations – much as the Bubonic Plague, or Black Death, was for the Europe of the late-medieval period. It’s an unsettling analogy, but it underscores an important point: the phenomenon of big data, like that of the Black Death, is indifferent to the hopes, prayers, expectations, or innumerate prognostications of human actors. It’s inevitable. It’s going to happen. It’s going to change everything.
Even as the epitaphs are flying, the magic quadrants being plotted, and the opinions being mongered, big data is changing (chiefly by challenging) the status quo. This is particularly the case with respect to the domain of data management (DM) and its status quo. Here, big data is already a disruptive force: at once democratizing, reconfiguring, and destructive. We’ll consider its reordering effect through the prism of Hadoop, which, in the software development and data management worlds, has to a real degree become synonymous with what’s meant by “big data.”
The Citadel of Data Management
Big data has been described as a wake-up call for data management (DM) practitioners.
If we’re grasping for analogies, the big data phenomenon seems less like a wake-up call than...a grim tableau straight out of 14th-century France.
This was the time of the Black Death, which was to function as an enormous force for social destabilization and reordering. It was also the time of the Hundred Years War, which was fought between England and France on French soil. The manpower shortage of the invading English was exacerbated by the virulence of the Plague, which historians estimate killed between one-third and two-thirds of the European population. Outmanned – and outwoman-ed, for that matter, once Joan of Arc erupted onto the scene – the English resorted to a time-tested tactic: the chevauchée.
The logic of the chevauchée is fiendishly simple: Edward III’s English forces were resource-constrained; they enjoyed neither the manpower nor the defensive advantages – e.g., castles, towers, or city walls – that accrued (by default) to the French. The English achieved their best outcomes in pitched battle; the French, on the other hand, were understandably reluctant to relinquish their fortifications, fixed or otherwise. The challenge for the English was to draw them out to fight. Enter the chevauchée. It describes the “tactic” of rampaging and pillaging – among other, far more horrific practices – in the comparatively defenseless French countryside. Left unchecked, the depredations of the chevauchée could ultimately constitute a threat to a ruler’s hegemony: fealty counts for little if it doesn’t at least afford one protection from other would-be conquerors. As a tactical tool, the chevauchée succeeded by challenging the legitimacy of a ruling power.
Hadoop has had a similar effect. For the last two decades, the data management (DM) or data warehousing (DW) Powers That Be have been holed up in their fortified castles, dictating terms of access, dictating terms of ingest, and dictating timetables and schedules, almost always to the frustration of the line of business, to say nothing of other IT stakeholders. Though Hadoop wasn’t conceived tactically, its adoption and growth have had a tactical aspect. By running amok in the countryside, pillaging, burning, and destroying stuff – or, by offering an alternative to the data warehouse-driven BI model – the hordes of Hadoop have managed to drag the Lords of DM into open battle.
At last year’s Strata + Hadoop World confab in New York, NY, a representative with a prominent data integration (DI) vendor shared the story of a frustrated customer that it says had developed – perforce – an especially ambitious project focusing on Hadoop.
The salient point, this vendor representative indicated, was that the business and IT stakeholders behind the project saw in Hadoop an opportunity to upend the power and authority of the rival DM team. “It’s almost like a coup d’etat for them,” he said, explaining that both business stakeholders and software developers were exasperated by the glacial pace of the DM team’s responsiveness. “[T]hey asked how long it would take to get source connectivity [for a proposed application and] they were told nine months. Now they just want to go around them [i.e., the data management group],” this representative said. “[T]hey basically want Hadoop to be their new massive data warehouse.”
The Zero-Sum Scenario
This zero-sum scenario sets up a struggle for information management supremacy. It proposes to isolate DM altogether; eventually it would starve the DM group out of existence. It views DM not as a potential partner for compromise, but as a zero-sum adversary. It’s an extremist position, to be sure; it nevertheless brings into focus the primary antagonism that exists between software-development and data-management stakeholders. This antagonism must be seen as a factor in the promotion of Hadoop as a general-purpose platform for enterprise data management.
Hadoop was created to address the unprecedented challenges associated with developing and managing data-intensive distributed applications. The impetus and momentum behind Hadoop originated with Web or distributed application developers. To some extent, Hadoop and other big data technology projects are still largely programmer-driven efforts. This has implications for their use on an enterprise-wide scale, because software developers and data management practitioners have very different worldviews. Both groups are accustomed to talking past one another. Each suspects the other of giving short shrift to its concerns or requirements.
In short, both groups resent one another. This resentment isn’t symmetrical, however; there’s a power imbalance. For a quarter century now, the DM group hasn’t just managed data – it’s been able to dictate the terms and conditions of access to the data that it manages. In this capacity, it’s been able to impose its will on multiple internal constituencies: not only on software developers, but on line-of-business stakeholders, too. The irony is that the perceived inflexibility and unresponsiveness – the seeming indifference – of DM stakeholders has helped to bring together two other nominally antagonistic camps; in their resentment of DM, software developers and the line of business have been able to find common cause.

Few would deny that stakeholders jealously guard their fiefdoms. This is as true of software developers and the line of business as it is of their counterparts in the DM world. Part of the problem is that DM is viewed as an unreasonable or uncompromising stakeholder: for example, DM practitioners have been unable to meaningfully communicate the logic of their policies; they’ve likewise been reluctant – or in some cases, unwilling – to revise these policies to address changing business requirements. In addition, they’ve been slow to adopt technologies or methods that promise to reduce latencies or which propose to empower line-of-business users. Finally, DM practitioners are fundamentally uncomfortable with practices – such as analytic discovery, with its preference for less-than-consistent data – which don’t comport with data management best practices.

Hadoop and Big Data in Context
That’s where the zero-sum animus comes from. It explains why some in business and IT champion Hadoop as a technology to replace – or at the very least, to displace – the DM status quo. There’s a much more pragmatic way of looking at what’s going on, however.

This is to see Hadoop in context – i.e., at the nexus of two related trends: viz., a decade-plus, bottom-up insurgency, and a sweeping (if still coalescing) big data epistemic shift.

The two are related. Think back to the Bubonic Plague, which had a destabilizing effect on the late-Medieval social order. The depredations of the Plague effectively wiped out many of the practices, customs, and (not to put too fine a point on it) human stakeholders that might otherwise have contested destabilization.

The Plague, then, cleared away the status quo ante, creating the conditions for change and transformation. Big data has had a similar effect in data management – chiefly by raising questions about the warehouse’s ability to accommodate disruptions (e.g., new kinds of data and new analytic use cases) for which it wasn’t designed. Simply by claiming to be Something New, big data raised questions about the DM status quo.

This challenge was exploited by well-established insurgent currents inside both the line of business and IT. The former has been fighting an insurgency against IT for decades; however, in an age of pervasive mobility, BYOD, social collaboration, and (specific to the DM space) analytic discovery, this insurgency has taken on new force and urgency.

IT, for its part, has grappled with insurgency in its own ranks: the agile movement, which most in DM associate with project management, began as a software development initiative; it explicitly borrowed from the language of political revolution – the seminal agile document is the “Manifesto for Agile Software Development,” published in 2001 by Kent Beck and 16 co-signatories – in championing an alternative to software development’s top-down, deterministic status quo.

Agility and insurgency have been slower to catch on in DM. Nevertheless, insurgent pressure from both the line of business and IT is forcing DM stakeholders (and the vendors who nominally service them) to reassess both their strategies and their positions.

However far-fetched, the possibility of a Hadoop-led chevauchée in the very heart of its enterprise fiefdom – with aid and comfort from a line-of-business class that DM has too often treated more as peasants than as enfranchised citizens – snagged the attention of data management practitioners. Big time.

Reinvention
The Hadoop chevauchée got the attention of DM practitioners for another reason.

In its current state, Hadoop is no more suited for use as a general-purpose, all-in-one platform for reporting, discovery, and analysis than is the data warehouse. (See Sidebar: A Kludge Too Far?)

Given the maturity of the DW, Hadoop is arguably much less suited for this role. For all of its shortcomings, the data warehouse is an inescapably pragmatic solution; DM practitioners learned what works chiefly by figuring out (Continued on p. 21)
AUSTIN, TX
#sparkevent
April 29 - May 1
At the Omni Downtown in Austin
Day One | Designing Modern Data Platforms
These sessions provide an approach to confidently assess and make architecture changes, beginning with an understanding of how data warehouse architectures evolve and mature over time, balancing technical and strategic value delivery. We break down best practices into principles for creating new data platforms.

Day Two | Modern Data Integration
These sessions provide the knowledge needed to understand and model data integration frameworks, so that you can make confident decisions as you approach, design, and manage evolving data integration blueprints that leverage agile techniques. We identify data integration patterns for refactoring into optimized engines.

Day Three | Databases for Analytics
These sessions review several of the most significant trends in analytic databases challenging BI architects today. Cutting through the definitions and hype of big data in the market, they show how NoSQL databases offer a solution for a variety of data warehouse requirements.

Register now at: http://radiantadvisors.com
CAN'T MAKE IT? Catch us in San Francisco from May 28-30. Registration opens April 22nd. Use the priority code ReBI to save $150.
Featured Keynotes By:
John O’Brien, Founder and CEO, Radiant Advisors
Dr. Robin Bloor, Co-Founder and Principal Analyst, The Bloor Group
SIDEBAR
BIG DATA: BIG IMPACT [STEPHEN SWOYER]
The most common big data use cases tend to be less sexy than mundane.

Even so, two use cases for which big data is today having a Big Impact have decidedly sexy implications, at least from a data management (DM) perspective.

Both use cases address long-standing DM problems; both likewise anticipate issues specific to the age of big data. The first involves using big data technologies to supercharge ETL; the second, using them as a landing zone – i.e., a general-purpose virtual storage locker – for all kinds of information.

Of the two, the first is the more mature: technologists have been talking up the potential of supercharged ETL almost from the beginning.

Back then, this was framed largely in terms of MapReduce, the mega-scale parallel processing algorithm popularized by Google. Five years on, the emphasis has shifted to Hadoop itself as a platform for massively parallel ETL processing.

The rub is that performing stuff other than map and reduce operations across a Hadoop cluster is kind of a kludge. (See Sidebar: A Kludge Too Far?)

However, because ETL processing can be broken down into sequential map and reduce operations, data integration (DI) vendors have managed to make it work. Some DI players – e.g., Informatica, Pervasive Software, SyncSort, and Talend, among others – market ETL products for Hadoop. Both Informatica and Talend – along with analytic specialist Pentaho Inc. – use Hadoop MapReduce to perform ETL operations. Pervasive and SyncSort, on the other hand, tout libraries that they say can be used as MapReduce replacements. The result, both vendors claim, is ETL processing that’s (a) faster than vanilla Hadoop MapReduce and (b) orders of magnitude faster than traditional enterprise ETL.

This stuff is available now. In the last 12 calendar months, both Informatica and Talend announced “big data” versions of their ETL technologies for Hadoop MapReduce; Pervasive and SyncSort have marketed Hadoop-able versions of their own ETL tools (DataRush and DMExpress, respectively) for slightly longer. In every case, big data ETL tools abstract the complexity of Hadoop: ETL workflows are designed in a GUI design studio; the tools themselves generate jobs in the form of Java code, which can be fed into Hadoop.

Just because the technology’s available doesn’t mean there’s demand for it.

Parallel processing ETL technologies have been available for decades; not everybody needs or can afford them, however. David Inbar, senior director of big data products with Pervasive, concedes that demand for mega-scale ETL processing used to be specialized.

At the same time, he says, usage patterns are changing; analytic practices and methods are changing. So, too, is the concept of analytic scale: scaling from gigabyte-sized data sets to dozens or hundreds of terabytes – to say nothing of petabytes – is an increase of several orders of magnitude. In the emerging model, rapid iteration is the thing; this means being able to rapidly prepare and crunch data sets for analysis.
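The sidebar’s point that ETL processing can be broken down into map and reduce operations can be made concrete with a small, single-process Python sketch. It is illustrative only and is not how Informatica, Talend, Pentaho, or any other vendor’s tool generates jobs: the CSV layout, the cleansing rule, and the per-store revenue aggregation are all hypothetical. The map step extracts and transforms each record, a shuffle groups the emitted values by key, and the reduce step aggregates each group into a loadable row; on a real Hadoop cluster, the framework performs the shuffle and spreads map and reduce tasks across nodes.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator

# Hypothetical raw input: "store_id,sku,quantity,unit_price" lines.
RAW_LINES = [
    "s1,widget,2,3.50",
    "s2,gadget,1,10.00",
    "s1,gadget,3,10.00",
    "s2,widget,bad,3.50",  # malformed quantity; dropped in the map step
    "s1,widget,1,3.50",
]


def map_phase(line: str) -> Iterator[tuple[str, float]]:
    """Extract and transform one record, emitting a (key, value) pair."""
    parts = line.split(",")
    if len(parts) != 4:
        return  # wrong field count: emit nothing
    store, _sku, qty, price = parts
    try:
        revenue = int(qty) * float(price)
    except ValueError:
        return  # failed type conversion: emit nothing
    yield store, revenue


def shuffle(pairs: Iterable[tuple[str, float]]) -> dict[str, list[float]]:
    """Group values by key; Hadoop does this between the map and reduce phases."""
    groups: dict[str, list[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[float]) -> tuple[str, float]:
    """Aggregate one key's values into a row ready to load (total revenue)."""
    return key, round(sum(values), 2)


def run_etl(lines: Iterable[str]) -> dict[str, float]:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(run_etl(RAW_LINES))  # {'s1': 40.5, 's2': 10.0}
```

Because each map call looks at one record and each reduce call looks at one key, both steps can in principle run in parallel across many machines, which is what makes the decomposition attractive for mega-scale ETL.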
Nor is analysis a one-and-done affair, says Inbar: it’s iterative.

“What really matters is not so much if it uses MapReduce code or if it uses some other code; what really matters is does it perform and does it save you operational money – and can you actually iterate and discover patterns in the first place faster than you would be able to otherwise?” he asks. “It’s always possible to write custom code to get stuff done. Ultimately it’s a relatively straightforward [proposition]: [manually] stringing together SQL code [for traditional ETL] or Java code [for Hadoop] can work, but it’s not going to carry you forward.”

However, one of the data warehouse’s (DW) biggest selling points is also its biggest limiting factor.

The DW is a schema-mandatory platform. It’s most comfortable speaking SQL. It uses a kludge – i.e., the binary large object (BLOB) – to accommodate unstructured, semi-structured, or non-traditional data types. Hadoop, by contrast, is a schema-optional platform.

For this reason, many in DM conceive of Hadoop as a virtual storage locker for big data.

“You can drop any old piece of data on it without having to do any of the upfront work of modeling the data and transforming it [to conform to] your data model,” explains Rick Glick, vice president of technology and architecture with analytic discovery specialist ParAccel. “You can do that [i.e., transform and conform] as you move the data over.”

At a recent industry event, several vendors – viz., Hortonworks, ParAccel, and Teradata – touted Hadoop as a point of ingest for all kinds of information. This “landing zone” scenario is something that customers are adopting right now, says Pervasive’s Inbar; it has the potential to be the most common use case for Hadoop in the enterprise.

“Before you can do all of the amazing/glamorous/groundbreaking analytical work … and innovation, you do actually have to land and ingest and provision the data,” he argues. “Hadoop and HDFS are wonderful in that they let you [store data] without having predefined what it is you think you’re going to get out of it. Traditionally, the data warehouse requires you to predefine what you think you’re going to get out of it in the first place.”

SIDEBAR:
A KLUDGE TOO FAR? [STEPHEN SWOYER]
The problem with MapReduce – to invoke a shopworn cliché – is that it’s a hammer.

From its perspective, any and every distributed processing task wants and needs to be nailed. If Hadoop is to be a useful platform for general-purpose parallel processing, it must be able to perform operations other than synchronous map and reduce jobs.

The problem is that MapReduce and Hadoop are tightly coupled: the former has historically functioned as parallel processing yin to the Hadoop Distributed File System’s storage yang.

Enter the still-incubating Apache YARN project (YARN is a backronym for “Yet Another Resource Negotiator”), which aims to decouple Hadoop from MapReduce.

Right now, Hadoop’s JobTracker facility performs two functions: resource management and job scheduling; YARN breaks JobTracker into two discrete daemons. From a DM perspective, this will make it possible to perform asynchronous operations in Hadoop; it will also enable pipelining, which – to the extent it’s possible in Hadoop today – is typically supported by vendor-specific libraries.

YARN’s been a long time coming, however: it’s part of the Hadoop 2.0 framework, which is still in development. Given what’s involved, some in DM say YARN’s going to need seasoning before it can be used to manage mission-critical, production workloads.

That said, YARN is hugely important to Hadoop. It has support from all of the Hadoop Heavies: Cloudera, EMC, Hortonworks, Intel, MapR, and others.

“It feels like it’s been coming for quite a while,” concedes David Inbar, senior director of big data products with data integration specialist Pervasive Software. “All of the players … are in favor of it. Customers are going to need it. If as a sysadmin you don’t have a unified view of everything that’s running and consum[ing] resources in your environment, that’s going to be suboptimal,” Inbar continues. “So YARN is a mechanism that’s going to make it easier to manage [Hadoop clusters]. It’s also going to open up the Hadoop distributed data and processing framework to a wider range of compute engines and paradigms.”
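The difference between strictly synchronous map and reduce stages and the pipelining that YARN is expected to enable can be illustrated with a toy Python analogy. This is a conceptual sketch only, under the assumption of two hypothetical stages (`parse` and `load`) over integer records; it does not use or model any YARN, Hadoop, or vendor API. In the staged version, the second step cannot start until the first has finished and materialized its entire output; in the pipelined version, generators hand each record downstream as soon as it is produced, so the two stages interleave.

```python
from typing import Iterable, Iterator


def staged(records: list[int], log: list[str]) -> list[int]:
    """Barrier-synchronous: each stage finishes and materializes before the next starts."""
    parsed = []
    for r in records:
        log.append(f"parse {r}")
        parsed.append(r * 10)  # hypothetical transformation
    loaded = []
    for r in parsed:
        log.append(f"load {r}")
        loaded.append(r)
    return loaded


def parse(records: Iterable[int], log: list[str]) -> Iterator[int]:
    for r in records:
        log.append(f"parse {r}")
        yield r * 10  # same hypothetical transformation, one record at a time


def load(records: Iterable[int], log: list[str]) -> Iterator[int]:
    for r in records:
        log.append(f"load {r}")
        yield r


def pipelined(records: list[int], log: list[str]) -> list[int]:
    """Pipelined: each record flows downstream as soon as it is produced."""
    return list(load(parse(records, log), log))


if __name__ == "__main__":
    staged_log: list[str] = []
    pipelined_log: list[str] = []
    staged([1, 2], staged_log)
    pipelined([1, 2], pipelined_log)
    print(staged_log)     # ['parse 1', 'parse 2', 'load 10', 'load 20']
    print(pipelined_log)  # ['parse 1', 'load 10', 'parse 2', 'load 20']
```

Both versions produce the same output; only the ordering of work differs, which is why pipelining matters mainly for latency and for avoiding large intermediate results.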
FEATURES
3 WAYS TO BRING BI AND BIG DATA TOGETHER
[Three things that make a “big” difference when implementing big data.]
JOHN O’BRIEN
Perhaps your organization is hearing the buzz about big data and business analytics creating value, transforming businesses, and gaining new insights. Or perhaps you’ve spent some time and resources during the past year reading publications or attending industry events, or even launched a small-scale “big data pilot” experiment. In any case, if you’re at the early stages of your company’s journey into big data, there are some important conversations to keep in mind as you continue on your path to bringing business intelligence (BI) and your company’s big data together.

1. Big Data and the Business Intelligence Program
For the most part, big data environments are those that adopt Apache Hadoop or one of its distributions (like Cloudera, MapR, or Hortonworks) or NoSQL databases (like MongoDB, Cassandra, or HBase with Hadoop). These data stores offer massive scalability and unstructured data flexibility at the best price. Big data is no longer reserved for the biggest IT shops; its democratization comes from Hadoop’s ability to enable any company to affordably and easily exploit big data sets, and sometimes go even further with Cloud implementations. Gleaning insights from these vast data sets requires a completely different type of data platform and programming framework for creating insightful analytic routines.

Analytics is not new to BI: the ability to execute statistical models and identify hidden patterns and clusters of data has long allowed for better business decision-making and predictions. What these new BI analytic capabilities have in common is that they go beyond the capabilities of the SQL statements that govern relational database management systems, executing embedded algorithms instead. No longer are we constrained to sample data sets; advanced analytic tools can now execute their algorithms in parallel at the data layer. For many years, data has been extracted from data warehouses into flat files to be processed outside the RDBMS by data mining software packages (like SPSS, SAS, and Statistica). Both traditional capabilities – reporting and dimensional analysis – have always been needed, along with what is now being called “Analytics” in today’s BI programs.

Big data analytics is one more of the several BI capabilities required by the business. And, even when big data is not so “big,” there are other reasons why Hadoop and NoSQL are better solutions than RDBMSs or cubes. The most common is when working with the data is beyond the capabilities of SQL and tends to be more programmatic. The second most common is when the data being captured is constantly changing or has an unknown structure, such that a database schema is difficult to maintain. In this scenario, schema-less Hadoop and key-value data stores are a clear solution. Another is when the data needs to be stored in various data types, such as documents, images, videos, sounds, or other non-record-like data (think, for example, about the metadata to be extracted from a photo image, like date, time, geo-coding, technical photography data, meta-tags, and perhaps even names of people from facial recognition). Most companies’ big data environments today are less than ten terabytes and fewer than eight nodes in the Hadoop cluster because of these other “non-bigness” requirements.

2. Data Platform = Big Data + Data Warehouse
You might have already discussed what to do now that you have both a Hadoop and a data warehouse system. Should the data warehouse be moved into Hadoop, or should you link them? Do you provide a semantic layer over both of them for users, or between the data stores?

Most companies are moving forward recognizing that both environments serve different purposes but are part of a complete BI data platform. The traditional hub-and-spoke architecture of data warehouses and data marts is evolving into a modern data platform of three tiers: big data Hadoop, analytic databases, and the traditional RDBMS. Industry analysts are contemplating whether this is a two-tier or three-tier data platform, especially given the expected maturing of Hadoop in the coming years; however, it is safe to say that analytic databases will be the cornerstone of modern BI data platforms for years to come.

The analytic database tier is really for highly optimized or highly specialized workloads – such as columnar, MPP, and in-memory (or vector-based) databases for analytic performance, or text analytics and graph databases for highly specialized analytic capabilities. Under big data governance and analytic lifecycles, semantic and analytic discoveries made in Hadoop would be combined with traditional reference data, then migrated and productionized in a more controlled, monitored, and accessible analytics tier.
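O’Brien’s second case, data that is constantly changing or of unknown structure, is where the schema-optional (schema-on-read) model pays off. The Python sketch below contrasts it with a schema-mandatory (schema-on-write) load. It is a hypothetical illustration: the photo-metadata records loosely echo O’Brien’s example, and the field names and functions are invented for the purpose. The schema-on-write loader rejects any record that does not match the predefined columns; the landing-zone approach stores every record as it arrives and applies whichever columns a query asks for at read time.

```python
import json
from typing import Any

# Hypothetical landed records; later ones carry fields the first never had.
RAW_PHOTOS = [
    '{"photo_id": 1, "taken_at": "2013-04-01T10:00", "lat": 30.27, "lon": -97.74}',
    '{"photo_id": 2, "taken_at": "2013-04-02T11:30", "lat": 37.77, "lon": -122.42, "tags": ["bridge"]}',
    '{"photo_id": 3, "taken_at": "2013-04-03T09:15", "faces": ["unknown"]}',
]

# Hypothetical model that a schema-mandatory warehouse would fix in advance.
WAREHOUSE_COLUMNS = ("photo_id", "taken_at", "lat", "lon")


def load_schema_on_write(raw: list[str]) -> tuple[list[tuple[Any, ...]], list[str]]:
    """Conform each record to the fixed model at load time; reject misfits."""
    rows, rejected = [], []
    for line in raw:
        record = json.loads(line)
        if set(record) != set(WAREHOUSE_COLUMNS):
            rejected.append(line)  # would need remodeling before it could load
            continue
        rows.append(tuple(record[c] for c in WAREHOUSE_COLUMNS))
    return rows, rejected


def land(raw: list[str]) -> list[str]:
    """Landing zone: keep every record exactly as it arrived, no upfront modeling."""
    return list(raw)


def read_with_schema(landed: list[str], columns: tuple[str, ...]) -> list[dict[str, Any]]:
    """Schema-on-read: project the requested columns at query time; gaps become None."""
    return [{c: record.get(c) for c in columns} for record in map(json.loads, landed)]


if __name__ == "__main__":
    rows, rejected = load_schema_on_write(RAW_PHOTOS)
    print(len(rows), len(rejected))  # 1 2
    print(read_with_schema(land(RAW_PHOTOS), ("photo_id", "tags")))
```

The trade-off is visible in miniature: nothing is lost at ingest in the landing zone, but every consumer must cope with missing or unexpected fields at read time.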
3.
Determining Access
Apache Hive is sometimes called the “data warehouse application on top of Hadoop,” as it enables a more generalized access capability for everyday users through its familiar HiveQL language, which SQL-familiar users can understand. Hive provides a semantic layer that allows for the definition of familiar tables and columns mapped to key-value pairs found in Hadoop. With virtual tables and columns in place, Hive users can write HQL to access data within the Hadoop environment.

More recently, HCatalog has been released and is making its way into the Apache Hadoop project. HCatalog is a semantic layer component similar to Hive that allows for the definition of virtual tables and columns for communication with any application, not just HiveQL. Last summer, data visualization tool Tableau allowed users to work with and visualize Hadoop data for the first time via HCatalog. Today, many analytic databases allow users to work with tables that are views onto HCatalog and Hadoop data. Some vendors also choose to access Hadoop data through Hive, leveraging its semantic layer by converting user SQL statements into HQL statements. Expect more BI vendors to follow suit and enable their own connectivity to Hadoop.

Emerging agile analytic development methodologies and processes support the iterative, agile nature of analytics in big data environments for discovery, then couple that with data governance procedures to properly move the analytic models to a faster analytic database with operational controls and access. In this model, companies can store big data cheaply until its value can be determined, and then move it to
the production data platform tier appropriate to that value. This could mean a MapReduce extract to a relational data mart (or cube), or executing the analytic program in an MPP, columnar, or in-memory high-performance database.
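The promotion step described above, in which a result discovered in the big data tier becomes a MapReduce-style extract loaded into a relational data mart, might look like this in miniature. A minimal sketch under stated assumptions: the tab-delimited clickstream lines, the `CLICKS_TABLE` column mapping (a much-simplified stand-in for a Hive- or HCatalog-style semantic layer), and the `page_views` mart table are hypothetical, and Python’s built-in `sqlite3` module stands in for the production RDBMS or analytic database.

```python
import sqlite3
from typing import Iterable, Iterator

# Hypothetical tab-delimited clickstream lines sitting in the big data tier.
LANDED = ["u1\thome", "u2\tpricing", "u1\tpricing", "u3\thome", "u2\tpricing"]

# Hypothetical semantic-layer definition: virtual column name -> field position.
CLICKS_TABLE = {"user_id": 0, "page": 1}


def scan(lines: Iterable[str], table: dict[str, int]) -> Iterator[dict[str, str]]:
    """Expose raw delimited lines as named columns (a simplified, Hive-like view)."""
    for line in lines:
        fields = line.split("\t")
        yield {col: fields[pos] for col, pos in table.items()}


def page_views(lines: Iterable[str]) -> dict[str, int]:
    """MapReduce-style extract: map each row to (page, 1), then reduce by summing."""
    totals: dict[str, int] = {}
    for page, one in ((row["page"], 1) for row in scan(lines, CLICKS_TABLE)):
        totals[page] = totals.get(page, 0) + one
    return totals


def load_mart(totals: dict[str, int]) -> sqlite3.Connection:
    """Load the extract into a relational data mart (in-memory SQLite stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page_views (page TEXT PRIMARY KEY, views INTEGER)")
    conn.executemany("INSERT INTO page_views VALUES (?, ?)", totals.items())
    conn.commit()
    return conn


if __name__ == "__main__":
    mart = load_mart(page_views(LANDED))
    query = "SELECT page, views FROM page_views ORDER BY views DESC, page"
    print(mart.execute(query).fetchall())  # [('pricing', 3), ('home', 2)]
```

Once the aggregate lands in the mart, any SQL-speaking BI tool can query it, which is the point of moving a validated result out of the discovery tier.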
More to Come
While big data has come a long way in just a short amount of time, it still has a long road ahead as an industry, as a maturing technology, and as best practices are realized and shared. Don’t compare your company with Internet giants (like Yahoo, Facebook, Google, or LinkedIn), which have lived and breathed big data as part of their mission-critical core business functions for many years. Rather, think of your company as one of the other 99% of companies, small and large, found in every industry and exploring opportunities to unlock the hidden value in big data on their own. These companies typically already have a BI program underway, but now must grapple with the challenge of maintaining BI delivery from structured operational data while integrating new big data platforms for business analysts, customers, and internal consumers.
John O’Brien is the Principal and CEO of Radiant Advisors, a strategic advisory and research firm that delivers innovative thought-leadership, publications, and industry news.
(Continued from p12) what doesn’t work. The genealogy of the data warehouse is encoded in a double helix of intertwined lineages: the first is a lineage of failure; the second, a lineage of success born of this failure. The latter has been won – at considerable cost – at the expense of the former. A common DM-centric critique of Hadoop (and of big data in general) is that some of its supporters want to throw out the old order and start from scratch. As with the chevauchée – which entailed the destruction of infrastructure, agricultural sustenance, and formative social institutions – many in DM (rightly) see in this a challenge to an entrenched order or configuration.

They likewise see the inevitability of avoidable mistakes – particularly to the extent that Hadoop developers are contemptuous of or indifferent to the finely honed techniques, methods, and best practices of data management.

“Reinvention is exactly it, … [but] they aren’t inventing data management technology. They don’t understand data management at all,” argues industry veteran Mark Madsen, a principal with information management consultancy Third Nature Inc.

Madsen is by no means a Hadoop hater; he notes that, as a schema-optional platform, Hadoop seems tailor-made for the age of big data: it can function as a virtual warehouse – i.e., as a general-purpose storage area – for information of any and every kind.

The DW is schema-mandatory; its design is predicated on a pair of best-of-all-possible-worlds assumptions: firstly, that data and requirements can be known and modeled in advance; secondly, that requirements won’t significantly change. For this very reason, the data warehouse will never be a good general-purpose storage area. Even so, Madsen takes issue with Hadoop’s promotion as an information management platform-of-all-trades.

Proponents who tout such a vision “understand data processing. They get code, not data,” he argues. “They write code and focus on that, despite the data being important. Their ethos is around data as the expendable item. They think [that] code [is greater than or more important than] data, or maybe [they] believe that [even though they say] the opposite. So they do not understand managing data, data quality, why some data is more important than other data at all times, while other data is variable and/or contextual. They build systems that presume data, simply source and store it, then whack away at it.”

The New Pragmatism
Initially, interest in Hadoop took the form of dismissive assessments. A later move was to co-opt some of the key technologies associated with Hadoop and big data: almost five years ago, for example, Aster Data Systems Inc. and Greenplum Software (both companies have since been acquired by Teradata and EMC, respectively) introduced in-database support for MapReduce, the parallel processing algorithm that search giant Google had first helped to popularize, and which Yahoo helped to democratize – in the guise of Hadoop. Aster and Greenplum effectively excised MapReduce from Hadoop and implemented it (as one algorithm among others) inside their massively parallel processing (MPP) database engines; this gave them the ability to perform mapping/reducing operations across their MPP clusters, on top of their own file systems. Hadoop and its Hadoop Distributed File System (HDFS) were nowhere in the mix.

Hadoop was, however, a big part of the backstory. Let’s turn the clock back just a bit more, to early 2008, when Greenplum made a move which hinted at what was to come – announcing API-level support for Hadoop and HDFS. In this way, Greenplum positioned its MPP appliance as a kind of choreographer for external MapReduce jobs: by writing to its Hadoop API, developers could schedule MapReduce jobs to run on Hadoop and HDFS. The resulting data, data sets, or analysis could then be recirculated back to the Greenplum RDBMS.

Today, this is one of the schemes by which many in DM would like to accommodate Hadoop and big data. The difference, at least relative to half a decade ago, is a kind of frank acceptance of the inevitability – and, to some extent, of the desirability – of platform heterogeneity. Part of this has to do with the “big” in big data: as volumes scale into the double- or triple-digit terabyte – or even into the petabyte – range, technologists in every IT domain must reassess what they’re doing and where they’re doing it, along with just how they expect to do it in a timely and cost-effective manner. Bound up with this is acceptance of the fact that DM can no longer simply dictate terms: that it must become more responsive to the concerns and requirements of line-of-business stakeholders, as well as to those of its IT peers; that it must open itself up to new types of data, new kinds of analytics, new ways of doing things.

“The overall strategy is one of cooperative computing,” explains Rick Glick, vice president of technology and architecture with analytic discovery specialist ParAccel Inc. “When you’re dealing with terabytes or petabytes [of data], the challenge is that you want to move as little of it as possible. If you’ve got these other [data processing] platforms, you inevitably say, ‘Where is the cheapest place to do it?’” For DM, this means proactively adopting technologies or methods that help to promote agility, reduce latency, and empower line-of-business users. It means running the “right” workloads in the “right” place, with “right” being understood as a function of both timeliness and cost-effectiveness.

Stephen Swoyer is a technology writer with more than 15 years of experience. His writing has focused on business intelligence and data warehousing for almost a decade.
ABOUT RADIANT ADVISORS
RESEARCH . . . ADVISE . . . DEVELOP . . .
Radiant Advisors is a strategic advisory and research firm that networks with industry experts to deliver innovative thought-leadership, cutting-edge publications, and in-depth industry research.
Visit www.radiantadvisors.com
Follow us on Twitter! @radiantadvisors