Radiant Advisors Publication
rediscoveringBI
MARCH 2013 • ISSUE 6
SHIFTING GEARS WITH MODERN BI ARCHITECTURES
EVENT-DRIVEN ARCHITECTURES: THE SHIFTING LANDSCAPE • TIME OF RECKONING • SELECTING THE RIGHT BI SOLUTION • AN ARCHITECTURAL COLLISION COURSE • TYING GOALS TO REQUIREMENTS • ARE DATA MODELS DEAD? THE REAL DEBATE
rediscoveringBI
March 2013, Issue 6

SPOTLIGHT
[P4] Shifts in the Architectural Landscape: Different technology vendors offer different perspectives on what big data means, but all of them tip their hat to the fact that the volumes of data that companies are gathering and analyzing can be big. [By Dr. Robin Bloor]

FEATURES
[P8] Time for an Architectural Reckoning: Today’s BI and DW platforms are highly adapted to their environments; however, they are less suited outside of these environments. [By Stephen Swoyer]
[P12] Selecting the Right BI Solution: There are specific questions to be asked in order to select the right software and hardware to optimize BI use and align with broader business goals. [By Lyndsay Wise]
[P16] Why Data Models Are Dead: The real debate should be about how semantics should be analyzed or discovered and where that definition should be maintained for data going forward. [By John O’Brien]

EDITOR’S PICK
[P7] The Signal and The Noise: The Signal and the Noise is as interested in the predictive power of statistics as it is in the human ability to comprehend probabilistically. [By Lindy Ryan]

SIDEBAR
[P15] Vendor Free-for-All? Vendors seem to be betting on fit-for-purposive platforms as the best response to selection pressure – not a single platform, but the right platform for the right purpose. [By Stephen Swoyer]
FROM THE EDITOR
Welcome to Rediscovering BI, Radiant Advisors’ monthly eMagazine featuring articles from leading names in today’s BI industry and other new voices we’ve discovered innovating in the BI community. Today’s BI environment is all about rethinking how we do BI and imagining new, innovative ways to approach BI at large. The goal of Rediscovering BI is to continue growing as a leading industry publication that challenges readers to rethink, reexamine, and rediscover the way they approach business intelligence. We publish pieces that provide thought-leadership, foster innovation, challenge the status quo, and inspire you to rediscover BI.

This month we are excited to “shift gears” and debut our new PDF edition of the eMagazine. In fact, “shifting gears” is the focus of this month’s issue of Rediscovering BI: with contributions from Dr. Robin Bloor, Stephen Swoyer, Lyndsay Wise, and John O’Brien, this month’s issue explores the shift in today’s architectural landscape and what that means for data models and the unfolding architectural reckoning.
Enjoy!
Lindy Ryan
Editor in Chief, Radiant Advisors

Editor in Chief: Lindy Ryan (lindy.ryan@radiantadvisors.com)
Art Director: Brendan Ferguson (brendan.ferguson@radiantadvisors.com)
Distinguished Writer: Stephen Swoyer (stephen.swoyer@gmail.com)
Contributor: Dr. Robin Bloor (robin.bloor@bloorgroup.com)
Contributor: Lyndsay Wise (lwise@wiseanalytics.com)
Contributor: John O’Brien (john.obrien@radiantadvisors.com)
SPOTLIGHT
SHIFTS IN THE ARCHITECTURAL LANDSCAPE
DR. ROBIN BLOOR
EVERYONE KNOWS THAT BIG DATA is a real and growing phenomenon. Different technology vendors offer different perspectives on what big data means, but all of them tip their hat to the fact that the volumes of data that many companies are gathering and analyzing can be big.
When we talk of big data, we also often hear of Hadoop being associated with it in some way. It’s not that everyone is using Hadoop in earnest yet. Most companies I’ve talked to are experimenting (or doing something fairly limited) at the moment, but nearly everyone sees Hadoop as a component of their future software stack – they are just not entirely sure of the role it will play. But whatever evolves, the game is up for the previously dominant business intelligence (BI) architecture, which can be summarized as: operational systems -> data ingest -> data warehouse -> data marts -> desktop tools and databases. In my opinion, the big data trend represents the early stages of the emergence of Event-Driven Architecture.
Event-Driven Systems
For decades, we built operational systems that were characterized by the idea of transactions. Transactions corresponded to the events that changed the business: receiving deliveries, paying invoices, placing orders, and so on. We built most of the systems that do such things a long time ago. Since that time, we have expanded BI software from its early days as a general reporting capability to conduct trend analysis, create dashboards, and monitor key performance indicators of the business. Nowadays, monitoring the business usually involves gathering both transactional data and event data that is not transactional.

With the advent of big data, we have seen the expansion of data used by businesses to include machine-generated and log data, social media data, web-based data, mobile data, and external data streams. If we examine this data, we discover that very little of it is, in fact, transactional – almost all of it is simply event data. We are gradually moving toward viewing events as the fundamental atoms of business activity.

Of course, event-based systems are not entirely new. The High Frequency Trading (HFT) systems used by investment banks are fundamentally event-based. Internet companies provide many examples of event processing, and they have led this trend. Web retail sites interact with the customer entirely through the web browser,
whether the customer is merely viewing products or actually buying. This began with web sites tracking the behavior of users from web logs, but it has gradually evolved into capturing and analyzing everything any customer does, or did: how they arrived on the site, what links they clicked on, how long they stayed on any given page, what they searched for, what advertisements were presented to them, and so on.

Internet businesses were the first organizations that could, with little effort, capture and analyze most of the event data of a customer interaction. They may have had to invest in appropriate computer technology to do the specific analytics, but they could capture the data they wanted to examine as a natural part of their business process. Other businesses (transportation, brick and mortar retail, and health care) may have to deploy embedded chips to capture some of the event data that interests them. And, in time, they will – that is one of the next steps toward the emergence of event-driven businesses.

Event-Driven Architecture
While we do not intend to define what an Event-Driven Architecture might turn out to be, we can discuss some of the features that it will inevitably involve. First of all, a fully event-driven environment will involve event flows that are captured from various sources and analyzed in flight in order to respond as swiftly as possible to whatever the data reveals. Consider this as a new layer of BI software, built either to inform people (or programs) of trends it sees or to fire triggers that demand action. When the latency between receiving data and taking action needs to be very low, think of this as real-time operational intelligence (OI). Where it does not require prompt action, think of event-driven architecture as closer to – but not necessarily the same as – traditional BI.
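To make the in-flight idea concrete, here is a minimal sketch (ours, not Bloor’s) of a trigger layer over an event stream: events are evaluated as they arrive, and an alert fires when a simple condition trips. The event fields, window size, and threshold are all hypothetical.

```python
from collections import deque

def monitor(events, window=100, threshold=0.25):
    """Watch an event stream in flight; fire a trigger when the
    error rate over the last `window` events crosses `threshold`."""
    recent = deque(maxlen=window)          # sliding window of recent events
    for event in events:                   # events arrive one at a time
        recent.append(event)
        errors = sum(1 for e in recent if e.get("status") == "error")
        if len(recent) == window and errors / window >= threshold:
            yield {"alert": "error-rate", "rate": errors / window,
                   "last": event}

# Usage: any iterable of event dicts works -- a log tail, a queue, a socket.
stream = [{"status": "ok"}] * 80 + [{"status": "error"}] * 30
for alert in monitor(stream, window=100, threshold=0.25):
    print(alert)   # downstream: notify a person, or trigger a program
```

The same loop shape applies whatever the source; only the condition and the downstream action change.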
Some of this event data might come from within the business: RFID data, for example, that is monitoring the movement of goods between warehouses. Some of the event data might come from suppliers, customers, or potential customers. Some data might be marketing statistics relating to advertising campaigns, or it might be social media data. Nowadays, “corporate data” is any data that might affect the business, including meteorological data, transport information, stock market and commodity price data, and so on. The point is that we do not necessarily know what data the business may suddenly become interested (and maintain interest) in.
As such, we cannot define database structures to accommodate it ahead of time. This is one of the reasons that the loose structure of Hadoop is quite attractive: it is a data reservoir that doesn’t require you to define the metadata until you want to use it. Naturally, data analysis will be carried out on such captured data; otherwise there would be no point in capturing and storing it. If we imagine that a business has built all the transactional applications it needs to build, then none of the events it captures give rise to transactions. The only thing to do with such data is either report it – via a dashboard, perhaps – or analyze it. You may require a fast-performing scale-out database to help with the data analysis, but, if you do, it becomes a data analysis mart.

For all intents and purposes, the data reservoir is the data warehouse for event data. It doesn’t have to be Hadoop, of course. It could be one of the new scale-out NoSQL databases, many of which don’t impose structure on the data in the way that traditional databases and data warehouses do.

Does this mean that the old data warehouse can be retired? In theory, yes, but in practice it is unlikely. There will be many legacy applications that depend on it, and it may not be worth replacing them.
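As a hedged illustration of that “define the metadata when you use it” idea (nothing here is prescribed by the article; the file name and fields are invented), schema-on-read looks roughly like this in miniature:

```python
import json

# Write: events land in the reservoir as raw lines; no schema is declared.
with open("events.log", "w") as f:
    f.write(json.dumps({"type": "click", "page": "/home", "ms": 32}) + "\n")
    f.write(json.dumps({"type": "search", "terms": "red shoes"}) + "\n")

# Read: the "schema" is just whatever this analysis chooses to project.
with open("events.log") as f:
    events = [json.loads(line) for line in f]
slow = [e for e in events if e.get("type") == "click" and e.get("ms", 0) > 30]
print(slow)   # a different analysis tomorrow can project different fields
```

The structure is imposed by the query at read time, not by the store at write time, which is exactly why inconveniently structured new data can be captured first and understood later.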
In Summary
I’ve just brushed the surface of this topic in this article. Nevertheless, I hope you can see the way that the IT industry is drifting. The issue with big data is less likely to be its volume than the simple fact that it is new data, possibly inconveniently structured data, and, most importantly of all, it is event data.

Share your comments >

Dr. Robin Bloor is co-founder and principal analyst with The Bloor Group. He has more than 25 years of experience in the world of data and information management.
EDITOR’S PICK
THE SIGNAL AND THE NOISE
LINDY RYAN

EVEN IF YOU HAVEN’T HEARD of Nate Silver’s The Signal and the Noise, you’ve probably heard of Nate Silver. His hybrid predictive model didn’t exactly predict the outcome of the 2012 Presidential race, but it was shockingly close; it was much closer, in fact, than the predictions of many television- and print-media pundits. By Election Night, the divergence of Silver’s predictive model – along with those of Sam Wang at the Princeton Election Consortium and Drew Linzer of Emory University – had set up a kind of showdown between new-fangled statistics- and model-driven predictive methods, on the one hand, and old-school, horserace-style prognosticating, on the other.

Silver’s book is in a sense a meditation on the promise of statistics and the limits of human understanding. The two aren’t necessarily a neat fit. The Signal and the Noise is as interested in the predictive or disclosive power of statistics as it is in the human ability to comprehend – to think – probabilistically. Silver doesn’t set up an infallible statistical strawman, either; he’s as alert to the misuse of statistical or predictive models; to flawed assumptions; to insufficient data (or to the impossibility of ever being able to have “sufficient” data) as he is to what might be called the human tone-deafness to probabilism. Indeed, many of the “problems” Silver describes are actually products of the human misapplication or misunderstanding of statistical concepts and methods. He discusses this idea in the construct of the “prediction paradox,” in which he says the more humility we have about our ability to make predictions – as well as learn from our mistakes – the more we can turn information into knowledge and data into foresight.

To be succinct, as Silver says in his “Introduction” to The Signal and the Noise: “We face danger whenever information growth outpaces our understanding of how to process it.”

Much of what is happening in information management right now focuses on making vast (and growing) volumes of information intelligible or comprehensible. But the human brain is still the primary site of analysis and synthesis. As the Election of 2012 demonstrated, our brains are lagging behind the statistical concepts and methods that might help us achieve greater clarity and insight – better understanding – of our world. Silver’s book is a wake-up call for us to get cracking.

Share your comments >
Visit the Radiant Advisors eBookshelf at www.radiantadvisors.com/ebookshelf to see all Editor’s Pick titles
Lindy Ryan is Editor in Chief of Radiant Advisors.
TIME FOR AN ARCHITECTURAL RECKONING
STEPHEN SWOYER
The DW and big data: Two platforms, two very different purposes…seemingly on a collision course.
TODAY’S BUSINESS INTELLIGENCE (BI) and data warehouse (DW) platforms are highly adapted to their environments; however, they are less suited to use outside of these environments. The same might be said for big data platforms, too.
The DW and big data: Two platforms, two very different purposes, two agendas – seemingly on a collision course. It isn’t so much a question of which platform vision will triumph, but of how two such different visions can be reconciled.
New Synthesis
In 1975, entomologist E.O. Wilson published his landmark Sociobiology, subtitling it The New Synthesis. Wilson’s book argued for a multidisciplinary reconceptualization of behavior and social interaction. His claim — provocative, controversial, and to this day tendentious — was that human and animal behaviors must be understood as products of natural selection; that ethology, sociology, and psychology, considered by themselves, are not sufficient to account for the diversity of behaviors and social adaptations among both humans and animals. What was needed, Wilson argued, was a kind of reconciliation or synthesis; sociobiology, on Wilson’s terms, is this reconciliation: it describes a synthetic approach to the understanding of behavior – one that’s informed by evolutionary theory, population ecology, and other disciplines.

Experts who work with BI platforms say we’ve arrived at a similar moment, at least with respect to the architectural hub of BI and decision support: the data warehouse. By itself, the DW-driven status quo is not sufficient to address the reporting and analytic needs of the enterprise. Nor, for that matter, is its big data counterpart, which has emerged as a platform for highly scalable real-time processing. The data warehouse is a query platform par excellence; it excels at aggregating business “facts” out of queries. Big data platforms – the most heavily hyped of which is Hadoop – excel at data processing. Big data systems can process data of all kinds; they are schema-optional platforms. They likewise have the ability to perform complex analytics at close to real-time intervals. They can scale with staggering ease. In fact, as one prominent BI industry technologist argues, scaling out on Hadoop is – for all intents and purposes – “free.”

The DW-driven BI and the emerging big data paradigms aren’t the only platforms that need to be reconciled. There’s also the OLTP database and application stack, which is both conceptually and operationally distinct from the DW and BI. Nor is that all: over the years, BI has gradually usurped analytic functionality unto itself; it’s undeniable, however, that most enterprises also play host to dedicated high-end analytics platforms – e.g., entrenched SAS, SPSS, or (more recently) R statistical and data mining practices. What’s more, several other nominally discrete platforms – for example, enterprise data archiving and information lifecycle management (ILM) – must likewise be reconciled or synthesized in any Architectural Reckoning. So, too, must specialty systems, including dedicated graphing databases and legacy, non-relational data sources – such as the vast volumes of information stored in flat file, VSAM, IMS, Adabas, and other mainframe-bound systems.

The problem isn’t that we lack for potential solutions; some technological futurists would have us throw everything into Hadoop, after all. It’s rather that we don’t yet have a Sociobiology-like vision of what a post-DW-driven architecture might look like. There’s no single, synthetic, vendor-neutral vision that reconciles the still-viable DW-driven BI platform with the emerging big data platform, with the dedicated statistics and data mining platform, with the data archiving and ILM platforms, with the specialty niche platforms. Thanks to Wilson’s project, however, we have a criterion for undertaking such an Architectural Reckoning: viz., selection pressure, the engine of natural selection.
Call it fit-for-purpose; call it using the best tool for the job; call it supporting and maintaining each platform in the “habitat” for which it’s best suited and adapted. The criterion for Architectural Reckoning is selection pressure – i.e., what works best and why. The product of Architectural Reckoning will be the New Synthesis.

The (Un)Making of the Platform Status Quo
A quarter of a century ago, Barry Devlin and Paul Murphy published “An architecture for a business and information system,” the seminal paper in which they outlined the conceptual underpinnings of the classic DW-driven BI architecture. Within a couple of years, Ralph Kimball and Bill Inmon had begun implementing physical systems based on the Devlin/Murphy paradigm. Since then, BI has evolved as a more or less straightforward expression of Devlin/Murphy’s foundational DW-driven architecture. One upshot of this is that today’s BI and DW platforms are highly adapted to their environments. They make sense — and they deliver significant value — in the context of these environments.

But an emerging consensus says that the traditional DW-driven BI architecture simply cannot be all things to all information consumers; that DW-driven BI cannot adapt (or cannot be adapted) to the selection pressures of the information enterprise. For two decades, the data warehouse and its enabling ecosystem of BI tools functioned as the organizing focus of information management and decision-support in the enterprise. In other words, the DW was able to effectively dictate the conditions of its own environment. It was likewise able to adapt on its own terms. That is no longer the case, and it arguably hasn’t been for the last half-decade.

With the emergence of BI discovery, big data, and the real-time, mega-scale data processing capacity of Hadoop, the data warehouse finds itself inhabiting a kind of micro-climate: a habitat or environment in which, yes, it still delivers compelling value, but outside of which its lack of adaptability – its limitations – can no longer be ignored. Its habitat is shrinking – and there’s plenty of disruption all around it.

“It took me a long time, but I like to joke that after several therapy sessions I’m now able to say that not all analytics belong inside a hub-and-spoke architecture,” says industry luminary Claudia Imhoff, CEO of information management consultancy Intelligent Solutions Inc. “There are many valid and good [kinds of] analytics that belong outside that architecture.”

Imhoff, of course, is almost as much a part of the history of the DW space as are seminal technologists Devlin, Murphy, Inmon, and Kimball. If she didn’t conceive of or build the first DW, she got in at the ground floor. But Imhoff thinks the case for an Architectural Reckoning – for a New Synthesis of some kind – is irrefragable: “I think part of it is that we are right now in a very, very disruptive period of a lot of new technologies flooding in, absolutely flooding in, to business intelligence. What’s proved to be most disruptive [to the DW status quo] is this issue of real-time analytics.”

The data warehouse simply cannot “do” real-time analytics, Imhoff concedes. At its core, its design philosophy embodies assumptions about a static world – i.e., a world in which data and requirements can be known and modeled in advance; a world in which (moreover) requirements do not significantly change. This assumption is at odds with – is exploded by – the vicissitudes of business and human events.

“If all analytics used static data, then we could pull them in[to the DW] … and analyze the heck out of them. What’s changed is that we now have the capability to analyze data on-the-fly,” she says. “That’s caused significant disruption. Now we have to accept that not all analytics belong inside the BI architecture — or else the BI architecture has to embrace and extend to address those analytics as well.”

Not so Fast?
Michael Whitehead, CEO of data warehousing software vendor WhereScape Inc., isn’t quite convinced. Perhaps we do need to do some architectural tweaking, Whitehead concedes; but shouldn’t we first do what we can to fix our bread-and-butter data warehouse-driven BI programs? It isn’t as if most BI programs are perfect, let alone optimal; in painful point of fact, Whitehead maintains, most BI programs under-perform. In this context, Whitehead sees the issue of Architectural Reckoning as a distraction.

“We haven’t solved the basic problems yet. What’s happening at the edge is all very exciting, but at the core it still takes far too long to build a basic data warehouse that is changeable and just works,” he says.
Of course, WhereScape is a provider of data warehouse software tools; it has an undeniable stake in the DW status quo. Whitehead concedes as much, but argues that his company’s role in any New Synthesis would look a lot like it does now. “Even if you start with the big data platforms, at some point, you’re going to need to bring data together in a repeatable way and [you’re going to need to] do it consistently. You’re going to need to be able to materialize [it] and report on it. You’re going to want to persist it: it’s just a natural way to answer a set of questions,” he argues. “You just always end up at the same point: it’s natural to have a repository of data that’s materialized that you maintain for [addressing] a certain set of problems.” As a case in point, consider Facebook’s come-to-DW about-face. Yes, it’s true: Facebook is building itself a data warehouse, chiefly because it needs repeatability and consistency.

Industry veteran Scott Davis, founder and CEO of information discovery and collaboration specialist Lyzasoft Inc., stakes out a position at the opposite end of the spectrum from Whitehead’s essentialism. As Davis sees it, Hadoop is a hugely transformative technology. It solves the distributed compute problem; it solves the distributed storage problem; it solves the problem of inexpensively scaling workloads of all kinds across massive compute and storage resources. More to the point, Davis argues, Hadoop is already being used to host traditional data warehouse workloads – including data integration (DI) and ETL jobs. “I think that staging ETL and high-end analytic queries in Hadoop — I think it’s just very, very difficult for any other technology to compete with Hadoop” in this respect, says Davis.
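For readers who haven’t seen it, the pattern Davis is pointing at is MapReduce, the model behind Hadoop-hosted ETL jobs. The sketch below is a toy, in-process imitation of that pattern (the record layout and field names are invented; a real job would run the same mapper and reducer under Hadoop Streaming across many nodes):

```python
from itertools import groupby

# Map: parse a raw sale record into (key, value) pairs -- the "extract"
# and "transform" steps of a toy ETL job. Record layout is hypothetical.
def mapper(line):
    region, amount = line.split(",")
    yield region, float(amount)

# Reduce: aggregate all values that share a key -- the rollup that would
# land in a reporting table on the "load" side.
def reducer(key, values):
    yield key, sum(values)

# Local stand-in for Hadoop's shuffle/sort between map and reduce phases.
raw = ["east,100.0", "west,42.5", "east,7.5"]
pairs = sorted(kv for line in raw for kv in mapper(line))
for key, group in groupby(pairs, key=lambda kv: kv[0]):
    for out in reducer(key, (v for _, v in group)):
        print(out)   # ('east', 107.5) then ('west', 42.5)
```

The appeal Davis describes is that nothing in the mapper or reducer changes as the data grows; the framework scales the same two functions out across the cluster.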
Davis comes close to suggesting that Hadoop has the potential to become an all-in-one platform for enterprise information management; he stops just short of saying it, however. Nevertheless, he argues, to the degree that an Architectural Reckoning does take place, Hadoop will likely be a big beneficiary.

“The beautiful thing about Hadoop is that it wasn’t built for a broad array of purposes. Once you understand that, it starts to look beautiful. If you judge it by the ability to do everything elegantly, you’re going to find it wanting,” he concedes. “I’m not convinced that it’s the best tool for querying a known set of data for a known set of reports. It isn’t going to uniformly smoke the business intelligence and data warehousing world on all counts after just half a decade of people tinkering around with it. But people need to realize that even if a lot of the [Hadoop-related] BS is wrong, this is still a transformative technology.”

Conclusion: Who’s up for it?
A vendor-neutral New Synthesis has yet to be conceived, however; it’s awaiting both conceptual demonstration and physical implementation. We’ve seen some possible candidates: Radiant Advisors, for example, has its Modern Data Platforms Framework. Gartner Inc. analyst Mark Beyer has an architectural vision of his own, as do respected BI industry thought-leaders Wayne Eckerson (a principal with BI Leader Consulting) and Shawn Rogers (a principal with Enterprise Management Associates). Most of these visions are focused by – or have for a frame of reference – the DW, either as a starting point or as a point of departure: they’re inescapable products of data management-centric thinking.

Architectural Reckoning is about which platform works best for which purpose. The synthetic architecture (the New Synthesis) that is its product won’t be organized or managed – won’t be understood – primarily on DM-like terms. Architectural Reckoning is a multi-disciplinary project, involving stakeholders from different factions across IT and the line of business. That this entire article has assessed this project from a DM-centric perspective is testament to just how hard a task it is. Who’s up for it?

Share your comments >
Stephen Swoyer is a technology writer with more than 15 years of experience. His writing has focused on business intelligence and data warehousing for almost a decade.
FEATURES
SELECTING THE RIGHT BI SOLUTION
TO MEET ORGANIZATIONAL NEEDS AND STRATEGIC GOALS
LYNDSAY WISE
CONDUCTING SOFTWARE evaluations and aligning business goals to solution capabilities are not easy tasks. Organizations constantly struggle to identify how to optimize business intelligence (BI) investments by expanding current use, or by looking at new solutions to address business pains. Irrespective of desired BI use and the areas being looked at (such as call center management, operations, financial reporting, etc.), there are specific questions that need to be asked in order to select the right software and hardware to optimize BI use and align these choices with broader strategic business goals.

These questions identify the types of solutions businesses should consider. For instance, if looking to expand a BI investment and infrastructure, it might be possible to use current hardware and data warehouse expenditures as a base. Instead of adding a new server, new data sources may be added to the current environment, with considerations for new solutions being relegated to front-end dashboarding and analytics. Identifying technical considerations and infrastructure requirements is essential because some businesses require real-time access or high-powered analytics, while others only need weekly or monthly reporting and multi-dimensional analysis. Such technical- and business-oriented questions are first steps to making sure BI platform decisions support the overall goals being achieved through BI adoption or expansion.
Understanding Goals
Different organizations look to business intelligence to meet a variety of business-oriented goals. Some businesses want to align technology use with overall strategy, while others are looking for greater insights into customers, or a more efficient supply chain. The end goal will affect any software selection focus. And, even if multiple goals are identified, solution capabilities – and the type of applications selected – need to properly support the goals identified as the bottom line. When identifying the BI mission, ask questions like:
1. What is the end goal? This might seem simple, but it requires identifying what the organization needs to achieve to make this iteration of BI a success. Is this better visibility into marketing initiatives, an increase in lead generation by 15%, better data quality management, etc.?

2. Which stakeholders are involved? Aside from the development of business rules and new algorithms, the people using the tools will determine the level of interactivity and self-service capabilities, and this in turn may limit the types of solutions considered.

3. How do these goals align with required metrics? This aspect requires moving beyond simple business rules and developing metrics that will support overall goals. Although industry metrics exist, most businesses will still need to develop customized metrics to tailor analytics to individualized needs.

4. What are the strategic goals the company hopes to achieve, and what business processes and information points are required to support these efforts? Developing the link between process management and required data sources helps identify how data can support business functions.

Infrastructure Considerations
Developing the right infrastructure is important when looking at long-term viability and scalability. In the past, many IT departments were in charge of designing and maintaining a BI platform on their own. Now, due to the diversity in the market and the different ways in which business users need to access and interact with their data, collaboration between business units and IT departments is essential. Latency, processing power, storage, and APIs are just some of the considerations that require an understanding of the business and what it hopes to achieve. To understand the breadth of requirements, a broader evaluation of business needs is required — including identifying how the right BI infrastructure can support BI-related goals.

Tying Goals Into Business Requirements
In general, three types of information are required for business intelligence success:
- BI purpose, or the overall goal alignment between business entities and BI delivery.
- Technical infrastructure that supports the business needs.
- Business requirements that address challenges and gaps within information visibility and analytics access.

All of these areas relate to (or support) business requirements and are the basics required when evaluating BI offerings. For instance, understanding customers better and identifying patterns and opportunities can include looking at several different customer access points. Looking at account information, demographics, accounts receivable, support, etc. that may exist across multiple data sources can provide insight into the lifetime value of the customer, identify customers that are influencers and that recommend multiple friends and family members to apply for services or buy products, and so on. Understanding how pieces of information connect to each other can provide broader access points to data that may have been inaccessible or overlooked in the past.
Connecting the Dots
Asking the right questions and identifying goals when evaluating software solutions is common sense; however, making sure that the right BI solution is selected is another story. Because so many solutions with similar, overlapping capabilities exist, understanding key market differentiators might not be intuitive. Consequently, companies need to make sure they identify business and technical requirements first and have a strong understanding of the link between these requirements and the underlying goals. Without this link, decision makers may select solutions based on features and not on how those capabilities support business needs. The following questions can help break down the barriers that lead to some of the confusion that exists within the marketplace:

1) What are the gaps that exist within our current BI tool?
2) What do we need to bring BI to the next level (i.e., technical, new features, new algorithms and business rules, etc.)?
3) How have the essential BI requirements shifted? For instance, is the organization moving from historical trend identification toward operational intelligence, or is there a new focus on unstructured data content?
4) What is required to meet the new needs of the organization – business, technical, and cultural?
5) What is the justification for this expansion and potential evaluation of a new software offering?

These are initial questions to get BI decision makers on the right track, and that organizations can use as a starting point to justify expenditures and to support the transition from traditional BI to strategic analytics. Although company requirements will differ depending on industry and purpose, overarching requirements in relation to infrastructure and goal alignment will be similar. These can then be used to support broader software choices.

Share your comments >

Lyndsay Wise is the president and founder of WiseAnalytics. She provides consulting services for SMBs and conducts research into BI mid-market needs.
SIDEBAR
VENDOR FREE-FOR-ALL? STEPHEN SWOYER
WE GOT AN INDICATION of just how transformative Hadoop might prove to be late last month, when EMC Corp. unveiled Pivotal HD, its new proprietary distribution of Hadoop. Pivotal HD includes a technology called “Hawq,” which EMC describes as an ACID-compliant MPP RDBMS running across HDFS.

Unlike Hive, Hawq isn’t an overlay or translation layer: it’s an MPP RDBMS running on top of Hadoop. In EMC’s calculus, Hadoop by itself is a New Synthesis; in other words, EMC thinks Hadoop is flexible and adaptable enough to withstand any and all kinds of selection pressure. It’s as if EMC asked itself “What works best and why?” and came up with a kind of compound answer: Hadoop – with a big boost from Hawq. Pivotal HD is arguably the most audacious such strike from the big data side of the aisle.

Across the aisle, Teradata Corp. with its Unified Data Architecture (UDA) touts a similar one-stop platform – albeit one that’s based on a very different calculus: a Teradata-centric fit-for-purpose-ness. UDA comprises the traditional Teradata enterprise data warehouse – the long-standing linchpin of BI and decision support; Teradata’s Aster Discovery platform, which aims to address information discovery and business analyst-y use-cases; and Teradata’s Connector for Hadoop, which addresses the big data use case. (Teradata has also been an early and active proponent of HCatalog, a metadata catalog for Hadoop.) UDA itself includes logical and management amenities (such as Teradata Viewpoint and Teradata Vital Infrastructure) designed to knit together its fit-for-purpose pieces into a single synthetic architecture.

Teradata isn’t exactly alone. IBM Corp. markets a vast middleware portfolio of fit-for-purpose database systems; DI assets – including ETL, data quality (DQ), master data management (MDM), data replication, and data mirroring tools; several data virtualization (DV) technology offerings; and connectivity into Hadoop and big data. Big Blue hasn’t unveiled as coherent an architectural vision as Teradata, but it does have all of the pieces to do so. Likewise for SAP AG, which markets a single engine for traditional DW, NoSQL, text analytic, and graphing database operations (its HANA in-memory platform), along with a full DI, DQ, and DV stack. These vendors and others (e.g., Dell Inc., Hewlett-Packard Co., Oracle Corp., and Microsoft Corp.) seem to be betting on fit-for-purposive platforms as the best response to selection pressure. In other words, not a single platform to rule them all – à la EMC and Pivotal HD — but the right platform for the right purpose.

While the ambitions of DI-only players such as Informatica Corp. aren’t quite as far-reaching, they’re no less fit-for-purpose-y: Informatica markets a line of data archiving- and ILM-related products, and is working to improve integration and interoperability between these and its bread-and-butter DI technologies. Informatica could credibly position its DI technology as connective tissue for a would-be synthetic architecture. So, too, could Composite Software Inc., the theme of whose 2012 Data Virtualization Day event in New York, NY was “The Logical Next Steps.” Composite champions the equivalent of a logical data architecture: a virtual abstraction layer, enabled by its Composite Information Server DV platform. Composite competitor Denodo Technologies Inc. markets similar DV software, as does Red Hat Inc.
FEATURES
WHY DATA MODELS ARE DEAD
JOHN O’BRIEN
THESE DAYS, the phrase “data models are dead” seems to find its way into high-debate conversations with IT application development, business intelligence (BI) teams, and data management vendors, and is brought about by the confluence of several recent major trends in IT, BI, and technology that are challenging the classic data modeling paradigm. The real debate, however, should be about how semantics should be analyzed or discovered and where that definition should be maintained for data going forward.

One major driver in this debate is the current technology adoption shift: the rise in data technologies such as NoSQL data stores – which are flexible, with schema-less key-value stores – along with the mainstream acceptance of analytic databases that leverage similar columnar and in-memory architectural benefits for BI. These technologies allow data elements to be arranged into “tuples” (or, records based on a programmer’s definition) outside of the physical data model, and simultaneously enable the ever-increasing drive by the business that applications be built more quickly and more flexibly for competitive advantage (i.e., first to market and the ability to adapt more quickly than the competitor). There is also more acceptance – realization – of the fact that the business and analysts don’t always know what is needed, but want to discover what is desired through user interactions.

Living More and More Without Data Models
From the programmer-centric view, accessing data in key-value formats matches the objects they are loading data into for application execution, yet object databases never really became mainstream as many hoped, and the “object-to-relational” layer gained traction with incumbent relational databases. Primarily, it’s been the flexibility, adaptability, and speed that have driven many application developers to use key-value stores. This is because it moves the semantic definition away from the rigid, structured physical data model to the application layer, wherein a developer can control simple changes or additional data elements in code, then simply compile and redeploy it. Depending on the application at hand, many developers also are embracing document stores as their data repository.

BI developers, on the other hand, have been finding value in what key-value stores (like Hadoop) have to offer from both an information discovery and an analysis perspective. Once again, when you remove the semantic definition – or, perspective bias – from data, analysts are able to discover and test and witness new relationships among data elements: analysts can work with the semantic definitions in very quick and iterative fashions through the use of abstracted or data virtualization layers above the data.
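A small, invented sketch of the shift O’Brien describes: in a key-value or document store the record is opaque to the database, so the semantic definition lives in application code, and adding a data element is a code change rather than a schema migration. The entity and field names here are ours, for illustration only.

```python
# The store holds an opaque document; nothing in the database layer
# says what an "order" is. The semantic definition lives here, in code.
order_doc = {"id": "o-1001", "sku": "A-17", "qty": 3}

def order_total(doc, price_list):
    """Application-layer semantics: how an order's value is computed."""
    return price_list[doc["sku"]] * doc["qty"]

# Adding a new data element is a code change, not a schema migration:
order_doc["discount"] = 0.10   # older documents simply lack the key

def order_total_v2(doc, price_list):
    gross = price_list[doc["sku"]] * doc["qty"]
    return gross * (1 - doc.get("discount", 0.0))   # default for old docs

print(order_total_v2(order_doc, {"A-17": 25.0}))    # 67.5
```

The flexibility is real, but so is the trade-off the article goes on to name: any consumer other than this application, a data warehouse included, can no longer read the meaning of the data from the database itself.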
Testing semantic definitions early on in BI projects is proving to be invaluable in attaining a more complete understanding of data quality and avoiding business rules issues that could be disruptive and cause significant impact later. Finally, the interactive process involving business users alongside analysts and modelers is proving to create more accurate and faster BI products, similar to the agile BI process.

Can’t Live Without Data Models
As we see more applications move the semantics of data into their application layer and away from physical data models, we must also recognize that those applications are the source systems for many data warehouses (DW). If the business use of operational data sits in the application itself and not the physical database, then BI analysts and integration specialists are flying blind – or, worse yet, may misrepresent operational data in the DW.

When working with application and BI development teams, we have seen two approaches (or a hybrid of these two approaches) that work well. First, we argued that “an order is an order” for well-understood entities used in operational data models, basically encouraging application teams to only make parts of their data models “dynamic” where they needed them – like in sub-typing. This process would have a super-type for product, with respective sub-types, and one specialized sub-type to allow for dynamic creation of new sub-product-types that could be migrated later to a formal sub-type. This approach satisfies the need for flexibility and speed. Second, the use of metadata models encourages the desire for meta-driven applications, while providing the BI team with a “key” to unlock data semantics and receive warning beforehand of dynamic data changes.
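A rough sketch of that first approach (the class names and attributes are ours, not a prescribed model): a product super-type, formal sub-types for well-understood entities, and one specialized sub-type that stays dynamic until its records earn promotion to a formal sub-type.

```python
# Super-type: the attributes every product shares.
class Product:
    def __init__(self, sku, name):
        self.sku, self.name = sku, name

# Formal sub-type: a stable, well-understood entity ("an order is an order").
class Book(Product):
    def __init__(self, sku, name, isbn):
        super().__init__(sku, name)
        self.isbn = isbn

# The one specialized sub-type that stays dynamic: new kinds of product are
# captured as key-value attributes now, and migrated to a formal sub-type
# once the business understands them well enough to model them.
class DynamicProduct(Product):
    def __init__(self, sku, name, **attrs):
        super().__init__(sku, name)
        self.attrs = attrs   # the deliberately schema-less part of the model

ebook = DynamicProduct("E-9", "Some Title", format="epub", drm=False)
print(ebook.attrs)
# Later, if "ebook" proves durable, promote it: define a formal EBook
# sub-type and move ebook.attrs into typed fields or columns.
```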
However, most important is the distinction that data models exist not only for an application’s use, but rather to persist data within context for many information consumers throughout the enterprise. BI and information applications not only deliver reports and information, but also support (and should encourage) ad-hoc requests and analytics within the proper context. Data models, especially in BI, are becoming a part of the data governance umbrella that governs whether data is made available to the right people, at the right time, and used properly. There will always be a strong need for a reference data warehouse. With good data governance, this data platform will enable business users to have self-service capabilities and prevent the misuse of information that could cripple an organization.
Where Data Models Are Born
What is being discussed today is not really whether the data model itself is dead, but rather how analysis is being conducted – increasingly discovery-oriented – and where the results of analysis – context – should be persisted. (Sometimes context should reside in application code that can deal with change faster, and sometimes instead in physical data models that can ensure that as many business users as possible can leverage a commonly agreed-upon and proper context for decision-making in the business.) Modern data platforms balance and integrate the use of both flexible and structured data stores through Hadoop and RDBMSs, but it’s the analytics lifecycle methodologies that will enable information discovery and the governance to decide whether to migrate and manage analytics throughout the enterprise.

Modeling is about performing thorough analysis and understanding of the business; the resulting data models should represent the data persisted by the business in databases or virtual data layers. Key-value stores may be where a discovery process – as a form of analysis – leads to the “birth of data models,” which then can be properly persisted for business information consumers to share and leverage.

Share your comments >
John O’Brien is the Principal and CEO of Radiant Advisors, a strategic advisory and research firm that delivers innovative thought-leadership, publications, and industry news.
ABOUT RADIANT ADVISORS
RESEARCH… ADVISE… DEVELOP…
Radiant Advisors is a strategic advisory and research firm that networks with industry experts to deliver innovative, cutting-edge educational materials, publications, and in-depth industry research.
Visit www.radiantadvisors.com • Follow us on Twitter! @radiantadvisors