Big Data Insight Group 2012


Big Data Insight Group 1st Industry Trends Report

Sponsored by:

Understanding the business benefits and strategic implications of big data


Team

HEAD OF RESEARCH & STRATEGY: Caroline Boyd, caroline.boyd@nimbusninety.com
EDITOR: Dominic Pollard, dom.pollard@nimbusninety.com
ASSOCIATE EDITOR: Mark Young, mark.young@nimbusninety.com
MEDIA PARTNERSHIPS: Hannah Mitchell, hannah.mitchell@nimbusninety.com
BUSINESS PARTNERSHIP MANAGERS (please contact for details of upcoming events): Milly Blundell, milly.blundell@nimbusninety.com; Owen Gregory, owen.gregory@nimbusninety.com; Charlotte Tite, charlotte.tite@nimbusninety.com
RESEARCH ASSISTANT: Tosin Arogundade, tosin@nimbusninety.com
DESIGN: Optic Juice, design@opticjuice.co.uk
MANAGING DIRECTOR (please contact for sponsorship opportunities): Emma Taylor, emma.taylor@nimbusninety.com
DIRECTOR: Ranald Lumsden, ranald.lumsden@nimbusninety.com

www.bigdatainsightgroup.com
Search for us on Twitter and LinkedIn
Email: info@bigdatainsightgroup.com
Telephone: +44 (0) 207 960 6551
Registered company and publisher name: Nimbus Ninety Ltd
Registration Number: 06803745
Registered in England & Wales
Office address: 10 Greycoat Place, London, SW1P 1SB
Registered business address: 16 Northfields Prospect, Putney Bridge Road, London, SW18 1PE
Copyright © Nimbus Ninety Ltd 2011
While every effort is made to ensure the information within this report is accurate, the publisher accepts no liability for any loss occurring as a result of the use of that information. All rights reserved. No part of this report may be published or stored in a retrieval system without the prior written consent of the publisher.

Welcome to the 1st Big Data Insight Group Industry Trends Report, focusing on ‘understanding the business benefits and strategic implications of big data’.

Big data analytics promises to revolutionise the way organisations gain insight and value from their data. The capability is here for organisations of all shapes and sizes to exploit the ever-increasing amount of data we collect. Yet many are still struggling to realise the full value of the data they have at their disposal.

This report is designed to help senior executives and decision makers understand the potential of big data. It aims to achieve this with a range of advisory, thought-provoking articles as well as valuable insight into the state of the marketplace.

Throughout the spring of this year, the Big Data Insight Group conducted its own independent research to establish just what people are doing with big data. The 300 responses came from a range of senior personnel, including individuals in business, IT, finance and marketing functions, and across a variety of industry sectors, offering a broad cross-section of big data users. The results of the survey, with detailed analysis, are included in this report.

The report also includes two columns from leading big data academics and interviews with Jason Titus, CTO of the leading app company Shazam, and David Boyle of EMI Music and zeebox. There is also a feature on the democratisation of data, case studies from the social media website Tagged and the ‘data cloud’ provider doubleIQ, and a snapshot of how different industry sectors are exploiting the power of big data analytics.

Whether you’re already using big data tools or keen to learn how to squeeze maximum value out of the data you generate and store, the 1st Big Data Insight Group Industry Trends Report is sure to help you on your journey.

Thank you to everyone who took the time to complete our survey. This has helped to form a critical piece of independent research which provides an in-depth perspective on the latest trends within the rapidly developing world of big data.

Please contact us if you would like to discuss any of the successes or challenges of your own big data projects, or if you have any opinions you would like to share. We’d love to hear your thoughts.

Yours sincerely,

Emma Taylor, Founder and managing director, Big Data Insight Group April 2012


Contents

05 The Introduction
Big data is still confusing for many. Cut through the hype and understand the benefits of it with an introduction to our 1st Industry Trends Report

07 The Survey
The Big Data Insight Group presents the findings of our 1st Industry Trends Survey, with in-depth analysis from a panel of industry experts

12 The Experts’ View: David Chan and Mark Whitehorn
Two columns by leading academics on the rise of big data and what organisations will need to do to exploit all it has to offer

14 Interview One: Jason Titus – Pitch perfect
Jason Titus is the CTO of the song identification app Shazam. With the app receiving over a million downloads a week, he tells Dominic Pollard how the company is now using the data at its disposal to create opportunities for expansion and new revenue streams

17 Interview Two: David Boyle – The self-confessed data geek
Having held senior insight positions in a range of different organisations, David Boyle shares with Mark Young how to make the most out of your data

19 The Feature: The democratisation of data
Mark Young explains how the emergence of new avenues of data, combined with the mass availability of advancing technologies for analysis, has seen data become democratised, providing a range of new opportunities for enterprises large and small alike

22 Snapshot: Industry leaders
Dominic Pollard examines how some organisations are leading the way with big data within their respective sectors

Sponsor Case Studies

24 Social network: Tagged
When its data analytics system struggled to keep pace with its expansion, Tagged turned to a big data solution

26 Information management systems provider: doubleIQ
Having adopted EMC’s Greenplum Database, doubleIQ can answer data analysis queries up to 300 per cent faster

www.thebigdatainsightgroup.com



The Community of Data Scientists, Passionate About Data Science. As a non-profit organization, we are dedicated to the free and open dissemination of data science. Created by Data Scientists for Data Scientists, we act as a forum for discussions and the exchange of ideas. To learn more, visit: datasciencelondon.org

Data Science London Where Data Scientists meet Data Scientists


INTRODUCTION

An introduction to... The 1st Big Data Insight Group Industry Trends Report

Cut through the hype and realise the benefits

Big data is the buzz term in the business world. But what the term actually means is still a mystery to many. The report’s editor Dominic Pollard offers an introduction to 2012’s hottest market trend.

The rise in the amount of data we store and manage has been exponential. Every day, we reportedly create 2.5 quintillion bytes of data, and it is widely claimed that 90 per cent of the data in the world today has been created in the last two years alone. Quite simply, the volume of data which organisations are storing and managing is increasing significantly day by day. But the question remains: ‘what exactly is big data?’ Like any new technological term, there is an element of hype that must be disposed of so that the real benefits and implications can be understood. For the purpose of simplicity, the widely accepted definition of big data is ‘any amount of data which is too large for your existing IT infrastructure to be able to store’.

THE THREE ‘V’S
But big data is not just about size. Three ‘V’s are often cited as distinguishing big data from traditional business intelligence (BI). The first is volume: as just stated, the sheer size of the data sets is a critical factor. However, the variety of the data and the velocity of the analysis must also be acknowledged as fundamental aspects of its definition.

Today organisations are gathering data in multiple forms from multiple sources. The rise of social media, along with the analysis of images and YouTube videos, has opened up new realms of possibilities to extract value from data sources that have previously been too large or complex to be exploited. Complex, unstructured data can now be analysed and visualised in engaging ways, allowing you to make sense of data which had previously been incomprehensible to the human brain.

Velocity refers to the speed at which the analysis can take place. Where previously it may have taken hours, days or even longer to garner any useful answers, big data (because of developments in the infrastructure, tools and techniques used for analysis) allows real time or near real time responses to vastly more intricate data queries.
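The figure of 2.5 quintillion bytes a day, quoted at the start of this introduction, is easier to picture in standard storage units. The short Python sketch below does the conversion. It assumes decimal (SI) prefixes, where a terabyte is 10^12 bytes, a petabyte 10^15 bytes and an exabyte 10^18 bytes. The report does not say whether decimal or binary units are meant, and the helper name `in_units` is purely illustrative.

```python
# Decimal (SI) storage units. This is an assumption: binary units
# (tebibytes, pebibytes, ...) would give slightly different answers.
TERABYTE = 10**12
PETABYTE = 10**15
EXABYTE = 10**18

# 2.5 quintillion bytes (2.5 x 10^18), the daily figure quoted in the introduction.
BYTES_PER_DAY = 25 * 10**17


def in_units(num_bytes: int, unit: int) -> float:
    """Express a byte count as a multiple of the given storage unit."""
    return num_bytes / unit


daily_exabytes = in_units(BYTES_PER_DAY, EXABYTE)          # 2.5 EB per day
yearly_exabytes = in_units(BYTES_PER_DAY * 365, EXABYTE)   # 912.5 EB per year
terabytes_per_petabyte = PETABYTE // TERABYTE              # 1,000 TB in one PB

print(f"{daily_exabytes} EB per day; {yearly_exabytes} EB "
      f"(about {yearly_exabytes / 1000:.1f} zettabytes) over a year")
```

On these assumptions, the daily figure is 2.5 exabytes, and a year at that rate comes to roughly 0.9 zettabytes.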

ACTIONABLE INSIGHT
Big data is now recognised as a term for using masses of structured and unstructured data to gain actionable insights which had hitherto not been possible. Cheaper storage, distributed file systems which allow you to spread computationally intensive processes over a potentially limitless number of machines, cloud computing, and advancing open source software have all contributed to big data’s emergence in the mainstream.

The benefits big data can deliver, as this report will illustrate, have the potential to revolutionise a business. By improving the efficiency of internal processes, predicting trends, gaining unparalleled insight into existing or potential customers, getting an immediate and accurate view of the market, and opening new revenue streams, big data can give an organisation competitive advantage in what remain very difficult economic times.

However, despite everything big data promises, it remains an enigma to many individuals. This report will examine why this is the case, explore the maturity of the marketplace and establish just what can be achieved through big data analytics.
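As a rough illustration of spreading a computationally intensive job across many machines, the Python sketch below splits a word count into independent chunks (a ‘map’ step) and then merges the partial results (a ‘reduce’ step). It is a toy built on stated assumptions. The data is taken to be already divided into chunks, as a distributed file system would do, and a local thread pool stands in for a cluster. It therefore shows how the work is divided rather than delivering real parallel speed-up. The function names are hypothetical and do not belong to any big data product.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk: str) -> Counter:
    """Map step: one worker ('machine') counts the words in its own chunk."""
    return Counter(chunk.lower().split())


def distributed_word_count(chunks: list[str], workers: int = 4) -> Counter:
    """Hand each chunk to a worker, then merge (reduce) the partial counts."""
    total = Counter()
    # The thread pool is only a stand-in for a cluster of machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_words, chunks):
            total.update(partial)  # reduce step: add each partial count to the total
    return total


chunks = ["big data big insight", "data drives insight", "big decisions"]
print(distributed_word_count(chunks).most_common(3))
```

Because each chunk is counted independently, adding workers (or machines) lets more chunks be processed at once. Only the small partial counts need to be brought together at the end.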



HIGH-PERFORMANCE

Big data has met its match. There’s a better – and faster – way to get value from the constant barrage of big data that’s been hitting you. SAS® High-Performance Computing simplifies the analysis of very large data sets so you can solve complex business problems with a triple whammy of advanced analytics, processing power and speed.

SAS and EMC Greenplum are taking on big data in a big way. SAS High-Performance Analytics will soon be available on the EMC Greenplum Data Computing Appliance.

sas.com/bigdata for a free white paper on big data

SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. © 2012 SAS Institute Inc. All rights reserved. S90205US.0312


SURVEY BIG DATA

The 1st Big Data Insight Group Industry Trends Survey

Understanding the business benefits and strategic implications of big data

The survey was conducted in February and March 2012. It was completed by 300 senior business, finance and IT personnel from a broad range of industry sectors, including financial and professional services, retail, manufacturing, telecommunications, pharmaceuticals, charities and the public sector. They represent companies of all sizes, from single-site SMEs to global, industry-leading blue chip organisations.

Fig. 1: How much data does your organisation manage and store? (%), currently and by 2015

Big data, although not entirely about the size of data sets as the introduction to this report outlined, has come about because of the exponential rise in the amount of data organisations are storing and managing. It comes as little surprise, therefore, to see in fig. 1 that over the next three years there is going to be a shift towards more organisations storing over 1,000 terabytes (one petabyte) of data. Of the 300 people surveyed, the proportion of respondents who stated that they were managing and storing over one petabyte rose from 16% currently to nearly double that (29%) in three years’ time. In essence, this highlights the trend that the data we are collecting and storing is increasing day by day.

In explaining this rise in data and the sources organisations now collect it from, Chris Roche, a senior director at EMC, says: “If you look at your individual life, you are on social media websites, you might put videos up on YouTube, you are probably sending more emails than you ever have done in your life, and you could be using storage products like Dropbox. Then you realise just how much data you’re generating on your own.”

For businesses, all of this extra data affords new opportunities for extracting insight and value. However, David Boyle, senior vice president of insight at EMI Music and director of insight at zeebox, warns that the sheer volume of data can be dangerous. “The more data you store the more complex the systems become for managing it. It can also become slower, more costly and more confusing to analyse,” Boyle says. “We don’t have many problems storing mass quantities of data anymore, but there are certainly difficulties in ascertaining which bits to interrogate and analyse.”

When establishing the status of the respondents’ big data initiatives, fig. 2 illustrates how the majority of people (83%) are actively researching, sourcing, implementing or running big data solutions. This is split between 50% of respondents who are researching big data in one form or another and 33% who have projects implemented or in the process of being implemented. Only 17% said they were unfamiliar with what big data is. This may partly reflect the kind of audience likely to complete a survey about big data. Nevertheless, it does indicate that big data is on the radar of most organisations regardless of their size or sector.

Key findings:
- 67% are either unfamiliar with big data or are still in the education process
- The ownership of big data initiatives currently rests predominantly with the IT department (47%)
- Less than a quarter of respondents (23%) believe they will require new personnel to execute a big data strategy
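To see what ‘nearly double’ means in a sample of 300, the fig. 1 percentages can be turned back into approximate head counts. The Python sketch below assumes the percentages are shares of all 300 respondents. Because the published figures are rounded, the counts are approximate, and the helper name `head_count` is illustrative only.

```python
RESPONDENTS = 300  # size of the survey sample


def head_count(share_percent: float, total: int = RESPONDENTS) -> int:
    """Convert a survey percentage into an approximate number of respondents."""
    return round(total * share_percent / 100)


storing_over_a_petabyte_now = head_count(16)    # 48 respondents today
storing_over_a_petabyte_2015 = head_count(29)   # 87 respondents by 2015
growth = 29 / 16                                # about 1.8 times, i.e. "nearly double"

print(storing_over_a_petabyte_now, storing_over_a_petabyte_2015, f"{growth:.1f}x")
```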


Fig. 2: What is the current status of your big data initiative?
Still unfamiliar with what big data is - 17%
Researching and sourcing solutions - 50%
Implementing or implemented - 33%

However, as Graham Oakes, business process efficiency expert and principal at Graham Oakes Ltd, says: “Many of those who are researching are actually in the early stage of working out what is meant by big data rather than researching specific projects.”

David Boyle reiterates this point as he states: “I would clump the 50% who are researching with the 17% who said they were unfamiliar with big data. In reality this says to me that 67% are unsure about big data and are still in the educational process.” This could mean that the big data market is not as advanced as the graph may seem to suggest at first glance.

Nevertheless, fig. 2 demonstrates that although many organisations are still at the start of their big data journeys (something which is not surprising, given that big data only recently emerged as a mainstream business tool), the growth of interest in the subject has been rapid. Roche believes that the results here show that “many are seeing it as important to their organisation, whereas only a few months ago it wasn’t even a term people had heard of”.

PERSONNEL
When asked whether or not their organisation would require new personnel to embrace big data strategies (fig. 3), the most common response was ‘not sure’ (46%). This is indicative of the relative immaturity of the market. Only 23% stated that they believed they would need new personnel, whereas a significant proportion (31%) appear confident they can execute a big data strategy with their existing staff. Graham Oakes suggests that this is because people are doing “tech-led pilots and that IT people are the ones who are experimenting with the technologies.” As such, many are unaware whether they will require new personnel for a business-focused strategy. “Everything I am seeing with big data,” Oakes says, “shows that you will only start to see real benefits when you get collaboration between the technical people, the data scientists and the business functions – this is when you can start asking sophisticated business questions and get a solid understanding of how you can get answers to the right people.”

Fig. 3: Will/did your organisation require(d) new personnel to execute your big data strategy?
Yes - 23%
No - 31%
Not sure - 46%

Chris Roche stresses that it is ultimately going to be more a question of skills than personnel. He says: “In terms of how you get these new skills there is a myriad of different models. You can train up your own people – and there are initial training programmes around data science emerging – or, alternatively, you can recruit new people. Otherwise, you could also get those skills from a managed service provider. Organisations will need new skills but it could be their current staff that is retrained to deliver them.”

This may prove to be a popular approach; many analysts predict that we will face a shortage of individuals with relevant skill sets (discussed on page 13) as demand increases. Indeed, this is supported by fig. 4, which examines the perceived barriers people have to adopting big data. ‘Lack of relevant skills’ and ‘lack of understanding’ are common across the industry sectors.

Fig. 4: Examining what the main barriers are to big data. Main barriers by industry sector (retail, public sector, financial services, professional services); barriers cited include lack of relevant skills, lack of understanding, lack of tools and technologies, costs and compliance.

THE 1ST Big Data Insight Group INDUSTRY TRENDS REPORT

The fact that the retail and financial services sectors cite skills as the biggest issue suggests that they are further ahead with their big data initiatives and have now got to the stage where the skills and personnel they have within their organisation are preventing them from achieving maximum value from their data analytics. As Oakes maintains: “The further on people get, the more they will realise that skills are the issue.”

David Boyle believes that ultimately the biggest barrier to big data across all industry sectors will be a combination of both personnel and skills. He says: “There are few people who have the skills to understand, shape and run the whole project. People need to ask the right questions and then have the ability to identify, gather, store and analyse the relevant data to get the answers to these questions.”

Graham Oakes’ suggestion that those who are experimenting with big data are running predominantly technology-led IT projects is supported by fig. 5. Nearly half (47%) of respondents said that big data analytics lay within the remit of the CIO or IT personnel of their organisation. The other 53% was divided among various departments and job roles, including marketing, business intelligence (BI) personnel and data architects and scientists. This certainly indicates that it is IT departments who are having a strong influence on the adoption of big data by being the first to embrace the new tools and technologies.

David Boyle expects the responsibility to shift toward business departments over the next year or two, something he believes is essential if organisations are to get the most out of their data. “As businesses begin to understand and embrace big data you can expect to see a more even spread of responsibility across the business functions,” he says. “The IT department might be the first to hear about the technology and test it, but when the benefits of big data become clear to the organisation the projects will become properly organised and you will see other personnel starting to get involved.”

Chris Roche supports this view as he emphasises: “The future of analytics is not any one person’s remit. Everybody needs to be involved – the data scientist, the data architects, business people, marketing and IT. It isn’t just one department either; to get the value out of your data it has to be a team sport.”

Fig. 5: Who is responsible for owning big data analytics within your organisation? (%)
Marketing - 9%
CTO - 10%
BI personnel - 15%
Data scientists or data architects - 15%
CIO or IT personnel - 47%
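Several percentages quoted in this part of the survey analysis are simple regroupings of the fig. 2 and fig. 5 answers. The Python sketch below reproduces them. The dictionary keys are shorthand labels introduced here for illustration, not the survey’s own wording.

```python
# Answer shares (per cent of respondents) as printed in figs. 2 and 5.
STATUS = {  # fig. 2
    "unfamiliar": 17,
    "researching": 50,
    "implementing_or_implemented": 33,
}
OWNERSHIP = {  # fig. 5
    "marketing": 9,
    "cto": 10,
    "bi_personnel": 15,
    "data_scientists_or_architects": 15,
    "cio_or_it": 47,
}


def combined_share(shares: dict[str, int], answers: list[str]) -> int:
    """Add up the percentages for a chosen group of answers."""
    return sum(shares[answer] for answer in answers)


# Share actively researching, sourcing or running big data solutions.
active = combined_share(STATUS, ["researching", "implementing_or_implemented"])
# David Boyle's grouping: researching plus unfamiliar, i.e. still learning.
still_learning = combined_share(STATUS, ["researching", "unfamiliar"])
# Share of respondents whose big data analytics sits outside the CIO or IT function.
outside_it = 100 - OWNERSHIP["cio_or_it"]

print(active, still_learning, outside_it)
```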

DRIVERS
The opportunities which big data presents to an organisation differ depending on various factors, such as the sort of data it has at its disposal and the value it wishes to extract from that data, which will be dictated by its business objectives. Fig. 6 shows how the driving factors for adopting big data vary by industry sector. What stands out is both what the different sectors want to achieve from their data analytics and the different stages of maturity each sector has reached.

Fig. 6: Examining what the main drivers are for big data. Main drivers by industry sector (retail, public sector, financial services, professional services); drivers cited include new insight into customers, better planning and decision making, extracting more value from old data, more targeted marketing campaigns and competitive advantage.

The retail industry appears far more mature, as the majority of respondents from organisations in this sector stated that they wanted ‘new insight into customers’ (64%) and ‘more targeted marketing campaigns’ (57%). Graham Oakes feels that this shows that “retailers are asking the right questions”. “This is an area where you can see ROI because if the effectiveness of a campaign goes up by two per cent then you’ve proved its benefits,” he says. The public sector, the findings would suggest, is at an earlier stage of its big data journey, as ‘extracting more value from old data’ features prominently (76%). This is indicative of a desire to adopt new methods as a means of getting the maximum value from existing data and assets which have hitherto not been exploited. Alternatively, it could show that many organisations are in an experimental stage of a project in which they are using old data to test new tools, technologies and techniques. David Boyle states: “Both figs. 4 and 6 illustrate that people aren’t taking an active or thorough enough approach to their data analytics. This is not a surprise, but it highlights that this is a new market which people are trying to get to grips with.”

THE TOOLS AND TECHNOLOGIES
Big data is not, of course, just about the rise in the amount of data we have; it is about the ability we now have to analyse these data sets. It is the development in open source tools and technologies, including such things as Distributed File Systems (DFSs), which delivers this ability. The survey findings (fig. 7) illustrate that people are beginning to adopt these new tools, again supporting claims that IT departments are experimenting with the new technologies in the early stages of their big data journeys. Half of the respondents said they are using big data tools within their organisation, either exclusively or in combination with their existing business intelligence tools. The other half are still solely using traditional, existing BI tools to perform their data analytics.

Fig. 7: What tools does your organisation use to perform data analytics?
Existing BI tools - 50%
New big data tools - 15%
A combination of existing BI and new big data tools - 35%

The impression EMC’s Chris Roche gets from these results is that people want choice. He says: “When it comes to tools, I think people want to be able to use any analytic tool in their kit bag, whether they’re writing code or using an analytics partner or some form of SaaS, they want to have the choice.

“It comes down to whatever the best tool is for the job in hand. I am a great believer that if you’ve made an investment in something and it does the job why would you buy a second tool? People want to make the most out of their investment and big data should not change that.”

David Boyle states that it is important not to become overly fixated on the technology and urges that a pragmatic, business-focused approach remains of paramount importance. He believes: “If you’re not disciplined about the questions you ask at the beginning of the project and you’re not careful about how you analyse the data then you will just use new tools to throw masses of data at.

“You have to ask the right business questions and you do that by being smart about which data you use and how you analyse that data – you can’t just implement new tools and expect them to act as a silver bullet.”

Currently over half (59%) of respondents stated that it takes them hours or even longer to get responses to their data analytics queries (fig. 8), although these queries are not necessarily big data ones. Only 19%, meanwhile, claimed to be able to get real time insight from their data. Graham Oakes is confident that as testing progresses further we can expect to see the time it takes to get responses drop. He says: “This shows that people have still got projects in pilots and the figures are a reflection of the time it is taking for them to find the data they want and to load and analyse it. But I would expect that people are going to want to get more of the responses to their data queries down to minutes at most.”

Fig. 8: Typically, how quickly do you get responses to your data queries? (%)

For Chris Roche, speed is only relative to the task in hand. He says it is important to acknowledge that there are situations, such as performing analysis for annual reports, in which you can afford to wait hours or even days to perform analysis on data sets. He believes people need to ask ‘if you got quicker responses, could that time differential help your company perform more successfully?’ “If the answer is yes,” he states, “then you have a case for adopting new big data tools and technologies.”

Far greater speed of response is one of the key, although not defining, characteristics of big data. The fact that less than half of respondents (41%) can get responses to their data analytics queries in minutes or less supports what we see elsewhere in the survey results: people are still in the very early stages of trying to understand how they can exploit big data tools. As such, they are yet to realise the benefits it has to offer, namely real time insight into data through the use of DFSs and intelligent visualisation and dashboarding. All the survey analysts were confident that, as people learn that the ability is there, we can expect to see more and more organisations achieve much quicker responses to their data analytics, if they deem it something which would add value to their business processes.

Fig. 9: Has your organisation yielded tangible benefits from big data analytics?
Yes - 20%
No - 39%
Not sure - 41%

PROVING THE RESULTS
The immature state of the big data market can be seen in fig. 9, which demonstrates just how few respondents (20%) have seen tangible benefits from big data analytics within their organisation. This could be the result of a number of factors. Firstly, as established, there are only a small number of organisations who have big data projects implemented from which they can receive any benefits. Secondly, it illustrates the nature of the projects which people are currently implementing. If they are, as suggested, executing technology pilots as a means of familiarising themselves with the new tools that are available, then tangible benefits to the organisation are not yet the objective. Rather, the aim is to gain a better understanding of the technology. The lack of business focus in big data projects therefore makes it far more difficult to realise any business benefits.

Furthermore, only 14% of the respondents who stated they had big data projects implemented said they have seen a positive return on investment (ROI) from them (fig. 10). As Graham Oakes says: “This all suggests people are struggling to show benefits from projects because they are in the early stage of implementation.” Chris Roche adds: “There are so many people saying ‘no’ and ‘not sure’ in figs. 9 and 10 that this would suggest that they didn’t have a clear goal when they began their projects, while many have simply not started any big data projects yet. But for those who have but aren’t seeing any benefits you must ask ‘do they have clear business benefits they are trying to achieve or are they just playing around with the technology?’”

Only when the market matures, and people with the relevant skills from a range of business functions begin to play a role in asking the right questions, will people begin to realise the potential advantages big data has to offer. There is a risk to this approach, though, as Oakes surmises: “With a tech pilot your aim is to understand the technology but it is hard to prove the business benefits from that. Hopefully people will come out of this first round of piloting saying that they understand the technology, they know more about what skills they’ll need and can then set up some business-focused pilots. Then the second round of pilots will offer far better results.

“The worst case scenario is people come out of these pilots without being able to show clear ROI and they then think ‘big data is not for us’.”

Reflecting on the results of the survey, David Boyle says that at the moment the state of the market reads “like a story of missed opportunities”. “Big data should and can be quick, cheap and powerful when done properly,” he emphasises. “If the right people can ask the right question then people can start sifting through all this data to choose the correct, clean parts of it and analyse and display this in an interactive way.”

The findings of this research suggest that the market, though rapidly evolving, is still an immature one. There may still be issues with defining big data, and beyond this, uncertainty remains over the skills, tools and personnel required to execute a successful big data strategy. With many trying to educate themselves in these areas, we can expect to see the market changing dramatically over the coming months and years as people gain a far greater understanding of what the benefits of big data are, as well as best practice tips on how to overcome the obstacles to realising them.

What shines out from these findings is that big data is not simply about new tools and technologies to deal with increasing amounts of data. It is about taking an intelligent approach to using that data to answer clear, predefined, business-orientated objectives from which an organisation can reap the rewards.

Fig. 10: Has your organisation seen a positive return on investment from your big data project?
Yes - 14%
No - 26%
Not sure - 60%
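Fig. 10 asks whether projects have delivered a positive return on investment. Assuming the conventional definition of ROI (net gain divided by cost), the calculation itself is simple, as the Python sketch below shows. The difficulty the analysts describe lies in attributing a measurable gain to a pilot in the first place. The figures used are invented for illustration, and the function name is hypothetical.

```python
def return_on_investment(gain: float, cost: float) -> float:
    """Conventional ROI: the net gain from an investment divided by its cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (gain - cost) / cost


# Hypothetical business-focused pilot: 100,000 spent, 120,000 of benefit attributed to it.
pilot = return_on_investment(gain=120_000, cost=100_000)
# Hypothetical tech-only pilot whose traceable benefits came to just 80,000.
tech_pilot = return_on_investment(gain=80_000, cost=100_000)

print(f"pilot ROI {pilot:.0%}, tech pilot ROI {tech_pilot:.0%}")
```

A positive value means the project has more than paid for itself. A negative value is the ‘worst case scenario’ Oakes warns about, where a pilot cannot demonstrate a clear return.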


COLUMN DAVID CHAN

THE WORLD IS CHANGING

– our mindsets must change with it

B

The rise in the amount of data we are generating and storing has been meteoric of late. But while the world of data is developing rapidly, the ways that we manage it are not keeping pace. David Chan says our paradigms need to change.

A column by David Chan, director, Centre for Information Leadership, City University London david.chan.1@city.ac.uk

12

Big data has arrived; there can be little doubt about that. Its arrival has come not only with a meteoric rise in data but with advances in the tools and technologies we use to analyse it. However, technology is never a silver bullet, and before we can reap the benefits of big data we must change our mindset about how we approach it.

Paradigms are the way we make sense of the world. They are the perspectives that we use to understand what is happening around us. But, of course, if we don’t have the right information, or enough of it, our understanding can be built on falsehoods. For instance, the ancient Greeks understood thunder and lightning to be the god Zeus throwing his thunderbolts. They didn’t know about the interaction of warm air, water vapour and static electricity.

The prevailing paradigm for data analytics is static modelling – the search for rules that explain variation in the data. We examine randomised data sets and look for patterns between the key points of difference. If we can find these same patterns in other data sets, then we know we have a model. We then use that model to make predictions. A ‘real world’ example of this can be seen in the way that banks weigh up the risk of giving somebody a loan. They analyse the applicant’s financial history and lifestyle and try to see whether the patterns fit somebody who usually meets their repayments on time or somebody who defaults.

Static modelling like this is fine, in theory. Indeed, it works perfectly well when there are fixed relationships that are discoverable through detailed analysis. But what happens if what we are seeking to model does not work in the same way?

The ‘butterfly effect’ is a well-known idea in mathematics. This is where small changes in the initial conditions have hugely disproportionate effects on the outcome. So if you went back in time a thousand years and changed something very small, the world could potentially be a very different place today. In data analysis, the effect can often be that the model we use to extract reliable information from our data no longer works when applied to the rapidly evolving environment around us.

Unfortunately, these small changes are becoming increasingly common as our data sets get bigger and bigger. Furthermore, the sources that we now capture all this data from consist of complex, mutually interacting systems, and this makes them prone to feedback. We think of the problem as ‘noise’ – if you’re listening to a conversation and somebody else talks over the person you’re speaking to, it’s easy to mistake what they have said.

This ‘noise’ is especially common now that we are using unstructured data. For instance, if we look at surveys or polls, to what extent are the answers affected by the way the respondents wish to appear? Although we can account for the identified factors, how do we know there are not other factors that have significant impacts? The effects we are trying to predict may well be swamped by other signals.

Therefore we need to change the way that we manage our data and the paradigms that we use to understand it. New technologies like Complex Adaptive Systems are specifically designed to help with noisy data. But without the right approach, they aren’t enough. We need to use big data analytics as a more accurate means of adapting a system to cope with the data being received from these constantly evolving markets. It’s new, and not generally well known, but it is the answer. The data for extremely in-depth insight has arrived, and so have the tools; we now just need to change our mindsets to unlock the value they can offer.
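The butterfly effect can be seen in a few lines of code. The minimal sketch below iterates the logistic map, a standard textbook example of a chaotic system chosen here purely for illustration (it is not a technique the column itself uses), from two starting values that differ by one part in ten billion. The starting value 0.2, the parameter r = 4 and the 60 steps are arbitrary assumptions; with them, the two trajectories typically end up far apart within a few dozen steps.

```python
def logistic_trajectory(x0, r=4.0, steps=60):
    """Iterate the logistic map x[n+1] = r * x[n] * (1 - x[n]) from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

# Two starting points that differ by one part in ten billion (illustrative values).
a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-10)

for n in range(0, 61, 10):
    print(f"step {n:2d}: a = {a[n]:.6f}  b = {b[n]:.6f}  gap = {abs(a[n] - b[n]):.2e}")
```

A model fitted to one of these trajectories would say little about the other, which is the column’s point about models that stop working once conditions shift slightly.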

THE 1ST Big Data Insight Group INDUSTRY TRENDS REPORT


COLUMN MARK WHITEHORN

NEW SKILLS FOR THE NEW TOOLS

The age of big data is upon us. In order for organisations to revolutionise their data analytics, they will need the right personnel. Mark Whitehorn explains.

A column by Mark Whitehorn, professor and chair of analytics at the School of Computing, University of Dundee

The idea that business intelligence (BI) solutions have to be tailored to a business’s requirements is old news – every organisation is unique and, as such, its analytics strategy must be customised. The same is true for the effective handling and analysis of big data, only more so.

Hardware, software and personnel are the three major components. Obviously you need the kit to hold the big data and the software to analyse it. Exactly what this might be will depend on the data you are going to work with and the kind of analysis you want to perform. However, a much greater challenge is likely to be finding the right people to execute your big data analytics.

It is easy to think of big data as completely separate from normal data, that being smaller sets of structured data. After all, we’ve given it a whole new name. In reality, big data is almost always cross-analysed with existing tabular data. So, in order to make the most of big data analytics, you’ll need all the usual suspects in terms of BI skills – a good understanding of relational and multi-dimensional design and structures, programming languages for databases, and information presentation skills. Excellent communication skills, both verbal and written, are also high on the list.

So, that’s the BI side covered, but what else should you have on your big data skills wish-list? A good background in data, statistics, data mining and algorithm design is also going to prove essential. The statistics are vital because we need to know not only whether two sets of data are different, but whether that difference is really significant. Data mining covers a collection of techniques for finding patterns in data and can reveal information hidden deep in a sea of data, such as clusters of customers with similar buying behaviour.

In addition, you need the skills to deal with the particular technology you intend to use, likely to be a NoSQL system of some kind or a distributed processing framework such as MapReduce or Hadoop. All of these relatively specific skills should be coupled with experience and, last but certainly not least, imagination. Such is the range of skills and experience required to execute big data analytics that it makes sense to think in terms of assembling a team rather than seeking all of these skills in one individual.

Big data is a new field and not yet well understood in the way that, say, transactional data is understood. As you read this, organisations in a range of sectors, private and public, are working on the fundamental research that will crystallise into new techniques and applications for handling big data. In ten years’ time it will be obvious exactly how big data was going to be exploited, but it’s far from obvious now.

However, even if we are not yet exactly sure how to extract the best results, it’s already abundantly clear that the greatest rewards will go to the pioneers – of whom there are broadly two types. There are the early adopters of new techniques and applications as they are developed, commercialised and packaged. They will do well. But the true pioneers are the ones who blaze the trail, who develop the techniques – the richest rewards will be theirs. The challenge, difficult though by no means impossible, is to find people with the skills and experience to enable your organisation to reap these rewards.

Ultimately, we need people who are switched on, clued up and able to see opportunities where no-one else has seen them. Can you, for example, analyse Twitter data and spot potential customers who don’t even know they need your product? I don’t know, and in all probability you don’t either, and there’s no formula to tell you whether it’s even possible. However, the skills, attitudes and aptitudes listed above, coupled with a good dose of imagination, will let you lead the way.
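The distinction between two sets of data being different and that difference being significant can be made concrete. The sketch below uses a permutation test, which is one of several standard approaches (a t-test would be another), on two small, invented samples of average basket value under two hypothetical store layouts. It estimates how often randomly relabelling the pooled values produces a gap in means at least as large as the one actually observed. The data, the 10,000 resamples and the fixed seed are all assumptions made for illustration.

```python
import random
from statistics import mean

def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in means of samples a and b.

    Returns the observed difference and an estimated p-value: the share of
    random relabellings whose mean difference is at least as extreme.
    """
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign values to the two groups
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (extreme + 1) / (n_resamples + 1)

layout_a = [12.1, 14.3, 13.8, 15.0, 12.9, 14.7, 13.5, 14.1]  # invented figures
layout_b = [11.9, 12.4, 12.8, 11.5, 12.2, 13.0, 12.1, 11.8]  # invented figures
diff, p = permutation_test(layout_a, layout_b)
print(f"difference in means: {diff:.2f}, estimated p-value: {p:.4f}")
```

A small p-value suggests the gap is unlikely to be an accident of which baskets happened to fall under which layout; it says nothing about why the layouts differ.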

www.thebigdatainsightgroup.com



INTERVIEW JASON TITUS

Pitch perfect

For a decade now Shazam, the song identification mobile phone app, has been settling bets, discovering new artists and unearthing guilty pleasures for its users. With over a million and a half new users each week, the company’s recent growth has been exponential and its big data analytics have revealed a range of new possibilities for expansion. Dominic Pollard visited the company’s London headquarters to speak to its CTO Jason Titus and find out about the magic behind the technology.




Shazam is a mobile phone application that allows the user to find out the artist and name of any song they hear. Within just a few seconds, that nagging question of ‘oh, what is this song?’ can be answered at the touch of a button. CTO Jason Titus says Shazam is “like a unique, magical experience” – in reality, the service is built on the innovative use of data and technology.

Released back in 2002, the service reached one billion ‘tags’ (the term the British-born and British-based company uses when someone ‘Shazams’ a song) nine years on. The second billion arrived less than 12 months later. Product development, enabled by big data, has also accelerated considerably since the start of the decade. The most recent advances include real-time play-along lyrics, a halving of the time it takes to identify a song, and the ability to connect the app with Facebook and share tags with friends. Once a song is tagged, the user is now also given options including watching a related YouTube video or buying the song on iTunes.

With over 1.5 million new people using the Shazam app every week across all of the major mobile phone operating systems, it’s fair to say that the company is hitting all the right notes among its user base. And with all that activity comes masses of data, which Shazam analyses to make the product even better while opening up entirely new revenue streams.

Titus explains Shazam’s technology in layman’s terms. “To simplify,” he says, “the app ‘listens’ to the frequency and amplitude of the music, finding interesting points within the music to create a unique signature, and then sends that to the server to be compared to all the records of songs we have.

“It’s highly optimised; we’ve spent a lot of time trying to make what is a computationally intensive thing very time efficient. We are resistant to noise or pitch shifts. On the radio, for example, the music is up to five per cent faster or slower. We have to account for this in the core algorithm.”

What’s hot, what’s not?

The data that Shazam generates and analyses is of great importance, too. Shazam’s song-tagging charts are an accurate precursor to the US Billboard charts a few weeks later. “There are a lot of people who want to know what the latest musical trends are,” explains Titus. “We provide regular updates to record labels on what is being tagged and at the end of the year we do a report to give our predictions for up-and-coming bands. These are commercial arrangements that we have instigated. For the partner it’s about seeing who’s had a trajectory which would suggest they are going on to big things.”

Through the collection of billions of geographic tags, Shazam can also identify what is popular in a specific area and can therefore analyse what particular subcultures are emerging around the world. Shazam’s partners can then use this information to tailor their offerings. The company can also report back to advertisers, music promoters and other media producers on how successful their musical accompaniments have been. So, for instance, if a band performs on a talk show, Shazam can report in near real time how many of its users tagged the song and feed this information back to the show, helping advertisers define the audience and its inclinations.

Titus and his colleagues are now looking to expand the product further into advertising as well. Examples the CTO gives include a car advert or a movie trailer. If you were to ‘Shazam’ one of these adverts, the app would be able to identify what you were watching and provide the user with additional information, release dates and purchasing options for that product. The company has already covered 50 advertising campaigns in the US and is looking to become even more active in this arena.

Clearly then, Shazam is building an increasing influence over the development of the music and media industries and, thanks to its data, its opportunities for expansion and monetisation within horizontal markets are vast. It is a shining example of how you can take the data you have at your disposal and use it to elevate the company to new heights. However, Titus stresses that making a great user experience remains the company’s absolute priority. The predictive data that is collected incidentally is an added bonus, which they are able to use analytical tools to exploit – to the benefit of both Shazam and the music industry.

An orchestra of harmonic technologies

Shazam uses an initiative called ‘Demo Day’ to help it find new technologies, an initiative that Titus carried over from his previous role; for the six years before joining Shazam in January 2010, Titus worked as vice president of communications at Yahoo!. He left after coordinating the development of the new unified Mail and Messenger experience across desktop. Demo Day was a particularly popular technique the American-based Internet giant used, and Titus has taken it across the pond to Shazam’s London headquarters.

The initiative allocates each developer 15 per cent of their working week to explore new technologies and experiment on their own projects. Then, once every couple of months, the developers present what they’ve created to the rest of the company. Whether it’s an improved user interface or an optimisation to the app’s backend, the initiative lets the creative juices flow and, in turn, the company gets a steady influx of new tools and ideas across the entire development team.

“Demo Day is something that’s been particularly successful and the developers seem to like it a lot,” Titus says. “Over the last year there have been many product improvements which have come out of it. They might not have been on our roadmap but from what they’ve done we have gone away and said ‘we should really look into doing that’.”

The app itself is, of course, technologically rich. But beyond the wizardry that makes sounds ‘Shazamable’, the company is also harnessing other progressive technologies, including cloud, in a wealth of other ways. Approaching new tools and technologies in an open and experimental way has been an integral part of allowing the company to exploit the true value and insight from the data it collects.

“We are definitely a business in transition,” Titus says openly. “We are starting to do some very interesting analysis with real time tools. Going forward we are evaluating which ones we will use – whether that’s Hadoop or a mix of other tools. Our pace of development has improved dramatically over the last year, in part because we have started to leverage much more of the open source code that’s out there.”

Only through clear business objectives and the right tools to support them can the company get such a range of people across the globe singing from the same song sheet in such perfect harmony. The direction and focus of the company, enabled by the intelligent and strategic use of its data, is assisted by that sprinkling of magic that underpins the whole operation. Thanks to this, Shazam has ensured that it won’t be hearing the dulcet tones of the fat lady singing any time soon.
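Titus’s description of how the app works, finding ‘interesting points’ in the frequency and amplitude of the music and turning them into a signature that is matched on a server, resembles the widely published spectral-peak approach to audio fingerprinting. The toy sketch below illustrates that general idea only; it is not Shazam’s code. It assumes each ‘note’ is a pure tone filling exactly one 128-sample frame, keeps a single peak per frame, uses a naive DFT rather than an FFT, and makes no attempt at the speed and pitch tolerance Titus mentions. The two songs, the sample rate, the frame size and the noise level are all invented for the example.

```python
import cmath
import math
import random
from collections import Counter

RATE = 8000    # samples per second (assumed for the toy example)
FRAME = 128    # samples per analysis frame, so each DFT bin is 62.5 Hz wide

def tone_sequence(bins):
    """Toy 'song': one pure tone per frame, centred exactly on the given DFT bin."""
    samples = []
    for k in bins:
        freq = k * RATE / FRAME
        samples.extend(math.sin(2 * math.pi * freq * n / RATE) for n in range(FRAME))
    return samples

def peak_bins(samples):
    """Strongest DFT bin (excluding DC) in each non-overlapping frame, via a naive DFT."""
    peaks = []
    for start in range(0, len(samples) - FRAME + 1, FRAME):
        chunk = samples[start:start + FRAME]
        mags = [abs(sum(x * cmath.exp(-2j * math.pi * k * n / FRAME)
                        for n, x in enumerate(chunk)))
                for k in range(1, FRAME // 2)]
        peaks.append(1 + mags.index(max(mags)))
    return peaks

def fingerprint(samples, fan_out=3):
    """Pair each peak with the next few peaks: hash (bin_a, bin_b, gap) -> anchor frame."""
    peaks = peak_bins(samples)
    return [((a, peaks[t + dt], dt), t)
            for t, a in enumerate(peaks)
            for dt in range(1, fan_out + 1)
            if t + dt < len(peaks)]

def match_score(query, reference):
    """Largest number of hash matches that agree on one time offset."""
    index = {}
    for h, t in reference:
        index.setdefault(h, []).append(t)
    offsets = Counter(t_ref - t_q for h, t_q in query for t_ref in index.get(h, []))
    return max(offsets.values(), default=0)

SONG_A = [5, 9, 12, 7, 5, 14, 10, 8, 6, 11, 13, 9, 4, 15, 7, 12]   # hypothetical
SONG_B = [8, 6, 13, 4, 10, 15, 5, 11, 9, 14, 6, 12, 7, 10, 4, 13]  # hypothetical
library = {"song A": fingerprint(tone_sequence(SONG_A)),
           "song B": fingerprint(tone_sequence(SONG_B))}

# A noisy eight-frame excerpt of song A, standing in for a phone recording.
rng = random.Random(42)
clip = tone_sequence(SONG_A)[4 * FRAME:12 * FRAME]
clip = [x + rng.gauss(0, 0.3) for x in clip]

scores = {name: match_score(fingerprint(clip), fp) for name, fp in library.items()}
print(scores)
```

The excerpt scores highly only against the song it came from because its peak-pair hashes line up at a single, consistent time offset; a production service would index vastly more hashes and would need the shift tolerance described above.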




INTERVIEW DAVID BOYLE

DAVID BOYLE: Howdy partner

Having held senior insight positions in a wide range of different industries, David Boyle has learnt a few things about the best ways to use data. He talks to Mark Young about the importance of partnerships, scalable infrastructure and transparency.

David Boyle is what you might call a data fanatic. At the moment he holds two jobs: senior vice president of insight at the record label EMI and director of insight at start-up company zeebox – creators of an application which combines TV viewing with live data about the show and social media. With Facebook and Twitter functionality built into the application, users can comment on the TV programmes they watch and also see what their friends are watching.

But the really smart bit is that the application brings up a host of hyperlinks for more information on a myriad of different things that are happening at that very moment. And its scope is broad. Say, for instance, a soap storyline is playing out: the user might be presented with links to the actors’ profiles, more information on the discussion points they are engaging with or the music playing in the background.

zeebox can monitor users’ ‘journey’ through these links, the comments they make on the social media sites, and their overall TV viewing and engagement patterns. Yet while many companies would look to use this information solely to tempt advertisers onto the application with increased demographic insight, Boyle sees wider opportunities.

“We look to set up partnerships with people that can get actionable insight from the data,” he says. “That might be an advertiser, but equally it might be a charity. Someone like an aid agency would, for instance, stand to benefit greatly from knowing firstly where a crisis is featured within television shows and secondly how different audiences react to the information or portrayal.”

The premise, like the dashboards zeebox creates for presenting the information, is the same for any type of organisation – advertiser, charity or anything else – and that means the process is easily replicated for multiple different partners and projects. “The only difference is that some organisations are going to pay a premium for that information and others aren’t,” says Boyle.

Other than that, Boyle is coy when it comes to talking about the commercialisation of this data. Instead of cold hard cash he prefers to bring the conversation back round to one key word: ‘partnership’ – the modus operandi that he sees as the only truly effective way to operate with data.

At any rate, “I’m not sure that direct monetisation is the correct thing to do with data,” he says. “Partnerships to make use of the data in the right way are a lot more productive. It’s far better to help people understand and draw conclusions. Then they’ll have a lot more benefit and they will then help us, whether that’s with promotion, advertising or money itself. You don’t get much value from data unless you really understand it and you know how to make use of it – a cold data transaction won’t yield anywhere near as much value as a true partnership.”

This ideology certainly served him well at EMI. As little as five years ago the music label was struggling. Like most of its peers, EMI initially resisted the effects of the digitisation of music instead of creating new revenue streams in this evolving environment. A high-profile takeover by the private equity firm Terra Firma was seen by many as a last-ditch effort to save the company, and a long shot at that – after all, what do a bunch of suits know about running a record label? How to stop one going under, apparently. Terra Firma introduced a culture within EMI where data is at the heart of every decision. It was then that Boyle introduced partnerships.

“When I joined in 2009 we weren’t really talking to consumers and weren’t using them to help us make decisions,” he explains. “We were collating data and then sending that to the business decision makers and telling them how things should be done. So we reinvented it from scratch with a focus on buy-in and partnering with the business – helping them to make decisions based on their needs, rather than telling them they are making wrong ones based on the data.”

If EMI was slow on the uptake of Internet music trends to begin with, it is fully on board now. Take the popular song streaming service Spotify, through which EMI, like the overwhelming majority of record labels, makes all of its recordings available at practically no cost to the user. EMI does get a monetary return for this but, perhaps more importantly, it garners swathes of customer trend data. In the same way as Google can create profiles for the people using its search engine, EMI can track individual journeys (without personal details) through Spotify.

Unlike the label’s executives who preceded him a few years ago, Boyle rejects the notion that digital music channels result in lower revenues. “The Spotify model engages people with music and they then go out and buy more of it, and a bigger range than they would have before. And we have the data to help them find things they want to engage with. Everybody wins.”

Rinse, repeat

Boyle also advises that data projects need to be scalable and repeatable. “All too often insight is a slow process,” he says, “and the project is a big one-off with only certain people and departments involved. Every time you do something there will be big set-up costs.”

Boyle’s team at EMI sends out packs of research results to a wide range of people around the business every day. These are also available online to all staff everywhere in the world. Included is a wealth of different information relating to lifestyle, attitudes, media engagement “in the widest possible sense”, affinities, songs, price expectation, reactions to different versions of songs and anything else that any individual around the business has asked for. “If you’ve got enough scale you can ask anything you want in research,” he says.

His final piece of advice is that companies need to ensure that they aren’t ‘creepy’ with people’s data. The best way to do that is transparency – not selling personalised data, only sharing trends, and being open about where data is going and who it is going to.

When you get good at working with data, there may be a cross to bear. Making visualisations for data is Boyle’s favourite part of his job, but it’s also the least important and he now gets to do it less and less. Still, with untold levels of data to play with as zeebox continues its assault on the social TV market with its partner-led paradigm, there should be enough to keep his love affair with consumer insight alight for a while yet.

(An extended version of this interview is available at www.thebigdatainsightgroup.com)




FEATURE DEMOCRATISATION OF DATA

Data is now available on a mass scale to all enterprises, large and small alike. Those that make the most intelligent use of it to piece together the puzzle will thrive, not just survive. Mark Young explores.

Our world has been enriched by data in innumerable ways. However, while its effects are far-reaching, only an exclusive group of companies has been at the forefront of big data. Until now, complex data capture and analysis has been the preserve of governments, academics and mega-rich corporations. They alone have had the resources necessary to implement the intricate and expensive infrastructure big data requires.

But things are changing. Data is becoming democratised – a wealth of it is now available for one and all. For instance, with user-friendly web analytics tools, many of them available for free, anyone running a website can review the hit rates and demographics of their audience. Moreover, they can analyse what users do during their visits, how long they stay, what they click and where they move on to next. In the same vein, there’s a host of social media monitors that can report exactly what’s being said about a company by its audience across the online ‘chatosphere’.

Furthermore, we are now in the age of the ‘Internet of things’. Countless everyday objects are now equipped with radio frequency identification (RFID) tags and wireless Internet technology. This allows them to continuously log data to a network on their performance and their surroundings – data that can be used for wider contextual understanding.

CHALLENGE: Size

With the floodgates open, you can quickly go from having no data to finding yourself in an ocean of it. This can render your efforts counterproductive: because you need to spend so much time sorting through the data, you’ll never get any value from it. A strong plan of action that is linked to business objectives is essential. “What sets companies like Tesco apart,” says Bernard Marr, “is that while they may have tonnes of data, they use less than 10 per cent of it. What they do use is tightly aligned to their strategy. They spend a lot of time asking themselves ‘what do we need to know?’ before they go off and drown in it.”

It’s not just their own data that a company can benefit from, either. Many other organisations, including government bodies, are making their data publicly available in the interests of transparency. In fact, the UK government has appointed a Public Sector Transparency Board with an agenda to make all public data accessible and open for commercial use, in any lawful way. Part of this is the midata scheme – a voluntary set of standards with 26 corporate founding signatories, including the likes of British Gas, Google, Royal Bank of Scotland and Visa, which includes a commitment to sharing data for commercial gains. This is actively encouraged as part of the government’s private sector growth agenda.

With data sources markedly increased, democratisation is furthered by the mass availability of low-cost technologies for working with the data. The advent of cloud-based computing and storage, complemented by distributed file and computing systems like Apache Hadoop and Google’s MapReduce, which spread computational tasks across large clusters of machines, means that elaborate queries over very large data sets can be run quickly on a pay-as-you-go pricing model, ostensibly opening such analysis out to the masses.

For John Paul Tointon, managing director of fast-growing cloud recruiter JPE Recruitment, the rising mobility and interoperability of data is driving democratisation. “Now you can enter or monitor data within the palm of your hand,” he says. “Since the 1980s people have been talking about the next ‘killer app’. Now we are going to see the ‘mega app’, which will allow people to automate the entry of data across all platforms, filter relevant data, and automatically disperse it to the right people in the organisation so they can action it. Now we are seeing that come to fruition.”
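The way systems such as Hadoop and MapReduce spread a job across many machines follows a simple pattern: a map step processes each chunk of input independently, a shuffle step groups the intermediate results by key, and a reduce step combines each group. The sketch below runs that pattern in a single Python process on a hypothetical word-count task; it illustrates the programming model only and does not use Hadoop’s actual API. The input splits are invented.

```python
from collections import defaultdict
from itertools import chain

def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]

def shuffle(pairs):
    """Shuffle: group emitted values by key (on a cluster this moves data between machines)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def map_reduce(splits):
    # Each map call depends only on its own split, so a cluster can run them in parallel.
    mapped = chain.from_iterable(map_phase(s) for s in splits)
    grouped = shuffle(mapped)
    # Each key is reduced independently, which is also parallelisable.
    return dict(reduce_phase(k, v) for k, v in grouped.items())

splits = ["big data big insight", "data from the cloud", "Big tools for big data"]
counts = map_reduce(splits)
print(counts["big"], counts["data"])
```

Because neither the map calls nor the per-key reductions depend on one another, the same logic can be spread over as many machines as the pay-as-you-go budget allows.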


Widening the scope

The advances in big data allow any business of any size – rather than just those that can invest millions into it – to make decisions based on facts rather than received wisdom. They therefore offer competitive advantage to those that embrace them. As well as improving processes, data allows companies to introduce completely new revenue lines and even begin to influence the shape of the vertical industries they touch. Shazam and zeebox, as you will have read in this report, are perfect cases in point. These companies partner with others that can benefit from their data by understanding what users react to and how they converse about it.

CHALLENGE: Context

An abundance of data at your fingertips could also offer you false confidence in your convictions. You can always find data to prove just about any point you make, but you must ensure that you approach your analysis objectively.

Dr Neil McBride, a Reader in information technology management at De Montfort University, says that organisations must be careful in using their data. “The critical issue is interpretation,” says Dr McBride. “We have to understand the context of our own data.” If a business makes strategic decisions based on figures that may not be strictly accurate, it may be doing so erroneously.

Indeed, a deep understanding of data requires analytical skills which, as Mark Whitehorn points out in this report, are in short supply. Unfortunately, McKinsey predicts that by 2018 there will be a shortage of 1.5 million managers and analysts with the necessary skills to turn big data into strategic information.




Data is an extremely valuable commodity – there will be plenty of takers should an organisation wish to commercialise the data it captures, though there are legal and ethical implications that need to be considered.

No turning back

Now, with the wheels fully in motion, many believe that intelligent use of data is fast turning from an opportunity to steal a march on rivals to an obligatory practice simply to keep pace. “There won’t be any industry that stays untouched by this,” says Bernard Marr, an organisational performance expert who has worked with a string of the world’s leading corporations, including AstraZeneca, HSBC, and Royal Dutch Shell. “It will bring enormous changes to every facet of our business and social behaviour, and any company that doesn’t embrace data – from the billion-dollar company to the individual person in a shed – risks becoming uncompetitive, or even irrelevant, very quickly.”

How big could the returns be? McKinsey, the global management consultancy firm which has been one of the key proponents of research into big data, found that US retail firms using big data to its full extent can increase their operating margins by up to 60 per cent. Meanwhile, European governments with developed economies can save up to €100bn (£84bn) in operational improvements alone.

How equal is equal?

The democratisation process is not complete, though. Clearly, large companies still have more resources to work with, and the skills shortage described earlier in this report is particularly pertinent. One answer to this could be crowdsourcing, something in which the US company Kaggle is taking a leading role. The organisation runs competitions among its network of PhD-level data scientists, with cash prizes available for finding answers to real world problems put up by enterprises.

A Kaggle spokesperson described its role in the democratisation of data: “Competitions are not only a faster and more accurate way to analyse data, they’re also much cheaper. For less than it costs to hire a single data scientist for, in some cases, just a couple of months, a Kaggle competition gives access to around 30,000 number crunchers.

“And not just any number crunchers – we’ve had glaciologists make breakthroughs on problems in astronomy, English majors make breakthroughs in HIV research and physicists predict which used cars are good buys and which aren’t.”

Still, with current big data leaders focused chiefly on consolidating their positions and investing heavily in data in order to do so, nobody is suggesting that market oligopolies will become a thing of the past. However, the most successful companies that use data do now have an opportunity to become the new dominant forces, and even revolutionise their industries. “Small companies that succeed with big data strategies are likely to become big companies themselves, very quickly,” says Marr. “Google and Amazon are perfect cases in point.”

Indeed, as John Paul Tointon asserts: “The ability to harness the power of previously uncontrollable data will define the industry leaders, not the size of the organisation.”

The democratisation of data has the potential to take its place alongside the great business enablers from history, such as the printing press, the railways, telecommunications and the Internet – things that have dramatically increased enterprise innovation and productivity. The world of business will be changed irrevocably by the democratisation of data, and it’s certainly for the greater good.

CHALLENGE: Creativity

Recently, the Big Data Insight Group interviewed Toby Moore, CTO of Mind Candy – the London ‘Tech City’ company that created the global children’s social network craze Moshi Monsters. Moore voiced concerns that companies which opt for total strategic reliance on data do so at the cost of their creative output and the ‘human elements’ involved. He pointed to online games companies that create content rich with repetitive psychological hooks instead of entertainment as an example. As Bernard Marr says, “data should be an enabler, not an oracle. Big data should not sound the death knell for ‘gut feeling’ in the workplace.”

Somewhat counter-intuitively, given the connotation of freedom, a democracy often results in the loss of the individual’s voice. The will of the majority is dominant; the minority is absorbed within it. The trick to a successful business is to give the customer what they want; but sometimes it’s you, not them, that has to decide what that is.

www.thebigdatainsightgroup.com



SNAPSHOT INDUSTRY LEADERS

Snapshot: INDUSTRY LEADERS

As this report’s survey illustrates, a large number of organisations are struggling to exploit the value in the data they have at their disposal. What form this value may take will vary – it may be improving employee efficiency, revolutionising business practices, gaining customer insights or creating new revenue streams. Here is a snapshot of how some organisations across different industry sectors are leading the way in extracting value through big data analytics.

RETAIL

The retail industry is built on understanding your customer. It is important to have a strong relationship with target buyers, hone advertising campaigns accordingly, and ensure you reach the right people, in the right way, with the right products. Traditional market research with a pinch of consumer psychology may help with this cause, but the leading retailers today are using far more advanced methods with the help of big data. One area in which big data analytics is becoming increasingly prevalent in the retail industry is the use of social media channels, which offer immediate insight into the marketplace and consumers’ views of your company. Marshall Sponder, analyst, speaker and author of the book ‘Social Media Analytics: Effective Tools for Building, Interpreting, and Using Metrics’, offered his thoughts on the power of exploiting social media data within the retail industry. He says: “Social media data is vital to understanding a customer’s journey, via the various digital touch points where customers come into contact with a brand. Big data analytics is very valuable in that it is being used by a number of companies and organisations to spot real-time patterns among consumers.

“Since most social media data is unstructured, it takes considerable work to model and organise it in a meaningful way, and this is the biggest challenge businesses face today,” Sponder claims. The difficult task of exploiting vast mounds of unstructured data is one area in which the retail industry in particular is excelling. It offers retailers invaluable insight into their target market, providing instant feedback on their brand, products and advertising campaigns.
“Retail sites like Dell,” Sponder offers as an example, “are using social listening, via a combination of Social Media Management Systems, to monitor online chatter and respond to it.” Moreover, online retail giants like eBay and Amazon have long used the masses of data they get through web logs and monitoring tools. By assessing the way a user behaves while on their websites, they can create customer profiles, enabling them to tailor their marketing promotions more effectively to each individual.
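The web-log profiling described above can be illustrated with a minimal sketch. It assumes a hypothetical, simplified log record of (user ID, page category, action); the report does not describe any retailer’s actual log format or profiling method. Each user’s views and purchases are counted per category, and the most-viewed category is treated as a candidate target for promotions.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical, simplified web-log record: (user_id, page_category, action).
LogEvent = Tuple[str, str, str]


@dataclass
class CustomerProfile:
    views: Counter = field(default_factory=Counter)      # category -> page views
    purchases: Counter = field(default_factory=Counter)  # category -> purchases

    def top_interest(self) -> Optional[str]:
        """Most-viewed category, or None if the customer has no recorded views."""
        if not self.views:
            return None
        return self.views.most_common(1)[0][0]


def build_profiles(events: List[LogEvent]) -> Dict[str, CustomerProfile]:
    """Fold a stream of log events into one behavioural profile per user."""
    profiles: Dict[str, CustomerProfile] = defaultdict(CustomerProfile)
    for user_id, category, action in events:
        if action == "view":
            profiles[user_id].views[category] += 1
        elif action == "purchase":
            profiles[user_id].purchases[category] += 1
        # Other actions (searches, basket events, ...) are ignored in this sketch.
    return dict(profiles)


if __name__ == "__main__":
    log = [
        ("u1", "cameras", "view"),
        ("u1", "cameras", "view"),
        ("u1", "books", "view"),
        ("u1", "cameras", "purchase"),
        ("u2", "books", "view"),
    ]
    for user, profile in build_profiles(log).items():
        print(user, profile.top_interest(), dict(profile.purchases))
```

A real retailer would enrich such profiles with many more signals and rebuild them continuously; the point here is only that a profile is an aggregation over a user’s logged behaviour.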


THE 1ST Big Data Insight Group INDUSTRY TRENDS REPORT



BANKING AND FINANCE

The financial world is using big data analytics in a number of ways, one of the most innovative of which is in fraud detection and risk assessment. American data scientist Jesper Sparre Andersen, formerly of Visa, explained how Visa uses data intelligently to help spot fraudulent behaviour. He says: “Visa is ideally positioned to identify fraud because they have the largest stream of transactional data in the industry.

“We created systems which would monitor transactions to flag fraud. For example, I could select transactions in Brazil, at gas stations, under $50 and at night, if we noticed that this spending pattern correlated with fraudsters.” This is a prime example of bringing together various forms of data from around the world and creating complex algorithms to identify important information and alert an organisation to it. Until recently, the scale of such an operation made the task too expensive or computationally intensive for most companies to execute. Cheaper tools and storage mean this is no longer the case.

Robin Doran, CTO of the technology-based financial solutions company British Pearl, outlines how his company examines unstructured data to find the best ways of attracting and retaining customers, as well as to spot those likely to miss loan repayments. British Pearl does not only use its own data sets but works with partners operating in the financial sector to get a far more comprehensive and accurate view of the market. Doran says: “We engage with other companies who provide us with data about potential customers and their presence online. By doing so we can tailor our products and services more accurately to specific individuals and then we can find the best way to retain customers.

“The other aspect is working with various data sources to measure people’s likelihood to be fraudulent. We can identify patterns and trends which tell us which individuals can be considered reliable enough.” The lucrative financial sector may have been ahead of the curve in understanding the invaluable insight data can provide if managed and analysed in the correct way, but now other sectors can follow suit.

UTILITIES

The utilities industry has been a big data forerunner for some time. Whether it is finding, extracting, delivering or selling the resource, utilities companies must be able to mine their massive data sets to ensure every aspect of the way they deliver their products is as efficient as possible. Anglian Water is the largest water company in England and Wales; its 80,000km of pipes could take you twice round the circumference of the earth. Chris Watts, the company’s enabling project manager, explains how the company monitors its quarter of a million assets, from sewage pumps to water treatment plants, 24 hours a day. Coupled with this performance data, the company also has archives of work management data: a record of all the work the employees carry out on the assets. From checking that the water treatment plants are running efficiently to ensuring that none of the pipes that stretch from Southend-on-Sea to the Humber bank have burst, Anglian Water collects huge amounts of data from the sensors on its assets. With the introduction of more advanced big data analytics tools and a more mature approach to data, Anglian Water is now using its data in more intelligent ways, enjoying savings in both time and money as a result. By using big data analytics, Watts says, the company can see if “the work they are doing is improving the performance of the assets”. He adds: “We are moving to a position where we are getting the right reports in the right timeframe, presenting the most relevant information to the user so we can make the most informed decision. Previously, our workforce spent the vast majority of their time on performing analytics which systems could do for them. With the systems now doing the analysis, our teams can spend their time making business-critical decisions.”

Although it still generates regular sets of 2,500 performance reports, Anglian Water, by deciphering and analysing its data intelligently, is able to act in a prompt, effective and agile manner when a problem arises. “Now with just three clicks,” Watts says, “the company is able to see the problem, filter through all its data to understand the root cause of the problem and then make a decision on how to solve it.”

CONCLUSION

As these sector snapshots have highlighted, whether it is gaining customer insight and translating that into increased revenue from improved advertising, creating predictive patterns for fraud and risk detection, or finding ways to address problems before they even arise, harnessing big data analytics for real-time, actionable insight can be the difference between you and your competitors, between success and failure.



CASE STUDY TAGGED

Case study: Tagged

When its data analytics system struggled to keep pace with its expansion, the popular social network Tagged turned to a big data solution to get instant insight from a rapidly changing business landscape.

Tagged is a social network for meeting new people. It enables social discovery through shared interests, allowing users to connect with like-minded individuals. Member activities include building user profiles, searching for people to meet and playing social games. Based in San Francisco, California, Tagged consistently ranks among the largest social networks and counts 100 million members and 200 million page views per day. Working with its old data mart, Tagged’s business team was analysing member usage statistics and the impact of website changes, but the system couldn’t keep pace with the rapid growth of the database, the company, and its online community. Tagged also wanted more predictive and advanced analytics, which the existing database could not support. John Schleier-Smith, Tagged co-founder and CTO, explains: “Our data warehouse took a few hours or even a day to process straightforward questions from our analysts. In a business that changes as quickly as ours, that’s way too long.

“We needed to ask more complex, targeted questions and get answers in just a few minutes. The goal was to make quick decisions and changes to our website so we could drive up traffic and higher-quality use of our features.”


Capturing richer data sets

Tagged was looking for a solution to take it beyond simple query and reporting to deeper insights that drive significant business results. The company closely evaluated other solutions but decided the EMC Greenplum Data Computing Appliance (DCA) met its requirements best because of several key features: proven scalability, data processing throughput, forward-thinking technology, competitive pricing, and positive references. With Greenplum DCA, Tagged can analyse richer data sets which include user profiles, demographics, user activity logs, social media interactions and technical data, such as the product version being used. Richer data sets and more advanced analytic techniques provide Tagged’s business analysts with timely information that improves processes, drives product development and ultimately helps retain valued customers and increase revenue.



Tagged was provided with significantly increased compute power (for the more technical among us: eight-node Greenplum solutions including 48,500 gigabyte hard drives per node and two quad-core processors). Greenplum Professional Services also assisted Tagged with design and implementation. “Greenplum’s Professional Services got us up and running, and migrated all of our data, very quickly,” says Schleier-Smith. “With some excellent knowledge transfer, our database administrators had the confidence that they could easily run this new technology.”

Website ‘stickiness’ increases

Since the adoption of Greenplum’s big data solution, Tagged has moved far beyond its previous limitations. The company can now continuously load data, execute analytics in real time and generate more detailed reports. Business analysts now ask about any data variable and get results in a few minutes. User activity can be analysed by geography, features used, number of messages posted, returns to the site, and other variables. This enables analysts to look much deeper into data and gain the insight to make decisions. “Greenplum reveals what our members like and dislike in real time, enabling us to fine-tune the website to keep them satisfied,” explains Schleier-Smith. “We know very quickly if new features are performing well and increasing engagement. In fact, members now spend 50 per cent more time on the site, which increases advertising revenue.” With constantly updated fresh data, productivity for database administrators and development staff has also increased, accelerating the rollout of new features. They now produce new code every day compared to their previous weekly release cycle. “Faster time to market is essential because it helps us to strengthen our competitive position,” says Schleier-Smith.
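Analysing user activity “by geography, features used, number of messages posted, returns to the site, and other variables” amounts to grouping activity records by a chosen variable and summarising each group. In a warehouse such as Greenplum this would normally be a SQL `GROUP BY`; the minimal in-memory sketch below only shows the logic. The `Session` record and its fields are hypothetical, since the report does not describe Tagged’s schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, Hashable, Iterable


@dataclass(frozen=True)
class Session:
    # Hypothetical activity record; Tagged's real schema is not described here.
    user_id: str
    country: str
    feature: str            # e.g. "meet_me", "games"
    messages_posted: int
    returned_next_day: bool
    minutes_on_site: float


def engagement_by(sessions: Iterable[Session],
                  key: Callable[[Session], Hashable]) -> Dict[Hashable, dict]:
    """Group sessions by any variable and summarise engagement per group."""
    groups = defaultdict(list)
    for s in sessions:
        groups[key(s)].append(s)
    return {
        k: {
            "sessions": len(g),
            "avg_minutes": mean(s.minutes_on_site for s in g),
            "messages": sum(s.messages_posted for s in g),
            "return_rate": sum(s.returned_next_day for s in g) / len(g),
        }
        for k, g in groups.items()
    }
```

Passing a different `key` (for example `lambda s: s.country` or `lambda s: s.feature`) answers a different question from the same data, which is the flexibility the analysts describe.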

Match maker

Tagged is not a dating site, but with five million new user connections daily, it’s no surprise that many members use it that way. The ‘Meet Me’ social media feature enables members to search for people who share their interests. To deliver the best possible search results, Tagged uses Greenplum not only to analyse a broader set of data points, but also to crunch through roughly 70 million user accounts more effectively. As a result, members receive three times as many high-quality matches as before. Data points analysed include how many matches a member receives and how many of those matches a user contacts, as well as downstream activity such as interactions among connected members and comments posted. Using these statistics and content, analysts examine user behaviour and make decisions about how to improve the experience. “As our Meet Me feature becomes more valuable, users keep coming back,” Schleier-Smith explains. “We also work with our gaming partners to make social games more fun, so users want to keep playing.”

“Increased use of Meet Me means that our members are exposed to more ads and opportunities to spend Tagged Gold – our virtual currency – which drives the business forward. Greenplum analysis is critical to making improvements that keep our members happy.”

Greenplum also provides cost-effective scalability to handle company growth. Since 2007, Tagged has grown from 20 million to 100 million members and continues to add tens of thousands of new users daily. With a rapidly growing database to crunch and analyse, Tagged had to successfully tackle big data. Greenplum’s technology serves as a foundation for Tagged to unlock the business value inherent in big data. “You can put strategic technology like Greenplum in place, but utilising its full capabilities requires you to reshape thinking in the organisation,” says Schleier-Smith.

“Imagine what you can accomplish when you have at your fingertips literally anything you need to know in sharp detail, and faster than ever before.

“As we’ve learned at Tagged, the result is high value business intelligence that’s allowing us to thrive in today’s rapidly changing marketplace. For us, big data isn’t a problem, it’s one of our most important assets.”

Challenge
Keep up with the rapid growth of data volumes and complexity
Quickly load terabytes of data
Perform simple and complex queries for intraday analysis and response
Analyse complete data sets, not samples or summaries

Solution
EMC Greenplum Data Computing Appliance and Greenplum Professional Services

Key Benefits
Ultra-fast data processing and analysis
Scalable platform for advanced, predictive analytics
Reduced time to market of new features from weekly to daily
An improved user experience increases the time members spend on Tagged.com by 50 per cent



CASE STUDY DOUBLEIQ

Australia-based doubleIQ has harnessed the power of a suite of new technologies, including the cloud, to deliver big data solutions. Using EMC’s Greenplum Database the company can now offer its clients responses to data analytics queries up to 300 per cent faster.

For the last 10 years, doubleIQ has provided information management systems to clients in the banking, insurance, telecommunications, retail and utilities sectors. It has helped these customers to bring together and then distribute data from and to a range of sources. doubleIQ also provides a consulting service offering advice on how to integrate and present data using the clients’ existing software and tools. Recently, the Melbourne-based organisation has added its own hosted data warehouse infrastructure for the storage and management of companies’ data. The aim is to provide clients with a secure storage environment for fast and efficient access and analysis of big data via the cloud. doubleIQ decided to deploy EMC’s Greenplum Database as the foundation for its cloud-based hosted data warehouse and real time analytics big data services. “As a company, we develop skills, techniques and technology to build the latest generation of information systems,” says Dennis Claridge, business director at doubleIQ. “This has enabled us to solve a range of business problems quickly and cost-effectively for some of the largest companies in Australia.”

Data in the cloud

When deciding on a database to underpin its hosted data warehouse, it was a simple decision for doubleIQ. Claridge says: “We were already using a PostgreSQL open source database, so we could have considered variations of that. We reviewed a PostgreSQL database system for data warehousing, but decided not to formally evaluate alternative databases.

“One of the attractions of EMC’s Greenplum Database was that it is a similar build to the PostgreSQL model and we already had a good understanding of that system,” he states. “We were also attracted by its scalability and its price. It definitely offered a better performance at a lower price than the other databases we considered.”

EMC’s Greenplum Database uses a parallel processing architecture designed to support business intelligence and analytical processing. The database is structured so that large data sets are automatically split across several nodes, or points of access for the data you are storing and managing. The servers process every query in parallel, use all disk connections simultaneously, and send data between segments as dictated by query plans. This divides the intense data analysis across a host of different machines, enabling it to be actioned far more efficiently. Therefore, using EMC’s Greenplum Database meant doubleIQ could keep development and maintenance costs low, enable fast query and minimise extract, load, transform (ELT) processing times, and provide increased capacity and scalability.

Analysis with haste

After deploying the EMC Greenplum Database, doubleIQ has been able to process data queries much more quickly, in line with client requirements across terabytes of big data. “One of the key things we wanted to see after deployment was how fast we were able to generate a query and deliver the data back to the end user, regardless of the volume of data involved,” says Claridge. “So far the speeds have been very good. I’d say it’s at least two or three times faster than any comparable alternative system.”

EMC’s Greenplum Database data warehouse infrastructure contains three terabytes of performance data and has approximately 14 terabytes of disk space attached. The database handles all the ‘heavy lifting’ of the client data, particularly transaction processing. By using the database for its data warehousing infrastructure, doubleIQ has found ELT tasks are completed much faster than by other systems. Claridge states: “We’ve run very similar ETL processes for one of our clients using a different database and I know that they take around three days to complete. On our own infrastructure, essentially the same processes take three hours.”

doubleIQ has also been able to use its data warehouse infrastructure to handle big data more efficiently. Previously, to deal with the billions of rows of data that can come from its clients, the company would have to break the data up into smaller packages to process it through the database. The solution has allowed whole data sets to be processed according to the analysis being performed, rather than by volume. This has made the processing much more logical and also reduced the time and effort taken to run analytics.

“Breaking down a data set takes as much as 50 per cent more time and effort,” says Dennis Claridge. “And because it takes so much time and manpower to get an effective process working and then maintain it, it really erodes our productivity.

“EMC removes the necessity for this. We don’t have to run such a complex set of processes, which means the system is much easier to maintain.”

Supreme scalability

Although doubleIQ has maintained its staff levels over the last 18 months, providing new hosted data warehousing means the company has effectively doubled in size. As demand was growing at a rate of three billion transactions a year, it was vitally important that the database met doubleIQ’s scalability requirements.

“We’ve already scaled the environment once in the last 12 months and we will need to do it again in the coming year,” Claridge explains. “EMC’s Greenplum Database offers very simple scaling abilities; it worked very well for us the first time and I’m expecting the whole process to be very smooth again.

“With a traditional database it’s harder to predict what scaling the environment will do. With Greenplum it’s very linear; if we double the size, our processing times will typically halve.”

Benefits of doubleIQ’s Greenplum Database solution:
Dramatically reduced time to market and cost of implementation
Enabled distribution of data to a large number of business users
Enabled data queries to be processed at least two to three times faster
Reduced data transforming and loading times from a potential three days to three hours
Enabled the handling of large volumes of data without having to break up data sets
Cut in half the time and effort taken to process data
Improved productivity by reducing time spent on system maintenance
Enhanced scalability capabilities
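The parallel architecture described in this case study, in which large data sets are split across nodes and each node works on its own share of a query, can be illustrated with a toy, in-memory sketch. It is not Greenplum’s implementation: CRC32 stands in for the database’s own hash function, a Python list stands in for each segment, and a thread pool stands in for separate machines (threads in CPython do not give true CPU parallelism, so this shows the data flow rather than the speed-up). In Greenplum itself, the column used to spread rows across segments is declared with the `DISTRIBUTED BY` clause when a table is created.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, float]  # (distribution key, value), e.g. (account ID, amount)


def segment_for(key: str, n_segments: int) -> int:
    """Pick a segment by hashing the distribution key.

    zlib.crc32 is stable across runs, unlike Python's salted built-in hash().
    """
    return zlib.crc32(key.encode("utf-8")) % n_segments


def distribute(rows: Iterable[Row], n_segments: int) -> List[List[Row]]:
    """Split a data set across segments; all rows sharing a key go to one segment."""
    segments: List[List[Row]] = [[] for _ in range(n_segments)]
    for key, value in rows:
        segments[segment_for(key, n_segments)].append((key, value))
    return segments


def local_aggregate(segment_rows: List[Row]) -> Counter:
    """Work done independently on one segment: sum values per key."""
    totals: Counter = Counter()
    for key, value in segment_rows:
        totals[key] += value
    return totals


def parallel_sum_by_key(rows: Iterable[Row], n_segments: int = 4) -> Dict[str, float]:
    """Scatter rows, aggregate each segment concurrently, then gather the results."""
    segments = distribute(rows, n_segments)
    with ThreadPoolExecutor(max_workers=n_segments) as pool:
        partials = list(pool.map(local_aggregate, segments))
    # Keys never span segments, so merging the partials is a simple union.
    result: Counter = Counter()
    for partial in partials:
        result.update(partial)
    return dict(result)
```

Because every row for a given key lands on the same segment, the final merge needs no cross-segment coordination. If each of n segments holds roughly 1/n of the data, per-segment work, and hence elapsed time, falls roughly in proportion to 1/n: the “double the size, halve the time” behaviour Claridge describes. Real clusters add coordination and data-movement overhead, so the scaling is only approximately linear.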



BIG DATA TRANSFORMS BUSINESS

EMC2, EMC, and the EMC logo are registered trademarks or trademarks of EMC Corporation in the United States and other countries. © Copyright 2011 EMC Corporation. All rights reserved.

