Issue 15 | theinnovationenterprise.com
BIG DATA MISCONCEPTIONS Big Data has many myths surrounding it; we debunk some of the most popular.
THE DATA 30
WE LIST THE TOP 30 MOST IMPORTANT PEOPLE IN BIG DATA AND ANALYTICS.
LETTER FROM THE EDITOR Welcome to Issue 15 of Big Data Innovation. We have seen such huge leaps in the way that Big Data has been used over the last 5 years that we often forget about the things that started and grew this now burgeoning industry. Despite this, it has hit the big screen and even won an Oscar this year, after Alan Turing (arguably the original data scientist) had his work on breaking the Enigma code made into ‘The Imitation Game’. I doubt many were thinking about the future implications when watching Benedict Cumberbatch, though. There has also been huge media coverage, from the use of data at the NSA and GCHQ to the hacking of some of the world’s largest companies, as well as the plights of Chelsea Manning and Julian Assange after their efforts using data that was leaked or stolen. With this background we have put together a list of the 30 most influential people in modern Big Data. We could have gone back further than we did, and I am sure that many of you would argue that others should be included. However, from working within the industry and talking to several key players in the field, we are happy with the list and present it to you within this issue of the magazine. We also discuss data governance, the issues with Big Data use and the NSA vs Facebook and Google debate. Laura Denham revisits the predictions we made at the start of the year about how we thought Big Data would change in 2015. As we are now nearly 5 months in, she wanted to highlight some additional insights that we believe will be making an impact in the next year. As always, if you are interested in contributing or have any feedback on the magazine, please contact me at ghill@theiegroup.com.
George Hill, Managing Editor
Are you looking to put your products in front of key decision makers? Contact Giles at ggb@theiegroup.com
Managing Editor: George Hill
Assistant Editor: Simon Barton
Art Director: Oliver Godwin-Brown
Contributors: Laura Denham, Chris Towers, Chris Pearson, Gabrielle Morse
General Enquiries: ghill@theiegroup.com
CONTENTS
BIG DATA MISCONCEPTIONS Big Data has many myths surrounding it; we debunk some of the most popular.
THE PROBLEMS WITH BIG DATA As Big Data grows the buzz increases, but are we really mature enough as an industry?
GOOGLE & FACEBOOK VS THE NSA: THE BIG DATA PRIVACY WAR Chris Pearson tells us about the tension between social media companies and the NSA.
THE DATA 30 We list the top 30 most important people in Big Data and Analytics.
WHAT MAKES GOOD DATA GOVERNANCE? Data Governance is vital in today’s business environment. What is the best way to implement it?
MORE BIG DATA PREDICTIONS As we move into the second quarter of 2015, we look at new trends that have emerged.
Innovation Enterprise
WANT TO WRITE FOR US? Contact our Editor, George Hill at ghill@theiegroup.com for more information.
BIG DATA MISCONCEPTIONS
Big Data has many myths surrounding it; we debunk some of the most popular.
George Hill, Managing Editor
In the last few years we have seen that everybody seems to have an opinion on Big Data. Some believe that it is a way for businesses to expand and make the most of their resources. Others believe that it allows companies to dehumanize people and simply improve profit margins whilst undermining privacy. Amongst all of these there are several myths that surround Big Data that are simply not true. We have decided to have a look at some of the most popular and give you the truth behind them:
1
BIG DATA IS ONLY USED BY BIG COMPANIES
Big Data is not something that is exclusive to Fortune 500 companies. It is not even exclusively for companies: many people have used Big Data and Analytics for purposes other than making money. Charities have used Big Data to help with animal conservation, festivals have used it to keep bars stocked and disaster relief teams have used it to help the most vulnerable during natural disasters. Big Data has considerable uses outside of money making or exploiting new markets; it has the potential to change people’s lives for the better.
2
THE BIG TECH COMPANIES ARE RICH BECAUSE THEY HAVE THE MOST DATA
This is true to an extent, but they are not rich purely because they have billions of users. The insight they have managed to gain from data has allowed them to make good decisions, which has given them a considerable advantage over their competitors. Their culture has then allowed them to implement the correct changes quickly in order to make the most of the potential that their data has given them. Google is not a multi-billion dollar company because they know where I like to shop. They are rich because they know how to use data properly and have a structure that allows them to make changes based on it.
3
DATA ALWAYS MEANS UPHEAVAL
When people think about Big Data, they believe that it replaces systems and that the entire company will be turned upside down in order to make changes. Although this is sometimes the case, it will only ever be at companies that are either reading the data wrong or have been doing things badly for a number of years. Pure data is completely unbiased, so if the company is performing well, the data will show this. If a department is successful, it will not suddenly start doing badly because of data. Data is simply a way of measuring; if small experiments are shown to work, then changes may be made to make the most of them. However, any company that needs wholesale changes after looking at its data is a company that needed changing anyway.
4
THE MORE DATA YOU HAVE, THE BETTER
It’s not about the size of your data, it’s how you use it. If you have terabytes of data in an unstructured format, it will be nowhere near as useful as one gigabyte of data that is well formatted and stored in an easy-to-use way. Insight is essentially gained from finding patterns or trends within the data, so it is often easier to find these patterns in smaller samples and more difficult to find them in larger sets. The way to look at it is that Big Data is the Haystack and Analytics is the means to find the needle. If the Haystack is smaller, you are far more likely to find the needle.
The Problems With Big Data
Gabrielle Morse, Producer, Chief Data Officer Summit San Jose
We are all aware that Big Data is now an important part of businesses across the world, from the ways that they make decisions to the split-second choices that machines make through the sensors attached to them. However, we need to remember that Big Data, unlike many other business-related trends, is still in its infancy and our generation are the founders of the movement. This means that it is not yet fully grown and the results of its use are not always perfect. So what currently makes Big Data imperfect?
Big Data Sets Do Not Mean Big Information
We need to look at data sets in a scientific way. They are not just pools from which information can be gathered because, much like fishing in a lake, if there are several different types of fish, you could catch any one of them. The same is true of data, with several uncontrollable variables often included within the data that is being analyzed.
One of the most common ways to get rid of these variables is to filter the data, which in itself can skew the analysis. When removing some of the variables, you may be taking out the most important one, or, if you try to make the data strictly controlled, the sample could become so small that using Big Data becomes pointless because the results seem relatively obvious.
Data Is About Patterns
Finding correlations in data is about finding patterns within it. This, by its very nature, means that there is going to be a certain degree of bias within the system, because the patterns found are those perceptible to the human brain. With complex data systems it is possible to find some patterns without this bias having too much effect, but in reality many patterns are found through visualization, which relies mainly on a human interacting with the images to find a pattern. This may mean that several important patterns are missed through human error.
You Need To Know More Than The Numbers
Identifying a particular common element or trend across different areas is the basic way that patterns are formed. Being able to see that a certain object is being sold at a certain time of year, or that people in a specific area are more likely to use a particular website, is great, but in order to make any impact with this information it is vital that the underlying causes of the pattern are known. Why are people buying this? What makes people from that area visit that website? Simply knowing that they are doing this may provide insight, but it is very shallow and does not allow a company to make the most of the opportunities presented to it through this information. If you don’t know why the data shows what it does, how are you going to make changes based on it? There needs to be an understanding of the analysis beyond seeing that a pattern exists, which at the moment is often something that data alone cannot provide.
However, it is worth mentioning again that Big Data is in its infancy. As it is not yet fully established, the chances are that many of these issues will be resolved in the years to come. We are not yet at the top of the curve of Big Data; in fact, despite the massive strides we have taken in the last few years, we are still surprisingly near the bottom.
Big Data & Analytics Innovation Summit
Drive Your Business Success Through Big Data Science
April 21 & 22, 2015, Hong Kong
Ryan Yuan, ryuan@theiegroup.com, +852 8199 0121
theinnovationenterprise.com
Google & Facebook VS The NSA
THE BIG DATA PRIVACY WAR
Chris Pearson, Co-Founder, Big Cloud
January was not a good start to 2015.
The massacre at the Charlie Hebdo offices in Paris in January was a stark reminder of how vulnerable we all are to acts of terrorism. Whoever and wherever you are in the world, you should never go to work and not get to go home afterwards.
As people from all around the world were glued to the events unfolding in Paris, Big Data was thrown under the spotlight. After all the positive promise, the hype and the countless predictions made about Big Data’s role in 2015, we were instead having to debate privacy, and specifically the unanswered question of whether or not internet service providers and social media companies should be allowed to encrypt their users’ data. David Cameron (the British Prime Minister), fresh from marching in front of the cameras in Paris in support of freedom of speech, made the contradictory move upon returning to the UK of threatening social apps such as WhatsApp and Snapchat by saying ‘In our country, do we want to allow a means of communication between people which we cannot read?’ Critics and pro-privacy supporters quickly moved to slam Cameron’s comments, calling them ‘ludicrous’ and even accusing Cameron of ‘living in cloud cuckoo land’. Besides the many breaches of the basic human right to privacy, decryption also leaves us wide open to hackers who want to meddle and do damaging things, which, ironically, hands even more power to terrorists.
But if he’s right, where does that leave companies like Google and Facebook when it comes to their data? The US and UK governments already have the right to make data requests from internet service providers and social media companies, but if they had their way, Google and Facebook would agree to a ‘revolving door’ policy when it comes to access to their users’ data, including real-time analytics on the conversations, status updates and posts happening across their sites.
The Edward Snowden leaks revealed the enormity of wholesale surveillance by government agencies such as the NSA. From their operational home in Utah, a complex sprawling across a head-spinning 100,000 square feet, the NSA are receiving and processing the metadata of around 200 million text messages a day. These messages are then fed through trained algorithms to detect anything from untoward sentiment to full-scale terror plots. The NSA claim that they have identified up to 100 terrorism-related ‘activities’ since the beginning of their surveillance programme in 2006. However, due to their ‘classified’ status, little is known about the details, and only 4 ‘declassified’ arrests have been credited to the NSA. Critics call this the German secret police on steroids and argue that not enough prosecutions have been made to warrant such wide-scale spying programmes. Government officials will argue that this is a necessity in the fight against homeland and global terrorism.
Despite the amount of pressure that government officials in Washington DC have heaped onto the boys from Mountain View and Cupertino to give the NSA more access to their Big Data, there is strong resistance. In November 2014, the ‘Reform Government Surveillance’ coalition, formed between Facebook, Google, Twitter, Apple and Microsoft in a bid to stop the NSA from hoovering up mass email and Internet metadata, sent an open letter to the Senate. The coalition wrote ‘The Senate has an opportunity this week to vote on the bipartisan USA Freedom Act,’. They also added ‘We urge you to pass the bill, which both protects national security and reaffirms America’s commitment to the freedoms we all cherish.’ However, with the USA Freedom Act being scrutinized by a Congress controlled by surveillance-favouring Republicans, it looks increasingly likely that metadata transparency deals with the big Internet companies will become clearer.
If we are to see an increase in attacks happening across the world, similar to the one that happened in Paris, the question companies like Google and Facebook need to answer is: how long can they fight the privacy war for? As Google puts it, ‘don’t be evil’, but when it comes to the subject of eavesdropping NSA officials versus plotting terrorists, to whom do they intend it to apply the most?
Chief Data Officer Summit
April 28 & 29, 2015, San Jose
jdunne@theiegroup.com, +1 415 992 7598
theinnovationenterprise.com
THE DATA 30
Having worked in and around the Big Data and Analytics space for the past five years, we like to think that we have our fingers fairly firmly on the pulse in terms of who the movers and shakers are in this area. Therefore, we have decided to create the 2015 top 30 in Big Data & Analytics list. This is a list not only of those who are contributing directly to the industry, but also of those who have had important parts to play in its growth and popularity. As with all lists, others will have their opinions on who we have missed and who we shouldn’t have included. In an industry that is still relatively young, with many of the important players still working behind the scenes and leaving little trace of the work they have done, this is inevitable.
30
Paco Nathan, Liber 118
Paco is one of the best known data bloggers, bringing the latest in data news to his legion of followers. With around 4,500 Twitter followers, numerous speaking sessions at conferences and even some key books released, he makes this list for his efforts to spread information about Big Data. In addition to his media work, he backs up his ability to write with a ‘player/coach’ approach to Big Data, being able not only to write about subjects, but also to bring an in-depth working knowledge of them. This comes from a career in data spanning more than 30 years, including work at NASA and Motorola and even founding his own e-commerce company along the way.
29
Gil Press, Forbes
Gil Press has been the pre-eminent voice of Big Data at Forbes, bringing data knowledge to one of the most popular news magazines in the world. Before his work with Forbes, he did considerable work on the estimation of data sizes and with this managed to gain considerable traction within the mass media, spreading the message of Big Data to many as early as 2000. He makes this list due to his dedication not only to reporting on what is currently happening within data, but also to communicating its historical background. Giving an effective foundation to many of the innovations we see today has allowed the wider community to truly understand how we have arrived at this point.
28
Arijit Sengupta, Beyondcore
Big Data today is being driven by innovations that are often coming from the newer players in the market. Arijit, as the CEO of Beyondcore, has created a company that is helping people who may not be revolutionary Data Scientists, but who still want to be able to compete against companies who have them. It is going to be these kinds of companies who help to democratize Big Data for the masses, and with the skills gap that we are currently seeing, this is going to be even more important in the next few years.
27
Warren Buffett, Investor
The success of Big Data does not just come from the business results that companies claim. In order for people to believe that this is going to be a truly revolutionary change in how we do business, it needed a big investor to have the confidence to buy into a company spearheading the revolution. Step forward Warren Buffett, who after years of refusing to invest in tech companies, backed IBM with $12 billion. This single move from one of the world’s most famous investors gave others the confidence to do the same. Since then Big Data companies have seen billions entrusted to them.
26
Vadim Kutsyy, eBay
For the past 8 years Vadim has been working at eBay, first as part of their data team and then leading it. In this time eBay has become one of the biggest websites in the world in its own right. In addition, it has acquired and grown significant companies such as Skype, Craigslist and PayPal. Through these, as well as the huge numbers of visitors to the original site, they process around 50 petabytes of data every day. Much of their success has been based on their use of data and they have even had an academic paper written about it. Vadim’s input as Head Scientist helped make this possible. His achievements are impressive and he more than merits a place on this list.
25
Tim O’Reilly
As the owner of O’Reilly Media, who host the largest data conference, Strata, Tim has become one of the major players in Big Data. In addition to Strata, Tim has an impressive record in predictions and in lobbying on internet rights and patent disputes, even going head-to-head with Amazon. O’Reilly has become the go-to company for Big Data, Analytics and Computing books, as its publishing division works with some of the best known minds in the industry. Not only this, but Tim was also the man who coined the phrase ‘Web 2.0’.
24
Billy Beane, Baseball Coach
Until recently, Billy Beane was the face of popular analytics. His vision of using basic analytics in baseball at the Oakland A’s was the basis for the Moneyball book, then the film of the same name. Brad Pitt and Jonah Hill helped to publicize the film and what it represented, but it all started with Billy Beane and his team’s ideas. We still hear the Moneyball analogy at every single conference; in fact, the media still often discusses Analytics as Moneyball. It has become synonymous with the spread and popularization of data and with how many people perceive it.
23
Chris Towers, Innovation Enterprise
After studying computer science at university, Chris has been driven to put together the Big Data Innovation summit series, which has spread not only across the US and Europe, but also to Australia and much of Asia. Creating situations where Data Scientists can talk to some of the leading minds in the industry has seen partnerships formed and ideas forged. Chris makes this list not due to his association with Innovation Enterprise, but because few have been quite so successful in spreading Big Data across the world and bringing this kind of education to areas where it has been lacking.
22
Anmol Madan, Ginger.io
As Co-Founder and CEO at Ginger.io, Anmol has helped to take Big Data from being simply a money making endeavour within a large company to something which can genuinely help people. According to the company, ‘We built our company to empower researchers, physicians and healthcare providers to improve patient care’. His idea to use data to help with the prediction and prevention of disease through predictive modelling, and his drive to make Big Data mean more, are the primary reasons for his inclusion in this list.
21
Edwina Dunn and Clive Humby, dunnhumby
Although now retired, Edwina Dunn and Clive Humby founded one of the first and best known customer data companies in the world, dunnhumby. Predating many of the companies who now do similar work, their endeavours (they started in a bedroom in their house in 1989) paved the way for many of the companies who have since taken up their cause. Their famous relationship with Tesco, spearheading its loyalty card business, allowed them to create insights into Tesco’s customers and allowed it to become the largest supermarket in the UK. They were undoubtedly among the pioneers of mass customer data collection.
20
Hilary Mason, Fast Forward Labs
Hilary was an influential figure in Big Data when she was the Chief Scientist at Bitly. There she was in charge of one of the largest analytics operations in the world, helping to track billions of separate events through Bitly’s platform. Her reputation only grew when she became Data Scientist in Residence at Accel, and even further when she co-founded HackNY. In addition to this, she has authored several Big Data papers and books, many of which have become must-reads for Data Scientists.
19
Scott Howe, Acxiom
Acxiom has been described as ‘the largest company you have never heard of’. What makes it unique is not only that it is a huge company (it represents around 12% of the direct-marketing sector’s $11 billion annual sales), but that Scott Howe opened up the data they had on people at aboutthedata.com. This action not only gained the attention of the world media, but also showed the amount of data that people could see about themselves. It allowed the public to see the power that Big Data has and how it can affect them personally.
18
John Schroeder & M.C. Srivas, MapR
MapR and Hortonworks were the first vendors to notice the potential that Hadoop could have as a foundation for a company. From there MapR has been built into a multi-billion dollar company that is rumoured to be on the verge of going public. John Schroeder, as Co-Founder and CEO, along with M.C. Srivas, Co-Founder and CTO, have been the main players who have made this happen.
17
Tom Davenport
Tom Davenport is one of the pre-eminent minds within Big Data and Analytics, having worked across some of the world’s top universities, as well as running research departments at some of the largest consultancies. In addition to this, he has written sixteen best-selling business books, mainly about making business decisions based on data. This dedication to data, and to spreading ideas about it through both his teaching and writing, means that Tom is certainly deserving of a place on this list.
16
Tom Reilly, Cloudera
Cloudera have been at the heart of Big Data for a number of years and are currently heavily rumoured to be on the verge of announcing an IPO. Under Tom Reilly’s leadership the company has not only become an Apache Hadoop software vendor, but also trains potential Data Scientists within companies. Tom Reilly has made this list not only for his leadership of the company or the $1.2 billion in funding that he has raised, but for the attempt to bridge the skills gap and create even more data driven companies.
15
Sverre Jarp, CERN
Sverre Jarp held one of the key positions at one of the biggest pioneers of data that the world has ever seen: CERN. From the invention of the World Wide Web to the creation of the Large Hadron Collider, they have pushed the boundaries of what we can do with data. Sverre has been at the forefront of this as CTO of the CERN Openlab, having worked in various data and technology roles throughout the organisation for the previous 40 years.
14
Gregory Piatetsky-Shapiro, KDNuggets
Gregory has been at the forefront of Big Data and Analytics for many years and founded KDNuggets, one of the first, and best, sources of information on data anywhere on the internet. Investigating societal issues through data, explaining complex theories in easily relatable ways and publishing breaking news within the data community are just some of the work done by Gregory. Others have larger readerships in terms of numbers, but none can compare to the length of time that Gregory has been working. He has been publishing KDNuggets news since 1993 and has written over 60 publications with 10,000 citations. In 2007 he won the IEEE ICDM Outstanding Service Award for his contributions to the data mining field.
13
Andy Palmer, Tamr
Andy was not only one of the founders of Vertica, which was sold to HP in 2010, but he has also started Tamr, one of the most exciting Big Data startups of 2014. The idea of the company is to get rid of the bulk of the work needed to clean data prior to analysis, which is one of the main problems that companies looking at their data in new ways have found. In addition to this, he is one of the most prolific entrepreneurs working in Big Data at the moment. His serial entrepreneurship has seen him either directly fund or give help to around 35 companies. This kind of work is what has allowed the Big Data and tech industries to flourish in the last few years, and Andy has been a leader in this.
12
Jeff Bezos, Amazon
Amazon has become the poster child for a data driven company and for how data should be used to push change. Bezos’s approach includes giving everybody in the company the opportunity to test their ideas through data, and making changes when data dictates it. More than Google and Facebook (who have made the bulk of their money from data directly), Bezos has used data to make a difference to the operating model that Amazon was based on. The e-commerce basis of Amazon has been improved drastically by his belief in data insights. Jeff’s approach is a template that many company leaders should look to when becoming more data centric.
11
Jeff Smith, IBM
IBM has been one of the key vendors in Big Data. They have not only been making money through it, but have also shown that older tech companies can move into the space. The investment from Warren Buffett (number 27) showed that they are doing a good job within this space. Aside from this, they have used their financial clout to spread the idea through advertising. Targeting the general public, they have sponsored several high profile sporting events, specifically with their Big Data and Analytics messages. Jeff Smith currently sits at the top of the big blue pyramid and, despite a slight decline in profits (mainly down to a restructure), much of the success and drive behind their Big Data programmes can be put down to his leadership.
10
Monica Rogati, Jawbone
The Jawbone UP is one of the best personal data devices in the world. There are others that offer more information or are more targeted at one particular use. However, what Jawbone do better than anyone is pick the most important data and show it in the best possible way. One of the best examples of this was their blog post documenting how sleep was affected by the Napa earthquake in 2014. Having the ability to collect, analyse and show data in a way the general public can understand takes time and work. It is for her work in this, as well as her considerable input during her five years as a Senior Data Scientist at LinkedIn, that Monica was an easy candidate for the top 10.
9
DJ Patil, US Government
No list outlining the top 30 in data could possibly avoid DJ Patil. He has won so many awards and had so many cover stories in the world’s biggest magazines that it would take a long time to write them all down. They include being named a 2014 Young Global Leader by the World Economic Forum, one of Forbes’ ‘The World’s 7 Most Powerful Data Scientists’ and one of CNN’s ‘36 of tech’s most powerful disruptors’. However, it is his terminology creation that has really pushed him up this list, as it was DJ and Jeff Hammerbacher who coined the term ‘Data Scientist’. He was also the man who made it sexy, when he co-authored the now famous HBR article describing Data Scientist as the sexiest job of the 21st century. Added to this is his recent appointment as Deputy Chief Technology Officer for Data Policy for the US Government.
8
Jamie Miller, GE
Jamie makes the list not only because she is the Chief Information Officer at GE, one of the world’s most powerful companies, but because she is the model of how data science goes beyond technical skill. Having worked as GE’s Controller and Chief Accounting Officer, she came to know the business thoroughly, which gave her the insight to apply the data from her team to the areas that mattered the most. With GE now becoming one of the world leaders in IoT, with their connected engines and cars, it is possible that this business knowledge will have an even larger effect on the company’s overall performance.
Rich Miner, Google Ventures Big Data startups do not get huge by themselves. It takes a considerable amount of effort from the people working there, but more importantly, investment to fund growth. Rich Miner is one of the key men in this as an investment partner at Google Ventures. He is more than qualified to give his perspective on what makes a tech startup work, as he was one of the founders of Android, now the most used mobile operating system in the world. He also founded Wildfire communications, which was sold to Orange in 2000. He now spends his time investing Google’s billions in small startups, and we saw how popular he is in this area at the Big Data Innovation Summit in 2014. Stephen Wolfram, Wolfram Alpha Wolfram Alpha may well be one of the most important companies for Big Data in the history of its relatively short existence. If you have ever spoken to ‘Siri’ on an iPhone, then the chances are that Stephen Wolfram’s fingerprints have been all over it. Wolfram Alpha is one of the main engines that powers it. He also developed Mathematica, the standard software language and environment for scientific, technical, and algorithmic computation, and algorithmic software development. Therefore, much of what has
BIG DATA INNOVATION
been achieved in Big Data and many algorithms used to analyze, have come from the work that he has done. Kirk Borne, Professor Kirk Borne is one of the best respected people currently working in Big Data and Analytics. His role as professor is well deserved as he has a thorough knowledge of both data and astronomy from his previous experience. He has worked at NASA, processed data from Hubble space telescope and has even won several awards for the groundbreaking work he has done. He managed to excel amongst some of the most brilliant data minded people in the world. He also co-created the field of a Astroinformatics and has been voted as one of the top Big Data influencers on Twitter consistently for the past 3 years as well as winning the 2014 IBM Big Data & Analytics Hero. Rob Beardon, Hortonworks If this list had been written last year, Rob Beardon and Hortonworks would have made it onto this list, but would not be at the heady heights of fourth. The reason for Rob’s high standing on this list is that this year Hortonworks have proven that Big Data is not only sustainable, but also very profitable. Having the guts to bring a successful private company public would have been difficult,
but in doing so he proved to the world that Big Data is a force not only within businesses, but also for investors. Hortonworks exceeded its initial valuation within the first two days of trading and is currently considered a safe investment by many traders. This kind of public backing from investors is fantastic for the overall view of Big Data companies, and Hortonworks has set a strong example that many others will hopefully be able to follow.
Edward Snowden, NSA Whistleblower
Not everything within Big Data and Analytics is positive. We have seen data hacks at major companies and people’s personal data being captured and exploited. However, through his work, Edward Snowden has shown the public that their data is collectible and that much of what they do can be traced. It has given people a better understanding of what is traceable and what isn’t, all in the most public of arenas. Although many disagree with what he did, and even more question what this will do to the industry in the long run, by showing how the NSA and GCHQ were collecting and analyzing data he alerted the public to what can be collected on them.
Sergey Brin & Larry Page, Google
Google has unquestionably become the world’s largest and most successful search engine. Beating all others in this regard,
they have also turned their hand to mobile devices, wearable technology and even self-driving cars. All of this has come from a foundation of data. Larry Page and Sergey Brin are both Ph.D. educated in computer science and mathematics, which has given them the best possible preparation to make Google a truly data-driven company. It was in fact Page’s invention of the page ranking system that made Google the superior search engine on the internet. It has allowed trillions of searches to find the pages that people want to see. It is undoubtedly the most powerful and publicly visible use of data in the world, which is why they are only one rank away from top spot.
Doug Cutting and Mike Cafarella, Hadoop
Almost all modern Big Data and Analytics programmes are made possible by Apache Hadoop. It is used by governments, multinational companies and even one-person startups. Its success is down to its power and to the fact that it is free to use. It has a collaborative element, not only in its development (it has input from hundreds of different developers) but also in the idea that it is a library of programmes rather than an individual piece of software. Companies like Hortonworks, Cloudera and MapR have based billion dollar organizations on this framework, and countless companies have based their data programmes on it. It has spawned hundreds of startups, and Hortonworks itself was started by engineers who had originally been working on the Hadoop framework at Yahoo!. Doug Cutting and Mike Cafarella put all of this in motion with their work on Nutch in 2002. This was the start of what would eventually become Hadoop, the system that millions across the world know. Their stamp is all over the current field of data and analytics, right down to the name: Hadoop was named after Doug Cutting’s son’s toy elephant.
What Makes Good Data Governance?
Chris Towers, Head of Big Data, Innovation Enterprise
Data governance is one of the biggest issues within departments looking at data today. With the much publicized hacks of companies like Sony and Target, organizations are realizing that in order to excel in the data landscape, they must look at their governance programmes. The success of these programmes comes down to more than simply putting them in place; it takes significant work through both personnel and technology to make them work effectively.
So how should this be done? The first thing is to make sure that management supports the governance initiatives. Arguably, middle management is more important than senior management, as they will be the main points of contact who will be driving the changes amongst designated teams.
Once you have buy-in from this level of management, they can then become champions of the new initiatives, meaning that they have the opportunity to sell the idea to a wider variety of people.
Prioritization will also become important amongst this group of management, as they will generally be directing workflow. This matters simply because data governance can be both time consuming and expensive. Having people who understand what should be done and when will be the
key to giving this investment of manpower and resources the best return.
Making sure that data is both clean and fit for use will be one of the most important aspects of any data governance programme too. This simply cannot be based purely on the work of employees; it needs technology.
Therefore, looking at technologies that provide the best value for money whilst also creating the best quality of work will mean that governance will be most effective. When looking at technological solutions, a balance needs to be struck between cost and effectiveness. As with all things, the most expensive is not always the best and the cheapest will not always do an adequate job. This requires both an understanding of what you want to achieve and the product knowledge to assess which will work better for your business.
The virtues of any technology are also about more than just undertaking a task effectively; the right technology will also make getting buy-in from others within the company considerably simpler. If a tool is easy to use, then people are far more likely to be happy using it.
Ultimately, the core pillar of a successful data governance programme is the ability to make the most of your data. This comes from the collaboration of people and technology, so gaining effective buy-in and combining this with the best technology for effective governance is the only way to achieve this.
More Big Data Predictions
As we move further into 2015, we look at new trends that have emerged.
Laura Denham, Producer, Big Data & Analytics For Pharma
Our previous predictions for Big Data in 2015 were written a while ago and, although the points within the article still look like they are going to be correct, we wanted to add some aspects that we think will become increasingly important.
Investment To Increase
When the bell rang after the second day of trading on Hortonworks stock, the company had exceeded its initial valuation. Since then it has been considered a solid investment by many traders. This success will have many companies looking at the possibility of going public or opening funding rounds in the next 12 months. Several have even reported that recent rounds of funding have seen unprecedented demand for investment in their companies, something that we believe will only increase in the coming year.
Gap Between Expectation and Reality Will Decrease
As more is being understood about Big Data in the business world, those who make the decisions and who ultimately have the most to gain from results will become more adept at understanding what to expect.
There have been numerous reports from almost every industry that claim disappointment from company leaders with their data programmes. This has not come from people not doing their jobs properly, but instead from people expecting too much too soon. As people become more familiar with how the systems work and what can be done with data, the expectations will start to align with reality. This in turn will allow more companies to trust their data and become more driven by it.
Sophistication Will Increase
Big Data, more than almost every other business function, is driven by technology and the underlying power of it. This technology is driving constant change and, subsequently, the complexity of the tasks that can be undertaken. This process, combined with an increased understanding of how to use data platforms effectively, will lead to significantly more sophisticated data handling, analysis and storage functions.
However, this is not just a nice-to-have in terms of analysis; it will also be a necessity to compete and keep data safe. The amount of data now being stored is brilliant for companies and customers, but also for hackers if they can access it.
Systems will need to become more sophisticated to make sure that the people who are looking to access this data for underhand reasons cannot do so.
Efficiency Will Grow
As we mentioned in our previous predictions, in-memory will increase in popularity as it allows for significantly faster speeds of analysis. In order for this to occur, the way that data is mined and accessed will also need to become more efficient.
This means smoother systems, effective data storage techniques and more robust processes. Efficiency will not just be about efficiency within data, though; it will also be efficiency within systems. Durability will become as important as speed, with robust systems that stay online for longer becoming key to sustained and successful data systems.
Cloud Based Applications To Grow
Internet speeds are faster than they have ever been, the cloud is becoming more secure and companies are looking at ways to reduce the cost of their data programmes. These three points are going to be some of the key reasons why we are likely to see data platforms in the cloud becoming more and more popular before 2015 is out. The fact that they allow for additional flexibility, at a time when many want to have the capability to access their work from anywhere in the world, is also going to be a primary driver for the success of these systems.