Big Data Innovation, Issue 3


Welcome to this issue of Big Data Innovation. Since the last issue, not only have we seen a revamp in the look of the magazine, but we have also seen a huge increase in the significance of big data and the implications it has for every one of us. Chief amongst these is the news of the various data mining techniques (whether legal or otherwise) used by the NSA. The involvement of several important pieces of data software in this could spell trouble for our future use of data, and David Barton looks at the kinds of effects it may have. We also chat with Chris Gobby, who took charge of mobile data at EE during one of the biggest data integrations in telecommunications, when T-Mobile and Orange combined. Gil Press gives us an insight into the history of big data, from its first inception to current usage and data needs. And as we see more and more data being created, Patrick Husting talks us through how to make sense of the chaos. I hope that you enjoy this issue. We have some great insight from a selection of the keenest minds currently working in big data. If you are interested in getting involved in the magazine, please get in contact; we are nothing without our readers, and the more we can get you involved, the better a magazine we can create for you.

George Hill
Chief Editor


Managing Editor: George Hill
President: Josie King
Art Directors: Gavin Bailey & Joanna Violaris
Assistant Editor: Chloe Thompson
Advertising: Hannah Sturgess, hsturgess@theiegroup.com
Contributors: Gil Press, Chris Towers, David Barton, Tom Deutsch, Patrick Husting
General Enquiries: ghill@theiegroup.com



Contents

4: Gil Press looks at the History of Big Data
13: Data has gone beyond data scientists: who can you find in today's data team?
18: We talk to Chris Gobby about his experiences as Head of mData at EE
23: Big Data can be chaotic; Patrick Husting looks at how to simplify it
26: How might the NSA data scandal affect our future data collection?
29: Mark Howard tells us how Forbes have utilized data in their ad strategy



A Very Short History of Big Data Gil Press



The story of how data became big starts many years before the current buzz around big data. Already seventy years ago we encounter the first attempts to quantify the growth rate in the volume of data or what has popularly been known as the “information explosion” (a term first used in 1941, according to the Oxford English Dictionary). The following are the major milestones in the history of sizing data volumes plus other “firsts” in the evolution of the idea of “big data” and observations pertaining to data or information explosion. 1944 Fremont Rider, Wesleyan University Librarian, publishes The Scholar and the Future of the Research Library. He estimates that American university libraries were doubling in size every sixteen years. Given this growth rate, Rider speculates that the Yale Library in 2040 will have “approximately 200,000,000 volumes, which will occupy over 6,000 miles of shelves… [requiring] a cataloging staff of over six thousand persons.” 1961 Derek Price publishes Science Since Babylon, in which he charts the growth of scientific knowledge by looking at the growth in the number of scientific journals and papers. He concludes that the number of new journals has grown exponentially rather than linearly, doubling every fifteen years and increasing by a factor of ten during every half-century. Price calls this the “law of exponential increase,” explaining that “each [scientific] advance generates a new series of advances at a reasonably constant birth rate, so that the number of births is strictly proportional to the size of the population of discoveries at

any given time.” November 1967 B. A. Marron and P. A. D. de Maine publish “Automatic data compression” in the Communications of the ACM, stating that ”The ‘information explosion’ noted in recent years makes it essential that storage requirements for all information be kept to a minimum.” The paper describes “a fully automatic and rapid three-part compressor which can be used with ‘any’ body of information to greatly reduce slow external storage requirements and to increase the rate of information transmission through a computer.” 1971 Arthur Miller writes in The Assault on Privacy that “Too many information handlers seem to measure a man by the number of bits of storage capacity his dossier will occupy.” 1975 The Ministry of Posts and Telecommunications in Japan starts conducting the Information Flow Census, tracking the volume of information circulating in Japan (the idea was first suggested in a 1969 paper). The census introduces “amount of words” as the unifying unit of measurement across all media. The 1975 census already finds that information supply is increasing much faster than information consumption and in 1978 it reports that “the demand for information provided by mass media, which are one-way communication, has become stagnant and the demand for information provided by personal telecommunications media, which are characterized by two-way communications, has drastically increased…. Our society is moving toward a new stage… in which more priority is placed on segmented, more detailed information to meet in-



dividual needs, instead of conventional mass-reproduced conformed information.” [Translated in Alistair D. Duff 2000; see also Martin Hilbert 2012 (PDF)] April 1980 I.A. Tjomsland gives a talk titled “Where Do We Go From Here?” at the Fourth IEEE Symposium on Mass Storage Systems, in which he says “Those associated with storage devices long ago realized that Parkinson’s First Law may be paraphrased to describe our industry—‘Data expands to fill the space available’…. I believe that large amounts of data are being retained because users have no way of identifying obsolete data; the penalties for storing obsolete data are less apparent than the penalties for discarding potentially useful data.” 1981 The Hungarian Central Statistics Office starts a research project to account for the country’s information industries. Including measuring information volume in bits, the research continues to this day. In 1993, Istvan Dienes, chief scientist of the Hungarian Central Statistics Office, compiles a manual for a standard system of national information accounts. [See Istvan Dienes 1994 (PDF) and Martin Hilbert 2012 (PDF)] August 1983 Ithiel de Sola Pool publishes “Tracking the Flow of Information” in Science. Looking at growth trends in 17 major communications media from 1960 to 1977, he concludes that “words made available to Americans (over the age of 10) through these media grew at a rate of 8.9 percent per year… words actually attended to from those media grew at just 2.9 percent per year…. In the peri-

od of observation, much of the growth in the flow of information was due to the growth in broadcasting… But toward the end of that period [1977] the situation was changing: point-to-point media were growing faster than broadcasting.” Pool, Inose, Takasaki and Hurwitz follow in 1984 with Communications Flows: A Census in the United States and Japan, a book comparing the volumes of information produced in the United States and Japan. July 1986 Hal B. Becker publishes “Can users really absorb data at today’s rates? Tomorrow’s?” in Data Communications. Becker estimates that “the recording density achieved by Gutenberg was approximately 500 symbols (characters) per cubic inch—500 times the density of [4,000 B.C. Sumerian] clay tablets. By the year 2000, semiconductor random access memory should be storing 1.25X10^11 bytes per cubic inch.” 1996 Digital storage becomes more cost-effective for storing data than paper according to R.J.T. Morris and B.J. Truskowski, in “The Evolution of Storage Systems,” IBM Systems Journal, July 1, 2003. October 1997 Michael Cox and David Ellsworth publish “Application-controlled demand paging for out-of-core visualization” in the Proceedings of the IEEE 8th conference on Visualization. They start the article with “Visualization provides an interesting challenge for computer systems: data sets are generally quite large, taxing the capacities of main memory, local disk and even remote disk. We call this the problem of big data.



When data sets do not fit in main memory (in core), or when they do not fit even on local disk, the most common solution is to acquire more resources.” It is the first article in the ACM digital library to use the term “big data.” 1997 Michael Lesk publishes “How much information is there in the world?” Lesk concludes that “There may be a few thousand petabytes of information all told; and the production of tape and disk will reach that level by the year 2000. So in only a few years, (a) we will be able [to] save everything–no information will have to be thrown out and (b) the typical piece of information will never be looked at by a human being.” April 1998 John R. Masey, Chief Scientist at SGI, presents at a USENIX meeting a paper titled “Big Data… and the Next Wave of Infrastress.” October 1998 K.G. Coffman and Andrew Odlyzko publish “The Size and Growth Rate of the Internet.” They conclude that “the growth rate of traffic on the public Internet, while lower than is often cited, is still about 100% per year, much higher than for traffic on other networks. Hence, if present growth trends continue, data traffic in the U. S. will overtake voice traffic around the year 2002 and will be dominated by the Internet.” Odlyzko later established the Minnesota Internet Traffic Studies (MINTS), tracking the growth in Internet traffic from 2002 to 2009. August 1999 Steve Bryson, David Kenwright, Michael Cox, David Ellsworth and

Robert Haimes publish “Visually exploring gigabyte data sets in real time” in the Communications of the ACM. It is the first CACM article to use the term “Big Data” (the title of one of the article’s sections is “Big Data for Scientific Visualization”). The article opens with the following statement: “Very powerful computers are a blessing to many fields of inquiry. They are also a curse; fast computations spew out massive amounts of data. Where megabyte data sets were once considered large, we now find data sets from individual simulations in the 300GB range. But understanding the data resulting from high-end computations is a significant endeavor. As more than one scien-



tist has put it, it is just plain difficult to look at all the numbers. And as Richard W. Hamming, mathematician and pioneer computer scientist, pointed out, the purpose of computing is insight, not numbers.” October 1999 Bryson, Kenwright and Haimes join David Banks, Robert van Liere and Sam Uselton on a panel titled “Automation or interaction: what’s best for big data?” at the IEEE 1999 conference on Visualization. October 2000 Peter Lyman and Hal R. Varian at UC Berkeley publish “How Much Information?” It is the first comprehensive study to quantify, in computer storage terms, the total amount of new and original information (not counting copies) created in the world annually and stored in four physical media: paper, film, optical (CDs and DVDs) and magnetic. The study finds that in 1999, the world produced about 1.5 exabytes of unique information, or about 250 megabytes for every man, woman and child on earth. It also finds that “a vast amount of unique information is created and stored by individuals” (what it calls the “democratization of data”) and that “not only is digital information production the largest in total, it is also the most rapidly growing.” Calling this finding “dominance of digital,” Lyman and Varian state that “even today, most textual information is ‘born digital,’ and within a few years this will be true for images as well.” A similar study

conducted in 2003 by the same researchers found that the world produced about 5 exabytes of new information in 2002 and that 92% of the new information was stored on magnetic media, mostly in hard disks. November 2000 Francis X. Diebold presents to the Eighth World Congress of the Econometric Society a paper titled “’Big Data’ Dynamic Factor Models for Macroeconomic Measurement and Forecasting (PDF),” in which he states “Recently, much good science, whether physical, biological, or social, has been forced to confront—and has often benefited from—the “Big Data” phenomenon. Big Data refers to the explosion in the quantity (and sometimes, quality) of available and potentially relevant data, largely the result of recent and unprecedented advancements in data recording and storage technology.” February 2001 Doug Laney, an analyst with the Meta Group, publishes a research note titled “3D Data Management: Controlling Data Volume, Velocity and Variety.” A decade later, the “3Vs” have become the generally-accepted three defining dimensions of big data, although the term itself does not appear in Laney’s note. September 2005 Tim O’Reilly publishes “What is Web 2.0” in which he asserts that “data is the next Intel inside.” O’Reilly: “As Hal Varian remarked in a personal conversation last year, ‘SQL is the new HTML.’ Database management is a core compe-



tency of Web 2.0 companies, so much so that we have sometimes referred to these applications as ‘infoware’ rather than merely software.” March 2007 John F. Gantz, David Reinsel and other researchers at IDC release a white paper titled “The Expanding Digital Universe: A Forecast of Worldwide Information Growth through 2010 (PDF).” It is the first study to estimate and forecast the amount of digital data created and replicated each year. IDC estimates that in 2006, the world created 161 exabytes of data and forecasts that between 2006 and 2010, the information added annually to the digital universe will increase more than six fold to 988 exabytes, or doubling every 18 months. According to the 2010 (PDF) and 2012 (PDF) releases of the same study, the amount of digital data created annually surpassed this forecast, reaching 1227 exabytes in 2010 and growing to 2837 exabytes in 2012. January 2008 Bret Swanson and George Gilder publish “Estimating the Exaflood (PDF),” in which they project that U.S. IP traffic could reach one zettabyte by 2015 and that the U.S. Internet of 2015 will be at least 50 times larger than it was in 2006. June 2008 Cisco releases the “Cisco Visual Networking In-

dex – Forecast and Methodology, 2007–2012 (PDF)” part of an “ongoing initiative to track and forecast the impact of visual networking applications.” It predicts that “IP traffic will nearly double every two years through 2012” and that it will reach half a zettabyte in 2012. The forecast held well, as Cisco’s latest report (May 30, 2012) estimates IP traffic in 2012 at just over half a zettabyte and notes it “has increased eightfold over the past 5 years.” December 2008 Randal E. Bryant, Randy H. Katz and Edward D. Lazowska publish “Big-Data Computing: Creating Revolutionary Breakthroughs in Commerce, Science and Society (PDF).” They write: “Just as search engines have transformed how we access information, other forms of big-data computing can and will transform the activities of companies, scientific researchers, medical practitioners and our nation’s defense and intelligence operations…. Big-data comput-



ing is perhaps the biggest innovation in computing in the last decade. We have only begun to see its potential to collect, organize and process data in all walks of life. A modest investment by the federal government could greatly accelerate its development and deployment.” December 2009 Roger E. Bohn and James E. Short publish “How Much Information? 2009 Report on American Consumers.” The study finds that in 2008, “Americans consumed information for about 1.3 trillion hours, an average of almost 12 hours per day. Consumption totaled 3.6 Zettabytes and 10,845 trillion words, corresponding to 100,500 words and 34 gigabytes for an average person on an average day.” Bohn, Short and Chaitanya Baru follow this up in January 2011 with “How Much Information? 2010 Report on Enterprise Server Information,” in which they estimate that in 2008, “the world’s servers processed 9.57 Zettabytes of information, almost 10 to the 22nd power, or ten million million gigabytes. This was 12 gigabytes of information daily for the average worker, or about 3 terabytes of information per worker per year. The world’s companies on average processed 63 terabytes of information annually.” February 2010 Kenneth Cukier publishes in The Economist a Special Report titled, “Data, data everywhere.” Writes Cukier: “…the world contains an unimaginably vast amount of dig-

ital information which is getting ever vaster more rapidly… The effect is being felt everywhere, from business to science, from governments to the arts. Scientists and computer engineers have coined a new term for the phenomenon: ‘big data.’” February 2011 Martin Hilbert and Priscila Lopez publish “The World’s Technological Capacity to Store, Communicate and Compute Information” in Science. They estimate that the world’s information storage capacity grew at a compound annual growth rate of 25% per year between 1986 and 2007. They also estimate that in 1986, 99.2% of all storage capacity was analog, but in 2007, 94% of storage capacity was digital, a complete reversal of roles (in 2002, digital information storage surpassed non-digital for the first time). May 2011 James Manyika, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh and Angela Hung Byers of the McKinsey Global Institute publish “Big data: The next frontier for innovation, competition and productivity.” They estimate that “by 2009, nearly all sectors in the US economy had at least an average of 200 terabytes of stored data (twice the size of US retailer Wal-Mart’s data warehouse in 1999) per company with more than 1,000 employees” and that the securities and investment services sector leads in terms of stored data per firm. In total, the study estimates that 7.4



exabytes of new data were stored by enterprises and 6.8 exabytes by consumers in 2010. April 2012 The International Journal of Communications publishes a Special Section titled “Info Capacity” on the methodologies and findings of various studies measuring the volume of information. In “Tracking the flow of information into the home (PDF),” Neuman, Park and Panek (following the methodology used by Japan’s MPT and Pool above) estimate that the total media supply to U.S. homes has risen from around 50,000 minutes per day in 1960 to close to 900,000 in 2005. Looking at the ratio of supply to demand in 2005, they estimate that people in the U.S. are “approaching a thousand minutes of mediated content available for every minute available for consumption.” In “International Production and Dissemination of Information (PDF),” Bounie and Gille (following Lyman and Varian above) estimate that the world produced 14.7 exabytes of new information in 2008, nearly triple the volume of information in 2003. May 2012 danah boyd and Kate Crawford publish “Critical Questions for Big Data” in Information, Communications and Society. They define big data as “a cultural, technological and scholarly phenomenon that rests on the interplay of: (1) Technology: maximizing computation power and algorithmic accuracy to gather, analyze, link and compare large data sets. (2) Analysis: drawing on

large data sets to identify patterns in order to make economic, social, technical and legal claims. (3) Mythology: the widespread belief that large data sets offer a higher form of intelligence and knowledge that can generate insights that were previously impossible, with the aura of truth, objectivity and accuracy.”
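Several of the growth figures quoted in this timeline are easy to sanity-check. The short Python sketch below is illustrative only; its inputs are simply the published numbers cited in the entries above, plus one assumed round figure (a world population of about six billion in 1999). It confirms that Price's fifteen-year doubling implies roughly a tenfold increase per half-century, that Lyman and Varian's 1.5 exabytes for 1999 works out to roughly 250 megabytes per person, and that IDC's jump from 161 exabytes in 2006 to 988 exabytes in 2010 corresponds to a doubling time of about 18 months.

```python
# Sanity checks for growth figures cited in the timeline above.
# Inputs are the published numbers quoted in the entries; nothing here
# is new data, the script just re-derives the headline ratios.
import math

# Price (1961): scientific journals double every 15 years.
doubling_period_years = 15
growth_per_half_century = 2 ** (50 / doubling_period_years)
print(f"Growth per 50 years: {growth_per_half_century:.1f}x")        # ~10x

# Lyman & Varian (2000): ~1.5 exabytes of new information in 1999,
# spread over an assumed world population of roughly 6 billion people.
new_info_bytes = 1.5e18
world_population = 6e9
per_person_mb = new_info_bytes / world_population / 1e6
print(f"Per person: {per_person_mb:.0f} MB")                         # ~250 MB

# IDC (2007): 161 exabytes created in 2006, forecast 988 exabytes in 2010.
growth_factor = 988 / 161                                            # ~6.1x over 4 years
doubling_time_months = 4 * 12 / math.log2(growth_factor)
print(f"Implied doubling time: {doubling_time_months:.0f} months")   # ~18 months
```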

Gil Press is Managing Partner of gPress, a marketing and publishing consultancy, and a contributor for Forbes.com. He blogs at WhatsTheBigData.com, where an earlier version of this timeline was published.



Start with big data and Splunk® software. End with an unfair advantage. Splunk software collects, analyzes and transforms machine-generated big data into real-time Operational Intelligence—valuable insight that can make your business more responsive, productive and profitable. Over half of the Fortune 100™ use Splunk software and have the business results to prove it. Learn more at Splunk.com/listen.

© 2013 Splunk Inc. All rights reserved.



Data is Taking Over
Chris Towers

As the role of the data science team becomes increasingly important to overall business success, we are seeing unconventional roles becoming available within these teams. Chris Towers looks at the functions that are growing in importance within data.



When the general population thinks about data teams, the view is usually of a room full of mathematicians, computer scientists and analysts. Now the view also includes visualisation experts who can put together graphs and images to represent the data in an easily digestible way. The rise of infographics in society as a whole has meant that people now understand data in ways that they never could from complex spreadsheets or data sets. This multifaceted use of different business roles is something that we are increasingly seeing throughout the data world. As the use of data becomes increasingly important within organisations, the diversity of those working within it is also increasing. So what kinds of roles have been incorporated into data teams, and why?

Journalists

I recently had a discussion with an influential data scientist. We discussed the ways in which she was using data, the innovations that she had seen and where she thought it would go in the future. One of the questions I asked during this was 'who do you think is the most important person in your team?'. I was surprised to hear 'my journalist'.

When people think about data analysis, the output is normally in the form of a graph or a table of figures. The job of analysts and data scientists is to create this information, but the role of the journalist working within her team was to put it into a form that senior business people could understand.

The journalist in question had been recruited from a well-known business and finance newspaper, and I was told that since the journalist came on board, the understanding of the board has increased exponentially. Having somebody acting as the lingua franca between the data team and the board members is increasingly important within business. A journalist whose target audience is the same as the people you are trying to communicate with is vital.

The ROI is often difficult to work out, but in terms of the value to the team and the company as a whole, having somebody who can take a complex thought and translate it into something that is both useable and understandable is vital.

Graphic Designers

Graphs can be made in almost any spreadsheet programme. They can be made in almost any word processing software. You can even make them look prettier, changing the colours, making the graph 3D or highlighting specific changes or trends. However, much like presenting somebody with a table full of numbers, graphs do not inspire exciting and engaging reactions. What we are seeing today is the use of infographics and interactive graphical representations of data. The benefit of this is that the more people can engage with the data being presented, the more interest they will take in it. An analyst's job is to analyse the data and utilise it in a business context in order to maximise the business's success. Their job does not necessarily encompass finding an effective communication device to get this information across to a variety of audiences.

Creatives who do more than just produce static images are becoming increasingly popular within data teams. Look at any news website and the likelihood is that you will see an interactive infographic to help you really understand and utilise the data you are being shown. The same theory is being used in the boardroom and within strategy meetings: the basic human element that if something looks nicer, you are more likely to pay attention to it. Put a black and white table in front of somebody and they may look at the numbers and come to some logical conclusions; put an interactive, multifaceted infographic in front of them and the likelihood is that they will drill deeper and interact with the data in a more encompassing way.

Marketers

One of the aspects of data that is often missed by those working outside the industry is that you need to get it from somewhere. Many companies can do this through online purchases or through the use of in-store loyalty cards, but at the same time several industries do not have the capability to do this. There therefore need to be new and innovative ways to collect data, either through a willingness to give data openly, such as filling in forms, or by engaging with the brand using social media in order to gain access to its data.

Utilising the skills of marketing professionals within data teams not only allows the data gathering techniques to be as effective as possible, but can also target particular groups in order to gather particular data.

Data is becoming a vital division in many companies, and this importance has meant a diversification of the job roles involved in what used to be a relatively autonomous business sector. Data has become big business, bringing billions of dollars into the global economy, and having a variety of skill-sets utilised in its business usage is only going to be a positive thing, both for those working with the data directly and for the global job market as a whole.



A global festival of crowd-sourced events for the data community celebrating the impact & application of data in business & society.

Get Involved! Opportunities Include:

City Partner, Event Host, Speaker, Volunteer, Media Partner, City Sponsor

www.bigdatafestival.co



Data During an Explosion George Hill

George Hill speaks to Chris Gobby, Head of mData at EE about the ways in which mobile data has changed during the past 3 years and how the merging of two companies has impacted his data management at EE.



I recently sat down with Chris Gobby, Head of mData at EE for the UK. EE has recently become the largest mobile phone operator in the UK after the combination of Orange and T-Mobile. Having gone through this merger of data, Chris has overseen one of the largest data merging initiatives seen in the mobile phone market. Having gone through this recently, Chris shared his experiences with the audience at the Big Data Innovation Summit in London. I sat down with him after the event to go through his experiences within data at EE.

Having worked at the company for the past three years, Chris has been at the forefront of the data integration between Orange and T-Mobile. This integration was a huge undertaking, but it also took place at a time when mobile data as a whole was going through a substantial transformation. One of the main changes that the industry has seen as a whole is a significant increase in the amount of data due to the rapid adoption of smartphones by consumers. Chris admits that not only EE, but the majority of companies, are only now beginning to catch up with this large increase in data usage. The issue was not that the industry was unprepared for the size of the data, but more that it did not anticipate the variety of data that would be available. With the adoption of apps and the breadth of information that companies can gather from them, telecommunication companies had not experienced this kind of information previously.

This increase in data usage, combined with the merging of the two data sets, put Chris and the data team at EE in a unique and challenging position.

An additional challenge that the industry is currently seeing is that with the development of the data has come the development of the technology. These have not necessarily been concentric, and there has been a definite lag time between the two. As the industry as a whole catches up, one of the challenges that many companies are having is the implementation of increasingly complex data solution technology. The integration of the data with this new technology has created several challenges for several of EE's competitors.

One of the advantages found by EE is that, with the amount of work that went into the merger throughout the company, the data produced is now much cleaner than that of others working within the industry. The fact that the company was undergoing this massive internal shift during what is one of the biggest changes in telecommunication data history coincidentally left them in a strong position after the merging of the two companies had been completed. The internal restructure allowed the data to become simpler, making it easier to use and as such far easier to integrate with the new data systems. Chris says that with all of the technical work that went into the integration of the two separate companies, the internal structures and architecture put them in the strongest position that they could possibly be in.

This kind of cleanliness of data has allowed Chris and his team to undertake several important telecommunication innovations. The example that Chris shared with me was his work with Google during the Olympics, which consisted of seeing how consumers were using their phones during large events. This kind of insight can allow companies not only to adapt to their customers' needs during specific times, but also to adapt the uses of their phones to specific needs.

One of the interesting aspects of telecommunications industry data at the moment is the amount and breadth of data available. This has given the industry an issue: with the amount of data available, how do you drill down to find actionable business initiatives? Chris gave some key insights into how he and his team have got over this:

"A lot of testing and trying out what's useful"

"Knowing the data and having a gut instinct from experience of using lots of data sources"

"Not necessarily having the technology to experiment, but the time. We have been able to allocate the time to go and experiment"

The telecommunications industry is a great indicator of how data will be used in the future and of how we can have relatively unexpected spikes or changes in the data we use. Most industries can learn from the kind of influx of data that the telecommunications industry has had in the past few years, and Chris' lessons can resonate well beyond mobile devices.



Making Sense of Big Data Chaos Patrick Husting

The top-funded venture capital categories are technology companies that deal with gaining insights out of data. Whether they're a social media company promising the next Facebook phenomenon or a transportation company looking for better routes to reduce fuel consumption, both are trying to find insight in their business data. No matter what new solutions are developed to track or improve a specific problem, I guarantee a new problem is being discovered: companies now have too many data sources, leading to big data chaos. Check out Luma's slide showing all the different marketing technology companies (updated March 2013). In the marketing segment alone, it's all too clear that there is a level of big data chaos like we have never seen before.



Everyone claims their solution will collaborate with your existing process, handle all the big data and guarantee to deliver better insight. So far we're hearing that companies are spending time, money and patience as they discover more problems and experience the same disappointing results. The real issue is that no one has just one system; they have multiple systems. Companies' big data issues are not about the volume of data, but rather the number of data sources. As a result, we are forced to visit numerous sources and their built-in reporting solutions to glean any insight into the performance of the business. This provides a VERY limited perspective of data with which to determine the true results of a campaign, project or even daily operations. It's actually a HUGE productivity drain. And that is just marketing systems. Think about how you need to integrate other lines of business data sources into an intelligent enterprise; the short sketch after the list below illustrates the kind of stitching together this forces on analysts.

Today, it's a new level of data chaos like never seen before, but not for the reasons you may think. Data chaos doesn't come from the volume of data, but results from the following (to name just a few):

- Numerous data and reporting solutions
- Various data storage and retrieval locations
- Technology moving faster than a company's capital resources
- Constant change in technology upgrades
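To make the "too many data sources" point concrete, here is a minimal, hypothetical sketch in Python (using pandas). The file names and column names are invented for illustration only; the point is simply that every extra source means another extract step and another join before anyone can see one picture of a campaign.

```python
# Hypothetical example: stitching three separate marketing sources into one view.
# File names and columns are made up for illustration.
import pandas as pd

# Each system exports its own report, keyed and shaped in its own way.
email = pd.read_csv("email_tool_export.csv")    # columns: campaign_id, sends, clicks
ads = pd.read_csv("ad_platform_export.csv")     # columns: campaign_id, impressions, spend
crm = pd.read_csv("crm_export.csv")             # columns: campaign_id, opportunities, revenue

# One join per extra source: the effort grows with the number of systems,
# not with the number of rows in any of them.
combined = (
    email.merge(ads, on="campaign_id", how="outer")
         .merge(crm, on="campaign_id", how="outer")
         .fillna(0)
)

# Only now can a single campaign-level view be produced.
combined["cost_per_opportunity"] = (
    combined["spend"] / combined["opportunities"].where(combined["opportunities"] > 0)
)
print(combined.sort_values("revenue", ascending=False).head())
```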



Business intelligence and analytics software companies would lead us to believe that purchasing their self-serve BI tools will fix the issue, but in the long run it often ends up making things worse, because the analytics quickly become inaccurate, delayed and untrusted.

In my experience, companies are finally taking a step back and slowing their purchasing decisions, thinking about what they actually need from the volumes of data and making sure it aligns with their strategic objectives. It is time to go back to basics and re-think the kind of data we need to run our businesses. We need to develop smarter data warehouses which can assist in the collection of these numerous data sources. We need to re-think our delivery methods to mobile devices and not just provide another dashboard squished down to fit the device, but maybe just the metrics that someone cares about this month, in a real-time fashion.

With technology moving at unprecedented speeds, it's nice to finally see companies that have stopped to rethink and develop mobile BI solutions that make sense of the data chaos. In the end it all comes down to math, so join me and other innovative thinkers and eliminate chaos with the R4 rule: the RIGHT data, to the RIGHT person, at the RIGHT time, to the RIGHT (any) device.

Patrick Husting is a business intelligence mastermind who has guided companies like Microsoft, Intel, Gallup, HP, Dell, Honeywell, Nordstrom and Polo Ralph Lauren over the past 18 years, establishing a new way for organizations to leverage data in their business intelligence solutions. Using his own capital, Husting created Extended Results on the premise of providing a different experience for businesses, called Personal Business Intelligence. In just three years, Husting grew Extended Results nearly 578% in revenue, in spite of the recession. As the CEO and founder of a leading Business Intelligence Services and Software company, Husting is revolutionizing the way companies evolve by offering business intelligence solutions to companies across numerous industries, including media/entertainment, financial services, government, health care, manufacturing, high technology, and retail, with new and emerging technologies.

For an interview or to learn more about Patrick and Extended Results please email marketing@extendedresults.com



The NSA Scandal and Its Risks to Big Data David Barton

This month we have seen the true power of big data when it is used in a sinister context. We have previously seen the use of big data in elections, and it has been used for several government initiatives in the past. The NYPD has been using predictive analytics and crime data to predict where crimes are going to take place, and this has seen a significant reduction in the number of crimes taking place. However, one of the main issues that big data has had to deal with in its public perception is that it can be a violation of personal privacy.

The use of big data and analytics techniques by the NSA in the USA has shocked many across the world. For those unaware, what they are reputed to have done is use vast data gathering techniques to collect confidential records from individuals around the world. The use of data to track dangerous individuals around the world is something that few would argue did not happen, but the widespread and vast usage of this data has shocked many. The invasion of privacy for many millions is not only surprising, but also undermines much of what has been publicly declared by the US government in the past.

Having spoken to several high level executives and data scientists who are working with big data across several important industries, the key to making a successful big data programme is transparency. If people are aware of what their data is being used for, then they are likely to be happier with their information being used. The damage that has been done through this leak could be said to undermine many data gathering initiatives, not only those run by government agencies but also those of many organisations.

So will this have an effect on the big data efforts that many people are attempting at the moment? In reality, the undermining of privacy that this leak has caused will have an effect on what people are likely to share. However, the majority of the data that is being collected on people by companies is not something that they are going to be aware of. Targeted advertising, for instance, is a form of big data derived from the pages you have visited from a particular IP address. The adverts are not necessarily based on the specific sites you have visited, but on the kinds of sites you have visited and what previous data has said about what you are likely to want to do next. This kind of data gathering is unlikely to change, as much of it is subliminal and the likelihood of people changing their online behaviour significantly in this way is slim.

However, one of the most shocking aspects of this data breach is not that the NSA is holding vast amounts of data about individuals, but that they are bypassing server security for some of the largest companies in the world to gain access to the information held. They have made the most of technicalities about where data is being stored to access information from people all over the world. Did you know that for anybody who has ever used Facebook, the information that you put on there is actually technically within US jurisdiction? This has meant that, using the powers given to them in the 2001 Patriot Act, the NSA can access any of this information, seemingly at any time.

Governments all over the world are attempting to distance themselves from this scandal, and one of the ways in which they may do this is with far stricter privacy laws. Not only would this put a spanner in the works of several companies, but it would stifle a business function that has reaped significant rewards for thousands of companies around the world. The really damaging aspect of this is not only that data mining has been given a bad name in the highest profile way possible, but that the mistake of one government department may well have an effect on the industry as a whole.

Traditionally we tend to see tough legislation put into place to placate scandal and controversy. Should this happen within the big data function, it could have a real and lasting effect on the growth of the industry. So what does this actually mean? In terms of the long term effects that data collection and analysis are likely to see, it could be minimal or catastrophic. One of the issues revolving around this story appearing in the media is that it throws up more questions than answers. For instance, it would be naive to think that Google holds less data than the US government, but the usage of that data is not perceived as a threat to our privacy. This is why people are willing to use the service without fear of their data being used in malicious or illegal ways.

Unfortunately, given the reaction to this, it is likely that everything involved with the scandal will be tarred with the same brush. Hadoop is one of the key drivers behind this and may be affected by any change that comes as a result. Hadoop is almost certainly one of the reasons that big data and analysis have seen such huge growth in the past five years. The open source and unpaid nature of Hadoop has allowed this to take place. Now that it has been implicated in this perceived breach of personal data, there is a chance that its capabilities will be more closely monitored and regulated. This in turn could stifle further innovations within the industry.

One of the major issues that those working in the data space may see is that this exciting industry is taken out of the hands of those who know and understand it and put into the hands of policy makers. If this is the case, then decisions may be made for political gain rather than on genuine business or personal privacy grounds.

How should we be looking at this? One positive aspect is that companies who hold vast amounts of data are now likely to assess their security. An increase in security can only be a good thing for individual data, and improved internal checks on data will further improve this.

Another positive is that regardless of the fact that the NSA has this data, it was not leaked through a hack. It was not a form of cyber terrorism that allowed people to know that this data was available; it was an individual whistle blower who, whether you think this is a good or a bad thing, did not get hold of this information through covert or underhand means. This shows that despite the data potentially being gathered on you in an invasive way, it is being held securely. If it was not, then this story would have come out far sooner than it did.



Forbes’ Digital Strategy Success Using Data, Interview With Mark Howard George Hill

Today, strategy is going through a shift in power from those in the boardroom to those who are online. One of the companies that has truly managed to leverage its existing strategic position into genuine digital dominance is Forbes. With a website that is the envy of many within the industry, as well as digital offerings that have almost no rival, the business media company has truly embraced the new ideas within the data realm.

In 2011, the traffic coming to the Forbes site was at a plateau, and WSJ.com was at an almost unassailable point above them, working in the same space and vying for the same audiences. Jump to 2013 and Forbes.com finds itself ahead of WSJ.com, with a suite of online services that are propelling the company forwards and seeing them increasingly at the cutting edge of digital offerings. In May 2013, unique visitors to Forbes.com were approximately 25 million (according to comScore worldwide) compared to WSJ.com at 22 million. Traffic on Forbes.com is up 35% since May 2012. Forbes' analytics and data strategies have not only seen the company catch up with its competitors; from an internal point of view, its online audience is up more than 100% since June 2010, when Forbes.com was reinvented.

This increase in audience has seen the organisation take advantage of a situation that many traditional publishing companies have struggled with. I was lucky enough to speak with Mark Howard, senior vice president, digital advertising strategy at Forbes Media, about the company's rise to digital prominence and its use of data initiatives to achieve this. Mark moved across to a strategic role within Forbes in January 2012 from his previous digital sales role. This was midway through the transformation that has seen Forbes come out as a leading digitally led company today.

Talking to Mark, I gathered that there were several unique aspects of the Forbes online strategy that incorporated new data ideas. One of the most interesting aspects was the use of the startup mindset within the digital team when approaching analytical and technological programmes.



What has set this apart from other companies is that the digital division of Forbes was originally created in 1996 as a completely separate entity from Forbes magazine, meaning that a leaner approach could be taken. The profits made by the online side of the company could also be used for further development of digital and data products, meaning that for the past decade the Forbes online offerings have had the opportunity to invest in new and exciting technologies.

Mark describes the company's mentality towards trying new approaches, new technologies and data usage as "we act like a 96 year old start up". This has seen an increasing value for advertisers on the site who, using new data technologies, can now get a far better ROI and target individuals rather than a relatively generic audience. Using a technology called Chartbeat, Forbes can now track the kind of people who are engaging with and sharing this content, meaning that journalists have a better opportunity to create stories that these people want to read.

One of the ways in which they are making sure that they can act dynamically and quickly is through the use of their workforce too. "About 18 months ago we invested heavily in the development of our own design team."

Cloud-based ad creation tools from Flite have allowed Forbes to create ads or make changes in one or two days that might have taken an agency or freelance staff one or two weeks. This means that feedback gained from their data can be quickly analysed and acted upon. Moat, a NY-based analytics company, provides deep-dive analytical tools for tracking the exposure and engagement users have with ads, ensuring that the online offerings can be flexible and changed depending on their current usage.

Mark mentions, though, that it has been important not only to have the skills in house, but to leverage the mindset of smaller startups. Through partnerships they have managed to create a stack of technology that creates opportunities for them to monetize their online offerings, but also to use the mindset that small startups tend to have and the excitement and forward thrust that this brings.


Forbes has understood that a modern company cannot look at strategy in the same way as it did even five years ago. In Mark's words, "the rules have changed." Many of its competitors have seen that their traditional methods of aggregating audiences and selling them as a stack to advertisers are no longer fit for purpose. Using the new technologies has allowed Forbes to target individuals using in-depth analytical methods. Forbes now uses a data management platform company, Krux, to look at how the individual is engaging with the content and how this can be changed to truly place adverts in front of the right people at the right time.

In Mark's eyes, the social web has democratised the publishing tools that were traditionally open only to publishers a few years ago. Although effective businesses should now see themselves as publishers, Mark makes the point that publishing is one thing, but to make it truly effective, companies need to create a foundation and jumping-off point, using data to ascertain successes and build on them.

One of the key strategies that Mark has identified is that new technologies will always bring new innovations and better ways to deal with data, and better ways for companies to make the most of their data collection techniques. The way to manifest this is through the use of flexible products and by creating a foundation of technologies that allow the company to work effectively now, whilst also being well prepared for future technological changes.

One thing that is for certain is that when the next big technology and data revolution does come along, thanks to the work of Mark and others within the Forbes management team, Forbes will be well equipped to make the most of it.



On-Demand Business Education

www.membership.theiegroup.com

