ISSUE 8
THE DATA PRIVACY ISSUE
NSA DATA MINING
What long terms effects will the revelations have on Big Data and data mining?
INTERNET OF THINGS
What implications does the Internet of Things have on data analysis?
Editor’s Letter
Letter From The Editor
Welcome to this issue of Big Data Innovation.
With the buzz in the data community concentrating around data protection, privacy and hacking, we wanted to create a section dedicated to these issues.
Therefore we have four fantastic articles focussed on some of the key areas where data privacy and use of data in prevention of illegal activity is a core issue.
We look at how the NSA data collection revelations may have far reaching consequences beyond bad press.
David Barton takes a look at the ways in which governments and companies are using open data and how this use of data can help society as a whole in the future.
As data collection is at the centre of several contentious issues, Simon Barton reports on how it is being used to prevent crime and fraud.
Chris Towers also investigates the data issues that surround the spread of the Internet of Things, especially after the acquisition of Nest by Google.
As well as these stories revolving around data privacy, we talk to Hyunyoung Choi, Senior Scientist at Pandora, about how she is using data to help create the music genome project.
As Getty Images is moving from a traditional image licensing company to a data driven internet company, we look at how this could work out for them.
Heather James, creator of the HR Analytics summit, also gives her opinion on the importance of HR analytics in today’s companies.
If you are interested in contributing to the magazine or if you have any feedback please get in touch.
George Hill
Managing Editor
§
Managing Editor: George Hill
Assistant Editors: Simon Barton, Chris Towers
President: Josie King
Art Director: Gavin Bailey
Advertising: Hannah Sturgess
Contributors: Heather James, Chris Towers, David Barton
For Advertising contact Hannah at hsturgess@theiegroup.com
General Enquiries ghill@theiegroup.com
Contents
Heather James investigates the importance of HR analytics in today’s businesses
Getty Images are making a revolutionary move towards innovating Big Data, will they succeed?
We talk to Hyunyoung Choi, Senior Scientist at Pandora, about her work in creating the music genome project
We look at how the NSA scandal could have long and irreversible consequences on Big Data
As companies and governments are opening up their data, David Barton looks at how we could be doing better
How is Big Data being used to detect and prevent fraud? We look at forensic analytics’ effect on companies
The Internet of Things is becoming a big deal, but what are the data implications of it?
Big Data & Analytics for Retail Summit
June 19 & 20, 2014, Chicago
#IERetail
Transform Big Data into insight, and insight into action
Experts say big data will double every two years. Today, companies are scrambling to figure out the secret to transforming Big Data into big insight, and insight into action. Faced with data that is ever expanding in volume and speed, and questionable in accuracy, businesses must quickly identify how to sift through this changing data to arrive at actionable insight. To unlock the value of Big Data, they’re turning to predictive analytics. At D&B we recognize the value of transforming volumes of data into actionable information. That’s why we now offer an Informed Perspective: indispensable guidance that helps businesses anticipate new opportunities, curb risk, and discover more efficient ways of making decisions.
The future belongs to the informed. For more information, visit www.dnb.com/bigdata.
© Dun & Bradstreet, Inc., 2014. All rights reserved. (DB-3706 2/14)
* Source: Companiesandmarkets.com
HR Analytics
The Importance of HR Analytics
Heather James, Big Data Leader
The use of analytics within HR is nothing new. For years, organizations have used data to support decision making within strategic workforce planning, talent management, succession planning and identification of the skills needed for growth.
Workforce data collection is being effectively used at Google, where the Chief People Officer, Prasad Setty, claims that they are “trying to apply the same rigor to the people side as to the engineering side.” Google places as much importance on the workforce that designs, develops and delivers its innovation as it does on its actual products. Google has found that the employees who are ‘happiest’ are the ones who are the most dedicated to their work and feel they have complete ownership of it. This allows the company to put methods in place so that a greater number of the workforce has this sense of mission in their work.
The explosion of Big Data over the last couple of years has added a new dimension to HR analytics focused on the workforce already in place: gauging employee satisfaction and engagement as well as predicting future trends. The wealth of information now available means that organizations can collect more data on their employees than ever before and therefore glean deeper insights on both an individual and enterprise level.
The idea that employee engagement and satisfaction improves productivity and business performance is an established idea. However, it is only recently that employee engagement has been effectively measured on a wider enterprise scale, thanks to an increased use of analytics. In the past, satisfaction had been gauged on a small sample of employees. However, in the digital age, Big Data allows for a much wider sample of the workforce to be analyzed. The benefit of this insight to HR departments is undeniable, although it is important to maintain transparency to minimize the risk of privacy concerns amongst employees.
The digital revolution is changing the face of HR, as individuals are beginning to identify that practices associated with personal life can be beneficial in the workplace. According to a Microsoft survey of 9,000 workers, 31 percent would be willing to spend their own money on a new social tool if it made them more efficient at work.
This demonstrates how technology is becoming increasingly viewed as a ‘must have’ rather than a ‘nice to have’ within the workplace, and also signifies the multiplatform nature of the modern workplace. A good example of this is the role of social media, which has changed the way we communicate in our personal lives. Now, organizations are beginning to see the benefits of bringing social platforms to the enterprise for peer learning, to gather feedback and to increase the connection between employee and organization.
In November 2013 the National Health Service (NHS), the UK’s biggest employer, published a 12-page guideline document for staff members to encourage the use of social media, explaining its benefits (and pitfalls) and laying out guidelines. The tag line read ‘If we trust our staff with our patients’ lives, why do we not trust them with social media?’ The document acknowledges that social media presents a great opportunity for the organization to listen and have conversations with the people it wishes to influence. The NHS believes a policy of open access to social media could increase employees’ feeling of membership of an organization by allowing them a space in which to ask questions and discuss issues. This is a brilliant example of an institution attempting to harness the power of social media, not only to gain information and perspective on its workforce, but also to drive employee engagement within the organization.
Aside from a platform for employee feedback and engagement, social media is now emerging as a tool to assist peer learning. To continue to use the NHS as a case study, in the UK those working in the healthcare sector join in chats on Twitter, called ‘tweetchats’, to have conversations about topics of interest. These discussions often last an hour, involve both senior and junior healthcare professionals, and have defined topics such as discussing academic papers. Aside from Twitter, other platforms have begun to emerge as useful social sharing and peer learning tools within the workplace, such as Adobe Connect, Yammer and Google Hangouts. Within my own workplace, social media has recently been used in conjunction with gamification to motivate out of the box thinking and product promotion: the challenge was to achieve the first sale of a new product purely by social sharing. This helped us gauge a number of metrics: internal competitiveness, team working (based on retweets, sharing of others’ posts, etc.), and an individual’s engagement and determination with social media.
Gamification methods are becoming widely used in HR to motivate and up-skill employees, as well as to improve data collection. Gamification applies behavior-motivating techniques from traditional and social games to non-game environments, but how can playing a game have true business impact? Gamification within HR can be used in a number of ways. In some instances it is implemented to promote a positive corporate culture by assessing and rewarding employees for cross-departmental collaboration, improving processes, or simply going the extra mile. Employees can benefit significantly from gamification programs that create an environment in which they feel recognized and rewarded for their achievements, whilst the organization is able to gather data and metrics on individual workers in the meantime. Indeed, as succinctly put by Jeanne Meister, author of The 2020 Workplace, ‘Gamification should not be just about fun, … it should be consistent with an organization’s analytics-driven approach to workforce management and aligned to their business goals’.
France and Bhutan recently began measuring the GNH (Gross National Happiness) of their respective nations and in turn are analyzing how this correlates to their economic performances. Sears Holdings have taken this idea and applied it across their enterprise by using gamification methods to assess the performance, engagement and morale of their employees and analyze whether the happiness correlates in any way to the performance of different parts of the organization. The idea of using ‘happiness’ as an analytics metric is interesting, and whilst the findings of Sears’ investigations are yet to be released, it is clear that the use of gamification is becoming increasingly important to gather data on employees.
Gamification can also be implemented as part of learning and development. An excellent example of this is Deloitte’s Leadership Academy, an online curriculum where participants are encouraged to work through levels with the reward of certain ‘badges’. The program applies behavioural science and gamification to motivate users to work through targets. Participants have the option of syncing their social media pages with the Leadership Academy so that they can share their progress and ‘badges’ with their wider network, thus initiating multi-platform engagement with the program.
Since the implementation of gamification on the Deloitte Leadership program, there has been a 47 percent rise in returning weekly users. The increase in engagement has a two-fold benefit for the organization: first, employees spending more time on the platform will be accelerating their progression and improving their skills at a faster rate; second, the increased usage results in increased data collection.
The new range of data sources available in the age of Big Data present both an opportunity and a challenge to those working within the HR function. The opportunities are seemingly obvious: more sources, more data, more insight, which can ultimately lead to improvements in employee performance and therefore business performance. The challenges include implementing new ideas: What do you measure? How do you measure? Where is the line between gathering data and a one-way mirror situation? HR is only just finding its feet with Big Data but the only thing that is for sure is that the innovation here is about bringing all of the data together to get a holistic view of the engagement and satisfaction of the workforce.
Sources
Jeanne Meister, Gamification in Leadership Development, Forbes, September 2013
Jeanne Meister, 2014: The Year Social HR Matters, Forbes, January 2014
Steve Lohr, Big Data Trying to Build Better Workers, New York Times, April 2013
Rawn Shah, Trying to Build Better Workers with Social Analytics, Forbes, April 2013
Jim Dougherty, Think HR Isn’t Monitoring your Social Media?, Social Media Today, October 2013
10 Shocking Statistics About Employee Engagement
http://www.nhsemployers.org/Aboutus/Publications/Documents/HR-social-media-NHS.pdf
HR & Workforce Analytics Innovation
May 21 & 22, 2014, Chicago
Getty Images
How a Picture Company Could Be Revolutionising Analytics
Chris Towers, Big Data Leader
A recent Gartner survey showed that banking and retail are two of the most developed industries for Big Data. The transactional elements of these industries make its integration more obvious, but there are some industries where you would not expect Big Data to have permeated yet.
One of these would be image licensing. Although there are elements of big data such as tagging and some image recognition, in reality suggestion engines are as far as you would think it could go.
However, Getty Images, one of the world’s largest image repositories, has taken the unlikely step of creating a new way of using Big Data. In response to the widespread unauthorised use of their images on blogs and social media around the internet, they needed to look at new ways of monetizing their content. The only real way that they previously had to combat this was through the use of lawyers and the threat of legal action against people who had breached their copyright. The issue with this was the complexity of the cases as well as the sheer number of images that this would have had to cover (some estimates put the number in the millions). Therefore a new strategy was necessary.
The new strategy is for them to allow users to use the images on their website for free, but in order to do so they need to embed them with a code prepared by Getty. The code at the moment is relatively dormant but in future could have a multitude of uses.
It could be utilised in future to place adverts (much like the ads before YouTube videos), which would add an additional revenue stream for Getty and bring them into a new online model. However, the most interesting aspect is that the code could also be used to monitor the data of the page visitors. Given the number of websites that this could potentially feature on, this could see Getty becoming a major player in the data world. How this data will be used, and more importantly how it could be monetised, is not known, but the volumes that this could potentially create will make it an impressive data aggregator.
One aspect that this data could be used for is targeted advertising, something that would clearly work well with the commercial aspect that these pictures will have. This would certainly bring it in line with other online advertisers like Yahoo, Google and Facebook, where metadata is used to customise the ads you are seeing. With the breadth of sites that this could affect, this could be even more effective.
This decision by an online image licensing company is certainly a bold move and will be a marker for how other companies can utilise big data in non-traditional ways to improve their offerings or include additional revenue streams.
BIG DATA & ANALYTICS IN HEALTHCARE
MAY 14 & 15 2014, PHILADELPHIA
Pandora
Interview with Hyunyoung Choi, Senior Data Scientist, Pandora
Simon Barton, Assistant Editor
There is so much music out there, far too much for anyone to feasibly wade through every album or song in circulation. It is for this reason that Pandora started the Music Genome Project in 2000, which has seen their dedicated group of analysts analyse and tag millions of songs. They do this to make sure that their customer base has the opportunity to discover music they’ll love. Tell the Pandora team that your favourite band is The Beatles and they’ll find a similar band, based on melody, harmony, instrumentation, rhythm, vocals and lyrics, that you’re ultimately likely to enjoy.
Hyunyoung Choi is a Senior Scientist at Pandora and plays an important role in bringing newfound music to the masses. Hyunyoung is well versed in what it means to be an employee at an innovative company, having worked at Facebook and Google. A common theme running through all of these companies is the abundance of data that they hold. Clearly, Pandora isn’t quite in Facebook and Google’s league in terms of size, but still, a huge amount of data remains. Hyunyoung states that the way in which Pandora deals with data is different from Google and Facebook, and that there is no universal method of success, as each company must devise its own strategies to maximise business operations.
Big Data is an ever-changing element, one that is seemingly evolving all the time. Hyunyoung has been part of the Big Data revolution and has consequently overseen some of the industry’s most important developments. She states that “data management became more important than ever”, as well as pointing to the ever increasing number of companies that want to leverage competitive advantages from Big Data.
One of the aspects that helps to level the playing field in this respect is that many of the key data platforms are open source. Start-ups have benefited immensely from this, with Hadoop being of particular importance. Not only can they access Big Data, but they can also use it in an effective manner that allows them to offer top service and grow quickly. If they have the right skills, this can truly level the playing field with larger companies.
Hyunyoung is unequivocal in her assertion that these developments have been made possible by Open Source Software. Without it, despite having magnitudes of data at our fingertips, the competitive advantages that companies have been experiencing would not have been able to take place.
Barring the obvious skills needed to successfully undertake a role like Hyunyoung’s, she reiterates the importance of understanding culture and seeing the big picture. Although this does sound suspiciously like management clichés, perhaps in the Big Data world they ring true. The ability to ‘paint pictures’ is undoubtedly an imperative attribute for a Data Scientist given the sheer amount of data at their disposal.
Big Data isn’t a passing fad and Hyunyoung believes that it will still be ‘hot’ in five years’ time, testament to its importance within business operations. She states: “Big data will become a key basis of competition, productivity growth and innovation.” The current tech boom has been boosted by the use of Big Data and it will continue to play a key role in its success. We are always interested to see if there will ever be too much data to analyse. Hyunyoung’s view: “How much Data is too much Data? This is something I don’t have an answer to.”
Business Intelligence Innovation Summit
May 21 & 22, 2014, Chicago
For more information contact Jordan Dunne +1 (415) 992 7918 jdunne@theiegroup.com theinnovationenterprise.com/summits
Data Security
NSA
Has The NSA Scandal Irreversibly Damaged Data Security?
George Hill, Managing Editor
The NSA revelations of the last 2 years have sparked outrage amongst both consumers and companies. Consumers felt let down by the government as they believed that their rights had been breached and that the US government had taken dangerous steps towards becoming a big brother state. It is often an accepted fact that monitoring of targets was necessary for public safety, but the blanket surveillance of civilians went well beyond the point of acceptance and created an aura of being spied on by their own state.
Big companies like Google and Facebook were aware of low-level data gathering, as this was generally conducted through requests for data. This meant that companies needed to co-operate but could often make this process awkward for government departments, charging additional labour costs and sending over data sets that would take extensive work to make useable. When revelations came out that backdoors within code had been used to access company servers, and that blanket surveillance of data moving through underwater fibre optic cables had been used to mine data, this shocked everybody. The reputation that these companies had created with their customers, where the data they had stored was presumed to be safe, was broken, and regardless of what they said it made the situation worse. If they didn’t know about it then surely this shows that their security was not what it should have been, and if they did know about it then it was undermining the privacy of their customers. One of the other key reasons that they couldn’t speak out about this kind of work was that they were gagged due to the use of secret court proceedings to acquire the data in the first place.
Being unable to tell people what they had to hand over or why meant that this grey area became almost impossible to explain, and therefore trust in these companies decreased even further.
Another aspect that these companies did not consider was exactly how the government was receiving the data. According to many within these companies, they couldn’t understand how Edward Snowden had got the numbers that he had in terms of what had been gathered. Given that the number of data requests to these companies sat in the low thousands, when there was talk of millions of people being affected, this did not seem to add up. This confusion between what was being reported and what these companies had actually given over created an uncertainty that once again made people wary of what the companies were saying.
In reality, the reason for this number being so high compared to what had been given by the companies was that the government had changed its data gathering techniques from downstream activities to upstream activities. With downstream activities, companies would hand over data that they already had on their servers, which was the information that the government would request and what they handed over due to the secret courts. The new upstream system meant that the government mined the data before it went to the company servers, so they would gather this information as it passed from the public internet to the company cloud. When companies became aware of this there was a huge outcry, with senior leaders joining together to condemn the actions.
These combined actions have created a culture of mistrust between customers, companies and governments. Data privacy was always an important aspect of internet usage; now we are seeing that it is vital to continued growth.
The use of big data has allowed customers and companies to get what they want, when they want it, and up until this point the arguments revolved around ‘I don’t want people knowing what I’m doing/buying, it’s personal’. Now it has moved onto ‘I don’t want people snooping on me to see if I’m a criminal’. This has further reaching implications for those using big data, going well beyond a lack of trust between companies, governments and consumers.
The level of deceit that many countries felt from the US government meant that they were willing to take drastic steps to stop the spying on their people. Some, such as Brazil and Malaysia, were looking at creating ‘splinternets’ where all of the information created in that country would need to be stored within their borders and not communicated outside. The implications of this would have had serious repercussions for the internet as a whole and the growth of companies in the future. YouTube, for instance, would not be able to function as it does today if these kinds of actions took place. Due to the amount of data they hold and need to have accessible on their servers, they need to have servers all over the world rather than just in the US. This kind of action would make that kind of global distribution of data impossible.
Equally, one of the key reasons that trading between countries today is so easy is the data that can be collected from across the world. With this flow of information gone, this kind of trading would cease almost immediately, meaning that it could have a major effect not only on the local economy of that country, but on the global economy as a whole.
So what is being done?
In January Barack Obama initiated a study into the effects of Big Data and analytics on data gathering techniques used by the government. This is what has led to John Podesta’s study into data mining techniques that is currently underway. “The study is fundamentally a scoping exercise. We want to examine the administration’s consumer privacy blueprint and take a harder look at existing policies,” said Podesta about the 90 day study.
The idea behind this study and others in the comprehensive review of Big Data and analytics is to shape future policies in this area and help the government become more informed on Big Data issues and how companies are using the technology. In principle this seems like a great idea; after all, more informed decision makers can make more informed decisions. However, at present, this may not be enough.
If we look at the pace of change in this area, aspects of data mining change every six months and are almost irrelevant within a year. Is it possible for legislation to ever be made fast enough to keep up?
Society is now creating more information in 10 minutes than we created between the beginning of humanity and 2004. With this pace of change and this volume of information, can legislation ever be good enough?
Sweeping policies that try to encompass as much of the technology and techniques as possible will be ineffective, as many of the companies who are utilising them can afford the best lawyers, who will always find loopholes when policies are not specific enough. The current legislation often dates back as far as the 1990s, when internet speeds were 1/1000th of what they are now and the processing power of hardware was a fraction of what we possess today. This kind of protection was fit for purpose with that technology, but within 5 years was obsolete.
The same thing cannot be repeated, otherwise we will once again find that these laws are easy to circumvent when hardware improves in the next decade.
The actual amendment that allowed this kind of data collection dates back to 2008, which, although only 6 years ago, seems like the dark ages in terms of data collection. The amendment to the FISA act under George Bush’s administration allowed the government to track people outside of the US who were suspected of being involved in criminal activity. At the time there were concerns about the amount of power this would give to the government when gathering information, but given the technology then compared to now, those concerns pale in comparison to how much was eventually collected.
So in order to have the most effective policies they need to avoid blanket assumptions and concentrate on specifics. The issue here is that these are what shift the quickest. Would there be any government in the world that could make policies to keep up with the demand? With the sensitivity of the issues surrounding data protection and the policing of them involved, passing these into law would take 6 months or more, meaning that by the time they are enforceable they may well be out of date and useless.
Conducting studies seems pointless when by the time the reports are produced the information is out of date. With any kind of work that involves an industry that moves at this pace this is always going to be an issue, but when these are issues revolving around civil liberties and freedoms, the importance of them becomes even greater. Policymakers around the world will struggle to keep up to date with this pace of change, but it is an absolute necessity that they come up with something that falls between blanket legislation and specific areas. It may be down to a self-legislating body made up of the major players in this area, but even then the process will need to be passed through both houses. It is an issue that has no easy solution, but given the current state of affairs, a solution needs to be found quickly.
#SocialWebMiami
Social Media & Web Analytics Innovation
Drive Success Through Innovative Digital & Social Media Analytics
November 12 & 13, 2014, Miami
For more information contact Lewis Chandler +1 (415) 692 5281 lchandler@theiegroup.com theinnovationenterprise.com/summits https://theinnovationenterprise.com/summits/social-media-web-analytics-innovation-miami
Open Data
We Should be More Willing to Open Up Our Data
David Barton, Analytics Leader
Big Data has grown to a place that few would have predicted five years ago and has touched almost every industry in the world. This has come from a number of factors: an increased appreciation of the practice, society’s increasing willingness to share information, and the software needed to explore this new area being openly available.
One of the biggest success stories of Big Data has no doubt been the use of Hadoop and the maintenance of Hadoop as open source software. The fact that this incredibly powerful tool is still free and open for use by anybody has meant that companies have not had to worry about excessive software pricing, and has allowed experimentation from those with an interest in the area without having to invest heavily in the platform.
Although not the single factor that has created the current popularity, the open nature of this data platform has certainly been a catalyst for change. So with the success of this open nature in mind, we are curious about why more data isn’t made available to the public to use.
This is clearly important as this kind of work allows a group of people with an interest in the subject to explore the data far more extensively than an individual could. It is the ultimate in two heads are better than one.
The Guardian newspaper has been an open advocate of using open data to help break stories and trawl through datasets to find value. It has allowed it to publish stories buried deep within thousands of documents and correlations that have come from thousands of lines of data.
We are increasingly seeing that public data is being made freely available for this reason. Chicago recently released its first open data report outlining the information that it has released to be used by the public. As of December 2013 they had 592 available data sets held on their data portal, including anything from departmental budgets to the locations of flu shot clinics. This kind of work has increased the transparency of the city authorities whilst also allowing for true democracy, whereby the civilian population can play an active part in analysing performances and making suggestions based on numbers.
So why isn’t this being done more openly in other cities and countries? The fact is that most of the data collected around the world is collected with public money, so if the public are paying for the collection of the data, shouldn’t they have the opportunity to analyse it too? There are certainly countries that are moving ahead on this, such as the UK and US, but even here there is information that could be released to help with issues currently affecting these areas. As we have mentioned before, the success of many data projects has been down to the open nature of their sources allowing a variety of people to analyse them.
The difference between the public sector and private sector is huge. Working for the public sector is often seen as the morally upstanding position whilst those working in the private sector are more driven by money. In reality the best tend to move into the private sector as this is where they are most likely to be able to grow and also earn considerably more.
So if the best are in the private sector and this kind of data is not available to them, then the reality is that we cannot get the best analysis. This is not a slight on those who work on data projects in the public sector, but there will always be somebody better and, where demand outstrips supply, they are normally found at the company that pays them the most. Giving these people the opportunity to work with this data means that a government is creating the best opportunities to make the most of the analysis. It is also normally the people who are the best at a particular job who have the most interest in it.

A prime example of where opening up data has helped with civic issues is FloodHack, a hackathon that was created to help victims of the recent floods that hit the UK. Flood level data that had been opened up by the government allowed data experts and coders to create programmes designed to help those in flood-hit areas.

Another example of how this is being used for civic good is in France, where the government has utilised the agility of the startup model to create a new data platform. The French government had created a data repository, but its usefulness was thwarted by its search capabilities and the sheer amount of data available. They took a small team (10 people, but with only 4 actually working on the new platform) and created a new platform for allowing the flow of open data. The initial platform allowed people to see the data and use it, but it did not allow people to see how it had been used. The new platform allows people not only to download the data, but also to see how others have used it for good. For instance, Slate.fr created a map showing where tennis and football were popular, and this can now be found on the new site.
Away from civic work, open data could have a big impact on startups and entrepreneurs, increasing company revenues and driving growth. We have seen that companies that started by utilising and building on Hadoop have now had significant success. Although Hadoop is clearly a platform as opposed to data itself, the theory is the same: by using something that others also have access to in an innovative way, you can make money.

With talk of certain healthcare data being made public (clearly there are elements of this that need to remain confidential), many see this as an area for growth. With the mass adoption of universal healthcare in the US and the need to update the NHS in the UK, there is certainly merit to this. Finding ways of manipulating data and creating insight from it can have huge economic benefits for both companies and economies. Look at the way that Google and Facebook have leveraged data to become profitable. If there were a way to leverage existing data to have the same kind of impact, this could be a bonus.

Overall, if data became more open and usable for everybody, the benefits would be huge. There are clearly elements to think about in terms of data security, and there will be mistakes on this front before we get it right, but this is surely a risk worth taking for the eventual benefits. We know that data is creating opportunities all over the world, from Africa to Australia, but it is going to be down to governments and innovators to truly make it valuable.
Fraud Detection
Big Data for Fraud Detection
Simon Barton, Assistant Editor
In reference to the recent Martin Scorsese film ‘The Wolf of Wall Street’, Big Data has been described as having the ability ‘to tame the wolves on Wall Street’ and act as a barrier against crime. There is little doubt that Big Data can play this role, but at what cost to our personal privacy?

The old adage ‘if you have nothing to hide, you have nothing to fear’ is increasingly insufficient for security-conscious individuals who would rather not share any information, be it their film preferences or their bank details.

If the main aim of marketing is to identify normal behaviour, then the detection of fraud is about identifying abnormal behaviour. Unfortunately for Data Analysts, anomalies operate in a far smaller space than their normal counterparts, which can make spotting them difficult. However, holding as much information as possible on non-fraudulent actions will make spotting the fraudulent ones far easier. Once patterns have been established for both legal and illegal activity, identifying fraudulent activities through historical data becomes much more efficient.

Social networks can act as a platform for crime rings and in turn give fraudsters a chance to collude, as governments still don’t have anything like the amount of coverage they would like over the darker pockets of social media. Nicholas Mallison, former Director, Head of Forensic Data Analytics at Ernst and Young, states that identifying and stopping fraud is always a difficult task. It ultimately remains a real issue for large organisations, often due to the sophistication of the perpetrators of fraud and the complexity of the strategies they have in place.

The use of Big Data to prevent fraud has often been intrinsically linked to large financial organisations, but other industries are increasingly seeing the importance of Big Data in this area.

Online retailers lose considerable sums of money to fraud. A recent report estimated that, on average, for every $1,000,000 made in profit, $9,000 is lost to fraud. Unfortunately, this amount barely scratches the surface for online retailers who deal with a global clientele and work on a less secure platform.

Thankfully, Big Data is helping to fight this problem. In the past, online retailers could only use a subset of data and had to rely on that being an accurate reading for the rest of their customers. Now, online retailers have the ability to analyse all their data and review all of their transactions against a set of defined anti-fraud rules.
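As a rough illustration of what reviewing every transaction against a set of defined anti-fraud rules might look like, here is a minimal Python sketch. The `Transaction` fields, the rule names and the thresholds are all hypothetical and chosen only for illustration; the article does not describe any particular retailer's rules.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Transaction:
    # Hypothetical fields; a real retailer would hold far more detail.
    order_id: str
    amount: float
    billing_country: str
    shipping_country: str
    orders_last_hour: int


# Each rule is a (name, predicate) pair. Thresholds are illustrative assumptions.
RULES: list[tuple[str, Callable[[Transaction], bool]]] = [
    ("large_amount", lambda t: t.amount > 1_000),
    ("country_mismatch", lambda t: t.billing_country != t.shipping_country),
    ("rapid_repeat_orders", lambda t: t.orders_last_hour >= 5),
]


def triggered_rules(t: Transaction) -> list[str]:
    """Return the names of every rule this transaction trips."""
    return [name for name, check in RULES if check(t)]


def review_all(transactions: list[Transaction]) -> dict[str, list[str]]:
    """Check every transaction (not a sample) and keep those that trip a rule."""
    flagged = {}
    for t in transactions:
        hits = triggered_rules(t)
        if hits:
            flagged[t.order_id] = hits
    return flagged


transactions = [
    Transaction("A1", 40.0, "GB", "GB", 1),
    Transaction("A2", 2500.0, "GB", "US", 1),
    Transaction("A3", 15.0, "US", "US", 7),
]

flagged = review_all(transactions)
print(flagged)
```

Keeping the rules as plain data means a new rule can be added without touching the review loop, which suits the point made above about checking all transactions rather than a subset.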
The ability to analyse historical data is something that can also go some way to helping large organisations identify new fraud patterns. The use of visual analytics is a development that has the potential to help organisations derive important insights that enable them to pinpoint geographical areas where fraud is more rampant, allowing them to finance anti-fraud initiatives in these areas more readily.

Security analytics, in the form of Big Data, can go some way to cutting down unwanted noise and reducing false alerts. This can be achieved by supplementing current monitoring systems with contextual data and smarter analytics, with the hope that this will increase the validity of any analyses that are constructed.

What this all comes down to is that money which should be part of a company’s profits is actually being listed as a cost. Big Data allows companies to be more vigilant and more agile in responding to potential threats. The speed at which technology is advancing means that new programmes are available all the time, and Data Analysts will have to do their utmost to keep abreast of all these developing trends.

For the individual user, security is an issue that is still very much up in the air. Companies that mine personal data are keen to accentuate the advantages that come with a personalised customer experience, but ultimately, if the collection of data can be used to help prevent fraud and in turn create a fairer marketplace, its use must be looked at as a viable strategy going forward for both companies and consumers.
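The idea running through this article, that a record of normal behaviour makes abnormal behaviour easier to spot, can be sketched with a simple z-score check. This is only one of many possible techniques and is not taken from the article; the function name, the threshold of 3 standard deviations and the sample amounts are assumptions made for illustration.

```python
from statistics import mean, stdev


def flag_anomalies(history, new_amounts, threshold=3.0):
    """Flag new amounts lying more than `threshold` standard deviations
    from the mean of the historical (assumed non-fraudulent) amounts."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # No spread in the history, so a z-score cannot be computed.
        return []
    return [x for x in new_amounts if abs(x - mu) / sigma > threshold]


# Hypothetical past purchase amounts for one customer.
history = [20.0, 25.0, 22.0, 30.0, 27.0, 24.0, 26.0, 23.0]
new_amounts = [28.0, 250.0, 21.0]

print(flag_anomalies(history, new_amounts))  # prints [250.0]
```

The more legitimate history is available, the more stable the estimate of normal spending becomes, which is the reason given above for holding as much non-fraudulent data as possible.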
Internet of Things
The Internet Of Things and Data Security
Chris Towers, Big Data Leader
The Internet of Things is a term that has been thrown around like many of the overhyped phrases coined over the last 5 years. Many don’t understand what it is and even more don’t understand what it represents.

The truth is that the Internet of Things is essentially going to be the point at which the internet meets our everyday life. This is more than just seeing what you are doing when you are looking at old photos on Facebook or buying a new pair of headphones on Amazon. This new idea is going to monitor what temperature you like in a room and what your sleeping patterns are throughout the week.

For many this idea seems terrifying. The idea that companies could know this kind of information goes beyond knowing your favourite brands, and even goes beyond what you want companies to know about you. I know many people who don’t listen to certain types of music on Spotify because they think it will be embarrassing if it is shared with their peers, but with the Internet of Things, companies will technically be able to see the most intimate details of you and how you use various objects around your home.

This isn’t something that is going to happen in 5 years’ time either, or something that will affect only a small number of people. This is happening now, and estimates have put this kind of technology as being worth over $19 trillion to the economy.

These are serious numbers and also a serious effect on society as a whole. This kind of work could change the way that we interact with our surroundings and also how our surroundings interact with us.

Google has recently moved into this area, buying Nest, one of its key innovators, for $3.2 billion. That a company like Google has moved into this area is one of the reasons that people are scared of the implications. Google is a company that has made its billions through targeted advertising: not necessarily selling your personal data to companies, but using your data to put certain companies in front of you at the right time.

This kind of insight is relatively scary given the amount of targeted advertising that we currently receive purely from our actions on the internet. I researched a pair of trainers around 2 months ago and still have advertisements for them appearing on websites. If Google had access to my movements outside of the parameters of the internet, how much more would I get?

However, Google have anticipated this and, after buying DeepMind, a company specialising in machine learning algorithms, they pledged to create a Technology Ethics Board. The idea behind this board was that if its members thought that Google was going too far, they had the power to rein the company back in and make sure it was acting ethically. There is little known about the board itself, but its creation was a requirement of the sale of DeepMind. As Google’s famous motto is ‘Don’t Be Evil’, the ethics board will mean that evil can be defined (often companies who are perceived as ‘evil’ become so through a slow, mistaken process rather than any deliberate move) and prevented. If this is a board that can help police one of the world’s most important data holding companies, this approach could be adopted to limit the potential invasion of privacy that threatens to undermine the spread of the Internet of Things.

As DeepMind is primarily an AI company, this may have something to do with the power that they realise AI and the Internet of Things will have. The AI part of the Internet of Things is already widely known, as Nest has demonstrated. Nest’s primary products are thermostats and smoke/heat alarms. When the smoke alarm is alerted to rising levels of carbon monoxide it communicates with the thermostat to turn off the boiler, meaning that the gas is turned off and the chances of carbon monoxide poisoning are reduced. This is all done without the need for human interaction and shows the use of AI and the Internet of Things in safety measures.

The benefits of this kind of technology are undeniable but threaten to be undermined if there is an aspect of invasive knowledge of somebody’s movements and actions. This kind of information spread could be our generation’s metadata. My father doesn’t trust Google or Amazon because he feels that it is invasive that they can suggest things that he may like based on his previous actions, something that generation Y has taken for granted. As the current 20-40 age bracket needs to be the early adopters in the bell curve for the Internet of Things, this data protection aspect needs to be established early and thoroughly. Especially after the invasion of privacy that has come from the NSA revelations, an effective data governance precedent needs to be set and adhered to before many would be willing to adopt.
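The smoke alarm and thermostat interaction described in this article can be sketched as two objects that talk to each other without human involvement. This is a toy model only: the class names, the method names and the 70 ppm threshold are hypothetical and do not represent Nest’s actual software or API.

```python
from dataclasses import dataclass

# Hypothetical alarm threshold in parts per million; not a Nest specification.
CO_ALARM_PPM = 70


@dataclass
class Thermostat:
    boiler_on: bool = True

    def shut_off_boiler(self) -> None:
        # In a real device this would cut the call for heat to the boiler.
        self.boiler_on = False


class SmokeCoAlarm:
    def __init__(self, thermostat: Thermostat) -> None:
        self.thermostat = thermostat
        self.alarm_active = False

    def report_co_level(self, ppm: float) -> None:
        """Handle a new carbon monoxide reading from the sensor."""
        if ppm >= CO_ALARM_PPM:
            self.alarm_active = True
            # Device-to-device action: no person has to intervene.
            self.thermostat.shut_off_boiler()


thermostat = Thermostat()
alarm = SmokeCoAlarm(thermostat)

alarm.report_co_level(5)      # normal reading, boiler stays on
print(thermostat.boiler_on)   # True
alarm.report_co_level(120)    # dangerous reading, boiler is shut off
print(thermostat.boiler_on)   # False
```

The same readings that make this safety feature possible are also the kind of household data whose collection raises the privacy questions discussed above.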
Business Analytics Innovation Summit
“Gain Insights, Drive Business Planning”
May 21 & 22, 2014, Chicago
For more information contact Jordan Dunne +1 (415) 992 7918 jdunne@theiegroup.com theinnovationenterprise.com/summits
Big Data Industry Pioneers On-Demand “Highly focused Expert Content & Practical Solutions for your Big Data Requirements”
Email dwatts@theiegroup.com for more information