Big Data Innovation, Issue 5

DATA LAUNCH PAD

We look at how pricing is allowing data to launch in smaller companies


Letter From The Editor

Welcome to this issue of Big Data Innovation.

We appreciate all of the emails since the last issue showing your appreciation for the magazine and the kind of content that we are producing.

In this issue we discuss everything from the state of big data education to the recent Gartner report and its effect on the industry.

This month we have also seen the NSA and GCHQ data scandal hitting the headlines again. Senior members of both the US and UK governments have come out to express their disappointment at how information has been collected. This could well be a wake-up call that the rate of technology development could have negative results in the long term if it is not correctly policed.

Companies have a role to play in this, making sure that the ways in which their data is collected are both ethical and transparent. People are increasingly worried about how their data is being collected and used, and transparency is the way to alleviate these fears.

This may not always be possible, but if there is a backlash against data collection then it will affect not only the ways in which companies use their data, but also how society views big data in general. It is up to those using these technologies to make sure that their data collection is ethical and that the individuals whose information is held are aware of the benefits.

We hope that through publications like this, best practices can be shared and we can make sure that big data grows into the game changer that we all know it can be.

As always, if you are interested in advertising or writing for the magazine, contact me at ghill@theiegroup.com

George Hill, Chief Editor

Managing Editor: George Hill
Assistant Editors: Joanna Giddings, Chris Towers
President: Josie King
Art Director: Gavin Bailey
Advertising: Hannah Sturgess, hsturgess@theiegroup.com
Contributors: David Barton, Chris Towers, Tom Deutsch, Heather James, Claire Walmsley
General Enquiries: ghill@theiegroup.com


Contents

David Barton looks at how Pamela Peele, Kirk Borne and Gregory Shapiro view big data education
Chris Towers looks at how companies are bridging the big data skills gap
We look at how pricing is allowing big data to launch in smaller and smaller businesses
Heather James interviews Stephen Wolfram, the mind behind Wolfram Alpha and the Mathematica language
Claire Walmsley talks about how making data’s usage more transparent will help the industry as a whole
Tom Deutsch discusses the importance of baking data into your products


Big Data Education

David Barton, Big Data Leader

In the famous words of Tony Blair when setting out his priorities for running the UK: “Education, education, education”. With increasing numbers of companies now looking at implementing big data, one of the most important factors in making this run smoothly is big data education and the effective use of skilled labour.

Due to the complexities involved in the education process and the incredible speed at which the industry is moving, there have been question marks around how effective this education currently is. I spoke to three of the industry's leading big data experts about their thoughts on the current state of big data education and how it could be improved.


Kirk Borne
Professor of Astrophysics & Computational Science, George Mason University

Kirk's view is that there are two perspectives that need to be looked at in order to assess current big data education initiatives effectively.

'The phrase that I use with people is that it's an education in data as well as data in education'

Data in Education: Kirk believes that data should be included heavily in education from a young age because, regardless of your future profession, it will be used in one way or another. This can even be done at kindergarten level: sorting toys by colour, type, size or shape is a form of data siloing. Using this kind of technique early, so that children can identify and explain why certain things are in certain areas, forms a strong foundation on which to build more complex ideas.

Education in Data: This early grounding at school will also allow the education in data aspect to be more thorough and successful. Many lecturers currently find that people come into higher data education with a gap in understanding, with some teachers saying that students don't know what 'data' is. Teaching people these aspects of data throughout their lives will be vital to improving education and closing the skills gap.

Many companies looking to data for business solutions want to find an all-encompassing data scientist. Kirk believes that this is not always necessary. A business team is like any other team: you have different people in it to do different jobs. Companies can avoid searching for the complete-package data scientist by building their data team along the same lines.

Sure, there are 'all star data scientists' around, the ones who know the algorithms as well as the business, sales, strategy and finance, and can run almost as a department in themselves, but they are like all stars everywhere else. Rare.


Gregory Piatetsky-Shapiro
Analytics/Data Mining Expert, Editor, KDnuggets

"There will be a shortage of talent necessary for organizations to take advantage of big data. By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions."

The report predicts that this kind of skills gap will exist by 2018, but Gregory believes that we are already seeing it. Using Indeed.com to look at what expertise companies are looking for, Gregory found that both MongoDB and Hadoop appear in the top 10 job trends.

"Big Data is actually rising faster than any of them. This indicates that demand for Big Data skills exceeds the supply. My experience with KDnuggets jobs board confirms it - many companies are finding it hard to get enough candidates."

There are people responding to this, however, with many universities and colleges recognising not only the shortages, but also the desire from people to learn.

Companies looking to expand their data teams are also looking at both internal and external training. For instance, companies such as EMC and IBM are training their data scientists internally. Not only does this mean that they know they are getting a high quality of training, but also that the data scientists they employ are being educated in 'their ways'.


Pamela Bonifay Peele
Chief Analytics Officer, UPMC

Whilst other industries such as finance, insurance and retail are also feeling the pinch in terms of the numbers of qualified and experienced data scientists, healthcare has been hit even harder.

The reason for this, according to Pamela, is "in healthcare whilst it is somewhat transactional delivering services, the service isn't exactly the same because the consumption and action of service varies by patient so it's much harder to deal with health data than transaction pieces which are claims or transaction data."

Of course, the only real way around this is through the ways in which we are educating graduates. Pamela believes that at PhD level the graduates coming through the system are good, but that at bachelor degree level there could be some improvement.

However, this may be changing, as in the US especially we are seeing universities making investments in their big data, analytics and statistics courses. This will hopefully improve the quality of their statistics graduates at bachelor degree level.

One of the ways in which healthcare companies have tried in the past to make up for the dearth of healthcare-centric analytical talent is transformation. This is done by adapting either technical thinkers to healthcare or healthcare thinkers to become more technical. The issue that this creates is a bias towards one side of a role that should be balanced.


Bridging the Big Data Skills Gap

Chris Towers, Assistant Editor

The big data buzz over the past two years has created a thirst for technical skills amongst thousands of companies. The success of early adopters such as Google and Facebook, whose primary income drivers derive from big data, has caused business leaders to sit up and take notice.

The demand has pushed up the potential salaries of those currently working in the area, whilst also creating a gap in the supply of data skills required across the wider business community.

So with companies now looking at big data implementation, how can they bridge the skills gap without compromising the quality of their analysis?

Building a team

When talking to those in the industry, the question I often pose is: what skills are needed to really succeed in data science? In addition to technical knowledge around Hadoop, stacks, SQL and other technologies, most say business understanding.

The idea behind this is that a data scientist should be more than just a data user. They should be a true catalyst for business change, with the business knowledge to implement these new ideas across the corporation.

Although this may be ideal, finding somebody like this is almost impossible. All people have strengths and weaknesses, and data scientists are no different. They may be brilliant at data mining and analysis but weaker at communicating findings.

Therefore companies should be looking to create a data science team. By identifying what you need to use the data for and who the likely recipients will be, those with the necessary individual skills can be brought in to create functional teams. I know of companies who employ journalists within their data team to help communicate findings and business consultants to streamline integrations. Data science teams need to be viewed like factories, where getting the end product requires several different people putting it together.

Changing places

There is a crossover between many current roles that exist in organisations and the skills needed to become an effective data scientist. The prime example of this would be web developers.

Although CSS and HTML seem like relatively basic coding languages, in reality the crossover between these and the manipulation of data is strong. The creation of stacks is essentially code manipulation, something that web developers do within their current roles. Because of this, with some additional training they could technically start a data science programme.

By plucking these people from their current roles and training them through external companies such as Cloudera, there is likely to be not only the technological understanding but also a wider business knowledge.

External companies

Sometimes companies over-invest in aspects of their business which in reality can be outsourced. The same is true of data management and analysis.

Of course, outsourcing is not always possible, due to confidentiality and legal issues surrounding some data. With the majority of data, however, the work can simply be outsourced to a company who already have the experts. Letting another company do the leg work for your data makes perfect sense.

Having experts working on your data who aren't on your payroll also means that you do not need to try to find a qualified candidate.

As a prime example of the difficulties around this, Kirk Borne, one of the early pioneers of modern big data, says that the roles have reversed in the past 10 years. A decade ago there would be one job for one hundred relevant graduates; today there are one hundred jobs for one graduate.

Avoiding the time and money spent on recruiting in such a competitive market allows these to be reinvested in implementing the findings from the outsourced data.

The skills gap is something that everybody in the industry is well aware of, and until we have the number of graduates to match the number of jobs, it will be an issue. The gap might grow or shrink, but at the moment companies need to find ways to avoid falling into it.


How Pricing is Allowing Big Data to Launch

George Hill, Chief Editor

A recently released Gartner survey claimed that 65% of companies are undergoing some kind of big data initiative in 2013. This is an impressive number considering that three years ago big data was almost unheard of outside of the analytics community.

Given that there is now a large number of companies using or considering big data, why are they adopting it now? Is it the hype? Is it the increased number of candidates with the correct skills?

The simple answer is the price.

Only a few years ago, the technology needed to properly analyse a terabyte of data was very expensive. The processing speed needed within a system would have cost hundreds of thousands of dollars, making it unaffordable for all but the largest companies. In 2013 this figure has dropped as low as $25,000 for the same technology.

Complex and potentially volatile databases can now be run through in-memory processes as a result of this price drop. This is the process in which a database is held constantly in RAM as opposed to being stored on a hard drive. The difference between these two forms of database is astounding, as several steps within the loading and processing are skipped. This means that data held in memory can be utilised as much as 450 times faster than data held in a traditional database.

The use of cheaper systems to implement this powerful analytics technique has meant that even startups can realistically run big data programmes.

Alongside dramatically falling technology prices, we have also seen increased use of free and open source software. The community aspect of programmes like Hadoop has not only meant that thousands of people can help to improve the product daily, but also means that it can stay free.

Everybody is using Hadoop or similar systems, not necessarily because it is free, but because it is one of the best. The ability to have some of the top-performing software available at no cost, combined with the cheap technology that is now available, has made big data programmes feasible for far more companies.

When we look at these products, however, we are not factoring in one crucial component: in order to utilise a big data system effectively, analysts and data scientists are needed.

In reality, having good software and good hardware can get a company to a certain point, but to create truly actionable initiatives from their data, companies need to be able to drill down and notice patterns. This is something that only a person who is educated and experienced would really be able to do.

So how are companies getting around this?

Through the cloud.

The reduction in price means that it isn’t only large companies who can create their own big data programmes. Start-ups can now be formed and realistically create these kinds of technologies themselves in order to service others.

With skilled and entrepreneurial people having the ability to use these technologies, combined with the ability to use the cloud for data transfer, these skills can be truly shared. This is why companies such as Qubole can be used not only to work through data analysis, but to do so with better technology and a faster turn-around.

Bandwidth is widening each year, and the decreasing price of super fast broadband has meant that outsourcing big data initiatives is often now the cheaper option, despite the decrease in price for doing it in house.

This has created a situation where companies have the ability to bridge the big data skills gap without the pressure to bring a full time analyst on to the payroll. Due to the decrease in price and the ability for smaller companies to start using these technologies, companies can now go to outsourced and qualified data scientists.

With the big data skills gap potentially throwing spanners in the works for many companies, this fast technology combined with super fast broadband will allow data analysis to be outsourced, truly creating an environment where big data companies can exist for the sake of big data.

This kind of change within the big data system is huge and has the potential to revolutionise the way that companies use their big data programmes.

Suddenly big data can be outsourced to truly qualified and driven professionals, which will see the exponential growth curve continue.

This decrease in technology price will have a profound effect on the industry and, with prices looking to decrease even further in the next few years, could spark even more change.



Big Data Innovation with Stephen Wolfram

Heather James, Big Data Innovation Summit Curator

At the Big Data Innovation Summit in Boston in September 2013, Stephen Wolfram took the stage to deliver a presentation that many have described as the best amongst the hundreds that took place over the two-day event. Discussing his use of data and the way that his Wolfram Alpha programme and Mathematica language are changing the ways that machines utilise data, he had the audience enthralled.

I had initially arranged to sit down with Stephen immediately following his presentation, but I was forced to wait for several hours due to the crowds surrounding him as soon as he finished. The 20 people surrounding him for an hour after his presentation were testament to Stephen's achievements in the past 25 years. When I spoke to others around the conference, the most common adjective was 'brilliant'.

During the afternoon I did manage to sit down with Stephen. What I found was a down to earth, eloquent man with a genuine passion for data and the way that we are using it as a society.

Stephen is the CEO and founder of Wolfram Alpha, a computational knowledge engine designed to answer questions using data rather than suggesting results like a traditional search engine such as Google or Bing.

Wolfram Alpha is the product of Stephen's ultimate goal: to make all knowledge computational, turning natural language questions into data driven answers.

He describes it as 'A major democratisation of access to knowledge', allowing people the opportunity to answer questions that previously would have required a significant amount of data and expert knowledge. According to Stephen the product is already being used everywhere from education to big business; it is a product on the up.

Many will claim that they have never used the system, but anybody who has asked Apple's Siri system on the iPhone a question will have unwittingly experienced it. Along with Bing and Google, Wolfram Alpha powers the Siri platform, enabling users to ask questions in standard language and translating these into data driven answers.

What Wolfram Alpha is doing differently to everybody else at the moment is taking publicly held knowledge and using it to answer questions rather than simply showing people how to find the information. It allows users to draw on the information that others have found in order to reach interesting and deep answers.

Stephen has a real passion for data, not only through Wolfram Alpha and his mission to make knowledge computable, but also on a personal level. He is the person who holds the record for the most data held about himself. He has been measuring this for the past 25 years and he can see the practice becoming more and more popular in wider society, with wearable personal measurement technologies becoming increasingly common.

This change in the mindset of society as a whole to a more data driven and accepting society is what Stephen believes to be the key component in Wolfram Alpha now becoming what it is. Stephen says that he always knew that there would come a time when society had created enough data to make Wolfram Alpha viable, and that time is now.

It is testament to how far we have come as an industry that we can now power something like Wolfram Alpha through the amount of data that we have recorded. It is a real milestone in the development of a data driven society.

The reason for this, according to Stephen, is that many of the key data sources haven't been around for a long time; things like social media and machine data have allowed this shift to occur. He only sees this trend continuing, with increasing numbers of machine driven sensors collecting data.

With the use of data at Wolfram Alpha now hitting an all time high, I was curious about where Stephen thought big data would be in five years' time. He believes that the upward curve will only continue: personal analytics will become part of a daily routine, and this will only see the amount of data increase.

He also sees the use of science and mechanics having a profound effect on the ways in which companies utilise their data. We will see analysis looking at more than just numbers, giving these numbers meaning through scientific principles.

Overall, what I have learnt from talking to Stephen is that data is the future in more than just a business context. Software that allows people to mine data without realising they are even doing it will be important to the development of how we use information.

Wolfram Alpha is changing the data landscape and with the passion and genius of Stephen Wolfram behind it, who knows how far it could go.



Data Transparency

Claire Walmsley, Big Data Expert

Recently companies have gained a bad reputation for how they hold individuals' information. There have been countless data leaks, hackers exposing personal details and exploitation of individual data for criminal activities.

The world's press has had its attention drawn towards data protection and individual data collection through the NSA and GCHQ spying scandal. Society in general is becoming more aware of the power that its data holds, and this, combined with the increased media attention, has led to consumers becoming more data savvy.

Companies like Facebook and Google have made billions of dollars through their efficient use of data and are now looked at warily by many. Although major data secrecy violations are yet to occur at either organisation, the reality is that people know that data is held about them and need to trust the company that is keeping it.

So how can companies become more trustworthy with their customer data?

One of the keys to success within a customer base is trust, and the best way to gain this is through transparency. Allowing people to see what kind of information any particular company holds on them creates trust. Outlining exactly what is held on people will create an understanding of what the information is used for.

A sure fire way to lose trust is the 'if you don’t ask you don’t get' approach to data collection visibility. This is the idea that in complex or overly long agreements the data protection aspects are available, but not explicitly stated. In reality this is what has happened in several cases, with information management details buried in the small print, so that although technically accessible they are not effectively communicated.

The best way to get around this is to make it clear: send an email, or have a separate section or even a blog outlining how data is being used and why. It is very seldom that people are having their data used in manipulative or sinister ways, and making them aware of how their data is improving their experiences will make an audience far more receptive to it being used.

At the moment there are ways that you can check on certain elements of how your data is being used. Using a Google account you can see what Google has matched to you here: google.com/dashboard/

This allows you to see who Google presumes you are based on your browsing history and what ads are therefore targeted towards you. It is often interesting to see what your actions online say about you.

This detail is a move in the right direction for companies, but it still leaves an enigmatic feeling that there isn't total transparency.

With the pressures of data protection surrounding most companies today, a move to full transparency would allay many of the fears that consumers currently have when their data integrity is in question. What the industry needs today is consumer trust, and transparency is one of the key components in achieving this.



Baking Data into Your Core Product

Tom Deutsch, Big Data Solution Architecture, IBM

There was a good article on Gigaom recently that I thought deserved some additional attention here. The article was focused on building data science into your product offerings rather than trying to “bolt on” the data science aspects afterwards. The key takeaway from the article was this:

“For startups, data science should not be seen as a separate scientific initiative but as an integrated part of the product. Speed and efficiency are key factors to burgeoning companies; hiring and building out a team of data scientists, or more aptly named “data product engineers,” is paramount. Once you accept that data science is about building data products, you will see that your data engineers, contrary to popular belief, do not need PhDs. Instead, they need to be able to integrate into the core of your product and engineering organization.”

For those that know me from my monthly column at IBM Data Magazine it probably won’t be shocking that I am going to argue that this notion isn’t only valid for startups. In fact I find the notion that the advice (which to be clear is good advice) is somehow tied to startups pretty goofy. Baking analytics into all your products should be something all firms do, full stop.

So how do you actually do that? Well, it starts with rethinking what the product actually is.

The tendency for most firms is to think of a product as a fixed thing where tailoring to a user is done in segments and only on the edges of the product. Think of your typical web page; 90% of it is completely standard and the parts that change are largely generic. That is legacy thinking and is not going to keep your customers engaged.

Instead think of the product as a variable thing driven by user interactions, designed to change and to flex in real time based on the analytics and data science that is built into the product. The user needs to shape the experience and the content of the product as they interact with it; it needs to be contextual, relevant and unique to the person doing the interacting.

Now some of you may be wondering at this point how a product can do that, and of course it can’t unless you extend the notion of the product to include the underlying platforms that support it. This is a key point: historically we’ve built products and they simply ran on top of a platform. Going forward the platform capabilities are a core part of the product, and that means exploiting a ‘Fit For Purpose’ approach to architecture so that from day one you are thinking about how the right dynamic experience is built from the ground up.

This approach will surface data and analytics needs that can run in Customer Time. It has a notion of a closed loop analytics process where interactions are recorded, the experience is tweaked, interactions are recorded again, and rinse, wash, repeat. This approach will build experimentation and A/B testing into the core design.

We’ll pick this up in more detail in a future post. Until then, thanks for the ideas and comments.

