Big Data Innovation, Issue 14


Issue 14 | theinnovationenterprise.com

BIG DATA JOBS SPECIAL

2015 is likely to be the year that many companies make their first move into Big Data. Inside we have a full section of advice for both Data Scientists and the companies looking to hire them.


Statistics Are Dead, Long Live Big Data Lee Baker tells us why he thinks that statistics are no more.

Big Data In Startups Big Data is not just for big business, but how do startups begin in data?


LETTER FROM THE EDITOR

Welcome to the first issue of 2015.

As we begin another year with Big Data, we are again likely to see an increase in the number of companies looking to implement new data programmes. This requires personnel and expertise, something that has been lacking within the industry since its inception.

However, we are seeing an increasing number of Data Scientists who have been able to find vacancies. This has meant that companies who have never hired for a data science position are interviewing Data Scientists who have never had a data science job.

To help with this, we have brought you a section dedicated to finding a job in data science and to finding the right candidate for any vacancies you may have. This includes articles on how Big Data is creating jobs, how to interview a Data Scientist, the 10 questions to ask in the interview and much more.

In addition to this, Lee Baker asks if statistics are dead and, if they are, whether data science has replaced them.

We also look at how to begin a data programme at a startup, whether you are using Hadoop properly and if Hadoop is better on-site or in the cloud.

Managing Editor: George Hill

As always, if you are interested in contributing or have any feedback on the magazine, please contact me at ghill@theiegroup.com.

Assistant Editors Simon Barton

George Hill Managing Editor

Designer: Oliver Godwin-Brown

Are you looking to put your products in front of key decision makers? Contact Giles Godwin-Brown at ggb@theiegroup.com for more details.

Art Director: Joe Sanderson

Contributors: Chris Pearson, Josie King, Gabrielle Morse, Chris Towers, Heather James, David Barton, Lee Baker

General Enquiries: ghill@theiegroup.com


2015

CONTENTS

10 QUESTIONS TO ASK A DATA SCIENTIST What are the top 10 questions to ask a Data Scientist? Find out which examples we believe to be the best.

HOW BIG DATA IS CREATING JOBS Many think that technology is taking jobs away, but it turns out Big Data is creating them. But how?

HOW TO INTERVIEW A DATA SCIENTIST Chris Pearson talks us through how you should approach your first interview with a data scientist.

BECOMING THE BEST DATA SCIENTIST YOU CAN BE Becoming a Data Scientist is a relatively simple path, but how do you become the best?

ARE YOU USING HADOOP PROPERLY? Hadoop is often not used to its full potential. Are you making the most of it?

BIG DATA IN STARTUPS Big Data is not just for big business, but what should startups be doing in order to implement it?

STATISTICS ARE DEAD. LONG LIVE BIG DATA Has Data Science killed statistics? Lee Baker gives us his perspective.


BIG DATA JOBS SPECIAL


10 QUESTIONS TO ASK A DATA SCIENTIST

Josie King Managing Director Innovation Enterprise



Many companies are going to use 2015 as the year to launch their data science programmes. Many more are moving beyond the one-person team that they may have had up until this point. This will often require people who are inexperienced or completely new to Data Science to interview for roles that require in-depth knowledge. This can be difficult for both the interviewer and the potential employee. We have thought about what needs to be asked in an interview and how to identify strong candidates. Below we have earmarked the 10 most important questions to ask, and why:

1. What do you do outside of work?

2. What hobbies do you have?

3. What do you think is the best way to help a work-mate?

The key to hiring the right person is not just about what skills they have and how they can use them at your company, but about how they will fit into your culture and how they will work with others already there. This could be anything from having a sense of humour to being able to accept direction and give help when needed.

4. What resources do you use to keep up with the latest data trends?

5. Which current trends are going to have a lasting impact on how we use data?

Another key aspect to investigate through these questions is how proactive candidates are in learning about new trends within the industry and how they will adapt to the fast-paced evolution of data science and the data landscape as a whole. Judging the answers requires the interviewer to have some knowledge of the potential responses. These are constantly changing, and the best resource for one particular aspect of a role may be less useful for another. Knowing the potential answers in advance, and knowing which are the most relevant to the company, is what will help identify the best candidate.



6. In a perfect world, what would be the ideal data ecosystem for you to use?

7. Do you think that the best results can be attained from Hadoop alone?

8. When does siloing data make sense?

There also need to be some technical questions that are not simply ‘do you know how to use a certain technology?’. These questions allow the candidate to elaborate on the skills they have and the breadth of their knowledge around the technological aspects of the role. They are also important for the company that is recruiting, as it can review whether the specifications being set out are achievable and whether they fit with what is already at the company. It is also important to discuss how data is being stored and accessed, as this will undoubtedly be the main activity for most Data Scientists.

9. You find that company data has been accidentally leaked online. If you report it, it will hurt the company, but you can easily cover up the missing data and nobody will know. Which do you do, and why?

10. From the research you have done on the company, what do you think your work would contribute towards?

These questions have two purposes: testing integrity and testing wider business knowledge. Part of being a Data Scientist is always going to be maintaining the integrity of the data you hold and taking responsibility for its safety. Posing a question that raises a moral issue as well as a wider business impact forces candidates to consider it from two perspectives. They need to say more than just ‘it is the right thing to report it’; question 9 should also make them respond to the wider business implications, on reputation etc., that lost data can create.

In addition to that, question 10 looks at how much preparation they have done for the interview and also their overall business acumen. This should give a strong indication of both work ethic and how they would approach the role.

These questions should be used as a guide and create the basis of any data science interview. Each company will be looking for a different type of person who will work best within their team. Due to this, those interviewing should prepare what they want to hear in advance, to make sure that they are looking out for the best answers.


HOW BIG DATA IS CREATING JOBS Heather James Big Data Evangelist



Many claim that increased use of technology decreases the number of available jobs, as automation means that people are no longer needed for labour intensive roles. What we are finding now, though, is that Big Data is actually creating jobs around the world. So how is this the case? The number of roles directly relating to Big Data is predicted to be around 4.4 million across the world. This could be anything from basic server upkeep to data science innovators, but the numbers reflect the spread of data across almost every country in the world. As more companies look at their data efforts, this number is likely to increase and we are going to see significant growth on these already impressive numbers. The question is, what will be the implication of data in 2015?


2015 is likely to see a marked increase in the number of people involved within Big Data directly. This is because companies have realized that a single person department for data simply doesn’t work unless you have a data superstar. This means that different personnel are needed in order to make the most of the data that’s available and communicate its importance across the entire company. This will mean that less technical employees will be brought into the teams to supplement and maximize how the data is being used. This is likely to be graphic designers for visualizations, marketing professionals for communication and in some cases even journalists to help communicate findings in a relatable way to other members of the company.

These roles (non-technical in terms of actual data processing) are potentially only the tip of the iceberg in terms of hires that are due to data. Companies who are utilizing new data ideas have been shown to outperform and out-grow their competitors, meaning that more jobs will become available in all areas of these companies. This could go from extra cleaners needed to service the larger offices, to additional members of the board as more departments require representation.

Even at larger companies that are unlikely to experience considerable growth, the reality is that through the use of data and the monitoring of particular departments, the delegation of labour will mean that more people will be brought in. Data will be able to predict growth areas in the business or where labour may be needed in the future, even if overall growth in personnel may not be significant.

Unlike many technical innovations in the past 50 years, rather than automating roles and reducing the opportunities for people looking for jobs, Big Data may well create rather than destroy possibilities. It is a strong indication of the power that data will have moving forward. Let’s hope that it can live up to expectations.




HOW TO INTERVIEW A DATA SCIENTIST Chris Pearson Partner and Co-Founder, Big Cloud



You know the feeling. You’re sat across the table from an interviewee and the conversation starts to run away from you. Your eyes start scanning the room as they’re reeling off another jargon-crammed answer. You then notice the scratch on your hand that your 4-year-old gave you, in exchange for closing the laptop during that far too familiar One Direction song. The only things you’re not noticing are the answers the other person in the room has been giving you. If this typically happens to you most when you’re conducting technical interviews, it could be either that the person you’re interviewing is talking in a language not known to you, or that you simply don’t have the right questions to ask that cut through the techno-babble and get to the point of why they’re sat there in front of you.

The polar ice caps of the ‘Data Scientist Age’ have well and truly melted, so if you are one of those countless companies who are currently looking for one, there’s a chance you’re going to have one of these interviews.


The subject of this article probably won’t offer any value to those of you reading who know your way around a Random Forest or a Neural Network, but it’ll hopefully give a few pointers to those of you who don’t. As I mentioned earlier, the age of the Data Scientist is well and truly here. This means that there are highly talented geniuses in our population who, through the development of an algorithm and the implementation of some code, can change the landscape of an entire organisation. Unfortunately, it also means that there are lots of people out there who want you to think that they are one of those geniuses too, making it difficult to tell the difference. It’s therefore important that you have the skills to understand who the real Data Scientists are. Having spent the last year interviewing a large number of Data Scientists, I’ve developed a simple set of questions that help me to understand the what, the why and the how of what they do.

The What

At the very start of the interview, I normally like to get a general feel for who I’m speaking with.

‘What projects are you most proud of?’
‘What contributions have you made to the businesses you’ve worked for?’
‘Can you describe in detail the responsibilities you’re looking for in your next position?’

When interviewing someone involved in post-doc/PhD projects, I’ll always look to get an understanding of the projects they’ve encountered there. However, it’s really important to understand the time constraints they’ve been under, as typically someone solving problems in academia may be given far more time than someone in the commercial world.

The Why

The best Data Scientists I’ve worked with have all had substantial expertise in a particular domain, or at the very least, will understand what the business impact is of the work they are doing. How can you solve the problems the business is facing if you don’t first understand the business itself? The same sometimes goes for domain too. However, it’s very important to note that a good Data Scientist will always be good at problem solving, no matter what domain they’re working in, so don’t get too hung up if the person you’re interviewing doesn’t work in your exact industry. I’ll typically ask questions like:

‘What were the business outcomes of the projects you worked on?’
‘Give me an example of when you’ve thought about the business’s product.’
‘Tell me about a time when you’ve improved a business process.’

The How

This is the part of the conversation where I’ll begin to understand whether or not the person I’m speaking with is the person my client is looking for. It’s very important to note that there are lots of people with the title ‘Data Scientist’ who can’t write Machine Learning algorithms, can’t code or both. What you may find is that some Data Scientists are using Machine Learning libraries that have been written by other people in the team, filled with algorithms they use like tracing paper.

‘Can you give me an example of when you’ve written a unique algorithm?’
‘Can you give me an example of when you’ve developed an algorithm from a framework/research paper?’
‘Can you give me an example of when you’ve written/implemented your own code?’

These are questions that help you understand if you’re speaking to someone who can create things from scratch, as these are typically the ‘A Players’ you’ll want to hire.

It’s worth pointing out that if you find someone who has nailed all of the above questions and you have that gut feel that they may do wonders for your business, please don’t get too precious about culture, team fit, etc. Don’t get me wrong, these things are important, but people like this can be incredibly hard to find.


BECOMING THE BEST DATA SCIENTIST YOU CAN BE Gabrielle Morse International Events Director



Data Scientists have the sexiest job in the world according to Harvard Business Review.

We have seen a significant shortage of well qualified Data Scientists, which has meant that those within the industry are demanding salaries that most could only dream of.

With this in mind, there is a clamour to become a Data Scientist from those who want to have these big salaries and work for the best companies. So how do you become a Data Scientist? What does it take and how do you start?

The road to being a Data Scientist is relatively simple in terms of qualifications. Several universities are offering courses with data science skills as the core component. These vary in what is required to enter, but as a rule of thumb, a good maths qualification, some coding skills and an interest in statistics are the minimum requirements. From there, many companies are looking for Data Scientists at the moment and moving into a junior position on a team would be relatively easy.

But at this point, when you are a Data Scientist, how do you become the best? We believe there are some key traits that the best Data Scientists need to have:

Curiosity

The first thing that is needed is an inquisitive mindset. This will not only allow you to find hidden trends and patterns, but also mean that you are more likely to experiment outside of the confines of a degree or course.

Agility

The next aspect is agility and flexibility. Although this is not a vital aspect to become a Data Scientist initially, in order to become the best or just keep up, you need to be able to adapt to new scenarios. New kinds of data are being mined all the time and new technologies are being created to speed up and improve data processing. This means that if you are not staying abreast of all the latest developments within the data science space, you will be left behind.

One of the ways that people are staying updated on these developments is through helping in the development of these new technologies. Looking at open source software like Hadoop, becoming involved in its development has significant benefits not only in developing a knowledge of new products, but also in communicating changes to others in the company.

Patience

As well as being agile and flexible, it is important for Data Scientists to be patient and have the ability to wait before trying to identify trends and correlations. Often the best results come from larger datasets, which take time to create. This means that there needs to be patience, and conclusions should not be jumped to on incomplete datasets.

This is especially important if you are starting a new data programme at a company. It will take time to gain buy-in from certain elements of any company, so it is important that Data Scientists are patient with them and not reactive. It requires a certain degree of democracy, but ultimately results will bring round those who may be against it.

Analytical Mindset

This seems obvious, but goes well beyond the analytical aspects that are required as the base needs for a Data Scientist.

There needs to be constant analysis of how things could be done better, how data could be better stored and how the company could perform better with it. It is about going beyond looking through customer or sensor data to identify trends, and instead looking at aspects of the company that may be affected by data and how these could be improved.

Perspective

Having a perspective wider than just the dataset you have in front of you is vital to become successful as a Data Scientist. The ability to take what you know and apply it to business problems can only exist if there is a perspective of what the business wants and needs.

This could be anything from noticing that more people from Canada are looking at a certain product and therefore focussing advertising there, to noticing that more people are clicking on a certain colored button. These kinds of findings are not useful in isolation and may well be missed by other departments, therefore perspective of how the company as a whole could utilize your findings is vital to success.




ARE YOU USING HADOOP PROPERLY? David Barton Head Of Analytics Innovation Enterprise



Although 2014 saw the buzz around Hadoop grow even further, companies need to make sure they are not simply keeping up with the neighbours, but actually utilizing Hadoop to take full advantage of its capabilities.

We have seen that companies have not been scared to deploy Hadoop programmes, but this often simply reflects a willingness to achieve cost savings on the processes they are already implementing. This means that the full potential that Hadoop possesses is not realized.

A focus on looking at the issues that they are currently facing, and solving them through old means, can lead to people becoming frustrated with the promise of Hadoop not being fulfilled. For instance, if you were to run a simple visualization through Hadoop, it would make little difference to the outcome. This would make employees question why the changeover was merited when the older system could do the same work.

To help, make sure that the benefits of the new systems are fully and effectively communicated to key stakeholders. This may seem like a minor point, but it is something that needs to be maintained throughout the process. Even if it is explained at the start, doubt will creep in during the hardest parts of the implementation and must be dealt with at source. This can only ever be done through talking to the people involved, answering their questions and frequently demonstrating benefits.

The true power of Hadoop is in its ability to undertake tasks that are far more complicated than legacy systems are capable of. This includes advanced modelling, machine learning and data mining beyond the capacities that many systems have had before. It has the potential to truly revolutionize the way companies look at, and interact with, data.



Hadoop creates systems where it becomes possible to be proactive with data, rather than simply displaying the data you have. This helps you to find new patterns and trends that may have been invisible to you through your original data processing methods. This transformation is about more than simply implementing Hadoop, though. It requires effective training and the correct people to make sure that it is being optimized and fully understood. If this is not the case, it will again have limited success, meaning that the overall value to the business compared to existing systems would hinder its future progress.

So as we move further into 2015, there are two key aspects to look at when implementing Hadoop:

• Make sure the processes that you are using Hadoop for are truly making the most of what it can do. Are you truly leveraging the power of Hadoop or are you just putting what you have always done on it? This is the key question to ask, and if the answer is that you are doing the same, then you need to look at why you have spent the time, money and effort to implement Hadoop. It is then important to look at what Hadoop can do and aim your programmes to achieve it. Having basic programmes and processes running through Hadoop is like having a Ferrari but only driving it at 20mph.

• Have good people in charge of it, to give you the best chance of utilizing it effectively. Hadoop is only as good as the person using it. Having a Data Scientist who can maximize the potential that Hadoop has should be the absolute minimum that a company should be aiming for. A thorough knowledge of how to use Hadoop is the only way that you could look to make Hadoop a true game changer within any company.


BIG DATA IN STARTUPS -How Do You Start? Chris Towers Big Data Divisional Head Innovation Enterprise



Does Big Data only work for big companies? This is a question that many have asked in the past and one that is clearly important today. The startup scene is bigger than ever and with the influx of money and talent into new companies, it is only likely to continue. In terms of utilizing Big Data within this scene, it is less clear cut. Is it worthwhile for a company with only 10 people to invest in a new data system?

Many companies believe so. It is clear from hundreds of examples that working with your data to create actionable insights is the ultimate goal of any data programme, so what should startups do in order to begin?

Start Early

Starting a Data Programme does not begin when you create your first algorithm or pull your first report; it begins in the way that data is being collected. Collecting as much data as possible is always key, but the way that it is collected and stored is equally important. A well maintained database is important for having strong actionable insights from your data. This means making sure that the correct fields are present, accurately input and correctly categorized. This begins almost before any data collection has occurred and creates a firm foundation for when a Big Data system is implemented. The popular saying for data systems is ‘garbage in, garbage out’; making sure your data is gold before it enters the systems will bring the best results.

Do You Need It Yet?

Jumping into a Big Data implementation before you are ready can be as damaging as not jumping in at all. With the outlay that is required for the systems or subscriptions, if it is started too early and doesn’t get the desired results then companies are unlikely to invest when they are in a better position to do so. It is often a good idea to scale the systems that are already being used. This could mean buying add-ons to existing systems or simply using them in a different way.
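As a rough illustration of the ‘Start Early’ point about fields being present, accurately input and correctly categorized, a startup could run a small check on each record before it is stored. The sketch below is in Python; the field names, the country list and the `validate_record` helper are all hypothetical and simply stand in for whatever a real database would hold.

```python
from datetime import date

# Hypothetical schema: these names are assumptions for illustration only.
REQUIRED_FIELDS = ("customer_id", "country", "signup_date")
ALLOWED_COUNTRIES = {"CA", "GB", "US"}


def validate_record(record):
    """Return a list of problems found in one record (empty if it is clean)."""
    problems = []

    # Correct fields present: every required field must hold a non-blank value.
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            problems.append(f"missing field: {field}")

    # Correctly categorized: country must be one of the agreed codes.
    country = record.get("country")
    if country and country not in ALLOWED_COUNTRIES:
        problems.append(f"uncategorized country: {country}")

    # Accurately input: dates must be ISO formatted (YYYY-MM-DD).
    signup = record.get("signup_date")
    if signup:
        try:
            date.fromisoformat(str(signup))
        except ValueError:
            problems.append(f"badly formatted signup_date: {signup}")

    return problems


clean = {"customer_id": "c-1", "country": "CA", "signup_date": "2015-01-07"}
dirty = {"customer_id": "", "country": "Canada", "signup_date": "07/01/2015"}

print(validate_record(clean))
print(validate_record(dirty))
```

Rejecting or flagging records at the point of collection is one way of keeping the ‘garbage’ out; the exact rules would depend on what the business actually collects.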

However, the other side of this is that companies are often reluctant to begin data programmes because they think they do not have enough data. When many of the Fortune 100 companies who have made huge strides in their data programmes began them, they had only a few gigabytes before gathering more to coincide with the growing need. Starting off with a smaller amount of data does not mean that it lacks the same value, it just means more may be required in the future to make further gains.

Can You Commit?

When you invest in a data system, the most important aspect to consider is that it is not a system that will last for years then be replaced when it becomes too slow. Data science is an evolving business area that requires new investments and work all the time in order to maximize its potential. A startup needs to be willing to make this investment and have faith in the systems that are being created and updated. The updates and work that go into the upkeep of systems will not always make considerable differences to performance, but will be necessary for future growth. The investments will need to be ongoing and this is something that needs to be considered when the pros and cons of budgeting are planned.



Data Based Changes


In order to make the most of the data that is produced, startups need to be able to make changes that are based on the data they are shown. Often within the startup environment, the way that employees work is based on gut feeling and exploration, so being told to do something in a certain way because of an analysis goes against the way that they want to work. In order to make the most of any big data programme it is important for any company to be able to make changes to what is being done quickly and efficiently.

Therefore, getting full buy-in from employees is as important as the amount of investment from the budget or the amount of data that is collected. Without a workforce that is willing to implement changes based on the results of any analysis undertaken, any data system, regardless of how good it is, will ultimately be a failure.


STATISTICS ARE DEAD - LONG LIVE DATA SCIENCE Lee Baker CEO, Chi-Squared Innovations



I keep hearing Data Scientists say that ‘Statistics is Dead’, and they even have big debates about it, attended by the good and great of Data Science. Interestingly, there seem to be very few actual statisticians at these debates. So why do Data Scientists think that stats is dead? Where does the notion that there is no longer any need for statistical analysis come from? And are they right? Is statistics dead? I guess that really we should start at the beginning by asking the question ‘What Is Statistics?’.

Briefly, what makes statistics unique and a distinct branch of mathematics is that statistics is the study of the uncertainty of data.

Is statistics dead or is it just pining for the fjords? So let’s look at this logically. If Data Scientists are correct (well, at least some of them) and statistics is dead, then either we don’t need to quantify the uncertainty or we have better tools than statistics to measure it.


Quantifying the Uncertainty in Data

Why would we no longer have any need to measure and control the uncertainty in our data? Have we discovered some amazing new way of observing, collecting, collating and analysing our data so that we no longer have uncertainty? I don’t believe so and, as far as I can tell, with the explosion of data that we’re experiencing (the amount of data that currently exists doubles every 18 months), the level of uncertainty in the data is on the increase.

So we must have better tools than statistics to quantify the uncertainty, then? Well, no. It may be true that most statistical measures were developed decades ago when ‘Big Data’ just didn’t exist, and that the ‘old’ statistical tests often creak at the hinges when faced with enormous volumes of data, but there simply isn’t a better way of measuring uncertainty than with statistics, at least not yet, anyway.

So why is it that many Data Scientists are insistent that there is no place for statistics in the 21st Century? Well, I guess if it’s not statistics that’s the problem, there must be something wrong with Data Science.

What is Data Science?

Nobody seems to be able to come up with a firm definition of what Data Science is. Some believe that Data Science is just an impressive term for statistics, whilst others suggest that it is an alternative name for ‘Business Intelligence’. Some claim that Data Science is all about the creation of data products that are able to analyse the incredible amounts of data that we’re faced with. I don’t disagree with any of these, but suggest that each of these definitions is a small part of a much bigger beast.

To get a better understanding of Data Science it might be easier to look at what Data Scientists do rather than what they are. Data Science is all about extracting knowledge from data (I think just about everyone agrees with this very vague description), and it incorporates many diverse skills, such as mathematics, statistics, artificial intelligence, computer programming, visualisation, image analysis, and much more. It is the last bit, the ‘much more’, that I think defines a Data Scientist more than the rest.

In my view, if you want to be an expert Data Scientist in Business, Medicine or Engineering then the biggest skill you’ll need will be in Business, Medicine or Engineering. Ally that with a combination of some or all of the other skills and you’ll be well on your way to being in great demand by the top companies in your field. In other words, if you want to call yourself a Data Scientist you really do need to be an expert in your field as well as having some of the other listed skills.

Are Computer Programmers Data Scientists?

On the other hand, as seems to be happening in Universities here in the UK and over in the USA, there are Data Science courses full of computer programmers who are learning how to handle data, use Hadoop and R, program in Python and plug their data into Artificial Neural Networks. It seems that we’re creating a generation of Computer Programmers who, with the addition of a few extra tools on their CV, claim to be expert Data Scientists.

I think we’re in dangerous territory here - it’s easy to learn how to use a few tools, but much, much harder to use those tools intelligently to extract valuable, actionable information in a specialised field. If you have little or no medical knowledge, how do you know which data outcomes are valuable? If you’re not an expert in business, then how do you know which insights should be acted upon to make sound business decisions, and which should be ignored?

Plug-And-Play Data Analysis

This is the crux of the problem. Many of the current crop of Data Scientists - talented computer programmers though they may be - see Data Science as an exercise in plug-and-play. Plug your dataset into tool A and you get some descriptions of your data. Plug it into tool B and you get a visualisation. Want predictions? Great - just use tool C.

Statistics, though, seems to be lagging behind in the Data Science revolution. There aren’t nearly as many automated statistical tools as there are visualisation tools or predictive tools, so the Data Scientists have to actually do the statistics themselves. And statistics is hard. So they ask if it’s really, really necessary. I mean, we’ve already got the answer, so why do we need to waste our time with stats?

So statistics gets relegated to such an extent that Data Scientists declare it dead. Talk about the lunatics running the asylum…

About the Author

Lee Baker is an award-winning software creator with a passion for turning data into a story. A proud Yorkshireman, he now lives by the sparkling shores of the East Coast of Scotland. Physicist, statistician and programmer, child of the flower-power psychedelic ’60s, it’s amazing he turned out so normal! Turning his back on a promising academic career to do something more satisfying, as the CEO and co-founder of Chi-Squared Innovations he now works double the hours for half the pay and 10 times the stress - but 100 times the fun!



