THE LEADING VOICE IN ANALYTICS INNOVATION
ANALYTICS INNOVATION MAR 2016 | #4
Palantir - Silicon Valley’s Most Secretive Startup Palantir is one of the most valuable private companies in the world, but little is really known about what they do. We investigate | 8
+
There Is No Data Talent Gap Much has been made about the shortfall in data scientists and the potential ramifications for companies, but is the problem real, or a fiction perpetrated for sinister motives? | 10
Big Data, Government Overreach, And Media Hysteria
You can now find every aspect of a person’s life online, and there is a concern that governments and companies will abuse it. But is this concern justified? | 13
ISSUE 4
EDITOR’S LETTER

Welcome to the 4th Edition of the Analytics Innovation Magazine

‘The goal of the future is full unemployment’ - Arthur C. Clarke

The world of work is changing, and one of the central drivers of this is Artificial Intelligence (AI). In cinema, we’ve been overwhelmed with AI, but in the world of business we’re yet to really scratch the surface. This is set to change this year, with global consulting firm Accenture naming Intelligent Automation as the year’s biggest tech trend.

Intelligent Automation uses machine learning processes to teach itself, constantly adapting as new data feeds into its algorithms. A number of key technologies are involved in the process, including natural language processing, computer vision, knowledge representation, and reasoning and planning intelligence. Lee Naik, MD of Accenture Digital SA, explains: ‘I see organizations looking more and more towards intelligent automation to do two things: firstly, to improve the efficiencies of services that can run 24/7 and help them to become more effective in a digital world. Secondly, Intelligent Automation as a way to enable key knowledge workers to be more productive and efficient in driving the correct outcomes for their organization.’

The positives for business are obvious. Replacing people with machines cuts costs, and machines are both more efficient and more accurate. Increased use of AI in the workplace does, however, have huge ramifications for the labor market, and the next twenty years will see a shift in the nature of employment not seen since the Industrial Revolution.

Later in this issue, Olivia Timson looks at how some of the fears around the use of data have been misplaced. The concern around the impact on jobs, however, is not overstated. How we deal with the rise of technology will shape the next century, and could spell disaster for the human race if not managed correctly. So far, it doesn’t look like we’re on the right track. Stephen Hawking notes that ‘everyone can enjoy a life of luxurious leisure if the machine-produced wealth is shared, or most people can end up miserably poor if the machine-owners successfully lobby against wealth redistribution. So far, the trend seems to be toward the second option, with technology driving ever-increasing inequality.’

If handled correctly, though, rather than eliminating the need for people, AI could help us move into better, more interesting, and more creative jobs - or simply allow us to do whatever we want with our time. If machines can produce all the goods and services we need, why not free ourselves from the drudgery and let them?

As always, if you have any comment on the magazine or you want to submit an article, please contact me at jovenden@theiegroup.com
JAMES OVENDEN managing editor
analytics innovation
contents

6 | BIG DATA’S FUTURE IS IN PREDICTIVE ANALYTICS
Companies are now accumulating data at a rate of knots, but without predictive analytics, it won’t do them much good. David Barton explains why

8 | PALANTIR - SILICON VALLEY’S MOST SECRETIVE STARTUP
Palantir is one of the most valuable private companies in the world, but little is really known about what they do. We investigate

10 | THERE IS NO DATA TALENT GAP
Much has been made about the shortfall in data scientists and the potential ramifications for companies, but is the problem real, or a fiction perpetrated for sinister motives?

13 | BIG DATA, GOVERNMENT OVERREACH, AND MEDIA HYSTERIA
You can now find every aspect of a person’s life online, and there is a concern that governments and companies will abuse it. But is this concern justified?

17 | MAKING PREDICTIONS IN A NEW POLITICAL CLIMATE
Pundits have had huge success using analytics to make predictions in recent elections, but Donald Trump and Bernie Sanders have turned politics on its head, and new metrics are needed

20 | WHY THE LEGAL PROFESSION IS TURNING TO MACHINE LEARNING
Research has always been a cornerstone of the legal profession, but machine learning algorithms mean that much of it is being automated. Alex Lane takes a look at how

24 | EXPLANATORY DATA ANALYTICS ARE DRIVING MARKETING
Marketers now use data in many of their decisions, but they need more understanding of consumer behavior if their campaigns are to be a success. Could explanatory analytics be the answer?
WRITE FOR US
ADVERTISING
Do you want to contribute to our next issue? Contact: jovenden@theiegroup.com for details
For advertising opportunities contact: ehunter@theiegroup.com for details
assistant editor charlie sammonds | creative director oliver godwin-brown | contributors alex lane, euan hunter, meg rimmer, david barton, olivia timson | managing editor james ovenden
Big Data’s Future Is In Predictive Analytics
David Barton Head of Analytics Innovation Enterprise
Big Data has been the most significant idea to infiltrate every aspect of the business world over the last several years. Every company wants to say that they’re making data-driven decisions, have a data-driven culture, and use data tools that non-data people have probably never even heard of. But while all this data is of course a valuable resource, and analyzing it can bring tremendous benefits, it is effectively useless for gaining a competitive edge if the predictive analytics to leverage it are not in place. In the ‘How Predictive Marketing Analytics Boost B2B Performance’ report, commissioned by predictive analytics firm EverString and carried out by Forrester Consulting, just 36% of respondents cited velocity as the main challenge they faced with their data. Companies have, in the main, already worked out how to accumulate and store large amounts of data, and most know how to put it in reports that can be analyzed to see where they’re going wrong. The next challenge for many is to move away from such a reactive, report-based way of working and to start using that data instead to make
predictions that can impact the business’s bottom line months, days, or even seconds in advance. To do this, predictive analytics has to provoke action. It cannot simply be used to make forecasts; it needs to be deployed directly into software applications and business processes so that it can be leveraged immediately. The tools to do this are becoming more easily available, primarily thanks to advances in cloud technology, which enable the kind of speed and scalability that such software requires.
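To make the idea concrete, here is a minimal sketch of what ‘deploying a prediction into a business process’ can look like. Everything in it is hypothetical - the churn model, its coefficients, the feature names, and the 0.5 threshold are invented for illustration, not drawn from any real system:

```python
import math

# Hypothetical coefficients from a previously fitted churn model.
# In practice these would come from an offline training pipeline.
CHURN_MODEL = {
    "intercept": -2.0,
    "days_since_last_order": 0.04,
    "support_tickets": 0.5,
}

def churn_risk(customer):
    """Score a customer with a simple logistic model at request time."""
    z = CHURN_MODEL["intercept"]
    z += CHURN_MODEL["days_since_last_order"] * customer["days_since_last_order"]
    z += CHURN_MODEL["support_tickets"] * customer["support_tickets"]
    return 1.0 / (1.0 + math.exp(-z))  # probability between 0 and 1

def handle_order_page_view(customer):
    """The prediction drives an action inside the process itself."""
    if churn_risk(customer) > 0.5:
        return "show_retention_offer"
    return "show_default_page"
```

The point of the sketch is that the forecast is not a report to be read later: it decides, at the moment the page is served, which experience the customer gets.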
The real secret to successful predictive analytics, however, is context. According to a recent Boxever study, context is almost as important as price in consumers’ thinking when they are making purchasing decisions, and offers have the greatest impact when they address something the customer is already doing. Data scientists are increasingly finding new ways to create machine-learning algorithms that predict, in real time, what kind of highly personalized offers will work for different customers. These algorithms identify which profile a particular person fits at any given time and provide an accordingly enhanced and relevant experience. Such algorithms are one of the central reasons for Amazon and Netflix’s tremendous success, with VentureBeat claiming that 35% of product sales at Amazon resulted from its recommendation engine. One of the real difficulties with predictive analytics lies in measuring its success. An obvious way to do it is to compare your business’s position before a prediction to its position after. So, if your data shows that certain customers may be interested in buying a product at a certain time of year and you build a campaign accordingly, high sales of that product are generally a good indicator that the prediction was accurate. However, there is no
comparison with another product that you offer, and it could well be that it would have sold equally well - if not better - had you built your campaign around that instead. Comparisons with other firms offer better insight, although these are harder to come by. The Forrester and EverString report found that ‘Predictive Marketers are 2.9 times more likely to report revenue growth at rates higher than the industry average,’ 2.1 times more likely to ‘occupy a commanding leadership position in the product/service markets they serve,’ and 1.8 times more likely to ‘consistently exceed goals when measuring the value their marketing organizations contribute to the business,’ compared to the Retrospective Marketers in the survey. This sort of result - even allowing for the fact that EverString is itself a predictive analytics company - should be evidence enough that basic analytics is no longer enough. Predictive analytics is the future.
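One common way around the measurement problem described above is to hold out a control group that does not receive the campaign and compare conversion rates against the targeted group. A minimal sketch - all the numbers here are invented for illustration:

```python
def conversion_rate(converted, total):
    """Fraction of a group that converted."""
    return converted / total

def campaign_lift(treated_converted, treated_total, control_converted, control_total):
    """Relative lift of the targeted group over the untouched control group."""
    treated = conversion_rate(treated_converted, treated_total)
    control = conversion_rate(control_converted, control_total)
    return (treated - control) / control

# Hypothetical example: 300 of 5,000 targeted customers bought,
# versus 200 of 5,000 customers in the control group.
lift = campaign_lift(300, 5000, 200, 5000)
print(f"relative lift: {lift:.0%}")
```

Because the control group is drawn from the same population at the same time, the difference in conversion is attributable to the campaign itself rather than to seasonality or to a product that would have sold anyway.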
In the ‘How Predictive Marketing Analytics Boost B2B Performance’ report, commissioned by predictive analytics firm EverString and carried out by Forrester Consulting, just 36% of respondents cited velocity as the main challenge they faced with their data
Palantir - Silicon Valley’s Most Secretive Startup
James Ovenden Managing Editor
In Silicon Valley, one firm stands out above the rest for the way it helps organizations use data for the betterment of their operations: Palantir. Based in Palo Alto, the firm was founded in 2004 by Silicon Valley investor and PayPal co-founder Peter Thiel. Its customer contracts totaled $1.1 billion in 2014 - up from about $30 million in 2009, an annual growth rate of roughly 107%. It now has roughly 2,000 employees and is worth an estimated $20 billion, making it the third most valuable venture-backed company in the United States, behind only ride-sharing app Uber and accommodation service Airbnb. Despite this, relatively little is known about the company. There’s a certain
irony in just how privately Palantir operates given that its central reason for being is to find things out about people. However, leaked documents and a raised public profile mean that we now know far more than we did a few years ago. Palantir is named after the seeing stones in ‘The Lord of the Rings’ that granted powerful people the ability to see the truth from afar. Its toolsets are used to analyze massive data caches, and they primarily target the three industries that have the largest datasets to analyze: government, the finance sector and legal research. It is fundamentally an interface that sits on top of existing data sets and displays data to users for analysis, helping litigators and law enforcement to find connections
There’s a certain irony in just how privately Palantir operates given that its central reason for being is to find things out about people
that would otherwise be impossible to spot. Users do not have to write SQL queries or employ engineers to write query strings in order to search petabytes of data. Instead, natural language is used to query data, and results are returned in real time. A Palantir deal can run between $5 million and $100 million, with 20% asked for up front and the rest paid only if the customer is satisfied at the end of the project. The company is known for two software projects in particular. Palantir Gotham is the tool used by counter-terrorism analysts, fraud investigators at the Recovery Accountability and Transparency Board, and cyber analysts at the Information Warfare Monitor (responsible for the GhostNet and Shadow Network investigations). Palantir Metropolis is used by hedge funds, banks, and financial services firms. As of 2013, Palantir was used by at least 12 groups within the US Government, including the CIA, DHS, NSA, FBI, the CDC, the Marine Corps, the Air Force, Special Operations Command, West Point, the Joint IED Defeat Organization and Allies, the Recovery Accountability and Transparency Board, and the National Center for Missing and Exploited Children. The CIA’s venture capital arm actually invested $2 million in the project at its inception, and it has been well rewarded, with Palantir software successfully illuminating terror networks. One example of these successes is the cracking of a spy operation dubbed the Shadow Network, which had, among other things, hacked the Dalai Lama’s e-mail account. Palantir has even been used to track patterns in roadside bomb deployment, and was able to conclude that insurgents were using garage-door openers as remote detonators. Palantir’s operations are not limited to matters of national security. On the fluffier side, it is also helping
Hershey’s Chocolate to learn more about things like where best to place its chocolate in stores, mining customer transaction and store data to discover that it sold best when placed next to marshmallows. Despite its relative anonymity compared with other tech giants, it has not completely managed to avoid controversy, and was forced to apologize when one of its employees floated the idea of discrediting Wikileaks. Despite its involvement with security agencies, though, it played no role in the NSA’s bugging of citizens, and is a strong advocate for privacy. Its software incorporates a series of safeguards that limit who can see particular data, and it lays ‘audit trails’ for investigators to follow to ensure that the rules were followed. Palantir has a strong ideological bent. Wages are capped at $137,000 - a meager sum by Silicon Valley standards - and the work it does for law enforcement is driven more by a desire to use big data to protect the world from evil than by profit. As the company grows and the clamor for it to go public increases, it will be interesting to see how these ideals stand up.
The CIA’s venture capital arm actually invested $2 million in the project at its inception, and they have been well rewarded, with Palantir software successfully illuminating terror networks
There Is No Data Talent Gap
Alex Lane International Events Director Innovation Enterprise
There is plenty of supposed evidence that a substantial gap exists between the number of data scientists needed and the number available. A study by McKinsey, for one, projects that ‘by 2018, the US alone may face a 50% to 60% gap between supply and requisite demand of deep analytic talent.’ Indeed, a 2015 MIT Sloan Management Review study found that 40% of companies are already struggling to find candidates to fill their data analytics roles, and if the number of speakers at tech conferences complaining about a shortage is anything to go by, the situation is not improving.
This alleged shortage of candidates is baffling. Just a few years ago, Harvard Business Review called Data Scientist the sexiest job of the 21st century, while the role also topped Glassdoor’s list of ’25 Best Jobs in America 2016’. The job is rare in that it is both intellectually fulfilling and highly lucrative, with salaries averaging around the $120k mark. Even a summer internship will often pay somewhere in the region of $6,000 to $10,000 a month. It seems to tick all the boxes, so why is there a lack of suitable applicants?
Many reasons have been given for the alleged shortfall. There is a common misconception that math and science subjects at school are only for the ‘ultra bright’ - those getting straight As throughout their school life. Some argue that this can be off-putting to those considering a data-related degree. Evidence for this is thin on the ground. While ten years ago just a handful of colleges in the US offered Big Data/analytics degree programs, now almost 100 schools have data-related undergraduate and graduate
The 2015 MIT Sloan Management Review found that 40% of companies are already struggling to find candidates to fill their data analytics roles
degrees, as well as certificates for working professionals or graduate students wanting to augment other degrees. Universities do not set up courses for no reason; they do it because the demand is there, and these courses are largely full, suggesting a healthy pipeline of job candidates. Of course, these students take time to filter through into the jobs market, but the growth in the number of courses started long ago, and there is no reason we should not be seeing the results now. There is also an argument to be made that a skills gap exists because data science suffers from the same issue all STEM industries suffer from: a lack of women. However, while only around 18% of computer science degrees go to women, they are achieving more than 40% of statistics degrees. Carla Gentry, a successful data scientist and founder of Analytical Solution, explains that, ‘More women are becoming interested in the big data field because it’s an interesting subject, filled with lots of potential. I think ‘we’ see the whole picture of these possibilities because as wives, mothers, etc. we have to see the macro view all the time. Therefore seeing the big picture comes naturally, in my opinion.’ Gentry does, however, continue to say that, ‘But, we do have an uphill battle to gain a foothold in this field, as I am constantly reminded even after 17 years in data analytics. Until our own field (tech/data science/analytics) recognizes us for our talent, how do we think others will? There are too few truly talented, experienced
people in Big Data to silence the share women have attained. It’s time to start highlighting talent and not gender. We need all hands on deck if we plan to take Big Data analytics to the next level.’ Gentry may well be right, and it could be that managers are still reluctant to hire female data scientists because of some prejudice. If this really is the problem, it’s one that is easily solved simply by hiring more women. This is a failure of management, and it could be that similar failures are leading to the perception that a gap exists that’s not really there. Managers are often not ‘data people’, and many simply do not know what they’re looking for in a Data Scientist. The term itself encompasses a variety of things, but ask a dozen hiring managers what these are and you’re unlikely to get the same answer twice. A statement made by Tom Pohlmann, Head of Values and Strategy at Mu Sigma, is especially telling. He argued that ’it can be difficult to find candidates with the creativity and experimental mind-set to truly revolutionize the handling of big data to truly transform a business.’ In how many other roles is it expected that someone will come in and ‘revolutionize’ the business? Most Data Scientists are just smart people looking to do their job well and add value. If every small business is holding out for the next Sergey Brin to come in and drive their profit through the roof, they’ll likely be waiting a long time. The talent is clearly there. Maybe it is the case that companies are simply
While ten years ago, just a handful of colleges in the US offered Big Data/analytics degree programs, now almost 100 schools have data-related undergraduate and graduate degrees
not looking hard enough, but it could also be that darker forces are at play. If we assume that companies are not so overcome with prejudice that they won’t look at women for data science roles, and that they do not believe a data scientist has to be a genius, only one question remains: is the whole thing a fiction? Many studies suggest that the skills shortage in STEM in general is a myth. According to Hal Salzman, an expert on technology education at Rutgers, ‘the supply of graduates is substantially larger than the demand for them in industry.’ In fact, he found that only half of STEM college graduates each year get hired into STEM jobs. High-tech companies like Qualcomm are actually downsizing. At the same
time, Qualcomm - along with firms like Google, Microsoft and Facebook - is lobbying hard to get S. 153, the Immigration and Innovation (I-Squared) Act, through the Senate, which would give companies more latitude to employ workers on H-1B visas. H-1B visas are granted to foreign workers who have a bachelor’s or higher degree in a wide range of areas. They are designed to serve high-skilled immigrants, but often enable the importing of Indian and Chinese guest workers to replace an older, more experienced, but more expensive domestic workforce with cheaper labor. S. 153 would increase the number of H-1B visas from 65,000 to up to 245,000. Also included in the bill is an extra incentive for STEM workers like data scientists, with a provision to give international students a lifetime work visa for obtaining any advanced STEM degree. Michael Teitelbaum, a demographer at Harvard Law School and author of the 2014 book ‘Falling Behind? Boom, Bust, and the Global Race for Scientific Talent’, argues that the skills shortage in STEM is a falsehood propagated by companies wanting to flood the market with cheap labor. As previously mentioned, the average wage for a data scientist is $120k, and bringing this down has clear benefits for employers. Teitelbaum notes that: ‘If you can make the case that our security and prosperity is under threat, it’s an easy sell in Congress and the media.’ Rochester Institute of Technology public policy associate professor Ron Hira agrees, arguing that ‘many in the tech industry are using it (H-1B) for cheaper, indentured labor.’ Whether or not there is a real intent to exaggerate the scale of the skills gap, the simple truth is that the talent is there. If companies really are struggling to employ data scientists, it’s entirely their own fault.
Big Data, Government Overreach, And Media Hysteria
Olivia Timson Analytics Thought Leader
The past week has seen two major news events again raise the debate around governmental overreach when it comes to the collection and use of people’s personal data. First, we had Apple’s open letter to the FBI, in which Tim Cook explained the reasons behind the tech giant’s refusal to comply with a court order demanding they provide investigators with a back door into the phone of San Bernardino terrorists, Tashfeen Malik and Syed Farook. There was also the article in Ars Technica claiming that the machine learning algorithm used by NSA’s SKYNET program - an algorithm that the site said was likely used to identify people targeted by drone strikes - was fundamentally flawed and may have caused the deaths of innocent civilians.
These stories have raised two issues, which are different but very much connected. The Ars Technica article and the idea that private data could be used to wrongly identify criminals, with terrible consequences, is one of the central reasons people fear data collection so much, and is partly why Apple believes it so important to customers that their information is protected from the government. However, there are a number of problems with the Ars Technica story - problems that speak volumes about the hysteria surrounding Big Data, and why people’s fears about their privacy being invaded are somewhat misplaced. Ars Technica’s headline, ‘NSA’s SKYNET program may be killing thousands of innocent people’, is both scaremongering and untrue. Granted, the public perception of SKYNET is not going to be helped by sharing a name with the AI system that destroys mankind in the Terminator films, and you’d
Right to privacy has always been a tricky concept to define legally, with many of the repercussions falling under the purview of other laws, such as theft, trespass, defamation
think the NSA could have thought up something a bit more media friendly. However, the article itself is filled with conjecture and seemingly baseless assumptions. Martin Robbins, writing in the Guardian, has extensively debunked the article, pointing out that the NSA was not really even looking at terrorists themselves. It was looking at the couriers who delivered their messages, and data taken from their phones was being used in conjunction with a number of other intelligence sources to try and identify and locate them. The program was not simply about putting anyone whose behavior seemed ‘terroristy’ into a list and firing bombs at them. Robbins also, more importantly, noted that the exposed document ‘clearly states that these are preliminary results. The title paraphrases the conclusion to every other research study ever: ‘We’re on the right track, but much remains to be done.’ This was an experiment in courier detection
and a work in progress, and yet the two publications not only pretend that it was a deployed system, but also imply that the algorithm was used to generate a kill list for drone strikes. You can’t prove a negative of course, but there’s zero evidence here to substantiate the story.’ Apple’s refusal to provide the FBI with a back door stems largely from the fear that governments will misuse data in the ways the Ars Technica article mentions. Apple argued that by complying, it would open a Pandora’s box whereby the government could get into people’s phones and access their personal data at will. The FBI denies this, saying that it was happy for Apple to keep the bypass to itself and to destroy it once it had been used. Whichever side you believe, it raises the important question of how much your privacy is worth - is it right to invade the privacy of a billion people to save lives? Right to privacy has always been a tricky concept to
There is a strong argument that the US has no right to kill foreign citizens, but this is the argument, not whether data should be used to identify terrorists
define legally, with many of the repercussions falling under the purview of other laws, such as theft, trespass, and defamation. Is it really invasion of privacy that people fear, when most expose so much about themselves so willingly on a daily basis anyway? As data gets bigger and more aggregated, it often becomes more anonymous, and it is, for the most part, someone you don’t know looking at numbers on a screen that can’t be related back to you.
Is it, rather, fear that government incompetence is so great that they will analyze this data wrongly and shoot you in your bed while you sleep? Patrick Ball - a data scientist and the director of research at the Human Rights Data Analysis Group - who has previously given expert testimony before war crimes tribunals, described the NSA’s methods as ‘ridiculously optimistic’ and ‘completely bulls**t.’ Ars Technica notes that: ‘A flaw in how the NSA trains SKYNET’s machine learning algorithm to analyse cellular metadata, Ball told Ars, makes the results scientifically unsound.’ However, Ball provides scant evidence that it hasn’t worked, only that it identified an Al-Jazeera journalist as a courier because his role meant he had been acting like one. The algorithm flagged him precisely because he matched the criteria the NSA was looking for, which if anything suggests that it does work; all that was required was a simple cross-check with the man’s job description to explain the match. The journalist has not, subsequently, been killed by a drone.

The editorial board of the New York Times wrote of the Apple case that, ‘Congress would do great harm by requiring such back doors. Criminals, and domestic and foreign intelligence agencies could exploit such features to conduct mass surveillance and steal national and trade secrets. There’s a very good chance that such a law, intended to ease the job of law enforcement, would make private citizens, businesses and the government itself far less secure.’ This is persuasive, but are we really saying that we’re so fearful of government incompetence that we should not allow data collection?

There is a strong argument that the US has no right to kill foreign citizens, but this is the argument - not whether data should be used to identify terrorists. Wrongheaded articles like Ars Technica’s spread paranoia that restricts the government’s ability to look at Big Data, a resource that has driven success in almost every industry and organization in which it has been used. Apple’s statement, whatever its merits or flaws, has at least raised the issue. We need to have a real debate about Open Data and how much people are willing to share. As FBI Director James Comey noted, ‘we have awesome new technology that creates a serious tension between two values we all treasure: privacy and safety. That tension should not be resolved by corporations that sell stuff for a living. It also should not be resolved by the FBI, which investigates for a living. It should be resolved by the American people deciding how we want to govern ourselves in a world we have never seen before.’ People need to make a decision, and if the answer is that we cannot trust governments with data, maybe we need to ask ourselves some even more serious questions.
Making Predictions In A New Political Climate
James Ovenden Managing Editor
Polling is a notoriously inexact science, and previous elections have seen pollsters get it tremendously wrong. Neil Kinnock’s loss to John Major in the 1992 UK general election and Alf Landon’s to FDR in the 1936 US presidential election, in particular, stick in the memory. The 2008 US presidential election, however, was thought to herald a new dawn in political predictions, with renowned statistician Nate Silver offering a data-driven approach that has, until now, proven extremely accurate. Silver was one of the pioneers of sabermetrics in baseball, developing his PECOTA (Player Empirical Comparison and Optimization Test Algorithm) system to predict the future performance and valuation of major league players by comparing their data with that of 20,000 post-WW2
players. In 2008, he began applying his quantitative methods to politics, correctly calling 49 out of 50 states in that year’s presidential election. In 2012, he called all 50 correctly. It seems, however, that Silver’s methods may finally have proven fallible.
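PECOTA itself is proprietary, but the comparable-player idea behind it can be sketched in a few lines: represent each historical player as a vector of statistics, find the nearest neighbours to the player being projected, and use their subsequent seasons as the forecast. This is an illustration of the general technique, not Silver’s actual method, and every number and stat field below is invented:

```python
import math

def distance(a, b):
    """Euclidean distance between two stat vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def project_next_season(player_stats, history, k=2):
    """Average the next-season outcomes of the k most similar historical players.

    history: list of (stat_vector, next_season_value) pairs.
    """
    nearest = sorted(history, key=lambda rec: distance(player_stats, rec[0]))[:k]
    return sum(outcome for _, outcome in nearest) / k

# Invented historical records: (batting avg, home runs) -> next-season batting avg
history = [
    ((0.300, 30), 0.290),
    ((0.310, 28), 0.305),
    ((0.240, 5), 0.235),
]

# Project a player whose profile resembles the first two comparables.
print(project_next_season((0.305, 29), history))
```

The appeal of this design is that the forecast inherits the full variety of outcomes among the comparables, which is also why it breaks down when, as with Trump and Sanders, no good historical comparables exist.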
The US elections have long delighted, confused, and entertained the world in equal measure, thanks to their extraordinary length and the often bizarre array of characters on show. This year’s Republican and Democratic primaries are, however, making previous years look like a quiet, orderly discussion between old friends. The fervor surrounding both Donald Trump and Bernie Sanders has shaken the foundations of the political establishment, rendering any attempt by pundits to second-guess the results a near impossibility. Donald Trump, widely considered a no-hoper when he started, is now a front runner in the race for the Republican nomination. Even with his repeated gaffes, often ludicrous remarks, and global mockery, his poll numbers have increased. This has caused panic in both liberal and conservative circles, with liberals seeing him as a dangerous lunatic, and conservatives seeing him as unelectable when it actually comes down to the real presidential contest. Democratic hopeful Bernie Sanders was similarly considered a no-hoper, a sideshow in a primary that was supposed to be a mere formality for Hillary Clinton’s coronation. But Bernie fever has gripped the nation, particularly amongst the young, in a similar way that Corbynmania helped sweep socialist MP Jeremy Corbyn to the
leadership of the UK Labour Party last year. Sanders's rallies have drawn huge crowds, and he has raised record amounts of money from individual donors, dwarfing others in the contest. Trump and Sanders in the US, as well as Corbyn and UKIP's Nigel Farage in the UK, all represent a new kind of politics. This politics is characterized by mistrust of the establishment, and its engine is social media and the internet. Pundits have yet to really acclimatize to this new environment, and none more so than Nate Silver. While panic has set in over Trump's ever-growing poll numbers, Silver has held steadfast to his conviction that Trump will not win, providing a supposed voice of reason amid the insanity and fear that such a buffoon could potentially become the leader of the free world. In September, he told CNN's Anderson Cooper that Trump had a roughly 5% chance of beating his GOP rivals. Trump's poll numbers have since soared, and his second-place finish in Iowa, while not exactly what Trump would have wanted, still makes Silver's 5% look vaguely ridiculous. Silver himself recently admitted as much, acknowledging in a blog post that he'd been too skeptical about Trump's chances: 'Things are lining up better for Trump than I
would have imagined. If, like me, you expected the show to have been over by now, you have to revisit your assumptions.' Blake Zeff, the editor of the political news site Cafe and a former campaign aide to Barack Obama and Hillary Clinton, has warned of the dangers of trying to make predictions with models created in old political environments. Zeff said: 'This is an extraordinary, unusual, utterly bizarre election year, in which events that have never happened before are happening. That's a nightmare scenario for a projection model that is predicated on historical trends.' What was true yesterday is not necessarily true today, and that's a problem for Silver and his team. Accounting for emotion is a difficult task in data analytics. Cold, hard numbers cannot take into account the sort of fervor we are seeing, and a way must be found of doing exactly that. How exactly they go about this is difficult to know. Greater analysis of social media is an obvious place to start, but this focuses primarily on the young - an age group traditionally far more left wing than older generations, who do not use social media to anything like the same degree yet vote in far greater numbers. The problems of relying too heavily on social media to predict elections were seen last year
in the UK, with many arguing that one of the reasons the Conservative victory caught so many by surprise was that people were looking at social media too much. One idea for how this could be taken into account comes, ironically, from the very man who managed to beat Donald Trump in the Iowa caucus - Ted Cruz. Much of his success has been put down to an excellent 'ground game', an old-school method of campaigning that was forsaken by many others in the race. Cruz's team, however, gave it a modern twist, using a team of statisticians and behavioral psychologists to employ something called 'psychographic targeting', in which campaigners alter the way they deal with potential voters based on a psychological and political profile created from information collected about each individual. His campaign's use of data is also ironic, given that Cruz has been a heavy critic of excessive government data collection - but maybe consistency didn't show up as a big voter issue. According to The Washington Post, the Cruz campaign has employed Massachusetts-based Cambridge Analytica to run the data side of its operations. To develop its psychographic models, Cambridge surveyed more than 150,000 households across the country and scored individuals
using five basic personality traits - openness, conscientiousness, extraversion, agreeableness, and neuroticism, the so-called 'Big Five'. According to Cruz campaign officials, the company also used social media to do this, developing its correlations in part with Facebook data such as subscribers' likes. The Cruz campaign then modified the Cambridge template, renaming some psychological categories and adding subcategories to the list, such as 'stoic traditionalist' and 'true believer'. The campaign also conducted its own field surveys in battleground states to build a more precise predictive model based on issue preferences. The Cruz algorithm was then applied to what the campaign calls an 'enhanced voter file', which can contain as many as 50,000 data points gathered from voting records, popular websites, and consumer information such as magazine subscriptions, car ownership, and preferences for food and clothing.
In emails, ’stoic traditionalist’ would receive very direct and to the point messages, whereas someone labeled ‘temperamental’ would receive a message that was inspiring, and became more and more positive as the conversation progresses. Could analysts look to similar methods to more accurately predict results? Incorporating emotions into prediction models looks like a difficult task, but such psychological testing could go some way towards better gauging the tide of public sentiment. Trying to incorporate an entity as wildly erratic as Donald Trump into prediction models, however, may simply prove impossible, even for Nate Silver.
The Cruz campaign has utilized all this information to make a concerted effort to tightly tailor outreach to individuals. For example, personalities labelled ‘stoic traditionalist’ are believed to be highly conservative, and would be spoken to in a way that was ‘confident and warm and straight to the point’, because that was the one deemed would have the greatest impact. Even campaign e-mails are tweaked according to this research.
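The pipeline described here - score a voter on the five traits, match them to the nearest campaign-defined profile, and choose a messaging style accordingly - can be sketched in a few lines. The profiles, scores, and message styles below are invented for illustration; the campaign's actual categories and models are not public.

```python
# Toy sketch of psychographic targeting. Trait order in each profile:
# openness, conscientiousness, extraversion, agreeableness, neuroticism,
# each scored 0-1. All numbers here are hypothetical.
PROFILES = {
    "stoic traditionalist": [0.2, 0.9, 0.3, 0.5, 0.2],
    "temperamental":        [0.5, 0.3, 0.6, 0.4, 0.9],
    "true believer":        [0.3, 0.7, 0.8, 0.7, 0.4],
}

MESSAGE_STYLE = {
    "stoic traditionalist": "direct and straight to the point",
    "temperamental": "inspiring, increasingly positive",
    "true believer": "values-driven and confident",
}

def classify(voter_scores):
    """Assign a voter to the profile with the smallest squared distance
    between their trait scores and the profile's trait scores."""
    def dist(profile):
        return sum((a - b) ** 2
                   for a, b in zip(voter_scores, PROFILES[profile]))
    return min(PROFILES, key=dist)

voter = [0.25, 0.85, 0.35, 0.45, 0.25]  # high conscientiousness, low openness
label = classify(voter)
print(label, "->", MESSAGE_STYLE[label])
```

A real system would learn the trait scores themselves from survey and social media data rather than take them as given, but the final targeting step reduces to something like this lookup.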
Why The Legal Profession Is Turning To Machine Learning
Euan Hunter, Analytics Commentator
The US legal system has come in for a lot of criticism of late, with Netflix's hit documentary Making A Murderer shining a light on a perceived injustice that many have taken to be representative of endemic flaws. In fairness, it's a big system. US courts see 350,000 cases pass through each year, and when the ABF (American Bar Foundation) last recorded the number of registered American law firms, back in 2000, it counted 47,563 - a number that has surely risen since. All of these cases create a wealth
of data and information. Research has always been central to the attorney's role, and combing through the large volumes of information that pertain to a case is incredibly time-consuming. It's also incredibly costly, with lawyers and law firms in the US spending around $8.4 billion on it annually. The quicker and more accurately an attorney can find information that's actually useful, the more time can be spent on developing an effective litigation strategy.
Machine learning algorithms build computer models that make sense of complex phenomena by detecting patterns and inferring rules from data. They have already proven themselves an excellent tool for speeding up processes across a number of industries, as well as for discovering important details that humans may overlook. The legal profession should, theoretically, be no different, and machine learning could provide law firms with a cheaper, and better, alternative. This is made all the easier because an increasingly large amount of case information is now digitally stored. This was highlighted at the Big Law Business Summit in New York City by federal district court judge Shira A. Scheindlin (S.D.N.Y.), who discussed the dramatic ways that technology-assisted review (TAR) is changing the legal world. She said: 'The use of electronically stored information is everywhere. There is no case - civil or criminal - now that does not involve ESI. "Evidence" has been modified from what was once tangible and testimonial to what is now electronically stored. From email to GPS, from social media to the cloud, from body cameras to cellphones - very little happens that is not recorded.' Currently, the two largest companies in legal data-driven research are LexisNexis and Westlaw. They have databases that contain huge
numbers of case details, and often serve as the default starting point for legal researchers. However, they are not a resource for running advanced analytical tools, and others are filling this gap in the market. Brainspace, for example, is applying more analytically driven tools to unstructured data in ways that companies have previously not been able to, and one application where its software has been used successfully is legal files. It was used to parse millions of unsorted, unstructured emails from the Enron scandal. While it took reviewers months to go through them during the original trial, Brainspace managed the feat in under an hour. Its analysis also helped uncover a number of connections that lawyers had missed the first time around, such as which other companies may have been involved, and when and where suspicious activity had taken place. Machine learning can also help in other ways. One of the cornerstones of the United States common law system is that judges must explain their decisions in writing, setting out the reasons for a decision by referencing the law, the facts, public policy, and other considerations upon which the outcome was based. Machine learning can find correlations between these opinions and other
factors to determine whether there are any irregularities that impact a decision, and to test the system's strength - racial bias, for example. It can also help lawyers find which judges could potentially be more sympathetic to their client. There are, however, a number of problems with using data analytics. For one, the information that law firms are analyzing belongs to their clients, and it needs to be properly anonymized before analysis. There is also some debate as to whether machine learning algorithms are really the best tool for lawyers to use. Many believe that legal practice requires cognitive abilities that are currently beyond the realm of machine learning and AI. According to a paper in the Washington Law Review by Harry Surden: 'Attorneys must routinely use both abstract reasoning and problem solving skills in environments of legal and factual uncertainty. Modern AI algorithms, by contrast, have been unable to replicate most human intellectual abilities, falling far short in advanced cognitive processes - such as analogical reasoning - that are basic to legal practice.' Machine learning could also have tremendous implications for the profession itself, particularly in the kind of background and experience that law firms look for. Traditionally, lawyers have come from humanities backgrounds that tend to involve a heavy research element, but as research abilities begin to play a less prominent role, it is likely we will see more lawyers coming from data science and analytical backgrounds, such as finance. It is doubtful that machine learning will ever fully replace crucial attorney tasks, but it is likely to be an extremely useful tool. Every lawyer has a duty to provide the best possible service to their client, and if they ignore valuable tools like machine learning, they are failing in that duty.
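Review tools of the kind described above are often built on document-similarity techniques. As a rough illustration of the idea - not how Brainspace, LexisNexis, or any specific product actually works - the sketch below ranks a few invented case emails against a reviewer's query using bag-of-words cosine similarity:

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Turn a document into a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Invented documents standing in for a case's email corpus.
documents = {
    "email_1": "please shred the transaction records before the audit",
    "email_2": "lunch on friday? the usual place works for me",
    "email_3": "move the transaction to the offshore account tonight",
}

def rank(query):
    """Return document ids ordered from most to least relevant."""
    q = vectorize(query)
    return sorted(documents,
                  key=lambda d: cosine(q, vectorize(documents[d])),
                  reverse=True)

print(rank("suspicious transaction records"))
```

Production systems layer far richer models on top (phrase detection, concept clustering, supervised relevance feedback), but even this crude ranking shows why a machine can triage millions of emails faster than a room of human reviewers.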
Predictive Analytics Innovation Summit + HR & Workforce Analytics Innovation Summit
10 & 11 MAY LONDON, 2016
Speakers Include +44 207 193 3011 dmarshall@theiegroup.com www.theinnovationenterprise.com
Explanatory Data Analytics Are Driving Marketing
Meg Rimmer, Director, Marketing Analytics Summit
In Edward Bernays’ Crystallizing Public Opinion, he wrote: ‘The three main elements of public relations are practically as old as society: informing people, persuading people, or integrating people with people. Of course, the means and methods of accomplishing these ends have changed as society has changed.’ Human nature remains largely the same as it ever was, which has meant that much about appealing to people has remained the same since Bernays wrote that in 1923. However, technology has meant the means available to marketers for getting the word out have advanced tremendously.
A company's ability to use data analytics can make or break its success, and marketing is one area where analytics has arguably had the most impact. According to Squiz research, 82% of marketers are now using an analytics tool, and the Wall Street Journal reported that spending on marketing analytics is expected to nearly double over the next two years, from 7% to 12%, as marketers overtake
CTOs as a business’s biggest IT spender. Another survey of 308 CMOs and business unit directors by Forbes Insights, ‘The Predictive Journey: 2015 Survey on Predictive Marketing Strategies’, found that 86% of executives with experience in predictive analytics believe the technology has delivered a positive return on investment for their business.
Introducing data analytics is not, however, simply a case of buying a tool, sitting back, and watching it churn out insights. Consumers' interests and shopping habits are constantly evolving, and how the data is used should change accordingly. As data ages, it becomes irrelevant in terms of consumer value. It is for this reason that predictive analytics in marketing, while highly useful, is limited in how successful it can be. The next year should see a greater onus placed on explanatory analytics instead. The limitations of predictive analytics were evidenced earlier in the year, when Whole Foods attracted controversy with its plan to launch a line of grocery stores geared towards millennials. The announcement was labeled variously as 'offensive' and 'stupid', among other even less flattering terms, and the retailer's stock price dropped. In Harvard Business Review, Robyn Bolton wrote that members of Generation X and Baby Boomers also want access to 'lower-priced, organic, and natural foods', and pinned the error on Whole Foods' marketing models. The mere fact that someone is in a demographic doesn't necessarily indicate certain preferences or behaviors - marketers need a greater understanding of why a demographic is interested, and of the best ways to capitalize on that interest. Predictive analytics is an incomplete approach because it only gives you a likely outcome if nothing changes. It is useful for understanding what the future will look like, but it does not tell you why outcomes are likely, what correlations are driving them, or how to intervene to change them. In order to alter an outcome, you have to be able to explain why it will happen - a luxury afforded by explanatory modeling.
A good example of the difference between predictive and explanatory modeling is in healthcare. Predictive modeling would be able to give you an accurate estimate of, say, which hospitals require certain services. It offers no explanation of why those areas require the most money, however, and would subsequently do little to address root causes and actually decrease illness rates. Explanatory modeling, on the other hand, will identify the things that are having an impact. For instance, it might identify that smoking is prevalent among patients, that smoking brings a higher risk of cancer, and that raising the price of cigarettes will lead to less smoking and therefore less cancer in the area. The same logic applies to marketing data, where understanding consumer trends and how they will impact campaigns is paramount - something difficult to do when relying simply on past data. Explanatory modeling mainly focuses on variables that are in the control of the user, either directly or indirectly. Marketers should be trusted to use their intuition and experience to put questions to the analytics, and data should then be used to identify the correlations that matter. In light of this, there needs to be a shift in marketers' attitudes towards data, away from simply gathering as much as possible - which may work well for predictive analytics but is less important in explanatory analytics - and towards the quality of the data received. Marketers need a clear strategy in place to ensure they are all on the same page about what they want from the data, and to communicate this throughout the team; only then will they be able to garner meaningful insights and drive growth for their organization.
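The healthcare contrast can be made concrete with a toy model. The sketch below, using invented data, fits a simple least-squares line relating an area's smoking rate to its illness rate: the predicted value answers the service-planning question, while the slope is the explanatory piece, quantifying roughly how much illness changes per point of smoking reduced.

```python
# Invented data: smoking rate (%) in five areas vs. illness rate (%) there.
smoking = [10, 15, 20, 25, 30]
illness = [5.1, 6.9, 9.2, 11.0, 13.1]

# Ordinary least-squares fit of illness on smoking.
n = len(smoking)
mean_x = sum(smoking) / n
mean_y = sum(illness) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(smoking, illness))
         / sum((x - mean_x) ** 2 for x in smoking))
intercept = mean_y - slope * mean_x

# Predictive use: what illness rate should an area with 22% smoking
# plan its services around?
predicted = intercept + slope * 22

# Explanatory use: the slope itself is the lever - how much illness is
# associated with each percentage point of smoking, which is what an
# intervention (e.g. raising cigarette prices) would aim to move.
print(round(slope, 3), round(predicted, 2))
```

The caveat, as the Whole Foods episode shows, is that a fitted slope is only a correlation; turning it into an intervention requires the kind of causal, "why" reasoning that explanatory analysis is meant to supply.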
For more great content, go to ieOnDemand
Over 4,000 hours of interactive on-demand video content
View today’s presentations and so much more
All attendees will receive trial access to ieOnDemand
Stay on the cutting edge
Listen
Watch
Learn
Innovative, convenient content updated regularly with the latest ideas from the sharpest minds in your industry.
+1 (415) 692 5514
www.ieondemand.com
sforeman@theiegroup.com