An Introduction to Machine Learning in Business: Analyzing Your Success process.st/machine-learning-in-business/ January 26, 2018
Ben Mulholland January 26, 2018
While it might seem new and intimidating, machine learning in business is already bringing massive benefits to companies and consumers alike. From increasingly effective product suggestions to accurate journey time predictions and advanced customer analytics, machine learning is an incredibly powerful tool which lets you analyze every important aspect of your business without wasting human hours on the task. But what exactly is machine learning, and how do you go from knowing that to actually using it?
“Instead of using labels to teach an AI what each object it’s looking at is, this DeepMind project teaches itself because it learns to recognise images and sounds by matching them up with what it can see and hear. This method of learning is almost exactly like how humans think and learn to understand the world around them.” – Vaughn Highfield, How Google DeepMind is learning like a child: DeepMind uses videos to teach itself about the world
Keep reading to find out all of this and to see examples (both good and bad) of how companies like Google and YouTube are using machine learning to enhance their business in game-changing ways.
What is machine learning?
Although some use the terms interchangeably, “AI” and “machine learning” aren’t the same thing. To prove it (and to explain what machine learning is), take a moment to think of an advanced machine from a book or movie, like The Terminator or Data from Star Trek. These machines are examples of artificial intelligence (AI), seen in their ability to solve complex problems and act in a way that we would see as “smart”. For example, they could hold a conversation with a human rather than just saying “yes” or “no”.
Meanwhile, machine learning is an application of AI which lets machines analyze the data they’re given and learn from it without humans having to teach them. In this way the machine gets “smarter” by identifying patterns and the factors that cause them, which can be used to solve future problems more accurately.
This learning process usually needs a human to start it off with an initial thesis (“studying more gives better grades”) and the factors used to analyze it (“the amount of time studying vs the student’s final grade”), but that’s not always necessary. To go a little deeper, a machine learning system is made up of three separate parts:
- Model
- Parameters
- Learner
Let’s take a quick look at each.
Model
The “model” is your thesis statement. It tells the machine what it needs to be analyzing and iterating on as it learns more from the data it’s presented with. Let’s say that you wanted to find out when you should take a break. Your model might start off as “the longer you work without taking a break, the less productive you become”. This gives you a summary statement to make sense of the vast data sets machine learning requires to progress, and a core principle which that data can be related back to. The model doesn’t have to be a statement backed by any facts, as long as it gives a frame of reference for the relationship you’re looking to analyze, such as the amount of time worked without taking a break vs how productive you are. The model will also be changed to fit the results of whatever data you give your machine to process.
For example, as a machine analyzes data showing how productive you are against how long you’ve been working without a break, the model might change from “the longer you work, the less productive you become” to “working more than 2 hours without a break makes you less productive”.
Parameters
The parameters of machine learning are the things being assessed; they’re the data and factors which are analyzed to make iterations of the model. In our case, these would be “the amount of time worked without taking a break” and “how productive you are”. Parameters will never change, and so are the constant guidelines which tell the machine what information it should pay attention to. If the model is a relationship statement, then the parameters are the items whose relationship is being assessed. (Be aware that many machine learning texts call these inputs “features” or “variables” and reserve “parameters” for the values the learner adjusts.)
Learner
The “learner” is exactly what it sounds like: it’s the system by which the machine takes its findings, makes sense of them in relation to the model, and then alters the model to match that information. It uses your model, parameters, and a relevant data set to review the model and adjust it to match the relationship shown by the data. For example, let’s say that your first data set showed that productivity increased as you worked but then dropped off after two hours if you didn’t take a break. The learner would then adjust the model to read “working for longer than two hours without a break reduces your productivity”.
This is an area where machines have a huge advantage over humans, as they are able to process huge amounts of data and perform the complex equations required to learn from them much faster than the average human. That’s why machine learning in business is so powerful. If you can get a machine to do all of your data analysis and number crunching for you, that frees up time for you to work on more pleasant (or creative) tasks. Not to mention that humans only have a limited capacity to crunch data before suffering burnout, whereas machines can just keep going. Once the data you’ve given the machine has been assessed and the model updated, the entire cycle repeats with the next set of data. This continues either until the system is stopped or it runs out of data.
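To make the model, parameters, and learner cycle a little more concrete, here is a minimal Python sketch of the break-time example. It is only an illustration under stated assumptions: the `BreakModel` class, its method names, and the data values are all made up for this example, and the "learning" is nothing more than averaging productivity per number of hours worked and picking the best one. Real machine learning systems use far more sophisticated learners.

```python
from collections import defaultdict


class BreakModel:
    """Toy learner: estimates how long you can work before productivity drops."""

    def __init__(self):
        # Running totals for each "hours worked without a break" value.
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def learn(self, batch):
        """Fold one data set of (hours_without_break, productivity) pairs into the model."""
        for hours, productivity in batch:
            self.totals[hours] += productivity
            self.counts[hours] += 1

    def peak_hours(self):
        """Return the hours value with the highest average productivity seen so far."""
        averages = {h: self.totals[h] / self.counts[h] for h in self.totals}
        return max(averages, key=averages.get)

    def statement(self):
        """Express the current state of the model as a sentence."""
        return (f"working more than {self.peak_hours()} hours "
                "without a break makes you less productive")


# Made-up illustrative data sets, not real measurements.
first_batch = [(1, 70), (2, 68), (3, 60)]
second_batch = [(1, 60), (2, 85), (3, 65)]

model = BreakModel()
model.learn(first_batch)
after_first = model.peak_hours()   # estimate based on the first data set only
model.learn(second_batch)
after_second = model.peak_hours()  # estimate after the cycle repeats with more data
print(model.statement())
```

The two parameters (hours worked and productivity) stay fixed throughout, while the statement the model produces shifts as each new data set is folded in.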
Machine learning in business holds massive benefits
Let’s be frank: data crunching is not fun to do. As a short task it can be a nice change of pace, but manually analyzing hundreds of thousands of data points would drive even the most dedicated team insane. The cost in time and money for processing these massive data sets would also be crippling, and any mistakes caused by human error could render the entire model completely useless. Instead, machine learning allows us to hand over these huge data analysis projects to machines while still reaping the rewards, saving time, money, and sanity in the process.
While that’s one big benefit of machine learning versus a manual system, the true power comes in its application. Machine learning has already allowed us to analyze massive data sets and has increased accuracy in everything from image recognition and spam filters to self-driving cars, chatbots, and virtual assistants. The time, cost, and sanity savings hold true, but the speed at which these services and more can keep improving with machine learning is now limited only by how much new data they have to assess.
For example, Microsoft’s image recognition program may be 80% sure that an image you submitted contains a cat, 9% a dog, and 1% a whale. If it was right in guessing a cat, the next time it’s presented with a similar image it may be 81% sure of the result, as it learned that the factors used to assess your photo are more reliable in showing cats. So, the applications of machine learning are almost infinite, and the results only get more accurate with more data. Sounds great, right? Unfortunately, there are a couple of downsides to the practice too.
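As a rough illustration of where confidence percentages like these can come from, the sketch below uses softmax, a common way for classifiers to turn raw scores into confidences that sum to 1. This is an assumption made for illustration only: the article does not say how Microsoft’s program computes its percentages, and the raw scores below are invented.

```python
import math


def softmax(scores):
    """Convert raw per-label scores into confidences that sum to 1."""
    highest = max(scores.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(score - highest) for label, score in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


# Invented raw scores for a single image; higher means "more like this label".
raw_scores = {"cat": 4.0, "dog": 1.8, "whale": -0.4}

confidences = softmax(raw_scores)
best_guess = max(confidences, key=confidences.get)

for label, confidence in confidences.items():
    print(f"{label}: {confidence:.0%}")
print("Best guess:", best_guess)
```

Training on more correctly labelled images changes the raw scores the system produces, which in turn shifts the confidences it reports.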
Machine learning isn’t accessible to all
For one thing, machine learning requires a lot of computing power to carry out, as you’re typically dealing with massive data sets in order to prove any consistent or conclusive models. This usually results in multiple machines working on the same model in order to increase the speed at which iterations are produced.
In a similar vein, the amount of data involved also means that the uptime for the machines you’re using needs to be as close to 100% as possible. Any downtime will cause data analysis to stop dead in its tracks, and the risk of gaining corrupted results from machine failures is one that you can’t afford to take. Remember: these machines are learning based on analyzing data and applying it to an ever-changing model. If that analysis is corrupted by outages, there’s a chance that your entire model could be compromised unless you intervene and discard the corrupted results.
Having the necessary data sets can also be a problem, especially if you’re a small business or startup that doesn’t have a vast database of its own. This leaves you with four options if you want to take advantage of machine learning:
- Run a small scale analysis of the data you have to create a basic (if vague and unreliable) model
- Wait until you have enough data
- Use someone else’s data
- Use someone else’s machine learning service
Thankfully, using someone else’s data or machine learning system isn’t as difficult as it might seem. For example, you can use Google’s Vision API to detect what an image shows (including any text in it), then use that to automatically assign a tag or description to it. That alone isn’t a great business advantage, but you could use that knowledge to place relevant ads alongside your image. This could allow you to completely automate the process of selecting relevant ads to show alongside new images on your site.
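The tag-to-ad idea can be sketched in a few lines of Python. Note the assumptions: the `labels` list below is a hand-written stand-in for whatever an image labelling service returns (no real API is called here), and `ADS_BY_TAG`, `pick_ads`, and the 0.7 confidence cut-off are hypothetical names and values chosen purely for illustration.

```python
# Hypothetical ad inventory, keyed by lower-case image tag.
ADS_BY_TAG = {
    "dog": "Pet food offer",
    "bicycle": "Cycling gear sale",
    "beach": "Holiday deals",
}


def pick_ads(labels, min_score=0.7, ad_inventory=ADS_BY_TAG):
    """Return ads for confidently detected tags, most confident tag first."""
    ads = []
    for label, score in sorted(labels, key=lambda item: item[1], reverse=True):
        ad = ad_inventory.get(label.lower())
        # Skip tags with no matching ad, low-confidence tags, and duplicate ads.
        if ad and score >= min_score and ad not in ads:
            ads.append(ad)
    return ads


# Stand-in for a labelling service response: (tag, confidence) pairs.
labels = [("Dog", 0.97), ("Beach", 0.81), ("Bicycle", 0.40)]

selected_ads = pick_ads(labels)
print(selected_ads)
```

The confidence cut-off matters here: ignoring low-confidence tags is a simple way to avoid showing ads based on something the service only thinks it might have seen.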
Last (but certainly not least) is probably the biggest downside of machine learning: until hundreds of thousands of data points have been analyzed, there’s a huge danger that a machine learning system will produce incorrect results due to a lack of human context. Real life is more complicated than an X=Y relationship between parameters, and until the nuances of a certain data set have been taught to the system, it’ll likely fall afoul of mistakes it has no way of predicting. For example, Google’s image recognition service has landed them in hot water for assigning tags to images which (with a little more context) were highly offensive. This isn’t necessarily a fault of the system itself, but one of both wider context and (potentially) a lack of diverse data to analyze.
Some examples to spark your imagination
So, we now know what machine learning is, and how we can use machine learning in business to carry out tasks that would be impossible for all but massive corporations with huge workforces. However, due to the vast number of ways machine learning can benefit your work, it can be difficult to know where to start. If you will, it’s hard to see the forest for the trees. To help you get started, here are a few examples of both good and bad ways that machine learning has been implemented.
Google’s custom search suggestions and intent detection
Google’s predictive search system is an incredibly useful example of machine learning saving you time and effort in your everyday life. Imagine you start up your browser for the first time, head to Google, then search for “Chris Pratt”. As you type, the search suggestions will include general terms that (for the precise letter combination at the time) are the most popular in Google’s system. That’s great, but can be done using a simple popularity rank. Google uses machine learning to go one step further.
Let’s say that you go back to Google a while later and type “gua” into the search bar. Chances are that one of your suggested searches will be “Guardians of the Galaxy”. This isn’t necessarily because it’s the most popular search term starting with “gua” (although it’s probably high up the list). Google’s search algorithm knows that you’ve looked for Chris Pratt in the past and will be able to relate him to the various films he’s starred in. At the very least Google will know that a lot of people who have searched “Chris Pratt” have also searched for “Guardians of the Galaxy”. As such, Google can use that information to suggest “Guardians of the Galaxy” as a search term, thus saving you the trouble of typing the whole thing out. The system then marks its prediction as correct (if you go on to search that term), filing away another data point in favor of suggesting things based on that formula.
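The difference between a plain popularity rank and a history-aware suggestion can be sketched as follows. This is not Google’s actual algorithm, which isn’t described in this article; `SuggestionModel`, its methods, the `history_boost` weighting, and the search counts are all assumptions invented for the example.

```python
from collections import Counter, defaultdict
from itertools import permutations


class SuggestionModel:
    """Toy search suggester: popularity plus a boost for related past searches."""

    def __init__(self, history_boost=5):
        self.popularity = Counter()                    # how often each query was searched
        self.searched_together = defaultdict(Counter)  # queries seen in the same session
        self.history_boost = history_boost

    def record_session(self, queries):
        """Learn from one user's session (a list of search queries)."""
        self.popularity.update(queries)
        for query, other in permutations(set(queries), 2):
            self.searched_together[query][other] += 1

    def suggest(self, prefix, history=()):
        """Rank known queries starting with `prefix`, boosted by the user's history."""
        def score(candidate):
            related = sum(self.searched_together[past][candidate] for past in history)
            return self.popularity[candidate] + self.history_boost * related

        candidates = [q for q in self.popularity if q.startswith(prefix)]
        return sorted(candidates, key=score, reverse=True)


# Invented search sessions used as training data.
model = SuggestionModel()
for _ in range(5):
    model.record_session(["guacamole recipe"])
for _ in range(4):
    model.record_session(["guatemala"])
for _ in range(3):
    model.record_session(["chris pratt", "guardians of the galaxy"])

plain = model.suggest("gua")
personalised = model.suggest("gua", history=["chris pratt"])
print(plain[0])
print(personalised[0])
```

With no history the most popular match comes first, but a past search for “chris pratt” pushes the related film to the top. Each confirmed suggestion would be recorded as another session, reinforcing the link.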
Shazam analyzes music to assign tags and genres automatically
Shazam is another great example of using machine learning to tackle a task which would require a huge number of human hours to solve, and which is abstract enough to make human analysis difficult in general. The task they had to tackle was assigning tags and genres to the songs in their library. While that might sound easy, anyone who loves a certain style of music can tell you that a genre can span a massive variation of styles (black metal, thrash metal, death metal, sludge, and more are all “metal”). Worse still, the genre (and especially sub-genre) that a song belongs to can be entirely subjective to the person listening to it. It’s rare that something can be categorized without any kind of dissent.
To solve this issue, Shazam developed a model which analyzed snippets of songs and assigned a “signature” for each. This worked by creating a spectrogram for the track snippet and then looking for peaks in amplitude.
These track signatures were then converted into two “feature” values for each track, which could then be visualized and presented in a graph. Once separated by genres assigned by humans, this gave a heat map of what each genre of music looked like visually, which could then be cross-referenced with other tracks to more easily assign a genre automatically. While the details are a little complex, the end result is that Shazam is able to assess songs and assign tags and genres to them automatically with reasonably high accuracy (in the low 90% range). This task would have been impossible using only humans, as tags and genres are subjective to each listener, and the cost of doing this manually for millions of tracks just isn’t feasible.
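The “spectrogram, then amplitude peaks” step can be illustrated with a heavily simplified Python sketch. It is not Shazam’s implementation: it uses a naive discrete Fourier transform on a synthetic two-tone signal, keeps only the single loudest frequency per time window, and every name in it (`magnitude_spectrum`, `peak_frequencies`, `tone`) is made up for this example.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive discrete Fourier transform; returns magnitudes for the lower half of the bins."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n // 2)
    ]


def peak_frequencies(samples, sample_rate, window=64):
    """Split the signal into windows and return the loudest frequency (Hz) in each."""
    peaks = []
    for start in range(0, len(samples) - window + 1, window):
        spectrum = magnitude_spectrum(samples[start:start + window])
        loudest_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
        peaks.append(loudest_bin * sample_rate / window)
    return peaks


def tone(frequency, length, sample_rate):
    """Generate `length` samples of a pure sine tone."""
    return [math.sin(2 * math.pi * frequency * t / sample_rate) for t in range(length)]


# Synthetic "snippet": a 100 Hz tone followed by a 200 Hz tone, sampled at 800 Hz.
SAMPLE_RATE = 800
snippet = tone(100, 64, SAMPLE_RATE) + tone(200, 64, SAMPLE_RATE)

signature = peak_frequencies(snippet, SAMPLE_RATE)
print(signature)
```

The resulting list of peak frequencies over time is a crude stand-in for a track signature, which a real system would then reduce to features and compare across many tracks.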
CamFind detects image contents and suggests relevant search results
As I’ve already mentioned, machine learning is vital to image recognition software. Without the ability to analyze vast data sets and become increasingly accurate with every result, the sheer amount of human time and effort it would take to manually tag and describe every single image uploaded to the internet would be insane. Google and Microsoft both use image recognition services to let you upload images which can then be automatically assessed and tagged for you, but CamFind does something a little more interesting. Instead of just analyzing uploaded images, CamFind is an app which lets you take a photo with your phone and have it instantly analyzed.
Rather than having to mess around with uploading it to Google or Microsoft, CamFind analyzes whatever photo you take, then searches the internet for results related to it. So, you could do anything from looking up a dog breed you can’t remember the name of to ordering your own box of Cards Against Humanity after playing it with friends.
Sure, Google and Microsoft’s image recognition services have more power and wider applications, but business use doesn’t have to be the sole focus of machine learning.
YouTube automatically disables ads on “inappropriate” content
YouTube has had trouble being profitable from day one, largely because they allow everyone to upload whatever they want. As such, they have to focus more on maintaining a good relationship with big advertisers who want to show their ads on videos. This becomes a problem when said videos aren’t curated or monitored (which, for a long time, they weren’t). The website has grown so large and so much video footage is uploaded to it (300 hours of new footage every minute) that it’s not feasible to employ enough people to sift through and approve or reject them before they go live. This meant that ads could be shown on videos that advertisers didn’t want their brand associated with (e.g. extremist content).
The solution YouTube came up with (or at least one of them) was to create a machine learning system that automatically detected when a video wasn’t suitable for advertisements. If flagged, your video would have advertisements disabled entirely.
Creators were also allowed to request a manual review of their video if they thought the system was incorrect in flagging it. In theory, this would limit the damage caused to creators’ revenue streams during the transition and initial learning period (where mistakes are more common).
Basically, YouTube created a system to automatically scan videos and judge whether they were suitable to show ads on. This system was supposed to learn from every video analyzed to become more accurate as time went by, thus appeasing both content creators and advertisers.
Unfortunately, this case shows why the hard reality of implementing machine learning can be much more difficult than just plugging in a formula. While the system was given data to work through beforehand, it was released on the site with little forewarning and (seemingly) far too small an initial sample data set. This meant that the model wasn’t yet ready to deal with all of the nuances associated with YouTube’s billions of hours of content, falsely flagging thousands of perfectly good videos as unsuitable and removing the ads placed on them.
The system of manual reviews didn’t work either, since the team was too small to deal with video reviews fast enough. The seven days it would usually take for a manual review meant that the majority of views (and thus ad revenue) for a video would have already been and gone. This led to many content creators having significant drops in revenue (some as high as 99%) and some leaving the platform entirely. Aside from losing face, this is a problem for YouTube as a platform because they rely on user-generated content to form their business.
I won’t linger on this any longer, but take it as a warning that (despite its benefits) you should carefully consider the wider context of how your system will be used and who it affects. If you don’t have enough data to put into your machine learning model to make it accurate before fully deploying it, try instead using someone else’s data to train it or even deploying it in small chunks to limit any damage caused by mistakes.
Machine learning for business holds huge potential for the future
Despite its drawbacks and pitfalls, machine learning will only continue to grow and become more valuable as a practice. Its uses are almost infinite (as long as there’s data to assess) and the benefits of having an automatically improving model are too good to pass up. Plus, the longer we use machine learning in business, the more accurate any application of it will become. Aside from the data and power requirements, the only way is up! As long as the machines don’t learn too much, that is. We don’t quite want the Terminator knocking on our door just yet.