

Artificial Intelligence and Machine Learning in Industry

Perspectives from Leading Practitioners

David Beyer

Beijing • Boston • Farnham • Sebastopol • Tokyo


Artificial Intelligence and Machine Learning in Industry by David Beyer Copyright © 2017 O’Reilly Media Inc. All rights reserved. Printed in the United States of America. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com/safari). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editor: Shannon Cutt
Production Editor: Kristen Brown
Proofreader: Kristen Brown
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest

March 2017: First Edition

Revision History for the First Edition
2017-03-20: First Release

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Artificial Intelligence and Machine Learning in Industry, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-491-98388-1 [LSI]


Table of Contents

Artificial Intelligence and Machine Learning in Industry
  Michael Osborne: Automation, Simplification, and Meeting the Technology Halfway
  Arjun Singh: Helping Students Learn with Machine Learning
  Jake Heller: The Future of Legal Practice
  Aaron Kimball: Intelligent Microbes
  Bryce Meredig: The Periodic Table as Training Data
  Erik Andrejko: Transforming Agriculture with Machine Learning



Artificial Intelligence and Machine Learning in Industry

Just as the dust started to settle in the aftermath of Google’s stunning victory for artificial intelligence in the game of Go, researchers at Carnegie Mellon University kicked it up once more by defeating humans in poker. In so challenging the human métier, these and other breakthroughs in speech, reasoning, and vision startle as much as they impress. Taken together, they suggest a new normal of rapid and sustained progress.

In time, the excitement of research segues to its application, setting once-elegant abstractions against commercial realities. Already, a growing number of businesses centered on AI are stress-testing society, challenging its basic assumptions about labor and the economy. In a recent report, McKinsey projects that about half of today’s work could be automated by 2055. While similar studies may disagree on specifics, precision in this case matters less than the accuracy of their consensus: automation will fundamentally reshape work and, by extension, industry.

The adoption of AI is, in a word, uneven. The choppy history of other major technology shifts, from the steam engine to IT, would suggest as much. The broader sweep of AI and its economic, social, and political influence will face growing scrutiny from scholars and policymakers alike. This report hopes to add to this discussion through interviews with the entrepreneurs and executives on the front lines of AI, machine learning, and industry.

To begin, Michael Osborne situates the report in its historical and economic context, drawing on his research at Oxford.


Arjun Singh follows with a discussion of AI in education; Jake Heller takes us on a tour of machine learning and the law; Aaron Kimball illuminates the otherwise hidden world of microbes and their commercial use; Bryce Meredig describes the application of machine learning to materials and the periodic table; and finally, Erik Andrejko discusses his work at Climate Corporation and the role machine learning plays in farming and agriculture.

Michael Osborne: Automation, Simplification, and Meeting the Technology Halfway

Michael Osborne is the Dyson Associate Professor in Machine Learning at the University of Oxford. He is a faculty member of the Oxford-Man Institute of Quantitative Finance and the codirector of the Oxford Martin Programme on Technology and Employment.

Key Takeaways

1. The current wave of automation is of a piece with previous technological revolutions in the social and political debate it garners. Yet its potential impact on labor markets—and by extension, society—breaks with precedent.
2. The burden of automation will initially fall on the shoulders of the least skilled. The accelerating pace of research in machine learning and robotics, however, suggests that automation’s reach will usurp functions of ever higher skill and complexity.
3. Jobs more immune to automation possess some mix of skills that require social intelligence, creativity, and manual dexterity.
4. The varying degree to which firms can redesign tasks, jobs, and processes to take advantage of automation is an important, if often overlooked, driver of automation trends.

Let’s begin with your background.

I’m an engineer by background with a focus in machine learning. My academic career to date has focused largely on designing algorithms that, in one form or another, automate human work. At heart, they wrest decision-making away from people. Such algorithms can operate well beyond human capacity, examining, for example, billions of data points apiece in search of an anomalous signal.



This technical background, in time, offered a segue to my current work, exploring the societal consequences of automating the preserve of human activity. Around 2013, I connected with the economist Carl Frey, which led in turn to a joint paper and a renewed focus—using machine learning itself as a tool to understand machine learning as a crucial driver in industry and society at large.

Can you provide some context on the history of labor and its relationship with automation?

Our assertion that work is increasingly vulnerable to automation draws fierce pushback, and for a reason: the historical antecedents to our claim have largely been proven false. Any labor decline from breakthroughs in automation has been consistently offset by a range of new employment. So what’s different this time?

We believe the profound wave of machine learning currently sweeping through society will replace cognitive work, much as the Industrial Revolution of the 18th and 19th centuries replaced its manual analog. As machines grow increasingly adept at automating cognitive labor, the human métier correspondingly declines.

The present shift need not reprise the historical pattern, in which humans redistributed to other work. As others have noted, a better analogy invokes not humans themselves, but their equestrian companion. Imagine, if you will, that you are a horse in the early 1900s. Despite breathtaking revolutions in technology over the previous hundred years (e.g., the telegraph overtaking the Pony Express, and railroads cannibalizing horse-powered travel), you might be feeling pretty happy about your prospects. In fact, the US horse population continued to increase, approximately sixfold, between 1840 and 1900. Your confidence in future job opportunities might begin to seem like an idée fixe: equine labor is in some fundamental way resistant to automation.

Such confidence would soon crumble under its own weight. By 1950, the US equine population had declined to 10% of its 1900 level. Society had crossed a Rubicon of sorts, beyond which machines could outdo horses in every relevant dimension. Our work examines how this scenario might unfold with human instead of equine labor. Humans, for example, may do better at very high-level emotional interactions. Yet it seems unlikely that such a skill (or others like it) will find sufficient demand to maintain full employment.



This isn’t a conclusive prediction, but rather a plausible outcome worthy of our attention.

Assuming machines really do crowd out human labor, what aspects of work are at risk?

This is the vital question. Even absent the worst-case scenario, moderate perturbations in the labor market can lead to major upheaval in society. Our work suggests the automation burden rests most heavily upon the shoulders of the least skilled—a tragic outcome considering the difficulty of retraining. We contend that new jobs will emerge from the dust of automation, but they might be a shadow of their former selves. 21st-century work, by and large, may not match the skill mix and volume needed for a healthy replacement rate. In the absence of decisive education reform, a growing list of occupations (e.g., truck drivers, auditors, clerks in various retail settings—to name just a few) will fail to keep up.

The workforce dislocation might permanently disenfranchise a meaningful swath of society, setting them adrift in an economy without demand for their time and skill. This stands as one of the key points we hope to convey to policymakers: these trends in automation pose a real risk to already widening wealth inequality.

In the coming decades, which are the “safe” jobs more immune to automation?

We found three loose groupings of skills that offer some degree of protection from automation. The first of these is creativity: the ability to generate novel ideas still remains generally out of reach for machines. The second is social intelligence. While algorithms can interact with humans via chatbots, for example, they still fall short at higher-level social functions (e.g., negotiation or persuasion). The final of these three guardrails, so to speak, centers on manual dexterity—unstructured physical interaction with the world. This is fairly difficult to automate even today. The upshot of our work is that jobs without at least one of the above bottlenecks face material risk of automation.



How far along is the AI research community in tackling these “bottlenecks”? And which of the three do you think will be the first to succumb to machines?

If I were to judge merely by technical progress, I think we’ll see advances in manipulation first, social intelligence second, and creativity third. Advances in robotics continue to enable improved object manipulation in obstructed environments. As it relates to social intelligence, we’ve seen the reemergence of chatbots and algorithms posting meaningful marks on the Turing Test. Finally, creativity itself has found expression in machines over the past couple of years, such as the DeepDream algorithms that can “paint” in a number of artistic styles.

While research continues to shatter our expectations of the possible, the technologies with the most immediate impact trace back to older work. In terms of jobs, cutting-edge research matters less than the evolving nature of work itself: what matters more is the means by which jobs, and by extension industry, can be remodeled to exploit state-of-the-art machines. It’s less about new technology, and more a question of redesigning jobs to suit the technology already at hand.

Can you elaborate?

Consider the typing pool of the 1950s, in which groups of workers were arrayed to take dictation and handle other miscellaneous tasks. These occupations now seem but a distant memory. You might attribute the demise of the typing pool to the invention of the word processor, but word processors alone were insufficient as a drop-in replacement. Firms eventually realized that while typing pools covered a wide range of tasks, their cost outweighed the benefit relative to the alternative (that is, whittling down the task of handling documents to a degree that employees could manage themselves). This key rearchitecture made the typing pool obsolete.

Which industries and which categories of labor will experience the biggest impact from automation?

In a paper published in 2013, we described a novel approach to estimating the probability of computerisation for 702 occupations using a Gaussian process classifier. Our work drew heavily from O*NET data from the Department of Labor and involved some degree of hand labeling. In the final analysis, we found that 47% of the US labor market faces the risk of automation.
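To make the approach concrete, here is a minimal sketch of how such a probability-of-computerisation classifier could be set up with scikit-learn. The feature names, occupations, and labels are hypothetical stand-ins for the O*NET variables and hand-labeled training set the paper describes; this illustrates the technique rather than reproducing the study.

```python
# Minimal sketch: fit a Gaussian process classifier on a handful of
# hand-labeled occupations, then estimate the probability of computerisation
# for unlabeled ones. Features and labels are hypothetical stand-ins.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

# Each row: scores for the three "bottleneck" skill groups, scaled to [0, 1]:
# [manual dexterity, social intelligence, creativity]
X_labeled = np.array([
    [0.2, 0.1, 0.1],   # e.g., a telemarketer-like occupation
    [0.3, 0.2, 0.2],   # e.g., a clerk-like occupation
    [0.6, 0.9, 0.7],   # e.g., a therapist-like occupation
    [0.5, 0.7, 0.9],   # e.g., a choreographer-like occupation
])
y_labeled = np.array([1, 1, 0, 0])  # 1 = hand-labeled as automatable

model = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0))
model.fit(X_labeled, y_labeled)

# Score occupations that were not hand-labeled.
X_unlabeled = np.array([[0.4, 0.3, 0.2], [0.7, 0.8, 0.6]])
print(model.predict_proba(X_unlabeled)[:, 1])  # probability of computerisation
```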



Over a twenty-year horizon, we found the accommodation and food services industries to be particularly high-risk; 87% of their current employment faces the real threat of automation. As an example, restaurants like Chili’s are replacing some of the tasks performed by their waitstaff with tablets. At the same time, booking websites like Airbnb portend profound shifts in the accommodations space. In the UK alone, we’ve seen employment for travel agents drop by 50% in the last decade or so.

In the case of the transportation and warehousing industry, 75% of employment is at risk—from forklift operators to hospital porters. The transportation of goods already commonly occurs in highly structured environments. For example, Amazon recently acquired Kiva Systems, which astutely recognized that their robots don’t need to fully solve the SLAM (simultaneous localization and mapping) problem. Instead, the robots can make effective use of barcodes strategically placed on the warehouse floor for guidance, leaving humans the more complex task of removing items from shelves. The robot, for its part, simply moves the entire shelving unit as required. These robots reduce, but don’t fully replace, human labor. But like a thousand cuts, such reductions add up over time.

If the biggest impact will be for jobs that are more amenable to being restructured, how do we gauge the “restructurability” of a given task?

Toward this end, we’re exploring automation around primary healthcare delivery in the UK. Automation in healthcare is both urgent and complicated: rising costs combined with the specter of budget cuts demand some degree of automation. Our work includes ethnographic surveys and other primary research. In interviewing front-line staff, we seek to understand their views and interactions with technology, as well as opportunities for efficiency.

Assessing the restructurability of a job requires a narrow aperture and a nuanced understanding of the given occupation. That said, even if a task can be automated, following through requires navigating a web of stakeholders and norms.



In the case of healthcare, for example, GP and patient associations chafe at certain kinds of automation. The barriers, in other words, are many.

What are the most exciting directions you expect your research and that of your peers to take in the next five or so years?

Within machine learning, most recent advances have occurred within supervised learning tasks, requiring algorithms to be explicitly taught (structured) tasks. I expect that in the next five or so years, we’ll begin to make more progress on the more challenging problems within unsupervised learning, in which an agent must infer properties of the world from raw observations of it; and in active and reinforcement learning, in which an agent is able to request new data so as to optimally inform itself about the world. These latter modes are much more closely akin to how humans learn, and offer the most exciting prospects for artificial learning agents.

Arjun Singh: Helping Students Learn with Machine Learning

Arjun is the cofounder and CEO of Gradescope, which he built while a teaching assistant at UC Berkeley (BS EECS ’06, PhD CS ’16). He worked under Pieter Abbeel on robotics and computer vision research, including autonomous helicopter control, laundry-folding robots, and robotic perception. As a six-time TA, Arjun enjoyed working on educational technology to improve the experience of his students. In 2012, he worked to integrate Berkeley’s homegrown MOOC platform into edX, and was also the head TA for CS188x, one of Berkeley’s first MOOCs.

Key Takeaways

1. With educational material increasingly digitized, the application of machine learning can benefit students and teachers alike, whether through intelligent and automated grading, personalized learning, or other promising approaches.
2. Gradescope, one such startup in the quickly growing edtech space, uses recent advancements in computer vision and deep learning (e.g., LSTMs) to help teachers grade assignments more efficiently.



3. Gradescope and its peers are building toward a world in which students receive instant feedback and adaptive educational content designed around their skill and understanding.

Let’s start with your background.

I’m originally from the Las Vegas area. I completed my undergrad in EE & CS at Berkeley, as well as my PhD, where I focused on robotics research under Pieter Abbeel.

How did Gradescope come about? What was the motivation?

I was a teaching assistant at Berkeley for a graduate course in artificial intelligence a number of times. Each time, I faced renewed frustration with the grading process. It required a grading party of ten to fifteen graduate students all huddled around a table for ten or more hours—exhausting for everyone involved. To save time, we tried scanning work and grading it online, in lieu of pen and paper. The move online eliminated some of the more tedious aspects of grading (e.g., adding up scores, writing the same thing over and over, flipping pages, and so on). This minor reform reduced the total time burden and, as an added bonus, made it harder for students to cheat. With digitization behind us, we can get to the more interesting business of applying machine learning as an aid to real automation. We’ve been hard at work on exactly that over the past few months.

Can you provide context and history for the application of machine learning in the education world?

At the moment, one of the most widely deployed examples in the field is likely automated essay scoring for standardized testing. These technologies extract features from the student writing and use standard classifiers to predict the score a human would give the essay. Features can include word length, word count, words per sentence, spelling, and grammar quality (similar to what you might find in Microsoft Word). More sophisticated approaches might review how a particular sentence parses (i.e., do the words fit together in a reasonable way?).
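As a rough illustration of that kind of feature-based scoring (and not any particular vendor’s system), the sketch below extracts a few shallow features from an essay and fits a standard classifier on hypothetical human-scored examples.

```python
# Illustrative sketch only: pull a few shallow features from an essay and fit
# a standard classifier on human-scored examples. Essays and scores are invented.
import re
import numpy as np
from sklearn.linear_model import LogisticRegression

def essay_features(text):
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    avg_word_len = float(np.mean([len(w) for w in words])) if words else 0.0
    words_per_sentence = len(words) / max(len(sentences), 1)
    return [len(words), avg_word_len, words_per_sentence]

essays = [
    "Short essay.",
    "A much longer essay with several sentences. It develops an argument carefully.",
]
human_scores = [0, 1]  # e.g., fail / pass assigned by human graders

clf = LogisticRegression().fit([essay_features(e) for e in essays], human_scores)
print(clf.predict([essay_features("Another unseen essay to score.")]))
```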



WriteLab is the most advanced essay feedback system I’m aware of. They have a very sophisticated system; it is focused not on scoring, but on essay improvement. Overall, the most widely deployed systems tend to be the least sophisticated. They work sufficiently well to suit their use case, yet no one would mistake their output for that of a human grader.

On another front, we’re seeing progress toward scaling human assessment in the evaluation of computer code. A common approach in this vein involves clustering similar student responses together, typically at the function and part-of-function level (on the order of 3–10 lines of code). Rather than providing feedback to each student piecemeal, the grader can comment on such a cluster once, which then fans out feedback to the relevant students. This approach has been applied by turns to a number of domains. Powergrading, a project from Microsoft Research, is a notable example. It first learns a similarity metric for short-answer questions from labeled data. Next, the system places responses into groups and subgroups, allowing instructors to evaluate them all at once.

Outside of grading, personalized learning and intelligent tutoring represent another important thread. The problem has elicited a number of different solutions, but common to them all is the goal of fully understanding a student’s skills and the knowledge required for any given question. By understanding this dynamic, an intelligent tutoring system can guide students down a path of materials and questions, constantly updating its estimate of the student’s mastery.

How does Gradescope work, and what’s the science behind it?

Machine learning will power our soon-to-be-released “assisted grading” feature. The key insight underlying this feature is that students generally provide a bounded set of answers to a given question (e.g., one thousand students might answer a question in fifteen ways total). The grading assistant thus allows instructors to grade only these fifteen unique responses, rather than a full scan of the entire thousand. As a very simple example, imagine the algebra question: “What is x if 50 - x = 30?” Perhaps 800 students supply the correct answer of “20.” However, 150 students might make a mistake with the minus sign and respond with “80,” and the other 50 students supply an assortment of other answers. Rather than cycling through 800 times, the instructor can mark all the correct answers at once. Furthermore, each incorrect response can still be addressed individually. As a result, the grader can allot partial credit and supply appropriate feedback for each response.
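A minimal sketch of that grouping-and-fan-out idea follows, with made-up student submissions and a hypothetical rubric; it is meant only to show how grading each distinct response once can propagate to every matching student, not how Gradescope implements it.

```python
# Minimal sketch: bucket short answers by a normalized form so the instructor
# marks each distinct response once and the grade fans out to every student.
from collections import defaultdict

def normalize(answer):
    return answer.strip().lower().replace(" ", "")

submissions = {"alice": "20", "bob": " 20", "carol": "80", "dan": "-20"}

groups = defaultdict(list)
for student, answer in submissions.items():
    groups[normalize(answer)].append(student)

# The instructor grades each unique response once (points are hypothetical)...
rubric = {"20": 2, "80": 1, "-20": 0}

# ...and the mark fans out to every student in the corresponding group.
grades = {s: rubric.get(key, 0) for key, students in groups.items() for s in students}
print(grades)  # {'alice': 2, 'bob': 2, 'carol': 1, 'dan': 0}
```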



Broadly speaking, Gradescope focuses on handwritten work, which follows directly from our use cases—in-class exams and complex homework assignments. By breaking away from digitized output, we unmoored ourselves from the mainstay of research in our field. As a result, our current efforts draw heavily on computer vision and handwriting recognition.

The first version of assisted grading is designed around short-answer questions (i.e., at most a few words, and short math question/answer pairs). We rely on deep learning, and more specifically, LSTMs (long short-term memory networks), to recognize the handwritten work. LSTMs, which have recently become very popular for a wide range of problems in computer vision and speech recognition, are useful in cases involving long-term dependencies between elements of a sequence. In the handwriting case, this ends up having a big impact: handwriting is often connected together, and accuracy is greatly improved by recognizing full words at a time rather than individual letters.

Once we are equipped with a digital representation, we group the answers together. We then employ different methods for different types of problems (i.e., we treat text-based short-answer questions differently from math-based short-answer questions). We currently ask the user to tell us the type of each question, but we are developing methods to detect this automatically.

In all of our work, we pay particular attention to user trust. Grades, by their very nature, are a sensitive matter, demanding accuracy and fairness. This puts the burden on us to move as close as possible to 100% accuracy. We poured a lot of effort into the instructor/user interface, letting them quickly accept or reject our suggestions, which yields more grist for our training.
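For readers who want to picture the LSTM component, here is a toy PyTorch sketch of the general recipe: treat a handwriting image as a sequence of vertical slices, run a bidirectional LSTM over the slices, and train with a CTC loss so whole words can be recognized without per-letter segmentation. The architecture, sizes, and data here are illustrative assumptions, not Gradescope’s model.

```python
# Toy sketch of an LSTM handwriting recognizer trained with a CTC loss.
import torch
import torch.nn as nn

class HandwritingLSTM(nn.Module):
    def __init__(self, slice_height=32, hidden=128, num_chars=80):
        super().__init__()
        # num_chars + 1 outputs: character classes plus the CTC "blank" symbol.
        self.lstm = nn.LSTM(slice_height, hidden, num_layers=2,
                            bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, num_chars + 1)

    def forward(self, x):                 # x: (batch, width, slice_height)
        out, _ = self.lstm(x)
        return self.head(out)             # (batch, width, num_chars + 1)

model = HandwritingLSTM()
strips = torch.randn(4, 200, 32)          # 4 answer images, 200 slices each
log_probs = model(strips).log_softmax(-1).permute(1, 0, 2)  # (T, batch, classes)

ctc = nn.CTCLoss(blank=80)
targets = torch.randint(0, 80, (4, 12))   # dummy character transcriptions
loss = ctc(log_probs, targets,
           input_lengths=torch.full((4,), 200, dtype=torch.long),
           target_lengths=torch.full((4,), 12, dtype=torch.long))
loss.backward()
```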



What were the key machine learning challenges you were forced to address?

I’d start by noting that even though human graders don’t always achieve complete accuracy, they expect it from us. From their point of view, losing a point to an algorithmic error is unacceptable. So until we meet or surpass human-level benchmarks in grading, we will maintain a human in the loop.

Specifically with regard to machine learning, one of the early challenges we faced had to do with our handwriting algorithms. We realized that existing datasets didn’t fully meet our needs. In a certain respect, they provided the complete opposite. That is, such datasets consist of a small number of writers who produce a lot of work. Grading, on the other hand, implies a lot of writers with limited output per writer. This mainly meant having to go through and label a large amount of data. Specifically, we went through our existing bank of exam submissions and transcribed the handwriting, so that we could train the handwriting recognition system.

Going forward, what are some exciting new machine learning approaches you hope to apply with the data you’re collecting?

Above, I mostly discussed our efforts in automating grading and freeing up instructor time to teach instead of grade. We hope to go beyond those efforts and apply machine learning to other parts of the learning process as well.

When instructors grade on paper, they typically use the gradebook to record a single number for each student per assignment. This means they inadvertently discard a lot of very valuable data (namely, the reason behind every point earned by every student on every question in a course). Instead, with Gradescope, they develop a digital rubric, a list of grading criteria with associated point values. As they grade student work, they select a subset of the rubric items to associate with each student’s answer. The software proceeds to compute a grade from the chosen rubric items.

Because grading now happens digitally, we store the previously lost data. This data allows us to generate a far clearer picture of the student’s understanding. And unlike most other digital platforms that exclude partial credit, our rubric approach leads to a more nuanced understanding of student progress.

More broadly, if you had a magic wand and could apply machine learning at will, how might you reshape education?

First, we could instantly grade all work, which would confer a number of benefits. Students would get instant feedback, enabling them to practice with an endless supply of questions per topic. Instructors, for their part, would spend zero time grading, freed up, instead, to focus on teaching and student interaction.

Second, we’d have a clear picture of the state of every student’s understanding, data that would guide teachers at the student level.



They’d know, for example, whether a student might need additional practice to brush up on a particular concept. Stepping back, in such a world, teachers could measure the effectiveness of their lessons and tune them in response to incoming student data.

We’re already hard at work on the first problem—instant feedback for every type of question. The second problem is a bit more challenging, for a few reasons. The first reason has to do with data. Publishers, for example, often want to measure the effectiveness of their educational content—e.g., does this particular textbook chapter help students learn effectively? To measure this, the publisher would need to know not only whether a student read the chapter, but when: was it before the assessment or after?

In the same vein, publishers might want to know whether the student also attended the lecture or watched the corresponding video, in addition to the telemetry from the digital textbook. As publishers increasingly switch to digital content, they’re building better data profiles in turn. Some publishers even embed multiple-choice quizzes to get a coarse measure of student understanding. Most, however, don’t close the full feedback loop with access to the written exam.

Further complicating matters is the lack of an accepted taxonomy of concepts. For instance, how do you map one instructor’s teaching to another for the sake of comparison? Perhaps their terminology differs. In other words, if both instructors tagged every lecture, handout, and homework and exam question with the associated concepts, there’s little guarantee they’ll match. As a matter of fact, you can essentially guarantee that they won’t. Recently, researchers have focused on ways to automatically tag questions with the required “skills” (or concepts), and at the same time, measure the student’s ability vis-à-vis said skills. This is currently one of the biggest and most challenging areas of research in the space.
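One simple way to picture the skill-tagging and mastery-estimation problem is sketched below: questions are tagged with the skills they exercise, and a per-skill estimate is updated as graded responses arrive. The tags, responses, and the simple Beta-posterior update are hypothetical choices made for illustration, not a description of any specific research system.

```python
# Illustrative sketch: per-skill mastery estimates updated from graded responses.
from collections import defaultdict

question_skills = {
    "q1": ["fractions"],
    "q2": ["fractions", "word-problems"],
    "q3": ["word-problems"],
}
responses = [("q1", True), ("q2", False), ("q3", True)]  # (question, correct?)

mastery = defaultdict(lambda: [1, 1])  # Beta(1, 1) prior: [correct, incorrect]
for question, correct in responses:
    for skill in question_skills[question]:
        mastery[skill][0 if correct else 1] += 1

for skill, (a, b) in mastery.items():
    print(f"{skill}: estimated mastery {a / (a + b):.2f}")
```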

Jake Heller: The Future of Legal Practice

Jake Heller is the founder and CEO of Casetext. Previously, he was president of the Stanford Law Review and a managing editor of the Stanford Law & Policy Review; worked in the White House Counsel’s Office and the Massachusetts Governor’s Office of Legal Counsel; clerked on the First Circuit Court of Appeals; and was a litigation associate at Ropes & Gray LLP.



Key Takeaways

1. Contrary to common wisdom, the legal field has been at the vanguard of technology adoption, including a head start on machine learning applications.
2. The vast corpus of legal knowledge lends itself to machine learning. Meaning in the field pivots on language and a complex network of cases, opinions, briefs, and so on.
3. Casetext seeks to, first, “free” legal knowledge by exposing case law that is otherwise paywalled, building a unique dataset as lawyers interact with and annotate its content. Its latest product, CARA, uses machine learning to enrich any legal document with relevant research.
4. The intersection of machine learning and law raises a bigger, albeit philosophical, question: is the law computable?

Why don’t we start with your background and how you got to Casetext?

I grew up in Silicon Valley, and have been coding from an early age. My dad founded an internet company in our garage in ’94. As his company grew, I worked alongside him on weekends, nights, and summers, giving me a head start on web technology. And for the longest time, I envisioned a career in programming. My passion for code gave way to a keen interest in policy, through high school speech and debate, and then, in turn, to law. At Stanford Law School, I applied myself primarily to questions of technology law and policy.

After graduating and a few years into my legal practice, I kept returning to an idea that I started thinking about in law school. I knew, both in theory and from personal experience, that lawyers spend 20–30% of their time engaged in legal research—the task of locating precisely the correct precedent, statutes, and regulations to help you win your case (as a junior lawyer, it clocked in closer to 70%). Finding the relevant precedent means combing through a rather large search space—more than ten million cases containing over one hundred million pages of text. Locating the precedent further entails situating it in the broader context of the law.



Even a precedent seemingly on point might prove misleading—perhaps the case was overturned or deemed irrelevant to your circumstances. It’s a maddeningly difficult process. To help accomplish this task, lawyers typically subscribe to legal databases, such as LexisNexis, Westlaw, and others. The price tag bites: $100 for a single search and $20 for access to a single document. Combined, companies in this space generate over ten billion dollars in annual revenue.

The key idea behind Casetext is to remove the paywall to legal knowledge while building a business through premium research technologies. The incumbents are so expensive because their models required them to hire thousands of human editors, which drove up costs. Instead, Casetext builds on a mix of data science, natural language processing, and crowdsourcing, accomplishing with twenty engineers what the others produce with twenty thousand editors.

What’s the history of machine intelligence in the law?

Common wisdom suggests that lawyers are somehow tech-backward or late adopters. This couldn’t be further from the truth. The law and its practitioners have been on the vanguard of technology. If you should happen to visit the Computer History Museum in Mountain View, California, you would come across a LexisNexis terminal, an erstwhile internet for legal research well before the modern public web.

The same is true of machine learning: lawyers made early use of its power. It found its initial purchase in “e-discovery.” Short for electronic discovery, e-discovery technologies help lawyers during the “discovery” phase of litigation, where each side shares their records with the other. E-discovery software allows lawyers to parse records at large scale, as they hunt for a “smoking gun” email or an incriminating presentation. Developed almost a decade ago, e-discovery tools began including “predictive coding,” which marries machine learning and e-discovery. Human reviewers (i.e., attorneys) generate the initial flurry of training data by annotating documents for relevance. The first batch of, say, ten thousand human-reviewed documents enables prediction on the remaining million.

Machine learning, in addition, has long played a supporting role in legal research. Bloomberg, Westlaw, and LexisNexis, for their part, use it to improve their search results based on clicks, views, and other behavior data from their clients. It has also been used to aid their human editors in predicting whether a new case overturns an older one.
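The predictive-coding workflow described above can be pictured with a small sketch: train a text classifier on attorney-labeled seed documents, then rank the unreviewed corpus by predicted relevance. The documents and labels below are invented, and the model choice (TF-IDF plus logistic regression) is an assumption made for illustration, not any vendor’s implementation.

```python
# Hedged sketch of predictive coding: rank unreviewed documents by relevance.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

reviewed_docs = [
    "Email discussing the disputed contract terms with the supplier.",
    "Cafeteria menu for the week of March 3rd.",
    "Presentation summarizing pricing strategy for the acquisition.",
    "Office holiday party invitation.",
]
labels = [1, 0, 1, 0]  # 1 = attorney marked the document as relevant

vectorizer = TfidfVectorizer()
clf = LogisticRegression().fit(vectorizer.fit_transform(reviewed_docs), labels)

# Score unreviewed documents so the likeliest "smoking guns" are read first.
unreviewed = ["Draft memo on contract pricing terms.", "Parking garage notice."]
scores = clf.predict_proba(vectorizer.transform(unreviewed))[:, 1]
for doc, score in sorted(zip(unreviewed, scores), key=lambda pair: -pair[1]):
    print(f"{score:.2f}  {doc}")
```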



More recently, what explains its widening use in the legal field?

On the heels of the economic collapse of 2008, a wave of budget-constrained clients balked at paying for research technology. Law firms went from “recovering” (i.e., billing their clients for) nearly all their technology costs to settling for recovering only a fraction of those costs. Worse yet, clients began to demand alternative fee structures, shaking an industry accustomed to billable hours. Market pressure gave way to soul searching, as law firms aspired to new efficiencies, abetted by novel technology. Casetext and other companies in our category have given firms the capacity to do, as the saying goes, more with less.

Can you elaborate on Casetext’s approach to machine learning?

Casetext provides a full-fledged legal research system that lets you do a lot of the work you might do on LexisNexis, for example. You can search the law, and you can read, bookmark, and cite cases. Such functions all make use of machine learning to some extent. Our most rigorous application of machine learning, however, powers CARA, our “Case Analysis Research Assistant.”

CARA enables attorneys to simply drag and drop in a legal document, and in seconds, it will read and understand that document, returning relevant research the attorney has so far missed. This enables attorneys to make sure they aren’t missing any key precedents as part of their research, or to catch opposing counsel leaving something critical out. CARA can even help predict the thrust of the opposition’s brief, given what you’ve worked on so far.

CARA is powered in large part by a machine learning model that is trained on the network of citations between legal cases. Legal writing, by its nature, derives from precedent—so all cases, articles, and briefs will vigorously cite prior precedent. Our citation network draws from the massive collection of articles, cases, and briefs in our database. Using machine learning, we began to discern the relationships that shape the law.

When considering what sorts of techniques to use, we explored the classical information retrieval literature. From the literature, we picked up an insight we could apply to legal research specifically: if every case that cites cases A, B, and C also cites cases D and E, anybody citing A, B, and C would be wise to further consider D and E.
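That co-citation insight can be expressed in a few lines. The sketch below, with a made-up citation network and hypothetical case names, simply counts which cases co-occur with everything a brief already cites; it illustrates the idea, not CARA’s model.

```python
# Toy sketch: suggest cases that frequently co-occur with a brief's citations.
from collections import Counter

citation_network = {           # each prior case -> the set of cases it cites
    "case_1": {"A", "B", "C", "D", "E"},
    "case_2": {"A", "B", "C", "D"},
    "case_3": {"A", "C", "E"},
    "case_4": {"B", "F"},
}

def suggest(cited_so_far, top_n=3):
    counts = Counter()
    for cites in citation_network.values():
        if cited_so_far <= cites:         # this case cites everything in the brief
            counts.update(cites - cited_so_far)
    return [case for case, _ in counts.most_common(top_n)]

print(suggest({"A", "B", "C"}))  # e.g. ['D', 'E']: candidates worth reviewing
```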



We developed a similar concept that we internally call “citation bundles”—bundles of cases that we know to be related because they are so often cited together. However, we quickly realized that this method was not nuanced enough for the precise activity of legal research. Over time, we began to fold more and more factors into the machine learning core of CARA, including which topics the brief covers (derived from latent semantic analysis), a relative weighting of the importance of each citation relationship, the recency with which certain cases have been cited, and literally over a hundred other factors.

At what point can you predict rulings ahead of time based on the content of the briefs or the presiding judge, for example?

Our goal is not to predict how courts will rule, which is more the concern of journalists and investors. (As far as investors go, the Supreme Court adjudicates at most eighty cases a year, the vast majority of which aren’t financially actionable—so the idea of using machine learning to predict how the Supreme Court will decide cases is often a waste of money for investors.)

Further, forecasting judicial decisions is rather complicated. Machines, for example, have a hard time grokking ideology or a particular political philosophy that might underpin a decision. No machine could have predicted Bush v. Gore, for example, which, to no one’s surprise, came down to the political party of each individual justice. There’s an outfit called FantasySCOTUS, where you can predict how the Supreme Court will decide cases, and the crowd’s predictions are compared to a machine learning model; to date, human players best the AI systems at predicting Supreme Court rulings every time.

To venture into the abstract, do you think the law is fully computable in theory? In the sense that we need human judges at all?

That’s a great question, and I’m going to stake a controversial position compared to many in legal tech. A lot of people in my field believe that should machines be taught to understand the language and content of the law, then jurisprudence and other legal matters can be restated as a series of “if, then” statements.

I disagree. First, the law is sometimes intentionally ambiguous.



Congress, for example, might purposefully engineer ambiguity into legislation for any number of reasons, including that they don’t want to take the political heat for a controversial position and are content to “let things get worked out in the courts.”

Furthermore, at their core, most precedents turn on a somewhat subjective weighing of factors and standards. Take fair use, for example. Determining whether a use of someone’s copyrighted work is permissible “fair use” involves a four-part test, and none of these factors approaches mathematical certainty. Each decision, when considering these factors, may vary from judge to judge.

All that said, I will admit that the more well-defined areas of the law (e.g., tax) might in fact be computable. Nonetheless, I think the most interesting, difficult parts of law are by their design uncomputable and will likely remain so for some time.

Aaron Kimball: Intelligent Microbes

Since 2014, Aaron Kimball has been the CTO of Zymergen, a company sitting at the intersection of machine learning, automation, and biology, serving customers in the industrial chemical market. He previously worked at Cloudera on both Apache Hadoop and Sqoop, after which he cofounded the retail analytics company WibiData. He completed bachelor’s and master’s degrees in computer science at Cornell and the University of Washington, respectively.

Key Takeaways

1. Throughout history, humans have co-opted microbes to serve a variety of applications, from artisanal fermentation to large-scale industrial output.
2. Despite advances in genomic sciences, progress toward a full understanding of microbes and their commercial use has remained slow, gated by human intuition in a lengthy process of trial and error.
3. With the goal of improving microbe performance, Zymergen replaces human intuition with machine learning and manual lab work with automation. In so doing, it can better discern the complex relationship between microbial DNA and its associated traits, making it possible to produce better microbes that can then be applied to the production of various industrial molecules.



Let’s start with your background.

I’ve been developing software professionally since 2008, initially at Cloudera, where I was the first engineer. During my tenure, I developed Apache Sqoop, in addition to working on Apache Hadoop. In 2010, I cofounded WibiData, where we focused on developing big data applications for the retail sector. I first met Josh, Jed, and Zach at Zymergen in early 2014 and have been CTO here ever since. My education centered entirely on computer science, first at Cornell University, followed by my time at the University of Washington. Through my work at Zymergen, I’ve gotten an on-the-job education in biology.

Tell us a bit about the history of microbes and their application to industry.

Microbes have been used for millennia. Although of course they didn’t know it at the time, microbes are what ancient civilizations used to brew beer and wine—and what we still use today. In the 19th century, Louis Pasteur discovered the throughline that connects microbes to fermentation and to disease, revolutionizing the field of microbiology in the process. Over time, human civilization wielded the microbe as a versatile chemical factory: artisanal applications eventually gave way to industrial-scale ventures. In addition to drugs like penicillin, many of the vitamins we buy for nutrition and the ingredients commonly found on food labels are produced microbially. The industrial chemicals space, as it is known, amounts to a multibillion-dollar global industry.

Despite their enormous value, microbes still conceal their share of secrets. Decades have passed since Watson and Crick’s famous discovery, yet human understanding of biology remains limited. Zymergen hopes to change that. By combining recent advances in genomics, sequencing, automation, and machine learning, we developed a platform that efficiently and systematically explores the vast search space for biology. As a result, many of the world’s biggest challenges—feeding a growing global population, security, climate change, and materials for safer cars—are poised to find solutions in biology.

To understand why this hasn’t been done before, consider the magnitude of biology’s complexity. Scientists estimate there are 10^81 stars in the universe. The number is beyond comprehension, and yet it pales in comparison to the 10^13,000 ways in which the genes of even the simplest biological system—a microbe—can be altered.



To say the space exceeds human intuition and intellectual capacity would be an understatement. Nevertheless, as is common with most scientific discovery, the history of microbe engineering is one of testing human-generated ideas. As a result, progress has been marked by epiphanies, the fruit of individual breakthroughs and error-prone lab work, rather than predictable engineering.

Where do Zymergen and its approach fit in?

Zymergen takes a data-driven approach, replacing manual lab work with automation, and human-generated hypotheses with machine learning algorithms. The result is a platform that comprehensively and systematically explores the search space for biology, generating a growing library of data that delivers results with increasing efficiency. Just as Google PageRank replaced human-curated search engines, Zymergen recognizes that scientists cannot efficiently query the search space for microbes using intuition alone. Today, Zymergen uses its platform to engineer microbes to make industrial chemicals.

While the platform can support work in other living systems, microbes offer a useful starting point. As with wine and beer, microbes naturally convert feedstock into an end product. Industry has long sought to repurpose this capability for its own products, achieving only limited success. Even though microbes are currently used for the production of some commodity goods, the cost and complexity have limited their application on a broader scale.

What are the key problems Zymergen hopes to solve?

Zymergen focuses on improving the performance of microbes: generating higher yield, productivity, or other metabolic measures when applied to production of a particular molecule. Since we recognize the limitations of human intuition, algorithms and machine learning enable us to drive each successive experiment more effectively and efficiently.

Today, human understanding of the relationship between genotype (i.e., the DNA) and phenotype (i.e., the associated trait) can be described as tenuous at best.



For instance, changes to parts of the genome previously believed to be unrelated to direct metabolism (the sequential conversion of feedstock to intermediate molecules and then to the desired output molecule) can have a material effect on the actual metabolic processes. Complicating matters, we have yet to develop a deterministic model of DNA-to-phenotype expression that can be captured in software. Efforts to simulate the impact of DNA changes on the cell lack sufficient fidelity to predict whether the modified DNA will have a positive, adverse, or null impact.

Improving a phenotype requires us to, in effect, reprise evolution on a vastly accelerated schedule. To that end, we run numerous trials with subtle changes in each variation of DNA we design. In the process, we use high-throughput capabilities to employ a more systematic approach to proposing and testing genomic edits, not one based in ad hoc human intuition. Our approach further serves to identify the consequences of changes to nonobvious parts of the genome.

The initial improvements come with a cost, however. Even as scientists identify performance increases, reaping marginal improvement becomes increasingly difficult. Manipulating the narrow set of genes directly involved in the relevant metabolic pathway can deliver a lot of the early improvements. To counter diminishing returns, additional marginal improvements require exploring the outer reaches of our current understanding, the so-called “dark space” of the genome, where the correlation between DNA and function remains a mystery.

Using machine learning, we can extract patterns, invisible to the human eye, from a large number of trials. For example, one of the problems we need to address is that of “consolidation.” Examining the aftermath of a series of trials will reveal some percentage as “hits”: genetic edits found to improve cell function over the common parent strain. But it wouldn’t be useful to construct a new “master strain” through a simple union of the various hits, because such amalgamation is more complex than it might seem: specific subsets of changes are additive in combination; others are neutral or deleterious. Addressing this problem means negotiating a combinatorially large number of distinct subsets. Which subsets of the hits do we test, knowing that creating these subsets is no easy task? We have had good success using machine learning to predict useful subsets of edits, thereby quickly narrowing the search space.
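To make the consolidation problem concrete, here is a hedged sketch of one way such subset prediction could look: encode past strain builds as binary vectors of which edits they carry, fit a regressor to their measured gains, and score candidate subsets before committing lab capacity. The edits, measurements, and model choice are synthetic assumptions, not Zymergen’s pipeline.

```python
# Hedged sketch: score candidate subsets of beneficial edits ("hits").
from itertools import combinations
import numpy as np
from sklearn.ensemble import RandomForestRegressor

edits = ["e1", "e2", "e3", "e4"]          # hypothetical beneficial edits

def encode(subset):
    return [1 if e in subset else 0 for e in edits]

# Past strain builds: which edits each carried, and the measured yield gain.
history = [({"e1"}, 1.2), ({"e2"}, 0.8), ({"e1", "e2"}, 1.1),
           ({"e3"}, 1.5), ({"e1", "e3"}, 2.4), ({"e4"}, 0.2)]
X = np.array([encode(s) for s, _ in history])
y = np.array([gain for _, gain in history])

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Score every subset of size 2-3 and keep the top candidates for wet-lab tests.
candidates = [set(c) for k in (2, 3) for c in combinations(edits, k)]
scores = model.predict(np.array([encode(c) for c in candidates]))
for subset, score in sorted(zip(candidates, scores), key=lambda p: -p[1])[:3]:
    print(sorted(subset), round(float(score), 2))
```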



Finally, we rely on machine learning to improve the Zymergen process writ large, efficiently managing factory capacity. Specifically, we use it to plan which trials and DNA variations to run in our production factory, and, moreover, in what priority.

What are some of the key challenges going forward from a machine learning perspective?

Interpreting DNA is like staring at machine code for a computer system with little understanding of the instruction set and a missing reference manual. Biologists have made progress toward understanding the syntax of DNA—identifying markers that delimit where genes and other functional elements begin and end—but a complete picture of their complex interaction remains elusive. In some well-studied species, such as E. coli, researchers have created detailed reference genomes with “annotations” that describe the instruction set of delimiters, genes, promoters, and other functional elements, along with links to known functional impacts. In the microbial species used in industry, on the other hand, we’re stuck with rather sparse annotation sets. This means that despite a high-resolution understanding of the genome, we lack the rich feature set needed for training.

We currently fill this gap by combining genome data with lab trial performance test data. Unfortunately, this approach has its limitations. The results are often binary: editing gene X is or isn’t useful. Yet this simple binary answer obscures a more complex reality of causation. In some cases, by using “ladders” of changes with titrated strengths of effect, we can construct more linear gradients of cause and effect. Nevertheless, to extrapolate from there requires a new theory about the genome, which itself demands, at the very least, higher-fidelity annotations.

Beyond the genome, our wet lab factory in itself resembles a complex system. Perturbing and testing genomic change chains together an intricate process with hundreds of steps, the effects of which remain invisible to the human eye. In effect, optimizing our factory for higher throughput, lower cost, load balancing, and effectiveness represents a challenge at the intersection of operations research and biology. Understanding causality in process changes requires a model that combines the lab environment in detail (as a function of sensors in addition to process specifications) with test outcomes.



Figuring out the true tolerance of process changes on various steps will come only through repeated trials and statistical analysis. In this effort, we lean on process modeling techniques, such as root-cause analysis. Over time, we develop control charts for our processes and use them to model the effect of process deviations on outcomes. Predicting whether a process change will be beneficial or not can save us time and money, simultaneously adding to the precision of our work.
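As a small illustration of the control-chart idea, the sketch below estimates a baseline mean and spread for one process measurement and flags new runs that fall outside three-sigma limits. The numbers and the choice of measurement are invented for illustration.

```python
# Small illustration of a control chart: flag runs outside three-sigma limits.
import numpy as np

baseline_runs = np.array([10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.1, 9.7])
mean, sigma = baseline_runs.mean(), baseline_runs.std(ddof=1)
upper, lower = mean + 3 * sigma, mean - 3 * sigma

new_runs = [10.0, 10.4, 11.9, 9.6]   # e.g., a per-batch process measurement
for i, value in enumerate(new_runs, start=1):
    status = "in control" if lower <= value <= upper else "out of control"
    print(f"run {i}: {value:.1f} ({status})")
```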

Bryce Meredig: The Periodic Table as Training Data

Bryce Meredig is cofounder and Chief Science Officer of Citrine Informatics. Citrine’s software aggregates and analyzes large volumes of scientific data to help customers rapidly invent and manufacture new materials.

Key Takeaways

1. Materials science concerns the understanding and control of matter and its properties toward practical applications. Work in the field has traditionally counted on expertise in the sciences (e.g., solid-state physics, chemistry, and so on).
2. Materials problems can be reframed as data problems. By describing materials and their properties as scalars and vectors, researchers and businesses can now compute what they used to intuit. Citrine offers a platform to help companies solve materials problems using data and machine learning.
3. Academics and researchers are actively exploring the exciting intersection of machine learning and physical theory. Citrine, for its part, is actively working on incorporating physical priors into its materials models.

Tell us a bit about yourself.

I’m a materials scientist by background. I studied the subject as a Stanford undergrad, followed by a PhD at Northwestern, where I focused on materials informatics—the materials analogue of bioinformatics. The idea is to apply machine learning to materials data to derive new knowledge, purely from the data itself. With my science education complete, I returned to Stanford for an MBA, which dovetailed with my current work as cofounder of Citrine Informatics.



For those unfamiliar, can you provide some background on materials science?

Materials scientists are concerned with understanding and controlling the properties of matter, and specifically, matter with practical applications. Energy materials are one such application, the domain of batteries, photovoltaics, and so on. A materials scientist asks, “What materials should we use to make such products achieve our desired performance?”

Historically, materials scientists train in the fundamental concepts of chemistry and physics, especially solid-state physics. This broad education centers on the observation that materials phenomena span a wide range of length scales. Our field considers matter and its behavior all the way down to the atomic scale. How do atoms, for example, sort into crystalline structures or molecules, and how do these arrangements influence materials properties? Then, we must also be concerned with the scale of everyday life: we have to manage the behavior of the materials that make up an aircraft, for instance. Commercial applications naturally follow. Companies like Boeing and Airbus might worry about fashioning a collection of materials into a fuselage. Alternatively, a company like Intel needs to drive certain performance and cost improvements in its semiconductors, a materials endeavor that spans the entire periodic table.

Prior to Citrine, what role did modern data science and machine learning play in the field?

The domains of production and manufacturing led the way in the use of data and statistics, through methods such as Six Sigma. More fundamental R&D, in contrast, has traditionally been driven by domain knowledge and expert intuition.

Tell us more about Citrine’s approach, in contrast.

Citrine’s aim is to function as the materials-centric data and analysis platform across the manufacturing sector. Any company with materials- or chemistry-intensive products can use Citrine to improve its decision-making process around materials, ranging from materials selection for highly tailored use cases to novel material design and discovery.



As an example, a chemical giant like Dow might want to design a polymer or molecule with novel properties in a predictable and rational manner. Then, with the molecule design in hand, the challenge shifts to scaling the manufacturing process, which Citrine supports as well.

By and large, our customers arrive with specific design goals in mind. They might want to design a lighter vehicle that achieves better fuel economy. Ultimately, these high-level goals are reducible to the lower-level chemistry and physics of materials science. Put differently, the existing vehicle comprises certain alloys with particular mechanical properties. The question thus becomes how to substitute lighter alternatives for the existing materials without sacrificing the crucial mechanical properties. Complicating matters for our customers, a typical industrial application must juggle dozens of targets and constraints. In some cases, some of the necessary materials need to be created de novo or modified from existing materials, in which case Citrine helps identify the right mix of materials for the task.

Where does machine learning fit in?

Suppose an aerospace company requires materials for hypersonic flight. Most materials, needless to say, have not been rigorously subjected to hypersonic conditions. Machine learning is ideally suited to closing this knowledge gap. By learning how a few select materials behave under hypersonic conditions, it can provide guidance about others without such tests.

In reality, many of the problems our customers solve using Citrine boil down to regression problems. Frequently, the materials properties in question can be represented as scalars or vectors. Let’s consider thermoelectric materials to illustrate this point. Thermoelectrics generate a voltage when subjected to a temperature gradient, or vice versa. One commercial application involves harvesting waste heat—the kind you’d find, for example, in the engine compartment of a car. If, in principle, you could design a cheap, easy-to-manufacture, and high-efficiency thermoelectric material, carmakers would adopt it in a heartbeat. They could use it to help capture engine heat and redirect it to charging the battery, rather than losing it to dissipation.

Typically, these two properties strongly correlate, but we might want to find unusual materials that decouple the two effects. Posed this way, the problem provides a great test case for machine learning. We can train models to predict the key properties of a thermoelectric—screening for the rare materials that combine low thermal conductivity with high electrical conductivity. The alternative approach would require a huge number of experiments, largely driven by physical intuition, which is the usual strategy in materials design.

Machine learning can be useful when applied to physics-based simulations as well. Such simulations are often directionally useful, but simulating real-world effects poses a challenge: we cannot easily compose a set of tidy equations that completely describes the relevant effects. In practice, we have found that the outputs of physics-based simulations can serve as useful inputs to machine learning models, alongside experimental observations.

Can you elaborate on the limitations of such physical simulations?

One standard tool in physics-based simulation of materials is density functional theory (DFT). DFT is an electronic structure method that solves quantum mechanical equations to predict materials properties. However, real-world materials behavior happens at the scale of meters, not individual atoms (i.e., one tenth of one billionth of a meter), and most DFT simulations are constrained to treating a few hundred or few thousand atoms. In another limitation, DFT models materials at zero temperature; we, of course, do not use materials at absolute zero.

As a result of these constraints, we rely on considerable approximations and extrapolations when applying DFT to practical materials problems. Machine learning, in contrast, can directly model real-world materials behavior—that is, if we have the training data, which, of course, may originate in DFT. Machine learning can, for example, incorporate the finite-temperature effects seen in experiments but left out of DFT. In principle, one would expect the ground-state electronic structure of a material to correlate in some way with that material's properties at any temperature, and machine learning can exploit this correlation directly, in a black-box fashion.
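
The screening workflow described above is, at its core, a pair of regression models plus a filter over candidate materials. The sketch below is a minimal illustration of that idea, not Citrine's actual pipeline; the file names, feature columns, model choice, and thresholds are assumptions, and the training data could mix experimental measurements with DFT-derived quantities.

# Illustrative sketch only: train one regressor per property, then screen a
# pool of candidate materials for the rare low-kappa / high-sigma combination.
# File names, feature columns, and cutoffs are placeholders for this example.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical training table: one row per material, with numeric
# composition-derived features plus the two measured target properties.
data = pd.read_csv("thermoelectric_training_data.csv")
targets = ["thermal_conductivity", "electrical_conductivity"]
features = data.drop(columns=targets)

kappa_model = RandomForestRegressor(n_estimators=500, random_state=0)
sigma_model = RandomForestRegressor(n_estimators=500, random_state=0)
kappa_model.fit(features, data["thermal_conductivity"])
sigma_model.fit(features, data["electrical_conductivity"])

# Candidate pool: unmeasured materials with the same feature columns.
candidates = pd.read_csv("candidate_materials.csv")
pred_kappa = kappa_model.predict(candidates)
pred_sigma = sigma_model.predict(candidates)

# Flag materials predicted to decouple the two properties (placeholder units:
# W/m-K for thermal conductivity, S/m for electrical conductivity).
shortlist = candidates[(pred_kappa < 1.0) & (pred_sigma > 1e5)]
print(f"{len(shortlist)} candidates flagged for experimental follow-up")

In practice one would cross-validate and quantify uncertainty before trusting such a shortlist, but the structure (learn each property, then filter) mirrors the regression framing described above.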

Can machine learning inform or potentially modify the underlying physical equations?

Tremendous potential exists at the intersection of physical theory and machine learning, and the promise of the field is matched by growing interest in it. Citrine is actively working on integrating physics-based priors into our machine learning framework. Academic researchers making exciting contributions in these areas include Alán Aspuru-Guzik at Harvard, Kieron Burke at UC Irvine, and Klaus-Robert Müller at TU Berlin. As it happens, a recent workshop at UCLA's well-known Institute for Pure and Applied Mathematics was entitled "Understanding Many-Particle Systems with Machine Learning."

Can you share an example success in the application of machine learning in your domain?

Citrine's platform has played an important role in several successful, significantly accelerated materials development efforts. While our customers deem the industrial examples proprietary, we have published peer-reviewed case studies with academic collaborators.

In one example, published in APL Materials, Citrine accurately predicted the properties of a new thermoelectric material that was subsequently synthesized and characterized experimentally. The material is noteworthy because it contains a high proportion of metallic elements yet exhibits surprising thermoelectric performance, in spite of this traditional disadvantage. We further demonstrated that this material draws from unexpected regions of the periodic table while sharing other characteristics with well-known thermoelectrics. This outcome demonstrates the power of applying machine learning to the physical sciences: it builds upon and reinforces known principles while helping domain experts generate completely novel ideas.

Separately, in a Chemistry of Materials article, we established that Citrine's machine learning can reliably anticipate chemical systems that will form a particular atomic arrangement known as the Heusler crystal structure. This paper is important for two main reasons:

First, it shows that machine learning can dramatically improve the yield (i.e., the efficiency) of compound discovery, in some cases from perhaps a few percent to over 80%. In fact, we reported the successful experimental synthesis of 12 novel Heusler compounds in this single paper alone; the entire scientific community typically discovers about 50 Heuslers per year. Second, it reveals a case in which a common chemical rule of thumb breaks down in a way that machine learning, informed by more features, can address.

What are you most excited about in the next, say, five years at the intersection of materials science and machine learning?

I subscribe to the Fourth Paradigm idea, which holds that data-driven science is an emerging paradigm of scientific inquiry, complementary to theory, computational simulation, and experiment. I believe that folks in our field should apply data science and machine learning to extract insights from the data collected over previous decades. We finally have the computational horsepower and the algorithms needed to unlock the mysteries trapped within that data all this time. I expect we'll see a step-function improvement in the state of the art in materials science, driven by data-intensive methods. We're already starting to see this with our current customers: discoveries are falling out of the data, just as we hypothesized they would.
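
To make the Heusler result above concrete, the following is a minimal sketch of how such a discovery-yield improvement can work in practice: a classifier ranks unexplored chemical systems by their predicted probability of forming the Heusler structure, so experiments concentrate on the most promising candidates. The file names, columns, featurization, and model choice are assumptions for illustration, not the published workflow.

# Illustrative sketch: rank candidate chemical systems by predicted probability
# of forming a Heusler structure, so synthesis effort goes to likely hits.
# All file names, columns, and the featurization are hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

known = pd.read_csv("labeled_compositions.csv")   # hypothetical labeled data
X = known.drop(columns=["forms_heusler"])         # numeric composition features
y = known["forms_heusler"]                        # 1 if known to form a Heusler

clf = GradientBoostingClassifier(random_state=0).fit(X, y)

# Score unexplored candidate systems described by the same feature columns.
candidates = pd.read_csv("unexplored_compositions.csv")
candidates["p_heusler"] = clf.predict_proba(candidates)[:, 1]

# Prioritize the top-ranked systems for synthesis; concentrating experiments
# on high-probability candidates is what raises the discovery yield.
to_synthesize = candidates.sort_values("p_heusler", ascending=False).head(20)
print(to_synthesize["p_heusler"].round(2).tolist())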

Erik Andrejko: Transforming Agriculture with Machine Learning

Erik Andrejko leads the data science and research organization at The Climate Corporation, applying large-scale statistical machine learning and data science to solve problems in a variety of domains, including climatology, agronomic modeling, and geospatial applications. He has a PhD in Mathematics from the University of Wisconsin-Madison.

Key Takeaways

1. Farming generates huge data volumes: a single crop in a single country during a single season can produce upward of 150 billion observations.

2. Agriculture presents a unique set of challenges for machine learning, including the limited number of growing seasons and complex causality chains.

3. Data science, as applied to agriculture, demands a multidisciplinary approach that melds mechanistic models with more common machine learning practice.

Let's start with your background.

I started out in computer science. In the process, I realized I really enjoy mathematics, ultimately earning a PhD in pure math. These days, I work with statistics, machine learning, and data science at The Climate Corporation, where I build data products to help farmers make more informed decisions and improve their productivity and sustainability.

Could you describe the history of data and analysis in agriculture?

A wealth of what we now consider modern statistics originated in efforts to design and analyze agricultural experiments, tracing back to Fisher and the Rothamsted Experimental Station in the 1920s. That tradition continues today. Agriculture now generates more and more data from a variety of sources: farming equipment, satellite images, weather, and so on. Farmers feel inundated with data, but lack a systematic way to turn it into actionable insights.

Walk us through the kinds of data you work with at The Climate Corporation.

We work with data that touches on everything from the soil to the atmosphere, at very large volume. Consider that a single crop in a single country during a single season can generate upward of 150 billion observations, when combining the environment, farmer practices, and measurements of the crop. Understanding how the environment impacts agriculture requires understanding the impact of each of these data sources, layer by layer. Generally, it's not enough to merely highlight correlation; to succeed, we need to apply sophisticated statistical and machine learning techniques to tease out root causes.

What kinds of experts comprise these teams?

Our organization is composed of teams representing a broad range of domain scientists who work directly with specialists in machine learning and statistics. Some of these teams, staffed by meteorologists and atmospheric physicists, focus on building models to understand atmospheric processes. Others work in different specialties: soil hydrology, soil physics, and biogeochemistry, for example. In addition, we employ experts with a deep understanding of the crop, such as crop physiologists and crop breeders. Finally, given the data sources involved, we need scientists with experience analyzing physical and biological systems through remote sensing platforms, including satellites and aircraft.

How does machine learning in agriculture differ from more typical applications?

I believe there are two meaningful differences. The first is the complexity of the domain and, by extension, the limited number of available trials. A farmer will typically encounter about 40 growing seasons over their career. This limitation practically constrains what farmers can learn over time, which in turn presents a unique challenge for machine learning: there just isn't much opportunity to experiment with new approaches. In this context, the risk associated with a bad decision multiplies. Getting it wrong even once can have material consequences for a farmer's business and prospects.

Making matters worse, while we often develop and back-test models, we cannot rely on online testing the way our peers in other settings do. For example, we can't easily select models by running a short-term A/B test online, as it generally takes a full growing season to observe the predicted outcomes. Instead, we rely on carefully designed field trials, which means actually venturing out into the field to collect data for our tests.

Getting back to your original question, I believe the other key difference is the visibility and transparency required of our models, compared with other applications. Consumer web models, for example, tend to be driven primarily by the data and rarely expose their logic to the end consumer. In contrast, our models come preloaded, so to speak, with a wealth of background knowledge derived from multiple scientific disciplines. Furthermore, when presenting a recommendation to the farmer, we can expose not only the recommended course of action but also the reasoning behind it.

We find that farmers are excellent at using model-based thinking to consider counterfactuals—e.g., what would happen if I planted a crop like last year's, but with some permutation in the weather, such as a wetter early season? By accurately capturing conditional dependence, these models gain the trust of farmers with significant domain expertise.

Can you elaborate on how your models bake in scientific knowledge?

Consider, for example, how precipitation affects crop yield. Certainly, more precipitation is better than a drought, but excess water is equally harmful. We know that precipitation affects crop growth and development by interacting with the soil through a variety of latent processes that ultimately link precipitation to yield. This means a machine learning model that uses precipitation and crop yield data can be enriched by engineering features from these well-understood scientific principles.

Sometimes we express such scientific principles statistically. For example, precipitation can be modeled as a stochastic process. We might postulate that precipitation affects a latent moisture state informed by a relevant soil measurement, and we can encode this background information directly into the structure of the model itself. Models built this way can typically be trained much more quickly for a fixed amount of data.

Can you elaborate on the kinds of machine learning approaches you use in your work?

I'd preface this by noting that we make extensive use of a wide variety of models, and typically build composite models that draw from multiple approaches. This includes a large number of mechanistic models, which are relatively uncommon in most machine learning shops. Mechanistic models attempt to capture the underlying physical understanding, or the underlying causation, using a mathematical approach; modeling the motion of billiard balls on a pool table using Newton's laws of motion is a good example.

Most people are familiar with atmospheric models, which model atmospheric physics to help forecasters explain how two neighboring regions of the atmosphere, with different temperatures and pressures, will interact. These models, in other words, help predict what will happen at the boundary layer between the two. We use mechanistic models like this in cases where the underlying physics is well understood, as it is for systems like the atmosphere and the soil.

The mechanistic models are almost always coupled with a machine-learned model, either on the input side (e.g., a mechanistic soil moisture model that consumes a statistical rainfall forecast) or on the output side (e.g., a mechanistic soil moisture model used as an input feature to a machine-learned crop yield model). Coupling different types of models together helps us balance the predictive power of models (what is likely to occur) with their explanatory power (why something could occur), achieving a good trade-off between the two.

Where do you see machine learning in agriculture going over the next five years?

I think we are at an inflection point in the application of machine learning across many fields. In particular, across several domains, machine-learned models are performing at the level of human experts—and often exceeding it. This is particularly true of the application of deep neural networks to things like image classification in pathology and natural language translation.

As I see it, the challenge for machine learning in agriculture over the next five years will be to adapt and apply these rapidly evolving techniques to the domain. In particular, it will be essential to connect the large volumes of time-series environmental data, along with geospatial data collected from proximal and remote machinery, with the genetic data that describes the crop. The techniques that have shown promise with these classes of data individually will need to be adapted and extended to work with them in an integrative context. I anticipate that we will see a number of successful applications of deep neural networks to this and similar types of problems.
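
As a closing illustration of the output-side coupling Andrejko describes, here is a minimal sketch in which a toy mechanistic soil "bucket" water-balance model produces a feature that feeds a machine-learned crop yield model. The bucket model, parameter values, synthetic data, and feature names are illustrative assumptions, not The Climate Corporation's production models.

# Illustrative sketch of coupling a mechanistic model with a machine-learned one.
# All parameters and data below are synthetic placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def soil_moisture_series(daily_rain_mm, et_mm_per_day=4.0, capacity_mm=150.0):
    # Toy bucket model: add rainfall, subtract evapotranspiration, and keep
    # the moisture level between zero and field capacity.
    moisture, series = capacity_mm / 2.0, []
    for rain in daily_rain_mm:
        moisture = min(max(moisture + rain - et_mm_per_day, 0.0), capacity_mm)
        series.append(moisture)
    return np.array(series)

rng = np.random.default_rng(0)
n_fields, season_days = 200, 120

# Synthetic inputs: daily rainfall per field plus one static soil property.
rainfall = rng.gamma(shape=0.5, scale=8.0, size=(n_fields, season_days))
clay_fraction = rng.uniform(0.1, 0.5, size=n_fields)

# Mechanistic step: summarize each field's simulated soil moisture trajectory.
mean_moisture = np.array([soil_moisture_series(r).mean() for r in rainfall])

# Machine-learned step: predict yield from the mechanistic feature plus raw data.
X = np.column_stack([mean_moisture, rainfall.sum(axis=1), clay_fraction])
y = 8.0 + 0.02 * mean_moisture - 5.0 * clay_fraction + rng.normal(0.0, 0.5, n_fields)
model = GradientBoostingRegressor(random_state=0).fit(X, y)
print("In-sample R^2:", round(model.score(X, y), 3))

The same structure runs in the other direction for input-side coupling: a statistical rainfall forecast would feed the mechanistic soil moisture model rather than the yield model.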

About the Author

David Beyer is an investor with Amplify Partners, an early-stage VC focused on the next generation of technical founders solving over-the-horizon problems for the enterprise. He began his career in technology as the cofounder and CEO of Chartio.com, a pioneering provider of cloud-based data visualization and analytics.

