CAN MACHINES WRITE?
What narrative science will mean for you by Larry G. Anderson
12 DEPAUW MAGAZINE SUMMER 2013
A computer didn’t write this story. But it could have, and you likely wouldn’t realize that the author was a machine. The technology exists. In fact, there’s a good chance you’ve already read machine-generated stories and were unaware they were not written by a human. (Two sample stories are printed on page 18. Can you tell whether they were written by a human or computer?) Nathan D. Nichols ’05 is on the leading edge of making that happen. By applying a journalistic approach to an unprecedented and humanly unmanageable amount of data, Nichols and his fellow engineers and editors at a company called Narrative Science are harnessing computers to explain what the data means in a way we can understand: stories. Welcome to the emerging field of “narrative science,” which involves embodying in a computer what human journalists know how to do. The result is a machine that automatically generates stories that make sense to people reading them. The number of potential stories far
exceeds human time and staff available to create them. Ironically, in the process, computers are actually making the massive amounts of information more personalized for us, Nichols says. For example, Narrative Science currently generates more than a half-million stories about Little League games, complete with names of the young players and their accomplishments in the stories and headlines. Computers also can sift through complex financial data, select the important items and explain them in an understandable narrative specific to an individual’s investment situation. “The world is being rendered ever more vividly in data, and with a little programming, journalists can transform bits to beats,” says Mark Hansen, director of the David and Helen Gurley Brown Institute for Media Innovation and professor of journalism at Columbia University Graduate School of Journalism. “Narrative Science takes
this creative partnership a step further. They seem to be extending the reach of the computer from story creation, from source of inspiration, straight through to presentation.” As lead architect at Narrative Science, Nichols designs software systems and works closely with the editors to improve the quality of the written product. Nichols and a colleague, Andrew Paley, developed an interface for Quill, the company’s platform, and Nichols focuses his efforts on improving its authoring capabilities. Most of the stories Narrative Science generates are in sports, where the company got its start, and in finance, which has large amounts of data readily available. Clients include the Big Ten Network and Forbes. “In sports we cover previews, recaps, in-game tweets and league-wide updates across baseball, softball, hockey, golf and soccer for teams ranging from Little League to professionals,” Nichols says.
“We write a variety of financial stories, including company-based profiles, quarterly earning recaps and immediate stories based off what’s happening in the stock market right now.”
All you have to do is read
Narrative Science’s purpose is expressed well in a statement on its website: With spreadsheets you have to calculate. With visualizations you have to interpret. But with stories, all you have to do is read. Kristian J. Hammond, the company’s chief technology officer and director of the Information Lab in Northwestern University’s computer science department, explains it this way: “The reality is that businesses have gathered data for years and are struggling with how to pull out insight from that data. And once that insight is pulled out, how can it be explained to people who might not understand data analytics, data in raw form, or even data as a series of images, such as graphs and charts? “But most people can read. If you can explain what’s going on with the data, suddenly you can take a process that used to involve people going to a computer and having to understand the computer in its terms, and you turn that into a process by which the computer comes to us and articulates what is going on with the data so that we can better understand our world.” That’s what convinced Nichols to commit to help develop the company. Narrative Science grew out of a system called StatsMonkey, which wrote baseball game recaps from box-score data. Graduate students at Northwestern’s Medill School of Journalism developed StatsMonkey in a course taught by Hammond, Nichols’ graduate school adviser.
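To make the idea of writing a recap from box-score data concrete, here is a minimal sketch in Python. It is not StatsMonkey's or Narrative Science's actual method, which the article does not describe. The `BatterLine` structure, the function names, the verb thresholds and the sample teams and players are all invented for illustration. The sketch only shows the general pattern: pick an angle from the numbers, then fill a sentence template.

```python
from dataclasses import dataclass


@dataclass
class BatterLine:
    """One hitter's line from a box score (hypothetical structure)."""
    name: str
    team: str
    at_bats: int
    hits: int
    rbi: int
    runs: int


def pick_standout(batters):
    """Choose the hitter to feature: most hits, then RBI, then runs."""
    return max(batters, key=lambda b: (b.hits, b.rbi, b.runs))


def write_recap(home, away, home_runs, away_runs, batters):
    """Turn a final score plus batting lines into a two-sentence recap."""
    if home_runs == away_runs:
        raise ValueError("this sketch only handles games with a winner")
    if home_runs > away_runs:
        winner, loser, high, low = home, away, home_runs, away_runs
    else:
        winner, loser, high, low = away, home, away_runs, home_runs

    # Angle selection: a close game reads differently from a blowout.
    # The thresholds (2 and 5 runs) are arbitrary illustrative choices.
    margin = high - low
    verb = "edged" if margin <= 2 else "beat" if margin <= 5 else "routed"
    lead = f"{winner} {verb} {loser} {high}-{low}."

    star = pick_standout(batters)
    detail = (
        f"{star.name} went {star.hits}-for-{star.at_bats} "
        f"with {star.rbi} RBI for {star.team}."
    )
    return f"{lead} {detail}"


# Made-up box score, not taken from any real game.
batters = [
    BatterLine("Rivera", "Owls", 3, 2, 1, 1),
    BatterLine("Chen", "Hawks", 4, 1, 2, 1),
]
print(write_recap("Hawks", "Owls", 6, 4, batters))
```

Even in a toy like this, the interesting decisions are editorial rather than technical: which player to feature and which verb fits the margin. A production system would presumably weigh many more angles than this sketch does.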
While Nichols was a graduate student in computer science at Northwestern, he put together a personalized news show with animated anchors and text-to-speech recorded voices. His work was similar to StatsMonkey and Narrative Science in that computers automatically generated interesting and useful news content. But they were all created without numerical data – the kind of data that’s the bread-and-butter of stories that Narrative Science generates now. Nichols’ work on machine-generated content influenced development of StatsMonkey.
Nichols was completing a doctoral degree around the time that the StatsMonkey concept was expanded into a new company devoted to doing similar story generation not only in sports, but also across other fields, including finance, real estate, hospitality services and healthcare. “My original plan,” Nichols says from the company’s offices overlooking Lake Michigan in downtown Chicago, “was to work at Narrative Science as a contractor during the summer and maintain a consulting position when I began as an associate professor at Northwestern in August. After just two weeks with Narrative Science, I realized that it was a once-in-a-lifetime possibility to be involved with a startup so interesting and important that early in its stage.” He signed on full time. The company has grown from eight employees to 45, including some in New York City.
NATHAN D. NICHOLS ’05
Lead architect, Narrative Science
Major: computer science
Minors: mathematics and English literature
Rector Scholar
Member of Honor Scholar and Science Research Fellows programs
DePauw activities
• Served in several positions at Delta Chi fraternity
Further education
• Northwestern University, M.S. and Ph.D. degrees in computer science
• Member of Information Lab in the Cognitive Systems (Artificial Intelligence) division of Northwestern’s computer science department
Related experiences
• Designed and taught a course in Mobile Application Design and Development at Northwestern University
• Designed and taught a half-course in Web Programming for Journalists in the Medill School of Journalism at Northwestern
Outside interests
• Performed improv through Second City
Favorite DePauw professors
• Doing research with David A. Berque, professor of computer science, convinced him to attend graduate school in computer science
• Andrea E. Sununu, Raymond W. Pence Professor of English, was unrelenting in what she expected from his writing and thinking
Favorite DePauw memories
• Symposium style of Honor Scholar classes he took
• Hanging out on the porch of Delta Chi fraternity
Data designed for you
Nearly everyone recognizes that the amount of information accumulated in computers today exceeds our capability to deal with it. Companies don’t have sufficient budgets or journalists to turn it into understandable stories. What better solution than to harness the same machine to search the data, select only the important items and explain them to us in terms we can understand easily? “We’re just buried in data,” says Jon Fortt ’98, technology correspondent for CNBC in San Jose, Calif. “Computers are constantly churning this stuff out: weather data, market data, sales data, surveillance data. To be useful, someone or something has to sift through all the bits of information and decide what’s actually going to be useful, what’s going to help us make better decisions. “Software that can handle language like a person will continue to be a hot area,” Fortt adds. “We’re only creating more information – what scientists call ‘unstructured data.’ The more digital text, audio and video we create, the more we’re going to need computers to explain to us what it means. Those explanations will come in the form of spoken responses, and sometimes in computer-generated stories.”
What really excites Nichols about narrative science is the extraordinary opportunity to personalize the data out there – to bring it to an individual level and make it relevant in a way not now possible. “Computers have helped some things scale up so much,” Nichols says. He cites the transmission of medical records between hospitals as an example. “Computers are great at talking with each other with data, but we’ve lost a lot of the human touch, and it doesn’t seem likely that there’s a way to get back with actual humans. We’re not going back to a time when your family doctor can sit with you for an hour to answer your questions and explain things. And even if doctors had time to talk in detail with every patient, they may not be familiar with the most recent research or treatment options.”
But patients deserve an explanation that they can understand, he says. “One of the things we’re trying to do, then, is to recreate that personalized experience. We’re really excited about solving the problem of how you scale humanity’s collective expertise – in healthcare, education, public policy, finance – down to the level where it’s not an academic paper, it’s not aggregate statistics; it’s a story created just for you. It’s about your health prognosis, it’s about how your child is faring in school, it’s about how well your congressperson is representing you, it’s about how your retirement account is doing. The machine is what lets us reach this scale of personalization.”
The half-million Little League and other sports stories generated for a company called GameChanger illustrate the personalization possible through the power of computers. As soon as Little League game results are posted, Narrative Science can generate stories that include players’ names and actions. The personal value for players, their family members and other participants in these sporting events is reflected in the many thank-you notes received from appreciative parents and fans.
Stories can be just as meaningful for people who want their financial information explained in understandable terms, spelling out just what market fluctuations and financial instruments mean for them personally. Those stories are already happening. Next on Nichols’ wish list at Narrative Science is to tackle the complex phenomenon of health insurance and medical/hospital billing by creating stories that clearly explain the details. “It is something that is super exciting for me and the company,” he says. “It is in our vision of democratizing a lot of that understanding.”
Human or computer? Which wrote the stories below?
Putnam City North falls to Stillwater 6-4 in spite of Brantley’s performance
Nathan Brantley did all he could to give Putnam City North a boost, but it wasn’t enough to get past Stillwater, as Putnam City North lost 6-4 in seven innings at Santa Fe on Wednesday. It was a good day at the plate for Putnam City North’s Brantley. Brantley went 2-3, drove in one and scored one run. He singled in the second and sixth innings. Hunter Heffington had an impressive outing against Stillwater’s lineup. Heffington held Stillwater hitless over 1 2/3 innings, allowed no earned runs, walked none and struck out two. The top of the second saw Putnam City North take an early lead, 1-0. Brantley singled to ignite Putnam City North’s offense. Blake Seibert singled, bringing home Heffington. Stillwater went up for good in the third, scoring three runs on a two-run double by Joe Smith and an error. Stillwater built upon its lead with three runs in the fifth. Scott Williams started the inning with a double, plating Dan Johnson and Eric Welch. That was followed up by George Segal’s double, plating Williams. Three runs in the top of the sixth helped Putnam City North close its deficit to 6-4. An RBI single by Brantley, a groundout by Heffington, and a steal of home by Brantley triggered Putnam City North’s comeback. Chuckie Lundeen struck out to end the Putnam City North threat.
Powered by Narrative Science and GameChanger. Copyright 2012. All rights reserved.
East City Senior High School
East City Senior High School, part of the Community Schools of East City district, is located in East City, Ind. The school reports enrolling 860 students in grades nine through 12, and it has 63 teachers on staff. East City Senior High School is above the state average but below the district average in terms of the percentage of students eligible for free or reduced-price lunches. On average, 43 percent of students in Indiana are eligible for free or reduced-price lunch programs, whereas 52 percent of East City Senior High School students do. At the district level, 62 percent of students are eligible. ProPublica’s analysis found that all too often, states and schools provide poor students fewer educational programs like Advanced Placement, gifted and talented programs, and advanced math and science classes. Studies have linked participation in these programs with better outcomes later in life. Our analysis uses free and reduced-price lunch to estimate poverty at schools. We based our findings on the most comprehensive data set of access to advanced classes and special programs in U.S. public schools – known as the Civil Rights Data Set – released by the U.S. Department of Education Office for Civil Rights. East City Senior High School offers 10 AP courses, and 8 percent of students participate in those classes. The school’s pass rate for AP exams is the same as the district’s, both at 23 percent. A school’s AP pass rate is determined by the number of students who both sat for AP exams and passed some or all of those exams. East City Senior High School’s enrollment rates in chemistry, physics and advanced math subject areas are 4 percent, 2 percent and 7 percent, respectively. Gifted and talented at the school has an enrollment rate of 23 percent. West City Community High School, in West City, Ind., is a lower-poverty school than East City Senior High School, with 4 percent of its students qualifying for free or reduced-price lunch. The school offers 20 AP courses, and 38 percent of students are enrolled in those classes.
Generated by Narrative Science for ProPublica.
From classroom to newsroom
Consider how parents could benefit from stories that provide individual feedback on their student’s academic performance, or that clearly explain the meaning of the student’s test scores and other information that teachers, parents or both may not have time to discuss face-to-face. Narrative Science has already worked with a journalistic organization, ProPublica, to allow parents to access stories that describe programs at their schools and compare them with other schools. ProPublica, an independent, nonprofit newsroom that produces investigative journalism in the public interest, was the first online news organization to win a Pulitzer Prize – the first ever awarded to a body of work that did not appear in print. ProPublica worked with Narrative Science to update a news application called The Opportunity Gap. Using data from the U.S. Department of Education, the app allows people to explore data about access to educational opportunities such as AP courses, gifted-and-talented programs, higher mathematics, chemistry and physics. Results, however, were displayed only graphically. Scott Klein, senior editor of news applications for ProPublica, explains, “Narrative Science created narratives, which were added to every page – 52,000 of them – that allowed a reader who was confused by the graphic, or who would rather read than look at a graphic, to read a story that described the educational opportunity at that school in comparison to others in the state.” His team at ProPublica builds software to do the work of journalism, Klein says, “to help us gather data, analyze and refine it, and present it. Think of it like how a photojournalist
uses a camera. We ask questions of the data, talk to sources and make the data tell a story. We wouldn’t want to replace human journalists because we are ourselves human journalists.”
The vision for narrative science, both the lowercase field and uppercase company, is not to eliminate journalism, but to enhance it. “Eventually, there will be, either literally or virtually, a narrative science box in every newsroom in the world,” Hammond says. “That box will be given ongoing flows of data and write stories in their language, with their analysis and their voice. They will allow journalists to spend more time working in areas where data is not yet flowing online.” The notion of what a news story is will shift, he adds. “News stories will actually be more related to us. That is, data about us will be integrated into them.
We’ll start seeing more stories that have personal connections to our lives.” For example, remember the Little League game stories that Narrative Science already produces? Hammond wants the company to eventually write stories for every Little League game in the country – and not only in English, but also in every language so each young player’s family and friends can read it anywhere in the world.
Sounding human
But can machines write? Really write, that is? It’s a paraphrase of the iconic
question first posed by the British mathematician Alan Turing more than 60 years ago in his seminal paper, “Computing Machinery and Intelligence.” The paper is generally regarded to have established Turing as the founder of computer science, and the question he asked in it was “Can machines think?” Turing decided that question could not be answered. He revised it, which led to what has been known since as the Turing test, a test of a machine’s ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human. Computers have come a long way since then. “A lot of our content is actually indistinguishable from something a human being would write under similar circumstances, and we’re actively working on ways to make them sound even more human-like,” Nichols says. “But the exciting and interesting thing for me isn’t that our system passes some kind of journalistic Turing test, but that it does it while providing genuine meaning and understanding to people.” That is, from the vast amount of data available, people need to know certain things: What will it do for me? How will health insurance premium changes affect me? How does a rezoning bill impact me? What do fluctuations in the market mean for me? “Many people are left with an
inscrutable bank statement, confusing pamphlets about high blood pressure, frustration about their child’s math grades in school. The data and expertise is there to tell these stories now and to help people understand the world around them,” Nichols says. “The problem is limited human resources; the financial adviser, doctor and teacher have too many things to do, too many other people to talk with. To me, that’s exactly where narrative science is most valuable. There is no longer a need for the adviser, doctor or teacher to explain the story and meaning in their data.” A computer can do it faster, too. While the writer of this feature story labored long over his interview notes for what
to include, a computer would determine the important data to include quickly, and this feature would have been completed much earlier. But would that story have reflected human creativity? Based on the standards Turing was working with 60 years ago, today’s computers can think, CNBC’s Fortt believes. “Turing wondered whether a human could engage in a conversation with a computer and be unsure whether
the voice belonged to a real person or not. In a lot of ways, we’re there. I mean, Watson (IBM’s famous computer) is beating people on ‘Jeopardy,’ responding to real spoken questions,” Fortt says. “I think what’s less clear is a computer’s capacity for creativity and imagination.” The latter might still require a human being. “We’ve entered an era where communication with computers is suddenly about much more than typing and printing,” Fortt says. “If you think about it, that’s mainly where we’ve been for a long time. The way we got information into a computer was through a keyboard, and the way we got information out was to print it. Now, largely driven by smartphones and the
Internet, we’re talking to computers more. “It sure would help if they could understand us better.”
Data as language
“Narrative Science’s automated representation of data as language is new and extends the creative partnership between computing and journalism in exciting ways,” Columbia University’s Hansen says. “The question might not be whether computers will replace human journalists, but rather: What does journalism look like in a world in which computers have observational and, now, expressive capabilities that can augment the practice?” If you read the two sample stories printed on page 18, you learned they were written by a computer. Could you tell? More and more, the machine is coming to us. It’s the machine doing its job to serve us better, Narrative Science’s Hammond believes. One of the ways it can serve us better is to talk in the language we use. Will machines be able to think the way we think? Probably not exactly the way we think, but in similar ways, he says. “Part of our job is to make sure the computer can communicate what it knows about the world to us. By teaching it how to reason about the things it knows about, it will become smarter.” Computers will inevitably end up being much smarter than we are, Hammond says. “A computer can reason and has access to every fact that ever was, and every document that ever was, and more information about the world than we can conceive of, but it will be able to deal with it. That will be a phenomenal world.”
Read more about Narrative Science at www.narrativescience.com.