PURA


DATA VISUALIZATION

ISSUE 6 DEC 2015

3 Model studies: Urban Heartbeat, Colors of the Street, Particles in Ocean

Paints to Pixels By Jacoba Urist

By Nick Brown




“The purpose of visualization is insight, not pictures.” -Ben Shneiderman


Contents

FEATURES
Visuals Variations
  Neverending Transformation of Data Over Time
Paints to Pixels
  Growing Field of Data-Art as the Future of Data
Why Can’t Americans Find Out What Big Data Know About Them?
  Data Privacy Should Equal Protection

WHAT’S NEW
The Benefits of Data Lakes
  Madness on manipulation
Designing Data Driven Interfaces
  Tell the story differently

RECAP
Solving Poverty Needs Data Visualization
  Data can save lives but...
Can You Tell That My Email Sounds Angry?
  Big Data is not Big Context

MODEL STUDIES
Colors of the Street
  Describes Cities through Streets’ Differences
Particles in Ocean
  Flow of the Piles of Garbage
Urban Heartbeat
  Foursquare’s Visualization of Cities



Letter from Publisher

PURA, derived from the word “plural”, is the revolutionary adjective of Big Data. As you have noticed from previous issues, Big Data has unfolded into data science as a new field of study and into data visualization as a new way of explaining, summarizing, and showing data. In the past, data were purely the information collected in scientific research and used as evidence to prove the viability of new findings. Today, as all of us have taken to heart, data keeps expanding in form, size, and usability. At PURA, my team and I have always striven, in every issue, to find new evidence of the innovative forms data takes. We are open-minded and curious about new ways of summarizing modern life. We also aspire to groundbreaking and advanced technology that helps all of us think differently. Thank you for all the support that keeps us going. We are proud and happy to keep providing ingenious approaches to interpretation.

-Angel Choong, Publisher



By Nick Brown

Digging through some old gold, Willard Cope Brinton’s 1939 book Graphic Presentation, uncovers a wealth of example graphics. It’s available to read online.



There must be 1,000 varieties of charts and graphics catalogued in the 500-page book, and despite being 75 years old, it still feels incredibly relevant today. I was particularly struck by the introduction titled Magic in Graphs. If you look closely, you’ll even find examples of arguments the blogosphere is still fond of having today, and early versions of charts that you thought were invented just yesterday.

“There is magic in graphs. The profile of a curve reveals in a flash a whole situation: the life history of an epidemic, a panic, or an era of prosperity. The curve informs the mind, awakens the imagination, convinces.” – Henry Hubbard

Hubbard closes the introduction on a forward-looking note (emphasis mine): “Wherever there are data to record, inferences to draw, or facts to tell, graphs furnish the unrivaled means whose power we are just beginning to realize and to apply.”

Clearly, Hubbard was excited about the future of graphic presentation. Surely it would bring multitudes of innovations as we established a grammar of graphics and figured out how we should best present data for consumption. While there were 1,000 chart varieties in the 1939 edition, who knows how many there would be by 2000?

Unfortunately, my friend Ed has bad news for Hubbard. After reading through Graphic Presentation he exclaimed, “I’m really shocked at how little innovation there has been in 2D data visualization in the last 80 years!”

Always the optimist, I’m here to take up the challenge. What’s new since Brinton’s book? In the world of static, 2D data visualizations, I think Ed is mostly right, with a few exceptions. That doesn’t mean we haven’t innovated though: there has been tremendous progress in and around the graphic forms that were already established in 1939. To better understand the progress in the years since Brinton’s book was published, first we need to understand how we got there. When were all these graphics invented anyway? Is it fair to expect a boom in new chart types every decade?

Data visualization’s past

It is easy to think of the basic charts and graphs of today as universally understood forms that have existed forever. But many chart types that we use every day (whether as visualization professionals or as students using the Excel chart wizard for the first time) are relatively recent inventions. Working off of this nice taxonomy of chart types, I tracked down the earliest known dates for as many visualizations as I could. As you’ll see from the timeline below, recognizable modern charts began to be used in the mid-18th century (William Playfair is famously credited with the invention of the line, bar and pie chart), and the rate of innovation was rapid through the next 100 years. The years between 1812–1855 in particular featured three of the most famous and influential visualizations of all time: Minard’s March on Moscow (a Sankey diagram before Sankey was born), Snow’s dot distribution map of the cholera epidemic in London, and Nightingale’s polar area diagram of the causes of death in the Crimean War.

MARCH ON MOSCOW Creators: Minard

POLAR AREA DIAGRAM OF THE CAUSES OF DEATH IN THE CRIMEAN WAR Creators: Nightingale

After this furious pace, the rate of innovation as measured by brand new visualizations slowed substantially. There are notable exceptions (histograms, cartograms, heatmaps, etc.) and I am purposefully trying to stick to general purpose graphics; there have been quite a few domain specific visualizations that have proven very popular, such as these information-dense circular diagrams of genomic data.

That said, rather than focus on the slowdown in the introduction of broadly used, standard chart types, let’s look at the latest era of data visualization with a wider lens. We’ve made much more progress than a count of new tabs on the Excel chart wizard would imply.

The last 75 years

The history of the last 75 years of data visualization runs in close parallel to that of many industries and fields, and it is unsurprisingly dominated by computers. The key innovations have not been in what the finished product of a data analysis looks like, but rather in how it is created and what readers can do with the chart when they see it. Computers have made data astronomically easier to gather, process, analyze and manipulate, leading to expanded data visualization capabilities. This has enabled us to do four key things with data that Brinton could have only dreamed of:

1. Visualize data in real-time. Most commonly seen in today’s omnipresent dashboard displays, real-time analysis and presentation of data is a huge step forward from the kind of slow and painstaking work required to generate a single good chart before computers. Compare the ease with which you can spin up a real-time dashboard today with Brinton’s diligent approach: graphics were so time-intensive to create that he included entire chapters on the selection of quality paper and binding techniques.

2. Show a lot of data. This chart made waves for visualizing 10 million Facebook friendships. The translucent connections between friends overlap and brighten in areas with high Facebook friendship density, producing a beautiful pseudo-population map as the output. Good luck drawing that by hand.

3. Present data in motion. While related to the raw ability to visualize a lot of data, time series data in particular benefits from the fact that it can be played back as an animation. For example, this video visualizes 24 hours of flights in North America. Motion takes this visualization to another level: it is much more compelling than simply showing a count of how many flights took off and landed at each location.

4. Allow your audience to interact with the data. There have been fantastic developments in recent years in allowing readers and consumers of data visualizations to interact with data in meaningful ways. Interactive data apps allow the reader to not only see that story, but to discover more stories waiting for them in the data set.
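For readers curious how translucent connections can “overlap and brighten” (point 2 above), here is a minimal sketch in Python. It is not the method behind the Facebook chart; the function name, the grid representation, and the blending rule are all assumptions chosen for illustration. Each line adds a little brightness to every cell it crosses, so cells crossed by many lines end up brighter.

```python
def draw_translucent_segments(segments, width, height, alpha=0.2, steps=50):
    """Accumulate brightness on a width x height grid of floats in [0, 1].

    `segments` is a list of ((x0, y0), (x1, y1)) pairs in grid coordinates.
    The blend rule below is an illustrative assumption, not the original
    chart's rendering pipeline.
    """
    grid = [[0.0] * width for _ in range(height)]
    for (x0, y0), (x1, y1) in segments:
        # Sample points along the segment and collect the cells it touches,
        # so one segment brightens each cell at most once.
        touched = set()
        for i in range(steps + 1):
            t = i / steps
            x = round(x0 + t * (x1 - x0))
            y = round(y0 + t * (y1 - y0))
            if 0 <= x < width and 0 <= y < height:
                touched.add((x, y))
        for x, y in touched:
            # Blend toward white: brightness rises with every overlapping
            # segment but never exceeds 1.0.
            grid[y][x] = 1.0 - (1.0 - grid[y][x]) * (1.0 - alpha)
    return grid
```

With this rule, a cell crossed by one line has brightness equal to `alpha`, and each additional line closes a fixed fraction of the remaining gap to full brightness, which is why dense regions glow without washing out immediately.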

Data visualization’s future

Using the kind of exploratory tools that have recently become available, you can easily recreate that analysis for your favorite actor or look at grouping on different variables. There was magic in graphs in 1939. Now, thanks to the power of computers, we have the ability to chart more data, faster, and we can engage the audience with animated or interactive visualizations. There is more magic in graphs than ever before.


MOBILITY Creators: Thomas Clever, Gert Franke



Paints to Pixels

By Jacoba Urist

A growing number of artists are using data from self-tracking apps in their pieces, showing that creative work is as much a product of its technology as of its time.


It’s unlikely Claude Monet would have been Claude Monet without the portable paint tube, which allowed him to work outside and experiment with capturing natural light. Andy Warhol wouldn’t have been Andy Warhol without the modern movie star or the mass-produced Campbell’s soup can. Art is as much a product of the technologies available to artists as it is of the sociopolitical time it was made in, and the current world is no exception. A growing community of “data artists” is creating conceptual works using information collected by mobile apps, GPS trackers, scientists, and more.

Data artists generally fall into two groups: those who work with large bodies of scientific data and those who are influenced by self-tracking. The Boston-based artist Nathalie Miebach falls into the former category: She transforms weather patterns into complex sculptures and musical scores. Similarly, David McCandless, who believes the world suffers from a “data glut,” turns military spending budgets into simple, striking diagrams. On one level, the genre aims to translate large amounts of information into some kind of aesthetic form. But a number of artists, scholars, and curators also believe that working with this data isn’t just a matter of reducing human beings to numbers, but also of achieving greater awareness of complex matters in a modern world.

Art confronts the uncertainty of human existence: Why am I alive? What makes me different from anybody else? Handprints, made some 40,000 years ago, are a common feature of Upper Paleolithic cave art, a kind of prehistoric selfie. National Geographic describes the early artists as sending a timeless message: “Like you, I am human. I am alive. I was here.” So it’s unsurprising that many data artists are responding to an increasingly data-saturated culture. After all, almost every human interaction with digital technology now generates a data point: each credit-card swipe, text, and Uber ride traces a person’s movements throughout the day.

The smartphone, as The Economist recently described, is a true personal computer, the defining innovation of the era, on par with the mechanical clock or the automobile in past centuries. In turn, there’s been what the Pew Research Center calls a self-tracking explosion, whether it’s counting the number of calories or using a mood app to glean patterns in one’s mental state. Like a fingerprint, no two people have the same data set. A couple sharing a bed follow independent sleep cycles. Friends who spend the day together count different steps; their phones connect to different IP addresses. But what’s more remarkable is the idea that within all of these numbers lies a better way of understanding ourselves. The information doesn’t just provide a broad document of a life lived in the early 21st century: It can reveal something deeper and even more essential.

One data artist who believes this is Laurie Frick, who splits her time between Austin and New York City. She came to art circuitously, after spending 20 years in tech working for HP and Compaq and co-founding a software company. Frick believes that while numbers are abstract and unapproachable, human beings respond intuitively, and emotionally, to patterns. Unlike many of her peers, Frick has no assistants. She uses self-tracking data to construct objects and large-scale installations, including one called Floating Data that’s about two stories tall and made from 60 anodized aluminum panels that represent her walking patterns. Frick used her own records, gathering steps on her Fitbit and combining them with location data from the online program OpenPaths and her iPhone’s GPS.

FRICKbits Data Art
Take back your data and turn it into art. Your data is now art on your iPhone, from artist Laurie Frick. You make the ultimate ‘data-selfie’. Don’t be a data victim, it’s your life, it’s your data... why not turn it into art?


“I drew a little track that tries to capture the experience of walking speed, and the feel of walking through a busy neighborhood near my apartment in Brooklyn,” she explained. In a series called Moodjam, Frick took thousands of Italian laminate countertop samples from a recycling center and created a series of canvases and billboard-sized murals based on her temperament. For weeks, she manually tracked her feelings, using the online diary Moodjam, which allows users to express their emotions in color patterns. The smaller Moodjam pieces capture only a day’s worth of data, Frick’s ups and downs over a 24-hour period. Larger ones reflect weeks of journal keeping and internal swings.

For her upcoming solo exhibition this May, at New York’s Pavel Zoubok Gallery, Frick has made wood, leather, and paper assemblages based on accounts of her daily activities. In several pieces, she used apps like ManicTime on her laptop and Moment on her iPhone to track each click and touch of her screen for almost a month. Frick is adamant that her work is about more than simply visualizing information: it serves as a metaphor for human experience, and thus belongs firmly in the art world.

The distinction between data presentation and data art is often fuzzy, and the art world still struggles to separate the two. For example, MoMA’s recent show, Scenes for a New Heritage: Contemporary Art from the Collection, included digital prints with images generated from ArcGIS software. The work, Million Dollar Blocks, was designed at Columbia University’s Spatial Information Design Lab and showcased a series of maps based on data from the criminal-justice system. According to the project, of the more than two million incarcerated people in the U.S., a disproportionate number come from a handful of neighborhoods in the largest cities. The maps are meant to pose ethical and political questions about criminal justice reform, and they do that successfully. But Million Dollar Blocks may just be powerfully presented data, rather than conceptual art, which is where the artist’s underlying idea is more important than the execution.

But by blurring the boundaries, conceptual artists are helping scientists see their research more creatively. The New York Times recently chronicled Daniel Kohn, a Brooklyn-based painter, who spent roughly a year at the Albert Einstein School of Medicine teaching geneticists ways to represent their digital data in more intuitive ways. And while algorithms have seeped into daily life, informing everything from consumer music choices to dating options, they’re also edging into conceptual art. In March, the website Artsy held what it called the world’s first Algorithm Auction, “celebrating the art of code.” Works included Turtle Geometry, an 11-inch stack of programming on dot-matrix printer paper from 1969 made by Hal Abelson, a professor of electrical engineering and computer science at MIT.

In fact, many data artists straddle art and science as Leonardo da Vinci did. Curators and historians still disagree about how to classify him: great artist or scientific genius? Or does the divide even matter?

Current tools make self-tracking more efficient than ever, but data artists are hardly the first to express themselves through their daily activities, or to try to find meaning within life’s monotony. The Italian Mannerist painter Jacopo Pontormo kept records of his daily life from January 1554 to October 1556. In them, he detailed the amount of food he ate, the weather, friends he visited, even his bowel movements. In the 1970s, the Japanese conceptualist On Kawara produced his self-observation series, I Got Up, I Went, and I Met (recently shown at the Guggenheim), in which he painstakingly records the rhythms of his day. Kawara stamped postcards with


the time he awoke, traced his daily trips onto photocopied maps, and listed the names of people he encountered for nearly 12 years. “Have you ever thought about how much is known about you?” Frick asked in one of our conversations. Not what pops up in Google or on social media, she clarified, but what companies know about your character. If you have a Kindle, Amazon knows how fast you finish a book, and whether you’re a cheater and skip chapters or read the ending first. Netflix knows whether you’re a binge watcher. E-ZPass knows where you go, even on local streets. Frick understands that this type of data collection can cause discomfort. Few of us like the idea that the government or Google is watching our every move. As a data artist, however, she sees her role as convincing people to want more personal data—regardless of who’s tracking. “In all of these patterns, I do think there is an essential idea of who we are,” Frick said. Data art can’t capture the essence or totality of somebody—if either exists—any more than a handprint on a cave wall can. But she believes personalized data art can accomplish something traditional art forms can’t: It allows a viewer to see her nuances and idiosyncrasies in higher resolution—and to discover things she may have forgotten about herself or perhaps has never known. “I think people are at a point where they are sick of worrying about who is or isn’t tracking their data,” said Frick. “I say, run toward the data. Take your data back and turn it into something meaningful.” To prove her point, she’s developed a free app, Frickbits, which allows anyone to “create the ultimate data-selfie,” by turning personal data into personalized art. But even today, data art isn’t all Google Maps and iPhones—its practitioners embrace traditional mediums too. One of the earliest examples of the genre is Danica Phelps. 
In July 2008, two months before Lehman Brothers filed for bankruptcy, The New York Times wrote, “For the last decade, Danica Phelps has chronicled her personal and financial lives with an exhaustive system of lists and charts accompanied by diagrams of colored stripes.” Her ongoing project, titled Income’s Outcome, tracks the money generated by the sale of each of her drawings.

Every time somebody buys a piece from the series, Phelps creates a new series of drawings, depicting what she bought with the money from the previous transaction. The drawings reflect how consumption and debt are intricate parts of our personal identity. Phelps has also painted every dollar in her bank account, as well as gray strips for every dollar she owes her bank for her mortgage: 627,000 gray lines.

For example, the Boston Quantified Self chapter describes itself as “a regular show-and-tell for people who are tracking data about their body and conducting their own personal investigations and research into their bodies, minds, and selves.” “Datification” is a cultural process by which people put enormous faith in data, explained Gina Neff, an associate professor of communication at the University of Washington and the School of Public Policy at Central European University, as well as the co-author of the forthcoming book The Quantified Self, with Dawn Nafus. “What we see in the ‘quantified self,’” Neff explained, “is that people have taken up an N of 1, truly exploring what data means in their own personal lives, or in the artist’s case, as a form of artistic expression.” The line between data and self, she believes, is only where a person chooses to draw it.

Yet the question remains whether data art can endure as much as a simple, striking handprint on a cave wall. On the one hand, data art may just be a link in a chain of artists who record and display their personal movements, some of whom will be displayed at the world’s leading museums decades from now, some who will fall by the wayside. On the other, data art may be the apogee of self-expression, a digital fingerprint that says more about modern man, and the inevitable forward march of time, than anything artists have been able to produce before.


In a country that prizes the individual’s right to privacy, data protections are practically nonexistent. By Adrienne Lafrance


WHY CAN’T AMERICANS FIND OUT WHAT BIG DATA KNOW ABOUT THEM?

I remember the moment, four years ago, when I realized just how well Big Data knew me. I was sitting in the newsroom one afternoon and clicked over to Facebook, where an advertisement caught my eye.

There on my screen was an image of the exact pair of hot pink Tory Burch sandals I was wearing at that very moment. On my feet and on my screen: the same color, the same style, identical twins in patent leather. I had bought them in-person, something of an impulse because they were on sale, and had never looked at them anywhere online or even visited the designer’s website. But somehow, Big Data determined, these were the sandals for me. And they were right. It was a silly thing. So Facebook knows what kinds of shoes I like. So what? But it creeped me out. How did they know? And more importantly: What else did they know? In the United States, there’s not much we can do to find out which aspects of our personal lives are being bought and sold by data brokers. That’s not the case in much of the rest of the world, where there are vast data protections, entire agencies devoted to data privacy, and serious enforcement efforts. “Generally, if information is publicly available in the United States, its use is not restricted,” said Jim Halpert, a lawyer with the Washington, D.C.-based firm DLA Piper who specializes in global data regulations. “The way that a defender of the U.S. system would respond is to say people don’t really care if they get more specific advertising that they might be interested in... But it goes to discrimination in a certain way rather than to the information collection itself being a harm. At some point you can collect so much information about an individual that it becomes intrusive.” Browse through DLA Piper’s extensive guide to data regulations and enforcement around the world and it’s clear that the United

States stands out compared with more robust protections in places like Canada and Europe. (Elsewhere, protections are lesser or nonexistent.) Many European countries have central agencies dedicated to data protection. In France, individuals must give their consent before a data broker can distribute their data. In the United Kingdom, websites have to notify visitors of data-tracking software. Many European countries require data brokers to give individuals the opportunity to review their data profiles, and to show them how to access, change, remove, or otherwise object to the data that has been collected. There’s some dissonance to the fact that data protections in the U.S. are so slim compared with regulations in other nations. Culturally, Americans prize the right to privacy. And there are


U.S. sectors, like health care, where protecting personal data is paramount. But Americans don’t even know which pieces of their personal data are swirling around out there. Your name, age, past addresses, political party enrollment, whether you own a home: sure, you might expect that kind of stuff to be shared by marketers and others who deal in data. But data brokers specialize in inference, too, so they can figure out all kinds of super-specific details about who you are and how you live. “For example, a data broker might infer that an individual with a boating license has an interest in boating, that a consumer has a technology interest based on the purchase of a Wired magazine subscription, or that a consumer who has bought two Ford cars has loyalty to that brand,” wrote the Federal Trade Commission in a report on Big Data this week. Big data knows your net worth. It knows that you have a dog. It knows when you’re most likely to use a coupon. It knows your favorite brand of detergent. It knows your dress size. It knows about your last speeding ticket, and when you got the oil changed. It knows you have a hunting license. It knows whether you’re pregnant, and often before you have a chance to share the news. The New York Times in 2012 told the alarming story of a teenager

whose father angrily complained to Target for sending his young daughter promotional mailings for cribs and baby clothes. It later turned out the girl was pregnant, just as her data profile predicted. She simply hadn’t told her parents yet. The nine data brokers that the FTC examined for its report highlight the dizzying data-collection industry in the U.S. According to the report: data aggregator Rapleaf has at least one data point associated with more than 80 percent of all U.S. email addresses; analytics firm Corelogic has 795 million historical property transactions, 93 million mortgage

applications, and property-specific data covering more than 99 percent of U.S. residential properties; and the firm Datalogix, in partnership with Facebook, has marketing information about every U.S. household and more than $1 trillion in consumer transactions. From the FTC report: Data brokers acquire a vast array of detailed and specific information about consumers; analyze it to make inferences about consumers, some of which may be considered sensitive; and share the information with clients in a range of industries. All of this activity takes place behind the scenes, without consumers’ knowledge.

In the United States, the irony is that to find out which data about you is being bought and sold, you often have to give marketers even more information. For instance, data broker Acxiom runs the website About The Data, which bills itself as a portal to show you what marketers know about you. What isn’t immediately clear is that Acxiom is on the marketers’ side. The site’s terms of use say that Acxiom can share with marketers the info you provide during registration: things like your name, address, and the last four digits of your Social Security number. Oh, and by the way: Acxiom, according to the FTC report, has some 3,000 data segments for nearly every U.S. consumer.

“Consumers deserve to know what information about their personal lives is being collected and sold to marketers by data brokers,” Sen. Jay Rockefeller, D-W.Va., said in a statement earlier this year. Rockefeller introduced the Data Broker Accountability and Transparency Act of 2014 to provide consumer protections and oversight to what he calls a “booming shadow industry that generated more than $150 billion in 2012.” He said this week that the FTC report makes his earlier concerns about data brokers “stronger than ever.” The measure, along with a similar bill introduced in the House, is still in the early stages of consideration before a congressional committee.
Meanwhile, the Big Data machine keeps going where you go, tracking what you do, and creating an ever clearer picture of who you are.


URBAN HEARTBEAT

MODEL STUDIES

New York City has a pulse, a certain rhythm of human activity that we can sense but not see. This rhythm jumped out at Foursquare designer Matt Healy while he was playing with a slice of the company’s user location data. He wanted to figure out how to show the urban heartbeat visually.

That was months ago. In a series of animated data visualizations published last week, Healy and the folks at Foursquare were finally able to illustrate precisely how people move around cities like Tokyo, Istanbul, and New York City, and the result is stunning.

Healy prototyped and built the Pulse visualizations using Processing, the design-centric programming language and development environment. “That’s something that Processing is really great for: quickly, rapidly prototyping a visualization, seeing how it works and then making some of the changes from there,” Healy says.

Working with Foursquare data scientist Blake Shaw, Healy mapped a year’s worth of user check-ins within each city, paying particular attention to pairs of sequential check-ins and condensing all that activity down into a 24-hour span. Since each check-in is a discrete data point without any directionality, Shaw and Healy needed to build in some assumptions about how people behave in order to “animate” the human traffic patterns in Foursquare’s data. The pair used a three-hour window between check-ins to define what constitutes two subsequent check-ins, which gave them a logical basis for drawing movement from place to place.

“If I check in at Foursquare headquarters and then check into a coffee shop an hour later, then we assume I went from Foursquare to that coffee shop,” says Healy. “So the visualization takes the dot representing me at the time that I checked into Foursquare headquarters and then gradually moves it over to the coffee shop in the window between those two times. It then multiplies that by millions.”

In each animation, color-coded dots and lines are used to represent people checking into various locations and then moving about within the city. Each color represents a different type of venue: Red for residences, lime green for food, dark blue for nightlife, turquoise for transit, and so on. The end result looks like a massive colony of fluorescent insects zooming throughout their habitat in loosely organized groups, flickering with an intensity that eventually dims as people return home and go to sleep.

See more at https://foursquare.com/infographics/pulse
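The pairing and “gradually moves it” steps Healy describes can be sketched in a few lines. The version below is in Python rather than Processing, and it is only an illustration under stated assumptions: the `CheckIn` record, the function names, the use of hours of the day as the time unit, and the straight-line (linear) movement between venues are all hypothetical choices, not details confirmed by Foursquare. Only the three-hour window comes from the article.

```python
from dataclasses import dataclass

MAX_GAP_HOURS = 3.0  # the three-hour window mentioned in the article


@dataclass
class CheckIn:
    """Hypothetical record: who checked in, when (hour of day), and where."""
    user: str
    hour: float
    x: float
    y: float


def sequential_pairs(checkins, max_gap=MAX_GAP_HOURS):
    """Pair each user's consecutive check-ins that fall within `max_gap` hours."""
    by_user = {}
    for c in sorted(checkins, key=lambda c: (c.user, c.hour)):
        by_user.setdefault(c.user, []).append(c)
    pairs = []
    for visits in by_user.values():
        for a, b in zip(visits, visits[1:]):
            if 0 < b.hour - a.hour <= max_gap:
                pairs.append((a, b))
    return pairs


def position_at(a, b, hour):
    """Dot position at `hour`, moving in a straight line from a to b.

    Linear interpolation is an assumption made for this sketch. Returns
    None when `hour` falls outside the window between the two check-ins.
    """
    if not (a.hour <= hour <= b.hour):
        return None
    t = (hour - a.hour) / (b.hour - a.hour)
    return (a.x + t * (b.x - a.x), a.y + t * (b.y - a.y))
```

In this sketch, a check-in at 9:00 followed by one at 10:00 forms a pair, and halfway through that hour the dot sits halfway between the two venues; a third check-in five hours later is treated as unrelated and produces no movement.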



“The greatest value of a picture is when it forces us to notice what we never expected to see.” —John W. Tukey.

