VISUALIZING DATA
INTRODUCTION TO DATA
YOUR LIFE IN DATA
STORYTELLING WITH DATA
GALLERY OF DATA
Print-based information visualization has been used effectively for centuries to marshal multifaceted data in the service of making a point visually. The advent of interactive computer graphics, the Internet, and readily available sources of data extends that rich tradition and introduces a new kind of expression, interactive visualization. This innovative medium has the potential to sustain a meaningful virtual dialogue between scholars and their audience, using data as the liaison.
INTRODUCTION TO DATA
The visualization of information in a graphical context has been practiced for centuries, with ever-increasing sophistication and reliance on empirical data. Leonardo da Vinci’s 1487 drawing Vitruvian Man directly communicated the correlations of ideal human proportion using the figure of a man circumscribed within a perfect circle. By the eighteenth century, merchants and governments were collecting progressively more quantitative data about their world. This information was typically stored in massive tables of numbers, making interpretation difficult at best. The Scottish social scientist William Playfair (1759–1823) devised a number of now-familiar graphical devices, such as pie, bar, and line charts, that quickly and clearly communicate tabular data in a much more accessible form.
Although she is known mainly for her work as a nurse, the writer and statistician Florence Nightingale was a key figure in using statistical graphics to convey insight into the role that inactivity, malnutrition, and inadequate sanitation played in soldier deaths during the Crimean War. She kept detailed records and used those data to expand on Playfair’s pie charts to create rose diagrams, now known as coxcombs.
The use of graphics and charts to visualize quantitative data became more commonplace with new advances in statistical analysis. Even the simplest of these new ideas, such as the range, median, and quartiles, were not immediately intuitive when represented as numbers. However, new chart types that represent them visually, such as the pioneering statistician John Tukey’s box-and-whisker plots, provided depth to the data.
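As a rough illustration of the numbers a box-and-whisker plot draws, the sketch below computes the minimum, lower quartile, median, upper quartile, and maximum of a small sample using Python's standard library. The sample values and the function name `five_number_summary` are made up for illustration, and quartile conventions differ between tools; this sketch assumes the "inclusive" method of `statistics.quantiles`.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # n=4 asks for the three cut points that split the data into quarters.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

sample = [2, 4, 4, 5, 7, 9, 10, 12, 15]  # made-up data, not from the text
summary = five_number_summary(sample)
print(summary)
```

The box of the plot spans the two quartiles, a line marks the median, and the whiskers reach toward the minimum and maximum.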
The Impact of Computers
The advent of computers capable of drawing sophisticated graphics gave both designers and statisticians a new medium of expression to experiment with, one that could render complex graphics from large sets of data and create data-rich visualizations beyond the static drawings of previous centuries.
Top: Leonardo da Vinci’s Vitruvian Man (1487) Bottom: First pie chart by William Playfair
Top: Reproduction of Florence Nightingale’s Coxcomb Chart (1855)
The Impact of the Internet
The Internet and the open-source ethos of free accessibility have made data more widely available for instantaneous download. A wealth of freely available information, from US Census data going back to 1790, to worldwide geographic, economic, and political data, to real-time social media data from services such as Twitter, can be downloaded instantly and used without any requirement for permission from the provider. Producers of interactive visualizations are increasingly using these data to provide powerful tools for information and inquiry.
Google and other Internet companies have added simple application programming interfaces (APIs) to their popular online services, such as maps, that encourage people to create geographically based visualizations called mash-ups, which layer data and map together in ways that are easier and more accessible to a mass audience than previous GIS-based desktop systems. One such visualization takes real-time police crime reports and plots them on a map, where users can select the kinds of crimes to view during any given time period. Other freely available APIs offer a wide range of web-based tools, such as MIT’s SIMILE project, that programmers with little experience can use to create timelines with little effort. These tools have access to online images using Flickr and to any number of charting and visualization APIs, including Processing, ProtoVis, and Prefuse, all of which are freely available for anyone with even a modest level of programming skill to incorporate into his or her website.
Recently, a new genre of Web sites has emerged that makes it easy for people who do not know how to create Web sites or programs to produce sophisticated visualizations using their own data. These Web apps make it easy to upload datasets and make a variety of compelling visualizations that can be shared worldwide.
IBM Research’s ManyEyes project allows users to share and visualize their own datasets using simple Web-based tools to create network diagrams, charts, tree-maps, word clouds, and geographic maps. Other tools such as Tableau and our own VisualEyes allow sophisticated data-driven interactive visualizations to be rapidly created by non-programmers.
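The crime-map mash-up mentioned earlier in this section comes down to filtering a list of located reports by kind and by time window before plotting them. The sketch below shows only that filtering step. The `CrimeReport` class, its field names, and the three sample records are hypothetical and do not come from any real police data feed or mapping API.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class CrimeReport:
    kind: str        # hypothetical category label
    when: datetime   # time of the report
    lat: float       # latitude in degrees
    lon: float       # longitude in degrees

def select_reports(reports, kinds, start, end):
    """Keep reports whose kind is selected and whose time falls in [start, end)."""
    return [r for r in reports if r.kind in kinds and start <= r.when < end]

# Made-up records for illustration only.
reports = [
    CrimeReport("burglary", datetime(2010, 5, 1, 22, 15), 38.03, -78.48),
    CrimeReport("vandalism", datetime(2010, 5, 2, 1, 40), 38.04, -78.50),
    CrimeReport("burglary", datetime(2010, 6, 3, 3, 5), 38.02, -78.47),
]

may_burglaries = select_reports(
    reports, {"burglary"}, datetime(2010, 5, 1), datetime(2010, 6, 1)
)
for r in may_burglaries:
    # In a mash-up, each selected point would be handed to the map layer here.
    print(r.kind, r.when.isoformat(), r.lat, r.lon)
```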
The Art of Visual Aesthetics
Top: Using Tableau to visualize data.
The role of visual aesthetics cannot be overemphasized. The effective use of visual design principles will greatly improve the communicability and success of a visualization and encourage people to spend more time with it. Naturally, designers with more experience will tend to produce better designs, but these principles are much less subjective than they are generally assumed to be and are easily learned, providing an achievable baseline for success. Excellent guides to graphic design include Nancy Duarte’s (2008) slide:ology and Robin Williams’ (2004) The Non-Designer’s Design Book, which rapidly cover the fundamental concepts of presentation design.
The Role of Aesthetics in Visualization
A common criticism of some infographics and visualizations is that they sacrifice substance for style, resulting in what is sometimes called eye candy or chart junk (Tufte 1983). This occurs when the design elements distract from the message being communicated; added graphics should be avoided when they do not increase the communicative value. Increasing the aesthetic quality of a visualization is not an act of vanity on the part of its designer. A number of studies have shown a strong positive correlation between the perceived aesthetic quality of a visualization and the willingness of users to take the time to interact with it, to understand its meaning, or to extract information from it. These were measured by the rate of task abandonment and the time taken to recognize erroneous retrieval results. Some visualizations are designed to be purely artistic, such as the automated music visualizations within Apple’s iTunes player, whereas others, such as stock market displays, forgo aesthetic qualities for more pragmatic communication.
Bottom: iTunes Music Visualization
Data into Form
Top: isometricblocks by Ben Fry, 2003. Visualization of human genome data.
People have a remarkable ability to understand data when it’s presented as an image. As researcher Stuart K. Card says, “To understand something is called ‘seeing’ it. We try to make our ideas ‘clear,’ to bring them into ‘focus,’ to ‘arrange’ our thoughts.” Like written words, visual language is composed to construct meaning. Our brains are wired to make sense of visual images. In contrast, it can take years of education to develop the ability to read even the simplest articles in a newspaper. The fundamentals of visual understanding, originally pursued by Gestalt psychologists in the early twentieth century, are now researched at a deeper level within the field of cognitive psychology. The findings of this research have been communicated within the visual arts by educators including Gyorgy Kepes, Donis A. Dondis, and Rudolf Arnheim, as well as through the work of visualization pioneers such as William Playfair, John Tukey, and Jacques Bertin.
Data presentation techniques that combine our innate knowledge with learned skills make data easier to understand. In The Visual Display of Quantitative Information, Edward Tufte presents a data set and representation that supports this claim. Compare the tabular data to the scatterplot representation to see how the patterns become immediately clear when presented in the second format. In his book Semiology of Graphics: Diagrams, Networks, Maps, Bertin presents another clear example of the communicative power of visual representation. The maps of France on the left and right both present the same sociographic data, divided by canton (a French territorial subdivision). The representation on the right replaces each number with a circle sized to correspond to the numerical value. We can spend time analyzing the left map to see where there are concentrations of larger numbers, but on the right map we instantly comprehend the increased density in the upper left.
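Replacing each number with a sized circle can be sketched in a few lines. A common convention, assumed here and not stated in the text above, is to make the circle's area (not its radius) proportional to the value, so the radius grows with the square root. The function name, the `max_radius` parameter, and the sample values are hypothetical.

```python
import math

def circle_radius(value, max_value, max_radius=20.0):
    """Radius of a circle whose area is proportional to value.

    The largest value in the data set is drawn with max_radius.
    """
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value must be > 0")
    return max_radius * math.sqrt(value / max_value)

counts = {"A": 100, "B": 25, "C": 4}  # made-up values for three regions
largest = max(counts.values())
radii = {name: circle_radius(v, largest) for name, v in counts.items()}
print(radii)
```

Scaling the radius directly would exaggerate large values, because a circle with twice the radius covers four times the area.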
In the same book, Bertin introduces a series of variables that can be used to visually distinguish data elements: size, value, texture, color, orientation, and shape. For example, a bar chart distinguishes data through the height of each bar, and different train routes on a transit map are typically distinguished with color. For visualizations using only one variable, each element can be used in isolation. For multivariate visualizations (containing more than one variable), elements are combined.
When applying form to data, there are always questions about goodness of fit, meaning how well the representation fits the data. Visualizations can mislead as well as enlighten. As Tufte warns in Visual Explanations: Images and Quantities, Evidence and Narrative, “There are right ways and wrong ways to show data; there are displays that reveal the truth and displays that do not.” In Bertin’s maps of France, the goodness of fit of the representations reveals information hidden within the data. Because each piece of data derives from a particular canton, associating that data with its location on a map allows us to see regional patterns. Presenting the data in a table that is organized alphabetically would not reveal this pattern. Applying the same visualization technique to a different map (a map of Europe, for example) would also not work as well. The Bertin map works because each canton is roughly the same size, but the size differences among European countries would dilute the visual patterns needed for interpretation. In this case, the data is tightly linked to a source (geography), but in other instances data can be more abstract, such as when revealing patterns in language.
There are hundreds of distinct visualization techniques that can be organized into categories, including tables, charts, diagrams, graphs, and maps. When creating a new visualization, one technique is selected instead of another based on the organization of the data and what the visualization is meant to convey.
Talkshow 03-09-07 Aaron Koblin, Catalogtree
Left: Mapping a talkshow, by Aaron Koblin, Wilfred Houjebek and Catalogtree, 2007
Data representations that commonly appear in newspapers, such as bar charts, pie charts, and line graphs, were all developed before people relied on software; in fact, most commonly used data representation techniques are only useful for representing simple data (1- and 2-D data sets). These techniques are automated within frequently used software tools such as Microsoft Excel, Adobe Illustrator, and related programs. Visualizing information, once a specialized activity, is becoming a part of mass culture.
Writing new software is one approach to moving beyond common data representations. New visualization techniques emerge as researchers and designers write software to fulfill their growing needs. The treemap technique is a good example of the origins and evolution of a new visualization. It also shows how techniques often arise within a research group and are visually refined as designers use them in diverse contexts. A treemap is a visualization that utilizes nested rectangles to show the relations between one or more data elements. Treemaps are effective because they allow for easy 2-D size comparisons. The technique’s originator, Ben Shneiderman, a professor at the University of Maryland, documents the story of the first treemaps, from their origin to the present. The first treemaps were developed in 1991 as a way to show how space is used on a computer hard drive. After subsequent applications and further development, the public at large was introduced to treemaps by Map of the Market, an Internet application created by SmartMoney.com in 1998. This application introduced the innovation of making the tiles close to square, rather than using the thin tiles of previous treemaps, to increase legibility. The circular treemap technique explored by interface designer Kai Wetzel in 2003 pushed the form of treemaps even further. Wetzel worked on this representation as one of many ideas for a Linux operating system interface. He recognized that the approach wastes space and the algorithm is slower, but the aspect ratio of each node is the same. The 2004 Newsmap application by Marcos Weskamp applied treemaps to the headlines of news articles compiled from the Google News aggregator.
The treemap representation makes it easy to see how many articles are published within each news category. For example, the visualization makes clear that, in England, the highest volume of published articles is world news rather than national stories, while in Italy, the reverse is true. By 2007, through the refinement of these and other initiatives, the treemap technique had become so ubiquitous that it was used in the New York Times with the expectation that a general audience could understand it.
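The idea of nested rectangles can be made concrete with a small layout routine. The earliest treemaps are usually described as using a "slice-and-dice" layout, which splits a rectangle among a node's children in proportion to their sizes and alternates the split direction at each level. The sketch below follows that idea under stated assumptions; it is not Shneiderman's original code, and the nested-tuple data format, the names, and the sizes are made up. All sizes are assumed to be positive.

```python
def weight(node):
    """Total size of a node: its own number, or the sum of its children."""
    _, payload = node
    if isinstance(payload, list):
        return sum(weight(child) for child in payload)
    return payload

def slice_and_dice(node, x, y, width, height, vertical=True, out=None):
    """Lay out a tree as nested rectangles.

    node is (name, number) for a leaf or (name, [child, ...]) for a branch.
    Returns a list of (name, x, y, width, height), one entry per leaf.
    """
    if out is None:
        out = []
    name, payload = node
    if not isinstance(payload, list):
        out.append((name, x, y, width, height))
        return out
    total = weight(node)
    offset = 0.0
    for child in payload:
        share = weight(child) / total
        if vertical:
            # Split the width left to right; children will split the height.
            slice_and_dice(child, x + offset, y, width * share, height, False, out)
            offset += width * share
        else:
            # Split the height top to bottom; children will split the width.
            slice_and_dice(child, x, y + offset, width, height * share, True, out)
            offset += height * share
    return out

# Hypothetical disk contents, sizes in arbitrary units.
disk = ("disk", [("photos", 60), ("music", [("rock", 10), ("jazz", 10)]), ("docs", 20)])
layout = slice_and_dice(disk, 0, 0, 100, 50)
for rect in layout:
    print(rect)
```

A layout like this tends to produce long, thin tiles when a branch has many children, which is the legibility problem that the near-square tiles of Map of the Market addressed.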
YOUR LIFE IN DATA
Data is gathered from many sources. Websites like Facebook or Twitter collect personal data and can use it for something like studying the pattern of a certain behavior.
In the not-too-distant past, the Web was about sharing, broadcasting, and distribution. But the tide is turning: the Web is moving toward the individual. Applications spring up every month that let people track, monitor, and analyze their habits and behaviors in hopes of gaining a better understanding about themselves and their surroundings. People can track eating habits, exercise, time spent online, sexual activity, monthly cycles, sleep, mood, and finances online. If you are interested in a certain aspect of your life, chances are that an application exists to track it.
Personal data collection is of course nothing new. In the 1930s, Mass Observation, a social research group in Britain, collected data on various aspects of everyday life (such as beards and eyebrows, shouts and gestures of motorists, and behavior of people at war memorials) to gain a better understanding about the country. However, data collection methods have improved since 1930. It is no longer only a pencil and paper notepad or a manual counter. Data can be collected automatically with mobile phones and handheld computers, so that constant flows of data and information upload to servers, databases, and so-called data warehouses at all times of the day. With these advances in data collection technologies, the data streams have also developed into something much heftier than the tally counts reported by Mass Observation participants. Data can update in real time, and as a result, people want up-to-date information.
It is not enough to simply supply people with gigabytes of data, though. Not everyone is a statistician or computer scientist, and not everyone wants to sift through large data sets. This is a challenge that we face frequently with personal data collection. While the types of data collection and data returned might have changed over the years, individuals’ needs have not.
Above: Mass Observation. Topic Collection TC 28 Dreams 1937-1948 28-1-L Hallucination study 1947-48.
That is to say that individuals who collect data about themselves and their surroundings still do so to gain a better understanding of the information that lies within the flowing data. Most of the time we are not after the numbers themselves; we are interested in what the numbers mean. It is a subtle difference but an important one. This need calls for systems that can handle personal data streams, process them efficiently and accurately, and dispense information to nonprofessionals in a way that is understandable and useful. We want something that is more than a spreadsheet of numbers. We want the story in the data.
Constructing such a system requires careful design considerations in both analysis and aesthetics. This was important when we implemented the Personal Environmental Impact Report (PEIR), a tool that allows people to see how they affect the environment and how the environment affects them on a micro-level; and your.flowingdata (YFD), an in-development project that enables users to collect data about themselves via Twitter, a microblogging service. For PEIR, I am the frontend developer, and I mostly work on the user interface and data visualization. As for YFD, I am the only person who works on it, so my responsibilities are a bit different, but my focus is still on the visualization side of things. Although PEIR and YFD are fairly different in data type, collection, and processing, their goals are similar. PEIR and YFD are built to provide information to the individual. Neither is meant as an endpoint. Rather, they are meant to spur curiosity in how everyday decisions play a big role in how we live and to start conversations on personal data. After a brief background on PEIR and YFD, I discuss personal data collection, storage, and analysis with this idea in mind, which can be generalized to personal data visualization as a whole. Ultimately, we want to show individuals the beauty in their personal data.
PERSONAL ENVIRONMENTAL IMPACT REPORT (PEIR)
PEIR is developed by the Center for Embedded Networked Sensing at the University of California at Los Angeles, or more specifically, the Urban Sensing group. We focus on using everyday mobile technologies (e.g., cell phones) to collect data about our surroundings and ourselves so that people can gain a better understanding of how they interact with what is around them. For example, DietSense is an online service that allows people to self-monitor their food choices and further request comments from dietary specialists; Family Dynamics helps families and life coaches document key features of a family’s daily interactions, such as colocation and family meals; and Walkability helps residents and pedestrian advocates make observations and voice their concerns about neighborhood walkability and connections to public transit. All of these projects let people get involved in their communities with just their mobile phones. We use the data they collect to provide information.
PEIR applies similar principles. A person downloads a small piece of software called Campaignr onto his phone, and it runs in the background. As he goes about his daily activities (jogging around the track, driving to and from work, or making a trip to the grocery store, for example), the phone uploads GPS data to PEIR’s central servers every two minutes. This includes latitude, longitude, altitude, velocity, and time. We use this data to estimate an individual’s impact on and exposure to the environment. Environmental pollution sensors are not required. Instead, we use what is already available on many mobile phones, GPS, and then pass this data, with context such as weather, into established environmental models. Finally, we visualize the environmental impact and exposure data. The challenge at this stage is to communicate meaning in data that is unfamiliar to most. What does it mean to emit 1,000 kilograms of carbon in a week? Is that a lot or is that a little?
We have to keep the user and purpose in mind, as they drive the system design from the visualization down to the data collection and storage.
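To make the shape of this data concrete, the sketch below models one uploaded sample with the five fields listed above and adds up the distance covered by a sequence of samples. It is only an illustration: the `GpsSample` class, its units, the made-up coordinates, and the use of the haversine great-circle formula are assumptions, and the text does not describe PEIR's actual data format or environmental models.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

@dataclass
class GpsSample:
    latitude: float   # degrees
    longitude: float  # degrees
    altitude: float   # metres (assumed unit)
    velocity: float   # metres per second (assumed unit)
    time: int         # seconds since the start of the trace (assumed)

def haversine_km(a, b):
    """Great-circle distance between two samples, ignoring altitude."""
    lat1, lon1, lat2, lon2 = map(
        math.radians, (a.latitude, a.longitude, b.latitude, b.longitude)
    )
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def trip_distance_km(samples):
    """Sum the distances between consecutive samples."""
    return sum(haversine_km(p, q) for p, q in zip(samples, samples[1:]))

# Made-up trace with one sample every two minutes (120 seconds).
trace = [
    GpsSample(34.0689, -118.4452, 120.0, 8.0, 0),
    GpsSample(34.0720, -118.4380, 118.0, 9.5, 120),
    GpsSample(34.0755, -118.4301, 115.0, 9.0, 240),
]
print(round(trip_distance_km(trace), 2), "km")
```

A distance like this, combined with speed and context such as weather, is the kind of input an environmental model could take, but which inputs the real models use is not stated here.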
This page: The PEIR website functions.
YOUR.FLOWINGDATA (YFD)
Below: Data collected by your.flowingdata
While PEIR uses a piece of custom software that runs in the background, YFD requires that users actively enter data via Twitter. Twitter is a microblogging service that asks a very simple question: what are you doing right now? People can post, or more appropriately, tweet, what they are doing via desktop applications, email, instant messaging, and (most importantly, as far as YFD is concerned) SMS, which means people can tweet with their mobile phones. YFD uses Twitter’s ubiquity so that people can tweet personal data from anywhere they can send SMS messages. Users can currently track eating habits, weight, sleep, mood, and when they go to the bathroom by simply posting tweets in a specific format. Like PEIR, YFD shows users that it is the little things that can have a profound effect on our way of life. During the design process, again, we keep the user in mind. What will keep users motivated to manually enter data on a regular basis? How can we make data collection as painless as possible? What should we communicate to the user once the data has been logged? To this end, I start at the beginning with data collection.
WORKING DATA COLLECTION INTO ROUTINE
There are many ways to slice and dice data to better understand what it means.
This is one of the main reasons I chose Twitter as YFD’s data proxy from phone or computer to the database. Twitter allows users to post tweets via several outlets. The ability to post tweets via mobile phones lets users log data from anywhere their phones can send SMS messages, which means they can document something as it happens and do not have to wait until they have access to a computer. A person will most likely forget if she has to wait. Accessibility is key.
One could accomplish something similar with email instead of Twitter, since most mobile phones let people send SMS to an email address, and this was in fact the original implementation of YFD. However, we go back to data collection as a natural part of daily routine. Millions of people already use Twitter regularly, so part of the challenge is already relieved. People do use email frequently as well, and it is possible they are more comfortable with it than with Twitter, but the nature of the two is quite different. On Twitter, people update several times a day to post what they are doing. Twitter was created for this single purpose. Maybe a person is eating a sandwich, going out for a walk, or watching a movie. Hundreds of thousands tweet this type of information every day. Email, on the other hand, lends itself to messages that are more substantial. Most people would not email a friend to tell them they are watching a television program, especially not every day or every hour. By using Twitter, we get a posting regularity that hopefully transfers to data collection.
I tried to make data logging on YFD feel the same as using Twitter. For instance, if someone eats a salami sandwich, he sends a message: “ate salami sandwich.” Data collection becomes conversational in this way. Users do not have to learn a new language like SQL. Instead, they only have to remember keywords followed by the value. In the previous example, the keyword is “ate” and the value is “salami sandwich.”
To track sleep, a user simply sends a keyword: “goodnight” when going to sleep and “gmorning” when waking. In some ways, posting regularity with PEIR was less challenging than with YFD. Because PEIR collects data automatically in the background, the user just has to start the software on his phone with a few presses of a button. Development of that software came with its own difficulties, but that story is really for a different article.
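The keyword-plus-value format described above could be parsed along the following lines. This is a minimal sketch, not YFD's actual code: only the keywords `ate`, `goodnight`, and `gmorning` come from the text, while the function name, the two keyword sets, and the shape of the returned dictionary are assumptions.

```python
VALUE_KEYWORDS = {"ate"}                    # keywords that must be followed by a value
BARE_KEYWORDS = {"goodnight", "gmorning"}   # keywords sent on their own

def parse_entry(message):
    """Split a logged message into a keyword and an optional value.

    Returns None when the message does not match a known keyword pattern.
    """
    words = message.strip().split(maxsplit=1)
    if not words:
        return None
    keyword = words[0].lower()
    value = words[1].strip() if len(words) > 1 else None
    if keyword in VALUE_KEYWORDS and value:
        return {"keyword": keyword, "value": value}
    if keyword in BARE_KEYWORDS and value is None:
        return {"keyword": keyword, "value": None}
    return None

print(parse_entry("ate salami sandwich"))
print(parse_entry("goodnight"))
print(parse_entry("watching a movie"))
```

Returning `None` for unrecognized messages keeps ordinary tweets from being logged as data, which is one plausible way to handle them; the text does not say what YFD does in that case.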
Threads (2013) Variable, (heart) productions, Brand Culture
CASE STUDY
THREADS
Threads is a visualization of constantly changing bids and offers on Asian fuel markets. For the launch of the new P4D software by Platts, Variable was invited to visualize market data as never seen before. To produce the animation, Variable analyzed one year’s worth of data from 2012 provided by Platts. The best visualization was chosen by looking at trends and behaviours.
Transactions by one company. Each day is one column.
Different visualizations of data collected
Final product of Threads. Variable.io
STORYTELLING WITH DATA
Think of all the popular data visualization works out there: the ones that you always hear about in lectures or read about in blogs, and the ones that popped into your head as you were reading this sentence. What do they all have in common? They all tell an interesting story. Maybe the story was to convince you of something. Maybe it was to compel you to action, enlighten you with new information, or force you to question your own preconceived notions of reality. Whatever it is, the best data visualization, big or small, for art or a slide presentation, helps you see what the data have to say.
More than Numbers
Face it. Data can be boring if you don’t know what you’re looking for or don’t know that there’s something to look for in the first place. It’s just a mix of numbers and words that mean nothing other than their raw values. The great thing about statistics and visualization is that they help you look beyond that. Remember, data is a representation of real life. It’s not just a bucket of numbers. There are stories in that bucket. There’s meaning, truth, and beauty. And just like real life, sometimes the stories are simple and straightforward, and other times they’re complex and roundabout. Some stories belong in a textbook. Others come in novel form. It’s up to you, the statistician, programmer, designer, or data scientist, to decide how to tell the story.
Data, while objective, often has a human dimension to it. For example, look at unemployment. It’s easy to spout state averages, but as you’ve seen, it can vary a lot within the state. It can vary a lot by neighborhood. Probably someone you know lost a job over the past few years, and as the saying goes, they’re not just another statistic, right? The numbers represent individuals, so you should approach the data in that way. You don’t have to tell every individual’s story. However, there’s a subtle yet important difference between the unemployment rate increasing by 5 percentage points and several hundred thousand people left jobless. The former reads as a number without much context, whereas the latter is more relatable.
Art
Top: Golan Levin’s The Dumpster
Bottom: Kim Asendorf’s Sumedicina
Visualization is less about analytics and more about tapping into your emotions. Jonathan Harris and Sep Kamvar did this quite literally in We Feel Fine. The interactive piece scrapes sentences and phrases from personal public blogs and then visualizes them as a box of floating bubbles. Each bubble represents an emotion and is color-coded accordingly. As a whole, it is like individuals floating through space, but watch a little longer and you see bubbles start to cluster. Apply sorts and categorization through the interface to see how these seemingly random vignettes connect. Click an individual bubble to see a single story. It’s poetic and revealing at the same time.
There are lots of other examples, such as Golan Levin’s The Dumpster, which explores blog entries that mention breaking up with a significant other; Kim Asendorf’s Sumedicina, which tells a fictional story of a man running from a corrupt organization, not with words but with graphs and charts; or Andreas Nicolas Fischer’s physical sculptures that show economic downturn in the United States. The main point is that data and visualization don’t always have to be just about the cold, hard facts. Sometimes you’re not looking for analytical insight. Rather, sometimes you can tell the story from an emotional point of view that encourages viewers to reflect on the data. Think of it like this. Not all movies have to be documentaries, and not all visualization has to be traditional charts and graphics.
Compelling
Of course, stories aren’t always meant to keep people informed or entertained. Sometimes they’re meant to provide urgency or compel people to action. Who can forget that point in An Inconvenient Truth when Al Gore stands on that scissor lift to show rising levels of carbon dioxide?
Hans Rosling Presentation on Poverty
CASE STUDY
GAPMINDER
No one has done this better than Hans Rosling, professor of International Health and director of the Gapminder Foundation. Using a tool called Trendalyzer, Rosling runs an animation that shows changes in poverty by country. He does this during a talk that first draws you deep into the data, and by the end, everyone is on their feet applauding.
The visualization itself is fairly basic. It’s a motion chart. Bubbles represent countries and move based on the corresponding country’s poverty during a given year. Why is the talk so popular then? Because Rosling speaks with conviction and excitement. He tells a story. How often have you seen a presentation with charts and graphics that put everyone to sleep? Instead, Rosling gets the meaning of the data and uses that to his advantage. Plus, the sword-swallowing at the end of his talk drives the point home.
GALLERY OF DATA
In this section, you will find interviews with artists who use data to create art. These artists take their knowledge of the data around us and transform it into pieces of artwork. The artworks they create tell a story and are presented in a way that we can understand visually. Here are their stories . . .
AAAA - Visuals (2008) Aaron Koblin
INTERVIEW
AARON KOBLIN
Posted by Patricia McDonald
Aaron Koblin is the Technology Lead at Google’s Creative Lab in San Francisco. He maintains a portfolio of work at http://www.aaronkoblin.com. He received his MFA degree from UCLA Design Media Arts in 2006.
As you may just have heard (we’ve been a tad over-excited…) data visualisation maestro Aaron Koblin came in to talk to us yesterday. He kicked off with a showcase of his work, from his exquisite grad school visualisations of flight paths to his latest embryonic projects for Google Labs. Along the way he showcased extraordinary visualisations of the ebb and flow of information in cities and around the globe, experiments in crowdsourced sound design and perhaps his most famous project, the Radiohead “House of Cards” promo. In showcasing his extraordinary portfolio he touched on a number of powerful and provocative themes, which we followed up on in our interview: the power of social context to make data compelling, the power of data visualisation to embrace the complexity of our lives today, and the tension between the human and the machine present in crowd-sourcing engines. He also shared his key learnings from life at the front line of data visualisation:
Looking at everyday things in new ways completely changes your perspective: there is no “mundane” data when you set it in context.
Use multiple visualisation techniques: there’s more than one way of seeing things.
Stay true to the data, not the “real world”: there is a randomness to data, and it will make patterns you never anticipated. Respect the randomness.
You don’t have to use all the data: sometimes seeing patterns is about what you leave out.
Set the data free: open-source and let other people play with your data.
Why do you think the world has suddenly gone crazy for data visualisation? 18 months ago it was a struggle to get anyone interested in data and now it’s the new rock and roll…
I guess it’s really the times that we live in. Now you have tools like Twitter and Facebook and things that are widely used not just by the nerds but by everybody. Popular culture has also just all of a sudden embraced the power of storytelling through data and the relevance of all the data to their lives. All kinds of things have happened that simply weren’t possible before. The author you look up to, the musician, etc., they’re sharing all kinds of things. You can be intimately living their lives along with them and you see all different types of applications.
Aaronetrope
Do you think it’s partly about the explosion in the amount of data currently available, the data trail we leave behind us now or the fact that companies have more data than they can process so they end up giving it away?
I think ultimately the biggest change is that the data is now relevant to people’s lives. Before, most of the data was about infrastructure at best and a lot of it was locked away or presented in aggregate form. When you’re presented with a huge lump sum number that has no context it’s just not interesting, but now when you get these granular stories, things that are saying at this specific point in time here’s the way that things changed, just by giving it that context and social relevance it becomes interesting.
Perhaps the big difference between what you do and the bar chart or the single number is that it really embraces complexity rather than trying to reduce everything: our lives are complex and this gives you a deeper understanding of that, not simpler, but richer.
I think what’s really nice is when you can have that kind of simplicity but then allow for investigation. Not necessarily forcing it into this sterile reality, but being able to present a story clearly and convincingly and simply but then allow for justification where you can say this is why. It’s fine to give a summary number but then be able to say this is why the number exists.
So do you think data visualisation should be about immediacy or intrigue? Should it be “I see that and I get it” or “I see that and I don’t get it so now I’m going to play with it”?
Definitely it boils down to your purpose. I think there certainly is a place for scientific visualisation. There still is a necessity for that type of clarity and objectivity but there’s also a place for design and, I would argue, for art: to be able to use data to tell stories, and to tell the right types of story, requires different kinds of techniques and different means. Oftentimes I think the whole medium-is-the-message sense of actually using the system to think about the system can be valuable and fun and productive.
What do you think is so fascinating about seeing one medium like music or dance portrayed through another?
I think that’s also something that’s really picking up because of digital culture. Now that everything has become digital it’s so easy to run it through a completely different process. You can make it, just tweak the algorithm and sound becomes image and image becomes motion. It’s kind of a natural process, it makes a lot of sense, especially for people like myself who are visual thinkers and learners. I think translating a lot of these concepts and numbers and pure abstractions into something tangible, something to be seen and experienced and interacted with, means the world, because for me it makes a completely different kind of sense. A lot of times that kind of experiment can reveal the underlying structure and point out the way that it makes sense.
You talked about letting the data do the talking and really embracing the randomness of the data; do you think that’s what makes data visualisation so compelling as art, because art is very seldom truly random?
I think it really gives character, because I think it’s really that kind of intricacy and detail that builds character and in a sense it’s the errors and flaws that make art.
If you look at creative practice, like with the Sheep Market project for instance, if every person drew a perfect sheep they would all be the same and it would be a horrible project. It’s actually seeing the ways that people fail, the different intricacies and character that come from the individual, that adds a lot. You see that in all data visualisation: it’s the little variations that give the character and make it interesting.
“House of Cards” music video
On the crowdsourcing sheep project, you talked about it potentially being a very fragmented and alienating experience but you’ve drawn it into a coherent whole; there’s a certain ambivalence there.
I think it was the juxtaposition of those two qualities that makes it interesting for me. On the one hand you look at this huge grid that looks very much like a matrix of computer-created content, but then juxtaposing it with each individual, looking down at the fine level you see there are actual little people in there. I’ve always been interested in microscopes. I bought a few microscopes and I have a family friend, Gary Greenberg, who makes these amazing oblique-lighting microscopes: basically microscopes that produce images that are more like beautiful photographs than back-lit medical tools. He used to let me play with them and it was amazing fun. This notion that there’s a device that can completely transform the way you see something is really inspiring. You can look at the whole thing but you can also drill down and it’s a totally different world.
Visualisation of SMS messages in Amsterdam
The wisdom of the crowd versus crowd-sourcing is a fascinating topic. The wisdom of the crowd seems to kick in when the crowd doesn’t know it’s being watched, whereas with crowd-sourcing it can sometimes just feel like mass-sourcing. Do you think the future of crowd-sourcing is genuinely collaborative, with the crowd consciously working together and making things better?
I think you already see that happening with all the Wiki projects, which are really inspiring. To some extent I think it’s because the motivation is different: it’s not really about money. Money really complicates things. With the sheep, the people who I had paid two cents felt totally ripped off and were really mad at me, but the people who knew what the project was were asking if they could give me free sheep! So they had a completely different perspective on the situation, which was interesting to note. I think the weird thing about crowdsourcing is that to some extent it feels like the inevitable evolution of capitalism, which is just something to think about.
The Sheep Market (2006) Aaron Koblin
Which is weird, because on the other hand you could argue it’s socialism in action…
Right, it’s both. It boils down largely to the approach of working with the crowd. There probably needs to be a better disambiguation for actions involving the masses. There’s wiki-style collaboration, the kind of thing you see with open source projects, people working together to make something massive. On the opposite end of the spectrum is the current incarnation of the Mechanical Turk, where you have individuals being harvested for isolated menial tasks, and somewhere more towards the middle you find things like CrowdSpring, more of a massive sifting of the crowds.
There are some themes which seem to recur in your work, such as the energy of cities or the flow of information. Is that about themes that interest you or is it about the data sets that are readily available?
I wouldn’t yet say readily available, it’s still really tough to get at some of that data, but I think I am generally interested in all kinds of data that have anything to do with our lives and revealing the way that we live and build systems. So to me it’s something I’ve always been drawn to, and part of it obviously is because I grew up in this computer and game culture and I’ve always been interested in algorithms and the way things work and the process behind things. There were a lot of films growing up about information and visualising information that inspired me a lot.
What tends to come first for you, the data set or the visualisation technique?
Based on the projects that I’ve done it’s usually either that I’m presented with an awesome data set or it’s that there’s a data set I’d really like to create. I guess that’s answering it by saying both. So with the Mechanical Turk projects it was more about being interested in that tool and wanting to create data that would reflect the tool.
Does the software you use have a big impact on the way you work? You talked about the impact of the Processing tool?
I think it definitely has. The nice thing about Processing is that it’s an open source tool so it’s constantly being added to and growing, and because it’s open source it makes it much easier not to be heavily influenced by it. I think that’s because you can modify it to an extent that’s much more thorough than if you were using a closed source tool: in the sense that there’s all kinds of things people have written that you can use, but also in the sense that if something doesn’t work the way you want it to you can rip it apart and make it different and make it work the way you want.
We are living in a golden age of data availability right now. Does it bother you either that there is such a rich data trail available about our lives or that people may start withholding data on that basis?
I think that we will see people change, at least in terms of personal data. I think we’ll see people change their interest in sharing as much as they are. But I think that will probably come in the form of not necessarily less data acquisition, just less public data sharing. I think what we’ll probably see is better disambiguation between aggregated public data and individual public data, where I’ll be willing to opt in to something to share my information but not with my name on it, and I think that that will end up being really valuable for all kinds of social studies and applications. But I think that will also potentially be less damaging to individuals as we see more of that. I feel bad for the younger generation that’s growing up right now. A lot of the stories that they’re bonding to their existence will leave trails that will last with them for the rest of their lives. Forgetting is a wonderful ability, and one that technology is not currently adapted to.
Flight Patterns (2008) Aaron Koblin
Flight Patterns (2008) Aaron Koblin
Fortune 500. Catalogtree August 2008 Global 500
Right: Talkshow 0309-07, CatalogTree (2007) Mapping a talkshow with Aaron Koblin, Wilfried Houjebek and Catalogtree on 03-09-2007 as part of the Info Aesthetics Symposium.
INTERVIEW
CATALOGTREE
Posted by DesignisBlank
Catalogtree is a multidisciplinary design studio based in the Netherlands, composed of designers Daniel Gross and Joris Maltha. Catalogtree’s work is instantly recognizable for its complexity and exceptional clarity, a combination not easily achieved. Their ability to compress large amounts of data into these gorgeous infographics is unparalleled.
What are your design backgrounds?
We met at the Werkplaats Typografie (a two-year MA program in Graphic Design) in 1999. The main medium of WT is undoubtedly print. We shared a strong interest in programming and media-independent design, so in that sense we were somewhat ill-fitted to that environment. So we worked on some projects together and continued doing so after graduation. Still, our approach and attitude have some WT roots: technique is not seen as a necessity only but as a source of inspiration, too.
How do you see the design in the Netherlands differing from American design?
Difficult to compare: our American clients are mostly magazines, our Dutch clients mostly not. We enjoy the directness and speed of magazine work but could not say if this is typically American. In the Netherlands, the projects we work on tend to be slower. But slow or not, it does not directly change our approach. Self-organization of content is an important tool to us. Instead of telling each word or data point where to go and what to look like exactly, we devise a set of rules by which content should behave. Form = Behaviour. We believe this way, a design can be more than the sum of its parts. It is exciting when a design has some ‘swarming behaviour’ and becomes, much like a flock of birds, a new organism in itself.
Info-graphics demand this approach of self-organization because graphic devices such as position, colour and size have a quantitative meaning first. Data-visualisation as a term is therefore almost a tautology: many info-graphics are just a visual version of a data-set. But we try to let content of books and websites ‘self-organize’ as well. To us, design is losing control in a controlled way; it is putting your hands in the air on a roller coaster and hoping you don’t derail.
Catalogtree is one of the more progressive studios around. Is that something you are conscious of or does it just happen by accident?
Because we like slow production techniques such as woodcut and screen-print, our work is often seen as retro actually. Choosing production techniques is such an important design decision to us that we are a little obsessed with not using Adobe (which we still use a lot, of course) because it is such a default. We create our own hard- or software tools in an attempt to stay in control in a you-are-what-you-eat kind of way. Marshall McLuhan’s famous quote ‘we shape our tools and our tools in turn shape us’ (or whatever the exact quotation might be) serves as a warning sign for whenever we get all too comfortable in one way of working. This way of working is progressive in itself but not necessarily in outcome. We still hope it is though…
Also, we try to balance commissioned projects and free work. Commissioned work has its benefits: there is a deadline, there is a budget and there are the client’s wishes. Design projects are nicely one-dimensional sometimes. We do the job, clear our desks and go home. You can play loud music in the background while doing this. We have our own vocabulary when we discuss these projects. We could say ‘let’s do fat arrows with transurban patterns showing change’ or something, and we understand each other. We think up designs over the phone like that. The effort lies in creating this vocabulary.
This is to us the purpose of our free work: to take the time to concentrate on new ways of processing and visualizing content and to create the tools to do that with. When we pursue new techniques and visualizations the atmosphere in our studio changes to quiet anticipation: will the plan work?
What inspires you?
We tend to skip the graphic design section in a book store and are hooked on natural self-organizing systems such as swarms, Penrose tilings, people standing in line, Voronoi patterns, traffic jams, stock markets, cellular automata and the like.
You are known for doing some of the best info-graphics and handling data. Do you have any specific philosophies or approaches to these projects that make them so successful?
We think graphic design should trigger some emotional response. Info-graphics too. This has to do with a misconception of objectivity or, rather, the misinformed idea that a subjective view is the same as lying. It is often stated that data should be presented as objectively as possible, but to our mind this leads to an exact reproduction of the original data set. When you’re visualizing a top ten of best-selling books this might be a good idea, but when the visualization represents tens of thousands of data-points, this approach makes quick interpretation impossible. Also, the authors of a book or the editors of a magazine often use scientific research to illustrate a certain story or a point of view. So the context in which an info-graphic is placed plays a role in determining the best visual form of the design. These editorial steps and design decisions help to interpret research without republishing the original paper. So we are aiming at designs a viewer can relate to (that might as well count as an emotional response, right?) in order to make things clear.
What is one thing you know now that you wish you knew as a design student?
Being professional is not a virtue.
Right: NYT FAT SEPTEMBER 2009
Structured Light 01 by Catalogtree (2010)
BIBLIOGRAPHY
INTRODUCTION Interactive Visualization: Insight through Inquiry By: Bill Ferster
CHAPTER 1 Interactive Visualization: Insight through Inquiry By: Bill Ferster Pg 9-19, Pg 120-121 Code + Form By: Casey Reas, Chandler McWilliams Pg 121-125
Artworks: Aaron Koblin http://aaronkoblin.com/
CHAPTER 2 Beautiful Data: The Stories Behind Elegant Data Solutions By: Toby Segaran, Jeff Hammerbacher “Seeing Your Life in Data” By Nathan Yau, Chapter 1
Gapminder www.gapminder.org Content: Visualize This By Nathan Yau, Chapter 1
Interview: Catalogtree By: DesignisBlank http://designisblank.com/2010/03/interview-catalogtree/
Artworks: Catalogtree http://www.catalogtree.net/
Threads Variable.io http://variable.io/ Project description
CHAPTER 3 Visualize This: The FlowingData Guide to Design, Visualization, and Statistics By: Nathan Yau, Chapter 1
INTERVIEW “I’ve always been interested in microscopes”: an interview with Aaron Koblin By: Patricia McDonald, http://bbh-labs.com/ive-always-been-interested-in-microscopes-an-interview-with-aaron-koblin/
COVER PAGES Designed with Scriptographer in Adobe Illustrator CS5
CHAPTER 1 Script: Clouds By: Jürg Lehni
CHAPTER 2 Script: Tree By: Jürg Lehni
CHAPTER 3 Script: Wallblazer By: Pedro
GALLERY Script: Voronoi Tool By: Jonathan Puckey
DESIGNED & EDITED BY: Angela Chu
CLASS: Typography 4, Art Center College of Design
INSTRUCTOR: Stephen Serrato
TYPEFACES: Display: Blender Pro; Text: Akkurat Light