Abstract
Music exploration and discovery is a common and recurring activity for the vast majority of people. We find new music actively by searching online, by visiting music stores, and by going to concerts, and passively by listening to the radio and receiving recommendations from friends and acquaintances. The constant development of Web technology has, however, made it complicated both to locate and to select the music that we prefer to listen to. We have suddenly gained access to all the world's music through online streaming services and stores. Moreover, it has become possible for anyone to compose and share music with the whole world, which contributes to an increased amount of potentially interesting music, but also to an increased amount of potentially uninteresting music. Against this background, it can be both desirable and necessary to assist music consumers in establishing music preferences and to provide pleasant and interesting experiences of finding and listening to music. This thesis explores how timed comments on the social platform SoundCloud can contribute to alternative ways in which online music can be explored and discovered. It opens with a study, at once exploratory and quantitative, of the basic properties of these comments and of their textual content. The results show that timed comments possess properties that differ from those of comments in other social networks. The results also show a number of differences and similarities between comments, depending on the context in which they appear. Next, three digital artefacts are presented, which were created on the basis of, and in parallel with, this study. Each artefact illustrates ways in which timed comments can be incorporated into online music exploration and online discovery practices.
Table of Contents

1. Introduction
   1.1. The Problem
   1.2. Alleviating The Problem
   1.3. Problem Statement
   1.4. SoundCloud & Timed Comments
      1.4.1. What Are Timed Comments?
      1.4.2. Timed Comments On SoundCloud
   1.5. Personal Motivation
2. Background
   2.1. Music Information Retrieval
      2.1.1. Descriptors About Music
      2.1.2. Content-Based Descriptors
      2.1.3. Contextual Descriptors
      2.1.4. Combining Descriptors
      2.1.5. Timed Comments as Descriptors
      2.1.6. Music Information Retrieval In This Thesis
   2.2. Exploration & Discovery
      2.2.1. Passive Discovery
      2.2.2. Active Discovery
      2.2.3. Exploration
      2.2.4. Discovery & Exploration In This Thesis
   2.3. Game Design
      2.3.1. Games & Play
      2.3.2. Game Rules
      2.3.3. Games as Systems
      2.3.4. Games as Culture
      2.3.5. Game Design in This Thesis
   2.4. Digital Music Games
   2.5. Dynamic Data Games
      2.5.1. Three Distinct Examples
      2.5.2. Defining Dynamic Data Games
      2.5.3. Meaningful Data Conversion
3. Research Design & Method
   3.1. Research Approach & Design
      3.1.1. Descriptive Research
      3.1.2. Experimental Computer Science
      3.1.3. Large-Scale Exploratory Research
      3.1.4. Evaluation
   3.2. Data Mining
      3.2.1. What is Data Mining?
      3.2.2. Specialized Branches
      3.2.3. Data Mining Versus Statistics
      3.2.4. Data Mining In This Thesis
      3.2.5. The Process Of Data Mining
      3.2.6. Data Preprocessing
      3.2.7. Data Characterization & Visualization
      3.2.8. Rules
      3.2.9. Attributes
      3.2.10. Interesting Patterns & Useful Information
      3.2.11. Data Mining Versus Information Retrieval
   3.3. Used Techniques
      3.3.1. Natural Language Processing
      3.3.2. Naive Bayes Classification
      3.3.3. Sentiment Analysis
      3.3.4. Part-Of-Speech Tagging
      3.3.5. TF-IDF Weighting
4. Exploring Timed Comments
   4.1. Data Preprocessing
      4.1.1. Data Reduction
      4.1.2. Data Integration & Transformation
      4.1.3. Data Cleaning
      4.1.4. Attributes of Timed Comments
   4.2. Data Mining & Analysis
      4.2.1. Comment Counts
      4.2.2. Commented Tracks (Genres)
      4.2.3. Timed Comments
      4.2.4. Comment Bodies
      4.2.5. Comment Distribution
      4.2.6. Comment Post Dates
      4.2.7. Part-Of-Speech Tags
      4.2.8. Useful Information
      4.2.9. Timed Comments Versus Regular Comments
   4.3. Classification of timed comments
      4.3.1. What constitutes timed content?
      4.3.2. Training The Classifier
   4.4. TF-IDF
      4.4.1. Stop-Words Removal
      4.4.2. Genres Comparison
      4.4.3. Retrieving Information (Words)
      4.4.4. Retrieving Information (Phrases)
      4.4.5. Popularity Bias
5. Digital Artefacts
   5.1. Timestamp Player
      5.1.1. Visual Design
      5.1.2. Interaction
      5.1.3. Related Work
      5.1.4. Discussion & Evaluation
   5.2. Timed Comment Games
      5.2.1. Emulating Classic Games
      5.2.2. SC Asteroids
      5.2.3. SC Breakout
      5.2.4. SC Missile Command
      5.2.5. Dynamic Difficulty
      5.2.6. Front-end Design
      5.2.7. Backend Design
      5.2.8. Pseudo Level Editors
      5.2.9. Discussion & Evaluation
      5.2.10. Open Culture Games
   5.3. Instamash
      5.3.1. Design
      5.3.2. Interaction
      5.3.3. Related Work
      5.3.4. Discussion & Evaluation
6. Discussion & Evaluation
   6.1. Music Discovery & Exploration
   6.2. Complying User Needs
7. Conclusions & Future Work
   7.1. Conclusions
   7.2. Future Work
8. Appendix A
9. Appendix B
10. Appendix C
11. Appendix D
12. Acknowledgements
13. Bibliography
List of Figures
Figure 1.1: SoundCloud front page
Figure 1.2: Nico Nico Douga
Figure 1.3: The SoundCloud audio player
Figure 2.1: Taxonomy of Digital Music Games (Model)
Figure 2.3: Tweet Land (in game)
Figure 2.4: TaxiCity
Figure 2.5: The Wiki Game
Figure 3.1: A comparison of the scientific model (left) with the role of experimentation in system design (right). From [50]. [51]
Figure 4.1: The SoundCloud tag-cloud (12/4 2012)
Figure 4.2: Upload dates for gathered tracks (genre sample)
Figure 4.3: Amount of comments in tracks (genre sample) (y-axis = tracks)
Figure 4.4: Tracks versus commented tracks (genre sample)
Figure 4.5: Standard deviation of timed comments (genre sample)
Figure 4.6: Characters in comments (genre sample)
Figure 4.7: Distribution of comments relative to track duration (genre sample)
Figure 4.8: Distribution of comments relative to track duration (genre-based sample)
Figure 4.9: Comments relative to upload date (genre sample)
Figure 4.10: Part-of-speech tags (random sample)
Figure 4.11: Top 10 adjectives (random sample)
Figure 4.12: The SoundCloud player (in the process of adding a timed comment)
Figure 4.13: Comment counts (left Myspace comments on profiles [69], right Youtube comments on videos [72])
Figure 4.14: Timed comment - Example 1
Figure 4.15: Timed comment - Example 2
Figure 4.16: Timed comment - Example 3
Figure 4.17: Timed comment - Example 4
Figure 4.18: Classification interface
Figure 5.1: Timestamp Player (Overview)
Figure 5.2: Timestamp Player (the actual player)
Figure 5.3: Timestamp Player (upper part)
Figure 5.4: Timestamp Player (middle part)
Figure 5.5: Timestamp Player (lower part)
Figure 5.6: Keep Listening button
Figure 5.7: Emulated games (from left: Breakout, Asteroids, and Missile Command)
Figure 5.8: SC Asteroids (in-game)
Figure 5.9: SC Asteroids power-ups
Figure 5.10: SC Breakout (in-game) (two different game modes)
Figure 5.11: SC Breakout level design system
Figure 5.12: Three different tracks
Figure 5.13: SC Missile Command (in-game)
Figure 5.14: Game menu (Web site) for the Timed Comment Games
Figure 5.15: Pseudo level editor – Example 1
Figure 5.16: Pseudo level editor – Example 2
Figure 5.17: Instamash (overview)
Figure 5.18: Instamash - search fields and buttons
Figure 5.19: Words to colors conversion
Figure 5.20: Instamash (complete overview of design and functionality)
Figure 5.21: VSampler 3.5 from MAZ Sound Tools
Figure 5.22: Sample Editor
Figure 5.23: Interactive waveform
Figure 9.1: Standard stop-words list in Natural
Figure 9.2: Harsh stop-words list
Figure 10.1: Predetermined list of words that the program counts
Figure 10.2: Word frequencies in all the tracks of three different artists on SoundCloud
Figure 10.3: Box2D Visualizer
Figure 10.4: Comments Visualizer
Figure 11.1: Example of artist object in the database
Figure 11.2: Data about artist retrieved from database
Figure 11.3: Example of structure in the
List of Tables
Table 3-1: Attributes types
Table 4-1: The attributes of timed comments
Table 4-2: Summarized & averaged statistics (random sample)
Table 4-3: Comment characteristics in various social networks
Table 4-4: Top 10 terms for each of the 25 genres using harsh stop-words list
Table 4-5: TF-IDF test retrieval (term frequencies in genres)
Table 5-1: Three main commenter types (examples from SoundCloud)
Table 5-2: Acceptable comment types based on usage
Table 5-3: Various differences between the selected games for emulation
Table 5-4: Most significant data conversions in SC Asteroids
Table 5-5: Most significant data conversions in SC Breakout
Table 5-6: Most significant data conversions in SC Missile Command
Table 5-7: The amount of comments required for a track to have stable versions
Table 5-8: Complete list of actions that can be performed and their outcomes
Table 6-1: Discovery & exploration aspects of the digital artefacts (X = Novel and essential, X = Not novel or essential)
Table 7-1: Top 10 terms using only the standard stop-words list in Natural
CHAPTER 1
1. Introduction
"…whenever you listen to a piece of music, you really are listening to the latest word in a conversation that you've been having ever since you started listening to music. You know, you hear every other piece in the piece you are listening to at that moment."
- Brian Eno [1]

As pointed out by Brian Eno, our taste in music is never stationary. It changes with our individual history of listening to music. Every time we listen to a piece of music, it can thus be seen as having an impact, consciously or unconsciously, on our overall understanding of and preferences in music. For this reason, the discovery of music that we enjoy listening to is a continuous challenge for us. We want to find more of the music that we like and we want to share that music with others. The music that we like to listen to can vary based on our moods, the context we are in, the time of day, and many other circumstances [2]. We may like a certain genre of music, but that does not mean that we like all the music pieces in that genre. We may also like the sound of a piano, but that does not mean that we like all music pieces that feature a piano.
1.1. The Problem
Today, we are faced with an endless stream of music, which entails a corresponding stream of information about the music. We are no longer limited to the music selection that is available in our local record stores. Instead, we more or less have all the recordings ever made right at our fingertips through various music streaming services and online music stores. This can in many ways be seen as a convenience, but such unlimited access may also entail what Richard Saul Wurman refers to as information anxiety [3]. This anxiety arises when we are overwhelmed with data and are incapable of processing it or, in Wurman's words, 'it is the black hole between data and knowledge' [3 p34]. Unlimited access is of limited use if we do not know what we are looking for or where to find it, which may bring about a sort of anxiety towards the data. Hence, we need to transform data into useful information to enable easy access and prevent this confusion and anxiety from emerging.

When transferred to music, unlimited access requires us to be selective. This may be difficult once information anxiety has set in, but it is nevertheless a necessity, since there is far more music available than any one person can hope to listen to in a lifetime. Hence, in order to discover the music that we like, the amount of music needs to be limited somehow. In the absence of information anxiety, one way of limiting the amount of music is through search. But we often have problems with accurately describing the music we like and why we like it. Some people might state that they enjoy listening to classical music when in reality they exclusively like impressionist music from the late 19th century. The same people may have various reasons for liking the music. They may argue that they like the music because it is calming, romantic, in a major key, dissonant, or for some completely different reason.

In the field of information retrieval, this phenomenon is commonly referred to as the vocabulary problem [4]. That is, we tend to use different words for referencing the same things. One person might expect to locate impressionist music by searching for "calming music" while another person expects the same results from a search for "dissonant music". This problem can become visible if the designers of a system provide the access terms for the items in the system themselves. Attempts to diminish the vocabulary problem often involve providing visual interfaces that do not depend on textual input, or assisting the user in figuring out appropriate search terms.

Another way of helping people find the music that they need is by filtering (and thus limiting) the possibilities for them. This can be done implicitly by interpreting their actions or explicitly, e.g. by asking them to fill out a questionnaire. In general, we call systems that work in this way recommender systems [5]. Recommender systems can further be useful for personalizing information and removing redundant data based on our history of choices and/or actions. But this type of personalized filtering can also be considered problematic, as we might all separately wind up in "filter bubbles" [6]. The idea behind a filter bubble is that automated filtering engines create a unique universe (bubble) of information for each of us [6 p9], which prevents us from seeing the full picture.

It would seem that we are overwhelmed with possibilities and that we often do not know what kind of music we are looking for. On top of that, our taste in music is in perpetual change, and there seems to be a risk in relying on modern technology to satisfy this ever-changing taste, as we might then end up in isolation. Considering all of these circumstances, how can we alleviate the problem of finding music that we like?
1.2. Alleviating The Problem
There currently exist a vast number of systems and tools for navigating the online world of music, many of which can be considered enjoyable, effective, and aesthetically well conceived. Even so, the constant development of Web technology, as well as the emergence of new digital music services, platforms, companies, organizations, and data types, entails new opportunities for developing efficient and immersive ways of discovering and exploring music online.

In music, a timed comment is a relatively new type of music metadata. Timed comments are comments that are related to a specific time (timestamp) in an audio track. For this reason, they might conceal a useful combination of audio-related information and social knowledge about sound and music. There is only so much information that we can draw from social music metadata that references music tracks in their entirety (e.g. tags). These types of metadata are not designed to hold detailed information about specific parts of an audio file. In order to reveal time-specific information, we often resort to computer-aided analysis of the actual audio signal. State-of-the-art audio-signal analysis systems can tell us many technical details about an audio file, such as the loudness, timbre, or pitch at specific timestamps. Conversely, computers can have great trouble determining whether or not the music sounds "good", whether the tempo is "slow", or what the song is actually about. These are all subjective interpretations of music that can vary from person to person, and it requires human insight to evaluate other humans' interpretations and opinions. Hence, at this end of the spectrum, we often prefer some kind of human evaluation of the music (e.g. reviews).

Timed comments can be said to hold hybrid properties, as they contain both a timestamp and (to varying degrees) an interpretation of that timestamp. In other words, they contain information about the audio signal as well as the context surrounding that signal. For this reason, I believe that timed comments can contribute to the creation of more efficient and enjoyable ways of discovering new music as well as more immersive ways of exploring music collections and tracks.
1.3. Problem Statement
On the basis of the above-outlined problem area, I seek to investigate the following problem:

How can we utilize timed comments in order to provide new, enhanced or alternate ways of exploring and discovering music online?

- How are timed comments different from regular comments and from other listener-generated music metadata?
- What are timed comments capable of telling us about the music that they are related to and how can we use that information?
- What separates timed comments from each other relatively in terms of genres, artists, users, tracks and the like?
- Which attributes of timed comments can be used in the creation of engaging music exploration and discovery experiences online?
I further find it worth emphasizing that the overarching focal point of this thesis is the study of timed comments as a "material" (or data type), as the newness of timed comments calls for an investigation of their basic properties. I will study these properties exploratively by means of data mining, in order to build an understanding of and knowledge about the comments. This knowledge will then be used to construct applications and games for music discovery and exploration that incorporate timed comments. To give a basic understanding of what constitutes a timed comment, I will now cover the relatively short history of timed comments and describe the online music platform SoundCloud, from which the timed comments used in this thesis are collected.
1.4. SoundCloud & Timed Comments SoundCloud1 is a platform that allows users to upload and share sound and music. The site recently reached its 10.000.000 user and has pronounced itself as the worlds leading social sound platform [7]. The platform has a somewhat “democratic” structure. The front page (Figure 1.1) does not highlight or promote any music artists or tracks and it is the users actions that determines, which artist appears among the popular (hot) tracks. SoundCloud further has a very streamlined look, which contributes to the equality of users to some extend. The user profiles are not customizable, as it is the case with profile pages on the platform Myspace2. This entails a somewhat uniform experience from using SoundCloud where none of the artist’s visual identity stands out in the crowd.
1 http://soundcloud.com
2 http://myspace.com/
Figure 1.1: SoundCloud front page
1.4.1. What Are Timed Comments?
A timed comment is a comment that is related to a specific timestamp in a multimedia element. This type of comment is a relatively new type of social metadata on the Web. The Japanese video streaming site Nico Nico Douga3 was the first service to implement timed comments when it launched back in 2006. As can be seen from Figure 1.2, Nico Nico Douga allows the direct overlaying of comments at specific playback times in videos. YouTube4 has used a similar form of timestamp since 2008 to let users deep link to a specific playback time in a video [8], which can be combined with comments about a specific time in a video. Hence, these are regular comments with links to timestamps and not, strictly speaking, timed comments. YouTube also allows the creation of timed comments (annotations) in a way similar to that of Nico Nico Douga, but this option is only available to the uploader of the video. There are also services that use timestamps for more organizational and navigational purposes, such as TED5 and Videolectures.NET6. On TED, time-coded interactive transcripts accompany the video presentations (talks) on the site [9]. Each sentence (phrase) in the transcripts is a link to a timestamp in the video. Quite similarly, the Web site Videolectures.NET allows users to jump to a specific part of a lecture by clicking on slides that are displayed alongside the lecture video.
3 http://www.nicovideo.jp/
4 http://www.youtube.com/
5 http://www.ted.com/
6 http://videolectures.net/
Figure 1.2: Nico Nico Douga
1.4.2. Timed Comments On SoundCloud
To the best of my knowledge, SoundCloud was the first music-streaming platform to incorporate timed commenting. Here, registered users can add timed comments by clicking a blue bar below a waveform in the platform’s audio player (see Figure 1.3). This makes a popup window appear in which the desired comment text can be typed. Moreover, it is possible to add links in a comment, but images and videos cannot be embedded. The comments are then displayed in popup windows during playback at the position at which they were entered.
Figure 1.3: The SoundCloud audio player
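To make the notion of a timed comment more concrete, the following is a minimal sketch of how such comments could be represented and grouped by playback position. The record layout, the field names and the ten-second window are assumptions made for illustration only; they are not taken from SoundCloud's own data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedComment:
    """Hypothetical record; the field names are illustrative, not SoundCloud's."""
    track_id: int
    user_id: int
    timestamp_ms: int  # playback position the comment is attached to
    body: str


def bucket_by_time(comments, bucket_ms=10_000):
    """Group comment texts by the playback window they fall into."""
    buckets = defaultdict(list)
    for c in comments:
        buckets[c.timestamp_ms // bucket_ms].append(c.body)
    return dict(buckets)


comments = [
    TimedComment(1, 10, 4_200, "nice"),
    TimedComment(1, 11, 61_500, "I like the bass in this part"),
    TimedComment(1, 12, 63_000, "this makes my head spin"),
]
windows = bucket_by_time(comments)
```

Grouping by window is only one possible design choice; it illustrates that the timestamp lets comments be related to a part of a track rather than to the track as a whole.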
1.5. Personal Motivation
My motivation for investigating timed comments has emerged from a curiosity about their nature as well as a fascination with SoundCloud as a social community. My initial impression of timed comments on SoundCloud was that their content varies in quality. It seemed that a majority of the comments contain somewhat bland and subjective statements such as “nice”, “great”, or “this sounds good”, and that there is also a lot of spam and self-promotion. On the other hand, there seemed to be instances of comments that are highly relevant to a specific time and to the music, such as “I like the bass in this part”, and also more aesthetic evaluations of the music such as “this makes my head spin”. These inconsistencies in timed commenting, and my curiosity about timed comments in general, have fueled my work throughout this research project and are among the main reasons why I chose the subject. Another important reason was that I have personally been experiencing a lack of motivation for searching for new music, which can be described as a sort of information anxiety. I know that there is probably a lot of music available that is just the kind of music that I enjoy listening to, but there is correspondingly loads of music that I do not care for. In this research project, my hope is therefore that I can contribute to existing ways of navigating the online musical landscape by making the filtering of unwanted music more efficient and/or by making manual filtering of music a more enjoyable and perhaps even desirable activity.
CHAPTER 2
Background
In this chapter I will introduce the interdisciplinary field of study known as Music Information Retrieval. I then go on to describe the way in which Music Information Retrieval relates to the task of discovering and exploring music online. The chapter ends with a presentation of some foundational game design concepts, together with a framework and a concept for understanding and discussing the games that have been developed as part of this research project.
2.1. Music Information Retrieval
In general, an information retrieval system can be viewed as a system that is capable of storing, retrieving, and maintaining information [10]. In this context, information can be text, numeric data, audio, video, images and other multimedia objects. The task of retrieving information may be described as follows:
“Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).” [11 p1]
Hence, the goal of information retrieval is to deliver desired information to an end user. A standard information retrieval task is ad hoc retrieval, in which a system aims to provide documents from within a collection (corpus) that are relevant to an arbitrary user information need [11 p5]. An information need is here considered a more or less arbitrary topic (area) that the user desires to explore. The user may not have a precisely formulated description (query) that reflects what he/she is looking for. In conjunction with this information need, the objective of information retrieval is to minimize the “overhead” of a user locating needed information [10], that is, the time the user spends on locating the needed information, including such tasks as generating a query, executing a query, and scanning the results. Hence, there is an aspect of efficiency to information retrieval.
Music Information Retrieval (MIR) is a rapidly growing and highly interdisciplinary research area devoted to fulfilling people’s music information needs [12 p2]. Despite the emphasis on retrieval in its name, MIR encompasses a wide range of approaches aiming at music management, easy access, and enjoyment [13 p17]. The term thus covers a broad spectrum and is not restricted to the retrieval of information (data) directly from music.
Research areas in MIR include representation, retrieval, classification, law, perception, recommendation, feature detection, ontology, machine learning, user interface design and metadata [14]. It is often pointed out that management of music is a focal point in MIR research, the aim being to make locating desired music a pleasant task [15], [14]. This refers to locating new music but also to exploring music that a given user of a MIR system is already familiar with. For this reason, easy access might be considered an overarching focal point in MIR research, whereas music management is an important prerequisite to access.
2.1.1. Descriptors About Music
In his thesis “Learning The Meaning of Music”, Brian Whitman designates three common focus areas in music retrieval and analysis:
“Retrieval tends to be either ‘score level,’ analyses of symbolic music data (MIDI, scores, Csound, performance data), ‘audio-level,’ analyses of recorded music using signal processing techniques, or recently ‘cultural,’ studying web communities, text, description, and usage.” [16 p31]
Collectively, these data are referred to as music metadata, but they are more commonly separated into two main categories: content-based descriptors and contextual descriptors.
2.1.2. Content-Based Descriptors
Most research in MIR, comprising techniques and developed systems, is referred to as content-based [12]. The idea underlying content-based approaches is that a document can be described by a set of features, which are computed directly from its content. Content-based descriptors thus imply metadata that have not been manually produced. They are instead factual information about an audio signal. Content-based descriptors about music are considered to encompass both audio features (acoustic features) and symbolic features [17].
Audio features are features that are extracted from a raw audio source (e.g. an MP3 or WAV file). These features can be both low-level and high-level. Low-level audio features include timbre, pitch and loudness, whereas high-level features include form, melody and rhythm [18 p27]. The extraction of such features involves the use of audio signal processing techniques. Some audio features are fairly easy to extract while others are more complicated. For instance, loudness is a simple measure of the volume of an audio file, which does not require any advanced filtering of the audio signal. In contrast, extraction of a melody requires identification and isolation of the frequencies of the melody instrument or voice. This is a nontrivial task because explicit information about notes, voices or any musical symbol or tag is not encoded in an audio signal [19 p1].
Symbolic features are discrete encodings of music that are embedded in formats such as sheet music (music scores) and MIDI7 files. These formats hold information about what is to be played, and how, instead of the rendered music itself. For this reason, they are considered representations of music but at the same time (though often simplified) “objective” features and content-based descriptors.
Symbolic representations of music can be highly informative and accurate, but unfortunately they are not very often available. MIDI data are only generated when the music is recorded with digital instruments (e.g. synthesizers) or otherwise added in a music production environment that supports MIDI. Similarly, sheet music is a written representation of music that can be hard to obtain (especially for new music) because music sheets are manually produced or sometimes (though more rarely) generated from a MIDI representation.
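As an illustration of why loudness counts as an easily extracted low-level feature, the sketch below computes the root-mean-square (RMS) level of a list of samples. This is an assumption-laden simplification: RMS level is only one simple proxy for loudness, the samples are assumed to be floats in the range -1.0 to 1.0, and the function name is hypothetical.

```python
import math


def rms_loudness_db(samples):
    """RMS level of samples in [-1.0, 1.0], in dB relative to full scale."""
    if not samples:
        raise ValueError("need at least one sample")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square)


# One second of a 440 Hz sine at full amplitude and at half amplitude.
sample_rate = 8000
full = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
half = [0.5 * s for s in full]

difference_db = rms_loudness_db(full) - rms_loudness_db(half)
```

Halving the amplitude lowers the level by roughly 6 dB, and no filtering or source separation is needed to obtain the value, in contrast to the extraction of a melody.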
2.1.3. Contextual Descriptors
As opposed to content-based descriptors, contextual descriptors (cultural metadata) about music are more or less subjective interpretations and characterizations of an audio signal [18 p27]. They are the words we use to qualitatively describe musical sound. For instance, two distinct discourses might be used about The Beatles, describing them as either a “pop” band or a “rock” band. These are, to some extent, different and subjective interpretations of the same music. Contextual descriptors include such things as artists, titles and lyrics, but also social and otherwise user-generated metadata about music.
7 Musical Instrument Digital Interface
Community metadata is a term that has been used to refer to this particular type of user-generated contextual descriptor [20]. Some examples of community metadata are social tags, reviews, tweets, ratings and also comments about music. The term implies contextual descriptors in the specific context of an (online) community, whereas contextual descriptors are also present in local music repositories. Hence, community metadata is a subset of contextual descriptors. Furthermore, community metadata can be viewed as a byproduct of social networks, blogging services and similar Web 2.0 applications that promote user-generated content. Increasingly, this relatively new type of music metadata is being incorporated into MIR research in order to establish a more rounded understanding of music within its cultural context [19 p2], [21]. We have also seen new types of music services emerge, such as Last.fm8, MusicBrainz9 and The Echonest10: services that attempt to utilize community metadata in the provision of structured information, music recommendations, and personalized radio [15]. Services like these can be appealing for context-based MIR tasks as they often provide an API11, which can allow for easy retrieval of various kinds of information about music.
2.1.4. Combining Descriptors
In the MIR community, there is a consensus that the combination of different types of metadata is useful in order to provide more viable music management and retrieval systems [22 p32]. Studies have also shown that tasks such as music classification, similarity search and recommendation can benefit from combining contextual and content-based descriptors [23], [21], [24]. This is due to the respective advantages and disadvantages of content-based descriptors and contextual descriptors. Content-based descriptors can be difficult to extract from the audio signal, and these descriptors do not characterize the human experience of listening to music [18 p28]. Also, content-based systems cannot invent or detect new tendencies such as the emergence of a new musical genre. Conversely, contextual descriptors about music are human labelings of music, which is in itself an error-prone process. These are often easy to extract, but their subjective characteristics can make the utilization of contextual descriptors complicated. Hence, content-based descriptors can tell us something that contextual descriptors cannot, and vice versa.
To see an example of this difference, we can compare the way in which humans and machines might characterize the tempo of an audio track. A machine may describe an audio track as having 120 beats per minute. A person, on the other hand, might subjectively characterize the same audio track as being “fast” while another person might characterize it as being “slow”, both of which could be justifiable. This is simply a matter of opinion and argumentation. Conversely, a person cannot accurately determine the beats per minute of a track, which is thus an advantage of the machine. By combining these descriptors, we may decode both the specific tempo (the beats per minute) and a more or less subjective characterization of that tempo.
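The tempo example above can be sketched as a small data structure that keeps the measured value and the listener labels side by side. The class and field names are hypothetical, and the label counts are invented purely for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class TempoDescription:
    bpm: float  # content-based: measured from the signal
    labels: Counter = field(default_factory=Counter)  # contextual: listener words

    def summary(self):
        """Combine the measured tempo with the most frequent listener label."""
        if not self.labels:
            return f"{self.bpm:.0f} BPM (no listener labels)"
        label, votes = self.labels.most_common(1)[0]
        total = sum(self.labels.values())
        return f"{self.bpm:.0f} BPM, most often called '{label}' ({votes} of {total})"


track = TempoDescription(bpm=120.0)
track.labels.update(["fast", "slow", "fast"])
combined = track.summary()
```

Neither field can be derived from the other: the beats per minute come from the signal, while the labels only exist once listeners have supplied them.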
2.1.5. Timed Comments as Descriptors
So which type of descriptor is a timed comment?
8 http://www.last.fm/
9 http://musicbrainz.org/
10 http://the.echonest.com/
11 Application Programming Interface
In a sense, timed (or timestamped) comments on SoundCloud might be considered to hold properties of both contextual and content-based descriptors about music. Put differently, by assigning timestamps to comments about music, the comments obtain hybrid properties as descriptors about music. A timestamp may in itself be viewed as a content-based descriptor about music as it relates to the duration of a track, which is in itself a type of content-based descriptor. In a sense, timestamps are intrinsic properties of an audio file. A timestamp may not be a very detailed content-based descriptor, but it is a content-based descriptor nonetheless. The actual comment is, however, not a content-based descriptor. When ignoring the timestamp, a comment on SoundCloud can simply be viewed as a textual representation of the commenter’s subjective attitude towards a track and thus a contextual descriptor. In this sense, comments are extrinsic descriptors of an audio file. Though more unpredictable, they might be said to hold information similar to, and perhaps in between, social tags and reviews. That is, they sometimes seem to have characteristics of both an index of music and an opinion about it. I believe that these hybrid characteristics make timed comments particularly interesting and also a potentially useful type of music metadata, provided that the information they contain is properly utilized. It is this proper utilization of timed comments that I will explore throughout this thesis.
2.1.6. Music Information Retrieval In This Thesis
In this thesis, emphasis is on providing easy access to music by developing effective and enjoyable MIR systems that utilize information contained within timed comments. When considering the broad field of MIR, the applications and games that I construct might all be said to contain MIR systems to some extent. That is, they are all constructed with the purpose of providing easy access to music as well as an enjoyable experience. The International Symposium on Music Information Retrieval (ISMIR) has been encouraging its community to “pull up” more symbolic and metadata research [25], [26], and it has also been proposed to let the “R” in MIR stand for “Research” as opposed to “Retrieval”, as the field has developed to encompass more than retrieval. For this reason, I find my investigation of timed comments to be relevant to the field of MIR, as they are indeed a type of music metadata.
2.2. Exploration & Discovery
As previously mentioned, we possess an unprecedented, more or less unconditional access to music. Given this access, there are various ways in which we can discover and explore music online, depending on the explicitness of our information needs. We may have a specific artist that we are looking for, or perhaps even the title of a track by that artist. Conversely, we might simply be interested in a musical genre or the music from a specific time period. Yet another possibility is that we have already discovered music we like and now seek to know more about this music (e.g. the lyrics, the chords, or the names of musicians). There seems to be a myriad of possible variations of our music information need. Still, we might divide music exploration and discovery into three main categories (modes) with regard to our way of satisfying this need. These are: passive discovery, active discovery, and exploration. With inspiration drawn from [18], I will now present and describe each of these categories in turn.
2.2.1. Passive Discovery
Traditionally, music has been passively discovered through recommendations by friends, radio hosts and the like [27]. Recommender systems (also called recommenders and information filtering systems [28]) are attempts to automate this process. These systems attempt to predict the preferences of users in order to suggest items that may be relevant with respect to those preferences [5]. The suggestions are found through similarity measurement. There are generally considered to be three types of data that recommender systems compare in order to make recommendations. These are the items, users, and transactions, i.e., the relations between items and users. Transactional data is basically log data that can be explicitly supplied by the user (e.g. through ratings of items) or implicitly inferred by interpreting the user’s actions. Even though this data generation process may actively engage the user, the user does not need to provide further information for the recommendation to take place. This is one of the reasons why recommendation can be considered a passive form of discovery [18 p24]. In addition, the user may or may not actively initiate the recommendation process. Recommendations can be made without consulting the user (e.g. featured videos on YouTube) or initiated by the user (e.g. by turning on a radio). In the latter case, the information need of the person who wishes to discover music may often be vague and non-delimiting.
Several techniques for recommendation have been proposed, but most of the available systems use social-based, content-based, or collaborative approaches [29]. Social recommenders (also called community-based recommenders) compute similarities between users in social networks. The recommendation is based on preferences (e.g. ratings) that were provided by the user’s friends [5]. Content-based recommenders recommend items based on item similarity and the analysis of transactional data.
For instance, if a user has positively rated a track by an artist, the system might suggest other tracks by that artist. (Content-based recommenders are thus not associated with content-based descriptors.) In collaborative filtering, filtering decisions are based on human and not machine analysis of content. This can be advantageous for the filtering of text, images, music, and similar tasks where machine analysis might entail error-prone results [30]. The advantages include the ability to filter based on complex concepts such as taste and the quality of items. A commonly used example of collaborative filtering is the taste-based recommendations on Amazon, which carry the heading: “customers who bought this item also bought that”. Collaborative filtering is widely used and relatively easy to implement, but it is also subject to controversy because of some apparent problems that the technique handles poorly. Two commonly discussed issues are the previously mentioned cold-start problem and a concept known as the long tail. The long tail is a concept popularized by Chris Anderson in a book of the same name [31]. Anderson depicts a new market in which a tremendous number of items sell in very small quantities, whereas just a few items used to account for most of the sales. Hence, it is no longer just “the hits” that account for all (or most) of the sales but a long tail of niche products. This new market has in particular emerged due to a shift from physical to digital media.
The current state-of-the-art approaches to music recommendation exploit both contextual and content-based descriptors about music [32]. These systems are often referred to as hybrid recommenders [33], [29].
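The “customers who bought this item also bought that” style of collaborative filtering described above can be sketched with simple co-occurrence counts. This is a deliberately naive illustration and not the method of any particular service; the user histories, track names and function names are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(histories):
    """Count how often two items appear in the same user's history."""
    counts = Counter()
    for items in histories.values():
        for a, b in combinations(sorted(set(items)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def also_liked(item, histories, top_n=3):
    """Items most often found together with `item`, ties broken alphabetically."""
    counts = co_occurrence(histories)
    related = [(other, n) for (this, other), n in counts.items() if this == item]
    related.sort(key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in related[:top_n]]


histories = {
    "user_a": ["track_1", "track_2", "track_3"],
    "user_b": ["track_1", "track_2"],
    "user_c": ["track_2", "track_4"],
}
suggestions = also_liked("track_1", histories)
```

The sketch relies only on what users have done and never inspects the audio itself. A track that appears in no history yields an empty list, which is one way of seeing the cold-start problem.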
2.2.2. Active Discovery
I consider active discovery to be characterized by search, where the user has a more or less tangible and describable information need. An information retrieval system can then be used to actively and explicitly search for items (here music) that closely match that need [14]. Thus, in traditional information retrieval, a query on a collection may be viewed as a fragment of the desired item from the collection, such as a specific word occurring in a document. Search will most often be related to text, but there also exists speech-based querying (e.g. [34]), and in relation to music we have seen systems using query-by-humming (e.g. [35]). In [36], two subtle but meaningful differences between recommender systems and search engines are established. The first difference is that recommender systems are concerned with periodic or static information needs whereas search engines help resolve a contingent situation. The second difference is that recommender systems filter objects that are embedded in an incoming stream of information whereas search engines retrieve information. I find this description to be adequate for distinguishing between recommendation and search, whereas a distinction between discovery and exploration is somewhat more troublesome.
2.2.3. Exploration
Exploration can be taken to mean the systematic investigation of something12. The term suggests an immersion into a given subject where enjoyment of, or excitement from, exploring is derived from the process itself. Hence, exploration can often be motivated by curiosity rather than an actual information need [37]. As opposed to a discovery, an exploration implies something more open-ended. By considering the archetypical versions of these two words, we also find that a discovery concerns a concrete subject (item) while an exploration can only be perceived as an abstract action. Put differently, a discovery may occur either passively or actively whereas an exploration must be seen as an activity to some extent. We cannot passively “make an exploration” in the same way as we can make a discovery. Moreover, the word exploration does not imply the newness of something. Unless we are “exploring the unknown”, exploration is done of an already discovered subject in order to gain a more detailed knowledge of the subject or to make new discoveries. To this end, exploration and discovery might also be seen as subsets of each other. Discoveries may often call for, or entail, exploration whereas exploration can lead to new discoveries. In the light of the above, it seems fair to say that there is a rather complex relationship between discovery and exploration. In this thesis, I conceive of music exploration as an immersion into already discovered music involving an intention to improve one’s relationship to a given track, artist or musical genre. Hence, I do not consider exploration to be the search for new music or a random discovery brought about by recommendation. Instead, I consider discovery and exploration to be strongly linked in that exploration will often happen immediately after, and as a direct result of, a discovery.
12 http://www.thefreedictionary.com/Exploring
2.2.4. Discovery & Exploration In This Thesis
In this research project, I experiment with the utilization of timed comments for recommendation, search, and exploration. The developed games and applications all facilitate discovery and exploration to various degrees.
2.3. Game Design
The creation of games that incorporate timed comments has turned out to be a particularly interesting way of investigating and experimenting with the possibilities of utilizing them for music exploration and discovery. For this reason, I will present some fundamentals of game design, which will be used for discussing this utilization at a later point. I will then go on to establish a high-level taxonomy of Digital Music Games and present a concept which I call Dynamic Data Games. These are both considered intriguing ways of framing games that incorporate timed comments.
In [38], design is defined in general terms as ‘the process by which a designer creates a context to be encountered by a participant, from which meaning emerges’ [38 p41]. A game designer can therefore be seen as a person who crafts a set of rules within which there is meaning and motivation to play [39 pXIX]. The game designer attempts to facilitate a meaningful gaming experience by indirectly designing the player’s experience. Hence, game design is not the same as graphic design or the programming of a game. These are specific, practical tasks involved in the creation of a game, whereas game design concerns the more high-level tasks.
2.3.1. Games & Play
The words games and play are closely related while having distinct meanings [38 p72]. The two concepts have a complex relationship in which both can be viewed as a subset of the other. In one sense, a game is a structured way of playing, whereas in another sense play emerges from a game. Play can take many forms and the word play is used in various ways. Play can among other things be serious, aggressive, or explorative, and we may play with toys, play the guitar or play around. Play can also be competitive (agon), chance-based (alea), simulation or make-believe (mimicry), or vertigo or physically-based (ilinx) [41]. Meaningful play can be taken to mean something along the lines of player experiences that have meaning and are meaningful [38 p33]. We might furthermore determine that ‘the meaning of an action in a game resides in the relationship between action and outcome’ [38 p34]. However, these definitions do not really
clarify what constitutes meaningfulness. Meaningfulness is a complex, subjective measure that cannot be unambiguously defined. Hence, meaningful play only really occurs when a player somehow expresses that the play is meaningful. Games can be defined as systems ‘in which players engage in an artificial conflict, defined by rules, that results in a quantifiable outcome’ [38 p80]. Gameplay is then a type of play that applies to a game, that is, the play that takes place in the game. In more detail, gameplay can be viewed as ‘the formalized interaction that occurs when players follow the rules of a game and experience its system through play’ [38 p303]. When we talk about gameplay, we thus talk about the experiences we derive from playing games. Gameplay is always non-linear, as it does not follow a predetermined path (line) in the way that reading a book does (from A to B) [41 p125]. This nonlinearity gives gameplay meaning, and without it game designers might as well be working on movies instead.
2.3.2. Game Rules
Game rules are the formal structures of a game. The rules of a game are important because they facilitate the experience of players [38 p299]. Rules can further be defined on three levels: operational rules, constitutive rules and implicit rules.
Operational rules (also known as procedures [39 p29]) are the explicit and concrete rules of a game. They are the instructions that we need in order to understand and play a game, and they are meant to guide the behavior of the player(s) [38 p132]. We often find these rules in the manual accompanying a game.
Constitutive rules (also referred to as game mechanics) are the way in which games work [42 p28]. They are the underlying structures that clarify what happens in various situations that might arise. Some constitutive rules define game objects and concepts while others limit player behavior. Constitutive rules are thus abstract and logical statements such as “there are eight types of weapons” or “players all begin with a value of zero” [38 p132]. They may also include conditional expressions that decide the outcome of the player’s actions, such as: If you do X then Y happens [42 p28]. The constitutive rules of a game are not usually communicated to the player(s) but exist “below the surface” of the game [38 p130].
Implicit rules are the “unwritten rules” of games, such as etiquette and good sportsmanship. An implicit rule when playing a multiplayer video game would be to place the television screen so that all players can see properly. It is inconceivable to compose a complete list of concrete, implicit rules for a game, as they will vary from context to context. In addition, implicit rules are not unique to any one game.
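Because constitutive rules are abstract and logical statements, they translate fairly directly into code. The sketch below encodes the rule “players all begin with a value of zero” and one conditional rule of the form “if you do X then Y happens”. The actions and point values are hypothetical and are not taken from any of the games discussed in this thesis.

```python
from dataclasses import dataclass, field


@dataclass
class GameState:
    scores: dict = field(default_factory=dict)

    def add_player(self, name):
        # Constitutive rule: "players all begin with a value of zero".
        self.scores[name] = 0

    def apply(self, player, action):
        """Conditional rule of the form 'if you do X then Y happens'."""
        # Hypothetical actions and point values, chosen only for illustration.
        outcomes = {"collect": 1, "hit_obstacle": -1}
        if action not in outcomes:
            raise ValueError(f"action not allowed by the rules: {action}")
        self.scores[player] += outcomes[action]
        return self.scores[player]


game = GameState()
game.add_player("p1")
game.apply("p1", "collect")
game.apply("p1", "collect")
game.apply("p1", "hit_obstacle")
```

The player never sees the `outcomes` table, which mirrors the observation that constitutive rules exist “below the surface” of a game, while the operational rules would be what a manual tells the player to do.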
2.3.3. Games as Systems
Games are essentially systems of rules and can be seen as a set of parts that together form a complex whole [38 p50]. In this sense, games are clearly systems. If components can be removed from a system without affecting its functionality, then it is a collection, not a system [39 p115]. There are furthermore four elements that all systems share: objects, attributes, internal relationships and an environment. These elements vary depending on the way in which a game system is framed. It can be framed as formal, experiential, or cultural, and it can be open or closed. There are also many more detailed ways in which games can be framed as systems, one being that of an emergent system [38 p152]. Simply put, emergence is the phenomenon of something complex arising from something simple. In games, the gameplay can be complex even though the game rules are simple [43 p324]. For instance, the game of chess might be said to have very simple rules. Still, the possible courses and outcomes of a game of chess are close to infinite. We thus say that the whole of the game becomes greater than the sum of its parts [38 p159]. Emergence can also arise on a somewhat social level without being explicitly stated in the
game’s rules. Bluffing in poker is an example of this type of emergent behavior, as it is a behavior that somehow emerges from the game’s rules.
2.3.4. Games as Culture
Culture is relevant to any game to some extent, as every system (or game) can be said to have an environment [38 p507]. Unlike rules and play, cultural schemas are extrinsic parts of games that come from the larger context in which they are played. A context might here be political, ideological, physical and so forth. All games can further be said to reflect culture to some degree, and some games also transform culture. Games may reflect culture directly (e.g. stereotypes such as Barbie), but the reflection might also be more implicit, as game designers can always be seen as products of their environments to some degree. A game can also transform the way in which people behave (or act) outside of the game. An example of this is player-generated MODs13. MODs are (typically) not envisioned by the game designer and are thus related to the concept of emergence.
2.3.5. Game Design in This Thesis
In this thesis, I emulate and modify three vintage video games. For this reason, I have been able to skip many preliminary game design steps. The focus has instead been on two main things: (1) to extract core game mechanics from the original games and (2) to advantageously employ dynamic data for generating game content and influencing the gameplay. In spite of the limited game design tasks, I still find it highly relevant to describe and discuss how the incorporation of external data affects the original game in relation to play, rules, game systems, and cultural context. In this respect, I also find it interesting to study how the games compare to similar contemporary music games.
2.4. Digital Music Games In order to properly discuss the games developed in this thesis, I find it necessary to propose a framework for what I call Digital Music Games. I do so because I have not encountered a framework that simply and unambiguously provides an overview of the genre; indeed, it has also been noted that it is a genre that refuses to be uniform [44]. This will not be a detailed walkthrough but simply a highlighting of some relevant terms and subsets of Digital Music Games. Figure 2.1 shows an overview of Digital Music Games including examples for each category. These categories have been constructed with respect to the way in which the player’s actions influence the music in the game as well as the way in which the game levels are designed. The model displays four main categories of Digital Music Games, which are Music Creation Games, Designed Music Games, Content-Based Music Games, and Contextual Music Games. Music Creation Games and Designed Music Games contain Adaptive Game Music. That is, music that responds appropriately to the events in a game [45] (the game states) rather than responding directly to the user. For this reason, adaptive music does not always comprise interactivity. I consider Interactive Music Games to be games in which the player’s actions have a direct (and apparent) influence on the music of the game and where the players have the ability to affect the music in ways that were not entirely envisaged by the game designer. Interactive elements of these games are here understood as ‘those sound events that reacts to the player’s direct input’ [45 p4]. A majority of digital music
13 Modifications
games can be viewed as an extension of the Interactive Music Game. However, interactivity is a controversial word that can mean everything and nothing at the same time [38 p85]. Whether or not a music game is an interactive music game may thus often be disputable, as will the degree of interaction. Music Creation Games can be viewed as an extension of interactive music games where the game adapts more blatantly to the player’s actions. In these “games”, the players do not engage in an artificial conflict. For this reason, they are often associated with Sandbox-Games (or Non-Games) containing free-form play, as the game aspects are reduced in favor of the instrument and composition aspects, making them seem more like musical toys than games [44]. Designed Music Games are also an extension of interactive music games. In these games, the game level for a music track has been manually designed (predetermined) by the game designer(s). The games can often be described as Rhythm Games [44], in which the player is supposed to hit the correct buttons in sync with the music. Furthermore, the player’s actions (interactions) are for the most part translated into a layer of sound that is placed on top of (separate from) the rest of the music soundtrack. Content-Based Music Games are games in which the game content is generated directly from an audio file. The music in Content-Based Music Games may or may not be interactive. These games can be seen as having the opposite of adaptive game music, which is why I consider them to be Music Adaptive Games. The game here adapts to the music, which is just another way of saying that the game is created from the music somehow. In this category, we also find what I call Contextual Music Games, which are games that use music metadata to influence their game content and/or gameplay. These are Music Adaptive Games in an implicit (and figurative) sense, as the game content is here influenced by contextual descriptors about music.
It will later be argued (in section 0) why I believe that the games that are developed in this thesis belong to this category.
Figure 2.1: Taxonomy of Digital Music Games (Model)
2.5. Dynamic Data Games
Lately, a new genre of games has been emerging in which external data is incorporated in order to generate game content and/or influence the gameplay of online digital games. These are games that do not yet belong to a particular group (or category) of games. In this section, I will briefly describe three different games with these characteristics. This will lead to the definition of the concepts Dynamic Data Games and Meaningful Data Conversion. These two concepts have emerged from the process of designing games that incorporate timed comments and will further prove to be essential for the description and discussion of the games later on.
2.5.1. Three Distinct Examples Tweet Land14 is a side-scrolling game for iPhone. The game collects tweets (Social Data) containing pre-determined keywords in real time. These tweets are then converted into game content based on their textual content in the form of obstacles (pixel graphics) that clearly influence the gameplay. The tweets are furthermore converted into both a textual representation of the tweet and a visual representation of its text. The game highlights some ethical problems that may arise from using “real” data in games, as it incorporates tweets about tragic events such as natural disasters (in Figure 2.2 a tsunami occurs in the game) and accidents (e.g. car crashes) to fuel the gameplay.
Figure 2.2: Tweet Land (in game)
TaxiCity15 is a mix between Grand Theft Auto and Crazy Taxi where the goal is to earn money by picking up and dropping off passengers within a time limit. The game uses Bing Maps (Big Data) to generate the city in which the game takes place. Hence, external data is used for level generation. Furthermore, the data are not Social Data as in the case of Tweet Land. They are instead Big Data in terms of factual map data about the world’s roads. For this reason, the gameplay is much more stable than that of Tweet Land, as the maps of the world’s roads change less frequently than tweets are posted on Twitter. The game, though, has a somewhat strong physical connection to the “real world” whereas Tweet Land has a strong connection to people’s opinions.
The Wiki Game16 is yet another variation of a game that incorporates external data. This game incorporates data in terms of articles on Wikipedia (User-Generated Content). The game is played by navigating from a random Wikipedia article to a pre-determined article by clicking only
Figure 2.3: TaxiCity
Figure 2.4: The Wiki Game
14 http://www.tweetlandgame.com/
15 http://blog.programmableweb.com/2010/05/06/taxicity-a-game-combining-open-dataand-bing-maps/
16 http://thewikigame.com/
on links within the current article. The goal is to arrive at the pre-determined article using the fewest mouse clicks or in the shortest time. From a game design point of view, the Wikipedia articles pretty much define the gameplay, as it is more or less only the applied operational rules that convert the data into a game. The game closely resembles a social game such as SpyMaster [46], but the important difference is that the game is not situated on Wikipedia and the data is not social data (from social networks).
2.5.2. Defining Dynamic Data Games I use the phrase Dynamic Data Games to mean: any game that incorporates external dynamic data, which are used to generate game content and/or influence its constitutive game rules. The term is used as a hypernym (an umbrella term) for covering multiple game genres, data types and data sources.
Dynamic Data
The word dynamic can be defined as “something that is characterized by continuous change, activity, or progress”17. When I use the term dynamic data, it is thus used to indicate data with these characteristics. The word is first of all meant to imply that the external data is dynamic, as in the opposite of static, which instead characterizes data that we find in “regular” games. For this reason, the term can be said to provide a clear and imperative distinction between games that incorporate shifting and mutable data and games that do not. Data may often be dynamic in the scope of something and not in isolation. For instance, if we consider tweets as dynamic data, it is in the scope of (relative to) Twitter (or a Twitter profile). If we were to consider a single tweet in isolation, it either exists or it does not. A tweet can be added and deleted but not modified, which is not a dynamic characteristic (not continuous). In contrast, various contributors can modify Wikipedia articles. A Wikipedia article might thus be considered dynamic in itself. Hence, the scope in which we view the data may influence our interpretation of it as being dynamic. An advantage of the word dynamic is that it may seamlessly be associated both with the game content and the constitutive rules of a Dynamic Data Game. For instance, it might be misleading to talk about “big game content”, “social game content”, or “open rules”.
Conversely, it makes sense to talk about “dynamic game content” and “dynamic game rules”, as both of these expressions can be said to reflect the way in which games (and the experience of playing them) might be influenced by dynamic data.
Dynamic Game Content
In a dynamic data game, the game content is a representation of the data, which makes the game capable of sharing the dynamic properties of the data. Hence, dynamic data naturally entails that the aspects of the games that are based on these data become dynamic as well. I use the word capable, as the game designer may control the degree to which the content is dynamic. In other words, the game content may be constrained, thus limiting the dynamic nature of the data. Game content is here taken to mean gameplay related content (e.g. enemies) as well as non-gameplay related content (e.g. textures).
Dynamic Constitutive Rules
When dynamic data is converted into game content, it may be seen as an explicit way of incorporating it. Conversely, the data can also be incorporated implicitly (non-visibly) for affecting the constitutive game rules. For instance, the occurrence of the
17 http://www.thefreedictionary.com/dynamic
words “up” and “down” in tweets could be used to shift the gravity in a game level, thus affecting the gameplay “behind the scenes”. This type of conversion can hardly be seen as a conversion of data into game content. Instead, it is the formal game rules that become dynamic to some extent. Moreover, the constitutive game rules of dynamic data games may often have to be more flexible than those in “regular” games, as the input data can be obscure. This flexibility can make the games themselves appear dynamic, entailing a “dynamic experience” of playing the games.
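The gravity example can be sketched in a few lines of Python. This is only an illustration of a dynamic constitutive rule under stated assumptions: the function name gravity_direction, the encoding of gravity as +1 (up) and -1 (down), and the tie-breaking behavior are all hypothetical and are not taken from any of the games discussed in this thesis.

```python
import re


def gravity_direction(tweets, current=-1):
    """Return +1 (up) or -1 (down) depending on which word occurs most often.

    Hypothetical rule: the tweets never become visible game content,
    they only change a rule of the game world "behind the scenes".
    """
    ups = 0
    downs = 0
    for text in tweets:
        # Split the tweet into lowercase words so that "Up!" counts as "up".
        words = re.findall(r"[a-z]+", text.lower())
        ups += words.count("up")
        downs += words.count("down")
    if ups > downs:
        return 1
    if downs > ups:
        return -1
    # Tie (or no matching words at all): leave the gravity unchanged.
    return current
```

Keeping the current direction on a tie is one way of making the rule tolerant of obscure or empty input, which is the kind of flexibility that the constitutive rules of dynamic data games may need.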
2.5.3. Meaningful Data Conversion For a game to incorporate external data, it will need to convert the data into game content somehow. Hence, it becomes relevant to consider the way in which dynamic data might properly and meaningfully be converted into game content. I use the phrase meaningful data conversion to mean: the conversion of data into game content with the goal of designing meaningful play. This will often entail close representation of attributes in the data as well as the support of the rules in the target game. Put differently, the data should be made into discernable information in the same way that discernable information should be displayed to a player within the context of a game world [38 p34]. Hence, a meaningful data conversion is perceptible by the senses or intellect18. The meaningfulness of a conversion is here considered a measure (as opposed to true or false) and a conversion can be found meaningful on various levels and for different attributes of the data independently. Meaningfulness is furthermore the degree to which the data is mediated to the player and the visible (ostensible) connectedness between the data and the game content. Hence, it should be possible to trace the original data in some respects. For instance, the game content in Tweet Land is generated from a single word in each tweet and might thus not be representative of the entire tweet. A tweet like “there is no tsunami” results in a tsunami in the game. This might thus not be considered a meaningful data conversion, as it can disturb the experience of playing the game for some players. This indicates that the relationship between data and the representation of the data should be made discernable for the same reason that the relationship between an action and the result of that action should be made discernable, which is to enable meaningful play.
In Tweet Land, a person’s tweets might actually be considered an action that leads to a result in the game from the game’s (Tweet Land’s) point of view, which emphasizes the importance of being reflective when converting data into game content. However, the example from Tweet Land goes to show that meaningful data conversion is not a necessity in dynamic data games. Tweet Land is still very much a playable game. Meaningful data conversion is simply an ideal that seems worth pursuing in the design and facilitation of meaningful play in dynamic data games. Example of (meaningful) data conversion: We might construct a constitutive rule for determining the color of an object based on the words in a tweet like so:
if comment body contains “nice” then object color = red
This can hardly be considered a meaningful data conversion, as there is no discernable relation between the word “nice” and the color red. A very good explanation would
18 http://www.thefreedictionary.com/discernable
thus be required to make this conversion meaningful. We might instead choose to look for the word “blood” in the comments like so:
if comment body contains “blood” then object color = red
This would be meaningful to a higher degree, as we know the color of blood to be red. The connection would however be implicit. To make an explicit connection, we could look for the actual word “red” instead:
if comment body contains “red” then object color = red
At this point, it becomes somewhat difficult and context-dependent to decide which conversion is the more meaningful. Of course, the word “red” converts blatantly into a red object, but the word “blood” might be more suitable if the context is right or if the object is actually a drop of blood.
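The rules above can be written as a small, runnable Python sketch. The names COLOR_RULES and object_color, as well as the default color, are hypothetical and introduced only for illustration. Note one deliberate deviation from the pseudo-rules: the sketch matches whole words rather than substrings, which is an assumption on my part and not something stated in the rules themselves.

```python
import re

# Hypothetical rule table: (keyword in the comment, resulting object color).
# The order matters; the first matching rule wins.
COLOR_RULES = [
    ("red", "red"),    # explicit connection
    ("blood", "red"),  # implicit connection
]


def object_color(comment_body, rules=COLOR_RULES, default="grey"):
    """Return the color of an object based on the words in a comment."""
    # Whole-word matching: "bored" should not trigger the rule for "red".
    words = set(re.findall(r"[a-z]+", comment_body.lower()))
    for keyword, color in rules:
        if keyword in words:
            return color
    return default
```

With these rules, a comment such as “So much blood in this drop” yields a red object, whereas “nice track” falls back to the default color, since no rule relates “nice” to any color.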
CHAPTER 3
Research Design & Method
In this chapter I will present the “blueprint” of this research project. This includes my overall approach to conducting research as well as the methods that have been used to accumulate and investigate empirical data. I will begin by framing the main research question of this thesis. I then go on to define the exploratory research method data mining including some foundational concepts of data mining.
3.1. Research Approach & Design This research project may be deemed interdisciplinary, as it encompasses a large and mixed theoretical framework, intricate empirical studies, and the design, implementation, and evaluation of digital artefacts. For this reason, its associated research design is also somewhat extensive and multipart. It has not been possible to unambiguously “frame” the project as a specific type of research. For this reason, concepts, ideas, and theories from multiple research areas have been combined to form a whole. In what follows, it will be suggested that this thesis is situated somewhere in-between the areas of social sciences, computer science and exploratory research.
3.1.1. Descriptive Research Recall the main research question of this thesis: How can we utilize timed comments in order to provide new, enhanced, or alternate ways of exploring and discovering music online? As it turns out, there is little knowledge about timed comments to serve as a guide for an investigation, and to the best of my knowledge there have been no formal studies of the phenomenon. Hence, it seems necessary to establish some kind of understanding of what timed comments are before suggestions (and answers) on how to utilize them may be proposed. For this purpose, we might temporarily frame the main research question as: what are timed comments and how can we use them? In social sciences, one way of distinguishing between research designs is as being either descriptive research or explanatory research [47]. In this respect, this research project unambiguously classifies as a descriptive research project. Descriptive research can be purposefully applied to fields of study where little is known, whereas abundant information on a particular problem area calls for other approaches [48]. In its most basic sense, a descriptive research question asks: “what is it?”. As just explained, this might be considered necessary to answer the main question of this thesis. By answering the question: “what are timed comments?”, we can build a foundation of knowledge, which can then be used to explore ways of utilizing them. To this extent, I have limited the investigation of timed comments to be a question of how people comment and not why people comment as they do, which clearly frames the research as descriptive and not explanatory. With some kind of knowledge about what timed comments are and how people use them, this research project goes on to explore how this obtained knowledge might be utilized, which is really the main research question.
This may also overall be depicted as a descriptive research question in which one of the goals is to challenge accepted assumptions about the way things are [47]; here in relation to online music discovery and exploration. But it further suggests an experimental research approach. Experimental research questions may be seen as the “how” questions also called
situation producing researchable questions [48] in the area of natural sciences. In this connection, however, experimental research includes affiliation with an unintended research design scheme in which experiments are a fixed process of steps. Instead, I thus choose to turn my attention to the field known as experimental computer science.
3.1.2. Experimental Computer Science When we ignore experimentation and avoid contact with the reality, we hamper progress. - Walter Tichy [49] Experiments are made to obtain new (and unexpected) knowledge [50]. As illustrated in the section about music information retrieval, there already exists substantial knowledge on the development of artefacts (systems) for music exploration and discovery. Hence, the embedding of timed comments in such systems is experimental and not merely descriptive research. Figure 3.1 depicts the difference between a purely scientific model and an experimental model. As shown, an experimental study is rooted in an idea as opposed to an observation. The idea of this research project is thus to utilize timed comments for music exploration and discovery.
Figure 3.1: A comparison of the scientific model (left) with the role of experimentation in system design (right). From [50].
Peter J. Denning describes the key aspects of experimental science as ‘an apparatus for collecting data, a hypothesis, and systematic analysis to see whether the data supports the hypothesis’ [51 p2]. In this research project, the hypothesis can be viewed as the idea that systems can be constructed to facilitate music exploration and discovery by incorporating timed comments whereas the apparatus is a data mining engine. The systematic analysis is then the systems in themselves as depicted in Figure 3.1 (on the right). That is, the games and applications that are constructed. Figure 3.1 also depicts how iteration is an intrinsic part of experimentation. When we conduct an experiment, new ideas can emerge from both failure and success, motivating us to repeat the experiment. As pointed out in [50], it is not really possible to determine whether or not an idea is any good before it has been tried out. The term iterative design is used in game design to mean ‘a method in which design decisions are made based on the experience of playing a game while it is in development’ [38 p11]. This research project can be seen as being iterative on multiple levels. These levels are the collection of data, the construction of individual prototypes, and also the process as a whole. The experiments conducted here in general seem to fit well with those of experimental computer science but there is a noticeable difference in terms of the validation and evaluation of results. Validity is important to experimental computer science and also to experimental research in general [52], [48]. In relation to the design and implementation of artefacts in this research project, the objectives are
more that of questioning and discussing their usefulness. (Validity is however relevant in the empirical studies of timed comments). Therefore, this research project may be said to have an explorative dimension.
3.1.3. Large-Scale Exploratory Research [Exploration] is arguably a more inviting and indeed accurate way of representing social research than treating it as a narrowing, quasi rule-based and discipline-based process that settles and confirms rather than unsettles and questions what one knows. [53 pv] As a whole, I consider this research project to be a large-scale exploratory research project. Through the investigation of timed comments, I seek to question, rather than confirm, their usefulness for, and utilization probabilities in, online music exploration and discovery contexts. As the above citation suggests, this is arguably a more exploratory approach than that of the average social science or computer science research project.
3.1.4. Evaluation Because this research project attempts to question rather than confirm the usefulness of timed comments, evaluation criteria are not evident. A typical way of evaluating the developed games and applications would be through usability testing. However, this has not been considered within the scope of this thesis. Another way of measuring the results is in terms of efficiency. But the goal is not to create MIR systems and prove that they are faster or more reliable than other MIR systems; it is to change the outcome of using such systems. Evaluation will therefore take place on a somewhat discursive level by using reason and argumentation rather than intuition. I will use heuristic evaluation in the most basic sense of the term; as the act of looking at an interface and passing judgement according to one’s own opinion [54]. I will further compare the created applications and games with similar artefacts through comparative analysis to study their novelty. In what follows, I will present the way in which data from SoundCloud has been collected and analyzed to form an understanding and description of timed comments, which is through the process of data mining.
3.2. Data Mining In this section, I will explore existing definitions of data mining resulting in a perception of data mining that will be associated with the term throughout this thesis. I will furthermore present some specialized branches as well as some relevant concepts and methods that are related to the process of data mining.
3.2.1. What is Data Mining? In a broad sense of the term, data mining can be seen as the process of extracting information from data. Various (and varying) definitions of data mining have been proposed, which makes it somewhat complicated to unambiguously and accurately define the concept of data mining in more detail. When comparing definitions of the term, it seems that there is some dispute as to the differences between data mining and knowledge discovery. Some have equated data mining and knowledge discovery [55] while others have pointed out that Knowledge Discovery is slowly becoming synonymous with data mining [56]. Still more people seem to
interpret data mining as a step in the Knowledge Discovery process [57], [58]. Another varying aspect of the definition is related to the naming of the extracted “material”. Some definitions refer to the extractions as interesting knowledge [56], others call it potentially useful information [13], and yet others call it useful patterns [59]. Besides these differences between definitions, there seems overall to be an emphasis on the discovery of knowledge in data.
3.2.2. Specialized Branches The phrase music data mining can be used (broadly) to describe data mining in a music context. In [13], data in a music data collection (music database) is considered to encompass music audio files, metadata such as title and artist, and even play statistics. For this reason, I find it reasonable to depict my research as music data mining. I do however prefer to use the wider term data mining where possible to avoid confusing my research with the mining of content-based descriptors about music and also because I do not in particular consider the data mining tasks that I am performing to be specific to music. Text mining is another variation on the field of data mining. This branch separates itself from regular data mining by focusing on the extraction of patterns from natural text (unstructured text documents) rather than the extraction of patterns from a structured database [60]. Hence, the mining of timed comments might seem like text mining on the surface when actually it is not. I use the SoundCloud API for extracting data from SoundCloud, which is then cleaned, organized and stored in a similar way in my own database. Hence, there are no unstructured text documents.
3.2.3. Data Mining Versus Statistics A key difference between the field of data mining and the field of statistics might be said to be the entry point. In statistics, data is collected in order to shed light on a predetermined subject and for hypothesis testing. Data mining is more exploratory. Data mining is the search for hidden (not obvious) patterns in large data repositories. In other words, the data are accumulated first and studied second in data mining, which is reversed from statistics. Another difference is the size of the data. Data mining is the extraction of information from large collections of data whereas statistics may also be used on small datasets. For these reasons, I find that statistics might indicate an entirely different process than the one involved with gathering and analyzing timed comments from SoundCloud and I further use some techniques that might be considered “actual” data mining.
3.2.4. Data Mining In This Thesis In this thesis, I broadly define data mining as the nontrivial extraction of implicit, previously unknown, and potentially useful information from a large collection of data [13], [55], [61]. The data mining process must furthermore be automatic or semiautomatic [59 p5]. I choose to define the term data mining as both a step in the process of knowledge discovery and as synonymous with knowledge discovery. A distinction is thus made between data mining and the process of data mining. My definition intentionally leaves room for interpretation as to what constitutes usefulness. Information might simply be viewed as processed data whereas usefulness might indicate some kind of explicit advantage. However, due to the objectives in this thesis (being that of utilizing timed comments), information that is not usually considered useful might potentially be useful (e.g. for constructing game content). I will thus attempt to extract potentially useful information from data and not definitely useful information. Information is here considered knowledge if it can somehow
contribute to informed decision making in relation to the development of experimental web applications and games.
3.2.5. The Process Of Data Mining In view of the foregoing and with inspiration from [56] and [13], I here present a simplified version of the iterative sequence of steps that have constituted the process of data mining in this research project:
- Data Preprocessing (preparation of the data before it is mined including data sampling, gathering, integration, transformation, cleaning)
- Data Analysis & Mining (the extraction of potentially useful patterns through means such as characterization, visualization, discrimination, classification, and association)
- Evaluation (identification of interesting and/or useful information and patterns using measures of interestingness)
What follows these steps might be called knowledge utilization, in which the discovered knowledge is utilized in the development of games and applications. By definition, it would seem illogical to consider knowledge utilization as a step of the data mining process.
3.2.6. Data Preprocessing Data preprocessing is the part of the data mining process where data is prepared for the mining tasks. Data preprocessing can further be organized in the following three categories: data cleaning, data integration and transformation, and data reduction [56]. Data cleaning involves tasks such as removing noisy data and resolving inconsistencies in the data. Data integration is the merging of data from multiple repositories and in data transformation the data is consolidated into appropriate forms through means such as aggregation and generalization. Finally, data reduction is the task of limiting the dataset and attributes in the dataset to appropriate and/or necessary sizes. Sampling is a data reduction technique that allows a large data set to be represented by a much smaller (random) sample of the data. A random sample is a sample in which each instance in the original dataset has an equal chance of being included [59]. We use random sampling to increase the possibility of the sample being representative. A representative sample will have approximately the same property (of interest) as the original set of data [62]. There are furthermore two main ways of taking a sample: sampling with replacement and sampling without replacement. Simply put, the difference is whether the same tuple (instance) can be selected multiple times (sampling with replacement) or if the tuple can only be selected once (sampling without replacement).
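The difference between the two ways of sampling can be illustrated with Python's standard random module. This is a minimal sketch and not the sampling code used in this research project; the function names and the seed parameter are hypothetical and serve only to make the example repeatable.

```python
import random


def sample_without_replacement(instances, k, seed=None):
    """Draw k instances; each instance can be selected at most once."""
    rng = random.Random(seed)
    # random.sample raises ValueError if k exceeds the number of instances.
    return rng.sample(instances, k)


def sample_with_replacement(instances, k, seed=None):
    """Draw k instances; the same instance may be selected several times."""
    rng = random.Random(seed)
    return rng.choices(instances, k=k)
```

Sampling without replacement can never return more instances than the dataset contains, whereas sampling with replacement can return a sample of any size, at the cost of possible duplicates.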
3.2.7. Data Characterization & Visualization In the process of data mining, data characterization is a way of summarizing the general characteristics of a target class of data [56 p22]. Data characterization can be helpful in locating outliers and noise and also just to get an overview of a data collection. For this purpose, it is often useful to use data visualization techniques. Data visualization can also be a practical way of translating complex data into perceptible information. That is, so that the knowledge can be easily understood and directly usable by humans [56 p37]. This requires that the data be converted into visual information in a way that supports the characteristics of the data. In this research project, I visualize data using scatter charts, pie charts, line charts, bar charts, and tables.
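As a hedged illustration of data characterization, the following Python sketch summarizes a numeric attribute and flags possible outliers. The 1.5 × IQR rule used here is a common convention that I introduce as an assumption; it is not prescribed by [56]. The function name characterize and the comment lengths are made up for the example and are not results from this study.

```python
import statistics


def characterize(values):
    """Summarize a numeric attribute and flag values far from the bulk."""
    ordered = sorted(values)
    # Quartiles of the data; statistics.quantiles needs at least two values.
    q1, _, q3 = statistics.quantiles(ordered, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "count": len(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "outliers": [v for v in ordered if v < low or v > high],
    }


# Toy data: lengths (in characters) of eight hypothetical comments.
lengths = [12, 15, 14, 13, 16, 15, 14, 200]
summary = characterize(lengths)
```

In this toy dataset the single very long comment pulls the mean far above the median, which is the kind of pattern that a summary of this sort (or a chart of the same values) makes visible.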
3.2.8. Rules Rules can be used to look for useful patterns in data. A distinction can be made between classification rules and association rules [59 p11]. Classification rules can be used to predict an attribute whereas association rules can be used to describe the relationship between different attributes. A set of rules that are intended to be executed in a sequence is called a decision list. Association mining is the search for patterns and originally emerged from the analysis of market basket data. For instance, data from a supermarket might show that customers who buy a walkman also tend to buy batteries [13 p11]. It might thus be considered advantageous to place the walkmen and the batteries close to each other in the store, as this could provide easier access for customers and potentially also an increase in sales.
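The walkman example can be made concrete with a small Python sketch. Support and confidence are standard measures for association rules, but they are not defined in the text above, so I introduce them here as an assumption. The function names and the baskets are invented for the illustration.

```python
def support(baskets, items):
    """Fraction of baskets that contain all of the given items."""
    items = set(items)
    return sum(1 for basket in baskets if items <= set(basket)) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that also
    contain the consequent (the rule: antecedent -> consequent)."""
    base = support(baskets, antecedent)
    if base == 0:
        return 0.0
    return support(baskets, set(antecedent) | set(consequent)) / base


# Toy market basket data (hypothetical).
baskets = [
    {"walkman", "batteries"},
    {"walkman", "batteries", "headphones"},
    {"walkman"},
    {"batteries"},
    {"headphones"},
]
```

In this toy data, two of the five baskets contain both a walkman and batteries, and two of the three walkman buyers also bought batteries, so the rule "walkman, therefore batteries" holds for two thirds of the cases it applies to.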
3.2.9. Attributes An attribute is a property (characteristic) of an instance in a dataset and different types of attributes must be treated differently. A distinction can be made between categorical (qualitative) and numeric (quantitative) attributes [62 p26]. Categorical attributes are labels (or categories) and can be either ordinal (ordered) or nominal (unordered). Examples include eye colors and music genres but also IP addresses and phone numbers. An example of ordinal attributes is a collection of descriptions of musical tempos such as slow, moderate and fast. These have a meaningful ordering ranging from slow to fast with moderate in the middle. An example of nominal attributes is musical genres such as pop, rock and soul. These cannot be ordered, as no distinct hierarchy exists between them. It does not make sense to perform mathematical operations on categorical attributes [59 p25]. For instance, it would be pointless to find the average of two phone numbers or to subtract pop music from rock music. Nominal attributes can only be equal or not equal, whereas ordinal attributes can also be greater than or less than each other. Numeric attributes are, and can be treated as, numbers. Examples include weight, dates, temperature and duration. In contrast to categorical attributes, numeric attributes have equal spacing between them, which allows for some or all types of mathematical operations. Numeric attributes can be of the type interval or ratio. The difference between the two is whether or not a “true” zero point exists. Interval attributes do not have a true zero. An example of interval attributes is dates. We can calculate the difference (or distance) between the year 1992 and 2000 (8 years) but it makes little sense to consider the sum of the years (3992), as the year 0 is arbitrary and culturally defined in this case. The years since the Big Bang, on the other hand, may be considered a ratio attribute.
In contrast to interval attributes, ratio attributes have a true zero point and can be treated as real numbers. The number of plays on an audio track is an example of a ratio attribute. It makes sense to have 0 plays, as does the multiplication of plays: 10 plays are in fact twice as many as 5 plays. As shown in Table 3-1, there is a hierarchical relationship between attribute types, where the type of an attribute can be seen to reflect the computational properties that it possesses.

            = ≠    < >    + -    * /
Nominal      X
Ordinal      X      X
Interval     X      X      X
Ratio        X      X      X      X

Table 3-1: Attribute types
Both categorical and numeric attributes can be discrete or binary, whereas numeric attributes can also be continuous. Discrete attributes are limited, such as the number of states in the USA. Binary attributes are a special case of discrete attributes where there are only 2 possible values, such as yes or no and true or false. Finally, continuous attributes have real numbers as attribute values. They can be measured as accurately as instruments allow, which is why they are called continuous. For instance, the height of a person can be given as 180 cm or 180.26478... cm.
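The hierarchy in Table 3-1 can be expressed as a small lookup, sketched below. The names `LEVELS`, `REQUIRED_LEVEL`, and `supportsOperation` are hypothetical and introduced only for illustration; the mapping itself follows the table (each type supports its own operations plus those of every type before it).

```javascript
// Attribute types ordered from least to most permissive, following Table 3-1.
const LEVELS = ["nominal", "ordinal", "interval", "ratio"];

// The lowest attribute type at which each operation becomes meaningful.
const REQUIRED_LEVEL = {
  "=": "nominal", "!=": "nominal",
  "<": "ordinal", ">": "ordinal",
  "+": "interval", "-": "interval",
  "*": "ratio", "/": "ratio",
};

// True if `operation` is meaningful for an attribute of type `attributeType`.
function supportsOperation(attributeType, operation) {
  const have = LEVELS.indexOf(attributeType);
  const need = LEVELS.indexOf(REQUIRED_LEVEL[operation]);
  if (have === -1 || need === -1) {
    throw new Error("Unknown attribute type or operation");
  }
  return have >= need;
}

console.log(supportsOperation("interval", "-")); // dates can be subtracted
console.log(supportsOperation("interval", "*")); // but not multiplied
```

A check like this could guard a data mining routine against, say, averaging ids or multiplying dates.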
3.2.10. Interesting Patterns & Useful Information
In [56], an interesting pattern is defined in the following way:

“…a pattern is interesting if it is (1) easily understood by humans, (2) valid on new or test data with some degree of certainty, (3) potentially useful, and (4) novel. A pattern is also interesting if it validates a hypothesis that the user sought to confirm. An interesting pattern represents knowledge.” [56 p27]

I find this definition practical for determining whether discovered patterns are interesting, and it will thus be used for evaluating interestingness. However, interestingness is a rather vague concept, and I will thus attempt to define what I consider to be interesting patterns in this thesis. First of all, a distinction is made between interesting information and useful information. I consider any derived pattern that can contribute to informed decision-making when developing applications and games to be useful information, but not necessarily interesting information. Hence, for the purpose of making informed decisions, a pattern does not have to be interesting in order to be useful. For instance, a derived pattern might show that comments on tracks labeled classical in general contain longer words than comments on tracks labeled techno, thus indicating a difference between genres. Whether or not this is interesting is debatable, but the information can nonetheless be used for making applications capable of adapting to different musical genres, which makes the information useful to some extent. If the pattern shows that the word lengths in classical and techno music are the same, this is also useful, as it implies that no distinction can be made in that regard. To sum up, I consider interestingness to be related to the user’s perception of patterns, whereas usefulness is reflected by the way in which information is utilized.
3.2.11. Data Mining Versus Information Retrieval
Data mining should not be confused with information retrieval. They can be considered closely related in that they often use similar techniques in terms of statistics and machine learning schemes, but they are by no means synonymous. This becomes evident when considering their individual objectives. The aim of information retrieval is to provide access to information, whereas the aim of data mining is to extract knowledge from data. Since the techniques of the two fields are very similar, I have chosen to place them in their own separate section.
3.3. Used Techniques
In this section, I will briefly describe the machine learning and statistical techniques that are used for both data mining and information retrieval tasks in this research project. Hence, these techniques are used to extract useful information from a dataset and also to retrieve (provide access to) information that is relevant to a specific user’s information need. I would like to note that the machine learning techniques implemented in this thesis are not scientifically validated, and they thus serve mostly as proofs of concept.
3.3.1. Natural Language Processing
Natural Language Processing (NLP) techniques are used to give computers the ability to process human language [63]. NLP is not separate from machine learning and statistics. It is simply a range of machine learning and statistical techniques that can be used to process language (both written and spoken). The techniques I use in this thesis may all (broadly) be considered NLP techniques. We encounter NLP applications on a regular basis when using computer programs and the web. For instance, any text editor with the ability to count words in a document relies on NLP to perform this task. When the text editor is asked to count the words in a document, it requires some sort of knowledge about what it means to be a word, and it thus becomes a language processing system.
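The word-counting example can be made concrete with a short sketch. The definition of a word used here (a run of letters or digits, optionally joined by apostrophes) is an assumption chosen for illustration; a different definition would give different counts, which is exactly why counting words already involves linguistic knowledge.

```javascript
// Counts the words in a text.
// Assumption: a "word" is a run of Unicode letters or digits, where
// apostrophes may join such runs (so "don't" counts as one word).
// Standalone symbols such as ":)" or "!!!" are not counted.
function countWords(text) {
  const words = text.match(/[\p{L}\p{N}]+(?:'[\p{L}\p{N}]+)*/gu);
  return words === null ? 0 : words.length;
}

console.log(countWords("This track is great!"));
console.log(countWords(":) !!!"));
```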
3.3.2. Naive Bayes Classification
As discussed in the previous section, classification rules can be used to predict an attribute. In classification learning, the learning scheme is presented with pre-classified examples from which it is expected to learn a way of classifying new and unseen examples [59 p40]. Naive Bayes classification is a machine learning technique that is often used for text classification due to its simplicity and effectiveness [11 p255]. Training a Naive Bayes classifier is a form of supervised learning. Studies comparing classification algorithms have found Naive Bayesian classification to be comparable in performance with decision trees and selected neural network classifiers [56 p310]. The name Naive Bayes is derived from the way in which the classifier classifies documents by making probabilistic assumptions. The technique is used for the automatic classification of text documents into various categories. This is useful when we have identified a set of classes and then wish to determine which class a given object belongs to. Typical applications of Naive Bayes classification are spam detection in emails and sentiment analysis. In this research project, I use Naive Bayes classification experimentally in trying to determine the difference between timed comments with time-related content and timed comments without time-related content. I have chosen to use Naive Bayes classification, as it is described as easy to interpret and quick to implement, which was considered relevant, as I have no prior experience with supervised learning (or machine learning in general). This was also considered relevant, as I merely use the classifier as a proof of concept as well as a way of discussing timed versus non-timed content in timed comments.
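To illustrate the idea, the following is a minimal sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing. It is not the classifier built in this thesis: the function names, the tokenizer, the smoothing choice, and the training examples are all assumptions made for illustration.

```javascript
// Lowercases the text and splits it into runs of letters and digits.
function tokenizeComment(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Learns per-class document and token counts from pre-classified examples.
function trainNaiveBayes(examples) {
  const model = { total: 0, classes: new Map(), vocabulary: new Set() };
  for (const { text, label } of examples) {
    if (!model.classes.has(label)) {
      model.classes.set(label, { documents: 0, tokens: 0, counts: new Map() });
    }
    const cls = model.classes.get(label);
    cls.documents += 1;
    model.total += 1;
    for (const token of tokenizeComment(text)) {
      cls.counts.set(token, (cls.counts.get(token) || 0) + 1);
      cls.tokens += 1;
      model.vocabulary.add(token);
    }
  }
  return model;
}

// Returns the label with the highest log-probability for the given text.
function classifyComment(model, text) {
  const tokens = tokenizeComment(text);
  let best = { label: null, score: -Infinity };
  for (const [label, cls] of model.classes) {
    // Log prior: share of training documents that carry this label.
    let score = Math.log(cls.documents / model.total);
    for (const token of tokens) {
      // Add-one smoothing keeps unseen tokens from zeroing the probability.
      const count = cls.counts.get(token) || 0;
      score += Math.log((count + 1) / (cls.tokens + model.vocabulary.size));
    }
    if (score > best.score) best = { label, score };
  }
  return best.label;
}

// Invented training examples: time-related versus non-time-related content.
const trainingExamples = [
  { text: "love the drop here", label: "timed" },
  { text: "this part is amazing", label: "timed" },
  { text: "that bass at 1:30", label: "timed" },
  { text: "the break right now", label: "timed" },
  { text: "great track thanks for sharing", label: "non-timed" },
  { text: "check out my page", label: "non-timed" },
  { text: "nice work man", label: "non-timed" },
  { text: "thanks for the follow", label: "non-timed" },
];
const commentModel = trainNaiveBayes(trainingExamples);
console.log(classifyComment(commentModel, "this drop is amazing"));
```

The "naive" part is the assumption that tokens are independent given the class, which is what allows the per-token probabilities to be summed in log space.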
3.3.3. Sentiment Analysis
Another classification problem is sentiment analysis, the purpose of which is to analyze emotionally based attitudes towards a given subject. It is thus used for determining whether an attitude towards a subject is positively or negatively charged [64]. In this particular classification problem, we can identify three classes: positive, negative and neutral. Once a classification problem has been identified like this, the next step is to train the classifier to identify documents. This is done manually. I have used sentiment analysis in a game called SC Breakout (see section 5.2.3) and also in an initial experiment on visualizing timed comments and their sentiments.
3.3.4. Part-Of-Speech Tagging
Part-Of-Speech (POS) tagging is the process of labeling the words in a text (corpus) with particular part-of-speech categories. Part-of-speech categories are linguistic categories of words such as nouns and verbs. In 1992, Eric Brill presented a rule-based POS-tagger [65], which automatically acquires its rules and tags with an accuracy comparable to stochastic taggers, with an initial error rate of about 7.9%. This error rate is then reduced to about 5% through the use of patches (a rule set). I use POS-tagging to study the parts of speech used in timed comments and also to generate objects (elements) in games. This thesis will not cover an in-depth analysis of the performance of Brill’s POS-tagger on timed comments, but notable results will be discussed where appropriate.
3.3.5. TF-IDF Weighting
TF-IDF is a statistical measure that can be useful for ranking documents in a collection (corpus). Document ranking techniques rank documents in order of relevance based on a query [56 p617]. Appropriate ranking of documents can allow for more effective information retrieval. Search engines, such as Google, use document ranking to select the most appropriate documents to display based on a search (query) by a user. The term frequency (TF) is the number of times a term occurs in a document [15 p226]. The document frequency (DF) is the number of documents in a collection in which a given term occurs at least once. The measure called inverse document frequency (IDF) represents the importance of a term with respect to a collection of documents. A term that appears in many documents will be considered less important to a single document in the collection. In other words, the IDF increases (the term becomes more important) with the rarity of a term in a collection of documents. With N being the number of documents in a collection, the IDF can be computed as:

IDF(d, t) = log(1 + N / DF(t))

The logarithm is used to normalize the measure. This is necessary in order to prevent very frequent words from gaining too much influence when searching for multiple words. The TF-IDF is the term frequency multiplied by the inverse document frequency. In this way, the measure gives more importance to terms that occur frequently in particular documents but are less frequent across all documents. In this research project, TF-IDF is used to rank the timed comments relative to both genres and tracks in order to enable retrieval of tracks based on the content of the comments (the text in the comments).
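The definitions above can be sketched in a few lines. The IDF follows the formula given in this section; everything else (the function names, representing documents as token arrays, treating an unseen term as contributing 0, and ranking by the sum of TF-IDF values over the query terms) is an assumption made for illustration.

```javascript
// Number of times `term` occurs in one document (an array of tokens).
function termFrequency(term, documentTokens) {
  return documentTokens.filter((token) => token === term).length;
}

// Number of documents in the corpus that contain `term` at least once.
function documentFrequency(term, corpus) {
  return corpus.filter((doc) => doc.includes(term)).length;
}

// IDF = log(1 + N / DF(t)), as in the formula above.
// Assumption: a term that occurs nowhere contributes 0.
function inverseDocumentFrequency(term, corpus) {
  const df = documentFrequency(term, corpus);
  return df === 0 ? 0 : Math.log(1 + corpus.length / df);
}

function tfIdf(term, documentTokens, corpus) {
  return termFrequency(term, documentTokens) * inverseDocumentFrequency(term, corpus);
}

// Returns document indices ordered by descending summed TF-IDF for the query.
function rankDocuments(queryTerms, corpus) {
  return corpus
    .map((doc, index) => ({
      index,
      score: queryTerms.reduce((sum, term) => sum + tfIdf(term, doc, corpus), 0),
    }))
    .sort((a, b) => b.score - a.score)
    .map((entry) => entry.index);
}

// Invented toy corpus of tokenized comments.
const toyCorpus = [
  ["nice", "drop", "nice"],
  ["nice", "track"],
  ["great", "bass"],
];
console.log(rankDocuments(["bass"], toyCorpus));
```

In this toy corpus, "nice" occurs in two of three documents and therefore receives a lower IDF than "drop" or "bass", which each occur in only one.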
CHAPTER FOUR
Exploring Timed Comments
4.
In this chapter I describe the process of gathering data from SoundCloud, which is followed by a presentation of the most important findings from mining and analyzing timed comments. I then go on to explore the textual content of timed comments. While doing this, I construct a Naive Bayes classifier and establish a system for ranking comments by using TF-IDF measures.
4.1. Data Preprocessing
In this section I will briefly present my reflections on, and the process of, accumulating and preprocessing data from SoundCloud. This includes thoughts on data gathering, reduction, integration, transformation, and cleaning. Several iterations were made before an appropriate way of gathering the data was established. These will not be presented here, as a description of the process would be extensive and somewhat trivial (trial and error). The purpose of gathering timed comments has been (1) to study their basic properties, (2) to compare these properties with the properties of “regular” comments, and (3) to explore the similarities and/or differences between timed comments. The data preprocessing techniques described here have therefore been chosen with this purpose in mind. As of this writing¹⁹, there are roughly 45 million audio tracks on SoundCloud, based on the ids of the most recently uploaded tracks. Together, these tracks contain in the region of 55 million comments, corresponding to an average of roughly 1.2 comments per track²⁰. The number of tracks includes different types (both music and other audio), and the number may not account for tracks and comments that have been deleted. Continuous sampling of small collections of the latest tracks and comments indicated that their ids are not randomly generated numbers but, on the contrary, ordered numbers. Hence, the ids are ascending, meaning that more recent tracks and comments are assigned higher id values.
¹⁹ May 3, 2012
²⁰ Based on the id of a 1-day-old commented track and the ids of its comments

4.1.1. Data Reduction
Two main sample collections have been accumulated from SoundCloud: a genre-based collection and a random collection. The random collection consists of 543,413 (randomly selected) tracks, whereas the genre-based collection consists of 532,968 tracks within 25 different musical genres. Two collections were assembled (as opposed to one) because this was considered useful for comparing and validating results. The target sample size was set at 500,000, corresponding to approximately 1% of all the tracks on SoundCloud, as this was considered to be an appropriate (and ample) sample size. As can be seen, the actual sample sizes deviate from this projected size, which is due to some outages during the process of gathering the data.

The Random Collection
The random collection was fairly easy to assemble. After identifying the id of the most recently uploaded track, a random number was generated between 0 and that id number. In this way, all tracks had an equal chance of being selected, thus making the sample representative of the original source. It was furthermore decided to sample without replacement in order to accumulate a varied collection. Therefore, tracks were stored with their SoundCloud ids as a unique id, so that duplicates were ignored by the database by default.

The Genre-Based Collection
The reason for accumulating a genre-based collection was three-fold: (1) It was considered interesting to study the differences between timed commenting in various genres, which might, among other things, highlight their music-cultural diversity, if any. (2) Studies have shown that people widely use genres to search or browse music collections [66], [67]. (3) It was hypothesized that a random sample would not contain enough tracks to unambiguously represent delimited genres. Initial data gathering iterations had shown the genre labels of tracks to be highly inconsistent, as the user who uploads a track freely determines its genre label. This results in “endless” variations of genres, making a comparison of genres seem somewhat precarious. Therefore, 25 more or less broad genres were selected based on their significance in the SoundCloud tag cloud²¹. Genres and tags may not be the same thing, but the tag cloud may be seen as a good indication of the “weight” of genres on SoundCloud. As can be seen in Figure 4.1, not all tags clearly represent a genre. For instance, acoustic and electronic are not very delimited, as they may be viewed as features of sound, and it would seem somewhat illogical to compare e.g. rock with acoustic music. Despite their significance, such “non-genres” were excluded.
Figure 4.1: The SoundCloud tag-cloud (12/4 2012)
After the genres had been selected, they were queried using SoundCloud’s hotness (popularity) parameter to secure rich commenting. The tracks (and their associated comments) were furthermore gathered in “chunks” of 200 using ranged queries²². This was done in order to make the sample as representative as the SoundCloud API allowed. In other words, the sample is not random, as the tracks are ranked by popularity, but the tracks for each genre have still been gathered in exactly the same way, making the sample somewhat representative with respect to popularity.
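The id-based sampling used for the random collection, as described earlier in this section, might be sketched as follows. This is an illustration only: `sampleTrackIds` is a hypothetical name, the actual retrieval of each track through the SoundCloud API is omitted, and a `Set` stands in for the unique database key that caused duplicates to be ignored.

```javascript
// Draws `sampleSize` distinct ids uniformly from 0..latestId (inclusive).
// `random` can be swapped for a seeded generator when reproducibility matters.
function sampleTrackIds(latestId, sampleSize, random = Math.random) {
  if (sampleSize > latestId + 1) {
    throw new Error("Sample size exceeds the number of available ids");
  }
  const ids = new Set(); // a Set silently ignores duplicates
  while (ids.size < sampleSize) {
    ids.add(Math.floor(random() * (latestId + 1)));
  }
  return [...ids];
}

// Example: 10 distinct candidate ids below a (hypothetical) latest id.
console.log(sampleTrackIds(45000000, 10));
```

Note that a drawn id need not correspond to an existing track (for instance if the track has been deleted), so a real gathering loop would have to skip such ids and keep drawing.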
²¹ soundcloud.com/tags
²² Here from one date to another (1 week at a time from January 2009 to April 2012)

4.1.2. Data Integration & Transformation
On SoundCloud, tracks and comments are kept separately from each other and thus require different API calls in order to be retrieved. I have chosen to merge (integrate) tracks and comments into single documents. As previously mentioned, the user defines the genre when uploading a track. Hence, a track might be labeled as “rock”, “Rock”, “roCK”, “rock/pop”, “rock ‘n’ roll”, and so forth. For reasons of simplicity and consistency, every retrieved track that did not match the query exactly (ignoring differences in letter case) was discarded. When saved in the local database, an attribute was added to all the tracks that matched a given query to indicate the genre that had been requested. This was done to facilitate efficient data mining. It was not obvious beforehand which attributes might become interesting to investigate. For this reason, I chose to retrieve tracks with all of their properties and their associated comments. These were then merged (integrated) into a single object (called a document) and inserted into the given collection (either the genre-based collection or the random collection). Also, each comment was POS-tagged at the point of retrieval, and the tags were then merged with their respective comments. In retrospect, this was a poor decision, as the database became huge in size, which made complex querying (and especially pagination) very slow.
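The matching and merging steps described above might look roughly like the following sketch. The function names and the `requested_genre` attribute are hypothetical; `genre`, `id`, and `track_id` are assumed to be the relevant field names on tracks and comments. The POS-tagging step is left out.

```javascript
// True if the track's genre label equals the requested genre, ignoring case.
// "rock/pop" or "rock 'n' roll" do not match a query for "rock".
function matchesRequestedGenre(track, requestedGenre) {
  return (
    typeof track.genre === "string" &&
    track.genre.toLowerCase() === requestedGenre.toLowerCase()
  );
}

// Keeps only matching tracks and merges each with its comments into a single
// document, recording which genre was requested.
function integrateTracksAndComments(tracks, comments, requestedGenre) {
  return tracks
    .filter((track) => matchesRequestedGenre(track, requestedGenre))
    .map((track) => ({
      ...track,
      requested_genre: requestedGenre, // hypothetical attribute name
      comments: comments.filter((comment) => comment.track_id === track.id),
    }));
}

// Invented example data.
const exampleTracks = [
  { id: 1, genre: "Rock" },
  { id: 2, genre: "rock/pop" },
  { id: 3, genre: "roCK" },
];
const exampleComments = [
  { id: 10, track_id: 1, body: "nice" },
  { id: 11, track_id: 3, body: "great solo" },
  { id: 12, track_id: 1, body: "love this part" },
];
console.log(integrateTracksAndComments(exampleTracks, exampleComments, "rock"));
```

Embedding the comments in the track document makes per-track analysis simple, at the cost of large documents, which is consistent with the slow querying noted above.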
4.1.3. Data Cleaning
I chose not to clean the data in any way. The careful considerations that went into assembling the collections made it seem likely that sound (or reasonable) conclusions could be drawn. Also, the size of the collections (and the fact that there were two collections) made it seem improbable that major anomalies would pass unnoticed. Finally, the intention was to depict the data in its raw form, since useful information and patterns might not be those of a “classic” data mining process due to the rather unusual objective of this research project, namely the utilization of timed comments for the creation of music discovery and exploration games and applications.
4.1.4. Attributes of Timed Comments
This section is rounded off by a presentation of the attributes of timed comments, which serves as the entry point for data mining tasks. Recall the different types of attributes that were presented in the data analysis and mining chapter. In this respect, a timed comment can be seen to have the following attributes:

Attribute     Nominal (= ≠)   Ordinal (< >)   Interval (+ -)   Ratio (* /)
id            X
created_at                                    X
user_id       X
track_id      X
timestamp                                                      X
body          X
uri           X
user*

Table 4-1: The attributes of timed comments
* The user attribute contains an object of attributes and is not in itself an actual attribute.
From this we can determine that the created_at (post date) and timestamp attributes are the ones on which more advanced computations may be performed. A created_at value is a date, and thus an interval attribute, which can be added to or subtracted from other dates. A timestamp is a ratio attribute, as the duration of a track has a true zero. Hence, it is possible to perform any type of mathematical operation on timestamps. The ids of tracks, users, and comments as well as the comment’s body and uri are all nominal attributes. Hence, there is no ordering between ids (e.g. 5 is not a better or worse id than 5,000,000) and there is no immediate limit to the value of an id. Furthermore, none of them are numeric attributes. It would be pointless to subtract one id from another or to compute the sum of two comment bodies. All the attributes of a comment are continuous. The body might however be discrete, but there is no apparent (described) limitation to the number of characters that it can contain. Tracks have notably more attributes than comments²³, which enables various ways of investigating the relationship between tracks and timed comments.
4.2. Data Mining & Analysis
In this section I will present the key findings obtained from the summarization, characterization, and visualization of the data samples gathered from SoundCloud. I will start by providing a basic walkthrough of the key findings, resulting in an evaluation of the usefulness of these findings as well as a comparison between timed and “regular” comments. Furthermore, three different samples will be analyzed simultaneously. These are: a genre sample (25 selected genres from the genre sample), the genre-based collection (25 selected genres) and also a random sample from the random collection (i.e., 90,000 randomly selected tracks from the random collection). Figure 4.2 shows the frequency of uploaded tracks with selected tags relative to months in the genre sample. The graph suggests a fairly natural distribution for tracks, as the number of uploaded tracks must be expected to grow exponentially alongside the number of SoundCloud users, which (as mentioned in the section about SoundCloud) has increased over time. When comparing the graph with the tag cloud from SoundCloud (Figure 4.1), the tag frequencies are close to identical. The graph thus also indicates that the gathered tracks are somewhat representative of all SoundCloud tracks.
Figure 4.2: Upload dates for gathered tracks (genre sample)
²³ http://developer.soundcloud.com/docs/api/reference/#tracks

Table 4-2 shows summarized and averaged statistics of the random sample from the random collection. In this sample, 23.6% of the tracks are commented and 92.1% of the comments are timed comments. There are furthermore an average of 1.5 comments per track and 6.4 comments per commented track. The average body length of a timed comment is 40 characters (with spaces) and the average word count is 7 words with standalone symbols removed. For regular comments, the average body length is 76 characters (with spaces) and the average word count is 13 with standalone symbols removed. The number of tagged words in timed comments is roughly 9 on average, which indicates that a little less than ¼ (22.2%) of all tagged words are standalone symbols. The sample also indicates that 3.4% of all the tracks contain links and that 1.7% (half of the links) are links to pages on SoundCloud. Roughly ¼ of the comments in the sample contain an at sign (‘@’) at the start of the comment, which means that the comments are replies to other comments on SoundCloud. These statistics are largely supported by the genre-based collection and the genre sample.
Table 4-2: Summarized & averaged statistics (random sample)
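Statistics of the kind summarized in Table 4-2 could be computed along the following lines. This is a sketch under stated assumptions, not the code used for the thesis: a word is taken to be any whitespace-separated token that contains at least one letter or digit, a standalone symbol is any other token, and a reply is a comment whose body begins with ‘@’. The function names are hypothetical.

```javascript
// A token counts as a standalone symbol if it contains no letter or digit,
// e.g. ":)" or "!!!".
function isStandaloneSymbol(token) {
  return !/[\p{L}\p{N}]/u.test(token);
}

// Averages body length, word count (symbols removed), and share of replies.
function summarizeComments(comments) {
  if (comments.length === 0) throw new Error("No comments to summarize");
  let characters = 0;
  let words = 0;
  let replies = 0;
  for (const { body } of comments) {
    characters += body.length; // length including spaces
    const tokens = body.split(/\s+/).filter((token) => token.length > 0);
    words += tokens.filter((token) => !isStandaloneSymbol(token)).length;
    if (body.trimStart().startsWith("@")) replies += 1;
  }
  return {
    averageLength: characters / comments.length,
    averageWords: words / comments.length,
    replyShare: replies / comments.length,
  };
}

console.log(
  summarizeComments([{ body: "nice drop :)" }, { body: "@dj thanks !!!" }])
);
```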
4.2.1. Comment Counts
The scatter plot in Figure 4.3 shows the number of comments for commented tracks in the genre sample. Not surprisingly, the plot indicates a negative correlation between the track count and the comment count. This is also called the Zipfian pattern, which is a pattern found in most community-generated data²⁴. Hence, there are more commented tracks with few comments than there are commented tracks with many comments. This same pattern is also found in the random sample and the genre-based collection.
Figure 4.3: Amount of comments in tracks (genre sample) (y-axis = tracks)
4.2.2. Commented Tracks (Genres)
For the 25 selected genres, the genre sample contains 80,724 tracks, of which 23,783 (29.5%) are commented tracks. The genres dubstep and techno have the most comments per track on average (5.3 comments each), whereas the genres punk and country have the fewest comments per track (0.3 comments each). There is no apparent correlation between the number of comments on a track and the number of tracks in a genre. In the genre-based collection, dubstep again has the highest average number of comments per track, 8.9 comments, which is more than twice that of the second most commented genre (electronica with 4.2 comments per track). The genre techno here only has 3.6 comments per track on average, which might indicate an anomaly in one of the collections; on the other hand, the way in which SoundCloud determines popularity is unknown. The genres with the fewest comments per track are once again punk and country, with an average of 0.3 and 0.4 comments respectively. The bar chart in Figure 4.4 shows how the selected genres are represented in the genre sample (blue bar). It also shows the number of commented tracks in each genre (red bar). This bar chart indicates that electronic genres (in general) are well represented in terms of commented tracks. The genres dubstep, techno, and house all have a fairly high percentage of commented tracks (54.4% for dubstep), whereas the genres rock, pop and hiphop all have fairly low percentages (11.6% for rock). This might indicate that commenting is somewhat more common in the electronic genres on SoundCloud. Dubstep and techno are the only genres with more than 50% commented tracks and, as just mentioned, they are also the genres with the most comments per track (5.3 comments on average).
Figure 4.4: Tracks versus commented tracks (genre sample)
4.2.3. Timed Comments
In the genre-based collection, the total share of timed comments is roughly 88.2%, with a variance of 12.6% and a standard deviation of 3.5%. The genre minimal has the highest share of timed comments with an average of 94.1%, and the genre pop has the lowest share of timed comments with an average of 80.6%. In the genre sample, the tendencies are for the most part identical, but there is a noticeably higher variance (38.4%) and standard deviation (6.2%) (see Figure 4.5). Here, the genre with the highest share of timed comments is classical (91.7%), while pop (once again) has the lowest share (75.4%). The genres minimal and ambient are here placed just beneath classical, each with 91.5% timed comments.
Figure 4.5: Standard deviation of timed comments (genre sample)
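For reference, the variance and standard deviation of per-genre shares can be computed as sketched below. The text does not state whether the population or the sample formula was used; this sketch assumes the population form (dividing by the number of genres), and the input values are invented.

```javascript
function mean(values) {
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

// Population variance: the mean squared deviation from the mean.
// Assumption: the sample form (dividing by n - 1) would give a larger value.
function variance(values) {
  const average = mean(values);
  return mean(values.map((value) => (value - average) ** 2));
}

// The standard deviation is the square root of the variance.
function standardDeviation(values) {
  return Math.sqrt(variance(values));
}

// Invented per-genre shares of timed comments, in percent.
const timedShares = [94.1, 91.5, 88.0, 85.2, 80.6];
console.log(variance(timedShares), standardDeviation(timedShares));
```

Because the standard deviation is the square root of the variance, the figures reported above are mutually consistent: the square root of 12.6 is roughly 3.5, and the square root of 38.4 is roughly 6.2.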
4.2.4. Comment Bodies
The most frequently occurring comment body length in the genre sample is 9 characters. As Figure 4.6 shows, there seems to be an overall positive correlation between the number of characters in a comment and the number of comments up to 9 characters, from where the correlation becomes overall negative. Furthermore, 93.3% of the timed comments in the genre sample have a body of 100 characters or less. This number is 95.6% in the genre-based collection and as much as 98.0% for the random sample.
Figure 4.6: Characters in comments (genre sample)
As mentioned, the average character count in timed comments is 40 in the random sample from the random collection. The character count for the 25 selected genres is 42 (6.9 words) in the genre sample and 39 (6.5 words) in the genre-based collection, which may be deemed fairly similar. There are furthermore some noticeable differences between the individual genres. For instance, both the genre sample and the genre-based collection suggest that the timed comments on folk tracks have above 50 characters per comment on average, whereas techno sits at around 30 characters per comment on average. However, the differences vary between the two samples in other areas (namely the genres blues and r&b), making it somewhat difficult to determine the differences unambiguously. Nevertheless, the data clearly indicates that the average character count in timed comments may vary greatly between genres.
4.2.5. Comment Distribution
The scatter plot in Figure 4.7 shows the distribution of all timed comments across their related tracks in the genre sample. Each timestamp has been normalized like so:

Math.round((timestamp / track_duration) * 100)
Figure 4.7: Distribution of comments relative to track duration (genre sample)
The points in the scatter plot form an almost smooth curve and can be divided into two “ups and downs” (or phases), as illustrated with the grey arrows. When ignoring the outliers (marked by red circles), the curve peaks at around 25% with approximately 2000 comments and falls to half at 75% with approximately 1000 comments. Almost exactly the same pattern can be found for individual genres and also for the genre-based collection (see Figure 4.8). In addition, the same measure on the genre-based collection indicates that the pattern scales fairly accurately. Here the curve peaks at around 20% with approximately 12000 comments and falls to half at 80% with approximately 6000 comments.
Figure 4.8: Distribution of comments relative to track duration (genre-based sample)
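The normalization used in this section can be turned into a histogram of comment positions, as in the sketch below. The normalization expression is the one given above; the function names, the `track_duration` field on each comment (assumed to have been merged in from the track), and the decision to skip comments with missing or out-of-range timestamps are assumptions.

```javascript
// Position of a comment as a rounded percentage (0..100) of the track length.
function relativePosition(timestamp, trackDuration) {
  return Math.round((timestamp / trackDuration) * 100);
}

// Counts comments per percentage point; rounding yields 101 possible bins.
function positionHistogram(comments) {
  const bins = new Array(101).fill(0);
  for (const { timestamp, track_duration: duration } of comments) {
    // Skip regular (untimed) comments and implausible values.
    if (typeof timestamp !== "number" || !(duration > 0)) continue;
    if (timestamp < 0 || timestamp > duration) continue;
    bins[relativePosition(timestamp, duration)] += 1;
  }
  return bins;
}

// Invented example: timestamps and durations in milliseconds.
const histogram = positionHistogram([
  { timestamp: 30000, track_duration: 120000 },
  { timestamp: 30500, track_duration: 120000 },
  { timestamp: null, track_duration: 120000 },
  { timestamp: 120000, track_duration: 120000 },
]);
console.log(histogram[25], histogram[100]);
```

Plotting the bin index against the bin count gives a curve of the kind shown in Figures 4.7 and 4.8.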
4.2.6. Comment Post Dates
In the genre sample, 32.0% of all timed comments are added within 24 hours of a track’s time of upload, whereas 54.3% of all timed comments are added within a week (see Figure 4.9). Data from the genre-based collection shows somewhat different and lower results. Here, 24.7% of all timed comments are posted within the first day, whereas 43.5% are posted within the first week. This difference was expected to be reversed (if present at all), as popular tracks should supposedly generate more hype (and thus get more comments) more quickly. However, the difference is most likely due to the ratio of tracks in each category. While genres are somewhat evenly represented in the genre-based collection, the genre sample has a skewed ratio, which results in genres such as dubstep (which is in general more popular on SoundCloud) having a higher impact. Hence, the result indicated by the genre-based collection seems to be plausible for popular tracks, whereas the result indicated by the genre sample is unreliable.
Figure 4.9: Comments relative to upload date (genre sample)
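The shares reported in this section correspond to a computation of roughly the following kind. This is a sketch only: `shareWithin` and the `track_created_at` field (the upload date of the track, assumed to be available on each comment after merging) are hypothetical, and the dates are given as ISO 8601 strings because `Date.parse` handles those reliably.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Share of comments posted no later than `maxDelayMs` after the track upload.
function shareWithin(comments, maxDelayMs) {
  if (comments.length === 0) throw new Error("No comments given");
  const hits = comments.filter((comment) => {
    const delay =
      Date.parse(comment.created_at) - Date.parse(comment.track_created_at);
    return delay >= 0 && delay <= maxDelayMs;
  });
  return hits.length / comments.length;
}

// Invented example: one comment after an hour, one after three days,
// and one after thirty days.
const upload = "2012-01-01T00:00:00Z";
const datedComments = [
  { created_at: "2012-01-01T01:00:00Z", track_created_at: upload },
  { created_at: "2012-01-04T00:00:00Z", track_created_at: upload },
  { created_at: "2012-01-31T00:00:00Z", track_created_at: upload },
];
console.log(shareWithin(datedComments, DAY_MS));
console.log(shareWithin(datedComments, 7 * DAY_MS));
```

Subtracting two dates is meaningful here because created_at is an interval attribute, as discussed in section 4.1.4.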
4.2.7. Part-Of-Speech Tags
Figure 4.10: Part-of-speech tags (random sample)
Figure 4.10 shows the part-of-speech (POS) tags for all timed comments in the random sample using Eric Brill’s POS-tagger (described in section 3.3.4). The largest share of all words (and symbols) are tagged as singular or mass nouns (35.1%). Some common words here are “man”, “love”, and “track”, but emoticons (e.g. smiley faces), slang and abbreviations (e.g. “lol” or “thx”), and misspellings also wind up in this category. The word “love” seems a bit peculiar in this context, but a word such as “one” is also present, indicating that people use these words as nouns (e.g. “nice one” or “one love”). The second largest category is final punctuation (8.4%), followed by adjectives (8.3%). The ten most used adjectives are shown in Figure 4.11.
Figure 4.11: Top 10 adjectives (random sample)
4.2.8. Useful Information
In the chapter on data analysis and mining, we established 4 reasons why a pattern can be considered interesting: A pattern is interesting if it is (1) easily understood by humans, (2) valid on new or test data with some degree of certainty, (3) potentially useful, and (4) novel. In general, all of the patterns presented here are novel, as there has been no previous formal analysis of timed comments (at least to the best of my knowledge). It has also been shown that most patterns are largely valid on (identical for) the three different samples (collections). In order to determine whether or not the patterns are useful, it seems natural to discuss some key findings in isolation.

Commented Tracks
The share of uncommented tracks on SoundCloud (70.5% for the genre sample) indicates that systems that incorporate timed comments will only be useful for a minority of tracks. For instance, tracks without (or with few) comments may not meaningfully be translated into an interesting and challenging game. The findings furthermore suggest that commenting is more common in some genres than others. This might be considered useful information when determining a target group for applications and games that incorporate (or perhaps depend on) timed comments.

Timed Comments
A lot of the comments are timed comments (on average 92.1% in the random sample), with a slightly varying share from genre to genre. This can be seen as very useful information, as a successful incorporation of timed comments depends on it. It would be rather pointless to create the applications and games if people did not use the option of posting timed comments and just posted “regular” comments instead. Both the genre sample and the genre-based collection indicate that common contemporary genres such as rock and pop contain a lower percentage of timed comments than a genre such as techno.
To me, this seems a little peculiar, as it could suggest mainstream genres to be somehow less interesting in parts or (perhaps conversely) more interesting as a whole. This is of course only speculation.

Comment Bodies
Knowledge of the basic properties of comment bodies may become very useful when constructing games and applications. We may for instance wish to decide the size of an object based on the characters or words in a comment, select specific words to trigger events, or otherwise influence content. To this end, it seems interesting that the length of the comment bodies, as suggested by the data samples, is fairly predictable. The vast majority have fewer than 100 characters (98% in the random sample), meaning that systems that utilize this knowledge may not have to be flexible in constraining or modifying the data.

Comment Distribution
The distribution of comments can become relevant in relation to the course of a game. If the comments are converted into game content at the point of their timestamp, it may be considered useful that they are not clustered. The curve of the distributed comments (as seen in Figure 4.7) seems interesting, as games containing such a level progression might feel somewhat natural. (The curve is, though, merely a summarization of multiple tracks.) It further seems worth emphasizing that the cause of the distribution cannot be determined unambiguously. The distribution might be influenced by the SoundCloud player interface, which can be said to encourage people to spread out their comments (as indicated by Figure 4.12). Conversely, people might also place comments at strategic timestamps to get noticed (e.g. at the beginning of a track). Hence, the graph only indicates how people comment and not why they comment as they do.
Figure 4.12: The SoundCloud player (in the process of adding a timed comment)
The findings suggest that new (newly uploaded) tracks on SoundCloud are commented on relatively quickly. For instance, roughly 25% of the comments in the genre-based collection are posted within the first 24 hours. This is useful information, as tracks can quickly become usable in applications and games that incorporate timed comments.
4.2.9. Timed Comments Versus Regular Comments
Due to their timestamps, some of the basic statistics of timed comments presented here are not directly comparable with those of regular comments. The body length, though, does not depend on the timestamp, and there seem to be some interesting differences between timed comments and other comments in this respect.
| Network | Avg. comment body length (characters) | Avg. words | Responses to other comments |
|---|---|---|---|
| SoundCloud Timed | 40 | 7 | 25% |
| SoundCloud Regular | 76 | 13 | Not available |
| Twitter [68] | 95 | 14 | Not available |
| Myspace [69] | 248.5 | 26.1 | Not available |
| Youtube [70] | 95.5 (from [71]) | 12 (popular), 15 (random) | 23.4% |
Table 4-3: Comment characteristics in various social networks
As Table 4-3 suggests, there are some noticeable differences in character and word counts between comments (and tweets) from different social networks. There are also very noticeable differences between timed comments and regular comments on SoundCloud. The data indicate that regular comments on SoundCloud are almost twice as long as timed comments. Also, timed comments are considerably shorter than regular comments in other social networks. We might thus also say that an opinion about a given subject is formulated using fewer words in timed comments. This suggests that the means (or capacity) that SoundCloud provides for timed commenting has some sort of influence on the way in which people post comments. This may of course be influenced by social factors and norms, but an immediate and interesting interpretation might be that timed commenting somehow limits the number of words needed to express an opinion. A commenter might simply post the word “nice” at a specific timestamp where a specific event occurs (e.g. a guitar solo) with the intention of expressing “nice guitar solo”.
Figure 4.13: Comment counts (left: Myspace comments on profiles [69]; right: Youtube comments on videos [72])
As seen from Figure 4.13, both Youtube (comments on videos) and Myspace (comments on user profiles) show the same Zipfian pattern that was found for SoundCloud tracks with regard to the comment count of items. Furthermore, the study of Myspace indicates that 90% of the comments contain up to 226 characters. In the study of Youtube, it is similarly suggested that about 95% of comments are shorter than 344 characters. As described earlier, my findings indicate that 98% (for the random sample) of the timed comments on SoundCloud contain fewer than 100 characters. Proportionally, SoundCloud thus seems very similar to Youtube in this respect, whereas Myspace comments would seem to have a somewhat more uniform length.
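To make the figures in this comparison concrete, the following is a minimal sketch of how the average body length, the average word count, and the share of bodies under 100 characters could be computed. It is written in plain Python for illustration only (the thesis work itself used Node.js); the function name `comment_stats`, the whitespace-based word split, and the sample comments are all assumptions and are not taken from the data set.

```python
def comment_stats(bodies, limit=100):
    """Return (average characters, average words, share of bodies shorter than `limit`)."""
    if not bodies:
        raise ValueError("need at least one comment body")
    n = len(bodies)
    avg_chars = sum(len(b) for b in bodies) / n
    # Words are approximated by splitting on whitespace (an assumption).
    avg_words = sum(len(b.split()) for b in bodies) / n
    share_short = sum(1 for b in bodies if len(b) < limit) / n
    return avg_chars, avg_words, share_short


# Hypothetical comment bodies, invented for this example.
sample = ["nice", "love this part", "nice mellow intro", "x" * 120]
avg_chars, avg_words, share_short = comment_stats(sample)
print(avg_chars, avg_words, share_short)
```

The same function could be applied separately to timed and regular comments to obtain the kind of side-by-side figures shown in Table 4-3.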
4.3. Classification of timed comments
In this section I describe the experimental process of training a classifier to differentiate between timed and non-timed content of timed comments. The classifier serves mostly as a proof of concept, as it has not been properly trained and tested at this point. I begin by discussing the way in which we might distinguish between timed comments with timed content and timed comments without timed content. I then go on to briefly describe the process of training a Naive Bayes classifier, which has been done on the basis of the conclusions drawn from defining timed content.
Data characterization and visualization can give us overall information about timed comments, but these techniques cannot really tell us anything detailed about the content of the comments and the meaning they hold. When is a timed comment actually “timed”? And what constitutes a timed comment in terms of content? To approach these questions we can use classification. As described earlier, the process of training a Naive Bayes classifier is a supervised process where a person (teacher) tells the classifier how to distinguish between classes. In the case of timed comments, this can actually be rather difficult because the distinction between “timed” and “non-timed” can be very subtle and fuzzy, and sometimes subjective. I believe that seeking an understanding of this difference and training a classifier to “endorse” our understanding is an evident next step. Both the process of training a classifier and the trained classifier might help us study and understand what constitutes an actual “timed” comment.
4.3.1. What constitutes timed content?
A timed comment is by default related to a specific timestamp, but this doesn’t necessarily make its content timestamp related. Whether or not this is the case is a matter of opinion and criteria. Objectively determining whether or not a timed comment is time-code related is a fairly complicated classification problem, as the embedded meaning of a comment can sometimes be ambiguous. To elucidate the problem, I will briefly discuss four examples of comments, ranging from having (what I consider to be) very unambiguously time-code related content to having very ambiguously time-code related content.
Figure 4.14: Timed comment - Example 1
Figure 4.14 shows an example of a timed comment. In this example, the user is clearly and unambiguously referring to the part of the track where the comment has been posted. This is clear because of the key phrase: “this part”. Another phrase that might always be considered definitely timed would be “right here”, as in “I like the bass right here”. Hence, there are some types of comments that we can always presume contain timed content. (It is of course possible that the commenter makes a mistake or intentionally posts the comment at the wrong timestamp.)
Figure 4.15: Timed comment - Example 2
In Figure 4.15 the comment text says "nice mellow intro", which is a music-related opinion but not necessarily timed. It is highly likely that it is timed, but this is debatable. If the sentence had instead been "this intro is nice and mellow", it would not make sense as a non-timed comment. However, when a comment like this one is known to be a timed comment, we might make the assumption that it is related to its timestamp. In fact, this can be perceived as an advantage of timed comments, as the notion of timed automatically implies timed meaning. As pointed out in the previous section (4.2.9), timed commenting somehow limits the number of words needed to express an opinion. Also, the comment simply makes more sense to post where the intro is actually playing as opposed to at the end of a track. The same comment would also make sense as a non-timed comment, but if we know that a comment like this one has a timestamp, it seems somewhat reasonable to presume that its content is timestamp-related.
Figure 4.16: Timed comment - Example 3
The user in the above example (Figure 4.16) expresses satisfaction with the beat and bass in the track, but as in the previous case (though less clearly), the user may or may not refer to the specific time where the comment is posted. Here, the word “this” does not have the same effect as in “this part”. “This” might here simply refer to the track as a whole. It seems similarly unclear whether the “love the beat/bass” part of the sentence refers to a timestamp or just the entire track. In order to get adequate insight into the relevance of its timestamp, we might actually need to listen to the track in this case.
Figure 4.17: Timed comment - Example 4
While the previous example might be resolved by listening to the track, the example in Figure 4.17 cannot be unambiguously resolved without consulting the user who posted the comment. This is the case for sentences like “this is nice” or “this is dope” and also for those that omit the “this is” part and simply state “nice”, “dope” or “brilliant”. Based on my own encounters with timed comments on SoundCloud, this last example is probably the most frequent way in which comments are structured. This is also somewhat supported by the findings from data mining, which indicated that the most frequent comment length is around 9 characters.
4.3.2. Training The Classifier
Iteration 1
Based on the above reflections as to what constitutes timed content, I have attempted to train a naïve Bayes classifier on 2500 comments (100 comments from each of the 25 selected genres). For this I have used Node.js with a module called Natural (see Appendix A). The standard settings of the naïve Bayes classifier in Natural were used, which include automatic stemming and tokenization of the documents (comments). I decided to treat the classification problem as a binary classification problem, meaning that instances were interpreted as either timed or non-timed with nothing in between. I further chose to only identify comments that clearly related to a given timestamp (equivalent to the examples in Figure 4.14 and Figure 4.15). In order to train the classifier, I constructed a basic HTML form to load, display and submit the classifications, as shown in Figure 4.18.
Figure 4.18: Classification interface
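As an illustration of the kind of binary classifier described here, the following is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing. It is written in plain Python and is not the Natural implementation used in the project: the class name `NaiveBayes`, the whitespace tokenizer (no stemming), and the four training comments are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training comments
        self.vocab = set()

    @staticmethod
    def tokenize(text):
        # Lowercase and split on whitespace; no stemming (a simplification).
        return text.lower().split()

    def add(self, text, label):
        tokens = self.tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def log_scores(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # class prior
            for tok in self.tokenize(text):
                score += math.log((counts[tok] + 1) / denom)
            scores[label] = score
        return scores

    def classify(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical hand-labeled comments, invented for this example.
clf = NaiveBayes()
for text, label in [
    ("love this part", "timed"),
    ("the bass right here", "timed"),
    ("check out my page", "nontimed"),
    ("thanks for the follow", "nontimed"),
]:
    clf.add(text, label)

prediction = clf.classify("this part right here")
print(prediction)
```

In a labeling interface like the one in Figure 4.18, a call such as `clf.classify(comment)` could supply the suggested label that the human teacher then confirms or corrects.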
The classifier was instructed to classify each comment to the best of its knowledge before emitting the comments to the front-end of the application in the hope that the process of labeling the comments would slowly become more and more automated. This first iteration resulted in the following ratio of timed and non-timed comments in the classifier:
"classTotals":{"nontimed":2126,"timed":276},"totalExamples":2401} (see footnote 25)
Roughly 11.5% of all the comments (276 of about 2400) were labeled as timed, giving some indication of the share of timed comments that may actually contain timed content. The classifications were of course subjective assessments, but they were nevertheless made on the basis of the definition of timed content. However, the skewed distribution of timed and non-timed comments caused a problem for the classifier. For classifiers in general, studies have shown that minority-class examples account for a disproportionately large percentage of the errors [73], [74]. Minority-class predictions tend to perform worse than majority-class predictions, and minority-class examples are also misclassified much more often than majority-class examples. This problem is often referred to as skewed data bias. One way of tackling the poor assumptions of a Naive Bayes classifier is by using Complementary Naive Bayes. While in regular Naive Bayes, weights are estimated using training data from a single class, c, Complementary Naive Bayes estimates parameters using data from all classes except c. However, the “easy way” of dealing with the skewed data bias is to use the same number of training examples for each class. Since I was using the pre-made Naive Bayes classifier in Natural, I chose this “easy way” of dealing with the problem. I thus decided to train a new classifier where surplus examples in the majority class were ignored.
Iteration 2
From training the first classifier, I had learned that it was extremely difficult to differentiate between timed and non-timed content in many cases, and it seemed as if most music-related comments were also time-related. For this reason I chose to change my criterion from timed content to music-related content. However, the statements in the comments were still required to be specific and/or aesthetic to some extent.
For instance, a comment such as “nice track” was not considered music-related whereas a comment such as “I love the sound of techno” was. I furthermore decided to use a “stop word list” (see footnote 26) in the hope of further improving the accuracy. The classifier was then trained on 2400 (different) examples:
"classTotals":{"music_related":335,"nonmusic_related":335},"totalExamples":669}
There were 59 more training examples classified as music-related than had been classified as timed (335 versus 276, an increase of roughly 21%), indicating, as expected, that comments containing music-related content might often be timed. The new classifier was, however, not trained on the exact same training set, and there was no immediate improvement in the results. An option could be to treat the comments as multi-labeled instances [59]. In multi-label classification, instances can belong to several classes simultaneously. In this way, comments may be labeled as timed and/or music-related independently. Yet another way of categorizing comments might be to look at the explicit intention contained within them. At this point, I decided not to spend any more time on training classifiers. The classifier worked to some extent, and with some tweaking (requiring it to be very confident of its classifications) it seemed to achieve a decent performance.
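The “easy way” of handling the skewed class distribution, using the same number of training examples for each class, can be sketched as follows. This is only an illustration in plain Python: the function name `balance_by_undersampling` and the example data are hypothetical, and choosing the surviving majority-class examples at random (with a fixed seed) is an assumption, since the text only states that surplus examples were ignored.

```python
import random


def balance_by_undersampling(examples, seed=0):
    """Keep an equal number of (text, label) examples per label.

    Surplus examples in the larger classes are dropped. Picking the kept
    examples at random is an assumption made for this sketch.
    """
    by_label = {}
    for text, label in examples:
        by_label.setdefault(label, []).append((text, label))
    smallest = min(len(items) for items in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for items in by_label.values():
        balanced.extend(rng.sample(items, smallest))
    return balanced


# Hypothetical labeled comments: eight non-timed, two timed.
examples = [(f"comment {i}", "nontimed") for i in range(8)]
examples += [("this part", "timed"), ("right here", "timed")]

balanced = balance_by_undersampling(examples)
timed_count = sum(1 for _, label in balanced if label == "timed")
nontimed_count = sum(1 for _, label in balanced if label == "nontimed")
print(timed_count, nontimed_count)
```

The obvious cost of this approach is that most of the labeled majority-class examples go unused, which is consistent with the much smaller class totals reported for the second classifier.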
25 A minor error must have occurred while training this classifier, as the total number of examples is 2401 and not 2400, and the timed and non-timed comments add up to 2402.
26 http://geeklad.com/remove-stop-words-in-javascript
4.4. TF-IDF
Term frequency documents have been assembled for both data mining and information retrieval purposes. This has been done using Node.js with the module Natural (see Appendix A). The objectives have been (1) to investigate differences between the content of timed comments in the 25 selected genres and (2) to facilitate the retrieval of relevant parts of tracks based on a query. In other words, the idea is to create an efficient way of actively searching for timed comments. The retrieval task has been separated into three steps:
1. Locate the most relevant genre for a query
2. Select the most relevant track in that genre (for the query)
3. Select the most relevant comment in that track (for the query)
To allow the first step to take place, the comments from 1000 tracks in each of the 25 genres have been combined into single documents and stored in a single JSON (see footnote 27) file (here called the genres file). Hence, the file contains 25 documents, one for each genre. On this level, it can thus be estimated which of the genres a given search term or phrase is most closely related to. To allow the second step to take place, the comments from 10000 tracks in each genre have been combined into documents representing each of the tracks (10000 documents). These are stored in separate JSON files (one for each individual genre). On this level, it can be estimated which of the tracks in a genre a given search term or phrase is most closely related to. In the third step, the track that has been identified in the second step is retrieved from SoundCloud (in “real time”), as are its associated comments. Each comment for the identified track is then ranked based on the search query, and the most relevant comment can then be retrieved. On this level, it is thus estimated which of the comments is the most relevant to the query that initiated the search.
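The three retrieval steps can be sketched as one ranking function applied at three levels. The following is a minimal illustration in plain Python, not the Natural-based implementation: the names `rank` and `retrieve`, the smoothed IDF variant, the in-memory dictionaries standing in for the JSON files, and the toy comments are all assumptions. In particular, step 3 here reads comments from a dictionary, whereas the system described above fetches them from SoundCloud at query time.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def rank(query, documents):
    """Rank documents (name -> text) by summed TF-IDF of the query terms, best first."""
    counts = {name: Counter(tokenize(text)) for name, text in documents.items()}
    n_docs = len(counts)
    scores = {}
    for name, doc_counts in counts.items():
        score = 0.0
        for term in tokenize(query):
            doc_freq = sum(1 for c in counts.values() if term in c)
            # One common smoothed IDF variant (an assumption for this sketch).
            idf = 1 + math.log(n_docs / (1 + doc_freq))
            score += doc_counts[term] * idf
        scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)


def retrieve(query, genre_docs, track_docs_by_genre, comments_by_track):
    genre = rank(query, genre_docs)[0]                      # step 1: best genre
    track = rank(query, track_docs_by_genre[genre])[0]      # step 2: best track in that genre
    comments = comments_by_track[track]
    best = rank(query, dict(enumerate(comments)))[0]        # step 3: best comment in that track
    return genre, track, comments[best]


# Hypothetical toy data standing in for the genre and track documents.
genre_docs = {"rock": "guitar solo riff guitar", "jazz": "smooth chords sax solo"}
track_docs_by_genre = {
    "rock": {"track_a": "nice riff", "track_b": "guitar solo here guitar"},
    "jazz": {"track_c": "smooth chords"},
}
comments_by_track = {
    "track_a": ["nice riff"],
    "track_b": ["nice", "guitar solo here", "love the guitar"],
    "track_c": ["smooth chords"],
}

result = retrieve("guitar solo", genre_docs, track_docs_by_genre, comments_by_track)
print(result)
```

Narrowing the search level by level keeps each ranking small: 25 genre documents first, then the tracks of a single genre, and finally the comments of a single track.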
4.4.1. Stop-Words Removal
When retrieving the top terms for each of the genres, it became apparent that there were a lot of non-distinctive words such as “nice”, “love” and “thanks”. At this point, the standard stop-words list in Natural had already been implemented. Therefore, a rather harsh stop-words list was used to remove non-distinctive words, spam, and foreign words. This stop-words list includes words such as “fantastic”, “facebook”, “music”, and “download”, all of which might be considered irrelevant for distinguishing between genres. The list could have been even harsher by only accepting music terms, but words such as “remix” and “sick” were deemed interesting and distinctive. The stop-words list just described, as well as the initial retrieval results, can be seen in Appendix B.
4.4.2. Genres Comparison
The result of implementing the harsh stop-words list can be seen in Table 4-4. These results indicate that the content of timed comments does in fact differ from genre to genre. They further suggest some subtle differences in musical terms between the genres, some of which have been highlighted with colored squares. A couple of peculiar examples in the table are that:
- Classical is the only genre where the word “composition” occurs in the top 10
- The word “smooth” only appears in Jazz, RnB, and Soul
- The genres in which people talk most about guitars are rock and heavy metal
- Most genres have the genre (word) as a top term
27 JavaScript Object Notation
These examples are of course only true for the top 10 terms with the implemented stop-words list. It does, however, seem fair to say that most of the words in the genres are words that we normally associate with the genres and not merely random words. As with the above examples, countless connections can be made, such as: “guitars” are common in rock and not in hiphop, and “atmosphere” is only discussed in ambient music (based on the top 10).
Table 4-4: Top 10 terms for each of the 25 genres using harsh stop-words list
4.4.3. Retrieving Information (Words)
Different music-related words were queried in order to get an indication of how the ranking would work. Table 4-5 depicts the results of a search for 10 music-related terms, where the most relevant genre is highlighted with a red square. It seems somewhat reasonable to assess the results as correct. Some expected connections are: pop and chorus, jazz and chords, and folk and lyrics. It thus seems plausible that the system will be able to predict the correct musical genre for a given word.
Table 4-5: TF-IDF test retrieval (term frequencies in genres)
4.4.4. Retrieving Information (Phrases)
Another task might be to retrieve phrases instead of just words. In this case, we are interested in the combined TF-IDF of multiple terms. In Natural, however, the term frequency is not normalized; it is simply the raw term count. For this reason, some words in a phrase may gain more influence than others, whereas we are interested in retrieving documents that match the entire query. Therefore, the term counts are normalized using the following formula:
Term frequency(document, term) = 1 + log(1 + log(term count))
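The effect of this normalization can be seen in a small worked example. The sketch below is plain Python for illustration; the function name `normalized_tf` is hypothetical, and two details are assumptions because the text does not state them: the natural logarithm is used, and a term that does not occur in the document is given a frequency of 0 (the formula itself is undefined for a count of 0).

```python
import math


def normalized_tf(term_count):
    """1 + log(1 + log(count)) for count >= 1; 0 when the term is absent (assumption)."""
    if term_count <= 0:
        return 0.0
    return 1 + math.log(1 + math.log(term_count))


for count in (1, 10, 100, 1000):
    print(count, round(normalized_tf(count), 3))
```

A single occurrence yields exactly 1, and the value grows only very slowly after that, so a term that occurs a hundred times in a document no longer outweighs the other terms in a phrase by a factor of a hundred.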
4.4.5. Popularity Bias
A significant issue of the TF-IDF model developed here is that it favors tracks with a lot of comments. In retrospect, it would be more natural to “weight down” tracks that have been commented on a lot in order to avoid such a popularity bias. This has, however, not been considered important for the purpose of testing the technique. The location of an appropriate genre does not suffer from the same bias, as the TF document here has been populated with an equal amount of comments from each genre. Likewise, when selecting the actual comment, each comment should have a somewhat equal chance of being selected. Comments with longer bodies may, however, have a greater chance of being relevant, as they contain more words.
CHAPTER 5
Digital Artefacts
I will now present three digital artefacts, which are the results of practical experiments that have been conducted throughout this research project. The artefacts incorporate timed comments in different ways in order to facilitate online music discovery and exploration. Before these three artefacts were constructed, I performed some initial experiments, which are briefly described in Appendix C but are not prerequisites for the upcoming presentations. A description of the various Web technologies and tools that have been used to construct the artefacts can be found in Appendix A. The first system that will be presented is the Timestamp Player, which is envisioned as a music discovery system. Secondly, a suite of games, which have been dubbed Timed Comment Games, will be presented. As opposed to Timestamp Player, this suite mainly demonstrates ways in which timed comments might be utilized for music exploration. Finally, I will present Instamash, which is a music composition application. Like the Timed Comment Games, Instamash is designed for exploration but in a different and more explicit way than the games. (This difference will be clarified at a later point.) Both Instamash and the Timed Comment Games do, however, have discovery aspects, even though they have not been designed for discovery purposes. The presentations are each divided into a descriptive section and a discussion and evaluation section.
5.1. Timestamp Player
Try it at: http://tsplayer.herokuapp.com
Timestamp Player is a simple experimental audio player with an embedded recommendation system and search engine. The system accumulates tracks from SoundCloud based on various criteria, which are used for automatic playlist generation. The player (Timestamp Player) then initializes playback from the timestamps of timed comments in the tracks and automatically skips to the next track every eight seconds. The user can postpone this skipping behavior by pushing a “Keep Listening” button.
Figure 5.1: Timestamp Player (Overview)
Timestamp Player is designed for discovery and not for exploration. Hence, it is intended for situations where a user wishes to find new and unknown music. There are three different ways in which a user can use the system, each of which can be viewed in terms of the user’s level of activity.
Passive Discovery
One way of using the system is by listening to music from the timestamps in tracks where a selected user or following (see footnote 28) has posted a comment. A following can be selected either by connecting to SoundCloud through the player or by typing the username of a SoundCloud user in the search field. When a user has been selected, each track that the given user has posted a comment on is retrieved and played back from the timestamp of the comment. Comments on tracks that were uploaded by the selected SoundCloud user are left out by the system. One reason for this is that the interest is expected to be in the selected user’s listening habits and not in his/her own tracks. Also, users tend to express gratitude in comments on their own tracks in response to the comments by other users (e.g. “thank you” or “appreciate it”), and often multiple times on the same track, thus resulting in repetition of tracks and playback from potentially irrelevant timestamps.
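The filtering rule for passive discovery can be sketched as follows. This is an illustrative Python fragment, not the player’s actual code: the function name `playback_points` and the field names `track_id`, `track_uploader` and `timestamp_ms` are hypothetical, and skipping comments without a timestamp is an assumption rather than something stated in the text.

```python
def playback_points(comments, selected_user):
    """Return (track_id, timestamp_ms) pairs for the selected user's comments on other users' tracks."""
    points = []
    for comment in comments:
        if comment["track_uploader"] == selected_user:
            continue  # leave out comments on the selected user's own tracks
        if comment["timestamp_ms"] is None:
            continue  # assumption: regular (non-timed) comments give no playback point
        points.append((comment["track_id"], comment["timestamp_ms"]))
    return points


# Hypothetical comments posted by the user "alice".
comments = [
    {"track_id": 1, "track_uploader": "alice", "timestamp_ms": 42000},
    {"track_id": 2, "track_uploader": "bob", "timestamp_ms": 15000},
    {"track_id": 3, "track_uploader": "carol", "timestamp_ms": None},
]

points = playback_points(comments, "alice")
print(points)
```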
28 On SoundCloud, a following is another user that you follow
Active & Passive Discovery
The second way of using the Timestamp Player is by searching for one or multiple tags. This way of using the system thus requires the user to have some idea as to the type of track that he/she desires. This might be a certain genre and/or mood. For instance, some rewarding search phrases might be “happy jazz” or “hard dubstep”. Even though this way of using the system requires the user to supply an input query, the system still automatically filters information (the comments of a track) for the user. It does this by attempting to locate the most timestamp-related (or music-related) comment using the classifier described in section 4.3.2.
Active Discovery
The last way of using the system is by searching for timed comments. More precisely, it is a search for the timed interpretations of the audio signal that are contained within timed comments. To successfully use the system in this way, the user needs to specify an event that might occur in tracks. For instance, it is possible to search for a “guitar solo” or a “breakdown”, which can potentially result in playback being initialized from a guitar solo or a breakdown, provided that the retrieved comment has been posted accurately by the commenter. To enable this type of search, the TF-IDF ranking described in section 4.4 has been implemented, which entails that only the tracks in the term frequency documents can be located in this way.
5.1.1. Visual Design
The Timestamp Player is separated into three parts: an upper part (containing track and artist information), a middle part (containing comment information), and a bottom part (containing a single button). The player has been equipped with a grey-striped and fuzzy (noisy) surface. The font colors have been limited to a white base color and an orange accent color, which is inherited from SoundCloud.
Figure 5.2: Timestamp Player (the actual player)
The upper part of the player shows information about the playing track including its artwork, title, artist, and the current playing time. The time has a prominent size to indicate that the time (and thus timestamps) is an important part of the application.
Figure 5.3: Timestamp Player (upper part)
The middle part of the player displays metadata about a comment and the comment itself. A “handwritten” font has been chosen to emphasize the fact that comments are subjective interpretations of the audio (contextual descriptors). The background of the comment area is dark grey and “noisy”, with a little inner shadow applied in order to imply depth. The comment text is centered and placed in the middle part of the design because it is considered the focal point of the player.
Figure 5.4: Timestamp Player (middle part)
At the bottom of the player we find the interactive element, which is the “keep listening” button. The button is a standard HTML <button> element that has been enlarged in order to make it clearly visible. The color of the button is kept white in order to make it stand out from the dark grey background as an interactive (clickable) element.
Figure 5.5: Timestamp Player (lower part)
The simplistic design of the player has been chosen with respect to the swift changes between tracks every eight seconds. If the interface displays too much information, it might become stressful to use the player. The user is meant to read the comment (and optionally its metadata), listen to the music, read the track title, and read the artist name. Based on these inputs, the user then makes a decision as to whether or not further listening is desired. These might be considered a lot of tasks to perform within eight seconds, especially if the comment contains a lot of words.
5.1.2. Interaction
The Timestamp Player (excluding the general interaction on the Web site) is very simple and allows for very little interaction. In fact, the only interactive element is the “keep listening” button, which is therefore an important part of the player. Due to this design decision, the Timestamp Player may be said to become highly accessible and also to have a very flat learning curve. As the name suggests, the “keep listening” button inverts the usual function in an audio player of “skipping” to the next track. Instead, the player (as mentioned) automatically skips to the next track after 8 seconds unless the user pushes this button. When pushed, the value (text) of the button is changed to “stop listening” and its functionality is reversed to that of a more normal skip button.
Figure 5.6: Keep Listening button
As was discovered in the data mining section, the average length of a timed comment is relatively short in comparison with regular comments on SoundCloud and comments in other social networks. Furthermore, most comments (98%) were actually shorter than a hundred characters. For this reason, 8 seconds seems like a fairly decent amount of time for managing to read the text of the comments.
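The inverted skip behavior described in this section can be summarized as a small state machine. The following Python sketch is only a model of that behavior and not the player’s actual (browser-based) code: the class name `TimestampPlayer`, the `tick` method that advances time, and the wrap-around at the end of the playlist are assumptions made for the example.

```python
class TimestampPlayer:
    """Model of the auto-skip and "keep listening" behavior (illustrative only)."""

    SKIP_AFTER_SECONDS = 8

    def __init__(self, playlist):
        self.playlist = playlist
        self.index = 0
        self.keep_listening = False
        self.elapsed = 0

    @property
    def button_label(self):
        return "stop listening" if self.keep_listening else "keep listening"

    def next_track(self):
        # Wrapping around at the end of the playlist is an assumption.
        self.index = (self.index + 1) % len(self.playlist)
        self.keep_listening = False
        self.elapsed = 0

    def press_button(self):
        if self.keep_listening:
            self.next_track()  # the button now acts like a normal skip button
        else:
            self.keep_listening = True  # postpone the automatic skip

    def tick(self, seconds=1):
        self.elapsed += seconds
        if not self.keep_listening and self.elapsed >= self.SKIP_AFTER_SECONDS:
            self.next_track()


player = TimestampPlayer(["track a", "track b", "track c"])
player.tick(8)                       # no button press: skips to the second track
player.press_button()                # "keep listening"
label_while_kept = player.button_label
player.tick(30)                      # stays on the second track
index_while_kept = player.index
player.press_button()                # "stop listening": skips to the third track
print(index_while_kept, label_while_kept, player.index, player.button_label)
```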
5.1.3. Related Work
Music recommendation is common and prevalent in many of today’s digital music services and shops [75]. The most common way of recommending new music to users is by artist and track similarity. Spotify (see footnote 29) has Artist Radio, iTunes has Genius, and popular music services such as Pandora (see footnote 30) and Last.fm (see footnote 31) even have artist and track recommendation as a focal point. Recommendation is also an active area of research in the Music Information Retrieval community. Here, some approaches are occupied with interlinking several music-related data sources from the Web to provide more relevant recommendations [76], while other approaches use other data such as lyrics [77]. In [78] an audio player called NextOne Player is presented. The player recommends tracks based on a user’s behavior. For instance, if the user skips a track, other tracks that have similar audio features will not be recommended in the future. The system MusicSense [79] instead takes a contextual music recommendation approach. Here, music is matched to the text in Web documents. Hence, music is selected based on the emotions that are expressed in a given document.
5.1.4. Discussion & Evaluation
Timestamp Player As A System
Timestamp Player is a clear example of a music discovery system, as it only really supports the discovery of new music. The user cannot really predict the outcome of a search in terms of tracks or artists, and neither is this intended. Hence, it is not possible to locate specific items based on a search query, and discovered tracks will change in conjunction with activity on SoundCloud. Moreover, the user’s control has been made very limited. As described, there is only one way of interacting with the system, which is using the “keep listening” button. Among other things, this forces the user to constantly move forward. For these reasons, the system does not openly call for immersion in and exploration of music but for discovery of new and unknown music. That is, when exploration is viewed in relation to music (here tracks) and not in relation to SoundCloud as a whole. It is instead a system focused on efficiency.
29 http://www.spotify.com
30 http://www.pandora.com
31 http://www.last.fm
Recommendation
When the system is used to discover tracks that another user or a following has commented on, this can be seen as an overly passive activity, but perhaps more importantly, there is no expressed (describable) information need of the user. The user is not searching for a genre, artist, or track but merely adopting the discoveries of another user, which makes the discovery passive from an intentional point of view. For these reasons, the system here works as a recommendation system, and it can furthermore be described as a social recommender, as the recommendation is based on preferences that were provided by the user’s friends (followings) in terms of timed comments. Moreover, the system does not need the user to provide additional information in order for the recommendation to take place. It is, however, debatable whether or not comments can actually be viewed as preferences. A comment might be posted to express appreciation of a piece of music, but it may just as well be posted to display aversion, to promote oneself, or with some completely different agenda. Also, the comments may not be posted for any sort of organizational purpose, which may be deemed different from a preference. But this is also what makes Timestamp Player seem novel to some extent. Sure, it may not be very interesting to “follow” the commenting of a user that posts blatantly, but it might be considered interesting to be exposed to critical comments about music in the same way as critical music reviews can be interesting to read.
Search Enabled Recommendation
When using the Timestamp Player to search for tags, the application serves as more of a hybrid system. It has characteristics of both a search engine and a recommender system.
The user must have a somewhat describable information need but the system is not designed for discovering the one best match for the query. As a recommendation system, the system can further be viewed as a hybrid recommender. The system compares comments based on their classifications, which can be seen as a type of content-based recommendation. To this end, the system is based on human evaluation of tracks (log-data in the form of timed comments) reinforced by machine analysis. It would therefore be somewhat misleading to call this social recommendation, as the comparison between comments is done independently of the user’s preferences and identity. The system analyses the content of comments, not the users who posted them. The filtering of tracks, however, is done using collaborative filtering. The tracks are gathered based on their tags in combination with the hotness (popularity) parameter provided by the SoundCloud API. A track’s hotness is determined by a number of factors, including the number of favoritings [80]. Hence, SoundCloud users determine the tracks that are presented to the user of Timestamp Player whereas the timestamp for each track is determined by classification by the system itself.
Search
As described, the system can also be used to explicitly search for timed comments. In this case, there is a negligible aspect of recommendation left. The system no longer filters information in an incoming stream of information: it simply retrieves information, thus turning it into a search engine. When performing a search for timed comments, the system is only as accurate as a search query allows it to be. It cannot be expected to retrieve parts that contain guitar solos if the query is only for “guitar” (though this might coincidentally happen). Hence, the user must have an explicit and describable information need in order to properly utilize the system to
its full potential in this scenario. The system still automatically constructs a playlist but this is simply a TF-IDF ranking of the tracks that match the query.
Utilizing Timed Comments
In continuation of the discussion about what constitutes timed content that was presented in section 4.3.2 there can be said to exist three overlapping categories of timed comments. As depicted in Table 5-1, these categories are bland, reflective, and specific. In general, bland timed comments seem to be the most common on SoundCloud. These are all the “hollow” praises, the links, and the self-promoting comments. The reflective timed comments may be music related but they can also simply be an aesthetic judgement or feeling obtained from the listening experience. Finally, we have the specific timed comments; the comments that would not make sense if they were posted as regular comments.
Bland: nice track / WOW!!! love this remix man / ahh yeah! / Please take a second to follow me on Facebook, I would appreciate your support! / dope
Reflective (Aesthetic): Sounds way better than singing into a plastic bag! / I absolutely love this track, but the mix is the worst. / Kinda reminds me of daft punk's Tron legacy solar sailor / Dam funky! / This is my new alarm, instead of snooze, I get this.
Specific (time-related): Awesome build up, super chill jam, so summery! Nice house rhythm on the xylo. these synth effects are scary good, dood. / the parts that you slowly filter in are awesome. and also leading up to 4:43. awesome. chorus status: excellent / Ugh, I want this snare so badddd. Great drum samples.
Table 5-1: Three main commenter types (examples from SoundCloud).
There is further a hierarchical relationship between these timed comment types and the three ways of using Timestamp Player, as either a pure recommendation system, a combined search and recommendation system, or a search engine (see Table 5-2). The way in which the system is used can be said to determine the types of comments that might be interesting.
Comment type | Recommendation | Search & Recommendation | Search
Bland | X | |
Reflective | X | X |
Specific | X | X | X
Table 5-2: Acceptable comment types based on usage
In terms of pure recommendation (following another person’s commenting history), all the comment types may be deemed acceptable. When following a user’s commenting history, we may be able to tolerate bland commenting if we share his/her musical preferences to some extent. For instance, we may think that a track is indeed a “nice track”, if we agree with the commenter but otherwise, the comment “nice track” is basically noise. When using search enabled recommendation for tracks with certain tags, bland comments are no longer interesting enough, as we no longer have any affiliation with, or insight into, the commenter’s music taste. Hence, it will be completely random whether or not we enjoy the same music that he/she does. The comments must here be either
reflective, time specific, or a combination of both. For instance, if a search is made for “happy jazz” and the returned recommendations display comments such as “nice track”, it seems unlikely that we will actually share that interpretation, as we know nothing about the commenter’s music taste. When using the system for pure search, we are virtually only interested in timed comments that are time specific. For instance, if a search is made for “guitar solo” and the results are simply bland comments such as “great track” or irrelevant reflective comments such as “dam funky”, it defeats the purpose of the search. We might then just as well search for tracks (tags) instead. Regardless of the way in which the system is used, we might also say that it challenges (or questions) the way in which users post comments. When disregarding music taste, the experience will arguably be more engaging if the commenter has taken timed commenting seriously. In relation to recommendation, the system might further be said to provoke the question: who is the better commenter? Or in a less competitive sense, we might also say that the system encourages timed commenting.
Audio Previews
Audio previews can be useful for music recommendation where the intention is to quickly determine whether or not further listening is desirable [37]. It is also a common technique for giving users a brief introduction to music that is often used in online music stores such as iTunes and Beatport in a “listen before you buy” fashion. In these cases, it may be considered imperative that the user gets to listen to an interesting part of a given song. For instance, a user might prefer to skip straight to the chorus of a pop song with this purpose in mind. Timestamp Player is specifically designed to facilitate this type of swift and effective discovery of new music. An innovative aspect of Timestamp Player is its ability to purposefully jump (or skip) to a certain position in a track and play a preview.
When the
Accuracy & Efficiency
Timestamp Player is somewhat unreliable and unpredictable in its current state. When using the system for pure search, it often yields what may seem to be arbitrary results. As mentioned, TF-IDF ranking is used for locating tracks. The system thus first locates a genre, then the highest ranked tracks in that genre (using all of the tracks’ comments), and then the most relevant comment for the highest ranked tracks. For this reason, some comments can be more relevant than others. Imagine that we searched for “guitar solo” and two tracks are located: track A and track B. Track A contains 100 comments but only 1 of the comments contains the words “guitar” and “solo”. In contrast, track B contains 100 comments of which 50 contain the word “guitar” and the other 50 contain the word “solo”. So which track is the most relevant? In this case, track B is chosen by the system, as it is most relevant as a whole even though track A has the most relevant comment. This is somewhat of an issue with the current ranking system and the same actually applies to the system’s distinguishing between genres. However, it would probably be inconvenient for the system to look through every single timed comment to find the most relevant ones.
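The track A versus track B example can be illustrated with a small sketch. It is not the Timestamp Player implementation: it deliberately replaces TF-IDF with plain term counts (with only two tracks and both terms present in each, the IDF weighting would not change which track wins), and the function names and the example comment texts are hypothetical.

```python
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return text.lower().split()

def track_score(comments, query_terms):
    """Score a track as a whole: total occurrences of the query terms
    across all of its comments."""
    counts = Counter(tok for comment in comments for tok in tokenize(comment))
    return sum(counts[term] for term in query_terms)

def best_comment_score(comments, query_terms):
    """Score the single most relevant comment: the largest number of
    distinct query terms found in any one comment."""
    wanted = set(query_terms)
    return max(len(wanted & set(tokenize(comment))) for comment in comments)

query = ["guitar", "solo"]
# Track A: 100 comments, only one of which mentions both words.
track_a = ["great guitar solo"] + ["nice track"] * 99
# Track B: 100 comments, 50 mention "guitar" and 50 mention "solo".
track_b = ["love the guitar"] * 50 + ["what a solo"] * 50

print(track_score(track_a, query), track_score(track_b, query))
print(best_comment_score(track_a, query), best_comment_score(track_b, query))
```

Under these assumptions, track B wins when tracks are ranked as a whole, while track A holds the single comment that matches the full query, which is the tension described above.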
5.2. Timed Comment Games Try the games at http://timedcommentgames.herokuapp.com In this section, I describe the process of transforming three classic video games into dynamic data games by incorporating timed comments. I then go on to discuss and evaluate the impact of the transformations in relation to formal and cultural aspects of the games.
5.2.1. Emulating Classic Games
I have constructed a suite of three games from the “golden age” of arcade video games, which were all originally created by the game company Atari32. The suite has conveniently been named Timed Comment Games. The suite consists of three games: Asteroids, Breakout, and Missile Command. To avoid confusion, the emulations will be referred to as SC Asteroids, SC Breakout, and SC Missile Command (SC here stands for SoundCloud). The immediate reason for emulating games was twofold: (1) the impact from incorporating timed comments may become more discernible in the light of (and comparison with) the original games and (2) the task of designing entirely new games could be limited. In terms of this second reason, the need for (and explanation of) operational rules in well-known games can be deemed less necessary, which allows the focus to be on the conversion of timed comments into game content. (I will thus spend very little time explaining the gameplay of the emulated games.) Furthermore, constitutive rules have largely been “borrowed” from the emulated games and made susceptible to timed comments through modification. Additional constitutive rules have then been added where found appropriate or necessary. The selected games were chosen, as they each have a distinct, characteristic, and simple gameplay. They were thus expected to emphasize and elucidate the possibilities of using timed comments for generating game content differently. Also, it was estimated that each game would, to some extent, be susceptible to dynamic data without its gameplay becoming corrupted or ruined. In striving for games that were susceptible to dynamic data, some games quickly became more suitable to emulate than others. For instance, a classic game like Tetris has a very open-ended level design allowing infinite play with no specific time limit. The game ends when the player can no longer manage to place the bricks properly.
This is a feature that is irreconcilable with audio tracks, as these always have exact durations. Another challenge with Tetris would be dealing with the time that passes between each block that is provided to the player. These are provided one at a time and not before the previous brick has been placed. In contrast, timed comments may be stacked on top of each other (when replying to a comment) and they may also be within milliseconds of each other. It would thus require modification of the comments in order to make Tetris compatible with the average SoundCloud track. Hence, it seems unlikely that meaningful data conversions can be generated in games like Tetris, where the gameplay and the generation of game content differ greatly from the “nature” of timed comments. Another substantial selection factor was a visual one. The three selected games all have simple graphics that can be created procedurally as opposed to manually. Some examples of popular games from the same era that would need manually created graphics (sprites) or otherwise complicated (or at least sizeable) drawing algorithms
32 www.atari.com
are Donkey Kong and Super Mario. Procedural generation allows for highly interchangeable graphics and can adapt to varying inputs whereas sprites cannot be easily altered without loss of quality (i.e. pixelation). Premade sprites would have to be stretched or skewed to adapt in size and the change of colors would be similarly intricate. Simple procedurally generated graphics and dynamic data thus seemed to go well together, as procedural graphics are somewhat more flexible/dynamic. Also, the workload of manually creating graphics was not considered necessary or worthwhile with respect to the goals of this thesis.
The Original Games
Figure 5.7: Emulated games (from left: Breakout, Asteroids, and Missile Command)
Asteroids (1979) is a space shooter where the player controls a spaceship in the middle of an asteroid field. The operational rules of the game are essentially: shoot and avoid asteroids. The spaceship can rotate left and right, fire shots forward and fly forward. When the player shoots an asteroid it is divided up into two smaller asteroids. However, the smallest asteroids in the game do not break into smaller pieces but simply disappear. Breakout (1976) is a game in which the player is equipped with a paddle and attempts to break bricks with a ball. The game begins with eight rows of bricks, with each two rows having a different color. The player can only move left and right and set the ball in motion at the beginning of a turn. Missile Command (1980) is perhaps the least famous of the three. In this game, the player aims with a crosshair and pushes one of three buttons in order to launch missiles. The purpose of the game is to defend against enemy missile attacks. The game highly emphasizes timing (and precision) as an essential part of its gameplay. Table 5-3 shows some basic differences between the games.
Aspect/Game | Asteroids | Breakout | Missile Command
“Enemies” (graphics) | Asteroids (polygons) | Bricks (squares) | Missiles (lines)
Movement | Free (all directions) | Left & right | Static/Free mouse movement (aim)
Game Type | Action (space-shooter) | Action (Ball & Paddle) | Defense
Player role | Attack/Defense | Destroy? | Defense
Setting | Space | Sports? | Land
“Enemies” (appearing) | Entering (from all sides) | Visible (from start) | Entering (from top)
Infinite levels | False | False | False
Enemy movement | True | False | True
Table 5-3: Various differences between the selected games for emulation
The Emulations
Several iterations of all three emulated games have been made in parallel with the process of data mining. New findings from the process of data mining have made way for new ideas in the games whereas new game related ideas have required more exploration of the data (timed comments). I will now present each of the emulated games in turn. These are merely descriptive walkthroughs of each game and not discussions or evaluations. It should be mentioned that all three games are
prototypes and thus all contain several bugs. For one, the games all crash if the player is killed or otherwise loses the game. It should also be pointed out that all of the emulations use tracks from SoundCloud for background music in the games. That is, the track that the timed comments belong to is played back while playing a level for that track. When presenting a game, I will only describe data conversions that are found specific or particularly interesting to that game. Additional conversions of relevance are depicted in tables.
5.2.2. SC Asteroids
In my emulation of Asteroids, timed comments are primarily converted into asteroids in the game. Various attributes of the asteroids such as shape, color, movement speed and position are determined by the comments. Table 5-4 shows some of the most significant data conversions for the game.
Figure 5.8: SC Asteroids (in-game)
SC Asteroids is the only game in the suite in which comments are used to determine the size and shape of the enemy objects. The number of characters in a comment determines the radius (in pixels) of an asteroid. However, a maximum of 100 characters was imposed on the game to prevent some tracks from becoming unplayable, as some comments might otherwise fill out the entire screen. Asteroids from comments that exceed 100 characters instead gain a shield, which makes them capable of enduring more shots from the player. The number 100 was chosen in the light of a finding from the data analysis process, which showed that 95.6% of all comments have fewer than 100 characters. Small asteroids are given a faster movement speed than large asteroids, again based on the character count in the comment.
Object Attribute | Determined by (comment attribute)
Asteroid Shape | Word count (one side per word in the comment)
Asteroid Size | Character count (radius, max 100px)
Asteroid Color (border) | POS-tags (JJS = green, WRB = blue, MD = red, CD = yellow)
Asteroid Health (life) | Character count (for each 100 characters, one health point is added)
Asteroid Start Coordinates | Post Date (angle from post date, relative to year → between 0 and 365)
Asteroid Movement Speed (x, y) | Character count (the less the faster)
Asteroid Movement angle (direction) | User id (user id normalized to degree between 0 and 360. Hence, an id of 460 = 100)
Asteroid Motivation Word | POS-tags (JJ (adjective))
Asteroid Power up (Item) | POS-tags (JJS = earth, WRB = water, MD = fire, CD = air)
Asteroid Children (splits) | Word count (A)
Asteroid Rotation | Post Date (based on hour of the day → 24 = fastest, 0 = slowest)
Background Stars | Character count (each comment of a track is turned into a star)
Table 5-4: Most significant data conversions in SC Asteroids
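A few of the conversions in Table 5-4 can be sketched as follows. This is a minimal illustration and not the code of SC Asteroids: the function name, the returned dictionary, the base health of 1, and the minimum of three sides (so that a one-word comment still yields a drawable polygon) are all assumptions made for the example.

```python
MAX_RADIUS = 100  # cap on the radius in pixels, as stated in the text

def asteroid_from_comment(body, user_id):
    """Map a timed comment to a few asteroid attributes (hypothetical helper)."""
    char_count = len(body)
    word_count = len(body.split())
    return {
        # One side per word; the minimum of 3 sides is an assumption.
        "sides": max(3, word_count),
        # Character count gives the radius, capped at 100 px.
        "radius": min(char_count, MAX_RADIUS),
        # One extra health point per 100 characters; base health of 1 is assumed.
        "health": 1 + char_count // 100,
        # User id normalized to a direction in degrees (460 -> 100).
        "angle": user_id % 360,
    }

print(asteroid_from_comment("nice track", 460))
```

The modulo operation reproduces the example in the table, where a user id of 460 corresponds to a movement angle of 100 degrees.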
The game deviates from the original game in a couple of areas. For one, four different power-ups have been added in SC Asteroids: fire, water, earth and air. These power-ups are inspired by the four elements (the classic elements) as depicted in Figure 5.9. If they are all gathered they can be combined into a fifth element: spirit33. It was first considered to convert comments that contain one of the words fire, water, earth or air into these power-ups but this would be highly unpredictable. These words have no natural connection to music and it seems illogical that they should occur at regular intervals. Therefore, some of the part-of-speech tags that were found to occur in appropriate amounts were used instead.
Figure 5.9: SC Asteroids power-ups
5.2.3. SC Breakout In my emulation of Breakout, timed comments are converted into bricks. Table 5-5 shows additional data conversions that are key to the game.
Figure 5.10: SC Breakout (in-game) (two different game modes)
Out of the three games, SC Breakout has the most focus on utilizing the textual content of the timed comments and the game also displays full comments when bricks are hit. The colors are determined through sentiment analysis using Musicmetric’s Sentiment Analysis (see Appendix A). Red bricks are negatively charged comments, green bricks are positively charged, and yellow bricks are neutrally charged. In addition, the more negatively charged comments (bricks) must be hit more times with the ball to be destroyed.
33 This feature is not implemented yet
Figure 5.11: SC Breakout level design system
The comments are not converted into bricks at the given timestamp of each comment, as this would completely deconstruct the constitutive rules of Breakout. Instead, the bricks are used to give an overview of all the comments at once. The bricks are ordered (positioned) in one of two ways, resulting in two different game modes. In the first way, the timestamps of a track are used to order the bricks. As shown in Figure 5.11, the bricks are placed from left to right based on the comments’ timestamps. If there are more than 30 comments within a minute, a line break is created, resulting in two consecutive rows. When all comments from the first minute have been converted into bricks, an empty line space is inserted, which illustrates the start of the next minute of the track. This process continues until all comments have been converted to bricks. Figure 5.12 shows how different tracks are converted into levels and how different the results can be.
Figure 5.12: Three different tracks
In the second way of ordering the bricks, they are sorted based on their colors (and thus their sentiment score) and placed from negative (at the top) to positive (at the bottom) (see Figure 5.10 on the right). This was done experimentally with the purpose of providing a simple visual expression that closely resembles that of the original and also to alter the gameplay. In this game mode, comments with more than 100 characters are turned into grey indestructible bricks.
Object Attribute | Determined by (comment attribute)
Brick Color | Sentiment analysis (negative = red, somewhat negative = orange, neutral = yellow, somewhat positive = light green, positive = dark green)
Brick Health | Sentiment analysis (negative has more health)
Brick Position (timestamp based) | Timestamp (from left (low) to right (high) → new line when more than 29 in a minute. 30)
Brick Position (sentiment based) | Sentiment analysis (ordered by sentiment from most negative at the top to most positive at the bottom. 30 per row. Empty space when new sentiment)
Comment Text, confidence, and score | Sentiment Analysis + Comment (full comments are shown when bricks are hit)
Power-ups | TF-IDF (top terms for the genre that matches the comments of the track)
Table 5-5: Most significant data conversions in SC Breakout
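The timestamp based level layout can be sketched roughly as below. This is an illustrative reading of the description and not the SC Breakout code: it assumes that timestamps are given in milliseconds, that a row holds at most 30 bricks, and that minutes without any comments are simply skipped. The function name and the row representation (an empty list standing in for the empty line space) are hypothetical.

```python
from itertools import groupby

BRICKS_PER_ROW = 30    # assumed row width
MS_PER_MINUTE = 60_000

def layout_rows(timestamps_ms):
    """Group comment timestamps into rows of bricks, minute by minute."""
    rows = []
    ordered = sorted(timestamps_ms)
    for _, group in groupby(ordered, key=lambda t: t // MS_PER_MINUTE):
        minute = list(group)
        if rows:
            rows.append([])  # empty row marks the start of a new minute
        # Wrap to a new row whenever a minute holds more bricks than fit in one row.
        for start in range(0, len(minute), BRICKS_PER_ROW):
            rows.append(minute[start:start + BRICKS_PER_ROW])
    return rows

# 35 comments in the first minute and one comment in the second minute.
example = [1000 * i for i in range(35)] + [61_000]
for row in layout_rows(example):
    print(len(row))
```

With this input, the first minute wraps into two consecutive rows (30 and 5 bricks), followed by an empty row and a single brick for the second minute.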
SC Breakout integrates TF-IDF ranking, as described in section 4.4, in order to determine the most likely genre of a track based on its comments. The 10 most important terms for that genre can then be used to trigger events in the game with the goal of taking into account some cultural aspects. As of right now, these 10 terms
are used to add the same power-up to the bricks but this feature has not been properly developed as of yet.
5.2.4. SC Missile Command
In SC Missile Command, timed comments are mainly converted into enemy missiles. Table 5-6 shows an overview of the most important data conversions.
Figure 5.13: SC Missile Command (in-game)
In the original Missile Command game, enemy missiles are sometimes split up into several missiles when approaching the ground. This is emulated in SC Missile Command. For every 8 words in a comment, the missile gets an extra “split missile” embedded. This is a vague abstraction of the average word count (8 words) discovered in the data mining section. The split-time (vertical position point) is calculated from the comment’s character count. Comments with more than 100 characters are converted into enemy planes (again inspired by findings from data mining), which also exist in the original game. For each 50 characters in a comment, the plane gets an extra missile onboard. The cities (the six structures that are to be defended) are generated procedurally from the characters and words in the comments.
Object Attribute | Determined by (comment attribute)
Cities Shape (pattern) | Word count and character count (one side per word in the comment)
Missile Splits | Word count (1 split for every 8 words)
Missile Launch | Timestamp (attacks from above)
Missile Direction | Date (day of the week excluding (Sunday attacks a launch tower))
Display Words Text | Timestamp (from left (low) to right (high) → new line when more than 29 in a
Table 5-6: Most significant data conversions in SC Missile Command
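The conversion of a comment into an enemy object can be sketched in a few lines. The sketch only covers the rules stated above (one split per 8 words, planes for comments over 100 characters, one extra missile per 50 characters); the function name and the dictionary layout are hypothetical and not taken from the SC Missile Command code.

```python
def enemy_from_comment(body):
    """Turn a comment body into a missile or a plane (hypothetical helper)."""
    char_count = len(body)
    word_count = len(body.split())
    if char_count > 100:
        # Long comments become planes carrying one extra missile per 50 characters.
        return {"kind": "plane", "extra_missiles": char_count // 50}
    # Other comments become missiles with one split for every 8 words.
    return {"kind": "missile", "splits": word_count // 8}

print(enemy_from_comment("nice track"))
print(enemy_from_comment(" ".join(["la"] * 16)))
print(enemy_from_comment("x" * 150))
```

Integer division is used here on the assumption that only complete groups of 8 words or 50 characters count.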
In the original Missile Command, a player has 10 missiles in each of the 3 launch bases. This is hardly practicable when the input data is dynamic. The duration of a track can vary as well as the number of comments, making some levels impossible to complete with only 30 missiles. Conversely, the game becomes very easy if an unlimited amount of missiles can be fired at an unlimited rate. Therefore, the missile fire-rate has been limited and each base starts out with 3 missiles (9 in total). Whenever a missile is fired, it takes 1 second before a new missile is ready for launch. This forces the player to think more carefully before firing off a stream of missiles.
SC Missile Command is the only game with an actual scoring system at this point. A player’s score from a fired missile is determined by three factors. (1) The precision of the shot (closer is better), (2) the flight distance of the missile (longer is better), and (3) the number of simultaneously destroyed enemy missiles (more is better).
5.2.5. Dynamic Difficulty
The most mind-boggling challenge of incorporating timed comments in games has probably been the challenge of keeping scoreboards, as these are closely tied to the difficulty of a game. Hence, if the amount of comments of a track changes, the difficulty of the game changes as well, which makes existing high scores less reliable and also inaccurate in the “new” (or changed) version of the game. The two obvious ways of handling changes to the state of a track would be to either change the existing high score or to change the scores given in the future. However, these methods are problematic, as we cannot really compare the experience of playing a level with 10 comments to the experience of playing a level with 100 comments. It would be somewhat superficial to consider the latter to be 10 times more difficult. The better solution instead seemed to be the creation of different “branches” for each level (track). In this way, new sub-levels can be created for each new comment that is posted (and potentially for each possible combination of comments). However, a track with 100 comments would thus have 100 branches (difficulties) each with their own high scores, which seems like an unnecessary amount of difficulties. For this reason, difficulties have been established using ranges of comment counts as shown in Table 5-7. Findings from the data mining and analysis chapter indicated that SoundCloud comments tend to be distributed somewhat evenly across tracks. This suggests that there is a good chance that the comments will not cluster up even when it is only the comments from a given time period that are selected. For this reason, difficulties have been established using ranges of comments from a specific time period (dates). A very easy track consists of the first 25 comments that have been added to a given track. A branch is thus a range between two dates relative to the track’s upload date.
Stable version | Very Easy | Easy | Normal | Hard | Very Hard | Insane
Comments (count) | >= 25 | >= 50 | >= 100 | >= 150 | >= 250 | >= 500
Table 5-7: The amount of comments required for a track to have stable versions
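The thresholds in Table 5-7 can be expressed as a small lookup. The following is a sketch under stated assumptions rather than the actual backend logic: comments are represented as (posted_at, body) tuples with any sortable posted_at value, and both function names are hypothetical.

```python
# Comment counts required for each stable version (from Table 5-7).
STABLE_VERSIONS = [
    ("Very Easy", 25),
    ("Easy", 50),
    ("Normal", 100),
    ("Hard", 150),
    ("Very Hard", 250),
    ("Insane", 500),
]

def available_versions(comment_count):
    """List the stable versions a track qualifies for."""
    return [name for name, required in STABLE_VERSIONS if comment_count >= required]

def comments_for_version(comments, version):
    """Return the earliest N comments that make up the given stable version.

    `comments` is a list of (posted_at, body) tuples (an assumed representation).
    """
    required = dict(STABLE_VERSIONS)[version]
    if len(comments) < required:
        raise ValueError(f"track needs at least {required} comments for {version}")
    # A branch always consists of the first comments posted, so sort by post date.
    return sorted(comments, key=lambda c: c[0])[:required]

print(available_versions(120))
```

Because a version is always built from the earliest comments, new comments posted later do not alter an existing branch, which is what keeps its high scores comparable.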
5.2.6. Front-end Design
The Web site surrounding the games is simultaneously a game menu and a music search engine. Like the games, the site is an unfinished prototype where not all its functionalities actually work. The site allows active discovery of music on SoundCloud by typing in an artist name in the search field. To this extent, the search functionality is inherited from SoundCloud. Any track on SoundCloud can be retrieved by actively searching. The site also allows passive discovery (recommendation) to some extent. As seen in Figure 5.14, tracks that have previously been played are displayed in a list with the hottest (most popular) tracks appearing at the top of the list. This happens automatically when a user visits the site. It is also possible to view the latest tracks by clicking the tab of the same name. In this case, the most recently played games (tracks) are displayed instead. Users can log in to the site using a feature of the SoundCloud API called “connect with SoundCloud”. By using this connection flow, existing SoundCloud users can
easily use the games without having to create a new profile on the site. It is not necessary to have an account to play the games but it is necessary in order to compete for high scores and save stats.
Figure 5.14: Game menu (Web site) for the Timed Comment Games
5.2.7. Backend Design
The backend design is an important part of the Timed Comment Games. It is “in” the backend that comments are analyzed and from where the play statistics and high scores of tracks are retrieved. The design of the backend is however not considered essential for discussing the games and it has therefore been moved to Appendix D.
5.2.8. Pseudo Level Editors An advantage of using comments on SoundCloud is that the process of creating game levels becomes automated and an integrated part of SoundCloud. To this end, the SoundCloud Player becomes a sort of pseudo level editor. By pseudo, I hereby mean that the player is in fact not a level editor but it can nevertheless be viewed and used as such. Figure 5.15 shows an example of commenting on a SoundCloud track with the intention of creating a level for SC Missile Command. The comments are added at strategic rhythmic points in sync with the music, which entails a synergy between the music, the comments, and the game that is unlikely to occur otherwise.
Figure 5.15: Pseudo level editor – Example 1
Players can misuse this link between SoundCloud and the games. For instance, players can make tracks difficult to play as games by posting a huge amount of comments, which may be a nuisance to the owner and the listeners of the track. It would thus seem logical to discourage this type of behavior. A simple way of doing this would be to limit the amount of comments used in the games to one per user (perhaps with the exception of the uploader of the track). Conversely, this can be seen as an unnecessary constraint of the data and players might also simply register multiple accounts at SoundCloud. In the above example, the track has been
commented by the uploader (me) and not by anyone else. This can be seen as a strength of the pseudo level editor. Such behavior might thus be encouraged by allowing SoundCloud users to provide dedicated game versions of their tracks by including a phrase such as “Timed Comment Games” in the title as seen in Figure 5.16.
Figure 5.16: Pseudo level editor – Example 2
5.2.9. Discussion & Evaluation
In the previous section, it was described how and to some extent why data has been converted into game content. I will now turn my attention to the meaningfulness of these conversions. When this meaningfulness has been discussed, I will proceed to frame the games as both Open Culture Games and Contextual Music Games.
Data Conversions
Recall that a dynamic data game is here taken to mean any game that incorporates external dynamic data, which are used to generate game content and/or influence its constitutive game rules. Timed Comment Games incorporate timed comments, which are dynamic data relative to a SoundCloud track. As previously described, these data are converted into objects in the games and have further been used to influence the games’ constitutive rules. Hence, Timed Comment Games are clearly Dynamic Data Games. However, the games are not dynamic in “real-time”, as is the case with the game Tweet Land that was presented in section 2.5.1. When a level has been initiated, all data has been accumulated by the system, whereas Tweet Land incorporates tweets dynamically as they are posted on Twitter. But the real-time dynamic gameplay in Tweet Land causes some problems for the game, as the dynamic nature of tweets (people tweeting) makes the gameplay (including the difficulty) inconsistent. It seems to be the data that control the game and not the game designers. This may result in players having an ambivalent relationship towards the game, which is also reflected in the mixed reviews the game has received [81]. It thus seems to be troublesome to give the dynamic data free rein, as opposed to constraining the data to some degree. In all three of the Timed Comment Games the data has been somewhat unnoticeably and discreetly reproduced in the games. By this I mean that the game objects do not appear as data attributes and do not draw attention to their origins.
Hence, the player is not reminded that the game is based on external data. This is somewhat opposite of a game such as TaxiCity (section 2.5.1), in which the map in the game is not a graphical abstraction of Bing Maps; the maps are transferred to and retained in the game. The same might be said for The Wiki Game in which entire Wikipedia pages are replicated in the game. These games arguably draw more attention to the external dynamic data than the Timed Comment Games. This is not to say that the one thing is better than the other but merely to point out that games might be designed to reflect data in various ways. The three Timed Comment Games can further be said to convey the data in very different ways. In relation to the comment-bodies, the asteroids in SC Asteroids arguably represent the content of timed comments more clearly than the missiles in SC Missile Command. Missiles only represent that there is a comment whereas
69
asteroids represent the amount of characters, words, and sometimes also the type of words (POS-tags). We might thus say that asteroids in the games developed here, convey the content of comments in a more discernable way than missiles. In SC Breakout the comments are all converted into bricks that are similar in shape and size but different in color. In relation to the shape, the content of the timed comment is less discernable than that of an asteroid. But in terms color, the bricks convey a simplification of an opinion contained within the comments through means of sentiment analysis. Hence, with knowledge of how the colors are determined, the bricks represent the meaning of timed comments. I do not consider this conversion more meaningful than that of timed comments into asteroids; it is only different. Hence, an asteroid represent a formal description of a comment attributes whereas a brick conveys the meaning of the comment. As opposed to the conversion of comment-bodies, the connection between the timestamp of a comment and an enemy object seems to be more perceptible in SC Missile Command than in SC Asteroids. Missiles all appear from the top of the screen at the exact point of timed comment whereas asteroids are created outside of the visible game canvas and then set in motion. The emergence (appearance) of objects is more important in the original game of Missile Command than it is in Asteroids, as the player needs to react as fast as possible to the entering of a new missile. In Asteroids, the player might potentially dodge an asteroid for an eternity whereas a missile will hit the ground within a fixed limit of time. Hence, timing can be seen as a more important aspect of Missile Command resulting in timestamps having a great importance in SC Missile Command. SC Breakout has an entirely different relationship to timestamp. In this case, there would be no major difference between using timed comments and regular comments. 
The timestamp is not really used in a natural way when it only serves to order the bricks at the beginning of each level. For this reason, I consider the utilization of the timestamp to be the least meaningful in SC Breakout.

Out of the three games, SC Missile Command seems to be the most playable when the comment count for a track is sizeable. This is partly because several enemy missiles can be destroyed simultaneously with one player missile. Hence, even if there were 100 comments at the same time in a track, it would still be possible to survive. But it is also partly because the missiles are visually very simple. They are just thin lines, 1 pixel wide, and it thus takes a lot of them to fill up the screen. In both SC Breakout and SC Asteroids, the screen quickly becomes visually crowded, which makes the experience of playing the games somewhat more overwhelming. As a Dynamic Data Game, SC Missile Command might thus be perceived as the most flexible of the three.

The Influence of Timed Comments

The original games have been changed in many ways. Some of these changes have been intentional, whereas others have been necessary or unavoidable. For instance, the implementation of a dynamic difficulty scheme was necessary, whereas the inclusion of power-ups was an intentional change influenced by the data mining and analysis results. We might say that Timed Comment Games change the way in which we think about elegant rules. In [38], elegant rules are said to ‘allow the player to focus on the experience of play rather than on the logic of the rules’ [38 p139]. This is somewhat different for Timed Comment Games and also for Dynamic Data Games at large. It is instead an innate part of playing the games to consider how the game content has come to be, and it is probably one of the main things that can make the games interesting from the player’s point of view. This becomes particularly clear when considering the difference between the original games and the emulated games. If the player is not aware of the way in which the content is created, the Timed Comment Games are not that much different from the originals. Hence, part of what makes the games interesting is imagining where the content comes from.

One of the most significant differences between the emulated games and the original games is the way in which the former provide highly varied gameplay, that is, the way in which timed comments change their otherwise repetitive play [38 p323]. Early digital games are commonly criticized for being too repetitive to play. As with the customized soundtracks, there are endless variations in the Timed Comment Games, which might be considered an improvement on the original games. Conversely, the incorporation of timed comments can lead to inconsistent play when the data are not constrained, as was discussed above in relation to the comment counts.

The game music is also extremely varied. As mentioned, the emulations use tracks from SoundCloud as background music in the levels. This is a very significant change, as the original games do not actually have soundtracks at all, and also because the music provides a sort of connection between the play and the game content. The music is a reminder of where the content comes from, and the duration of a level is tied to the duration of its associated track. The music is further imperative, as it can give players a motivation for playing the games beyond that of the original gameplay. Hence, players may be motivated by a gameplay need and a music information need simultaneously.

Even though there are countless variations of the game levels, the core game mechanics remain the same. There are no explicit changes to the game rules because of the dynamic data in the Timed Comment Games. It would, however, be interesting to attempt to incorporate data to change the more foundational rules of the games in the future.
5.2.10. Open Culture Games

With knowledge about how the game objects in Timed Comment Games are created, the objects become more than just abstractions of real-world objects. An asteroid in SC Asteroids is not merely a representation of a physical asteroid but also a representation of a timed comment from SoundCloud. In this way, the asteroids reflect their cultural context. But not only do the asteroids reflect cultural context, they are also highly dependent on it. Without SoundCloud and timed comments, SC Asteroids is just a space shuttle flying around in empty space, SC Breakout is just like playing a ball up against a wall, and SC Missile Command is just three missile launch bases with nothing to defend against. In these scenarios, there is no longer any conflict, defined by rules, that results in a quantifiable outcome, which contradicts our definition of what constitutes a game. Hence, if there is no cultural context, there are no games. It would seem as if Dynamic Data Games (and especially those that incorporate social data) are more dependent on their cultural context than regular games.

Timed Comment Games can thus be said to allow and encourage transformative play [38 p305] in a bidirectional sense. The Timed Comment Games have the ability to transform the way in which people post comments, which would inevitably transform the games themselves. A player might for instance figure out that the adjectives in the comments are used for motivational words in the games and choose to post swearwords on a track, thus influencing the game experience of others. In this case, the player’s actions have been transformed in the course of play (or influenced by the game), leading to the game itself being transformed. We might thus frame Timed Comment Games as open culture systems [38 p538] with the potential to foster truly emergent gameplay. This can also be said of games that depend strongly on player-generated game content, such as Spore34 or The Sims35.
But in these games, player-generated content is an intrinsic part of the games, making them seem more closed in comparison. The characteristics of Timed Comment Games instead seem more similar to those of player-created MODS36, which are not always intended by the game designers. MODS in turn require that the persons who create them have some coding and/or graphic design skills. When a person knowingly posts comments to affect the Timed Comment Games, this can be seen as a MOD to some extent, but without the usual requirements for skills and time.

In the light of this elucidation, we may frame Timed Comment Games as culturally emergent and open systems. The games are culturally emergent systems, as ‘the complex play produced by the game occurs not only on a formal or experiential level, but on a cultural level as well’ [38 p538]. They might even be considered emergent to a whole new extent by virtually embedding a cultural context. The games are also very much open culture systems, since the exchange of meaning between the games and their surrounding context has the potential to change and transform both the games and their environments.

There is, however, a problem with the definition of Games as Open Culture found in Rules of Play: Games as Open Culture imply a game design model in which the structure of a game offers players explicit creative agency [38 p539]. In the Timed Comment Games, players are not explicitly encouraged to modify the games. On the contrary, they are implicitly (and unknowingly) made level designers. I still find Open Culture to be a suitable way of framing Dynamic Data Games in general, with the caveat that the structure of the games offers the players implicit creative agency as opposed to explicit creative control.

We have now framed Timed Comment Games as Games as Open Culture, and we have furthermore established that they are extremely dependent on cultural context. But this is merely one intriguing way of framing the games. They might also be framed as Contextual Music Games.

34 http://www.spore.com
35 http://www.thesims.com
36 Modifications

Contextual Music Games

I consider Timed Comment Games to be a subtype of Digital Music Games, which might rightfully be dubbed Contextual Music Games. As was emphasized above, cultural context has an immense significance for the games, which makes the notion of Contextual Music Games seem appropriate. Moreover, the games incorporate what are often called contextual descriptors about music. Hence, there can be said to be a suitable association embedded in the term.

Recall the model of Digital Music Games that was presented in section 2.4. From this model, we might extract two key factors that can help us frame games as Digital Music Games: (1) the degree to which the music is interactive and (2) the way in which the game content is created.

In relation to the first factor, games based on timed comments might not be classified as music games. The games do not embed any direct interactions with the music, only with the contextual descriptors about the music. Also, the audio can arguably be muted without interfering with the core gameplay and without significantly affecting the player’s ability to play the game. On the other hand, this might also be considered true of other types of music games to various degrees. The music in a game such as Guitar Hero might similarly be muted without greatly affecting the player’s ability to play the game. The experience of playing Guitar Hero would arguably be very different without audio, but the game would not be unplayable. However, the audio side of Guitar Hero is important for the player’s ability to play the game to a much higher degree than in the Timed Comment Games developed in this research project, as the musically gifted player can use the rhythm of the music to time his/her actions more precisely in the game. In Timed Comment Games, the music serves strictly as background music and, to some extent, as a motivational and inspirational factor. The player cannot interact with, or be influenced by, the music any differently than in “regular” games. Therefore, the formal importance of listening to the music is not high in the Timed Comment Games, and the listening experience does not contribute to making the games music games. The timed comments do, however, serve as the connection (link) between the audio and the player; a connection that might become indistinct if the audio were to be removed. What is noticeable, however, is that this connection cannot be considered an invaluable and essential part of the gaming experience. For these reasons, Timed Comment Games do not contain any aspects of Interactive Music Games, nor are they similar to Music Creation Games.

In relation to the second factor, the game content is implicitly but still directly influenced by the music. For this reason, the Timed Comment Games are clearly Music Adaptive Games. As recently pointed out, there would not be any timed comments without the music and thus no game content. Hence, the games are dependent on the comments, which in turn are dependent on the music. We might also say that the games are dependent on the music listeners’ subjective interpretations and characterizations of an audio signal, which corresponds to our definition of contextual descriptors about music.
Hence, both the music listeners and the music itself influence the game content. Even though the games are Music Adaptive Games, the music itself might potentially also be made adaptive. For instance, filters might be applied to the music, or the music could be slowed down and sped up in order to emphasize game states and events. However, the games developed in this thesis do not contain adaptive music in this sense.

Music Discovery & Exploration

The Timed Comment Games first and foremost encourage the exploration of newly discovered or familiar music and its associated community metadata, and they can potentially provide a coherent gaming and music experience. While people may at times listen to music passively (as background music) or only listen to part of a track, the games encourage immersion in the music to a much higher degree. For instance, the mere attempt to obtain a decent score requires listening to as much of the track as possible. Failing (or dying) in the games results in disruption of the music. Making it all the way through a level is thus tied to listening all the way through the track.

There is, however, also an intrinsic discovery aspect in the games. As mentioned, the input data are used to determine the difficulty of a track. This means that a track with a single comment will be extremely easy, whereas tracks with 1000 comments will be nearly impossible. At first glance, this may seem like an issue, but it might also encourage discovery. Even though discovery is not central to the games, players will have to navigate the space of SoundCloud tracks to locate playable tracks. Music discovery can here be motivated by the search for a specific gameplay experience. Moreover, the displayed search results will be influenced by play statistics and will thus not necessarily be dominated by an artist’s otherwise popular tracks.
5.3. Instamash

Try it at http://instamash.herokuapp.com

Instamash is an experimental cloud-based sampler and sequencer. As the name suggests, Instamash lets users create mashups instantly and directly in the cloud by streaming tracks from SoundCloud. A new conceptual aspect of Instamash is that the mashups can only be saved as references to the original tracks37. Hence, the music never really leaves the cloud. Timed comments play an important role in the sampling process of Instamash, as they are used to automatically select samples (parts) of a track. These samples can then either be modified by the user or used immediately to construct new pieces of music (mashups). The application uses the Web Audio API for scheduling and manipulating sounds (see Appendix A).
Figure 5.17: Instamash (overview)
Currently, the only way to use Instamash is by setting two parameters: a BPM38 and a musical key (see Figure 5.18). These parameters are then used to locate two tracks on SoundCloud that are similar in key and BPM. This approach has been chosen to make it easy to get started, since findings from the data mining process indicated that very few tracks on SoundCloud are indexed with a BPM (3.6%). The BPM is imperative to Instamash, as it (in conjunction with the timestamp of a timed comment) is used to determine the exact points in the tracks from which a music sample is taken. Hence, without a BPM, the samples would invariably be “off” (out of sync).
Figure 5.18: Instamash - search fields and buttons
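As a rough illustration of why the BPM matters for sample positions, the following Python sketch aligns the timestamp of a timed comment to a beat grid. The exact rule used by Instamash is not stated in the text, so the choices here are assumptions: the first beat is taken to fall at 0 seconds, timestamps are rounded to the nearest beat, and a sample spans four beats by default. All function and parameter names are hypothetical.

```python
def beat_length(bpm: float) -> float:
    """Duration of one beat in seconds."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    return 60.0 / bpm


def snap_to_beat(timestamp_s: float, bpm: float, first_beat_s: float = 0.0) -> float:
    """Move a comment timestamp to the nearest beat (assumed rule, not from the thesis)."""
    step = beat_length(bpm)
    index = round((timestamp_s - first_beat_s) / step)
    # Never place a sample before the first beat of the track.
    return first_beat_s + max(index, 0) * step


def sample_window(timestamp_s: float, bpm: float, beats: int = 4,
                  first_beat_s: float = 0.0) -> tuple[float, float]:
    """Start and end (in seconds) of a sample that begins on a beat."""
    start = snap_to_beat(timestamp_s, bpm, first_beat_s)
    return start, start + beats * beat_length(bpm)


if __name__ == "__main__":
    # A comment posted 10.2 seconds into a 120 BPM track.
    print(sample_window(10.2, 120))
```

Without a known BPM there is no grid to align to, which is one way of reading the remark that samples would otherwise be out of sync.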
After the tracks have been gathered, their comments are analyzed using a combination of the Naïve Bayes classifier described in section 4.3.2 and the TF-IDF ranking system described in section 4.4. Through this analysis, the eight most relevant comments for each of the two tracks are selected and then made available to the user. That is, samples of the track are made at the timestamps of the selected comments.
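The selection step can be sketched in Python as follows. This is only an approximation under stated assumptions: the Naïve Bayes genre classification is left out entirely, each comment is treated as its own document, and a comment is scored by the sum of the TF-IDF weights of its words. The actual scoring from sections 4.3.2 and 4.4 may differ, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase a comment body and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def rank_comments(comments: list[tuple[int, str]], top_n: int = 8) -> list[tuple[int, str]]:
    """Return the top_n comments, given as (timestamp_ms, body), by TF-IDF score."""
    docs = [tokenize(body) for _, body in comments]
    n_docs = len(docs)
    # Document frequency: in how many comments does each word occur?
    doc_freq: Counter[str] = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))

    scored = []
    for (timestamp, body), tokens in zip(comments, docs):
        if not tokens:
            continue  # comments without any word tokens cannot be scored
        counts = Counter(tokens)
        score = sum(
            (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        )
        scored.append((score, timestamp, body))

    # Highest score first; earlier timestamps win ties.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(timestamp, body) for _, timestamp, body in scored[:top_n]]
```

The timestamps of the returned comments would then serve as the sample positions.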
37 This feature has not been implemented at this point in time
38 Beats Per Minute
5.3.1. Design

Instamash consists of three different parts, as seen in Figure 5.20: a library (1), a sequencer (2), and a sample editor (3). The library and the sequencer share the same design, in which samples are displayed as rounded square blocks containing the text of the timed comment that was the reason for their selection. Both are also equipped with a “+” symbol for creating new tabs, which can be used to hold additional tracks (for the library) or additional patterns (for the sequencer). The colors of the blocks are determined by the text of the comment. This is done by splitting a comment into words and then selecting the first letter of each word, as shown in Figure 5.19. Each letter (of the English alphabet) has in advance been assigned a number between 0 and 9, and the starting HEX color is initialized at #999999. For this reason, two samples rarely have the same color.
Figure 5.19: Words to colors conversion
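The words-to-colors conversion can be illustrated with a small Python sketch. The text specifies the starting color #999999 and that each letter maps to a number between 0 and 9, but not the letter table or how the numbers alter the color. Two assumptions are therefore made here: a letter maps to its alphabet index modulo 10, and the digit of the n-th word overwrites the n-th position of the six-digit HEX value, wrapping around for longer comments. The names are hypothetical.

```python
import string

START_COLOR = "999999"

# Assumed letter table: a -> 0, b -> 1, ..., j -> 9, k -> 0, and so on.
LETTER_DIGIT = {
    letter: str(index % 10)
    for index, letter in enumerate(string.ascii_lowercase)
}


def comment_to_color(comment: str) -> str:
    """Derive a HEX color from the first letter of each word in a comment."""
    digits = list(START_COLOR)
    position = 0
    for word in comment.lower().split():
        digit = LETTER_DIGIT.get(word[0])
        if digit is None:
            continue  # skip words that do not start with an English letter
        digits[position % len(digits)] = digit
        position += 1
    return "#" + "".join(digits)


if __name__ == "__main__":
    print(comment_to_color("love this track"))
```

Under these assumptions an empty comment keeps the starting color, and comments whose words begin with different letters usually end up with different colors.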
The initial length of the samples is also based on the word count. Comments with more words are given wider squares in order to display all their text and also because it seems plausible that longer comments will describe and address longer parts of a track more frequently. This is however only a temporary solution.
Figure 5.20: Instamash (complete overview of design and functionality)
The sample editor has a different design and is somewhat sketchy at this point in time. From the top, it consists of a drop-down menu (4), from which a filter can be chosen, and a slider (5) that can be used to adjust the cutoff frequency of the filter. Next are 10 “genre buttons” (6), which contain the 10 most frequent words for the genre of the track that the sample belongs to. These have been found using TF-IDF ranking of all timed comments of a track. Then there are the search field (7) and its accompanying search button (8) and display area (9). The search field and the search button are used to search for occurrences of a word in the timed comments of the associated track. If there are any matches, these are displayed in the display area. Finally, there are the waveform of the entire track (10) and a close-up waveform displaying the specific sample (11).
5.3.2. Interaction

The possible interactions of Instamash are not directly communicated to the user in the current prototype. It thus requires knowledge of the keyboard shortcuts to use the application to its full extent. In other words, the application is not very user friendly. Table 5-8 shows a complete list of all the actions that can be performed and the results (outcomes) of those actions. Due to the extensiveness of the list, I will not give a walkthrough of all the possible actions. Instead, some of the significant ones will be pointed out in the upcoming paragraphs.

Action | Outcome
Mouse click on a sample | A sample is selected
Mouse click on + symbol | A new (empty) pattern, to which samples can be dragged, is created
Mouse click on waveform in sample editor | Sample selection is moved to the point of the click (sample duration remains the same)
Mouse click on drop-down in sample editor | Enables the selection of a filter effect (lowpass, highpass, bandpass, lowshelf, highshelf, peaking, notch or allpass)
Mouse click on search field | Allows the entering of a search query (search for comments)
Mouse click on search button | Performs a search for comments that contain the word in the search field (in the “parent” track of the selected sample)
Mouse click on “genre buttons” | Performs a search for comments that contain the word of the button
Mouse drag of sample | Copies a sample from the library to a pattern (only works when dragging samples from the library to the sequencer)
Mouse drag of slider knob in sample editor | The frequency of the selected filter is changed
Keyboard Left | The sample is shortened (the length is divided by two)
Keyboard Right | The sample is extended (the length is multiplied by two)
Keyboard Alt + Left | Push (move) a sample left (by sample length)
Keyboard Alt + Right | Push a sample right (by sample length)
Keyboard Shift + Left | Push all samples of the source track left by 0.05 seconds
Keyboard Shift + Right | Push all samples of the source track right by 0.05 seconds
Keyboard Shift + Up | Increase volume
Keyboard Shift + Down | Decrease volume
Keyboard S + Up | Increase sidechain amount
Keyboard S + Down | Decrease sidechain amount
Keyboard Space | Play the samples in the selected pattern
Keyboard Return | Play the selected sample
Keyboard Cmd + Backspace | Remove sample (only applies to the sequencer)

Table 5-8: Complete list of actions that can be performed and their outcomes
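The length and position shortcuts in Table 5-8 amount to a few simple operations on a sample selection, sketched below in Python. The Sample type, the function names, and the clamping of positions at 0 seconds are assumptions made for illustration; only the factors (halving, doubling, moving by one sample length, and nudging by 0.05 seconds) are taken from the table.

```python
from dataclasses import dataclass, replace

NUDGE_S = 0.05  # Shift + Left/Right moves all samples of a track by this amount


@dataclass(frozen=True)
class Sample:
    track_id: int
    start_s: float
    length_s: float


def shorten(sample: Sample) -> Sample:
    """Keyboard Left: halve the sample length."""
    return replace(sample, length_s=sample.length_s / 2)


def extend(sample: Sample) -> Sample:
    """Keyboard Right: double the sample length."""
    return replace(sample, length_s=sample.length_s * 2)


def push(sample: Sample, direction: int) -> Sample:
    """Alt + Left (-1) or Alt + Right (+1): move by one sample length."""
    new_start = sample.start_s + direction * sample.length_s
    return replace(sample, start_s=max(0.0, new_start))


def nudge_track(samples: list[Sample], track_id: int, direction: int) -> list[Sample]:
    """Shift + Left (-1) or Shift + Right (+1): move every sample of one source track."""
    return [
        replace(s, start_s=max(0.0, s.start_s + direction * NUDGE_S))
        if s.track_id == track_id else s
        for s in samples
    ]
```

Because each operation returns a new Sample, a sequence of key presses can be applied one after another without modifying the samples held in the library.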
The Library & The Sequencer

Modern virtual samplers and sequencers are often packed with functionality, which gives them a steep learning curve and makes them suitable mostly for skilled music producers. One example is VSampler 3.5, shown in Figure 5.21. In contrast, Instamash is designed for (and intended to be used by) the average music consumer and listener (though particularly people with an interest in SoundCloud). For this reason, the number of musical instruments, knobs, and sound-technical terms has been limited to make the application more accessible. This especially applies to the library and the sequencer. As with the visual design, the library and the sequencer have the same basic interface design and functionality, whereas the sample editor has its own distinct interface and functionality. In terms of interactivity, the library and the sequencer are designed to be very straightforward and intuitive to use. The samples are envisioned as building blocks that can be combined in various ways without any knowledge of music composition. In the most basic way of using Instamash, the blocks can simply be dragged from the library to the sequencer in a desired order without any alteration of the samples. If more detailed control is desired, the user must turn to keyboard interactions (which will not be covered here) or to the sample editor.
Figure 5.21: VSampler 3.5 from MAZ Sound Tools
The Sample Editor

The sample editor provides some more complex functionality, but it is still intended for use by the average music explorer. As the name suggests, the sample editor provides means of altering a selected sample, where a novel feature is the ability to change the sample position through search. To this end, the sample position can be changed either by typing in a search word or by clicking one of the “genre buttons” mentioned in the design section above. When a search is performed, Instamash instantly moves the sample to the location of a timed comment that contains an instance of the query and automatically plays back the new selection. A query can further be performed for the same term multiple times, whereby the system continually locates the next comment that matches the query.

Another way of changing the sample position is by clicking the waveform of the track. The waveform image is gathered from SoundCloud, after which an HTML5 Canvas element is layered on top of the image in order to draw the sample selection and allow interactivity. As seen in Figure 5.22, the currently selected sample is highlighted with a blue line (square), and when hovering over the waveform, a red line follows the mouse cursor to suggest that the waveform is interactive (clickable). Clicking changes the sample position to the clicked position in the track and immediately plays back the new sample selection.

Figure 5.22: Sample Editor

Figure 5.22: Interactive waveform
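The repeated search described for the sample editor, where each new query for the same term moves on to the next matching comment, could look roughly like the Python sketch below. It assumes that comments are held as (timestamp, body) pairs sorted by timestamp, that matching is a case-insensitive substring test, and that the search wraps around to the first match after the last one. None of these details are confirmed by the text, and the function name is hypothetical.

```python
from typing import Optional


def next_match(comments: list[tuple[int, str]], query: str,
               after_index: int = -1) -> Optional[int]:
    """Return the index of the next comment containing the query, or None.

    comments holds (timestamp_ms, body) pairs sorted by timestamp.
    after_index is the index of the previously found comment (-1 for a fresh search).
    """
    needle = query.strip().lower()
    if not needle or not comments:
        return None
    total = len(comments)
    # Visit every comment once, starting just after the previous match.
    for step in range(1, total + 1):
        index = (after_index + step) % total
        if needle in comments[index][1].lower():
            return index
    return None
```

The sample position would then be set to the timestamp of the returned comment, and the returned index would be passed back in as after_index the next time the same term is searched for.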
5.3.3. Related Work

Digital audio workstations (often called DAWs) have been common in music production for decades. Some popular examples are Cubase, Logic, Reason, and Ableton Live, which all support sampler plug-ins (such as Battery and Kontakt). Lately, cloud-based music production environments have started to emerge, which seems like a natural evolution given the emergence of cloud computing. A few prominent examples are Soundation39, Audiotool40, and Studio One41. In this context, Studio One is a particularly interesting example, as in 2010 it announced an integration with SoundCloud that allows users to upload their audio directly to SoundCloud without having to “bounce down” the audio first [82]. A simpler but very relevant music tool is AudioJedit42. AudioJedit is an audio editor that allows users to slice and rearrange sounds from SoundCloud. The new arrangements can then be uploaded to SoundCloud. AudioJedit has not directly inspired Instamash (as a concept), but most of the back-end functionality has been adopted from AudioJedit in order to bypass some technical difficulties with CORS43 when using tracks from SoundCloud with the Web Audio API.

As mentioned, Instamash enables people to create mashups. A mashup combines information from two or more sources to create something new without changing the original sources of information [83]. The term mashup is often used in relation to software applications that combine data from two or more sources, but it is a term that has its origin in music [84]. In music, a mashup is the combination of two or more tracks to create a “new” piece of music. The kinds of mashups that can be made with Instamash are subsets of remixes, which are sometimes referred to as regressive mashups [85]. It is, however, also possible to make a somewhat more classic extended remix by using only one of the tracks supplied by Instamash. It should further be mentioned that Instamash does not currently provide the option of overlaying different audio tracks, which is a common approach to “mashupping”. This feature might, however, be implemented in the future.
39 http://soundation.com
40 http://audiotool.com
41 http://studioone.presonus.com
42 http://audiojedit.herokuapp.com
43 http://www.w3.org/TR/cors

5.3.4. Discussion & Evaluation

Instamash as a System

As described, it is possible to search for comments in the loaded tracks by clicking on the waveform, by “pushing” sample selections left and right (Alt key + Left or Right), or by typing a search word into the search field in the sample editor. These are three complementary ways of exploring the music tracks and locating samples in Instamash. When performing a search, the exploration of a track is enabled by the timed comments that are provided by the listeners. To use the system in this way requires a describable information need on the part of the user, such as “chorus”, “piano”, “happy” and the like. Such a need can be present among users in some situations, but they may also lack the vocabulary for describing a certain desired attribute of a music sample. This is what has earlier been described as the vocabulary problem.

In order to address and avert vocabulary problems, buttons containing the “top terms” for the estimated genre of the selected track have been added to the application, as previously described. The advantage of this is twofold. Firstly, the terms are extracted from the data (the timed comments) and can be self-contained within the system. Hence, there is no need for the designer of the system (me) to guess the words that people might use in searches. The system is not self-contained currently, but it would be a logical extension to let it update the top terms of the genres periodically. Secondly, the terms are probabilistically weighted. The system suggests to the user terms that are likely to occur in the timed comments of the estimated genre. Hence, if the user has no idea what he/she is looking for, these terms might be sufficient, as they are the ones that are normally used by people who listen to the genre of a given track. The collection of buttons might thus be seen as a small social recommendation system that is embedded into each track as a way of accommodating the users.

As mentioned, Instamash only allows mashing up tracks that have been labeled with a BPM and a key on SoundCloud. To make editing and sequencing easy for the user, the application is dependent on consistent BPMs, which are often found in electronic music or music that is recorded to a click (metronome). At the same time, the application is dependent on commented tracks to be able to select relevant and interesting samples. Hence, Instamash will probably be most useful for (and most likely to retrieve) electronic music tracks that contain a lot of timed comments.
This can be said to fit well with the findings from the data analysis section. The findings indicated that electronic music genres (e.g. dubstep or hiphop) on SoundCloud tend to receive more comments than “regular” genres (e.g. pop or rock). This suggests that electronic tracks (with BPMs) are likely to be commented tracks.

Sampling & Copyright

When constructing an application that allows users to create new pieces from existing pieces of music, it seems natural to discuss some emerging issues of ownership, both in terms of who owns the music and who is the creator of the music. In order to form a basis for such a discussion, I find it appropriate to give a very brief overview of the relation between sampling and music copyright. There are basically two types of music copyright: (1) musical works and (2) sound recordings [86 p2]. Simply put, sampling can be viewed as the act of taking part of an audio recording and using (reusing) it in another piece of music. Hence, sampling can be a copyright infringement of an audio recording. An important element of copyright infringement is unlawful appropriation. In order to prove unlawful appropriation, a plaintiff has to demonstrate that the use of his/her work by the defendant is substantial and material. If this use is not found to be substantial and material, the use is de minimis, meaning that it is too small to be considered material. However, the most common way of avoiding copyright infringement is the fair use exception. The use of original work for purposes such as criticism, comment, research and the like can often be considered fair use. The purpose of this fairly perfunctory review of music copyright is simply to point out and emphasize the potential advantages of cloud-based sampling.

Cloud-Based Sampling

Instamash explores an idea that I like to call sampling by reference. That is, the “samples” are references to online audio sources as opposed to copies of audio sources. It thus seems appropriate to refer to this activity as cloud-based sampling. The audio source is, however, buffered on the user’s computer, but this is no different from other cloud-based solutions. What makes a solution cloud-based is rather that the data are not stored on the user’s computer but in the cloud. For this reason, Instamash is not really different from other cloud-based DAWs in relation to the editing. The difference is conceptual, in that the results (compositions) are saved as references to the original audio source(s) as opposed to being saved as an actual audio file44. They are furthermore saved in the cloud and not on the user’s own computer.

A key advantage of the concept is the small amount of space required for saving a mashup. The only data that need to be saved are a textual representation of references to timestamps in the audio tracks used, which can be reduced to a few bytes. This idea would have seemed highly impractical a couple of years back, when music consumers only listened to music stored on their local hard drives, but the shift towards online streaming services could make the concept of Instamash a potentially useful way of creating and listening to music. Another advantage is that the structure of one creation can easily be translated into another creation. The references for a mashup can be saved as a sort of template that can be placed on top of various tracks.

Then there is the question of copyright. The mashups that can be created consist exclusively of copyrighted material, but when the sampling is done “by reference”, there can hardly be said to be a copyright infringement. The data are, as mentioned, never stored on the user’s computer. As described in the section on related work, music production is shifting towards cloud-based solutions, which is also the case for music listening in general (e.g. SoundCloud and Spotify).
It thus seems plausible that mashup environments such as Instamash could become commonplace in future music production, as music listening takes place in the cloud anyway. Hence, it might be neither necessary nor desirable for “mashuppers” to upload their music mashups to YouTube, SoundCloud or the like. To this extent, it also becomes interesting to consider ownership in general. When a mashup is made using Instamash, who is actually the composer? Instamash can be said to enable sampling with the help of SoundCloud users (timed comments). Hence, it is a sort of semi-automated sampler that allows people to compose music by using samples that have implicitly been selected by others. We thus have three authors: (1) the composer of the original piece, (2) the persons who have commented on the piece, and (3) the “mashupper”. The commenters become links between the mashupper and the original artist, resulting in a novel type of music mashup.

Discovery & Exploration
Instamash has an aspect of discovery, but it emphasizes and is designed to demonstrate ways of exploring tracks with the help of timed comments. The way in which tracks are selected (by specifying a BPM and a key) can, however, be seen as the discovery aspect. It is a serendipitous way of discovering music in which the user has no real way of knowing whether the returned music is classical music, dubstep, hiphop, rock, or some entirely different genre. When the tracks have been loaded, Instamash transitions into an exploratory application. Instamash lets the user play and interact with the timed comments. In other words, the application enables and encourages the user to explore the music tracks by navigating the comments. This exploration process is further situated in a creative music creation environment, thus inviting the user to become immersed in the exploration.
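The idea of sampling by reference described under Cloud-Based Sampling can be illustrated with a minimal sketch. It is not the Instamash implementation (saving is not implemented there); the object shape, the field names (trackId, startMs, endMs, positionBeat) and the numeric values are all hypothetical and only show that a mashup can be stored as text that points into online tracks instead of as audio.

```javascript
// Hypothetical shape of a mashup saved "by reference": no audio is stored,
// only track identifiers and timestamps (in milliseconds).
function createMashup(title, bpm) {
  return { title: title, bpm: bpm, clips: [] };
}

// Add a reference to a region of an online track.
// trackId and the millisecond offsets are illustrative values.
function addClip(mashup, trackId, startMs, endMs, positionBeat) {
  if (endMs <= startMs) {
    throw new RangeError("endMs must be greater than startMs");
  }
  mashup.clips.push({ trackId, startMs, endMs, positionBeat });
  return mashup;
}

// Reuse the structure of one mashup on other tracks (the "template" idea):
// keep timing and positions, swap the referenced tracks.
function applyTemplate(mashup, trackMap) {
  return {
    title: mashup.title,
    bpm: mashup.bpm,
    clips: mashup.clips.map(function (clip) {
      const replacement = trackMap[clip.trackId];
      return Object.assign({}, clip, {
        trackId: replacement !== undefined ? replacement : clip.trackId
      });
    })
  };
}

const mashup = createMashup("demo", 128);
addClip(mashup, 1001, 30000, 37500, 0);
addClip(mashup, 1002, 62000, 69500, 16);

// The whole composition is a short string of text.
const serialized = JSON.stringify(mashup);
console.log(serialized.length + " characters");
```

Calling applyTemplate(mashup, { 1001: 2001 }) would return a copy in which the first clip points at a different track while the timing is kept, which is one way to read the template idea mentioned above.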
44 This feature is not currently implemented.
CHAPTER 6
Discussion & Evaluation
In the previous chapter, insight was given into the experimental design and development of applications and games (digital artefacts) that incorporate timed comments. These experiments have each been discussed and evaluated individually, but at the close it seems appropriate to briefly discuss and evaluate their differences and commonalities.
6.1. Music Discovery & Exploration
As mentioned in the beginning of the last chapter, each of the three digital artefacts (experiments) can be said to emphasize and focus on discovery and exploration differently. In essence, all of the artefacts have an aspect of discovery, as it is possible to discover new music using them. The distinction can instead be made between their designed purpose and their novelty, as seen in Table 6-1.

              Timestamp Player              Timed Comment Games              Instamash
Discovery     X (Recommendation & Search)   X (Search for artist & tracks)   X (Search for BPM & key)
Exploration                                 X (implicitly)                   X (explicitly)

Table 6-1: Discovery & exploration aspects of the digital artefacts (X = Novel and essential, X = Not novel or essential)
Both Instamash and the Timed Comment Games emphasize exploration, but not in the same way. The Timed Comment Games might be said to enable a somewhat implicit and slightly passive form of exploration, as the games do not explicitly express the information contained within the timed comments. Nor is it the intention that the player should focus exclusively on the origin of the game content, as this might conflict with his/her ability to experience meaningful play. Put differently, the player has no explicit, real-time control of the data that are incorporated in the games. Conversely, a user of Instamash has explicit and real-time control of the samples and is also encouraged to utilize this control through explicit search. In this case, there is also a clearer connection between the interaction of the user and the displayed output in terms of an auditory response (a playback from the timestamp of the comment) and a visual response (a display of the associated comment). Timestamp Player is not designed for exploration of tracks, and neither does it emphasize or encourage exploration. It is designed to locate tracks that are new and interesting to a given user and thus for discovery.
In their analysis of the video streaming site Nico Nico Douga, Hamasaki et al. note that the overlay comments on the videos can give people a sense of sharing the viewing experience virtually [87 p2]. The same might thus be said for the listening experience on SoundCloud. This experience has further been inherited to various degrees by the three experiments presented in this research project and might potentially contribute to making the process of discovering and exploring music more enjoyable. A shared listening experience seems to be particularly noticeable and successful in the Timestamp Player.
By using a handwritten font and placing the comment body in the center of the application, the experience arguably becomes even more intense and intimate than it is on SoundCloud. It seems somewhat more like a personal recommendation that requires attention, than a random comment among tens or hundreds on a given track.
6.2. Complying with User Needs
A recent publication points out that ‘it seems relevant to explore whether relationships characteristics could be exploited to improve social filtering algorithms in music recommender systems that include social networking features’ [88 p1]. It thus seems fair to presume that there could be an interest in online music discovery systems that incorporate timed comments for search and recommendation. Since all the artefacts are designed for music exploration and discovery, it further seems relevant to discuss how they actually comply with people’s music information needs. In the MIR community, there have been various studies aimed at establishing an understanding of such needs.
In relation to recommendation, studies have shown that informal channels such as friends and acquaintances are the most common sources for triggering a music information search [67], [89]. The reason for this is found to be two-fold: (1) acquaintances know the taste of the music seeker and (2) the music seeker knows the taste of an acquaintance. The Timestamp Player utilizes the user’s familiarity with a following’s music taste. When Timestamp Player is used for recommendation, this might optimally be motivated by an understanding of another user’s musical preferences (taste) and perhaps even by an expectation about the level of sophistication in that user’s way of commenting. In other words, if a user is known or expected to have a compatible taste in music and his/her comments are musically relevant, timestamp related, or otherwise interesting, it might be enlightening to listen to the parts of the music where that following has posted comments. It has further been indicated that artists tend to form connections with artists of the same genre in online social networks [90], which makes it plausible that an artist on SoundCloud is likely to follow and comment on artists who operate in the same genre as the artist him/herself.
Hence, knowing the genre of a following’s own music may often be equivalent to knowing that following’s music taste, and may thus be the only thing necessary to give an indication of the kind of music that following comments on.
In relation to search and exploration, it seems a bit more difficult to evaluate whether or not the systems provide some needed information, as it is not really a common task to search for specific parts of an audio track. Hence, user needs on the subject have not previously been investigated. There are however some general findings that might be deemed relevant. For one, it has been indicated that the title of music works, the lyrics, and the artist information are among the most common things to search for online [67]. Searching for lyrics and artist information can be seen as an exploratory activity, which indicates that people are in fact interested in exploring music metadata. To this extent, the study also shows that people most often search for music for entertainment purposes. It thus seems somewhat likely that people will be open to enhanced entertainment through games and music creation.
CHAPTER 7
Conclusions & Future Work
In this final section, I present the core findings of this thesis and conclude that timed comments can most certainly be useful for various music discovery and exploration tasks. I then go on to outline some future perspectives for the applications and games that have been developed throughout this research project.
7.1. Conclusions
As emphasized in the chapter entitled research design, the main objective of this research project has been to explore possible ways of utilizing timed comments. That is, the goal has been to question rather than to settle and confirm their usefulness. For this reason, most of the “answers” provided in this section are not particularly conclusive. I will nonetheless attempt to provide some answers, and I will start by covering the sub-questions that were asked when this research project was initiated.

How are timed comments different from regular comments and from other listener-generated music metadata?
The answer to this question is somewhat ambiguous. When looking at the formal attributes of timed comments, there are definitely disparities between them and regular comments. A noticeable difference is the length of the text in timed comments, which is much shorter than that of regular comments. The reason for this difference is however not so clear. There might be various reasons why people write short statements; an educated guess, which was provided earlier, is that timed commenting somehow limits the number of words needed to express an opinion. A way in which timed comments differ from other listener-generated music metadata is that they hold somewhat hybrid properties. A timed comment can be said to consist of a contextual descriptor about music (the comment) as well as a content-based descriptor about music (the timestamp). This is a peculiar combination that extends the comment’s information capacity. This observation leads us to our next question:

What are timed comments capable of telling us about the music that they are related to and how can we use that information?
When posted with the intention of describing a particular timestamp, timed comments are human interpretations of specific parts of audio tracks.
For this reason, they have the ability to provide information about music that has earlier been obtainable only through machine analysis. The question is then rather whether people actually use timed comments in this manner. This is not always clear and can sometimes be hard to determine without either listening to a track or consulting the person who posted the comment. However, based on the supervised learning of a Naive Bayes classifier, which involved labeling 2500 timed comments as having either timed or non-timed content, it was indicated that approximately 13% of the comments were clear descriptions of a specific timestamp, as they would not make sense otherwise. Hence, however subjective, the labeling clearly indicated that the vast majority of timed comments are posted without using the full potential of timed commenting. As to the question of how we can use the information that timed comments provide about the music they are related to, there seem to be endless possibilities. The information might be utilized for music management, easy access, enjoyment, or similar Music Information Retrieval related tasks.
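To make the classification step above more concrete, the following is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing, written in plain JavaScript. It is a stand-in for illustration and not the Natural module used in this thesis: it does no stemming, and the eight training comments and their labels are made up for the example rather than taken from the 2500 labeled comments.

```javascript
// Minimal multinomial Naive Bayes with add-one (Laplace) smoothing.
// Illustrative stand-in only; not the Natural module, and no stemming is applied.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

function train(examples) {
  const model = { docCount: {}, wordCount: {}, totalWords: {}, vocab: new Set(), total: 0 };
  for (const { text, label } of examples) {
    model.docCount[label] = (model.docCount[label] || 0) + 1;
    model.wordCount[label] = model.wordCount[label] || {};
    model.totalWords[label] = model.totalWords[label] || 0;
    model.total += 1;
    for (const word of tokenize(text)) {
      model.wordCount[label][word] = (model.wordCount[label][word] || 0) + 1;
      model.totalWords[label] += 1;
      model.vocab.add(word);
    }
  }
  return model;
}

function classify(model, text) {
  let best = null;
  let bestScore = -Infinity;
  for (const label of Object.keys(model.docCount)) {
    // Log prior plus the sum of smoothed log likelihoods of each token.
    let score = Math.log(model.docCount[label] / model.total);
    for (const word of tokenize(text)) {
      const count = model.wordCount[label][word] || 0;
      score += Math.log((count + 1) / (model.totalWords[label] + model.vocab.size));
    }
    if (score > bestScore) {
      bestScore = score;
      best = label;
    }
  }
  return best;
}

// Made-up training comments; "timed" means the comment only makes sense
// at its timestamp, "non-timed" means it could be posted anywhere.
const examples = [
  { text: "love the drop here", label: "timed" },
  { text: "that bass at this point", label: "timed" },
  { text: "this part is sick", label: "timed" },
  { text: "the break right here", label: "timed" },
  { text: "great track", label: "non-timed" },
  { text: "nice work man", label: "non-timed" },
  { text: "check out my page", label: "non-timed" },
  { text: "awesome tune love it", label: "non-timed" }
];

const model = train(examples);
console.log(classify(model, "sick drop right here"));
console.log(classify(model, "great tune"));
```

With so few examples the result mostly reflects which words happen to occur under each label, which also hints at why the labeling of short comments is inherently subjective.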
What separates timed comments from each other relatively in terms of genres, artists, users, tracks and the like?
In general, there seem to be both formal and opinion-related differences between timed comments when framed in various ways. For instance, the findings presented here show a difference in the percentage of timed comments between genres. They also show that the words that are most frequently used tend to vary between genres, which becomes apparent when removing common words from the comments. To this extent, it has been argued that the words used in the genres are also the words that are normally associated with each of them (see section 4.4). Even though differences have been discovered in terms of genres, the above question has turned out to be less relevant than originally assumed. Plenty of work has gone into investigating timed comments in general, and thus irrespective of their relations to artists, users, and tracks.

Which attributes of timed comments can be used in the creation of engaging music exploration and discovery experiences online?
There are many ways in which this question might be answered. A simple answer is that all attributes of timed comments can be used. It really depends on whether or not they are used meaningfully. The question further seems to be somewhat attached to the main question. That is, the way in which we can utilize timed comments depends on how we use them and what we use them for. I thus return to the main question:

How can we utilize timed comments in order to provide new, enhanced or alternate ways of exploring and discovering music online?
In this research project, it has been demonstrated how timed comments can be utilized in order to provide novel ways of exploring and discovering music online. This demonstration has further led to the construction of some rather unique digital artefacts.
It has been shown how basic properties of timed comments can be utilized for creating Dynamic Data Games, and it has been shown how analysis of the embedded meaning in timed comments can be used to provide alternative music recommendations and new ways of searching “inside” audio tracks. To this extent, I have provided solutions that can help avert information anxiety in various ways. An application such as the Timestamp Player makes it easy and efficient for people to assess whether or not further listening to a music piece is required, and it also helps the user in limiting the available music. The application Instamash and the Timed Comment Games provide another solution to the problem, which is that of enjoyment. It has further been shown how the collective intelligence retained within timed comments might be utilized in order to provide ways of dealing with the vocabulary problem. As described, Instamash alleviates this problem by providing search suggestions based on the content of timed comments, thus assisting the user in selecting appropriate search terms when exploring the content of music tracks.
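One way to picture search suggestions derived from comment content is sketched below. This is an assumption about how such suggestions could be produced, not a description of how Instamash does it: the functions buildTermIndex and suggest, the stop-word set, and the sample comments are all hypothetical.

```javascript
// Count how often each word occurs across a set of comment bodies,
// skipping words in a (hypothetical) stop-word set.
function buildTermIndex(comments, stopWords) {
  const counts = new Map();
  for (const body of comments) {
    const words = body.toLowerCase().match(/[a-z']+/g) || [];
    for (const word of words) {
      if (stopWords.has(word)) continue;
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }
  return counts;
}

// Suggest the most frequent indexed terms that start with what the user typed.
// Ties are broken alphabetically so the output order is stable.
function suggest(index, prefix, limit) {
  const needle = prefix.toLowerCase();
  return Array.from(index.entries())
    .filter(([term]) => term.startsWith(needle))
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit)
    .map(([term]) => term);
}

// Made-up comment bodies for one track.
const sampleComments = [
  "Love the bass here",
  "that bassline though",
  "bass drop!",
  "beautiful break"
];
const termIndex = buildTermIndex(sampleComments, new Set(["the", "that", "here"]));

console.log(suggest(termIndex, "ba", 5));
```

Because the suggestions come from words that commenters actually used, a user who types only a prefix is steered towards terms that are known to match something in the track.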
7.2. Future Work
It would be a logical next step to perform some sort of usability testing of the artefacts. The focus in this thesis has been confined to the practical investigation and utilization of timed comments. For this reason, very few people have actually played the games and tried out the applications. It would first of all seem relevant to explore whether or not people understand the connection to SoundCloud and whether they find the artefacts entertaining or useful to some degree. Besides testing the
applications and games, there are also various interesting prospects that could be explored.

Combining Descriptors
As mentioned in 2.1.4, there is a consensus within the field of Music Information Retrieval that it can be beneficial to combine contextual and content-based descriptors. I consider this to be a very interesting prospect (and a next step) for applications that incorporate timed comments, and it seems likely that timed comments in combination with content-based descriptors might result in more viable and complete applications and games. Content-based music games might use timed comments to influence game content that is not related to gameplay. Some examples of this are background graphics and “motivational words”, both of which have been demonstrated in the Timed Comment Games. This could allow a stable gameplay (based on the audio signal) but with an ever-changing visual side (based on timed comments). Content-based descriptors might also be advantageously incorporated in the Timestamp Player. For instance, signal analysis could be used to indicate whether or not there is a correlation between an expressed opinion and the actual music. Similarly, instrument recognition could be applied where comments mention an instrument in order to indicate whether or not the commenter has commented with accuracy.
One particular challenge that the experiments face is the long tail, which was described in section 2.2.1. All of the experiments favor tracks that are heavily commented. For instance, it is not easy to discover new or rare tracks (from the long tail) using Instamash or the Timestamp Player. These systems both filter information based on popularity. Instamash uses the hotness parameter on SoundCloud when locating tracks in order to ensure commented tracks, and the same goes for the Timestamp Player when searching for tags.
Conversely, there is little sense in retrieving tracks that do not contain any timed comments in these applications, as they depend on them. For this reason, it seems that applications like these may not be suitable for discovering brand new tracks. However, popular tracks on SoundCloud might be considered new in a larger scope, making the problem somewhat less severe.

Creative Music Marketing
The Timed Comment Games and Instamash have some interesting potential in relation to online music marketing. They both provide easy ways for the average musician or indie label to engage an online fan base. In the case of Instamash, musicians could launch mashup (or remix) competitions to the convenience of both themselves and their fans. It might be considered convenient for musicians if they could escape the need for making actual remix/mashup parts. Instead, the creation of parts could be as easy as posting some comments on a track, in the same way as custom levels might be created in the Timed Comment Games (see section 5.2.8). Hence, musicians could use timed comments to highlight the samples that should be provided in Instamash. Some musicians might also consider it convenient that they do not need to make their music freely available. As explained, the mashups are made in the cloud and are not supposed to be downloadable. Instamash might thus enable new and alternative possibilities of marketing music online for any artist that has music on SoundCloud. An artist might similarly use the Timed Comment Games for promoting new music. It seems plausible that music fans would enjoy competing to become the “number one fan” (e.g. the fan with the highest score for a given track). Similarly to Instamash, the Timed Comment Games limit the amount of work that an artist would need to
perform in order to offer some alternative experiences to his/her fans, as levels can arise automatically when people comment on a track or otherwise be easily designed by the artist if he/she uses the audio player on SoundCloud as a pseudo level generator.

Improvements
The digital artefacts presented here are still in the early stages. They have only been developed to the point where they demonstrate the potential of an idea. For this reason, there is plenty of room for improvement. For one, the Timed Comment Games are envisioned as multiplayer games, as music is considered to be a social thing, as are games. People would arguably enjoy playing together. It would also seem natural to further develop the game menu in order to situate the play experience and establish a social community. For instance, player profiles, chat functionality, and similar utilities might be necessary to explore the full potential of games based on timed comments. Instamash is fairly well defined as a concept, but it still lacks some functional aspects in order to work properly. For one, it is not yet possible to save a mashup, which is a cornerstone of the concept. It also seems imperative to provide some clear instructions on how to use Instamash. I consider Timestamp Player to be the most functional of the three artefacts, but it is also the simplest of them. Still, there are various ways in which it could be improved. For instance, it might be interesting and simple to provide automatic recommendations based on all the followings of a user by selecting and mixing the newest comments from all of them. Both Instamash and the Timestamp Player would further benefit from higher-quality classification and search algorithms.
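The suggested improvement of mixing the newest comments from all followings could look roughly like the sketch below. It is only one possible reading of that idea: the input shape, the field names (trackId, timestampMs, body, createdAt), the user names and the dates are hypothetical and are not taken from the SoundCloud API.

```javascript
// commentsByFollowing maps a following's name to that user's timed comments.
// Each comment is assumed to carry an ISO 8601 createdAt string.
function mixNewestComments(commentsByFollowing, perFollowing, limit) {
  const picked = [];
  for (const user of Object.keys(commentsByFollowing)) {
    const newest = commentsByFollowing[user]
      .slice() // copy so the caller's array is not reordered
      .sort((a, b) => Date.parse(b.createdAt) - Date.parse(a.createdAt))
      .slice(0, perFollowing);
    for (const comment of newest) {
      picked.push(Object.assign({ user: user }, comment));
    }
  }
  // Interleave the followings by recency and cap the playlist length.
  picked.sort((a, b) => Date.parse(b.createdAt) - Date.parse(a.createdAt));
  return picked.slice(0, limit);
}

// Made-up data for two followings.
const followingComments = {
  anna: [
    { trackId: 1, timestampMs: 42000, body: "great break", createdAt: "2012-05-01T10:00:00Z" },
    { trackId: 2, timestampMs: 5000, body: "nice intro", createdAt: "2012-05-03T09:00:00Z" },
    { trackId: 3, timestampMs: 90000, body: "love this part", createdAt: "2012-04-20T08:00:00Z" }
  ],
  ben: [
    { trackId: 4, timestampMs: 61000, body: "that drop", createdAt: "2012-05-02T12:00:00Z" }
  ]
};

const playlist = mixNewestComments(followingComments, 2, 3);
console.log(playlist.map((c) => c.user + ":" + c.trackId).join(" "));
```

Limiting the number of comments per following keeps a single very active user from dominating the recommendations, which is a design choice of this sketch and not something the thesis prescribes.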
Appendix A
USED TECHNOLOGY
In this appendix, I briefly present the most important technologies that have been used to develop and deploy the Web applications and games in this thesis. I will not provide detailed explanations of how the technologies work but simply document (and give an insight into) some practical and technical design and development aspects of this thesis.

HTML5, CSS3 & JavaScript
HTML5 (Hyper Text Markup Language) is the latest HTML standard and defines the fifth major revision of the core language of the World Wide Web45. HTML5 is not yet an official standard and is not yet fully supported by any browser46. Furthermore, browsers currently tend to treat HTML5 elements differently, which can be problematic when attempting to develop Web applications that perform and display alike across all browsers. The applications and games developed in this thesis have only been intended for use in Google Chrome at this point and will not work properly in other browsers. Some of the HTML5 related elements and APIs that have been important in this thesis are the canvas element (for 2D drawing), the WebSocket API, the audio element, and the WebAudio API. Many of the capabilities that these provide have previously only been possible with the use of plug-ins such as Adobe Flash. Alongside HTML5 comes the CSS3 (Cascading Style Sheets) standard. CSS is used to control the style and layout of Web pages47. CSS3 provides easier control over the design of web pages and also some new possibilities for browser-based animation that have previously only been possible using JavaScript. I have not been heavily concerned with CSS3 but have experimented with some of its features at times. JavaScript is the scripting language of the Web, and it is used to add functionality to Web pages48. A popular and commonly used JavaScript library is jQuery49, which presents itself as “the write less, do more” JavaScript library.
I have occasionally used jQuery, but in general I have avoided it, as my ambition (as mentioned) has been to learn “actual” JavaScript. Some of my code (e.g. menus) could thus have been written more efficiently and easily if I had used jQuery instead of pure JavaScript. From a personal standpoint, I have been interested in exploring the capabilities of HTML5 and JavaScript for building web-based applications and also, simply, in developing skills in JavaScript. For this reason, I made a decision to strictly use JavaScript and JavaScript-based platforms.

Canvas
The <canvas> element in HTML5 is used to draw graphics on a web page50. It might for instance be used to draw lines, boxes, circles, and characters, or to add images.
45 http://dev.w3.org/html5/html-design-principles/
46 http://www.w3schools.com/html5/html5_intro.asp
47 http://www.w3schools.com/css3/default.asp
48 http://www.w3schools.com/js/default.asp
49 http://jquery.com/
50 http://www.w3schools.com/html5/html5_canvas.asp
Shapes can furthermore be filled with colors, have transparencies, be scaled, and be manipulated in many other ways that are usually associated with graphic design software. It is also possible to erase and redraw all or a portion of the canvas, which makes it possible to create animations by drawing and redrawing elements in various positions. The <canvas> element is only a container for graphics. It is necessary to use JavaScript in order to draw the actual graphics51. The <canvas> element has been used extensively in the development of my applications and games. The most detailed use, however, has been in the drawing of actual game graphics.

The WebAudio API
The WebAudio API is a (currently) new API with little documentation. It is a high-level JavaScript API for processing and synthesizing audio in web applications52. A key feature of the API is the ability to schedule sound playback. This has not previously been possible directly in a browser. With this comes the possibility of creating music software such as sequencers and drum machines that require a high degree of rhythmic precision53. Other features include modular routing (enabling mixing with multiple sends and submixes) and various possibilities for modulating sounds such as panning, filtering, and room effects (e.g. reverb)54. The WebAudio API has only been used in the development of Instamash. It might also (favorably) have been used in the other applications, but it was not until late in my process that I discovered a way of using it with audio from SoundCloud. Instamash is however the only concept that is highly dependent on the WebAudio API and its tightly scheduled sound playback capabilities.

Music Metrics
I make use of Music Metric’s Sentiment Analysis55 v1.0 beta in an initial experiment with visualizing timed comments and their sentiments and in the game SC Breakout.
Music Metrics claims that its sentiment analyzer uses much more advanced methods than simple word detection and naïve methods56. To the best of my knowledge, it is not publicly known how the sentiment analyzer works, but this description suggests that it does not rely on naive Bayes classification.

Google Chart Tools
Google Chart Tools57 have been used to visualize data in chapter 4 and also to visualize game levels in the menu of the Timed Comment Games.

Node.js
Node.js58 (also called Node) is an emerging platform powered by Google’s JavaScript engine, V8,59 that enables server-side applications written in JavaScript. Node.js uses an event-driven, “non-blocking” (asynchronous) I/O model60. “Non-blocking” basically means that clients will not have to wait “in line”. Instead, Node.js is capable of
51 http://www.w3schools.com/html5/html5_canvas.asp
52 https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification.html
53 https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification.html#Features
54 https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification.html#Features
55 http://www.musicmetric.com/2010/01/musicmetrics-sentiment-analysis-v1-0-beta/
56 http://www.musicmetric.com/2010/01/musicmetrics-sentiment-analysis-v1-0-beta/
57 https://developers.google.com/chart/
58 http://nodejs.org/
59 http://code.google.com/p/v8/
60 http://nodejs.org/
handling multiple connections concurrently, which makes it very fast in combination with the V8 engine. I have furthermore used Express61, which is a high-level web framework for Node, and the Jade62 template engine. In this thesis I use a Node module called Natural63. Natural is still in the early stages (currently version 0.0.17), but it has a Naive Bayes classifier, which by default tokenizes the corpus and stems it with a Lancaster stemmer64. Stemming can increase the computational speed of the classifier, as fewer word variations will be stored, while over-stemming on the other hand can lead to worse performance. I use the Naive Bayes classifier in an attempt to train a classifier to differentiate between timed and non-timed comments. I also use a part-of-speech tagging module called pos-js65, which is based on Eric Brill’s trained rule set and English lexicon. Pos-js contains 45 tag categories, a lexer (which basically splits sequences of characters into tokens) and a JavaScript version of Eric Brill’s English lexicon66. Node.js has been used in all the applications and games developed in this thesis to different extents.

WebSockets & Socket.IO
The web has previously been built around the so-called request/response paradigm, where the client loads a web page and nothing happens until the user clicks onto the next page67. WebSockets is a part of the HTML5 “toolset” that enables Web applications to maintain bidirectional communication with server-side processes68. What is key to note on a high (conceptual) level is simply that WebSockets make it easy to establish a constant communication flow between client and server. WebSockets are thus great for highly interactive and real-time Web applications that may have a lot of clients online and a lot of communication between the clients and the server. Examples of real-time applications are online multiplayer games and voice chat.
Socket.IO69 abstracts away WebSockets and provides simplified communication between the clients and the server using only JavaScript. Socket.IO will use WebSockets if available and otherwise fall back to other techniques for communication. Socket.IO can be used in combination with Node.js (and Express), where it provides effortless communication between Node.js (the server) and the client (the browser).

MongoDB
MongoDB is a scalable, high-performance, document-oriented NoSQL database70 and, like the other technologies used in this thesis, it can be worked with using JavaScript. The data is stored as JSON (JavaScript Object Notation) style documents in a binary format called BSON71. JSON is an easy to read and commonly used data-interchange format that is supported by
61 http://expressjs.com/
62 https://github.com/visionmedia/jade
63 https://github.com/NaturalNode/natural
64 http://www.comp.lancs.ac.uk/computing/research/stemming/index.htm
65 https://github.com/fortnightlabs/pos-js
66 https://raw.github.com/fortnightlabs/pos-js/master/lexicon.js
67 http://www.html5rocks.com/en/tutorials/websockets/basics/
68 http://dev.w3.org/html5/websockets/
69 http://socket.io/
70 http://www.mongodb.org/
71 http://bsonspec.org/
various programming languages. I have already mentioned the usage of MongoDB in the data mining section, where it was used to store a data sample from SoundCloud. In relation to the developed concepts, MongoDB is used to store high score tables and play statistics of the games, and it has been used in combination with Mongoose72 to design appropriate schemas73 (models) for the data.

Heroku
Heroku74 is a cloud application platform that has been used to deploy the developed applications and games. The platform provides free deployment for small projects, which may be scaled up if necessary. The primary means of deploying apps to Heroku is Git75. Heroku started supporting Node.js in the summer of 2011, and add-ons can be acquired for MongoDB (the possible add-ons are currently MongoHQ or MongoLab).
72 http://mongoosejs.com/
73 http://www.mongodb.org/display/DOCS/Schema+Design
74 http://www.heroku.com/
75 https://devcenter.heroku.com/articles/git
Appendix B
TF-IDF
Table 7-1: Top 10 terms using only the standard stop-words list in Natural
Figure 9.1: Standard stop-words list in Natural
Figure 9.2: Harsh stop-words list
Appendix C
INITIAL EXPERIMENTS
Basic Statistics
The first prototype I created was a very simple program for studying timed comments in various contexts. The program works by summarizing word counts from up to 200 tracks at a time using a predetermined list of words (see Figure 10.1). Using the program, it is possible to extract 200 tracks from a genre, all the tracks of an artist, or individual tracks from SoundCloud. The result is then visualized in a browser window with a pie chart from Google Chart Tools76 and by printing out the counts. The output of the prototype indicated that there were definitely differences in word usage, as can be seen, for example, in Figure 10.2.
Figure 10.1: Predetermined list of words that the program counts
Figure 10.2: Word frequencies in all the tracks of three different artists on SoundCloud
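The counting step of the Basic Statistics prototype can be sketched as follows. This is a reconstruction under assumptions rather than the original program: the function name countWords, the shape of the track objects and the sample data are hypothetical, and fetching the tracks from SoundCloud and drawing the pie chart are left out.

```javascript
// Count occurrences of a predetermined list of words in the comments
// of at most 200 tracks (the cap mentioned for the prototype).
function countWords(tracks, wordList) {
  const counts = {};
  for (const word of wordList) {
    counts[word.toLowerCase()] = 0;
  }
  for (const track of tracks.slice(0, 200)) {
    for (const comment of track.comments) {
      const tokens = comment.toLowerCase().match(/[a-z']+/g) || [];
      for (const token of tokens) {
        // Only words from the predetermined list are counted.
        if (Object.prototype.hasOwnProperty.call(counts, token)) {
          counts[token] += 1;
        }
      }
    }
  }
  return counts;
}

// Made-up tracks and comments.
const sampleTracks = [
  { title: "a", comments: ["Love this", "sick bass, love it"] },
  { title: "b", comments: ["BASS!"] }
];

const wordCounts = countWords(sampleTracks, ["love", "sick", "bass"]);
console.log(wordCounts);
```

The resulting object maps each listed word to its count, which is the kind of data a pie chart of word frequencies could be drawn from.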
Box2D Visualizer
Figure 10.3: Box2D Visualizer
The second experiment I constructed was a basic visualization of timed comments using a JavaScript port of Box2D⁷⁷. The program works by playing a track from SoundCloud and creating a circle object each time a timed comment appears. This circle object is dropped from the top of the screen and bounces around using the physics engine in Box2D. The number of characters in a comment determines the size of its circle. To visualize the comments at the point of their timestamp, I constructed a simple loop that tests whether the current time of the audio is equivalent to the timestamp of the next comment. This same loop has been used for the cloud music games developed in this thesis, and it also gave rise to the idea of making games that utilize timed comments.

76 https://developers.google.com/chart/
77 http://code.google.com/p/box2dweb/

Comments Visualizer

The Box2D visualizer fueled the idea for a more detailed and appealing visualization that incorporates sentiment analysis from Musicmetric⁷⁸. While experimenting with the Musicmetric API, I stumbled upon a similar idea by Eduard Prats Molner, developed for Music Hack Day, that uses the sentiment analyzer to visualize tweets⁷⁹. Presentational aspects of the comment visualizer, including a particle movement algorithm, were borrowed from this idea to speed up the prototyping process. The comment visualizer gathers all the comments of a selected track, which are then sent to Musicmetric for sentiment analysis. The returned response for each comment is then visualized at its timestamp and summarized at the bottom of the screen while the track is played back. The sentiment analyzer gives each comment a score from 1 (negative) to 5 (positive), which is then converted into a colored square (green being positive and red being negative). As shown in Figure 10.4, each particle (comment) can be hovered over with the mouse, which causes the text to appear in the center of the screen alongside its sentiment score and the confidence of that score. The confidences of the scores are in general very low. This is due to the typically short texts in the comments, which are hard for the sentiment analyzer to classify. Moreover, the classifier wrongly classifies many of the comments as negative because of words that are usually considered negative but are not negative in most SoundCloud comments. For instance, comments like “sick” or “fucking dope” are not usually negative in relation to a music track.
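The timestamp loop described for the Box2D visualizer can be sketched roughly as follows. This is an illustration under stated assumptions, not the code used in the thesis: the function name collectDueComments and the comment fields timestamp and body are hypothetical, comments are assumed to be sorted by timestamp, and the sketch fires a comment once the playback time has reached or passed its timestamp rather than testing for strict equivalence, since a polling loop may never land on the exact millisecond.

```javascript
// Return the comments that are due at `currentTimeMs`, plus the updated cursor.
// Assumes `comments` is sorted by ascending `timestamp` (in milliseconds).
function collectDueComments(comments, nextIndex, currentTimeMs) {
  const due = [];
  let index = nextIndex;
  while (index < comments.length && comments[index].timestamp <= currentTimeMs) {
    due.push(comments[index]);
    index += 1;
  }
  return { due, nextIndex: index };
}

// Made-up comments and a simulated sequence of playback positions.
const timedComments = [
  { timestamp: 1000, body: "nice intro" },
  { timestamp: 1500, body: "here it comes" },
  { timestamp: 4000, body: "that drop" },
];
let cursor = 0;
const firedAt = [];
for (const playbackTime of [0, 500, 1600, 3000, 4100]) {
  const result = collectDueComments(timedComments, cursor, playbackTime);
  cursor = result.nextIndex;
  // In a visualizer, this is where a circle would be created per comment.
  for (const comment of result.due) firedAt.push([playbackTime, comment.body]);
}
console.log(firedAt);
```

In a browser, the playback position would come from the audio player on each tick of a timer or animation frame; the simulated array of times stands in for that here.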
Figure 10.4: Comments Visualizer
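The conversion from sentiment score to color in the Comments Visualizer could look something like the sketch below. Only the end points are taken from the text (1 is negative and red, 5 is positive and green); the linear blend for the scores in between and the function name sentimentToColor are assumptions made for illustration.

```javascript
// Map a sentiment score from 1 (negative) to 5 (positive) onto a CSS color.
// Assumption: intermediate scores are blended linearly from red to green.
function sentimentToColor(score) {
  const clamped = Math.min(5, Math.max(1, score));
  const t = (clamped - 1) / 4; // 0 for most negative, 1 for most positive
  const red = Math.round(255 * (1 - t));
  const green = Math.round(255 * t);
  return `rgb(${red}, ${green}, 0)`;
}

const negativeColor = sentimentToColor(1);
const positiveColor = sentimentToColor(5);
console.log(negativeColor, positiveColor);
```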
78 http://www.musicmetric.com/2010/01/musicmetrics-sentiment-analysis-v1-0-beta/
79 http://playground.jocabola.com/mhd/tweet-mood/
Appendix D
11. BACK-END DESIGN: TIMED COMMENT GAMES
The database is a key component of the game suite application. It allows for efficient navigation of game levels and thus facilitates the music discovery aspects of the timed comment games. It is built on the database system MongoDB (see Appendix A). The database consists of collections, which are updated when a user plays a level (track) or submits a new score. The model (or schema) for storing data in the collections was revised several times during the development process to make retrieval efficient and intuitive, and to limit the storage space needed by storing only the most necessary information. The main collection is organized by artist rather than by track or game type. This solution was chosen because it was considered maintainable and viable, and because it suits the document-oriented nature of MongoDB. A simple instance of an artist in the collection, containing one track and one played game, is shown in Figure 11.1. The artist in the example has been played only once as a game.

    {
      //ARTIST
      "plays" : 1,
      "_id" : 16730,
      "tracks" : [
        {
          //TRACK(S)
          "plays" : 1,
          "id" : 47174436,
          "high_scores" : [
            {
              //GAME(S)
              "game" : "missile_command",
              "plays" : 1,
              "scores" : [
                {
                  //PLAYER(S)
                  "time_played" : 6776,
                  "score" : 3022,
                  "user_id" : 649509
                }
              ]
            }
          ]
        }
      ]
    }

Figure 11.1: Example of artist object in the database
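To make the update path concrete, the following sketch shows how the play counters in an artist object shaped like Figure 11.1 might be incremented when a level is played. It operates on a plain in-memory object rather than on MongoDB, and the function name recordPlay is hypothetical; the field names are those of Figure 11.1.

```javascript
// Increment the play counters at artist, track, and game level,
// creating the track or game entry if it does not exist yet.
function recordPlay(artist, trackId, game) {
  artist.plays += 1;

  let track = artist.tracks.find((t) => t.id === trackId);
  if (!track) {
    track = { id: trackId, plays: 0, high_scores: [] };
    artist.tracks.push(track);
  }
  track.plays += 1;

  let gameEntry = track.high_scores.find((g) => g.game === game);
  if (!gameEntry) {
    gameEntry = { game: game, plays: 0, scores: [] };
    track.high_scores.push(gameEntry);
  }
  gameEntry.plays += 1;
  return gameEntry;
}

// An artist that has not been played yet, using the ids from Figure 11.1.
const artistDoc = { _id: 16730, plays: 0, tracks: [] };
recordPlay(artistDoc, 47174436, "missile_command");
recordPlay(artistDoc, 47174436, "missile_command");
recordPlay(artistDoc, 47174436, "asteroids");
console.log(JSON.stringify(artistDoc, null, 2));
```

Because everything about an artist lives in one document, a single read returns all the counters needed to present that artist's levels.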
The schema makes it efficient to retrieve all the play statistics of an artist but, conversely, less efficient to retrieve all the play statistics of a specific player. Retrieving all the stats of a player would require iterating over all players, in all games, in all tracks, in all artists. This structure was chosen intentionally, based on an assessment of which items will need to be retrieved most frequently, and because it is a fairly intuitive structure. To clarify, every time a user (player) searches for a level to play (an artist or a track), the database must be searched in order to present the statistics to the user, as shown in Figure 11.2. This is arguably a more frequently occurring event than the viewing of a player's profile, which makes the need to retrieve artists or tracks more common. To save storage space (and to avoid privacy issues⁸⁰), only players' SoundCloud ids are stored in the database. The ids are all that is needed to obtain additional data about the artists, tracks, and players (e.g. user names or followers) from SoundCloud. Also, only the top 10 scores for each track are stored in the current version. Hence, the 10th best score of a track is removed when it is beaten by another player. This is a conscious design decision, as the dynamic high-score tables are meant to encourage players to remain active in order to keep their high scores.

80 http://developer.soundcloud.com/docs/api/terms-of-use#privacy
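The top-10 rule can be sketched as a small function over a score list. This is an assumed implementation, not the one used in the games: the name submitScore and the constant MAX_SCORES are hypothetical, and the score objects follow the fields of Figure 11.1.

```javascript
const MAX_SCORES = 10; // number of high scores kept per list, as stated in the text

// Return a new score list with `entry` inserted in descending order of score,
// truncated to `limit` entries. Because the sort is stable and the new entry
// is appended last, a tie does not displace an existing score.
function submitScore(scores, entry, limit = MAX_SCORES) {
  return scores
    .concat([entry])
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

// Ten made-up scores: 1000, 900, ..., 100.
let highScores = [];
for (let i = 10; i >= 1; i -= 1) {
  highScores.push({ user_id: i, score: i * 100, time_played: 5000 });
}

// A score of 550 pushes the old 10th place (100) off the table.
highScores = submitScore(highScores, { user_id: 649509, score: 550, time_played: 6776 });
// A score of 50 is too low and leaves the table unchanged.
const afterLowScore = submitScore(highScores, { user_id: 42, score: 50, time_played: 3000 });
console.log(highScores.map((s) => s.score));
```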
Figure 11.2: Data about artist retrieved from database
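The cost of the less common query, all stats of one player, follows directly from the nesting. A minimal sketch of that lookup over in-memory artist objects is given below; the function name collectPlayerScores and the shape of the returned records are hypothetical, while the input uses the example data of Figure 11.1.

```javascript
// Collect every stored score of one player by walking the full hierarchy:
// all artists, all tracks, all games, all players.
function collectPlayerScores(artists, userId) {
  const results = [];
  for (const artist of artists) {
    for (const track of artist.tracks) {
      for (const game of track.high_scores) {
        for (const entry of game.scores) {
          if (entry.user_id === userId) {
            results.push({
              artist_id: artist._id,
              track_id: track.id,
              game: game.game,
              score: entry.score,
            });
          }
        }
      }
    }
  }
  return results;
}

// The artist object from Figure 11.1.
const artistCollection = [
  {
    _id: 16730,
    plays: 1,
    tracks: [
      {
        id: 47174436,
        plays: 1,
        high_scores: [
          {
            game: "missile_command",
            plays: 1,
            scores: [{ time_played: 6776, score: 3022, user_id: 649509 }],
          },
        ],
      },
    ],
  },
];
const playerScores = collectPlayerScores(artistCollection, 649509);
console.log(playerScores);
```

Looking up an artist, by contrast, touches a single document, which is why the structure favours the more frequent query.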
With the intention of future-proofing the database, collections that hold the most frequently played tracks and the latest tracks played for each game have been created (see Figure 11.3). These are not directly linked to the main collection (organized by artist) but mirror the most important stats, and they are updated each time a game is played. These collections contain a maximum of 100 tracks, each with a maximum of 10 high scores. This makes retrieving essential statistics efficient and independent of the size of the artist collection, which might become important as the main collection grows, since the most popular and the latest tracks are gathered from the database and displayed to every user visiting the site.

    {
      //GAME (Asteroids)
      "hot" : [
        {
          //TRACK(S)
          "id" : 14027183,
          "high_scores" : [
            {
              //PLAYER(S)
              "user_id" : 649509,
              "score" : 54,
              "time_played" : 4495
            }
          ],
          "plays" : 13
        }
      ],
      "latest" : [
        //SAME STRUCTURE AS "HOT"
      ]
    }
Figure 11.3: Example of structure in the
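One way the two mirrored lists might be maintained after each play is sketched below. The text only states that the lists are updated on every play and capped at 100 tracks; the function names, the rule that a freshly played track wins ties in the "hot" list, and the assumption that the track's current play count is passed in from the main collection are all illustrative choices.

```javascript
const MAX_TRACKS = 100; // cap per list, as stated in the text

// Put `track` at the front of the list, replacing any older entry with the same id.
function upsertTrack(list, track) {
  const rest = list.filter((t) => t.id !== track.id);
  return [track].concat(rest);
}

// "latest": most recently played first, truncated to the cap.
function updateLatest(latest, track, limit = MAX_TRACKS) {
  return upsertTrack(latest, track).slice(0, limit);
}

// "hot": ordered by play count, truncated to the cap.
function updateHot(hot, track, limit = MAX_TRACKS) {
  return upsertTrack(hot, track)
    .sort((a, b) => b.plays - a.plays)
    .slice(0, limit);
}

// Small demonstration with a cap of 2 instead of 100; ids 2 and 3 are made up.
const hotBefore = [
  { id: 14027183, plays: 13, high_scores: [] },
  { id: 2, plays: 5, high_scores: [] },
];
const hotAfter = updateHot(hotBefore, { id: 3, plays: 7, high_scores: [] }, 2);

const latestBefore = [
  { id: 14027183, plays: 13, high_scores: [] },
  { id: 2, plays: 5, high_scores: [] },
];
const latestAfter = updateLatest(latestBefore, { id: 2, plays: 6, high_scores: [] }, 2);
console.log(hotAfter.map((t) => t.id), latestAfter.map((t) => t.id));
```

Since both lists have a fixed maximum size, reading them costs the same no matter how large the artist collection becomes.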
12. Acknowledgements
During my work on this thesis, studies and concepts similar to those described here have emerged. Most notable is a game based on timed comments called WaveRaid, which was presented on the SoundCloud developer blog on April 11th, 2012. The game makes use of a new feature called onTimedComment that was simultaneously added to SoundCloud. I have not used this feature myself, as I had already constructed my games at that point, and because my games require some analysis of comments that might slow the game down if done at the exact time of the comment. In terms of research, a publication entitled “Generating game content from open data” was released on May 31, 2012. This publication defines the term Data Games and touches upon many of the same ideas that are presented in my research. I have chosen not to incorporate this research when establishing my own definition of Dynamic Data Games, which is presented in 2.5.2.
Bibliography Bibliography
13.
[1] FORA.tv. (2006, June) FORA.tv, inc. [Online]. Viewed 2012 May 20. Available: http://fora.tv/fora/fora_transcript_pdf.php?cid=355 [2] R. A. R. MacDonald, D. J. Hargreaves, and D. Miell, "What are musical identities, and why are they important?," in Musical Identities.: Oxford University Press, 2002, pp. 120. [3] R. S. Wurman, Information Anxiety.: Doubleday Publishing, 1989. [4] G.W. Furnas, T. K. Landauer, L. M. Gomez, and S. T. Dumais, "The Vocabulary Problem in Human-System Communication," presented at Commun. ACM, vol. 30, no. 11, pp. 964 - 971, 1987. [5] F. Ricci, L. Rokach, and B. Shapira, "Introduction to Recommender Systems Handbook," in Recommender Systems Handbook.: Springer, 2010, pp. 1-35. [6] E. Pariser, "Introduction," in The Filter Bubble: What the Internet Is Hiding from You.: The Penguin Press HC, 2011, pp. 1-20. [7] SoundCloud. (undated) [Online]. Viewed 2012 Juli 24. Available: http://soundcloud.com/press/releases/2012-05-09-next-soundcloud.pdf [8] Google. (2011, Jan.) Google. [Online]. Viewed 2012 Juli 25. Available: http://support.google.com/youtube/bin/answer.py?hl=en&answer=116618 [9] Ted. (undated) ted.com. [Online]. Viewed 2012 August 2. Available: http://www.ted.com/pages/287 [10] G. J. Kowalski and M. T. Maybury, "Introduction to Information Retrieval Systems," in Information Storage And Retrieval Systems: Theory and Implementation , 2nd ed.: Kluwer Academic Publishers, 2002, pp. 1-24. [11] C. D. Manning, P. Raghavan, and H. SchĂźtze. (2009) An Introduction to Information Retrieval. (Online Edition). [Online]. Viewed 2012 June 11. Available: http://nlp.stanford.edu/IR-book/pdf/irbookonlinereading.pdf [12] N. Orio, "Music Retrieval: A Tutorial and Review," presented at Foundations and Trends in Information Retrieval, vol. 1, no. 1, pp. 1-90, 2006. [13] T. Li and L. Li, "Music Data Mining: An Introduction," in Music Data Mining.: CRC Press, 2012, pp. 3-31. [14] S. J. Downey and J. 
Futrelle, "Interdisciplinary Research Issues in Music Information Retrieval," presented at Journal of New Music Research, vol. 32, no. 2, pp. 121-131, 2003. [15] M. Schedl, "Web-Based and Community-Based Music Information Extraction," in Music Data Mining.: CRC Press, 2012, pp. 219-242.
97
[16] B. Whitman, "Learning The Meaning of Music," Massachusetts Institute of Technology , PhD Thesis 2005. [17] R. Typke, F. Wiering, and R. C. Veltkamp, "A Survey of Music Information Retrieval Systems," in Proc. ISMIR, 2005, pp. 153-160. [18] A. S. Lillie, "Musicbox: Navigating the space of your music," Massachusetts Institute of Technology, Masters Thesis 2008. [19] P. J. P. de Leรณn, "A Statistical Pattern Recognition Approach to Symbolic Music Classification," Universidad de Alicante, PhD Thesis 2011. [20] B. Whitman and S. Lawrence, "Inferring Descriptions and Similarity for Music from Community Metadata," in Proc. ICMC, 2002, pp. 591-598. [21] C. Mckay and I. Fujinaga, "Combining Features Extracted From Audio, Symbolic and Cultural Sources," in Proc. ISMIR, 2008, pp. 597-608. [22] T. Jehan, "Creating Music by Listening," Massachusetts Institute of Technology, PhD Thesis 2005. [23] P. Knees, P. Tim, S. Markus, and G. Widmer, "Combining Audio-based Similarity with Web-based Data to Accelerate Automatic Music Playlist Generation," in Proc. MIR, 2006, pp. 147-154. [24] K. Yoshii, M. Goto, K. Komatani, T. Ogata, and H. G. Okuno, "Hybrid Collaborative and Content-Based Music Recommendation Using Probabilistic Model With Latent User Preferences," in Proc. ISMIR, 2006, pp. 296-301. [25] S. J. Downie, D. Byrd, and T. Crawford, "Ten Years of ISMIR: Reflections on Challenges and Opportunities," in Proc. ISMIR, 2009, pp. 13-18. [26] J. H. Lee, C. M. Jones, and S. J. Downie, "An Analysis of ISMIR Proceedings: Patterns of Authorship, Topic, and Citation," in Proc. ISMIR, 2009, pp. 57-62. [27] D. McEnnis and S. J. Cunningham, "Sociology and Music Recommendation Systems," in Proc. ISMIR, 2007, pp. 185-186. [28] C Basu, H. Hirsh, and W. W. Cohen, "Recommendation as Classification: Using Social and Content-Based Information in Recommendation," in Proc. AAAI/IAAI, 1998, pp. 714-720. [29] F. Ricci, "Context-aware music recommender systems: workshop keynote abstract," in Proc. 
WWW (Companion Volume), 2012, pp. 865-866. [30] J. L. Herlocker, J. A. Konstan, and J. Riedl, "Explaining collaborative filtering recommendations," in Proc. CSCW, 2000, pp. 241 - 250. [31] C. Anderson, The Long Tail: Why the Future of Business Is Selling Less of More.: Hyperion, 2006. [32] D. Bogdanov and P. Herrera, "How much metadata do we need in music recommendation? A subjective evaluation using preference sets," in Proc. ISMIR, 2011, pp. 97-102.
98
[33] M. Tiemann, S. Pauws, and F. Vignoli, "Ensemble Learning For Hybrid Music Recommendation," in Proc. ISMIR, 2007, pp. 179-180. [34] J. V. Thong et al., "SPEECHBOT: An Experimental Speech-Based Search Engine for Multimedia Content in the Web," presented at IEEE Transactions on Multimedia, vol. 4, no. 1, pp. 88-96, 2002. [35] A Ghias, J. Logan, D. Chamberlin, and B. C. Smith, "Query by Humming: Musical Information Retrieval in an Audio Database," in Proc. ACM Multimedia, 1995, pp. 231236. [36] O. Celma, R. Miquel, and P. Herrera, "Foafing the Music: A Music Recommendation System based on RSS Feeds and User Preferences," in Proc. ISMIR, 2005, pp. 464-467. [37] D. M. Weigl and C. Guastavino, "User Studie in The Music Information Retrieval Litterature," in Proc. ISMIR, 2011, pp. 335-340. [38] K. Salen and E. Zimmerman, Game Design Fundamentals: Game Design Fundamentals.: MIT Press, 2004. [39] T Fullerton, Game Design Workshop, Second Edition: A Playcentric Approach to Creating Innovative Games, 2nd ed.: Morgan Kaufmann, 2008. [40] R. Caillois, Man, Play and Games.: University of Illinois Press, 2001, pp. 12-27. [41] R. Rouse III, Game Design: Theory and Practice, 2nd ed.: Jones & Bartlett Publishers, 2004. [42] B. Brathwaite and I. Schreiber, Challenges For Game Designers.: Charles River Media, 2008. [43] J. Juul, "The Open and the Closed: Games of Emergence and Games of Progression," in Proc. CGDC, 2002, pp. 323-329. [44] M. Pichlmair and K. Fares, "Levels of Sound: On the Principles of Interactivity in Music Video Games," in Proc. DIGRA, 2007, pp. 424-430. [45] G. Whitmore. (2003, May) Gamasutra. [Online]. Viewed 2012 Juli 4. Available: http://www.gamasutra.com/resource_guide/20030528/whitmore_pfv.htm [46] M. Siegler. (2009, May) TechCrunch. [Online]. Viewed 2012 Juli 4. Available: http://techcrunch.com/2009/05/28/spymaster-the-twitter-game-that-willassassinate-your-time/ [47] D. 
de Vaus, "The Context Of Design," in Research Design in Social Research.: Sage Publications Ltd, 2001, pp. 1-16. [48] R. Keele, "Quantitative Versus Qualitative Research, or Both?," in Nursing Research and Evidence-Based Practice: Ten Steps to Success.: Jones & Bartlett Learning, 2012, pp. 35-52. [49] W. Tichy. (1998, May) Institute for Program Structures and Data Organization. [Online]. Viewed 2012 August 5. Available: http://www.ipd.uka.de/~tichy/publications/moreexperiments/node4.html
99
[50] D. G. Feitelson, "Experimental Computer Science: The Need for a Cultural Change," School of Computer Science and Engineering; The Hebrew University of Jerusalem, Paper 2006. [51] P. J. Denning, "Performance Modeling: Experimental Computer Science at its Best," presented at Communications of the ACM, vol. 24, no. 11, pp. 725-727, 1981. [52] J. Gustedt, E. Jeannot, and M. Quinson, "Experimental Validation in Large-Scale Systems: a Survey of Methodologies," presented at Paralel Processing Letters, vol. 19, no. 3, pp. 399-418, 2009. [Online]. Viewed 2012 August 6. Available: http://hal.archives-ouvertes.fr/docs/00/42/64/94/PDF/RR-6859.pdf [53] J. V. Maanen, P. K. Manning, and M. L. Miller, "Series Editor's Introduction," in Exploratory Research in the Social Sciences (Qualitative Research Methods).: Sage Publications, 2001, pp. v-vi. [54] J. Nielsen, "Heuristic evaluation of user interfaces," in Proc. SIGCHI, 1990, pp. 249-256. [55] O. R. Zaiane. (1999) University of Alberta. [Online]. Viewed 2012 May 26. Available: http://webdocs.cs.ualberta.ca/~zaiane/courses/cmput690/notes/Chapter1/index.html [56] J. Han and M. Kamber, Data Mining: Concepts and Techniques, 2nd ed.: Morgan Kaufmann, 2005. [57] K. J. Cios and W. Pedrycz, "The Knowledge Discovery Process," in Data Mining: A Knowledge Discovery Approach.: Springer, 2007, pp. 9-24. [58] U. Fayyad, G. Piatetsky-Shapiro, and S. Padhraic, "From Data Mining to Knowledge Discovery in Databases," AI Magazine, vol. 17, no. 3, pp. 37-54, 1996. [59] I. H. Witten, E. Frank, and M. A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed.: Morgan Kaufmann, 2011. [60] M. Hearst. (2003, Oct.) UC Berkeley School of Information. [Online]. Viewed 2012 June 28. Available: http://people.ischool.berkeley.edu/~hearst/text-mining.html [61] W. J. Frawley, G. Piatetsky-Shapiro, and C. J. Matheus, "Knowledge Discovery in Databases: An Overview," presented at AI Magazine, vol. 13, no. 3, pp. 57-70, 1992. [62] P. Tan, M. 
Steinbach, and V. Kumar, Introduction to Data Mining.: Addison Wesley, 2005. [63] D. Jurafsky and J. H. Martin, "Introduction," in Speech and Language Processing.: Pearson Prentice Hall, 2008, pp. 1-16. [64] T. Wilson, W. Janyce, and P. Hoffmann, "Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis," in Proc. HLT/EMNLP, 2005, pp. 347-354. [Online]. Viewed 2012 June 27. Available: http://www.cs.uic.edu/~liub/FBS/opinion-mining.pdf [65] E Brill, "A simple rule-based part of speech tagger," in Proc. ANLP, 1992, pp. 152-155. [66] A. Laplante, "Users' Relevance Criteria in Music Retrieval in Everyday Life: An Exploratory Study," in Proc. ISMIR, 2010, pp. 601-606.
100
[67] J. H. Lee and S. J. Downie, "Survey Of Music Information Needs, Uses, And Seeking Behaviours: Preliminary Findings," in Proc. ISMIR, 2004. [68] C. Horn, "Analysis and Classication of Twitter messages," Graz University of Technology, Master's Thesis 2010. [69] L. Massari, "What's inside MySpace comments?," in Proc. SPECTS, 2010, pp. 311-316. [70] W. G. Yee, A. Yates, S. Liu, and O. Frieder, "Are Web User Comments Useful for Search?," in Proc. LSDS-IR, 2009, pp. 63-70. [71] Mike T., P. Sud, and F. Vis, "Commenting on YouTube videos: From guatemalan rock to El Big Bang," presented at JASIST, vol. 63, no. 3, pp. 616-629, Nov 2011. [72] S. Siersdorfer, S. Chelaru, N. Wolfgang, and J. S. Pedro, "How useful are your comments?: analyzing and predicting youtube comments and comment ratings," in Proc. WWW, 2010, pp. 891-900. [73] J. D. M. Rennie, L. Shih, J. Teevan, and D. R. Karger, "Tackling the Poor Assumptions of Naive Bayes Text Classifiers," in Proc. ICML, 2003, pp. 616-623. [74] G. M. Weiss and F. Provost, "Learning when training data are costly: the effect of class distribution on tree induction," presented at Journal of Artificial Intelligence Research, vol. 19, no. 1, pp. 315-354, July 2003. [75] T. Pohle, P. Knees, M. Schedl, and G. Widmer, "Meaningfully Browsing Music Services," in Proc. ISMIR, 1986, pp. 115-116. [76] Y. Raimond and M. Sandler, "A Web of Musical Information," in Proc. ISMIR, 2008, pp. 263-268. [77] H. Fujihara, M. Goto, and J. Ogata, "Hyperlinking Lyrics: a Method For Creating Hyperlinks Between Phrases in Song Lyrics," in Proc. of ISMIR, 2008, pp. 281-286. [78] Y. Hu and M. Ogihara, "NextOne Player: A Music Recommendation System Based on User Behavior," in Proc. ISMIR, 2011, pp. 103-108. [79] R. Cai, C. Zhang, C. Wang, L. Zhang, and W. Ma, "MusicSense: Contextual Music Recommendation using Emotional Allocation Modeling," in Proc. ACM Multimedia, 2007, pp. 553-556. [80] P. Osman. (2012, Mar.) Stack Overflow. [Online]. Viewed 2012 April 27. 
Available: http://stackoverflow.com/questions/9880442/soundcloud-api-track-query-filtering [81] Metacritic. (undated) Metacritic.com. [Online]. Viewed 2012 Juli 25. Available: http://www.metacritic.com/game/ios/tweet-land/critic-reviews [82] SoundCloud. (2010, Jan) SoundCloud. [Online]. Viewed 2012 June 10. Available: http://blog.soundcloud.com/2010/01/14/namm-announcement [83] S. Sonvilla-Weiss, "Introduction: Mashups, Remix Practices and the Recombination of Existing Digital Content," in Mashup Cultures.: Springer Vienna Architecture, 2010, pp. 8-23.
101
[84] A. Easton and G. Easton, "Demystifying Mashups," in Proc. InSITE, 2012, pp. 479-486. [85] E. Navas, "Regressive and Reflexive Mashups in Sampling Culture," in Mashup Cultures.: Springer Vienna Architecture, 2010, pp. 157-177. [86] C. C. Collie and E. D. Gorman. (2011, December) Intellectual Property and Technology Forum. [Online]. Viewed 2012 Juli 4. Available: http://bciptf.org/wpcontent/uploads/2011/12/Gorman-Collie-IPTF.pdf [87] M. Hamasaki, H. Takeda, T. Hope, and T. Nishimura, "Network Analysis of an Emergent Massively Collaborative Creation Community: How Can People Create Videos Collaboratively without Collaboration?," in Proc. ICWSM, 2009. [88] A. Laplante, "Social Capital and Music Discovery: An Examination of the Ties through Which Late Adolescents Discover New Music," in Proc. ISMIR, 2011, pp. 341-346. [89] A. Laplante and S. J. Downie, "Everyday Life Music Information-Seeking Behaviour of Young Adults," in Proc. ISMIR, 2006, pp. 381-382. [90] K. Jacobson and M.B. Sandler, "Musically Meaningful or Just Noise? An Analysis of Online Artist Networks," in Proc. CMMR, 2008, pp. 107-118.
102