Recognizing Social Language in Online Conversations
Natural Language and Dialogue Systems http://nlds.soe.ucsc.edu
Center for Games and Playable Media http://games.soe.ucsc.edu
Prof. Marilyn Walker Copenhagen
Motivation
More and more of the information available online is in the form of user generated content, e.g. reviews, conversations, comment streams
Current techniques and tools in NLP are largely targeted at monologic, traditional media such as the Wall Street Journal
Any topic you can imagine
Very Large Corpora

Data Source    Dialog Affordances Available from Site                            # Dialogs   Word Count
Convinceme     Rebuttals, Topic, Stance                                          3,072       3,650,000
4forums        Reply links, Quoting, Debate Topic                                11,800      84,300,000
CreateDebate   Support, Oppose and Clarification Links, Stance, Quoting, Topic   11,876      15,000,000

Table 1: Dialog sources in our corpus, with available affordances from the site and size of the resource.
Dialogic: 4Forums, P1-P2-P3 Threading
- Forum dialogues have a tree-like reply structure
- Multiple people can respond to any given post
- Two-person deep exchanges are not unusual (Doc Jones & jazyjason)
Dialogic Quote/Responses on 4forums
Motivation
- What do people in different places think about X?
- Language of social media is completely different: social, dialogic, emotional, informal
- Computational models of dialogue have never been based on, or tested on, such a huge volume of data, with so many topics and styles.
account unless you read it that way (R1 in Fig. 3). The language used is social and non-literal; irony and sarcasm abound, e.g., Really? Well, when I have a kid, I’ll be sure to just leave it in the woods, since it can apparently care for itself (R5 in Fig. 3; see also Q2 and R2). Insults are common: But in reality your opinion is gibberish (R4 in Fig. 3), and Here come the Christians, thinking they can know everything by guessing, and commiting the genetic fallacy left and right (R7 in Fig. 3). Subjective dialog acts such as sarcasm and insults targeted at other conversants and their opinions are very frequent. A pilot annotation study on 10,003 Quote/Response pairs from 4forums indicates that about 12% of posts are sarcastic, 23% are emotional, and 12% are insulting or nasty, behaviors that are uncommon in traditional media [1, 142, 141].
Types of Socio-Emotional Language
Topic: Evolution
Q1: How can you say such things? The Bible says that God CREATED over and OVER and OVER again! And you reject that and say that everything came about by evolution? If you reject the literal account of the Creation in Genesis, you are saying that God is a liar! If you cannot trust God’s Word from the first verse, how can you know that the rest of it can be trusted?
R1: It’s not a literal account unless you interpret it that way.

Q2: I jsut voted. sorry if some people actually have, you know, LIVES and don’t sit around all day on debate forums to cater to some atheists posts that he thiks they should drop everything for. emoticon-rolleyes emoticon-rolleyes emoticon-rolleyes As to the rest of your post, well, from your attitude I can tell you are not Christian in the least. Therefore I am content in knowing where people that spew garbage like this will end up in the End.
R2: No, let me guess . . . er . . . McDonalds. No, Disneyland. Am I getting closer?

Topic: Gay Marriage
Q3: Gavin Newsom- I expected more from him when I supported him in the 2003 election. He showed himself as a family-man/Catholic, but he ended up being the exact oppisate, supporting abortion, and giving homosexuals marriage licenses. I love San Francisco, but I hate the people. Sometimes, the people make me want to move to Sacramento or DC to fix things up.
R3: And what is wrong with giving homosexuals the right to settle down with the person they love? What is it to you if a few limp-wrists get married in San Francisco? Homosexuals are people, too, who take out their garbage, pay their taxes, go to work, take care of their dogs, and what they do in their bedroom is none of your business.

Topic: Abortion
Q4: Equality is not defined by you or me. It is defined by the Creator who created men.
R4: Actually I think it is defined by the creator who created all women. But in reality your opinion is gibberish. Equality is, like every other word, defined by the people who use the language. Currently it means “the same”. People aren’t equal because they are not all the same. Any attempt to argue otherwise is a display of gross stupidity.

Q5: The key issue is that once children are born they are not physically dependent on a particular individual.
R5: Really? Well, when I have a kid, I’ll be sure to just leave it in the woods, since it can apparently care for itself.

Topic: Gun Control
Q6: How about a sin tax of $100 each time you buy a gun and $10 each time you buy a bullet?
R6: How about a sin tax of $100 dollars each time you log on and $10 dollars a word for each time you speak one? It’s fair because it would help pay for all the lying propaganda damage you do to society. Rights come with responsibilities. On guns, no can do. SCOTUS has ruled many times that a right freely stated in the Constitution cannot be compelled to purchase a license nor a fee to exercise. NEXT!

Topic: Existence of God
Q7: okay, well i think that you are just finding reasons to go against Him. I think that you had some bad experiances of God when you were younger or a while ago that made you turn on God. You are looking for reasons, not very good ones i might add, to convince people.....either way, God loves you. :)
R7: Here come the Christians, thinking they can know everything by guessing, and commiting the genetic fallacy left and right.

Annotation labels shown in the figure: Stance (PRO, CON), emotional, factual, sarcasm, rhetorical question, insults.

Figure 3: Sample Quote/Response Pairs from 4forums.com with Mechanical Turk annotations for Stance.

The information in the growing body of online opinion dialogs provides unique opportunities and challenges for the field. On the one hand, the sheer amount of data now available promises the possibility of better empirical understanding of dialog and its structure at scale. And given that an increasing portion of
Mechanical Turk Annotations
- 10,003 Q/R pairs
- 20,384 P1/P2/P3 threads
- Selected to bias the distribution of turn-initial discourse cues
Mechanical Turk: Subjectivity in Dialogue
- Agree/Disagree: Does the respondent agree or disagree with the prior post? Krippendorff’s α = 0.62
- Fact/Emotion: Is the respondent attempting to make a fact based argument or appealing to feelings and emotions? α = 0.32
- Attack/Insult: Is the respondent being supportive/respectful or are they attacking/insulting in their writing? α = 0.42
- Sarcasm: Is the respondent using sarcasm? α = 0.22
- Nice/Nasty: Is the respondent attempting to be nice or is their attitude fairly nasty? α = 0.46
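The α values above are Krippendorff's alpha, defined as 1 minus the ratio of observed to expected disagreement. A minimal sketch of that computation follows. The function names and the toy data are illustrative assumptions, not taken from the study, and the choice of distance metric (nominal by default, squared difference for scalar ratings) is also an assumption about how the survey scores would be treated.

```python
from itertools import permutations


def krippendorff_alpha(units, distance=lambda a, b: float(a != b)):
    """Krippendorff's alpha = 1 - D_o / D_e.

    units: list of lists; each inner list holds the ratings given to one item.
    distance: difference function; the default is the nominal metric.
    """
    # Items with fewer than two ratings cannot be paired and are skipped.
    units = [u for u in units if len(u) >= 2]
    n = sum(len(u) for u in units)
    if n < 2:
        raise ValueError("need at least one item with two or more ratings")

    # Observed disagreement: pairwise differences within each item.
    observed = sum(
        sum(distance(a, b) for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in units
    ) / n

    # Expected disagreement: pairwise differences across all pooled ratings.
    pooled = [v for u in units for v in u]
    expected = sum(distance(a, b) for a, b in permutations(pooled, 2)) / (n * (n - 1))

    return 1.0 if expected == 0 else 1.0 - observed / expected


def squared_difference(a, b):
    """Interval metric, suited to scalar ratings such as a [-5, 5] scale."""
    return float((a - b) ** 2)


# Toy data (hypothetical): three items, two annotators each.
toy = [["yes", "yes"], ["no", "no"], ["yes", "no"]]
alpha_nominal = krippendorff_alpha(toy)
```

Passing `squared_difference` as the `distance` argument gives the interval variant, which penalizes a disagreement of 4 points more than a disagreement of 1 point.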
Sarcasm Examples
Annotations by Topic: Thresholds ±1

Topic             Agree   Sarcasm   Emote   Attack   Nasty
Evolution         10%     6%        16%     13%      9%
Gun Control       11%     8%        21%     16%      12%
Abortion          9%      6%        31%     16%      12%
Gay Marriage      13%     9%        23%     12%      8%
Exist. of God     11%     7%        26%     14%      10%
Healthcare        13%     10%       34%     17%      17%
Comm./Capt.       23%     8%        15%     8%       0%
Death Penalty     25%     5%        5%      5%       5%
Climate Change    20%     9%        17%     26%      17%
Marij. Legaliz.   5%      2%        20%     5%       5%
Data not available before
Sarcasm not just here, but everywhere
Claire: This laptop is a great deal.
If you think this laptop is a great deal, I’ve got a bridge for you to buy.
Restaurant reviews: “If you want to eat fairly good food in a lousy KEY with snooty hard to locate waiters while practically sitting in your neighbors lap and yelling across the miniature table at your friends because the place is soooo loud, then this is the place for you!”
Bootstrapping Sarcasm & Nastiness
- Even if sarcasm is 10% of the data, it is expensive to collect more: need to annotate 1000 utterances to get 100 sarcastic ones.
- Useful to have weakly supervised or unsupervised methods to quickly adapt to new social media data
- Try the method from Thelen & Riloff 02 and Riloff & Wiebe 03
- Apply it to both sarcasm & nastiness to test generalization
Sarcasm Examples
Method
- Identify high precision “indicator” words and phrases
- Use learned indicators to train sarcastic and nasty dialogue act classifiers that maximize precision at the expense of recall
- Use the classified utterances to learn general syntactic extraction patterns from sarcastic and nasty utterances
- Bootstrap this process on unannotated text to learn new extraction patterns to use for classification
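The loop described in these steps can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the implementation used in the work: word bigrams stand in for the syntactic extraction patterns, an utterance is labelled positive as soon as it matches any known pattern, and the names `bootstrap`, `min_freq` and `min_percent` are hypothetical.

```python
from collections import Counter


def bigrams(text):
    """Word bigrams of an utterance; a stand-in for syntactic patterns (assumption)."""
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))


def bootstrap(seed_patterns, unlabeled, rounds=3, min_freq=2, min_percent=0.8):
    """Grow a pattern set from seed patterns over unannotated utterances."""
    patterns = set(seed_patterns)
    for _ in range(rounds):
        # Step 1: high-precision labelling. An utterance counts as positive
        # if it matches at least one pattern learned so far.
        positive = [u for u in unlabeled if bigrams(u) & patterns]

        # Step 2: count candidate patterns in the positive set and overall.
        pos_counts = Counter(b for u in positive for b in bigrams(u))
        all_counts = Counter(b for u in unlabeled for b in bigrams(u))

        # Step 3: keep candidates that are frequent and mostly occur in
        # utterances already labelled positive.
        new = {
            b for b, count in pos_counts.items()
            if count >= min_freq and count / all_counts[b] >= min_percent
        }
        if new <= patterns:  # nothing new was learned, so stop early
            break
        patterns |= new
    return patterns


# Toy unannotated data (hypothetical).
toy_utterances = [
    "oh really how original",
    "oh really how original of you",
    "wow how original",
    "that is a fair point",
]
learned = bootstrap({("oh", "really")}, toy_utterances)
```

On the toy data the seed pattern pulls in one further bigram, while "how original" is rejected at the default reliability threshold because it also occurs in an utterance that no pattern matched.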
Method adapted from R&W03
Open Questions
- Previous work using this framework distinguishes subjective vs. objective utterances in monologic traditional media, i.e. a news corpus. Can we adapt the algorithm to dialogue? Does the same method work for bootstrapping sarcasm and nastiness?
- Theoretical work on sarcasm claims that it is highly context-dependent (Fox Tree, Gibbs, Sperber & Wilson). This method assumes the cues are in the utterance to be classified.
- Sarcasm and nastiness in dialogue might be harder than subjective/objective in news. The approach assumes that in the first step you can develop a “High Precision” classifier.
- Can we develop the sarcastic and nasty vocabulary this algorithm requires?
- Will the ‘patterns’ used in R&W be useful in dialogue data or do we need to develop our own patterns that are dialogue specific?
First Step: Develop the Cues
Have labelled data: use χ² measures?
What are the cues? HIT on MT
What are the Sarcasm & Nastiness Cues?
- Can we develop the sarcastic and nasty vocabulary this algorithm requires (context issue?)
- We decided to experiment with both methods:
- Provided the ‘human in the loop’, i.e. the Mechanical Turkers, with the context
- Automatically generate all unigrams, bigrams and trigrams and select the ones predicted to be relevant using χ²
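The automatic route can be illustrated with a short sketch that enumerates n-grams up to length three and ranks them with a Pearson χ² statistic over a 2x2 table (n-gram present or absent, utterance positive or negative). Whitespace tokenization, the absence of a continuity correction, the restriction to positively associated n-grams, and all function names are assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, max_n=3):
    """All unigrams, bigrams and trigrams of a token list, as tuples."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def chi_squared(a, b, c, d):
    """Pearson chi-squared for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def rank_ngrams(utterances, labels):
    """Rank n-grams by association with the positive label.

    utterances: list of strings; labels: list of bools (True = sarcastic).
    Returns (ngram, chi2) pairs sorted by decreasing chi2.
    """
    pos_total = sum(labels)
    neg_total = len(labels) - pos_total
    pos_counts, neg_counts = Counter(), Counter()
    for text, is_pos in zip(utterances, labels):
        grams = ngrams(text.lower().split())
        (pos_counts if is_pos else neg_counts).update(grams)

    scored = []
    for gram in set(pos_counts) | set(neg_counts):
        a = pos_counts[gram]   # positive utterances containing the n-gram
        b = neg_counts[gram]   # negative utterances containing the n-gram
        c = pos_total - a      # positive utterances without it
        d = neg_total - b      # negative utterances without it
        if a * d > b * c:      # keep only positively associated n-grams
            scored.append((gram, chi_squared(a, b, c, d)))
    return sorted(scored, key=lambda item: -item[1])


# Toy labelled data (hypothetical).
ranked = rank_ngrams(
    ["oh really", "oh sure", "i agree", "i think so"],
    [True, True, False, False],
)
```

With very small counts the statistic is unreliable, which is one possible reading of the "corpus size?" caveat raised later in the summary.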
Get good agreement with 8 to 10 annotators
Next Step: High Precision Classifier
Which are the Best Cues?
- We want cues that are FREQUENT (θ1)
- We want cues that are highly RELIABLE (θ2)
- Two possible measures of reliability:
- R&W used PERCENT: of the posts retrieved by the cue, what percent of them are sarcastic (nasty)?
- But we also have INTERANNOTATOR AGREEMENT for each cue on Mechanical Turk
- Tested many variations. Focus on best: Mechanical Turk cues + R&W Freq + R&W Percent
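The FREQUENT and PERCENT criteria can be made concrete with a small sketch. It assumes that a cue "retrieves" a post when it occurs in it as a lowercase substring, which is a simplification, and the function names and toy posts are hypothetical.

```python
def cue_stats(cue, utterances, labels):
    """Return (frequency, percent) for one cue on labelled data.

    frequency: number of utterances the cue retrieves.
    percent: fraction of the retrieved utterances that are positive.
    """
    hits = [is_pos for text, is_pos in zip(utterances, labels) if cue in text.lower()]
    freq = len(hits)
    percent = sum(hits) / freq if freq else 0.0
    return freq, percent


def select_cues(cues, utterances, labels, theta1, theta2):
    """Keep cues with frequency >= theta1 and percent >= theta2."""
    kept = []
    for cue in cues:
        freq, percent = cue_stats(cue, utterances, labels)
        if freq >= theta1 and percent >= theta2:
            kept.append(cue)
    return kept


# Toy labelled posts (hypothetical); True marks a sarcastic post.
posts = [
    "Oh really, that is your argument?",
    "Oh really? I had no idea.",
    "I really think you are right.",
    "Oh, I see your point.",
]
gold = [True, True, False, False]
kept = select_cues(["oh really", "really", "i see"], posts, gold, theta1=2, theta2=0.9)
```

Here "really" is frequent but only two thirds of the posts it retrieves are sarcastic, so it falls below θ2, while "i see" fails the frequency threshold θ1.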
High Precision Classifier
- We want cues that are FREQUENT in the data (θ1)
- We want cues that are highly RELIABLE on the data (θ2)
- Train over a variety of thresholds to find the best parameters
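One way to picture the threshold search is a plain grid sweep over candidate (θ1, θ2) pairs, as in the sketch below. The selection criterion (highest precision on a development set, ties broken by recall), the any-cue-matches classification rule, the substring matching, and the names `sweep`, `train` and `dev` are all assumptions; the slides do not specify these details.

```python
from itertools import product


def cue_stats(cue, data):
    """(frequency, percent positive) of a cue over (text, label) pairs."""
    hits = [label for text, label in data if cue in text.lower()]
    return len(hits), (sum(hits) / len(hits) if hits else 0.0)


def precision_recall(cues, data):
    """Label an utterance positive if it contains any selected cue."""
    predicted = [any(c in text.lower() for c in cues) for text, _ in data]
    gold = [label for _, label in data]
    tp = sum(p and g for p, g in zip(predicted, gold))
    precision = tp / sum(predicted) if sum(predicted) else 0.0
    recall = tp / sum(gold) if sum(gold) else 0.0
    return precision, recall


def sweep(cues, train, dev, theta1_grid, theta2_grid):
    """Return (precision, recall, theta1, theta2) for the best grid point."""
    best = None
    for t1, t2 in product(theta1_grid, theta2_grid):
        kept = []
        for cue in cues:
            freq, pct = cue_stats(cue, train)
            if freq >= t1 and pct >= t2:
                kept.append(cue)
        p, r = precision_recall(kept, dev)
        # Assumed criterion: prefer precision, break ties on recall.
        if best is None or (p, r) > best[:2]:
            best = (p, r, t1, t2)
    return best


# Toy data (hypothetical); True marks a sarcastic utterance.
train = [
    ("oh really, nice try", True),
    ("oh really? sure", True),
    ("really good point", False),
    ("i think so", False),
]
dev = [
    ("oh really now", True),
    ("that is really true", False),
    ("yeah right", True),
]
best = sweep(["oh really", "really"], train, dev, [1, 2, 3], [0.5, 0.9])
```

On the toy data the stricter reliability threshold wins: it drops the ambiguous cue "really", trading recall for precision in line with the stated goal of the high precision classifier.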
High Precision Classifier
Results suggest PERCENT is the best reliability measure
Next Step: Extraction Pattern Learning
Syntactic Patterns
- Generalize across utterances, new cues
- Not what humans consciously observe?
- Are these patterns tuned to news text?
Examples of Learned Patterns
Pattern Extraction: Generalization
- Train over a variety of thresholds for θ1 and θ2, where the ‘indicator’ is now the pattern, to find the best parameters
- Use the best parameters from pattern extraction (training) and classify utterances in a new test set
Final Results
- HP sarcasm: 54% precision and 38% recall
- Pattern sarcasm: 62% precision and 52% recall
- HP nastiness: 58% precision and 49% recall
- Pattern nastiness: 75% precision and 62% recall
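Precision and recall can be combined into F1, their harmonic mean, to compare the high precision (HP) and pattern-based classifiers on a single number. The short sketch below does this arithmetic for the figures reported above; the dictionary layout and names are illustrative only.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# (precision, recall) as reported on the slide.
results = {
    ("sarcasm", "HP"): (0.54, 0.38),
    ("sarcasm", "Pattern"): (0.62, 0.52),
    ("nastiness", "HP"): (0.58, 0.49),
    ("nastiness", "Pattern"): (0.75, 0.62),
}

f1_scores = {key: round(f1(p, r), 3) for key, (p, r) in results.items()}

for (task, system), score in f1_scores.items():
    print(f"{task:10s} {system:8s} F1 = {score:.3f}")
```

Under these numbers F1 rises from about 0.45 to 0.57 for sarcasm and from about 0.53 to 0.68 for nastiness when moving from the HP classifier to the learned patterns.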
Summary
- Bootstrapping improved F1 significantly
- Human in the loop: Mechanical Turk worked better than χ² (corpus size?)
- Riloff’s patterns are not specific to news
In process:
- Using new classifier on unannotated data
- Mechanical Turk additional sarcasm annotation
- Test to see whether able to skew distribution to favor selection of sarcastic utterances
- Exploring whether there are better patterns for dialogue
University of California Santa Cruz
Sarcasm, not an emotion?
- Also insults etc.
- Dialogue acts or dialogue relations
- Relational analysis: need context to interpret about 25% of them, but there are also indicators in the turn itself: textual prosody, superlatives, strong verbs, emphasizers, cue words like “Oh”, “Really”
- Not a unitary phenomenon (Gibbs 2000):
(a) jocularity, where speakers teased one another in humorous ways;
(b) sarcasm, where speakers spoke positively to convey negative intent;
(c) rhetorical questions that implied either a humorous or critical assertion;
(d) hyperbole, where speakers exaggerate the reality of the situation;
(e) understatement
- Fox Tree and Bryant 2002: People use the term sarcasm for all of these
Extra Slides
Trained three feature sets
- ia: uses Mechanical Turk indicators and α and β parameters based on the ia (interannotator agreement) metric
- percent: uses Mechanical Turk indicators and θ1 and θ2 parameters
- χ²: uses χ² indicators and θ1 and θ2 parameters
Sarcasm: Lots of it, about 12%
Q: Pot farms run by 'bad guys' getting closer to tourist spots - CNN.com Our national parks are safe as is, huh? Regular people have no reason to feel to need to carry concealed weapons for their own protection, huh? How can you be safe when mexican drug cartel members are literally behind the next bush and inclined to kill you for stumbling upon their operation?
R: HOT DAMN Nato maybe we better hot foot it up their and help keep that there Mexican drug cartel out.
----------------------
Q: It adds costs to the court system via increased litigation over custody, property, inheritance, etc.,
R: Yes, we shouldn't let those slaves be citizens, they will burden our legal system since they will now have...OMG...rights!
----------------------
Q: If you ground him from using the phone, you're a terrible father for tapping him on the wrist. In other words, you aren't showing love.
R: So it would be more loving to severely beat the child...forever? "Hey Timmy, remember when you were 3 and you wrote on the wall with that crayon? *BAM* *SMACK*"
Subjectivity Dialogue Survey: Example
The topic is Healthcare
Quote: Well to get healthcare from $6000 per person to $3,400 per person you'd have to pay them less or fire them. So do I understand you correctly that we will either cut pay or fire doctors and nurses?
Response: You are just playing dumb, right? There will be fewer technicians and fewer orders for equipment PER INSURED. Their pay will not be cut. There will be fewer of them PER INSURED. Initially the absolute volume will increase.
This was marked as:
– Clear Disagreement • -4.3 on [-5, 5]
– Attacking • -1.8 on [-5, 5]
– Neutral on Fact/Feeling • +0.5 on [-5, 5]
– More nasty than nice • -1.3 on [-5, 5]
– Labelled sarcastic by 1 in 3 • 0.33 on [0, 1]