PRIVACY, ANONYMITY, TRUST, AND READING BETWEEN THE LINES
by NYUTandon
As we have already established in this issue of CyberByte, the concept of privacy in a computer science context is complex and needs to be addressed in a variety of ways. The research initiatives of Dr. Rachel Greenstadt affirm just how diverse these approaches can be. During her previous work as director of Drexel University’s Privacy, Security, and Automation Laboratory (PSAL), and as an associate professor of computer science at NYU Tandon since 2019, Greenstadt has tackled issues of cyber harassment and cyber crime, sometimes in partnership with her CCS colleague Dr. Damon McCoy. One such example can be read at https://dl.acm.org/doi/pdf/10.1145/3432909.
She has also used techniques like topic modeling, named entity recognition, privacy ontology, sentiment analysis, and text normalization to investigate a broad range of privacy concerns related to social media (see https://dl.acm.org/doi/pdf/10.1145/2665943.2665958).
But Greenstadt has also conducted studies that explore the double-sided potential of privacy technologies, which can either conceal identities or reveal them.
In an interview with CyberByte in May of 2022, Greenstadt spoke about her work with one such technology called stylometry, a technique based on the premise that “we all speak a dialogue of one.” Working from the assumption that each person’s use of language is subtly unique, stylometry analyzes linguistic patterns that can help identify the authors of both text and code samples. Though the technique can be applied using a deep learning system, she noted that “a handcrafted approach, relying on character N-gram sliding windows of 1 to 4 characters” works about as well.
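The sliding-window idea in that quote can be illustrated with a few lines of Python. This is only a minimal sketch of character n-gram counting under our own assumptions, not the feature pipeline used in Greenstadt’s studies; the function names `char_ngrams` and `relative_frequencies` are hypothetical and chosen for illustration.

```python
from collections import Counter


def char_ngrams(text, n_min=1, n_max=4):
    """Count every character n-gram of length n_min..n_max.

    A window of width n slides over the text one character at a time,
    and each substring it covers is tallied.
    """
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for start in range(len(text) - n + 1):
            counts[text[start:start + n]] += 1
    return counts


def relative_frequencies(counts):
    """Turn raw counts into proportions so texts of different lengths compare."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


if __name__ == "__main__":
    # Illustrative input only; real studies use far longer writing samples.
    profile = char_ngrams("we got these super grim tales of coffee and shaving")
    for gram, count in profile.most_common(5):
        print(repr(gram), count)
```

A table of frequencies like this, computed over enough of a person’s writing, is the kind of stylistic fingerprint a classifier could then be trained on.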
These models are used to characterize author style, which, in turn, can be used for forensic attribution work, such as “trying to see how identities migrate from one forum to another to understand dynamics of online communities, like cybercriminal forums.” But she has also used the technique to test the viability of anonymization strategies so those who might need to hide their identities know whether their submissions are safe (see https://arxiv.org/pdf/1512.08546.pdf).
Greenstadt notes that there is a fairly long history of using stylometry to verify authorship, and that it can be, and has been, admissible in a number of court cases. One of the earliest instances was back in the 1960s, when it was used to prove authorship of some of The Federalist Papers. More recently, it was used to prove that a detective novel called The Cuckoo’s Calling was actually the pseudonymous work of Harry Potter author J.K. Rowling (https://en.wikipedia.org/wiki/The_Cuckoo%27s_Calling).
For Greenstadt, some of her interest in stylometry goes back to graduate school conversations with colleagues, including Nick Mathewson, who would later cofound the Tor Project. But she admits that she initially had “quite a bit of skepticism” about the technique. “My thought was ‘well, if you pick the words, surely you could trick it’.” When she finally decided to pursue the work, it was at the request of Michael Brennan, the first doctoral student she worked with at Drexel. “By this point, there had been quite a lot of work done on stylometry, but nobody had really ‘stress-tested’ it. What we could potentially bring to it was this cybersecurity adversarial mindset by asking ‘what would it take to fool the system?’”
To test these ideas, she designed a study that required two types of writing samples from participants. In the first set, they merely attempted to disguise their writing in some way. In the second set, Greenstadt asked participants to write like Cormac McCarthy, an American novelist famous for tales of the frontier, and also for eliminating basic writing conventions, like punctuation and capitalization, from his text. The students were “supposed to do narratives of their morning, and we got these super grim tales of coffee and shaving,” she observed. While the immediate results of these tests were not conclusive, “there was an idea that something interesting was going on,” she noted, adding that she felt additional studies could reveal some useful things. “Maybe you couldn’t identify who did the writing,” she explains, “but you can see some type of deception is going on.” Years later, the value of just noting that “deception is going on” played out in an examination of Reddit accounts that were identified as part of the Russian manipulation in the U.S. Though the analysis didn’t reveal the authors’ identities, just comparing these accounts to other accounts on Reddit showed that the accounts in question “did not have stylistic integrity. These were supposed to be separate accounts, but they didn’t read that way.”
Beyond her broader stylometry studies, which have been documented in about 20 different publications, Greenstadt’s most recent research in this area has focused on a subset of stylometry called author verification. As she explained, these initiatives can “determine if the same person wrote two different texts,” which is useful when looking to find “sock puppet” accounts that write threatening or harassing messages. She recently presented one such study, co-authored with new graduate Dr. Janith Weerasinghe, at the 16th International AAAI Conference on Web and Social Media. (You can read the full article at https://ojs.aaai.org/index.php/ICWSM/article/view/19359/19131.)
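To make the verification question concrete, here is a toy Python sketch that scores how alike two texts are by comparing their character n-gram counts with cosine similarity. It is an illustration under our own assumptions, not the method from the ICWSM paper: the function names are hypothetical, and a real verification system would rely on a trained model, far more text per author, and a carefully calibrated decision threshold.

```python
import math
from collections import Counter


def ngram_profile(text, n_min=1, n_max=4):
    """Count character n-grams of length n_min..n_max in lowercased text."""
    text = text.lower()
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for start in range(len(text) - n + 1):
            counts[text[start:start + n]] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors (0.0 if either is empty)."""
    dot = sum(a[gram] * b[gram] for gram in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def same_author_score(text_a, text_b):
    """Hypothetical helper: a higher score means more similar surface style."""
    return cosine_similarity(ngram_profile(text_a), ngram_profile(text_b))


if __name__ == "__main__":
    # Made-up snippets for illustration; they are not data from any study.
    post_1 = "honestly i dont think thats right... but whatever"
    post_2 = "honestly i dont see why thats a problem... whatever"
    print(round(same_author_score(post_1, post_2), 3))
```

A score on its own proves nothing about authorship. It only becomes meaningful when compared against the scores that pairs of unrelated accounts typically produce.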
While her work has provided tools to expose those who abuse the privacy of others (during the interview, Greenstadt noted that an algorithm she developed with McCoy is possibly being used by the FBI), she has also made important research contributions to the other side of the privacy coin. That is, she sees the potential to design systems to help people more effectively anonymize their text. This includes work that has evaluated the relationship between anonymity and trustworthiness. Intrigued by the idea that Wikipedia bans contributions that come through Tor or from other pseudonymous contributors, she was curious to see if there were higher incidences of misleading or incorrect information coming from these sources. As explained in a news story prepared by NYU, Greenstadt and her colleagues examined more than 11,000 Wikipedia contributions made by Tor users who, despite the ban, were able to edit pages between 2007 and 2018 (https://engineering.nyu.edu/news/tor-users-untapped-resource-wikipedia). Not only was there little difference between the quality of these edits and those from editors who can be identified, but those editing through Tor were more likely to focus on topics that may be considered controversial, such as politics, technology, and religion.
Based on her findings, Greenstadt suggested that, rather than banning these pseudonymous contributors, Wikipedia should simply review edits from them and from any other untrusted accounts before those edits go live. She points out that this practice is already common in 17 other Wikipedia editions, including the German and Russian ones. If such a review is instituted, it appears these pseudonymous editors could be valuable contributors at very little risk. And, as Greenstadt pointed out in the interview, “What are you really trusting them to do? People aren’t supposed to be providing facts out of their heads.” She adds that in the current political climate around the world, where anonymity is often a matter of life or death, particularly for journalists and activists, the motivation to use Tor, or to disguise one’s identity online, becomes clearer.
Ultimately, for Greenstadt, the issue of privacy is, as quoted earlier, “about how individuals can manage the data about them, and their self-presentation in their interaction in online and offline spaces.” Researching both sides of the privacy coin can help individuals negotiate this increasingly difficult management task.
STUDENT PROFILE: ALAN CAO