CYBERSPACE


REINING IN DATA SCRAPING

As commercially driven data scraping becomes commonplace, better safeguards are necessary to maintain data integrity and protect intellectual property rights.


SYNERGIA FOUNDATION

RESEARCH TEAM

Touted as the ‘new oil’, data has taken on commercial value as companies rely on statistical tools and data analysis to make operational decisions. Not surprisingly, the practice of web/screen scraping is gaining prominence in business circles. Data scraping is essentially the extraction of the vast amounts of data available in public spaces such as websites and applications, which is then used as raw input for predicting industry trends and conducting market research, and, in some cases, industrial espionage.

Despite its perceived benefits, the use of scraping software has not been without controversy. Most recently, this was demonstrated in a lawsuit filed by Facebook’s new corporate identity, Meta Platforms Inc. The tech conglomerate sued Social Trading Ltd, a Hong Kong-based social media analytics company, for allegedly scraping account profiles from the former’s social media websites. The collected data was then reportedly sold as “demographics and insights about influencers and their audiences” to various interested parties. Meta considers this unauthorised scraping of data a breach of contract, embodied by the ‘Terms of Use’ of its family of apps. Social Trading was also accused of engaging in illegal hacking and unjust enrichment by circumventing the anti-scraping blocks imposed by Meta. Although the final judgement in this case is yet to be delivered, it has already resurrected the debate on the legality of data scraping.

TECHNOLOGICAL ARCHITECTURE

The practice of data scraping has been around for quite some time. It came into prominence a few decades ago, when computing transitioned from being the province of a few to being ubiquitous. This dramatic shift meant that a considerable amount of data lay in old, difficult-to-access systems. In this milieu, screen scraping emerged as a popular method of interfacing with older applications that lacked data-exporting capabilities. Today, the most common form of data scraping is web scraping, whereby information is extracted from a website using software or scripts, usually with the help of a scraper bot. The extracted information is then stored in various formats such as SQL, Excel or HTML. Indeed, there are many different types of scrapers for different purposes: some scrape data in bulk, while others collect content on demand, or structured data without human interaction.
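To make the extraction step concrete, the following is a minimal sketch in the Python standard library. The HTML snippet, class names and fields are hypothetical stand-ins for a page that a scraper bot would actually fetch over HTTP; a real scraper would download the markup first rather than hold it in a string.

```python
from html.parser import HTMLParser

# Hypothetical HTML, standing in for a page a scraper bot would fetch
# (e.g. via urllib.request or another HTTP client).
SAMPLE_PAGE = """
<html><body>
  <div class="product"><span class="name">Widget A</span><span class="price">9.99</span></div>
  <div class="product"><span class="name">Widget B</span><span class="price">19.50</span></div>
</body></html>
"""

class ProductScraper(HTMLParser):
    """Collects (name, price) records from spans with the assumed class names."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._field = None  # which field the next text node belongs to

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("name", "price"):
            self._field = cls

    def handle_data(self, data):
        if self._field == "name":
            self.records.append({"name": data.strip()})
        elif self._field == "price":
            self.records[-1]["price"] = float(data.strip())
        self._field = None

scraper = ProductScraper()
scraper.feed(SAMPLE_PAGE)
print(scraper.records)  # two extracted product records
```

In practice, off-the-shelf scraping tools wrap this fetch-and-extract loop, add scheduling and retries, and write the results straight into one of the storage formats mentioned above.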

Of course, data parsing forms a critical part of this technology. It goes hand-in-hand with web scraping by converting the raw data collected into readable text. In fact, most good web scraping tools have a built-in data parser, which automatically converts the extracted code to a user’s chosen format.
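That conversion step can be sketched as follows, assuming the scraper has already produced a list of records; the record fields are invented for illustration, and CSV and JSON stand in for whatever output format the user selects.

```python
import csv
import io
import json

# Hypothetical records, as a scraper's extraction stage might produce them.
records = [
    {"name": "Widget A", "price": 9.99},
    {"name": "Widget B", "price": 19.50},
]

def to_csv(rows):
    """Render the records as CSV text, one common export format."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def to_json(rows):
    """Render the same records as JSON for tools that expect structured feeds."""
    return json.dumps(rows, indent=2)

print(to_csv(records))
```

A built-in parser in a commercial scraping tool does essentially this, only with many more target formats (SQL inserts, Excel sheets and so on) behind a single option.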

A DOUBLE-EDGED SWORD

Most businesses derive immense economic value from such processes of automated external data acquisition. From discovering new market needs to evaluating the competition, scraping technology helps them overcome informational asymmetry in a highly data-driven world. It also assists in monitoring product sentiment, improving brand management, bolstering cyber security and facilitating informed decision-making. For start-up companies, such tools are particularly useful, as they can optimise their entry points and design realistic growth strategies.

As benign as these applications may appear to the uninitiated, scraping tools also have a potentially dark side. For instance, they can be deployed to steal protected content, violating copyright and intellectual property (IP) laws in the process. They may also be used to extract confidential information about a company, adversely impacting its growth and future business plans. Privacy is yet another concern. In April 2021, for instance, around 533 million Facebook users were compromised when their information, scraped from the social media website, was published on a hacking forum. Given this reality, it has become extremely important to define the legal contours of data scraping.

LEGALLY MURKY?

Facebook’s lawsuit against Social Trading is not the first high-profile case dealing with the legality of data scraping. As early as 2000, the American e-commerce corporation eBay filed for a preliminary injunction against Bidder’s Edge, an auction data aggregator, seeking to prevent the latter’s use of bots to gather data from eBay’s website. By claiming that Bidder’s Edge had committed a ‘trespass to chattels’, the plaintiff successfully obtained the injunction. In 2001, however, when a travel agency sued a competitor for scraping pricing data from its website, the judge ruled that such scraping was not necessarily “unauthorised access” for the purposes of federal hacking laws. The mere fact that a “do not scrape us” clause was inserted in the website’s terms of service was not deemed enough to constitute a legally binding agreement. Since then, the legality of scraping has been the subject of numerous debates, implicating copyright laws and personal data regulations. Most recently, in the controversial case of hiQ Labs, Inc. v. LinkedIn Corp., the U.S. Court of Appeals ruled that any data that is publicly available and not copyrighted is fair game for web scrapers. In doing so, it opened a Pandora’s box of questions about the privacy rights of social media users and the rights of businesses to protect themselves from data hijacking. What is certain, however, is that the unauthorised scraping of non-public data is almost always illegal.

Meanwhile, in Europe, all personal data is protected under the General Data Protection Regulation (GDPR), irrespective of whether it is publicly available. In fact, a data analytics company in the EU was fined a hefty amount for scraping public data from the Polish business register. Although the fine was later overturned, the court explicitly upheld the prohibition on the scraping of publicly available data.

As far as India is concerned, there have been a few cases in which data scraping was classified as a violation of copyright law. For instance, in 2016, OLX, the online marketplace company, obtained a permanent restraining order against the data scraping practices of another firm. However, barring a few such IPR-related cases, the courts have not dealt with the legality of web scraping per se. At best, there is a provision in the Information Technology (IT) Act, 2000, which penalises the unauthorised extraction of data from a computer resource without the owner’s permission. It can always be argued, though, that this does not apply to the scraping of publicly available information; after all, information that is freely available or accessible in the public domain is excluded from the definition of sensitive personal data in the IT Act.

Owing to this legal uncertainty, the recent lawsuit against Social Trading will be watched keenly, not just in the U.S. but also in other jurisdictions. The importance of foreign precedents in shaping the trajectory of data governance cannot be ignored, especially in countries like India, where personal data protection laws are yet to be finalised.

DATA SCRAPING APPLICATIONS

MARKETING:

- Collating data from popular websites to create engaging content and for SEO monitoring
- Lead generation
- Reputation monitoring
- Competitive analysis

RETAIL:

- Competitor price monitoring
- Fetching product descriptions from multiple manufacturers
- Monitoring consumer sentiment
- Conducting market research
- Syncing product data from an e-commerce site to another online store

FINANCE:

- News aggregation to draw insights from external sources
- Market data aggregation
- Risk assessment
- Financial health assessment by processing financial statements

ACADEMIA:

- Collecting data for research and follow-up studies
- Content analysis

DATA SCIENCE:

- Predictive analysis
- Natural language processing, including sentiment analysis
- Collecting data for training machine learning models

Source: sciforce


Assessment

While data scraping may afford businesses the opportunity to make informed decisions and enhance customer satisfaction, it can also result in the wrongful exploitation or infringement of intellectual property. The unauthorised exposure of personal data is another significant risk. In this context, it becomes critical to bring legal clarity to this practice.

Even as the law tries to catch up, companies continue to have their data stolen and abused. The need of the hour, therefore, is to scale up investments in anti-bot and anti-scraping technology. Limiting the maximum number of access requests from a single IP address, encoding IP-protected content in non-extractable formats and recording server information for easy traceability can also go a long way in protecting data.
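The first of those defences, capping requests per IP address, can be sketched as a simple sliding-window limiter. The class, thresholds and IP addresses below are illustrative, not a recommendation for production use:

```python
import time
from collections import defaultdict, deque

class PerIPRateLimiter:
    """Allow at most `max_requests` per IP address within a sliding `window` (seconds).

    A hypothetical sketch of the 'limit requests per IP' defence; real deployments
    typically enforce this at the load balancer or web application firewall.
    """

    def __init__(self, max_requests=100, window=60.0):
        self.max_requests = max_requests
        self.window = window
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] > self.window:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the cap: likely a scraper bot, so throttle or block
        hits.append(now)
        return True

# Usage: three requests in quick succession are allowed, the fourth is rejected.
limiter = PerIPRateLimiter(max_requests=3, window=10.0)
results = [limiter.allow("203.0.113.7", now=t) for t in (0.0, 1.0, 2.0, 3.0)]
print(results)  # [True, True, True, False]
```

Limits like this raise the cost of bulk scraping without affecting ordinary visitors, though determined scrapers rotate IP addresses, which is why the article pairs this measure with traceability and content-encoding defences.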
