Insights February 2022


REINING IN DATA SCRAPING

As commercially driven data scraping becomes commonplace, better safeguards are necessary to maintain data integrity and protect intellectual property rights.

SYNERGIA FOUNDATION RESEARCH TEAM

Touted as the ‘new oil’, data has taken on commercial value as companies rely on statistical tools and data analysis to make operational decisions. Not surprisingly, the practice of web/screen scraping is gaining prominence in business circles. Data scraping means extracting the large volumes of data available in public spaces such as websites and programmes. The data is then used as a basic feed to predict industry trends and conduct market research, and at times for industrial espionage.

Despite its perceived benefits, the use of scraping software has not been without controversy. Most recently, this was demonstrated in a lawsuit filed by Meta Platforms Inc., Facebook’s new corporate identity. The tech conglomerate sued Social Trading Ltd, a Hong Kong-based social media analytics company, for allegedly scraping account profiles from Meta’s social media websites. The collected data was then reportedly sold as “demographics and insights about influencers and their audiences” to various interested parties. Meta considers this unauthorised scraping a breach of contract, the contract being the ‘Terms of Use’ of its family of apps. Social Trading was also accused of illegal hacking and unjust enrichment for circumventing the anti-scraping blocks imposed by Meta. Although the final judgement in this case is yet to be delivered, it has already revived the debate on the legality of data scraping.

Most businesses derive immense economic value from automated external data acquisition. From discovering new market needs to evaluating the competition, scraping technology helps them to overcome informational asymmetry in a highly data-driven world.

TECHNOLOGICAL ARCHITECTURE

The practice of data scraping has been around for quite some time. It came into prominence a few decades ago, when computing transitioned from being the province of a few to being ubiquitous. This shift meant that a considerable amount of data was left in old, difficult-to-access systems. In this setting, screen scraping emerged as a popular way to interface with older applications that had no data export capabilities. Today, the most common form of data scraping is web scraping, whereby information is extracted from a website using software or a script, mostly with the help of a scraper bot. The extracted information is then stored in formats such as SQL, Excel or HTML. There are many types of scrapers for different purposes. While some scrape data in bulk, others collect content on demand or gather structured data without human interaction. Data parsing forms a critical part of this technology. It goes hand in hand with web scraping, converting the raw data collected into readable text. In fact, most

