
akamai’s [state of the internet] /

Q4 2014 State of the Internet Security Report – Bots, Spiders and Scrapers: Selected Excerpts

As developers seek to gather, store and utilize the wealth of information available from other websites, third-party content bots and scrapers have become increasingly prevalent. These meta searches typically use APIs (Application Programming Interfaces) to access the data, but many now also use screen scrapers to collect information. These methods of obtaining this valuable data place an increased load on web servers. While bot behavior is benign for the most part, poorly coded bots can impact site performance, may resemble denial-of-service attacks, or may even be part of a rival's competitive intelligence program. Understanding the different categories of third-party content bots, how they affect a website, and how to mitigate their impact is an important part of building a secure web presence.

Akamai has observed bots and scrapers being used for a wide variety of purposes, including setting up fraudulent sites, analysis of corporate financial statements, search and metasearch engines, competitive intelligence gathering, and more.

Bots and scrapers can be divided into four categories, depending on their desirability and aggressiveness. Desirability is scored based on how much a site wants to host the bot. Aggressiveness is a function of the rate of requests and its impact on site availability.

Highly desired bots exhibiting low aggression are bots that help users find content. These bots, such as Googlebot, are generally well behaved – they respect robots.txt and don't make many requests at once. The second category comprises undesired, highly aggressive bots and scrapers; these bots may be benign but poorly coded, although this category also includes malicious bots intent on disrupting web servers. In 2014, Akamai observed a substantial increase in the number of these bots and scrapers targeting the travel, hotel and hospitality sectors, likely attributable to rapidly developed mobile apps that use scrapers as the fastest and easiest way to collect information from disparate websites. Highly desirable bots with high aggression, the third category, are more difficult to manage: they cannot be blocked completely, yet their aggressiveness can cause site slowdowns and latency. Finally, bots with low desirability and low aggression fall into the fourth category. These bots crawl a site's product pages with the intent to reuse the content on shadow sites for fraud or counterfeiting scams. More difficult to block, these bots often stay under the detection threshold of security products and try to blend in with regular user traffic through the use of headless browsers.
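The two-axis framework described above lends itself to a simple decision rule. The following is a minimal Python sketch, not taken from the report, of how a site operator might bucket observed clients into the four categories; the whitelist of desired user agents and the requests-per-minute threshold are illustrative assumptions.

# A minimal sketch of the desirability/aggressiveness classification described
# above. The whitelist and the requests-per-minute cutoff are illustrative
# assumptions, not values taken from the report.

DESIRED_AGENTS = {"Googlebot", "Bingbot"}   # bots this hypothetical site wants to serve
AGGRESSION_THRESHOLD_RPM = 120              # assumed requests-per-minute cutoff


def classify_bot(user_agent: str, requests_per_minute: float) -> str:
    """Place a client into one of the four categories described in the text."""
    desired = any(agent in user_agent for agent in DESIRED_AGENTS)
    aggressive = requests_per_minute > AGGRESSION_THRESHOLD_RPM

    if desired and not aggressive:
        return "desired / low aggression (well-behaved search crawlers)"
    if not desired and aggressive:
        return "undesired / high aggression (poorly coded or malicious scrapers)"
    if desired and aggressive:
        return "desired / high aggression (throttle rather than block)"
    return "undesired / low aggression (low-rate scrapers blending into user traffic)"


if __name__ == "__main__":
    print(classify_bot("Mozilla/5.0 (compatible; Googlebot/2.1)", 30))
    print(classify_bot("python-requests/2.2.1", 400))

In practice, desirability is a per-site policy decision, which is why the whitelist above is the part of such a rule most likely to change over time.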

Mitigation techniques vary depending on the classification of the bot; each type has a corresponding mitigation strategy. Akamai uses a wide variety of techniques to determine the owner and intent of a bot. For example, the volume of requests can help Akamai determine the bot's platform, and the sequence of pages a bot scrapes can reveal information about the bot's intent. Additionally, the User-Agent header can sometimes provide a unique and identifiable user agent – such as Googlebot, urllib or curl – and a Whois lookup can sometimes expose a bot's owner.
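As an illustration of those signals – request volume per client and self-identifying User-Agent strings – the following Python sketch summarizes a web server access log. It assumes the NCSA combined log format; the file name, the user-agent patterns and the request threshold are illustrative assumptions rather than anything Akamai is described as using.

# A minimal sketch: count requests per client IP and tally self-identifying
# bot user agents from a combined-format access log. Thresholds and patterns
# are assumed, not taken from the report.

import re
from collections import Counter

BOT_UA_PATTERN = re.compile(r"(googlebot|bingbot|curl|urllib|python-requests)", re.I)
REQUEST_THRESHOLD = 1000   # assumed per-log-window cutoff for "aggressive"
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d+ \S+ "[^"]*" "([^"]*)"')


def summarize(log_path: str) -> None:
    requests_per_ip = Counter()
    bot_agents = Counter()

    with open(log_path) as log:
        for line in log:
            match = LOG_LINE.match(line)
            if not match:
                continue
            ip, user_agent = match.groups()
            requests_per_ip[ip] += 1
            if BOT_UA_PATTERN.search(user_agent):
                bot_agents[user_agent] += 1

    print("Self-identified bot user agents:")
    for agent, count in bot_agents.most_common(10):
        print(f"  {count:6d}  {agent}")

    print("Clients above the request threshold (candidates for closer review):")
    for ip, count in requests_per_ip.most_common():
        if count < REQUEST_THRESHOLD:
            break
        print(f"  {count:6d}  {ip}")


if __name__ == "__main__":
    summarize("access.log")   # hypothetical combined-format log file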

Bots and scrapers will continue to be a problem for many organizations, regardless of industry. Developing a strategy to contain and mitigate the effects of undesirable bots should be part of the operations plan of every website. Whether using a defensive framework such as the one advocated by Akamai, or another method, it is important for each organization to evaluate which bots it will allow to access its site. A set of bots that is highly desirable for one organization may appear malicious to another, and the criteria can change over time. As an organization expands into new markets, a previously unwanted bot may become the key to sharing information. Frequent analysis and modification of security policies is key to mitigating the risks posed by bots and scrapers.

Get the full Q4 2014 State of the Internet – Security Report with all the details

Akamai produces a quarterly Internet security report. Download the Q4 2014 State of the Internet – Security Report for:

• Analysis of DDoS attack trends
• Bandwidth (Gbps) and volume (Mpps) statistics
• Year-over-year and quarter-by-quarter analysis
• Application layer attacks
• Infrastructure attacks
• Attack frequency, size and sources
• Where and when DDoSers strike
• Spotlight: A multiple TCP Flag DDoS attack
• Malware: Evolution from cross-platform to destruction
• Botnet profiling technique: Web application attacks
• Performance mitigation: Bots, spiders and scrapers

The more you know about cybersecurity, the better you can protect your network against cybercrime. Download the free Q4 2014 State of the Internet – Security Report at http://www.stateoftheinternet.com/security-reports today.




About stateoftheinternet.com

StateoftheInternet.com, brought to you by Akamai, serves as the home for content and information intended to provide an informed view into online connectivity and cybersecurity trends as well as related metrics, including Internet connection speeds, broadband adoption, mobile usage, outages, and cyber-attacks and threats. Visitors to www.stateoftheinternet.com can find current and archived versions of Akamai's State of the Internet (Connectivity and Security) reports, the company's data visualizations, and other resources designed to help put context around the ever-changing Internet landscape.
