WEB MINING AGAINST PEDOPHILIA

Research Paper

Computer Science

E-ISSN No : 2454-9916 | Volume : 3 | Issue : 5 | May 2017

Shweta Macwan 1 | Dr. inż. Grzegorz Filcek 2

1 Student, Information Technology, Wroclaw University of Science and Technology, Wroclaw, Poland – 50-370.
2 Assistant Professor, Information Technology, Wroclaw University of Science and Technology, Wroclaw, Poland – 50-370.

ABSTRACT
Security on the web is a foremost necessity, and handling cybercrime is a priority. The growing popularity of social media has led children to use the internet more for social communication than for information gathering. Children need to learn and grow with technology, but their safety must also be ensured. Pedophiles hunt for children on social media and chat room platforms that are not safe for them. Without parental guidance, such contact leads to cybercrimes that children are not aware of. Social media is not the only area where pedophilic activities take place; queries submitted to a search engine may also help in detecting a pedophile. The main idea here is to catch pedophiles through the conversations they hold with a child, detecting them by the pattern of words and language used by an adult. In addition, pedophilic activity can be traced by detecting suspicious search engine queries.

KEYWORDS: Web mining, Web content mining, Pedophile, cyber-crime, cyberpedophilia, pedophilic activity.

INTRODUCTION
Protecting children in cyberspace is an extremely critical problem faced by our society across geographical and cultural boundaries. As more and more children in their teens have started using the Internet, there has been an alarming increase in cases of child abuse through the Web.[1] According to a report published by the National Center for Missing and Exploited Children (NCMEC), 1 out of 7 kids is solicited for sex online, 1 out of 33 kids receives aggressive online solicitation to meet in person, and 1 out of 3 kids receives unsolicited sexual content online.[2] The Internet now gives predators and criminals easy and convenient access. Parents, on the other hand, do not track how their children are using the Internet. The lack of attention from parents, combined with the criminal intentions of some people, gives rise to cybercrime against children.
Pedophiles are people with a psychiatric disorder who are sexually attracted to prepubescent children. Today, cybercrime such as pedophilic activity is a major concern; when carried out online, this activity is termed cyberpedophilia. Children are not safe on the internet, and there is a dire need for an internet space that is safe for them. It has always been recommended that parents monitor their children's activities on the internet: what they post, what they see, whom they chat with and what kind of messages they receive.

The World Wide Web is an architectural framework for accessing linked documents spread out over millions of machines all over the Internet. With Internet usage gaining popularity and the number of users growing steadily, the World Wide Web has become a huge repository of data and serves as an important platform for the dissemination of information. The data available on the web is termed web data, and the process of mining it is termed web mining. Web mining thus combines data mining with the World Wide Web: it is the use of data mining techniques to automatically discover and extract useful information and patterns from web documents and services. The most commonly used techniques are association rules, classification, clustering and sequential pattern identification.
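To make one of the techniques named above concrete, the following minimal sketch computes the two standard measures behind association rules, support and confidence, over a handful of browsing sessions. The session data and the function names are hypothetical and chosen only for illustration; the paper does not prescribe a particular implementation.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical sessions: each set holds the pages one visitor opened.
sessions = [
    {"home", "news", "chat"},
    {"home", "chat"},
    {"home", "news"},
    {"news", "sports"},
]

# Rule under test: visitors who open "home" also open "chat".
rule_support = support(sessions, {"home", "chat"})
rule_confidence = confidence(sessions, {"home"}, {"chat"})
print(rule_support, rule_confidence)
```

A rule is usually kept only when both measures exceed thresholds chosen by the analyst; those thresholds are a design decision and are not specified here.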

Taxonomy of web mining:

A. Web Structure Mining
Web structure mining is the process of using graph theory to analyze the node and connection structure of a web site. A typical web graph consists of web pages as nodes and hyperlinks as edges connecting two related pages. Preprocessing for this type of mining involves identifying interesting graph patterns or processing the whole web graph to derive matrices; the most common example is PageRank. Such mining can be done at the intra-page document level or at the inter-page hyperlink level. Major uses of this technique include the PageRank algorithm, the HITS (Hubs and Authorities) algorithm, Information Scent, etc. It yields useful information such as the quality of a web page, interesting web structures and web page classification.

B. Web Usage Mining
Web usage mining is the process of applying data mining techniques to the discovery of usage patterns from web data. The data available on the web is not only huge but also semi-structured. The browsing history of the user is stored in log files, which can be mined for interesting patterns; these may be server logs, proxy logs or browser logs. The log files hold a lot of information such as URLs, IP addresses, time, date, etc. When people visit a website, they leave data such as their IP address, the pages visited and the time of the visit; web usage mining collects, analyzes and processes these logs and recorded data.[3] This technique is widely used in e-commerce, web transactions, path and pattern discovery, pattern analysis and many more.
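The log-based view of web usage mining described above can be sketched as follows. This is a minimal illustration that assumes the server writes entries in the widely used Common Log Format; the sample entries, the IP addresses (taken from the documentation range 192.0.2.0/24) and the helper names are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Common Log Format: host ident user [time] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def parse_log(lines):
    """Turn raw log lines into dictionaries, skipping malformed lines."""
    records = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            records.append(match.groupdict())
    return records


def pages_per_visitor(records):
    """Group successfully served URLs by visitor IP, in visiting order."""
    visits = defaultdict(list)
    for record in records:
        if record["status"] == "200":
            visits[record["ip"]].append(record["url"])
    return dict(visits)


# Hypothetical log entries used only for illustration.
sample_log = [
    '192.0.2.1 - - [10/May/2017:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 1024',
    '192.0.2.1 - - [10/May/2017:10:00:09 +0000] "GET /chat.html HTTP/1.1" 200 2048',
    '192.0.2.7 - - [10/May/2017:10:01:30 +0000] "GET /index.html HTTP/1.1" 200 1024',
    '192.0.2.7 - - [10/May/2017:10:01:42 +0000] "GET /missing.html HTTP/1.1" 404 512',
    'malformed line',
]

records = parse_log(sample_log)
paths = pages_per_visitor(records)
page_counts = Counter(r["url"] for r in records if r["status"] == "200")
print(paths)
print(page_counts.most_common())
```

The per-visitor page sequences are the raw material for the path and pattern discovery mentioned above; real logs would additionally need session splitting and robot filtering, which this sketch leaves out.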

Web data usually take the following forms: web content, which includes text, images, structured records, videos, audio files, etc.; web structure, which includes hyperlinks, document structure and tags; and web usage, which includes web server logs, application server logs and application-level logs.

For any type of mining, the most important step is preprocessing. Data preprocessing is the step in which raw data is processed so that the extracted data is useful for mining knowledge. It proceeds in several levels: selecting the target data from the raw data, extracting data from it, and transforming the processed data to obtain knowledge. The processing includes cleaning noisy data, data integration, data transformation, data reduction and data discretization.

There are three types of web mining techniques, distinguished by the usage and the type of knowledge to be mined and extracted: web usage mining, web structure mining and web content mining.
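Three of the preprocessing operations listed above (cleaning, transformation and discretization) can be sketched on a toy data set as follows. The records, the field names and the bin boundaries are assumptions made for illustration only; they are not taken from the paper.

```python
def clean(records):
    """Cleaning: drop records whose duration is missing or negative."""
    return [
        r for r in records
        if r.get("seconds") is not None and r["seconds"] >= 0
    ]


def normalise(records):
    """Transformation: min-max scale 'seconds' into the range [0, 1]."""
    values = [r["seconds"] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    return [{**r, "scaled": (r["seconds"] - lo) / span} for r in records]


def discretise(records, bins=(("short", 1 / 3), ("medium", 2 / 3), ("long", 1.0))):
    """Discretization: map each scaled value to the first bin that covers it."""
    labelled = []
    for r in records:
        label = next(name for name, upper in bins if r["scaled"] <= upper)
        labelled.append({**r, "bucket": label})
    return labelled


# Hypothetical page-view durations, including two noisy entries.
raw = [
    {"page": "/index.html", "seconds": 10},
    {"page": "/chat.html", "seconds": 100},
    {"page": "/news.html", "seconds": None},
    {"page": "/help.html", "seconds": -5},
    {"page": "/about.html", "seconds": 55},
]

prepared = discretise(normalise(clean(raw)))
for row in prepared:
    print(row["page"], row["bucket"])
```

Each stage returns new records rather than modifying its input, so the stages can be reordered or replaced independently when a different data set calls for other cleaning rules or bin boundaries.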

Fig. 1 Taxonomy of Web Mining

C. Web Content Mining
Web content mining is the process of discovering useful information from text, image, audio or video data on the web. Information retrieval is the basic means for any information gathering technique; it helps the user find specific information in a large set of data.[4] This technique is mainly used for Natural

Copyright© 2016, IERJ. This open-access article is published under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License which permits Share (copy and redistribute the material in any medium or format) and Adapt (remix, transform, and build upon the material) under the Attribution-NonCommercial terms.

International Education & Research Journal [IERJ]


Published on Jun 11, 2018

International Education and Research Journal
