10 Key Data Mining Challenges in NLP and Their Solutions

Page 1

Key Data Mining Challenges in NLP and Their Solutions


Overview Even as we grow in our ability to extract vital information from big data, the scientific community still faces roadblocks that pose major data mining challenges. In this article, we will discuss 10 key issues that we face in modern data mining and their possible solutions.


1. Heterogeneous Data Data can be of low quality, adulterated, and incomplete. That’s why, apart from the complexity of gathering data from different data warehouses, heterogeneous data types (HDT) are one of the major data mining challenges. This is mostly because big data comes from different sources, may be automatically accumulated or manual, and can be subject to various handlers.


2. Scattered Data One of the most prominent data mining challenges is collecting data from platforms across numerous computing environments. Storing copious amounts of data on a single server is not feasible, which is why data is stored on local servers. This is the case with most large-scale organizations. In fact, it is something we ourselves faced while data munging for an international health care provider for sentiment analysis. Scattered data could also mean that data is stored in different sources such as a CRM tool or a local file on a personal computer. This situation often presents itself when an organization may want to analyze data from multiple sources such as Hubspot, a .csv file, and an Oracle database. Companies are also looking at more non-traditional ways to bridge the gaps that their internal data may not fill by collecting data from external sources.


3. Data Ethics Data mining challenges involve the question of ethics in data collection to quite a degree. This is different from data privacy. For example, there may not be express permission from the original source of the data from where it is collected, even if it is on a public platform like a social media channel or a public comment on an online consumer review forum. For example, an e-commerce website might access a consumer’s personal information such as location, address, age, buying preferences, etc., and use it for trend analysis without notifying the consumer. The question becomes whether or not it is OK to mine personal data even if for the seemingly straightforward purpose of building business intelligence.


4. Data Privacy Data privacy is a serious issue that arises in data collection, especially when it comes to social media listening and analysis. Social media organizations are under the spotlight even more so because of the Cambridge Analytica/Facebook fiasco, which ultimately led to the former filing for bankruptcy, and the latter paying a $5 billion fine to the U.S. government for data privacy violations. Because of this ongoing scrutiny, many social media platforms including Facebook, Snapchat, and Instagram have tightened their data privacy regulations. And this has proven to pose data mining challenges for social sentiment analysis.


5. Data Privacy Data privacy is a serious issue that arises in data collection, especially when it comes to social media listening and analysis. Social media organizations are under the spotlight even more so because of the Cambridge Analytica/Facebook fiasco, which ultimately led to the former filing for bankruptcy, and the latter paying a $5 billion fine to the U.S. government for data privacy violations. Because of this ongoing scrutiny, many social media platforms including Facebook, Snapchat, and Instagram have tightened their data privacy regulations. And this has proven to pose data mining challenges for social sentiment analysis.


Understand your data, customers, & employees with 12X the speed and accuracy.

Thank you! Visit: www.repustate.com to learn more


Turn static files into dynamic content formats.

Create a flipbook
Issuu converts static files into: digital portfolios, online yearbooks, online catalogs, digital photo albums and more. Sign up and create your flipbook.