OBUDA UNIVERSITY
Bánki Donát Faculty of Mechanical and Safety Engineering Institute of Mechatronics and Vehicle Engineering
Current and Future Trends in Data Analysis for Engineering Applications
OE-BGK
Student’s Name: Patricio
López Sánchez Álvaro
2017
Registration Number:
T006057/FI12904/B
Donát Bánki Faculty of Mechanical and Safety Engineering Institute of Mechatronics and Vehicle Engineering
THESIS
Student’s surname, forename(s):
Registration number:
Thesis number:
Neptun code:
Branch of study, specialization: Mechanical and Safety Engineering, MSc Mechatronics Engineering
The proposed title of the thesis:
Task description:
1.
2.
3.
Institutional consultant's name:
External consultant’s name and workplace:
The limitation period of the theme issued:
Subjects of final examination:
Issued: Budapest,
PH
....................................................... Head of Institute
The Thesis is suitable for submission: ....................................................... Institutional consultant
OBUDA UNIVERSITY Bánki Donát Faculty of Mechanical and Safety Engineering Institute of Mechatronics and Vehicle Engineering
DECLARATION OF STUDENT
The undersigned student hereby declares that the thesis is his own work and that the literature and tools used can be identified. The results achieved in the thesis may be used free of charge for the purposes and tasks of the university awarding institution, subject to any restrictions on encryption.
Budapest, May 15th, 2017
......................................... López Álvaro
DEDICATION
I dedicate this work to my loving son Jose David
ACKNOWLEDGEMENT
Firstly, I would like to express my sincere gratitude to my advisor, Prof. Amir Mosavi, for his continuous support, patience, motivation, and immense knowledge. His guidance helped me throughout the research and writing of this thesis. Besides my advisor, I would like to thank my university, “ÓBUDA UNIVERSITY”, for giving me the opportunity to study my Master's program. My sincere thanks also go to my family for their continuous and unparalleled love, help, and support. I am grateful to my brother for always being there for me as a friend. I am forever indebted to my parents for giving me the opportunities and experiences that have made me who I am. They selflessly encouraged me to explore new directions in life and seek my own destiny. This journey would not have been possible without them. I am grateful to my best friend, Fernanda Quezada, for all her knowledge, advice, and time. Finally, it is a pleasure to thank Stefany Cevallos, who is an important person not just within this process but also in my life; thank you for your love and support.
ABSTRACT
Nowadays big-data analytics has become an important tool in different engineering fields. Its flexibility gives it a constantly increasing scope across several applications. Among the advantages of this approach are time optimization, real-time decision making, modeling, and prediction, which have made it possible to find more accurate and feasible solutions to current engineering problems. Moreover, there are several new big-data applications within industry. On the one hand, in fields like Telecommunications, Information Technology (IT), Industry 4.0, Mechanical Engineering, and others, a huge amount of data is constantly generated, and the question of how to manage it is still open. On the other hand, in fields such as Government, Business, Banking, Health, and Education, the pursuit of novel applications is producing a significant amount of data as well. Hence, this research addresses the impact of big-data analytics in the aforementioned fields. Furthermore, the work presents how these fields have adopted big-data analytics within their processes, and the advantages attached to the use of those techniques. Finally, this survey shows the future trends of big-data analytics and its technical challenges.
Keywords: Big Data; Data Mining; Big Data Applications; Big Data Research Trends
INDEX
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
1 INTRODUCTION TO BIG DATA ANALYSIS
1.1 Introduction
1.2 Big Data
1.2.1 Big Data Sets
1.2.2 Properties
1.2.3 Architecture
1.2.4 Hardware, Software and Algorithms
1.3 Big Data and Decision Making
1.4 Current Problems and Research Trends of Big Data Analysis
2 BIG DATA FOR ENGINEERING APPLICATIONS
2.1 Big Data in Telecommunications and Information Technology (IT)
2.1.1 Fifth Generation (5G)
2.1.2 Mobile Networks
2.1.3 Analysis in Network Traffic
2.1.4 Internet of Things (IoT) technology
2.2 Big Data in Government, Health, and Education
2.2.1 Healthcare
2.2.2 Education
2.2.3 Government Sector
2.3 Big Data in Electric Power Systems
2.4 Big Data in Mechanical Engineering
2.4.1 Electric Vehicle Design
2.4.2 Analysis of Traffic Systems
2.5 Big Data in Business
2.5.1 Supply Chain and Business Intelligence (BI)
2.5.2 Big Data and Banking Customers Analytics
2.6 Big Data in Meteorology and Agricultural Science
2.6.1 Weather Forecasts
2.6.2 Agriculture
3 BIG DATA AND FUTURE CHALLENGES AND TRENDS
3.1 Challenges
3.1.1 Fundamental Problems
3.1.2 Standardization
3.1.3 Big Data Computing Modes
3.2 Big Data Development
3.2.1 Format Data Conversion
3.2.2 Data Transmission
3.2.3 Real-Time Analytics
3.3 Big Data Security
3.3.1 Big Data Privacy
3.3.2 Data Quality
3.3.3 Big Data Encryption
3.4 Big Data and the New Thinking
3.5 Big Data Analysis in Managing Large-scale Flow-Table for Software-Defined Networking
3.6 5G Wireless Networks and Big Data
4 DISCUSSION
CONCLUSION
REFERENCES
LIST OF TABLES
Table 1. BIG DATA ORIGIN AND TARGET USE DOMAINS (Demchenko, De Laat and Membrey, Defining Architecture Components of the Big Data Ecosystem)
Table 2. Hardware specifications for Big Data analysis
Table 3 Big Data in Telecommunications and Information Technology (IT)
Table 4 Big Data in Health
Table 5 Big Data in Education
Table 6 Big Data in Government
Table 7 Big Data in Electric Power Systems
Table 8 Big Data in Mechanical Engineering
Table 9 Big Data in Business
Table 10 Big Data in Meteorology and Agricultural Science
LIST OF FIGURES
Figure 1 Big Data sources (Michalik, Štofa and Zolotová)
Figure 2 Big Data Properties
Figure 3 Framework for data mining using Big Data (Sowmya and Suneetha)
Figure 4 Hadoop with HDFS and Map-Reduce (Trnka)
Figure 5 The blind men and the giant elephant: the localized (limited) view of each blind man leads to a biased conclusion (Wu, Zhu and Wu)
Figure 6 Number of researchers per year researches in Telecommunications and Information Technology (IT)
Figure 7 Number of researchers per year researches in Health Care
Figure 8 Number of researchers per year researches in Education
Figure 9 Number of researchers per year researches in Government
Figure 10 Big Data processing and analysing platform for electric power system condition monitoring (Guo, Feng and Li, Big data processing and analysis platform for condition monitoring of electric power system)
Figure 11 Number of researchers per year researches in Electric Power Systems
Figure 12 Range estimation framework block diagram (Rahimi-Eichi and Chow)
Figure 13 The Research Model (Jeon, Lee and Cho)
Figure 14 Number of researchers per year researches in Mechanical Engineering
Figure 15 Architecture of ICARE Solution (Sun, Morris and Xu)
Figure 16 Number of researchers per year researches in Business
Figure 17 Number of researchers per year researches in Meteorology and Agricultural Science
Figure 18 A general Big Data architecture (Chen, Mao and Zhang)
Figure 19 Big Data Challenges
LIST OF ABBREVIATIONS
1. CERN - European Organization for Nuclear Research (operator of the Large Hadron Collider)
2. DVD - Digital Versatile Disc
3. GPU - Graphics Processing Unit
4. GB - Gigabyte
5. GHz - Gigahertz
6. HDFS - Hadoop Distributed File System
7. RDDs - Resilient Distributed Datasets
8. DFS - Distributed File System
9. 5G - Fifth Generation
10. 4G - Fourth Generation
11. ISP - Internet Service Provider
12. DDoS - Distributed Denial of Service
13. IoT - Internet of Things
14. IT - Information Technology
15. HDL - High-density Lipoprotein
16. CDC - Centers for Disease Control and Prevention
17. IBM - International Business Machines
18. SCADA - Supervisory Control and Data Acquisition
19. EV - Electric Vehicles
20. EKF - Extended Kalman Filter
21. ITS - Intelligent Traffic Systems
22. BI - Business Intelligence
23. NWP - Numerical Weather Prediction
24. GPS - Global Positioning System
25. SDN - Software-Defined Networking
1 INTRODUCTION TO BIG DATA ANALYSIS
1.1 Introduction
Over the last few years, data has grown on a large scale. Society generates data every single day through activities in many fields, and the rate of data generation has increased dramatically. This brings new challenges for storing data, processing it, and extracting useful information from big datasets. The goal is to find better solutions for a given process through the value or information hidden in a huge dataset. A new term for such datasets has emerged: Big Data. It mainly describes a huge dataset generated at a high rate, with diverse features, hidden values, and important patterns. Compared to traditional datasets, Big Data includes a significant amount of structured and unstructured data, which demands more time and more complex resources to analyze (Chen et al, 2014). Before continuing, it is worth understanding how data was generated historically and how it is generated nowadays. Historically, data was generated by workers: employees have entered data into computer systems since information technology was introduced in the workplace. Several years later, when the Internet appeared, users began to generate their own data simply by navigating different websites. This caused an exponential change in the amount of data generated. According to (Sagiroglu & Sinanc, 2013), 5 exabytes of data were created up to 2003. This amount is far larger than what was generated when only employees entered data. Nowadays that amount of data is created every two days. In 2012, the 2.72 zettabytes of data generated gave rise to new applications and platforms, and this volume is predicted to double every two years (Center, Intel IT, 2012). However, data has no value without analysis.
The value of big datasets comes from their analysis, which uncovers correlations and patterns (Ho et al, 2016). For instance, users generate data about their needs and preferences when they stream videos, send e-mails, share pictures, play games, and make in-app purchases. People are therefore awash in a huge amount of data that they generate themselves. Although the amount of data is massive and analyzing it can be complex, there are many advantages and opportunities behind it, and statistical analysis of Big Data has become important (Fan et al, 2014). Big Data is present in many scenarios. Fields such as business, public services, social networks, and industry need to analyze their data in order to better understand their environment. Big Data helps companies enhance their processes and makes decision making easier. For industry, Big Data analysis can provide new insights that enhance productivity. Although there are many problems, such as capturing, analyzing, searching, sharing, transferring, and visualizing data and protecting privacy, managers are working with good
technological tools and are finding an important ally in Big Data analysis (Sri & Anusha, 2016). While Big Data is affecting different aspects of human endeavor, there are challenges in building Big Data applications (Noorwali & Arruda, 2016). In nuclear research, the Large Hadron Collider (LHC) at CERN produces 15 petabytes of data annually, enough to fill more than 1.7 million dual-layer DVDs every year. YouTube is used heavily for both uploading and viewing: a conservative estimate is that 100 hours of video are uploaded every minute, while 135,000 hours are watched (Mohanty et al, 2015). The importance of Big Data analysis rests on the introduction of new processes, new technologies, and new skills for finding the potential value in data, so that the crucial data can be processed and analyzed. The main point is that patterns and tendencies can be found in the data collected from a specific field. People look for trends in enormous quantities of data and run analytics on it to discover what is happening. They also look for attributions, because it is necessary to find correlations; in other words, what happened and why. To answer these two questions, algorithms can be used to make decisions, predict the future, and gain insight into the relationship between features and response (Fan & Lv, 2008). To get started with Big Data, it is necessary to think in terms of analytics, attribution, and algorithms (Yin & Kaynak, 2015). Human social activities can be analyzed by algorithms; after an algorithm is run, a set of attributions appears. Previously this procedure was based on guesswork, and now it can be based on the data itself. Furthermore, Big Data and its analysis will bring a new way of life for people and companies, because they are able to better understand their business, customers, and products.
This understanding can improve efficiency in sales, cost, customer service, etc. Big Data analysis offers companies the opportunity to make better decisions (Wielki, 2013). Data is an important source of benefits and gives a competitive advantage: it allows companies to develop new business models and become more competitive through data analysis. Companies such as Google, Amazon, and Facebook have adopted these data analysis methods and increased their profits. Data is gathered through cookies or service logs, stored on servers, and analyzed to understand customers' needs (Mohanty et al, 2015). The amount of data to be analyzed varies with the company's size. For instance, small companies deal with data on the scale of gigabytes, and big companies need to process data on the scale of 10 terabytes, whereas companies like Facebook, Yahoo, and others deal with data in petabytes. Google analyzes the clicks, links, and content across a trillion page views daily. This is a big advantage because it reveals people's tendencies and behavior (Wu et al, 2014). The process of Big Data analysis can be divided into four main activities: data generation, data acquisition, data storage, and data analytics. Each activity presents challenges due to the heterogeneous and complex behavior of the dataset (Latinović et al, 2016). Once data has been gathered, the first step is to store it. Traditional storage
technologies are obsolete because they do not offer any processing techniques; the challenge is to process the data. Thus, Big Data analysis redefines traditional storage methods (Femminella et al, 2016). Big Data analysis does not conflict with traditional data analysis; rather, it has the flexibility to be applied in different fields (He et al, 2015). According to (Chiang, 2015), Big Data consists of high-volume, high-velocity, high-variety, and high-complexity datasets, which may contain structured and unstructured data. It requires powerful computer systems and innovative algorithms that can deliver insight in shorter periods of time. During the last ten years new algorithms have been developed and the processing capacity of computers has improved. The abundance of data opens up the possibility of developing new analysis tools (Kitchin, 2014). Computational science is also engaging with Big Data analysis, and new algorithms and optimizations are being developed (Richtárik & Takác, 2016). Big Data analysis is an opportunity for a storm of new technologies, and Big Data analytics will become a determining factor in decision making in companies around the world. It is becoming a main part of the IT department in companies. Managers will invest more and more resources to improve the skills of employees and create improved software and hardware. Regarding decision making, Big Data analysis will be the next challenge for innovation, competition, and productivity. Many solutions will appear to address the issues involved in this context (Dobre & Xhafa, 2014). Further, customers always expect new capabilities and services from companies.
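The magnitudes quoted in this section can be checked with a few lines of arithmetic. The following Python sketch is only illustrative: the 8.5 GB capacity of a dual-layer DVD and the decimal unit sizes are assumptions that do not come from the cited sources, and the function names are hypothetical.

```python
# Back-of-the-envelope checks for the figures quoted in the introduction.
# Assumptions (not from the cited sources): decimal units and an 8.5 GB
# capacity for a dual-layer DVD.

GB = 10**9   # bytes in a gigabyte (decimal convention)
PB = 10**15  # bytes in a petabyte (decimal convention)


def dvds_needed(total_bytes: float, dvd_bytes: float = 8.5 * GB) -> float:
    """Return how many DVDs of the given capacity hold total_bytes."""
    return total_bytes / dvd_bytes


def projected_volume(start_zb: float, years: float,
                     doubling_years: float = 2.0) -> float:
    """Return the volume in zettabytes after `years`, assuming the volume
    doubles every `doubling_years` years."""
    return start_zb * 2 ** (years / doubling_years)


# 15 petabytes per year expressed in dual-layer DVDs
print(f"{dvds_needed(15 * PB):,.0f} DVDs per year")

# 2.72 zettabytes in 2012, doubling every two years
for offset in (0, 2, 4, 6):
    print(2012 + offset, f"{projected_volume(2.72, offset):.2f} ZB")
```

Under these assumptions the DVD count comes out above the 1.7 million quoted in the text, and the doubling rule multiplies the 2012 volume by four within four years.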
1.2 Big Data
Big Data has become an important topic discussed by many researchers. Before defining Big Data, it is necessary to distinguish data from information. Data are structured and unstructured elements that, in a suitable form, can be processed by people and computers. Data have no particular meaning before processing; they become information only when they acquire meaning (Latinović et al, 2016). Big Data is a concept about digital information on a large scale, so a suitable definition is needed. According to (Elarabi et al, 2016), Big Data is a large collection of heterogeneous datasets that traditional databases or software cannot process because of the amount and nature of the data. In terms of technologies and hardware architecture, the challenge of Big Data is to extract information with low latency from a large volume and wide variety of data in a complex dataset (Anshari et al, 2016). Another important issue is that, because Big Data comes from different sources, a dataset does not contain only structured data; most of the data is unstructured. A complex architecture is needed to analyze it (Xinhua et al, 2013). Understanding the nature of Big Data, its features, and its trends opens the possibility of developing new technologies, architecture models, and algorithms. Another definition focuses on dataset features and properties: “Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making” (Demchenko et al, 2014). The Volume, Velocity, Variety, and Complexity of Big Data continue to be an important
challenge for computer systems and algorithms (Gadepally et al, 2015). Every day, fields such as science, telecom, industry, business, and social media networks demand new skills for Big Data analysis. Table 1 shows the origin domains of Big Data and its target use domains. It is important to understand the relationship between origin and target.
Table 1. BIG DATA ORIGIN AND TARGET USE DOMAINS (Demchenko et al, 2014)
Big Data Origin: Science; Telecom; Industry; Business; Living Environment, Cities; Social media and Networks; Healthcare
Big Data Target Use: Scientific Discovery; New technologies; Manufacturing, process control, transport; Personal services, campaigns; Living environment support; Human Behavior; Healthcare Support
Big Data provides opportunities for many applications in different areas. Within the framework for Big Data, there are three important issues: Data Acquisition, Data Processing, and Data Services. These are abstraction levels, which should be characterized by efficiency, processing time, flexibility, and scalability (Sowmya & Suneetha, 2017). To show the features and components of Big Data, the Gartner definition gives a better view: “Big Data Technologies are targeting to process high-volume, high-velocity, high-variety data to extract intended data value and ensure high-veracity of original data and obtained information that demand cost-effective, innovative forms of data and information processing for enhanced insight, decision making, and processes control; all of those demand new data models and new infrastructure services and tools that allow obtaining data from a variety of sources and delivering data in a variety of forms to different data and information consumers and devices.” Big Data is not just about scale and volume; it has four important properties to be analyzed: Velocity, Volume, Variety, and Complexity. Velocity refers to how fast data is generated or transmitted. Volume refers to the amount of data generated in a period of time. Complexity, the last feature, refers here to the variety of the data, in other words the types of data (Jati et al, 2016).
1.2.1 Big Data Sets
According to (Srivastava & Chaudhari, 2016), there are unstructured and structured datasets. With respect to the format of the data, it is possible to differentiate these two main groups.
1.2.1.1 Structured Data
A structured dataset is generally a group of data with a defined length and format. This type of data is collected from traditional sources. Examples of structured data are numbers, dates, and groups of words and numbers called strings. Although this kind of data is common in different fields, it represents only 20 percent of all data (Hurwitz et al, 2013). According to (Kumar et al, 2014), a structured dataset can be obtained by processing unstructured data. Transforming data into a structured schema helps to ensure good performance, and pre-processing techniques such as filtering help to obtain a structured dataset. Although new sources keep appearing, even in real time and in large volume, two kinds of sources can be defined for structured datasets. The first are computers or machines, whose data is generated without human intervention; examples are sensors, web logs, and financial transactions. The second are humans, that is, all the information generated by human interaction. One example is input data, which arises when people enter data into computing systems. Another is click-stream data, generated by the clicks that a link on a website receives (Hurwitz et al, 2013).
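As an illustration of how filtering and parsing can turn raw text into a structured dataset, the following Python sketch converts web-log lines into records with a fixed schema. The log layout, the field names, and the helper functions are hypothetical and are not taken from the cited works.

```python
import re
from typing import Iterable, List, Optional

# Hypothetical log layout: "<date> <client IP> <HTTP method> <path> <status>"
LOG_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<ip>\S+) "
    r"(?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3})$"
)


def parse_line(line: str) -> Optional[dict]:
    """Return a structured record, or None if the line does not fit the schema."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])  # typed field of fixed format
    return record


def to_structured(lines: Iterable[str]) -> List[dict]:
    """Filtering step: keep only the lines that could be parsed."""
    parsed = (parse_line(line) for line in lines)
    return [record for record in parsed if record is not None]


raw_lines = [
    "2017-05-15 10.0.0.7 GET /index.html 200",
    "### corrupted entry ###",
    "2017-05-15 10.0.0.9 POST /login 403",
]
records = to_structured(raw_lines)
for record in records:
    print(record)
```

Each surviving record has the same fields with a defined format, which is the property the text attributes to structured data; the corrupted line is filtered out.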
1.2.1.2 Unstructured Data
Unstructured data includes text, XML, e-mail, images, video, etc. (Islam & Islam, 2014). According to the definition in (Hurwitz et al, 2013), unstructured data does not have a specified format. Moreover, in the real world 80 percent of data is unstructured, which makes it the most common type of data around the world. It is difficult to process because it needs exhaustive analysis. Both unstructured and structured data are generated either by machines or by humans, as shown in Figure 1. Several sources generate unstructured data. Machines such as satellites generate it: satellite images include weather data and maps such as those in Google Earth. Photographs and video recordings from security systems or traffic cameras are unstructured data as well. Thus, all data that involves videos or pictures can be considered unstructured. Scientific data such as seismic images, atmospheric data, and high-energy physics data are also unstructured. Human beings generate unstructured data as well. Files such as documents, logs, e-mails, and company information represent an important share of the information in the world. Further, social media data, mobile data, and website content are important sources of unstructured data today. Unstructured data forms huge datasets that are challenging to examine with conventional tools. Nowadays, there are several tools to examine Big Data and handle unstructured data (Reshmy & Paulraj, 2015).
Figure 1 Big Data sources (Michalik et al, 2014).
1.2.2 Properties
Before going further, it is important to define the fundamental characteristics of Big Data: Velocity, Volume, Variety, and Complexity, as it is shown in Figure 2.
Figure 2 Big Data Properties.
1.2.2.1 Velocity
Velocity is the speed at which data is collected or generated. In other words, it is the speed of data flowing into the system (Leung et al, 2016). It is important to consider that velocity increases with the number of sources.
According to (Wan & Alagar, 2016), velocity is also characterized by three aspects. The first is data in motion, for instance the data taken by a sensor. This data flows at different speeds, and its handling should be reactive and adaptive. The second is the speed at which data is generated and at which it must be stored and retrieved. This speed has an impact on volume: when the speed increases, the volume increases as well. The third is the speed at which data moves within the system. It is necessary to know the lifetime of the data and its utility.
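The relation between velocity, the number of sources, and volume can be made concrete with a small calculation. The Python sketch below is a simplified model under stated assumptions: every source emits records of a fixed size at a constant rate, and the source names and numbers are invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Source:
    """A hypothetical data source emitting fixed-size records at a constant rate."""
    name: str
    records_per_second: float
    bytes_per_record: int


def ingest_rate(sources: Iterable[Source]) -> float:
    """Velocity of the whole system in bytes per second (sum over all sources)."""
    return sum(s.records_per_second * s.bytes_per_record for s in sources)


def accumulated_volume(sources: Iterable[Source], seconds: float) -> float:
    """Volume in bytes accumulated over a time window at constant velocity."""
    return ingest_rate(sources) * seconds


sensors = [
    Source("temperature", records_per_second=10, bytes_per_record=64),
    Source("vibration", records_per_second=1000, bytes_per_record=16),
]
with_camera = sensors + [Source("camera", records_per_second=25,
                                bytes_per_record=200_000)]

seconds_per_day = 24 * 60 * 60
print("bytes/s, sensors only:", ingest_rate(sensors))
print("bytes/s, with camera:", ingest_rate(with_camera))
print("bytes/day, sensors only:", accumulated_volume(sensors, seconds_per_day))
```

Adding a source raises the ingest rate, and a higher rate over the same time window yields a larger volume, which matches the two statements made in the text.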
1.2.2.2 Volume Volume refers to the amount of data generated, that is, it focuses on the quantity of data. Generally, it is larger than terabytes (Leung et al, 2016). In other words, volume represents the size of the data set (Arora et al, 2016).
1.2.2.3 Variety
Variety refers to the heterogeneity of data, that is, the different types of data generated by different sources. Examples include e-mails, financial data, and a large percentage of non-numerical data (Michalik et al, 2014). Within variety there are two groups of data: structured and unstructured.
1.2.2.4 Complexity
Complexity refers to the ability to analyze a dataset and obtain information and value from it; in other words, how difficult it is to analyze a dataset (Mohanty et al, 2015).
1.2.3 Architecture
In order to understand the Big Data environment, it is necessary to create a model of data characteristics from the real world. Establishing a data model is fundamental for data analysis: it supports higher-level database construction, data access, data analysis, and data mining (Zheng et al, 2014). A Big Data framework needs to establish three levels. These levels show how data becomes a large dataset, how it is processed, and which services are obtained after data analysis. Figure 3 shows the data levels with a short description of each.
Figure 3 Framework for data mining using Big Data (Sowmya & Suneetha, 2017).
1.2.3.1 Levels
1.2.3.1.1 Data Acquisition
Data acquisition refers to the processes used to obtain data from different sources. Big Data is collected from web logs, texts, documents, business transactions, biological data, video streams, photographs, etc. (Leung et al, 2016). Important issues to consider in data acquisition are the type of data, its size, and the frequency of data collection. Moreover, efficient storage and organization mechanisms are necessary to optimize storage space. There are three important sub-processes within data acquisition: data collection, data transmission, and data pre-processing. In data collection, data is gathered from sources such as web pages, databases, data warehouses, and others. It is important to identify the sources of data and all their different formats and structures (Al-Jaroodi & Mohamed, 2016). In data transmission, data is transmitted from the sources to databases for storage before processing. It is necessary to understand how the transmission works because parameters such as delay should be considered. Finally, the data pre-processing unit performs activities such as cleaning, filtering, integration, and selection during the transmission process (Miloslavskaya & Tolstoy, 2016). Most collected data can be processed as soon as it is collected, so practical intermediate results can be produced to support the final result.
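The pre-processing activities named above (integration, cleaning, filtering, and selection) can be illustrated with a minimal Python sketch. The records, field names, and the 1000-byte threshold are hypothetical and chosen only for illustration; a real pre-processing unit would operate on far larger and more varied inputs.

```python
# Hypothetical raw records as they might arrive from two different sources.
web_logs = [
    {"user": "u1", "bytes": "512", "source": "web"},
    {"user": "u2", "bytes": None, "source": "web"},    # incomplete record
    {"user": "u1", "bytes": "512", "source": "web"},   # exact duplicate
]
transactions = [
    {"user": "u3", "bytes": "2048", "source": "shop"},
]


def integrate(*sources):
    """Integration: merge the records of several sources into one list."""
    merged = []
    for source in sources:
        merged.extend(source)
    return merged


def clean(records):
    """Cleaning: drop incomplete records and exact duplicates."""
    seen = set()
    result = []
    for record in records:
        if any(value is None for value in record.values()):
            continue
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            result.append(record)
    return result


def filter_large(records, min_bytes):
    """Filtering: keep only records at or above a size threshold."""
    return [r for r in records if int(r["bytes"]) >= min_bytes]


def select(records, fields):
    """Selection: keep only the fields needed for later analysis."""
    return [{f: r[f] for f in fields} for r in records]


merged = integrate(web_logs, transactions)
prepared = select(filter_large(clean(merged), 1000), ["user", "bytes"])
print(prepared)
```

Each step reduces the amount of data that has to be stored and processed later, which is the purpose of pre-processing in the acquisition level.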
1.2.3.1.2 Data Processing
Data processing refers to the storage and integration of the data after the data acquisition process has been performed. Effective computing software and hardware with large distributed data storage are needed to process datasets (Chetan et al, 2016). Many techniques have been developed for Big Data processing, such as intelligent search modes, data mining, machine learning, pattern recognition, and statistical analysis (Al-Jaroodi & Mohamed, 2016). The aim of data processing is to extract information from the data by using different methods, that is, to analyze the dataset with different tools to find patterns and tendencies that reveal information about the dataset (Sowmya & Suneetha, 2017). According to (Miloslavskaya & Tolstoy, 2016), there are three types of Big Data processing:
1.2.3.1.2.1 Batch processing in pseudo real or soft real-time
Data is stored in non-volatile memory and processed later. Features such as probability and time characteristics are determined to meet the requirements of the applied problems. This type of data processing provides benefits such as the use of more data, for instance to train predictive models better.
1.2.3.1.2.2 Stream processing in hard real-time
Data is collected and processed without being stored; only the result of the processing is stored. The incoming rate determines the probability and time characteristics of the data conversion processes. The advantage of this model is time optimization, which makes it suitable for domains that need a low response time.
1.2.3.1.2.3 Hybrid processing
This process uses a hybrid model with three architectural principles. Robustness: the system has to cope with hardware, software, and human errors. Data immutability: raw data is stored permanently and is never modified. Re-computation: results can always be obtained again by recomputing them from the raw data.
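The difference between batch and stream processing can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not an implementation taken from the cited work: the sensor readings are hypothetical, and the mean value stands in for any aggregate. The batch function needs the stored dataset, whereas the stream-style class keeps only the count and the current result.

```python
class RunningMean:
    """Stream-style aggregate: stores only the count and the current mean."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        # Incremental update; the individual values are never stored.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


def batch_mean(stored_values):
    """Batch-style aggregate: computed later over the complete stored data."""
    return sum(stored_values) / len(stored_values)


# Hypothetical sensor readings.
readings = [20.0, 22.0, 21.0, 25.0]

stream = RunningMean()
for reading in readings:
    stream.update(reading)  # a result is available after every reading

print(stream.mean, batch_mean(readings))
```

Both approaches reach the same mean here. In the hybrid model, retaining the immutable raw readings is what allows the batch result to be recomputed at any time, while the stream result provides the low response time.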
1.2.3.1.3 Data Service
Data service, or data retrieval, refers to services such as access to and use of the information obtained from the data. It acts as an interface for retrieving data easily from different sources. Moreover, some requirements besides the analysis itself should be identified, such as accuracy, optimization, and real-time applications.
1.2.3.2 Characteristics of Data
It is important to understand how the different types of data relate to the architecture. Different types of data need different storage formats. There are strong data models for storing transactional data and hierarchical models. On the other hand, there are weaker models for storing data such as weblogs, social media, and documents (Mohanty et al, 2015).
1.2.3.2.1 Value Density
Value density, also known as “information per TB”, is a measure of how much data must be processed to obtain information. This characteristic differs with the type of data. For instance, transaction data contains some anomalous records, which are removed before the data is stored. On the other hand, data from social media is simpler, and a small amount of repetitive data is collected. During pre-processing, filtering techniques are used to remove unnecessary data and reduce the amount of data to be stored. This process helps to optimize the cost per TB stored (Mohanty et al, 2015).
1.2.3.2.2 Analytic Agility
According to (Basanta et al, 2016), one of the features required from Big Data systems is the ability to process a large amount of data using a cluster of servers. This capacity involves different technical parameters associated with hardware issues. Furthermore, the data schema or data organization affects the ability to analyze data. For instance, analytics on structured data is more efficient than on unstructured data.
1.2.3.2.3 Frequency and Concurrency of access
Frequency and concurrency of access are important issues within Big Data architecture. Remote access to data tends to introduce delays. However, remote communication offers the opportunity to introduce other techniques to reduce computation time, namely parallel computing and distributed computing (Basanta et al, 2016). These days, it is important to have a low-latency system for Big Data analysis. High throughput and reliability are needed in a large data system to find the information in a complex dataset (Du et al, 2016).
1.2.4 Hardware, Software and Algorithms
Although several companies have a large amount of archived data, it is likely that they do not have the capacity to process it (Trnka, 2014). To extract insight or information from datasets, a high-speed and reliable system is needed. This demands a complex hardware and software architecture. For instance, in fields such as business it is necessary to perform real-time analysis. Therefore, software and algorithms running on robust hardware are used to perform data analysis under reliable parameters.
1.2.4.1 Hardware
High performance plays an important role in Big Data analysis, and reaching it requires a reliable hardware architecture. According to (Xenopoulos et al, 2016), this architecture can be used in two different ways. The first is as a capacity machine, in which multiple users are allowed to use the computing resources to solve small problems. The second is as a capability machine, in which almost all of the machine's capacity is used to solve a single problem. Data is separated into “groups” across different machines and the analysis is executed in parallel, which increases the speed of the analysis. To give a better idea of the hardware used for Big Data analysis, some “supercomputers” and their technical parameters are listed below. These computers are used for simulations to predict weather conditions.
Table 2. Hardware specifications for Big Data analysis.
Computer          Number of processors   Number of cores   RAM      Frequency (GHz)
Rhea (non-GPU)    2                      8                 128 GB   2.0
Rhea (GPU)        2                      14                1 TB     2.3
Eos               2                      8                 64 GB    2.6
Titan             1                      16                32 GB    2.2
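The idea of separating data into groups and analyzing the groups in parallel can be sketched as follows. This is only an illustration under stated assumptions: a thread pool from the Python standard library stands in for the separate machines, the function names are hypothetical, and a partial sum stands in for the real analysis. On an actual cluster each group would be sent to a different node.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_groups(data, n_groups):
    """Split data into n_groups contiguous chunks of nearly equal size."""
    size, extra = divmod(len(data), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        end = start + size + (1 if i < extra else 0)
        groups.append(data[start:end])
        start = end
    return groups


def analyze_group(group):
    """Placeholder analysis: the partial sum of one group."""
    return sum(group)


def parallel_total(data, n_workers=4):
    """Analyze every group with its own worker and combine the results."""
    groups = split_into_groups(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(analyze_group, groups))
    return sum(partial_results)


measurements = list(range(1, 101))  # hypothetical measurements 1..100
print(parallel_total(measurements))
```

The combined result is the same as that of a sequential analysis; only the work is divided among the workers.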
1.2.4.2 Software
Powerful hardware alone is not enough for Big Data analysis; suitable software is needed to process big datasets. Many software packages can perform analytics; the most important of them and their main features are described below.
1.2.4.2.1 Hadoop
A very popular software for Big Data analysis is Hadoop. It is an open source framework that allows applications to be developed and executed on huge amounts of data (Kadam et al, 2016). Hadoop has its own file system named HDFS (Hadoop Distributed File System). When Hadoop starts to analyze a big dataset, it splits the data into different groups, and those groups are analyzed by many cluster nodes, as shown in Figure 4. If a failure appears while the software is performing the analysis, Hadoop has the functionality to make a copy of the data. According to (Sivaraman & Manickachezian, 2014), “Hadoop enables users to store and process large volumes of data and analyses it in ways not previously possible with SQL-based approaches or less scalable solutions. Remarkable improvements in conventional compute
and storage resources help make Hadoop clusters feasible for most organizations”. The Apache Hadoop project performs data analysis using the MapReduce algorithm, supported by other features such as scalability and distributed computing (Jia et al, 2010). The MapReduce algorithm processes the data in parallel. It works with two tasks, Map and Reduce. Hadoop evaluates gigabytes or petabytes of structured or unstructured data and transforms it into a more manageable form. The Map task converts the dataset into key-value pairs called tuples, whilst the Reduce task takes the output of the Map task and combines the tuples to obtain a smaller set of tuples. The Map task is always performed first; the Reduce task takes place afterwards (Singh & Kaur, 2014). Likewise, Hadoop provides many compression methods, which improve Big Data analysis: performance depends on the speed of data transfer, and compression also reduces storage space. However, performance should still be optimized in fields such as biomedical records (Jati et al, 2016). Although Hadoop has performed well and has helped modern industry, it still has limitations: new computing models are needed for new processing requirements, and the results obtained with Hadoop may or may not be accurate, depending on the nature of the field being analyzed.
Figure 4 Hadoop with HDFS and Map-Reduce (Trnka, 2014).
1.2.4.2.2 Apache Spark
Another remarkable software for Big Data analysis is Apache Spark, which provides the user with a friendly programming interface. According to (Zaharia, et al., 2012), Spark was developed by the AMPLab at UC Berkeley. It is written mainly in Scala and runs on the Java Virtual Machine. Its primary programming abstraction is the Resilient Distributed Dataset (RDD). An RDD is a read-only collection that can be created only through deterministic operations on data in stable storage or on other RDDs. This feature improves memory utilization and is ideal for iterative applications. Each dataset is represented as a single object, and transformations are invoked using methods on these objects. Spark also has options for SQL and a machine learning library called MLlib (Gopalani & Arora, 2015). Compared to Hadoop, Spark has a lower task processing time because of its in-memory analytic capabilities and superior programming model (Xin et al, 2013). Spark has the flexibility to support applications written in Scala, Java, or Python, and supports a variety of iterative algorithms (Ghasemi & Chow, 2016).
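The properties of an RDD described above (read-only, created through deterministic operations, transformed through methods on the object) can be mimicked with a small toy class in plain Python. The class `ToyDataset` is hypothetical and is not Spark's API; it only illustrates, in a single process, why a dataset that records how it was derived can always be recomputed from its stable source.

```python
class ToyDataset:
    """Toy read-only dataset that records how it was derived (its lineage).

    This is NOT Spark's API; it only mimics the idea behind an RDD.
    """

    def __init__(self, source, transformations=()):
        self._source = tuple(source)                    # stable, immutable input
        self._transformations = tuple(transformations)  # recorded lineage

    def map(self, func):
        # A transformation returns a new dataset; this one is left unchanged.
        step = ("map", func)
        return ToyDataset(self._source, self._transformations + (step,))

    def filter(self, predicate):
        step = ("filter", predicate)
        return ToyDataset(self._source, self._transformations + (step,))

    def collect(self):
        # The result is recomputed deterministically from the source by
        # replaying the recorded transformations in order.
        items = list(self._source)
        for kind, func in self._transformations:
            if kind == "map":
                items = [func(x) for x in items]
            else:
                items = [x for x in items if func(x)]
        return items


numbers = ToyDataset([1, 2, 3, 4, 5])
even_squares = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)

print(even_squares.collect())
print(numbers.collect())
```

Because the original dataset is never modified, a lost result can be rebuilt by replaying the transformations, which is the idea behind the resilience of RDDs.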
1.2.4.3 MapReduce Algorithms
According to (Shim, 2012), MapReduce is a programming model that allows parallel applications to process big datasets across the machines of a cluster. MapReduce represents data as (key, value) pairs, and a DFS (distributed file system) spreads the data over multiple machines. The framework is broken down into two important functions, the Map function and the Reduce function. These functions take (key, value) pairs as input and may output (key, value) pairs. After the data has been analyzed through the Map and Reduce functions, the result is written to the distributed file system.
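The Map and Reduce functions can be illustrated with the classic word-count example. The following single-process Python sketch only imitates the programming model; the function names are chosen for illustration, and a real framework such as Hadoop would run the Map and Reduce calls on different machines and read and write a distributed file system.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (key, value) pair (word, 1) for every word."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all values that belong to the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single pair."""
    return key, sum(values)


def word_count(documents):
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data analysis", "big data tools"])
print(counts)
```

The Map step produces six pairs for the two input lines, and the Reduce step combines them into four pairs, one per distinct word, which shows how the output becomes a smaller set of tuples.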
1.3 Big Data and Decision Making
According to (Mosavi, 2014), the decision making process selects the best solution from all the possible choices. Satisfactory decision making procedures should be developed according to human behavior and environmental features (Mosavi, 2013). Big Data analysis, artificial intelligence, and decision making are related to each other. Nowadays, companies have found potential advantages in decision making supported by Big Data analysis and machine intelligence (Bailey, 2014). Big Data will establish a coherent basis for decision making through computer systems and with the help of the corresponding processes (Schermann, et al., 2014). Further, data processing has produced valuable insights that help to make the right decisions and improve different systems (Van Oort & Cats, 2015). In a decision context, the most important requirement is to be able to assess the value of the information or of the different “solutions” generated by decision processes (Kowalczyk & Buxmann, 2014). Beyond Big Data analysis, it is important to focus on the ability to visualize insights using machine intelligence techniques so that people can make effective decisions (Nevo et al, 2015). According to the work of (Babu & Sastry, 2014), the evolution of Big Data analysis and predictive analysis has opened a new way to explore analytics-driven automation and high-volume decision making.
According to (Mohanty et al, 2015), the goal of decision making is to gain insight from data and to make decisions based on data analysis. There is a procedure to follow. First, data analysis extracts information from data. According to (Mosavi, 2010), the support of intelligent computational systems is necessary to analyze the input variables and to use the results to achieve satisfactory decision making. Once the information is generated, it is possible to understand tendencies and patterns. The last part of the procedure involves three steps: insight, prediction, and foresight. Data passes through five stages as part of analyzing and gaining insight: acquire, prepare, analyze, validate, and operationalize.
1.4 Current Problems and Research Trends of Big Data Analysis
The number of vehicles in big cities has increased dramatically. For this reason, new challenges have emerged, traffic jams being the most important. In order to tackle this problem, people are analyzing datasets generated by traffic systems. After dataset analysis, different traffic models can be created. The aim of this analysis is to predict traffic flow and to plan better routes (Arief et al, 2016). Furthermore, cities are becoming “smart cities” by improving different sectors such as schools, transportation, power plants, waste administration, and water supply. Therefore, Big Data analysis techniques should be able to cope with all this data and provide an optimal solution. Nowadays, machines and sensors in buildings, networks, vehicles, planes, and other systems are creating and accumulating data. The world is becoming more instrumented. Technologies such as the Internet of Things and cloud computing are generating an extra volume of data. This data is generated at a high rate and has a complex nature because of the different types of sources generating it. Moreover, there are billions of mobile phones around the world. It is estimated that by 2020, 50 billion of these devices will be connected to mobile networks and the Internet (Gerhard et al, 2012). Since smartphones are potential generators of information, a large amount of data has been generated and is continuously produced every single day within mobile networks (Guo et al, 2016). Besides, it is possible to obtain other parameters such as the position of the users, their velocity, and the number of daily phone calls and text messages. According to (Larkou et al, 2014), smartphone devices are currently improving and new complex applications are being developed.
For researchers, this is a great opportunity to explore complex interdisciplinary areas from the Big Data perspective. The analysis of mobile networks and smartphone devices makes it easier to understand the physical world. Within mobile networks it is possible to understand individual behavior through mobility and to learn different social patterns in communication and interactions. To analyze all the features that mobile networks involve, complex algorithms with scalable designs should be created in order to benefit from Big Data and its advantages (Kulcu et al, 2016). The key to getting value from data is to build a suitable and coherent model for data analysis. Models should be able to analyze key values, detect anomalies and trends, make predictions, and perform other analyses. Due to all of these requirements, statistical modeling, data mining, machine learning, and artificial intelligence techniques have become highly important
in the context of Big Data analytics. Further, merging current and historical data with appropriate techniques would add new value to decision making (Robak et al, 2013). Databases have tried to allow predictable scalability, and some constraints of the until-then fixed database schema were eliminated. As a consequence, NoSQL and other challenges for hardware architecture appeared, which led to new distributed processing paradigms; therefore, parallel processing on commodity hardware is needed (Dobre & Xhafa, 2014). As described previously, the first step is to store the data, but sometimes this can be physically and economically infeasible. When data is stored, researchers have to face several challenges such as redundancy and distributed storage. A well-known software for processing data or performing Big Data analysis is Apache Hadoop. It is an open source framework written in Java, inspired by Google's MapReduce and Google File System papers and developed under the Apache Software Foundation, which allows huge datasets to be acquired and processed in a distributed way within clusters of computers using programming models (Singh & Kaur, 2014). The proliferation of Big Data can bring significant value, quality, and suitability to different areas. Knowing how to get information from datasets gives companies many advantages in areas such as market development, operational efficiency, demand prediction, customer loyalty, and decision making. It is important to acquire knowledge from data analysis and apply it to different scenarios. If wrong choices or wrong methods are selected, the return on investment will not be rewarding (Pondel, 2015). Big Data analysis will become the main factor in decision making. It is the next innovation for competition, productivity, and other solutions. Data mining applied to engineering and data reduction techniques maintain the integrity of data and help make Big Data processing efficient (Esmaeili & Mosavi, 2010).
Support from algorithms such as MapReduce greatly facilitates large-scale data analysis (Dobre & Xhafa, 2014). The most fundamental challenge for Big Data applications is to explore the large volumes of data and extract useful information or knowledge for future actions. If it is not possible to understand a big dataset as a whole, the scope of the solution to the underlying problem could be inappropriate, as shown in Figure 5.
Figure 5 The blind men and the giant elephant: the localized (limited) view of each blind man leads to a biased conclusion (Wu et al, 2014).
Although there are nowadays tools to manage big datasets, before using algorithms or any particular software it is important to know the nature of the dataset. All dataset features should be considered, as well as some technical aspects. According to (Kitchin & Lauriault, 2014), issues such as protocols, organizational processes, measurement scales, categories, and standards are relevant when a big dataset is analyzed.
2 BIG DATA FOR ENGINEERING APPLICATIONS
In recent times, data generation in different industry fields has increased on a large scale. Many sectors offer applications, services, and products that make life more comfortable for people, who demand access to these services or try to obtain new products according to their preferences and tendencies. People's preferences for a certain service or product change with their culture, age, gender, education, and other important aspects. The duty of industry is to find a better solution for a specific problem according to these aspects while achieving fault-free and cost-efficient processes within companies. Solving problems within engineering fields is a demanding decision making process and should be considered from different perspectives (Mosavi, 2013). Supply chain management, for example, can be improved by implementing Big Data solutions (Reichert, 2014). Two scenarios are important for Big Data analysis in industry. The first is to perform Big Data analysis to create a new service or product. The second is to enhance an existing service or product by analyzing its previous version. Data is a critical asset of a company because it is a “raw material”; the challenge is then to analyze this data and find solutions to improve an existing service or create a new one. Society is dynamic, and modern industry should deal with all issues regarding social behavior and needs. Although finding a solution for a given problem is important for enterprises, they are also looking for ways to increase their revenue and optimize processes. Therefore, decreasing costs during manufacturing is imperative. In this way, Big Data analysis plays an important role in the fourth industrial revolution, as it helps companies find a low-cost strategy to be more competitive (Kagermann et al, 2013).
2.1 Big Data in Telecommunications and Information Technology (IT)
2.1.1 Fifth Generation (5G)
Big Data provides three capabilities for 5G (fifth generation) design. The first is full intelligence of the current network status, the second is the capability of predicting user behavior, and the last is the capability of dynamically associating the response to the network parameters (Imran et al, 2014). All these capabilities make Big Data analysis more flexible. They will have a big impact on mobile networks, since the upcoming generation (5G) of mobile communications is in the near future (Boccardi & Heath, 2014). Fifth generation technology will give users very fast Internet access, so that new applications and services will appear. The rate of generated data will keep growing. In order to give good service to users through 5G, researchers have had to investigate Big Data sets and find useful information to improve technical parameters of the network such as bandwidth, mobile network distribution, and mobile network architecture. With the advances of Big Data analysis in mobile networks, operators can gain deep insight when different events occur in the network. Thus, correlations between these events can be determined, and they will help to optimize resources and operational costs. Considering QoE (Quality of Experience), there are important challenges for 5G standardization and for how to enhance the services and make the network more efficient than 4G (Zheng & Yang 2016).
Mobile network operators are looking for the best way to adjust to traffic requirements and optimize resource allocation, as mentioned before. All these activities rely on the intelligent use of Big Data. Nowadays this data is collected from different layers of 4G networks. The data collected from these networks is very complex; it is multisource, high-volume, unstructured, and real-time (Wu et al, 2014). Thus, suitable Big Data analytics is needed to extract the information necessary to design appropriate schemes for network optimization. This analysis helps not only the network provider but also improves the services offered to customers.
2.1.2 Mobile Networks
Mobile networks have become an important source of Big Data, and nowadays more and more researchers are focusing on this field. There have been important advances in wireless technologies and new applications (Liu et al, 2014). Millions of people use social networks such as Facebook, Google+, and Twitter. In 2014 the number of mobile devices around the world exceeded the world's population (Cisco, CVNI, 2015). People post comments, pictures, videos, and even their position or favorite places. Therefore, companies can analyze a huge amount of data and gain insight into people's behavior and tendencies. Big Data analysis is important in this field since it incorporates aspects such as algorithms, methods, technologies, software, and hardware for collecting data and analyzing it in real time (Parwez et al, 2017). According to (He, & Fei 2016), in the past the data generated by mobile networks was not considered important by researchers. However, with the development of Big Data analysis and its applications in this field, many important patterns were found. As a consequence, it is worth using all this information to improve the processes and services within mobile networks, thus maximizing revenue. According to (Kulcu et al, 2016), social network analysis has two challenges: the first is to process very large datasets in a reasonable time, and the second is to integrate several distinct datasets into a new, larger one that is semantically consistent. It is important to emphasize that there are two types of data in mobile networks. The first is the data collected from users, that is, their behavior, tendencies, and preferences. The second is the data collected from network devices, that is, all technical data regarding telecom devices. When mobile network operators have the results of the Big Data analysis, they can make predictions about their physical network.
This point is relevant since they can prepare new links or new equipment in advance, or increase the bandwidth of a certain channel. All these measures help to avoid future failures in normal network performance. Even in the case of troubleshooting, the solution for a particular problem can be found immediately. According to (Baldo et al, 2014), Big Data offers many solutions in this field: end-to-end visibility of the wireless network, self-coordination among network function entities, assessment of long-term network dynamics, faster and more proactive networks, energy-efficient networks, and unified performance evaluation. On the other hand, the data collected from users, as mentioned before, makes it possible to understand users' behavior. Through this analysis, it is possible to find patterns, and new
models can be created for traffic systems. This brings benefits for cities, because over the years the number of vehicles increases exponentially and an optimal solution in this field is needed. Companies interested in developing new traffic models have the opportunity to analyze data in this type of network. Moreover, it could be better for the transportation industry to integrate mobile networks with its transport systems, so that good strategies can be found to improve traffic in cities. In other words, there should be a link between mobile networks and transport systems. Even real-time monitoring of traffic systems can be performed through mobile networks, which could help to make real-time decisions.
2.1.3 Analysis in Network Traffic
Because there are many applications and millions of websites around the world, traffic on the Internet has increased dramatically. In networking, traffic measurements are important because they help to determine the bandwidth of the channels between telecom devices. Without an appropriate bandwidth plan, networks can suffer from problems such as packet loss and high latency, and some applications cannot work properly. Hence, a suitable model should be implemented to avoid these problems. Although current telecom devices have better capabilities and can provide high-bandwidth channels, optimizing bandwidth utilization saves economic resources and can increase network performance. Some network traffic measurements are based on centralized methods, but physical hardware limitations cause problems and make it difficult to satisfy network requirements (Zhou et al, 2016). Fortunately, with the development of cloud computing it is possible to perform traffic measurements by distributed processing. Hadoop is widely used in this field because of its high performance: networks need high operational availability, and Hadoop provides reliability features such as fault tolerance and high scalability.
2.1.3.1 Traffic Measurement
If Map-Reduce algorithms were implemented in a conventional way, most of the time they could not cope with the load; therefore, (Zhou et al, 2016) propose a new network traffic measurement method using sampling Map-Reduce algorithms. With this model it is possible to deal with real datasets even if they follow a skewed distribution. The algorithms first execute a sampling process through a Map-Reduce job to estimate the flow and the data distribution. The data partition strategy is designed from these parameters. Once the data partition strategy is defined, a Map-Reduce job distributes the network traffic to different nodes, so that the load is balanced and the Big Data analysis can be performed. Hence, different features can be found, such as traffic predictions, the most common data sources and destinations, and DDoS attacks, among others. Big Data is helping ISPs (Internet Service Providers) to improve their networks and reduce troubleshooting time. Regarding scalability, there are several advantages, the most important being the possibility of a better bandwidth estimate for a new network installation.
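The general idea of sampling first and partitioning afterwards can be sketched in Python. This is an illustrative sketch under stated assumptions and not the algorithm of (Zhou et al, 2016): the flow keys are hypothetical source addresses, the sample rate and the greedy assignment rule are choices made here for illustration, and everything runs in one process instead of as Map-Reduce jobs.

```python
import random
import zlib
from collections import Counter


def estimate_distribution(flow_keys, sample_rate, seed=0):
    """Sampling step: estimate the records per key from a random sample."""
    rng = random.Random(seed)
    return Counter(key for key in flow_keys if rng.random() < sample_rate)


def design_partition(estimated_counts, n_nodes):
    """Greedy plan: assign the heaviest remaining key to the least-loaded node."""
    loads = [0] * n_nodes
    plan = {}
    for key, count in estimated_counts.most_common():
        node = loads.index(min(loads))
        plan[key] = node
        loads[node] += count
    return plan


def distribute(flow_keys, plan, n_nodes):
    """Send every record to its planned node and return the load per node."""
    loads = [0] * n_nodes
    for key in flow_keys:
        # Keys that the sample missed fall back to a stable hash.
        node = plan.get(key, zlib.crc32(key.encode()) % n_nodes)
        loads[node] += 1
    return loads


# Hypothetical skewed traffic: one source address dominates.
flows = (["10.0.0.1"] * 60 + ["10.0.0.2"] * 20
         + ["10.0.0.3"] * 15 + ["10.0.0.4"] * 5)

estimate = estimate_distribution(flows, sample_rate=0.2)
plan = design_partition(estimate, n_nodes=2)
loads = distribute(flows, plan, n_nodes=2)
print(loads)
```

With full sampling (`sample_rate=1.0`) the plan places the dominant address alone on one node and the three smaller ones together on the other, giving loads of 60 and 40 records. A lower sample rate trades the accuracy of the estimate for a cheaper sampling step.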
2.1.4 Internet of Things (IoT) technology
According to (Jin et al, 2014) literature, in 2050 the 70% of the world population will live in cities. Consequently, new challenges will appear for different sectors. Public sectors like water supply, electricity, transportation, and health, should deal with this problem. Despite the fact that nowadays some cities are becoming “smart cities�, there are many things to be considered before becoming a real smart city. Governments want to have a better control of natural resources; they are looking for better life conditions for their citizens. Bugged optimization is another important issue; it may even be done by different policies. All these areas can be addressed by an important technology which will help to optimize all the processes within a city; this technology is called IoT (Internet of Things). IoT has emerged as a new trend in the last few years, and allows connecting multiple devices within a common network, which in most cases can be the Internet. Devices like sensors, actuators, and others, can be installed in different elements or other electronics devices. Through these sensors, it is possible to monitor and control several functions of the devices to which they are connected. In others words, it is possible to have an integral management of them, but the most fascinating thing is that all this management can be possible done through the Internet. According to (Atzori et al, 2010) over 50 billons of devices such as smartphones, laptops, sensors, game consoles, and others, will be connected to the Internet. Hence, IoT gives a platform for these devices to communicate each other within a smart environment and enables to share information in a convenient manner (Marjani, et al., 2017). IoT is also related to industry 4.0, it helps factories with a rapid product develop flexible production and complex environment. 
The age of smart factories is coming: intelligent and customized products can be manufactured according to customers' preferences in a short period of time (Kagermann et al, 2013). IoT gives flexible solutions for the automation pyramid; self-controlling systems are responsible for performing the automation control, and this process generates big datasets (Vyatkin et al, 2007). It is worth analyzing these datasets in order to better understand how the system works and to find new strategies for the supply chain process. All these devices connected to the Internet will generate a large amount of structured and unstructured data. This sharp increase leads once again to Big Data analysis. Nevertheless, the data gathered by IoT have different features compared to traditional big data because of data generation, data interoperability, and data quality (Gubbi et al, 2013). Interpretation of big datasets from IoT is a challenge because the data sources are ubiquitous and the transmitted data is noisy, heterogeneous, and spatiotemporally dependent (Lelwala, 2016). In order to perform an appropriate Big Data analysis in IoT, there are five important processes to follow: IoT big data, aggregation, classification, storage, and analysis. Hence, Big Data analytics over IoT systems involves the process of searching a database, mining, and analyzing data to improve company performance (Kwon et al, 2014). According to the requirements of IoT applications, there are four analytics systems (Chen & Zhang, 2014) described below.
2.1.4.1 Real-time analytics
It is performed on data collected from sensors. In this case, data change constantly; hence rapid analysis techniques such as parallel processing are needed to achieve reliable results (Pfaffl, 2001).
2.1.4.2 Off-line analytics
This type of analytics system is used when a quick response is not required. In this case, Hadoop is a good option since it performs an off-line analysis and it reduces the cost of data format conversion (Cheng et al, 2004).
2.1.4.3 Memory-level analytics
If the memory of the cluster is large enough for the data size, this method works well. Even a real-time analysis can be performed in this case (Jourdan et al, 2008).
2.1.4.4 BI analytics
These systems are required when the size of the data is bigger than the memory capacity. In this case, data can be imported into the BI analysis environment. The results are also easy to interpret for decision making. Regarding the architecture of IoT for Big Data analysis, (Gubbi et al, 2013) propose an architecture with cloud computing at the center and a model of end-to-end interaction among many stakeholders in a cloud-centric framework. There are many architecture schemes for different fields, but the most important thing is to implement a good architecture which allows users to understand the integration of enterprise architecture management with IoT. There are many fields in which IoT can be implemented, for instance: smart metering, smart transportation, smart supply chains, smart traffic light systems, smart grids, and smart agriculture. Hence, for each scenario it is necessary to consider specific parameters regarding the IoT architecture and the way analytics is performed on it, because the implementation of IoT in smart cities will have a big impact on the economy and on the way people live. Table 3 summarizes the research on Big Data analysis in Telecommunications and Information Technology over the years, and Figure 6 shows the number of research works per year, which reflects the growing tendency of Big Data analysis in this field.
Table 3 Big Data in Telecommunications and Information Technology (IT)
| No. | Author(s) | Application | Name | Location | Algorithm | Year |
|---|---|---|---|---|---|---|
| 1 | Imran, Ali; Zoha, Ahmed; Abu-Dayya, Adnan | SON in 5G | Challenges in 5G: how to empower SON with big data for enabling 5G | NaN | Map-Reduce | 2014 |
| 2 | Batty, M; Axhausen, KW; Giannotti, F; Pozdnoukhov, A; Bazzani, A; Wachowicz, M; Ouzounis, G | Smart cities | Smart cities of the future | United Kingdom | Map-Reduce | 2012 |
| 3 | Kowalczyk M, Buxmann P. | BI&A-supported decision processes | Big Data and Information Processing in Organizational Decision Processes | Germany | Map-Reduce | 2014 |
| 4 | Sercan Kulcu, Erdogan Dogdu, A. Murat Ozbayoglu | Social Network Analysis | A Survey on Semantic Web and Big Data Technologies for Social Network Analysis | Turkey | Map-Reduce | 2016 |
| 5 | Yanbin Guo, Jianzhong Zhang, Yu Zhang | Mobile big data | An algorithm for analyzing the city residents' activity information through mobile big data mining | China | Map-Reduce | 2016 |
| 6 | Atanas Radenski, Todor Gurov, Kalinka Kaloyanova, Nikolay Kirov, Maria Nisheva, Peter Stanchev, and Eugenia Stoimenova | Systems, Applications, and Platforms | Big Data Techniques, Systems, Applications, and Platforms: Case Studies from Academia | Bulgaria | Map-Reduce | 2016 |
| 7 | Majid Ahmadi, Tahir Rashid, Durgesh Kumar Mishra | Wireless Technology | Impact of Wireless Technology on Future of Big-data Industry | Saudi Arabia | Map-Reduce | 2014 |
| 8 | Mauro Femminella, Matteo Pergolesi, Gianluca Reali | Cloud Services | IoT, Cloud Services, and Big Data: A Comprehensive Pricing Solution | Italy | Genetic/Map-Reduce | 2016 |
| 9 | Ibrahim Abaker Targio Hashem, Ibrar Yaqoob, Nor Badrul Anuar, Salimah Mokhtar, Abdullah Gani, Samee Ullah Khan | Cloud computing | The rise of "big data" on cloud computing: Review and open Research issues | Malaysia | Map-Reduce | 2014 |
Figure 6 Number of research works per year in Telecommunications and Information Technology (IT)
2.2 Big Data in Government, Health, and Education
2.2.1 Healthcare
Due to the increasing development of Wireless Sensor Networks (WSNs) and mobile networks, more and more hospitals and physicians are able to monitor both indoor and home-treated patients through IoT devices (Islam et al, 2015). These devices collect data such as body temperature, blood pressure, heart rate, sugar levels, etc. (Sahoo et al, 2016). Hence, Big Data analytics in healthcare should process all this data according to medical parameters. In other words, Big Data processes all the electronic data generated in different medical fields. Through Big Data analysis methods and tools, a procedure can be defined to process medical data. According to (Raghupathi & Raghupathi, 2014), the U.S. healthcare system generated 150 exabytes of data in 2011, and nowadays it has reached the zettabyte scale. This in turn means that a large amount of healthcare data is generated around the world, and Big Data analysis provides a wide range of facilities for the healthcare industry. According to (Harvard Business Review, 2014), the adoption of Big Data analysis led to the simplification of IT. Consequently, the treatments for patients were improved, and better preventive care and personalized treatment were developed. Moreover, according to (Jonnagaddala et al, 2016), there are many potential applications in healthcare, such as pervasive health, pharmaceutical discoveries, fraud detection, and clinical decisions.
2.2.1.1 Pattern Recognition
According to (Olaronke & Oluwaseun, 2016), Big Data analysis helps to discover patterns which support the diagnosis and treatment of patients. For diagnosing and treating patients, a lot of data has to be collected from different sources:
Machine Generated Data: data generated by sensors, smart meters, wearable devices, and others.
Biometric Data: data obtained from individuals' physical characteristics, such as genetics, heart rate, fingerprints, blood pressure, and X-rays.
Human Generated Data: unstructured data generated by human beings, for instance clinical data, summaries, e-mails, and admission records.
Transactional Data: data related to billing records.
Behavioral Data: data that come from social interactions and communication tools such as social networks and websites.
Epidemiological Data: data related to health surveys, disease registries, and statistical data.
Publication Data: data that come from medical research and reference materials.
Regarding the architecture for Big Data analysis in healthcare, there are three important layers to consider (Zhang et al, 2015). The first is the data collection layer, which consists of all the sources that generate data, such as human beings and medical devices. The second is the data management layer, which involves processes such as storing and processing data. For storing data there are distributed file storage functions, and for processing data there are distributed parallel functions which perform these processes and provide a suitable analysis method. Enormous advantages appear after applying Big Data analysis in healthcare, namely better decision making, fraud detection, cost-effective policies, population health management, disease surveillance, clinical decision support, and patient care diagnostic systems (Jangade & Chauhan, 2016).
There are some initiatives to utilize the potential of Big Data analysis in healthcare (Raghupathi & Raghupathi, 2014), such as:
Combating the flu: this initiative is used by the Centers for Disease Control and Prevention (CDC), and its aim is to prevent influenza. All the information about infected patients is collected by the CDC, and after a suitable Big Data analysis the information is evaluated by experts. Through this information it is possible to determine how the disease spreads across a city or even a nation. Moreover, it is possible to predict who is likely to be infected, and therefore medicine and vaccines can be sent to a certain place in advance (Bort, 2013). To detect how the flu is spreading in a given place, there is an almost real-time tool developed by Google named Google Flu Trends. By using this tool, it is possible to track the individuals who are infected and predict who is likely to be infected. Hence, Google Flu Trends is a good example of Big Data analysis used to predict tendencies and calculate related predictions.
Aetna and GNS Healthcare: Aetna is a healthcare insurance company, and GNS Healthcare is a Big Data analysis company. Both companies are analyzing different parameters, namely high blood pressure, large waist size, high triglycerides, high blood sugar, and low High-density Lipoprotein (HDL). If they find any combination of these parameters, they can draw a conclusion about the health of a particular patient.
Big Data Analytics for Diabetes: through Big Data analysis it is possible to offer a better treatment for patients. The data collected from insulin pens, which determine the amount of insulin needed by a patient at a certain time, is important for doctors to identify problems or to tweak dosages if necessary (Stephanie, 2013).
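The screening on the five parameters listed above can be sketched as a simple count of the risk factors present for each patient. This is only an illustration, not the method used by Aetna or GNS Healthcare: the field names, the `flag_patients` helper, and the default threshold of three factors are assumptions made for the example.

```python
# Hypothetical field names for the five parameters mentioned in the text.
RISK_FACTORS = (
    "high_blood_pressure",
    "large_waist_size",
    "high_triglycerides",
    "high_blood_sugar",
    "low_hdl",
)


def count_risk_factors(patient):
    """Count how many of the five risk factors are marked True for a patient."""
    return sum(1 for factor in RISK_FACTORS if patient.get(factor, False))


def flag_patients(patients, min_factors=3):
    """Return the ids of patients with at least `min_factors` risk factors.

    The default of 3 is an assumption for illustration only.
    """
    return [p["id"] for p in patients if count_risk_factors(p) >= min_factors]
```

In a real system each flag would come from a measured value and a clinically validated threshold, which are outside the scope of this sketch.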
Table 4 summarizes the research on Big Data analysis in Health Care over the years, and Figure 7 shows the number of research works per year, which reflects the growing tendency of Big Data analysis in this field.
Table 4 Big Data in Health Care
| No. | Author | Application | Name | Location | Algorithm | Year |
|---|---|---|---|---|---|---|
| 1 | Islam, SM Riazul; Kwak, Daehan; Kabir, MD Humaun; Hossain, Mahmud; Kwak, Kyung-Sup | The internet of things for health care | The internet of things for health care: a comprehensive survey | India | Map-Reduce | 2015 |
| 2 | Sahoo, Prasan Kumar; Mohapatra, Suvendu Kumar; Wu, Shih-Lin | Prediction for Future Health Condition | Analyzing Healthcare Big Data With Prediction for Future Health Condition | Taiwan | Map-Reduce | 2016 |
| 3 | Raghupathi, Wullianallur; Raghupathi, Viju | Promise and potential | Big data analytics in healthcare: promise and potential | Ethiopia | Map-Reduce | 2014 |
| 4 | Jonnagaddala, Jitendra; Dai, H; Ray, Pradeep; Liaw, S | Electronic health records | Mining electronic health records to guide and support clinical decision support systems | Australia | Map-Reduce | 2016 |
| 5 | Zhang, Yin; Qiu, Meikang; Tsai, Chun-Wei; Hassan, Mohammad Mehedi; Alamri, Atif | Cyber-physical system | Health-CPS: Healthcare cyber-physical system assisted by cloud and big data | China | Map-Reduce | 2015 |
| 6 | Jangade, Rajesh; Chauhan, Ritu | Cloud computing for healthcare | Big data with integrated cloud computing for healthcare analytics | India | Map-Reduce | 2016 |
| 7 | Bort, Julie | CDC | How the cdc is using big data to save you from the flu | United States | Map-Reduce | 2013 |
| 8 | Stephanie Baum | Remote monitor embedded in insulin pen caps | A remote monitor embedded in insulin pen caps could help personalize diabetes treatment | United States | Map-Reduce | 2013 |
| 9 | David W. Bates, Suchi Saria, Lucila Ohno-Machado, Anand Shah and Gabriel Escobar | Using Analytics To Identify And Manage High-Risk And High-Cost Patients | Big Data In Health Care: Using Analytics To Identify And Manage High-Risk And High-Cost Patients | United States | Map-Reduce | 2014 |
| 10 | David Lazer, Ryan Kennedy, Gary King, Alessandro Vespignani | Traps in Big Data Analysis | The Parable of Google Flu: Traps in Big Data Analysis | United States | Map-Reduce | 2014 |
| 11 | Van-Dai Ta, Chuan-Ming Liu, Goodwill Wandile Nkabinde | Healthcare Real-Time Analytics | Big Data Stream Computing in Healthcare Real-Time Analytics | Taiwan | Map-Reduce | 2016 |
| 12 | Iroju Olaronke, Ojerinde Oluwaseun | Big Data in Healthcare | Big Data in Healthcare: Prospects, Challenges and Resolutions | Nigeria | Map-Reduce | 2016 |
| 13 | Mario Bochicchio, Alfredo Cuzzocrea, Lucia Vaira | Big Healthcare Data | A Big Data Analytics Framework for Supporting Multidimensional Mining over Big Healthcare Data | Italy | Map-Reduce | 2016 |
| 14 | Xiaohua Feng, Babatunde Onafeso, Enjie Liu | Big Data Healthcare Security | Investigating Big Data Healthcare Security Issues with Raspberry Pi | United Kingdom | Map-Reduce | 2016 |
| 15 | A. S. Panayides, C. S. Pattichis, and M.S. Pattichis | Video Analytics in Healthcare | The Promise of Big Data Technologies and Challenges for Image and Video Analytics in Healthcare | Mexico | Map-Reduce | 2016 |
| 16 | Weider D. Yu, Avinash Chander Gottumukkala, and others | Service Computing HealthCare | Big Data Analytics in Service Computing HealthCare | United States | Map-Reduce | 2016 |
| 17 | Satwik Sabharwal, Samridhi Gupta, Thirunavukkarasu. K | Big Data Analytics In Healthcare Industry | Insight Of Big Data Analytics In Healthcare Industry | India | Map-Reduce | 2016 |
| 18 | Fuad Rahman, Marvin Slepian, Ari Mitra | Healthcare Applications | A Novel Big-Data Processing Framework for Healthcare Applications | United States | Map-Reduce | 2016 |
| 19 | Yin Zhang, Meikang Qiu, Chun-Wei Tsai, and others | Healthcare Cyber-Physical System | Health-CPS: Healthcare Cyber-Physical System Assisted by Cloud and Big Data | Saudi Arabia | Map-Reduce | 2017 |
Figure 7 Number of research works per year in Health Care
2.2.2 Education
The application of Big Data analysis in education is relatively new, and it is improving the teaching and learning experience. According to (Pang et al, 2014), there is a tool for e-learning named Massive Open Online Course (MOOC). It is a web-based online course which differs from conventional e-learning systems and is revolutionary in different fields of education. Conventional e-learning systems have some limitations, but a MOOC allows large audiences. Moreover, there is no need for prerequisites to participate in courses, and it is accessible via the Internet. According to (Demchenko et al, 2014), there were 1.7 million trainees for different subjects per semester. In this example, a lot of data is created by the trainees in the form of files such as video contents, quizzes, documents, final exams, and projects, among others. In order to create all these documents and information according to the population, Big Data analysis is performed over social networks and personal information. Through this analysis it is possible to improve learning activities, change the course contents, and create new courses (Daradoumis et al, 2013). A primary concern in MOOCs is the dropout rate of students. Researchers are working to understand why students fail to finish the program. Regarding the analysis under the e-learning framework, if a student does not have any activity in the e-learning platform for a week, the system sends the teachers a warning. The teachers then have to send e-mails to remind the student of the pending activities (Liang et al, 2016). Big Data provides several advantages, such as dropout and quit analysis, course trends, learning effectiveness, recruitment strategy, and prediction of graduates' employment flow (Yang & Huang, 2016).
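The one-week inactivity rule described above can be sketched in a few lines. This is a minimal illustration, not the system of (Liang et al, 2016); the function name `students_needing_reminder` and the representation of the platform log as a dictionary of last-activity dates are assumptions.

```python
from datetime import date, timedelta


def students_needing_reminder(last_activity, today, max_idle_days=7):
    """Return the students whose last activity is at least `max_idle_days` old.

    `last_activity` maps a student name to the date of the last recorded activity.
    """
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(name for name, last in last_activity.items() if last <= cutoff)
```

The returned list is what would be sent to the teachers as a warning. A real platform would read the dates from its activity log rather than from an in-memory dictionary.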
Table 5 summarizes the research on Big Data analysis in Education over the years, and Figure 8 shows the number of research works per year, which reflects the growing tendency of Big Data analysis in this field.
Table 5 Big Data in Education
| No. | Author | Application | Name | Location | Algorithm | Year |
|---|---|---|---|---|---|---|
| 1 | Pang, Yanxia; Wang, Tong; Wang, Na | MOOC Data | MOOC Data from Providers | China | Map-Reduce | 2014 |
| 2 | Demchenko, Yuri; Gruengard, Emanuel; Klous, Sander | Effective Big Data curricula | Instructional model for building effective Big Data curricula for online and campus education | Netherlands | Map-Reduce | 2014 |
| 3 | Yang, Stephen JH; Huang, Chester SJ | Education Cloud | Taiwan Digital Learning Initiative and Big Data Analytics in Education Cloud | Taiwan | Map-Reduce | 2016 |
| 4 | Reyes, Jacqueleen A. | The skinny on big data in education | The skinny on big data in education: Learning analytics simplified | United States | Map-Reduce | 2015 |
| 5 | John P. Campbell and Diana G. Oblinger | Academic Analytics | Academic Analytics | United States | Map-Reduce | 2007 |
| 6 | Wolfgang Greller and Hendrik Drachsler | Generic Framework for Learning Analytics | Translating Learning into Numbers: A Generic Framework for Learning Analytics | United States | Map-Reduce | 2012 |
| 7 | Samira ElAtia, Donald Ipperciel and Ahmed Hammad | Implications and Challenges to Using Data Mining in Educational Research in the Canadian | Implications and Challenges to Using Data Mining in Educational Research in the Canadian | Canada | Map-Reduce | 2012 |
| 8 | Elena Aronova, Karen S. Baker and Naomi Oreskes | Big Science and Big Data in Biology | Big Science and Big Data in Biology: From the International Geophysical Year through the International Biological Program to the Long Term Ecological Research | United States | Map-Reduce | 2010 |
| 9 | Richard Abel | Big Data in Digitized Newspapers | The Pleasures and Perils of Big Data in Digitized Newspapers | United States | Map-Reduce | 2013 |
| 10 | Hal R. Varian | Big Data: New Tricks for Econometrics | Big Data: New Tricks for Econometrics | United States | Map-Reduce/Random Forest | 2014 |
| 11 | Jiajun Liang, Jian Yang, Yongji Wu, Chao Li, Li Zheng | Dropout Prediction in Edx MOOCs | Big Data Application in Education: Dropout Prediction in Edx MOOCs | China | Map-Reduce | 2016 |
| 12 | Shaoying Li, and Jun Ni | Big-Data-enhanced Higher Education Systems | Evolution of Big-Data-enhanced Higher Education Systems | China | Map-Reduce | 2015 |
| 13 | Said Rabah Azzam, Ylber Ramadani | Reforming Education Sector through Big Data | Reforming Education Sector through Big Data | Canada | Map-Reduce | 2016 |
| 14 | Mang Chen, Chao Huang | Big Data in education | The Positioning and Construction of Education Ecosystem Base on Big Data | China | Map-Reduce | 2016 |
| 15 | Laura Lenz, André Pomp | Education of Learning-Disabled Students | How will The Internet of Things and Big Data Analytics impact the Education of Learning-Disabled Students? | Germany | Map-Reduce | 2016 |
Figure 8 Number of research works per year in Education
2.2.3 Government Sector
Nowadays, public organizations are in a new era, because they have to handle huge amounts of data. All the countries around the world need to process data and take decisions, even in real time. The U.S. government, in collaboration with IBM, developed an infrastructure for Big Data analysis according to its requirements (Kim et al, 2014). IBM has two important platforms for Big Data analysis: one of them is called IBM InfoSphere Stream, and the other is IBM Big Data. These two platforms are used by many government organizations for discovering patterns and getting insight from thousands of sources (Kim et al, 2014). To give a better idea of the amount of data managed by the U.S. government, according to (U.S. Government. Data.gov), the government launched Data.gov in 2009, and it contains 420,894 datasets. All these data come from different sectors such as healthcare, education, transportation, economy, and human services. Big Data analysis helps to improve e-government efficiency by applying analytical tools to extract information from the analyzed data (Joseph & Johnson, 2013). Although Big Data analysis makes it possible to have a better view, the challenge is how to apply the information obtained in real life to improve the organizational functions (McAfee et al, 2012); hence the organization and its people should be trained to take advantage of it. The implementation of Big Data analysis in this sector will depend on the current technological architecture, which should be able to store all the data from different sources. Therefore, all initial conditions should be considered in advance before a Big Data implementation. Moreover, information security is a significant issue in this field, because of the high volume of data that has to be transported between different departments. These data can be attacked, and the risk of fraud will increase. Organizations should be able to recognize attacks as soon as possible and take all precautions to avoid these problems. Big Data analysis offers new analytical models to prevent them, and some access behaviors or suspicious patterns can be found (Motau & Kalema, 2016). All in all, there are many new opportunities for Big Data analysis in e-government, which will be an important advance for the public sector in serving citizens. Table 6 summarizes the research on Big Data analysis in Government over the years, and Figure 9 shows the number of research works per year, which reflects the growing tendency of Big Data analysis in this field.
Table 6 Big Data in Government
| No. | Author | Application | Name | Location | Algorithm | Year |
|---|---|---|---|---|---|---|
| 1 | Kim, Gang-Hoon; Trimi, Silvana; Chung, Ji-Hyong | IBM InfoSphere Stream, and IBM Big Data | Big-data applications in the government sector | United States | MapReduce | 2014 |
| 2 | Joseph, Rhoda C; Johnson, Norman A | Government Services | Big data and transformational government | United States | MapReduce | 2013 |
| 3 | Motau, Mokgadi; Kalema, Billy Mathias | Big Data Analytics readiness | Big Data Analytics readiness: A South African public sector perspective | South Africa | MapReduce | 2016 |
| 4 | David W. Nickerson and Todd Rogers | Political Campaigns and Big Data | Political Campaigns and Big Data | United States | Random forest | 2014 |
| 5 | Chad Squitieri | Confronting Big Data | Confronting Big Data: Applying the Confrontation Clause to Government Data Collection | United States | MapReduce | 2015 |
| 6 | James T. Graves, Alessandro Acquisti and Nicolas Christin | Big Data and Bad Data: On the Sensitivity of Security Policy to Imperfect Information | Big Data and Bad Data: On the Sensitivity of Security Policy to Imperfect Information | United States | MapReduce | 2016 |
| 7 | Mél Hogan and Tamara Shepherd | Information Ownership And Materiality In An Age Of Big Data Surveillance | Information Ownership And Materiality In An Age Of Big Data Surveillance | United States | MapReduce | 2015 |
| 8 | Annarita Ricci | E-Government, transparency and personal data protection | E-Government, transparency and personal data protection. A new analysis' approach to an old juridical issue | Italy | MapReduce | 2016 |
| 9 | Catherine Enoredia Odorige | E-Governance and the Nigerian tax administrative system | E-Governance and the Nigerian tax administrative system | Nigeria | MapReduce | 2016 |
| 10 | Mihai Grecu, Ilie Costas and Artur Reaboi | E-Government services in Moldova: Value and opportunities | E-Government services in Moldova: Value and opportunities | Moldova | MapReduce | 2016 |
Figure 9 Number of research works per year in Government
2.3 Big Data in Electric Power Systems
In the power industry, the number of communication devices and sensors in substations has increased, and the massive measurements which are continuously collected have become enormous datasets (Gandomi & Haider, 2015). There are some new technologies for analyzing the data collected from power systems, but they are not able to cope with massive volumes of heterogeneous data (Popovic & Kezunovic, 2012). A new tool for data management is needed to support power system decision making and to process the growing demands. Big Data analysis in power systems helps to prevent faults within the electric network, and the power industry has an important interest in Big Data analysis associated with this field. In power systems, data are recorded from field Supervisory Control and Data Acquisition (SCADA) devices. According to (Guo et al, 2016), with a sampling interval of 3 seconds the amount of SCADA data at 10,000 inspection points can reach 1.03 TB. For instance, Phasor Measurement Units (PMU) and smart meters collect 100 samples per second and can easily reach 495 TB yearly. Many companies in this field have to deal with this problem; according to (Guo et al, 2016), Pacific Gas and Electric Company of the USA collects more than 3 TB of power data from 9 million smart meters across the state grid. Big Data analysis is applied in power grids to support network management and the monitoring of transmission and distribution systems. Within a power grid there are heterogeneous data such as equipment status, environmental data, power quality data, historical data, real-time data, and others. Through the analysis of this data, the Big Data system can support decision making, better maintenance techniques, system evaluation, and failure prevention. Figure 10 shows the four main modules within the Big Data system: the Data Acquisition subsystem, the Big Data Analysis subsystem, the Decision Making Assistant subsystem, and the Information and Integration subsystem.
Figure 10 Big Data processing and analysing platform for electric power system condition monitoring (Guo, Feng, Li, et al, 2016).
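To make the SCADA data volumes quoted earlier in this section more tangible, the following back-of-the-envelope sketch shows how the number of samples grows with the number of inspection points and the sampling interval. The record size `bytes_per_sample` is purely an assumption for illustration, since this section does not state it, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(num_points, sampling_interval_s):
    """Number of samples produced per day by all inspection points together."""
    return num_points * SECONDS_PER_DAY / sampling_interval_s


def yearly_volume_tb(num_points, sampling_interval_s, bytes_per_sample, days=365):
    """Approximate yearly data volume in terabytes (1 TB taken as 10**12 bytes)."""
    total_bytes = samples_per_day(num_points, sampling_interval_s) * bytes_per_sample * days
    return total_bytes / 1e12
```

For 10,000 points sampled every 3 seconds, `samples_per_day` gives 288 million samples per day, so even a record of a few bytes per sample accumulates to a volume on the terabyte scale within a year.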
Apache Spark is one of the Big Data software tools used in this field; some computing libraries should be integrated with it, such as the numerical computing tool NumPy, the scientific computing tool SciPy, the data analysis library Pandas, and the scalable machine learning library MLlib (Meng et al, 2016). In order to make effective power grid decisions, analytics methods with good performance are needed. High-performance cloud computing techniques are also needed to become a Smart Grid. According to (Meng et al., 2016), there are many advantages of Big Data analysis in power distribution grids: better flexibility of power consumption; increased reliability, safety, and efficiency of power distribution networks; anticipation of power outages through large-scale real-time information; easier integration of renewable energy sources; better utilization of the existing equipment; reduced operational costs; easier determination of the root causes of failure; better foresight of future events; and better planning and strategies for achieving energy efficiency.
Table 7 summarizes the research on Big Data analysis in Electric Power Systems over the years, and Figure 11 shows the number of research works per year, which reflects the growing tendency of Big Data analysis in this field.
Table 7 Big Data in Electric Power Systems
| No. | Author | Application | Name | Location | Algorithm | Year |
|---|---|---|---|---|---|---|
| 1 | Popovic, Tomo; Kezunovic, Mladen | Measures of value: data analytics for automated fault analysis | Measures of value: data analytics for automated fault analysis | NaN | Map-Reduce | 2012 |
| 2 | Guo, Yuanjun; Feng, Shengzhong; Li, Kang; Mo, Wenxiong; Liu, Yuquan; Wang, Yong | Big data processing and analysis platform for condition monitoring of electric power system | Big data processing and analysis platform for condition monitoring of electric power system | China | Map-Reduce | 2016 |
| 3 | The-Hien Dang-Ha, Roland Olsson, and Hao Wang | The Role of Big Data on Smart Grid Transition | The Role of Big Data on Smart Grid Transition | Norway | Map-Reduce | 2015 |
| 4 | Ai Minghao, Ge Xianjun, Wang Xiaohui, Li Zhihong, Chen Naishi, Pu Tianjiao, Xu Zhiheng, Yuan Fei | A Big Data analysis based new method for power grid dispatch and control training simulation | A Big Data analysis based new method for power grid dispatch and control training simulation | China | Map-Reduce | 2016 |
| 5 | W. Alves, D. Martins, U. Bezerra and A. Klautau | A Hybrid Approach for Big Data Outlier Detection from Electric Power SCADA System | A Hybrid Approach for Big Data Outlier Detection from Electric Power SCADA System | Brazil | Map-Reduce | 2017 |
| 6 | Yuanjun Guo, Shengzhong Feng, Kang Li, Wenxiong Mo | Monitoring of Electric Power System | Big Data Processing and Analysis Platform for Condition Monitoring of Electric Power System | China | Map-Reduce | 2016 |
Figure 11 Number of research works per year in Electric Power Systems
2.4 Big Data in Mechanical Engineering
2.4.1 Electric Vehicle Design
Over the last decade, the automobile industry has developed important solutions for car driving and for all the mechanical and electronic systems within Electric Vehicles (EV) (Petit & Shladover, 2015). These developments made it possible to manufacture autonomous vehicles, which are equipped with advanced sensing, navigation devices, communication capabilities, computer vision, etc. (Sherif et al, 2016). Thus, all these new characteristics in vehicles are a potential support for users and transportation systems, since they can avoid crashes, reduce travel time, and assist traffic flows, among others (Shchetko, 2014). There are several sensors installed in vehicles, which help with different functions of the vehicle and, at the same time, provide a large amount of data for research. Through the analysis of the data collected from vehicles, it is possible to improve all the systems inside them. Using Big Data techniques in a proper way, all the functions of the vehicle can be substantially improved. In the following section, different applications of Big Data analysis in vehicles are described.
2.4.1.1 Range Estimation for Electric Vehicles
Regarding the range estimation for EVs in (Rahimi-Eichi & Chow, 2014), there are several parameters to analyze, and this large amount of data has different levels of accuracy and relevance and is partly unstructured. Big Data analysis provides a much better estimation of the vehicle driving range. In order to give an optimum solution for this particular problem, data such as the state of charge of the battery, the battery manufacturer's model, the driving history, the model of the vehicle, the GPS location, weather conditions, and traffic reports are collected and categorized according to their properties into three groups. The first is Standard Data; this category includes data such as GPS position, weather conditions, and the estimated driving time to the destination. The second is Historical Data; within this group there are parameters such as miles per gallon, and the data collected from other people who made the same trip.
The third is Real Time Data, which includes data collected when an event occurs unexpectedly, for instance a traffic jam due to an accident. To develop a range estimation, all these data are collected and analyzed by a Big Data tool such as Hadoop (Zhang et al, 2012). Figure 12 shows a block diagram with all the data required for the analysis and the modules which collect this data; the electric vehicle model together with the battery model makes the prediction of the driving range.
Figure 12 Range estimation framework block diagram (Rahimi et al, 2014).
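The grouping of the inputs into standard, historical and real-time data can be illustrated with a deliberately naive sketch. It is not the framework of (Rahimi-Eichi & Chow, 2014): the class names, the fields chosen for each group, the `traffic_factor` multiplier, and the energy-based formula are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class StandardData:
    route_km: float              # planned trip distance from the route planner


@dataclass
class HistoricalData:
    avg_kwh_per_km: float        # average consumption observed on past trips


@dataclass
class RealTimeData:
    traffic_factor: float = 1.0  # above 1.0 when congestion raises consumption


def estimate_range_km(battery_kwh, state_of_charge, hist, live):
    """Naive estimate: usable energy divided by the adjusted consumption rate."""
    usable_kwh = battery_kwh * state_of_charge
    kwh_per_km = hist.avg_kwh_per_km * live.traffic_factor
    return usable_kwh / kwh_per_km


def can_reach_destination(battery_kwh, state_of_charge, std, hist, live):
    """True when the estimated range covers the planned route."""
    return estimate_range_km(battery_kwh, state_of_charge, hist, live) >= std.route_km
```

A realistic estimator would replace the single consumption figure with the vehicle and battery models discussed in the following subsections.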
2.4.1.1.1 Data collection nodes
2.4.1.1.1.1 Route and terrain or route information
Parameters such as driving distance, speed limit, traffic data, and the road terrain are needed to make the estimation. It is important to note that these parameters are categorized according to their nature into the groups already explained. The speed limit, the terrain, and the distance can be categorized as standard data. Traffic conditions are categorized as real-time data, while past traffic records are categorized as historical data.
2.4.1.1.1.2 Weather data
Weather conditions are naturally considered real-time data, yet historical weather data is also important, since it can help to make more accurate driving range predictions. Several websites provide high-quality weather reports. Weather data helps to calculate the effect of temperature on the battery and to adjust the battery parameters accordingly. Wind speed is used to calculate the energy consumption when the air moves in the direction opposite to the movement of the vehicle.
2.4.1.1.1.3 Driving behavior data
This is the most difficult data to obtain because it is directly related to human behavior, and the difficulty lies in how to categorize it. Driving behavior data can be related to real-time data, historical data, and standard data. (Liaw & Dubarry, 2007) use historical driving data and fuzzy logic techniques to predict the speed of the vehicle on the road. Currently, this method is widely used in transportation systems.
2.4.1.1.1.4 Electric vehicle modelling data
To calculate the power consumption of an EV, several physical parameters must be considered, such as the acceleration of the vehicle, its speed in different circumstances, and road and weather conditions (Larminie & Lowry, 2004).
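As an illustration of how these parameters combine, the sketch below uses the standard longitudinal force balance (inertia, rolling resistance, grade, and aerodynamic drag) to compute the power demanded at the wheels. The default drag area and rolling coefficient are assumed placeholder values, not figures from (Larminie & Lowry, 2004), and drivetrain losses are ignored.

```python
import math

G = 9.81             # gravitational acceleration, m/s^2
AIR_DENSITY = 1.225  # kg/m^3, assumed sea-level value


def traction_power_w(mass_kg, speed_ms, accel_ms2, slope_rad=0.0,
                     headwind_ms=0.0, drag_area_m2=0.65, rolling_coeff=0.01):
    """Power at the wheels in watts for a given driving state.

    drag_area_m2 is the product of drag coefficient and frontal area;
    it and rolling_coeff are illustrative assumptions.
    """
    inertial = mass_kg * accel_ms2
    rolling = rolling_coeff * mass_kg * G * math.cos(slope_rad)
    grade = mass_kg * G * math.sin(slope_rad)
    # A headwind raises the air speed seen by the vehicle.
    air_speed = speed_ms + headwind_ms
    drag = 0.5 * AIR_DENSITY * drag_area_m2 * air_speed * abs(air_speed)
    return (inertial + rolling + grade + drag) * speed_ms
```

The `headwind_ms` argument reflects the role of wind speed noted in the weather data subsection: the same vehicle speed costs more power when the air moves against the direction of travel.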
2.4.1.1.1.5 Battery modelling data
There are several types of models to represent a battery; the RC-equivalent circuit is one of the most popular models for electric vehicle batteries.
All things considered, Big Data analysis plays an important role in EV driving range estimation: it collects different types of data from different sensors and, through a range of estimation algorithms, makes it possible to determine whether the destination is likely to be reached. This is a considerable advantage for drivers, since they can avoid several problems on the road.
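The RC-equivalent circuit mentioned above can be sketched in its simplest, first-order form: an open-circuit voltage source in series with a resistance R0 and one parallel R1-C1 branch. The code below is a minimal sketch under strong assumptions: the open-circuit voltage is held constant (in practice it depends on the state of charge), a positive current means discharge, and all parameter values are supplied by the caller.

```python
import math


def simulate_terminal_voltage(currents_a, dt_s, ocv_v, r0_ohm, r1_ohm, c1_f):
    """Terminal voltage of a first-order RC equivalent circuit.

    currents_a: current samples in amperes (positive = discharge),
                each assumed constant over one time step dt_s.
    """
    decay = math.exp(-dt_s / (r1_ohm * c1_f))  # per-step decay of the RC branch
    v1 = 0.0                                   # voltage across the RC branch
    voltages = []
    for current in currents_a:
        # Exact update of the RC branch for a constant current over the step.
        v1 = v1 * decay + r1_ohm * (1.0 - decay) * current
        voltages.append(ocv_v - current * r0_ohm - v1)
    return voltages
```

Under a constant discharge current the branch voltage settles at `current * r1_ohm`, so the terminal voltage approaches `ocv_v - current * (r0_ohm + r1_ohm)`; this voltage drop is what a range estimator has to account for.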
2.4.1.2 Predicting the Vehicle Slip
A large number of car accidents occur every year, and one of the causes is vehicle slip on a slippery road. Several physical factors cause vehicle slip, for instance the linear velocity of the wheels, the rigidity modulus, and the cornering force (Wang & Low, 2008). Depending on the influence of these three physical factors, different kinds of vehicle slip can occur, and it is difficult to understand their causes and effects (Trachtler, 2004). Moreover, large datasets are needed to understand this phenomenon. (Jeon et al, 2015) process these data and perform Big Data analysis to predict vehicle slip; by using MapReduce algorithms, a real-time analysis becomes possible. Several sensors measure the slip conditions of the wheels, but the aim is to use current and historical measurements to develop a prediction system that recovers vehicle stability after a slip and thus enhances the safety of the users. Figure 13 illustrates the system model for the implementation of this project. It is divided into four stages: Data, Feature Extraction, Predict Vehicle Slip, and Control. In the Data stage, all the data generated by the sensors are collected and stored in a MongoDB database. In the Feature Extraction stage, the MapReduce algorithm is applied to the collected data to determine the velocity of the vehicle, its position, and other features. In the Predict Vehicle Slip stage, once the system has obtained all the parameters, an Extended Kalman Filter (EKF) algorithm calculates the current slip angle of the vehicle. In the Control stage, the system compares the slip angle with a threshold: if the angle is over the threshold, the system automatically takes control of the vehicle; if it is below the threshold, the system continues with the calculation, and all the data obtained help to make more accurate calculations in the future. This is possible because Machine Learning techniques are involved in the system.
Figure 13 The Research Model (Jeon et al, 2015).
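The flow from feature extraction to control can be sketched as follows. This is a simplified illustration, not the design of (Jeon et al, 2015): the MapReduce step is imitated with plain Python functions, the EKF is replaced by the direct kinematic relation between lateral and longitudinal velocity, and the signal names and threshold values are assumptions.

```python
import math
from collections import defaultdict


def map_reading(reading):
    """Map step: emit (signal name, (value, count)) for one sensor reading."""
    return reading["signal"], (reading["value"], 1)


def reduce_mean(pairs):
    """Reduce step: average all values that share the same signal name."""
    sums = defaultdict(lambda: (0.0, 0))
    for key, (value, count) in pairs:
        total, n = sums[key]
        sums[key] = (total + value, n + count)
    return {key: total / n for key, (total, n) in sums.items()}


def slip_angle_rad(v_long_ms, v_lat_ms):
    """Kinematic slip angle; stands in for the EKF estimate of the text."""
    return math.atan2(v_lat_ms, v_long_ms)


def control_action(angle_rad, threshold_rad):
    """Control stage: intervene only when the angle exceeds the threshold."""
    return "take_control" if abs(angle_rad) > threshold_rad else "continue"


# Hypothetical readings, as they might be fetched from the sensor database.
readings = [
    {"signal": "v_long", "value": 20.0},
    {"signal": "v_long", "value": 20.0},
    {"signal": "v_lat", "value": 1.0},
    {"signal": "v_lat", "value": 3.0},
]
features = reduce_mean(map_reading(r) for r in readings)
angle = slip_angle_rad(features["v_long"], features["v_lat"])
```

Calling `control_action(angle, threshold_rad)` then yields the decision of the Control stage; in a real system the threshold would be calibrated for the vehicle rather than fixed by hand.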
2.4.2 Analysis of Traffic Systems
Vehicular networks have become more complex because the number of vehicles has increased dramatically and there are many modes of public transportation, such as subway, light rail, tram, and bus. Roadway safety and traffic efficiency are significant issues, and researchers are working to enhance Intelligent Traffic Systems (ITS). A traffic system contains many elements, namely traffic lights, speed controllers, sensors, and cameras, all of which should work properly to optimize the performance of the traffic network. Moreover, these elements are potential generators of data, and transportation companies are examining this data to acquire information for building a better ITS. In addition, with the implementation of IoT in traffic systems, the rate of generated data will increase exponentially. Hence, Big Data analysis is a good solution for ITS, as patterns can be discovered to enhance the system. Society needs real-time information such as news, weather conditions, and traffic conditions in a certain place; having real-time information about vehicle traffic is therefore a big advantage, since decisions can be made in real time. According to (Yan et al, 2015), one interesting method for traffic analysis is that transport companies record the relevant map locations through GPS positioning and track the status of the transport through mobile technology (Dobre & Xhafa, 2014). With this analysis, researchers can evaluate road congestion and provide real-time information to users. For traffic systems, the most important aim is to avoid traffic jams. There are several strategies for traffic management systems, such as video information analysis, infrared sensors, remote sensors, and inductive sensors (Knorr et al, 2012). All these strategies are reasonable solutions; however, the technical conditions of each deployment must be taken into account.
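The evaluation of road congestion from GPS records can be illustrated with a small sketch. It assumes that each record has already been matched to a road segment and carries a measured speed; the congestion rule (mean speed below a fixed fraction of the speed limit) and the segment identifiers are hypothetical and are not taken from the cited works.

```python
from collections import defaultdict


def congested_segments(records, speed_limits_kmh, ratio=0.5):
    """Flag road segments whose mean observed speed is below ratio * speed limit.

    records: iterable of (segment_id, speed_kmh) pairs derived from GPS traces.
    speed_limits_kmh: mapping from segment_id to its speed limit.
    """
    totals = defaultdict(lambda: [0.0, 0])  # segment_id -> [sum of speeds, count]
    for segment_id, speed_kmh in records:
        totals[segment_id][0] += speed_kmh
        totals[segment_id][1] += 1

    flags = {}
    for segment_id, (total, count) in totals.items():
        mean_speed = total / count
        flags[segment_id] = mean_speed < ratio * speed_limits_kmh[segment_id]
    return flags


# Hypothetical GPS-derived samples for two segments with a 50 km/h limit.
gps_records = [("A", 10.0), ("A", 20.0), ("B", 45.0), ("B", 55.0)]
status = congested_segments(gps_records, {"A": 50.0, "B": 50.0})
```

The per-segment sums and counts are the kind of aggregation that distributes naturally over many machines, which is why this task fits the Big Data tools discussed in this chapter.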
Additionally, to simplify the Big Data analysis, it is useful to know the type of data generated by these sensors, since a uniform data type is preferable. Beyond the improvements in transport networks and efficiency that Big Data analysis could bring, the road accident rate can decrease: a good prediction system can help to avoid accidents and save lives. Table 8 summarizes the Big Data analysis in Mechanical Engineering over the years, and Figure 14 shows the distribution of studies per year, which indicates the growing trend of Big Data analysis in this field.
Table 8 Big Data in Mechanical Engineering
No. | Author | Application | Title | Location | Algorithm | Year
1 | Petit, Jonathan; Shladover, Steven E | Cyberattacks | Potential cyberattacks on automated vehicles | United States | MapReduce | 2015
2 | Shchetko, Nick | Driverless cars | Laser eyes pose price hurdle for driverless cars | United States | NaN | 2014
3 | Zhang, Yuhe; Wang, Wenjia; Kobayashi, Yuichi; Shirai, Keisuke | Remaining driving range estimation | Remaining driving range estimation of electric vehicle | Japan | MapReduce | 2012
4 | Liaw, Bor Yann; Dubarry, Matthieu | Battery performance | From driving cycle analysis to understanding battery performance in real-life electric hybrid vehicle operation | Hawaii | MapReduce | 2007
5 | Jeon, Joohyoung; Lee, Woosik; Cho, Hyo Joo; Lee, Hongchul | Vehicle slip | A big data system design to predict the vehicle slip | Korea | EKF (Extended Kalman Filter) | 2015
6 | David L. Butler, Steven A. Goldstein and Farshid Guilak | Functional Tissue Engineering: The Role of Biomechanics | Functional Tissue Engineering: The Role of Biomechanics | United States | MapReduce | 2000
7 | Rahul Sharan Renu, Gregory Mocko, Koneru | Use of Big Data and Knowledge Discovery to Create Data Backbones for Decision Support Systems | Use of Big Data and Knowledge Discovery to Create Data Backbones for Decision Support Systems | United States | MapReduce/KDD | 2013
8 | Hua Cai, Xiaoping Jia, Anthony S.F. Chiu, Xiaojun Hu, Ming Xu | Siting public electric vehicle charging stations in Beijing using big-data informed travel patterns of the taxi fleet | Siting public electric vehicle charging stations in Beijing using big-data informed travel patterns of the taxi fleet | China | MapReduce | 2014
9 | Robin G. Qiu, Katie Wang, Shan Li, Jin Dong, Ming Xie | Real Time Capturing and Understanding of Electric Vehicle | Big Data Technologies in Support of Real Time Capturing and Understanding of Electric Vehicle Customers Dynamics | United States | MapReduce | 2014
10 | João Soares, Nuno Borges, Bruno Canizes, Zita Vale | Probabilistic Estimation of the State of Electric Vehicles for Smart Grid Applications | Probabilistic Estimation of the State of Electric Vehicles for Smart Grid Applications in Big Data Context | Portugal | MapReduce | 2015
11 | Huang, KaiSheng; Huang, Jianye; Lin, RongSheng; Ren, Dakai | Energy Consumption Optimization | Energy Consumption Optimization of Vehicle Power System Based on Big Data | China | MapReduce | 2016
12 | Ahmed B. T. Sherif, Khaled Rabieh, Mohamed Mahmoud, Xiaohui Liang | Ride Sharing Scheme for Autonomous Vehicles | Privacy-Preserving Ride Sharing Scheme for Autonomous Vehicles in Big Data Era | United States | MapReduce | 2017
13 | Longhua Guo, Mianxiong Dong, Kaoru Ota, Qiang Li, Tianpeng Ye, Jun Wu and Jianhua Li | A Secure Mechanism for Big Data Collection in Large Scale Internet of Vehicle | A Secure Mechanism for Big Data Collection in Large Scale Internet of Vehicle | Japan | MapReduce | 2017
14 | Meiqin Mao, You Yue, Liuche, Chang | Multi-time Scale Forecast for Schedulable Capacity of Electric Vehicle Fleets | Multi-time Scale Forecast for Schedulable Capacity of Electric Vehicle Fleets Using Big Data Analysis | China | MapReduce | 2016
15 | Habiballah Rahimi-Eichi, Mo-Yuen Chow | Vehicle Range Estimation | Big-Data Framework for Electric Vehicle Range Estimation | United States | MapReduce | 2014
16 | Benjamin Baron, Prométhée Spathis, Herve Rivano, Marcelo Dias de Amorim | Road map space reduction and efficient data assignment | Vehicles as big data carriers: Road map space reduction and efficient data assignment | France | MapReduce | 2014
17 | Chung-Hong Lee, Chih-Hung Wu | Battery Modeling | Collecting and Mining Big Data for Electric Vehicle Systems Using Battery Modeling Data | Taiwan | MapReduce | 2015
Figure 14 Number of studies per year in Mechanical Engineering
2.5 Big Data in Business
Big Data technologies applied to business have registered fast growth in a multimillion worldwide market, and they are expanding widely among IT companies (Ashish & Vesset, 2014). The increasingly social environment generates a vast amount of data, and extracting information from it helps governments and companies to make better predictions through suitable models (Hogarth & Soyer, 2015). Business activities are shaped by ICT, and managers should be well informed about these trends and business models in order to implement them in their companies (Mosavi & Delavar, 2016). Nowadays, business people are looking for better solutions or improved processes within supply chain management. An important factor for good business performance is that companies become empowered to make consistent, effective, and timely choices (Adeyeml & Mosavi, 2010). Big Data analysis applied to business brings new opportunities to achieve added value and competitive advantages. From the business point of view, working with a coherent Big Data analysis allows companies to better understand the market and their customers and to improve internal processes (Robak et al, 2016). According to (Marjanovic et al, 2015), there have been significant advancements in business analytics, such as Management Information Systems, Decision Support Systems, Executive Information Systems, Online Analytical Processing (OLAP), data mining, dashboards, and, recently, predictive analytics. The smart combination of these advancements with appropriate technologies will improve decision speed, data filtering, and data aggregation.
2.5.1 Supply Chain and Business Intelligence (BI)
Regarding the supply chain, many organizations and activities are involved in the transformation of raw materials into goods (Papazoglou, 2006). During this process, while companies interact with others in search of materials or services, they also generate important data to be analyzed. The data that can be analyzed includes geographic, linguistic, and technological data, among others. This data is obtained from social networks, which is why social networks are important for business; as addressed before, social networks make it possible to understand consumer behavior. Gathering the data in real time can be a limitation, but important advances in IT avoid this problem. Communication also plays an important role in business, because it facilitates many activities such as purchases, payments, bank transactions, product searches, and marketing. Big Data enables a dynamic business environment through the analysis of the supply chain. By analyzing the different processes involved in the supply chain, companies can enhance their productivity. Companies can also analyze and evaluate the market and its trends in order to plan better strategies for launching a new product or service. All these activities are called Business Intelligence (BI). The term BI refers to an automatic data retrieval and processing system that supports intelligent decisions (Luhn, 1958). According to (Nagar et al, 2016), BI is designed to support decision making. Thus, BI retrieves and processes data from various sources, and this data can help to make intelligent decisions (Choi et al, 2017). BI helps to understand the capabilities available in the firm, the trends and decisions in market technologies, and the environment in which the firm competes (Negash, 2004). Indeed, BI includes the methods, measurements, and processes that make the analysis and visualization easier and the results more understandable. These outcomes help current projects and future ones. According to (Data Science and Big Data Analytics, 2014), there are six phases in BI: data discovery, data preparation, model planning, model execution, communicating results, and operationalizing the results. Some phases can be performed at the same time during the analysis. This is an iterative process; consequently, it is sometimes necessary to return to a previous phase or to move ahead.
Throughout this iterative process, providing appropriate, high-quality data is important, because the result of the process is a key factor for decision making or for an automated process. Furthermore, BI tools should have the following features: they use structured or unstructured data, manage small datasets, handle mainly text, give interactive results in the form of graphs, tables, and charts, offer a friendly user interface, and are open source. Finally, Business Intelligence with Big Data is a fascinating and crucially important field of research, and new innovative applications can therefore be expected to appear.
2.5.2 Big Data and Banking Customers Analytics
Another important application of Big Data is the analysis of customer behavior. According to (Data, IBM What Is Big, 2012), 90% of the existing data was created in the last two years, and the amount of data stored by banks is growing exponentially. The analysis of this data allows banks to enhance their business and services. In the past, banks used to analyze samples of their databases in order to produce reports and make the best decisions for the future. Nowadays, banks hold large amounts of information, and through Big Data analysis they are able to obtain detailed information about their customers and the market. Hence, the aim of banks is to determine customer behavior from transactions. Understanding customer behavior means understanding customer preferences and finding customers with an acceptable spending potential and high revenue. (Sun et al, 2014) use Intelligent Customer Analytics for Recognition and Exploration (iCARE) to analyze customer behavior with banking Big Data analysis. Built on an IBM architecture, iCARE processes unstructured and structured data, gains deep insight from all the data related to customers, and generates a suitable view for a better understanding of business scenarios. The advantage of the iCARE system, like other systems that work with Big Data analysis, is that it relies on parallel computing and achieves high performance with a low response time. iCARE also provides a scalable architecture: by adding more parallel modules, it can handle bigger projects with larger amounts of data. Figure 15 shows the iCARE architecture and its principal phases: data acquisition, data preparation, data storage, and data processing.
Figure 15 Architecture of iCARE Solution (Sun, Morris, et al, 2014).
There are several applications of customer analytics, and they help to prolong the lifetime of customers within the bank. The most common applications are customer marketing, credit approval, profiling of credit card customers, and identification of high-risk loan customers. Through these applications, the bank staff is able to identify loyal customers with high revenue, and an accurate customer list can be obtained. Moreover, using demographic data, it is possible to create new services or make better decisions in the market, and banks can gain a better overview of their customers and competitors; hence, the profit of the bank can increase. Table 9 summarizes the Big Data analysis in Business over the years, and Figure 16 shows the distribution of studies per year, which indicates the growing trend of Big Data analysis in this field.
Table 9 Big Data in Business
No. | Author | Application | Title | Location | Algorithm | Year
1 | Ashish, N; Vesset, D | Big Data Technology and Services | Worldwide Big Data Technology and Services 2014-2018 Forecast | China | MapReduce | 2014
2 | Hogarth, Robin M; Soyer, Emre | Simulated experience | Using simulated experience to make sense of big data | United States | MapReduce | 2015
3 | Robak, Silva; Franczyk, Bogdan; Franczyk, Bogdan | Big data analytics under privacy | Business process optimization with big data analytics under consideration of privacy | Germany | MapReduce | 2016
4 | Marjanovic, Olivera; Ariyachandra, Thilini; Dinter, Barbara | Business Intelligence Minitrack | Introduction to Organizational Issues for Big Data, Business Analytics, and Business Intelligence Minitrack | United States | MapReduce | 2015
5 | Nagar, Parth; Atriwal, Labhansh; Mehra, Himanshi; Tayal, Sandeep | Big data business intelligence tools | Comparison of generalized and big data business intelligence tools | India | MapReduce | 2016
6 | Meriem Amel GUESSOUM, Rahma DJIROUN, Kamel BOUKHALFA | Business intelligence | Dealing with Decisional Natural Language Why-Question in Business Intelligence | Algeria | MapReduce | 2017
7 | Sun, N; Morris, JG; Xu, J; Zhu, X; Xie, M | Banking customer analytics | iCARE: A framework for big data-based banking customer analytics | China | MapReduce | 2014
8 | Hsinchun Chen, Roger H. L. Chiang, Veda C. Storey | Business intelligence and analytics (BI&A) | Business Intelligence and Analytics: From Big Data to Big Impact | United States | MapReduce | 2012
9 | Jacques Bughin, Michael Chui, and James Manyika | Clouds, big data, and smart assets: Ten tech-enabled business trends to watch | Clouds, big data, and smart assets: Ten tech-enabled business trends to watch | United States | MapReduce | 2007
10 | V Mayer-Schönberger, K Cukier | Big data: A revolution that will transform how we live, work, and think | Big data: A revolution that will transform how we live, work, and think | United States | MapReduce | 2013
11 | Sulin Pang, Yizhou He, Li Wang | The Intelligent Control Model and Application for Commercial Bank Systems Emergence under Risk Status | The Intelligent Control Model and Application for Commercial Bank Systems Emergence under Risk Status Based on Big Data | China | MapReduce | 2016
12 | A. Munar, E. Chiner, I. Sales | Management Architecture for Global Banking | A Big Data Financial Information Management Architecture for Global Banking | Spain | MapReduce | 2014
13 | Nuha Almoqren, Mohammed Altayar | Big Data Mining Technologies Adoption in Saudi Banks | The Motivations for Big Data Mining Technologies Adoption in Saudi Banks | Saudi Arabia | MapReduce | 2016
14 | Dalia Kriksciuniene, Marius Liutvinavicius, Virgilijus Sakalauskas, Darius Tamasauskas | Research of customer behavior anomalies in big financial data | Research of customer behavior anomalies in big financial data | Lithuania | MapReduce | 2014
15 | N. Sun, J. G. Morris, J. Xu, X. Zhu, M. Xie | Banking customer analytics | iCARE: A framework for big data-based banking customer analytics | China | MapReduce | 2014
Figure 16 Number of studies per year in Business
2.6 Big Data in Meteorology and Agricultural Science
2.6.1 Weather Forecasts
Weather is one of the most important factors for humans and for all fields within industry. The analysis of weather conditions to obtain more accurate weather forecasts is a challenge, as it demands a complex analysis. According to (Haupt & Kosovic, 2015), weather forecasting is one of the most important computational challenges because of the amount of generated data and its complexity. According to (Radhika et al, 2016), weather forecasting organizations generate a huge amount of data every day, on the scale of terabytes to petabytes. By installing sensors around cities, it is possible to collect weather parameters such as temperature, humidity, and wind speed. The large amount of data to be analyzed can overwhelm meteorologists, and there is therefore a need for a scalable and accurate analysis tool to perform this complex activity (Ismail et al, 2016). Hence, Big Data analysis brings a revolution to weather forecasting. Meteorologists have deepened the use of Big Data analysis and algorithms to enhance weather predictions. Although Big Data analysis is a solution, it must work together with high-performance computing: weather forecasting requires large simulations, which are run on supercomputers. This new way of forecasting will revolutionize Numerical Weather Prediction (NWP), a numerical method in which computer models are used to predict the weather (Charney et al, 1950).
2.6.1.1 Big Data Techniques for Weather Forecasts
There is a wide range of methods for weather forecasting, such as data fusion, cluster analysis, network analysis, and machine learning. This section describes some of them.
2.6.1.1.1 Data Fusion
Data fusion combines various datasets into one large set; the resulting superset contains patterns which cannot be found in the original sets. According to (Zheng, 2015), there are three categories of data fusion methods: semantic meaning-based, stage-based, and feature level-based data fusion.
2.6.1.1.2 Crowdsourcing
In this technique, the data is collected by many individuals who are not trained in taking measurements. The data is exchanged and stored in a distributed computing environment. This technique is used in the geospatial domain.
2.6.1.1.3 Cluster Analysis
The aim of this technique is to separate the data into groups, where the classification is done according to the nature of the data. According to (Nejra et al, 2012), environmental features can be described by different parameters such as temperature, humidity, estimations, precipitation, daylight length, wind direction, and wind speed. This is helpful for atmospheric analysis, which requires a lot of data. Cluster analysis is used in many applications, including the investigation of atmospheric data.
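As an illustration of cluster analysis on weather parameters, the following sketch implements a basic k-means procedure. The choice of k-means is an assumption made for this example, since the text does not name the algorithm used by (Nejra et al, 2012); the sample observations and initial centroids are hypothetical, and a practical analysis would normalize the features first.

```python
import math


def kmeans(points, initial_centroids, iterations=10):
    """Basic k-means: returns the final centroids and the last cluster assignment.

    points: list of equal-length numeric tuples, e.g. (temperature, humidity).
    initial_centroids: starting centroids, one per desired cluster.
    """
    centroids = [tuple(c) for c in initial_centroids]
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its members.
        centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical (temperature in °C, relative humidity in %) observations.
observations = [(2, 80), (3, 82), (4, 84), (20, 30), (21, 32), (22, 34)]
centers, groups = kmeans(observations, [(0, 100), (30, 20)])
```

Here the six observations separate into a cold, humid group and a warm, dry group; on real meteorological data the number of clusters and the starting centroids would have to be chosen with more care.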
2.6.1.1.4 Machine Learning
Machine learning focuses on the theory, properties, and performance of learning systems and algorithms (Radhika et al, 2016). Some machine learning techniques are not suitable for large volumes of data, but relatively new techniques such as Distributed and Parallel Learning can be helpful in this field.
2.6.2 Agriculture
Nowadays, data is also generated by farms, owing to a new way of farming called precision agriculture (Bendre et al, 2015). Precision agriculture includes remote farming, satellite farming, and on-site farming. It is important to find new, sophisticated ways of farming, since the world population is increasing and meeting this growing demand is becoming a challenge for the food industry (Cukier & Mayer-Schoenberger, 2013). In order to improve farming and maximize its productivity, two important aspects must be considered: weather predictions and the data gathered from farms.
Regarding the type of data collected from farms, there are structured and unstructured data. The different types of sensors generate complex data with high variety. The data collected from sensors includes GPS-based geographical position, soil moisture, fertilizer rate, temperature, and equipment logs. Even though most of the data comes from the sensors installed on farms, historical data is also considered, such as crop patterns, soil testing, field monitoring, and weather conditions. Additional important data comes from websites, such as customer feedback, publications on practice guidelines, billing data, and scheduling systems. According to (Awuor et al, 2013), precision agriculture is another industrial field in which the implementation of Big Data analysis needs a layered architecture. There are three layers: the storage and processing layer, the infrastructure layer, and the application layer. The storage and processing layer requires a robust system such as cloud computing. The infrastructure layer involves clustered sensor networks and systems, and it performs functions such as information management and accessibility. The application layer involves tools such as web-based solutions, data acquisition, and development platforms. Many Big Data functionalities are used in agricultural applications; if more farmers had access to these technologies, they could make more accurate predictions or recommendations to other farmers. Big Data implementations offer several advantages in this field; nevertheless, the lack of information and resources is the most typical problem (McBratney et al, 2005).
Table 10 summarizes the Big Data analysis in Meteorology and Agricultural Science over the years, and Figure 17 shows the distribution of studies per year, which indicates the growing trend of Big Data analysis in this field.
Table 10 Big Data in Meteorology and Agricultural Science
No. | Author | Application | Title | Location | Algorithm | Year
1 | Haupt, Sue Ellen; Kosovic, Branko | Forecasting Solar Power for Utility Operations | Big Data and Machine Learning for Applied Weather Forecasts: Forecasting Solar Power for Utility Operations | United States | MapReduce | 2015
2 | Radhika, T V; Gouda, Krushna Chandra; Kumar, S Sathish | Big data research in climate science | Big data research in climate science | India | MapReduce | 2016
3 | Ismail, Khalid Adam; Majid, Mazlina Abdul; Zain, Jasni Mohamed; Bakar, Noor Akma Abu | Weather temperature based on MapReduce algorithm | Big Data prediction framework for weather Temperature based on MapReduce algorithm | Malaysia | MapReduce | 2016
4 | Nejra, Hadzimejlic; Dzenana, Donko; Nijaz, Hadzimejlic | Clustering Data Mining Techniques | Climate Data Analysis Using Clustering Data Mining Techniques | Bosnia and Herzegovina | MapReduce | 2012
5 | Awuor, Fredrick; Kimeli, Kimutai; Rabah, Kefah; Rambim, Dorothy | ICT solution architecture for agriculture | ICT solution architecture for agriculture | Kenya | MapReduce | 2013
6 | McBratney, Alex; Whelan, Brett; Ancev, Tihomir; Bouma, Johan | Future directions of precision agriculture | Future directions of precision agriculture | United States | MapReduce | 2005
7 | Bendre, M R; Thool, R C; Thool, V R | Precision agriculture: Weather forecasting for future farming | Big data in precision agriculture: Weather forecasting for future farming | India | MapReduce | 2015
8 | Doug Howe, Maria Costanzo, Petra Fey, Takashi Gojobori, Linda Hannick, and others | Big data: The future of biocuration | Big data: The future of biocuration | United States | MapReduce | 2008
9 | M. Herrero, P. K. Thornton, A. M. Notenbaert, S. Wood, S. Msangi, H. A. Freeman, and others | Smart Investments in Sustainable Food Production: Revisiting Mixed Crop-Livestock Systems | Smart Investments in Sustainable Food Production: Revisiting Mixed Crop-Livestock Systems | United States | MapReduce | 2010
10 | QIN Xian-lin, YI Hao-ruo | A Method to Identify Forest Fire Based on MODIS Data | A Method to Identify Forest Fire Based on MODIS Data | China | MapReduce | 2004
11 | Takemasa Miyoshi, Keiichi Kondo and Koji Terasaki | Numerical Weather Prediction | Big Ensemble Data Assimilation in Numerical Weather Prediction | Japan | MapReduce | 2015
12 | Sue Ellen Haupt and Branko Kosovic | Weather Forecasts | Big Data and Machine Learning for Applied Weather Forecasts | United States | MapReduce | 2015
13 | Takemasa Miyoshi, Guo-Yuan Lien, Shinsuke Satoh, Tomoo Ushio, Kotaro Bessho, Hirofumi Tomita and others | Post-Petascale Severe Weather Prediction | "Big Data Assimilation" Toward Post-Petascale Severe Weather Prediction: An Overview and Progress | United States | MapReduce | 2016
14 | Seungwoo Jeon, Bonghee Hong, Hyeongsoon Im | Analysis of the effect of weather determinants on lodging demands | Analysis of the effect of weather determinants on lodging demands using big data processing | Korea | MapReduce | 2015
15 | Xingang Wang, Zhigang Gai, Suiping Qi | An Approach for Extracting Big Micro-Scale Severe Weather Region Trajectories Automatically from Meteorological Radar Data | An Approach for Extracting Big Micro-Scale Severe Weather Region Trajectories Automatically from Meteorological Radar Data | China | MapReduce | 2016
Figure 17 Number of studies per year in Meteorology and Agricultural Science
3 BIG DATA AND FUTURE CHALLENGES AND TRENDS
Big Data analysis involves several challenges, and research in industry and laboratories is still at an early stage. Much work remains, and considerable effort is needed to improve different features of Big Data analysis. Moreover, there will be new fields in which Big Data analysis will be the way to tackle the problem, and Big Data will therefore become an important issue everywhere. According to (Chen et al, 2014), several problems remain to be solved, as described below.
3.1 Challenges
3.1.1 Fundamental Problems
The most significant problem is that there is no rigorous definition of Big Data. There are several explanations of Big Data, but most of them are commercial rather than scientific definitions. The root cause could be the nature of the datasets, their complexity, and the high volume of data.
3.1.2 Standardization
The need for an evaluation standard for data computing efficiency, and for a system that can evaluate data quality in order to enhance Big Data features, is an important issue both now and in the future. Although there are many good Big Data solutions, there is no way to measure Big Data performance with mathematical algorithms. Performance is evaluated through the implemented system and the results it shows; however, it is not possible to evaluate and compare results before and after the Big Data analysis. Figure 18 illustrates a general architecture of Big Data analysis.
3.1.3 Big Data Computing Modes
Regarding computing modes, transferring data is a challenging aspect, as a network should offer several features for this type of application, such as channel characteristics, security, reliability, and high availability. All these features should be ensured in order to avoid a bottleneck in this process (Song et al, 2017). According to (Nasser & Tariq, 2015), Big Data processes involve multiple phases, such as data acquisition and capture, information extraction and filtering, data integration, aggregation and visualization, query processing, data modelling, data analysis, and data interpretation and presentation. The disadvantage is that every phase has its own challenges and difficulties (Jaseena & David, 2014). Due to the exponential growth of data, many challenges remain, as can be seen in Figure 19.
Figure 18 A general Big Data architecture (Chen et al, 2014).
Figure 19 Big Data Challenges
3.2 Big Data Development
Many technical aspects still have to be enhanced. There are open issues concerning stream computing, grid computing, parallel computing, Big Data architecture, and others. In other words, technical efforts in both hardware and software are required to support Big Data analysis (Bakshi, 2012).
3.2.1 Format Data Conversion
As discussed before, Big Data is generated by multiple sources. The elements of big datasets are heterogeneous, which hinders efficient data format conversion. A suitable data format conversion is needed in order to obtain more reliable information from big datasets.
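As an illustration of what such a conversion layer might look like, the following minimal Python sketch maps records arriving as CSV and as JSON onto one common schema. The field names (`sensor_id`, `temp_c`, `id`, `temperature`), the sample data, and the helper functions are hypothetical and were chosen only for this example; a real system would have to handle many more formats and error cases.

```python
import csv
import io
import json

# Hypothetical raw inputs from two heterogeneous sources.
CSV_SOURCE = "sensor_id,temp_c\nA1,21.5\nB2,19.0\n"
JSON_SOURCE = '[{"id": "C3", "temperature": 22.25}]'


def from_csv(text):
    """Convert CSV rows to the common schema {'sensor': str, 'temp_c': float}."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"sensor": row["sensor_id"], "temp_c": float(row["temp_c"])} for row in reader]


def from_json(text):
    """Convert JSON objects to the same common schema."""
    return [{"sensor": obj["id"], "temp_c": float(obj["temperature"])} for obj in json.loads(text)]


def unify(csv_text, json_text):
    """Merge both sources into one homogeneous list of records."""
    return from_csv(csv_text) + from_json(json_text)


records = unify(CSV_SOURCE, JSON_SOURCE)
```

Once every source is mapped to the same schema, the later analysis steps no longer need to know where a record came from.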
3.2.2 Data Transmission
According to (Andrejevic, 2014), transmitting big datasets is unavoidable, and this process involves other sub-processes, namely data generation, data storage, and data acquisition. Big Data transmission is costly because sufficient bandwidth is needed to keep latency low during transmission. Regarding data storage, it is very important to prepare the software and hardware involved in this process. The aim is to have enough storage resources, including backup resources for recovering data in case of a failure.
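The relation between bandwidth and transmission time can be made concrete with a rough estimate. The sketch below assumes a simplified model in which the transfer time is the link latency plus the dataset size divided by the bandwidth; protocol overhead, congestion, and retransmissions are ignored. The dataset size, bandwidths, and latency are hypothetical values chosen for illustration.

```python
def transfer_time_seconds(size_bytes, bandwidth_bps, latency_s=0.0):
    """Estimate transfer time as link latency plus serialization time."""
    if bandwidth_bps <= 0:
        raise ValueError("bandwidth must be positive")
    return latency_s + (size_bytes * 8) / bandwidth_bps


ONE_TB = 10**12  # 1 terabyte in bytes (decimal definition)

# Same dataset over a 100 Mbit/s link and over a 10 Gbit/s link.
slow = transfer_time_seconds(ONE_TB, 100e6, latency_s=0.05)
fast = transfer_time_seconds(ONE_TB, 10e9, latency_s=0.05)
```

Under these assumptions, moving 1 TB takes roughly 22 hours at 100 Mbit/s but only about 13 minutes at 10 Gbit/s, which illustrates why bandwidth dominates the cost of Big Data transmission.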
3.2.3 Real-Time Analytics
Sometimes, when real-time analytics is performed, the information obtained does not reliably reflect reality, which is a considerable drawback in different fields. Defining the life cycle of data and building computing models for real-time applications will affect the information and the feedback obtained from the datasets.
3.3 Big Data Security
According to (Zhang et al, 2017), security is one of the biggest concerns in data transmission and storage, since cloud services and different systems are used to process important data. The exponential growth of datasets brings new challenges for data security and privacy protection. Because traditional data mining algorithms can expose sensitive information, Big Data analysis should be robust in order to avoid any information leakage. Hence, an efficient security system for Big Data analysis is still an open challenge. The main challenges in Big Data security discussed here are Big Data privacy, data quality, and Big Data encryption.
3.3.1 Big Data Privacy
According to (Basso et al, 2016), there are two important issues to consider within Big Data analysis. The first is the protection of personal privacy, since a great deal of critical personal data can be obtained easily without the user even noticing. The second is that personal data can be leaked without the users' permission while data is transmitted and stored. Better technology for protecting personal data is essential nowadays, and this is another challenge for Big Data.
3.3.2 Data Quality
The quality of data is an important aspect of Big Data analysis. If the quality of data is not good enough, resources are wasted in the transmission and storage processes. Several factors affect the quality of data; for instance, processes such as generation, acquisition, and transmission can degrade it if they do not perform well (Wright, 2004). Low-quality data produces problems such as unreliable results and low accuracy. Hence, new algorithms should be created to detect low-quality data easily.
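To give an idea of what a simple low-quality data detector could compute, the following Python sketch reports two basic indicators for a batch of records: completeness (the share of records with all required fields present) and validity (the share of records that are complete and whose value lies in a plausible range). The record layout, the field names, and the range limits are assumptions made for this example and are not taken from the cited literature.

```python
def quality_report(records, required_fields, value_field, valid_range):
    """Return completeness and validity ratios for a list of dict records."""
    low, high = valid_range
    total = len(records)
    if total == 0:
        return {"completeness": 0.0, "validity": 0.0}
    # A record is complete when every required field is present and not None.
    complete = [r for r in records if all(r.get(f) is not None for f in required_fields)]
    # A record is valid when it is complete and its value lies inside the range.
    valid = [r for r in complete if low <= r[value_field] <= high]
    return {"completeness": len(complete) / total, "validity": len(valid) / total}


# Hypothetical sample batch of temperature readings.
SAMPLE = [
    {"sensor": "A1", "temp_c": 21.5},
    {"sensor": "B2", "temp_c": None},   # missing value
    {"sensor": "C3", "temp_c": 999.0},  # outside the plausible range
    {"sensor": "D4", "temp_c": 19.0},
]

report = quality_report(SAMPLE, ["sensor", "temp_c"], "temp_c", (-50.0, 60.0))
```

Indicators of this kind can be computed before transmission and storage, so that resources are not spent on records that would later be discarded.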
3.3.3 Big Data Encryption
The security of data depends on suitable technologies and data encryption. Data encryption is another important task within Big Data, as the traditional encryption methods are not efficient enough (Li et al, 2007). According to legal regulations, everybody should be able to control his or her private data. In the new era of data, it could become easier to obtain confidential information, so technology should work together with the law to protect people’s identity.
3.4 Big Data and the New Thinking
Due to the enormous advantages of Big Data analysis in several fields, Big Data will change the way people think (Markus & Topi, 2015). People are going to use all the available data for analytics instead of analyzing only samples. Therefore, more and more complex data will be fed into Big Data analysis. In order to acquire better raw material for Big Data analysis, it is important to look for new data sources.
Finally, a simple algorithm applied to Big Data can be more powerful than a complex algorithm applied to small datasets.
3.5 Big Data Analysis in Managing Large-scale Flow-Table for Software-Defined Networking
In recent years, software-defined networking (SDN) has become an important building block of modern networks. OpenFlow was originally proposed to accelerate innovation in networks, and many network elements are involved in it (McKeown, et al., 2008). Combining Big Data applications in cloud computing with SDN offers enormous advantages for troubleshooting and solving problems within networks. Issues such as network efficiency, agility, scalability, and flexibility can be addressed in a relatively simple way. The challenge for Big Data analysis is to manage the large number of rules in a flow table that are used to process network packets. Consequently, the challenge is to implement SDN with Big Data techniques to enhance processing, storage, and flow-table utilization.
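To make the notion of a flow table more tangible, the sketch below models it as a list of rules ordered by priority, where each rule matches some packet header fields and yields an action. This is a deliberately simplified illustration and not the OpenFlow specification: the class names, header fields (`dst_ip`, `tcp_port`), and action strings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FlowRule:
    priority: int
    match: dict  # header field -> required value; absent fields act as wildcards
    action: str


@dataclass
class FlowTable:
    rules: list = field(default_factory=list)

    def add(self, rule):
        self.rules.append(rule)
        # Keep the highest priority first so lookup can stop at the first match.
        self.rules.sort(key=lambda r: r.priority, reverse=True)

    def lookup(self, packet):
        """Return the action of the highest-priority rule matching the packet."""
        for rule in self.rules:
            if all(packet.get(k) == v for k, v in rule.match.items()):
                return rule.action
        return "send_to_controller"  # table miss in this sketch


table = FlowTable()
table.add(FlowRule(10, {"dst_ip": "10.0.0.2"}, "output:2"))
table.add(FlowRule(20, {"dst_ip": "10.0.0.2", "tcp_port": 22}, "drop"))
```

The lookup here is a linear scan, so its cost grows with the number of rules. With very large flow tables this becomes the kind of scalability problem described above.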
3.6 5G Wireless Networks and Big Data
With the development of cloud computing and Big Data analysis, there are new and better facilities for 5G wireless network design (Keshavamurthy & Ashraf, 2016). Similarly, SDN techniques will play an important role in network performance. According to (Chen et al, 2014), although 5G wireless enables the mobility of Big Data and might also facilitate the efficient storage and analysis of mobile data, there are some limitations, such as interoperability and a lack of skilled staff.
4 DISCUSSION
As described in the previous sections, Big Data analysis has enormous advantages compared to traditional methods. Moreover, more and more companies are seeking new opportunities within Big Data to improve their processes and reduce operational costs. Important issues must be considered to improve Big Data performance, such as hardware architecture, software and algorithms, and staff knowledge or skills.

Regarding hardware architecture, although there are enough supercomputers that can deal with these "big" tasks, the complexity of their architecture could be a problem at this scale of processing. Another important aspect is the transportation of data: in some cases data is stored at the same place where the Big Data analysis takes place, but in other cases the data must be transported. Hence, within this framework there are important parameters to consider, such as suitable bandwidth, data security, a redundant channel for data transmission, and costs.

High-quality hardware architecture is not enough; software and algorithms also play an important role. Software such as Hadoop and Apache Spark is commonly used nowadays, and together with the MapReduce algorithm, Big Data analysis can reach high performance. There are some challenges for software and algorithms; one of them is dealing with the complexity of data. Due to new applications, services, and fields that are interested in Big Data analysis as a solution, new kinds of datasets need to be analyzed. Hence, complexity is an important aspect to consider when creating new software and algorithms.

Finally, it is important to train the staff involved in Big Data analysis, which is why companies should also focus on staff knowledge and skills. Although a Big Data system can deliver a clear result after the analysis, the staff behind it should be trained to interpret those results and make the right decisions. Big Data analysis involves many aspects in all its phases; hence, before its implementation, a preliminary survey is needed to determine all the requirements for a suitable, flexible, scalable, accurate, and reliable Big Data system. All these features are very important, especially when real-time decision making is needed.
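The MapReduce model mentioned in this discussion can be illustrated with the classic word-count example. The sketch below runs on a single machine in plain Python and only mimics the three conceptual phases (map, shuffle, reduce); it does not use Hadoop or Apache Spark, and the function names and input documents are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key, as the framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values of each key to obtain the final counts."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    pairs = []
    for doc in documents:
        pairs.extend(map_phase(doc))
    return reduce_phase(shuffle_phase(pairs))


counts = word_count(["big data analysis", "Big Data systems"])
```

In a real cluster, the map calls run in parallel on different nodes and the shuffle moves data across the network, which is where the hardware and transmission aspects discussed above come into play.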
CONCLUSION
In this document, a definition of Big Data was addressed, together with how Big Data analysis has been adopted in different fields. Nowadays, Big Data is an important tool for managing big datasets, and in the future it will become essential within companies because they continuously generate data, and the information obtained from this data will give them more opportunities. In the government and healthcare sectors, better services can be created and citizens will have a better standard of living. Smart cities are becoming more popular, and all the data that comes with them will bring new challenges for Big Data analysis. Although Big Data has already brought many advantages, more contributions of Big Data analysis will appear in the future, because Big Data systems are improving day by day.
REFERENCES
1. Al-Jaroodi, J., & Mohamed, N. (2016). Characteristics and Requirements of Big Data Analytics Applications. In IEEE (Ed.), Collaboration and Internet Computing (CIC), 2016 IEEE 2nd International Conference on (pp. 426-430). Pittsburgh-Bahrain.
2. Adejuwon, A., & Mosavi, A. (2010). Domain driven data mining–Application to business. IJCSI International Journal of Computer Science Issues, 7(4).
3. Andrejevic, M. (2014). Big Data, Big Questions| The Big Data Divide. International Journal of Communication, 17.
4. Anshari, M., Alas, Y., & Sei Guan, L. (2016). Developing online learning resources: Big data, social networks, and cloud computing to support pervasive knowledge. (Springer, Ed.) Education and Information Technologies, 21 (6), 1663-1677.
5. Arief Wisesa, H., Ma’sum, M. A., Mursanto, P., & Febrian, A. (2016). Processing Big Data with Decision Trees A Case Study in Large Traffic Data. In IEEE (Ed.), Big Data and Information Security (IWBIS), International Workshop on (pp. 115-120). Indonesia: IEEE.
6. Arora, S., Kumar, M., Johri, P., & Das, S. (2016). Big Heterogeneous Data and Its Security: A Survey. In IEEE (Ed.), Computing, Communication and Automation (ICCCA) (pp. 37-40). Greater Noida.
7. Ashish, N., & Vesset, D. (2014). Worldwide Big Data Technology and Services 2014-2018 Forecast. Analytical overview.
8. Atzori, L., Iera, A., & Morabito, G. (2010). The internet of things: A survey. (Elsevier, Ed.) Computer networks, 54 (15), 2787-2805.
9. Awuor, F., Kimeli, K., Rabah, K., & Rambim, D. (2013). Ict solution architecture for agriculture. In IST-Africa Conference and Exhibition (IST-Africa) (pp. 1-7). IEEE.
10. Babu, P., & Sastry, H. (2014). Big Data and Predictive Analytics in ERP Systems for Automating Decision Making Process. In IEEE (Ed.), Software Engineering and Service Science (ICSESS), 2014 5th IEEE International Conference on (pp. 259-262). Visakhapatnam.
11. Bailey, M. (2014). Will Big Data Diminish the Role of Humans in Decision Making? In IEEE (Ed.). California.
12. Bakshi, K. (2012). Considerations for big data: Architecture and approach. In Aerospace Conference, 2012 IEEE (pp. 1-7). IEEE.
13. Baldo, N., Giupponi, L., & Mangues-Bafalluy, J. (2014). Big data empowered self organized networks. In VDE (Ed.), European Wireless 2014; 20th European Wireless Conference; Proceedings of (pp. 1-8).
14. Basanta-Val, P., Audsley, N. C., Wellings, A. J., Gray, I., & Fernández-García, N. (2016). Architecting Time-Critical Big-Data Systems. (IEEE, Ed.) IEEE Transactions on Big Data, 2 (4), 310-324.
15. Basso, T., Matsunaga, R., Moraes, R., & Antunes, N. (2016). Challenges on Anonymity, Privacy, and Big Data. In Dependable Computing (LADC), 2016 Seventh Latin-American Symposium on (pp. 164-171). IEEE.
16. Bendre, M. R., Thool, R., & Thool, V. R. (2015). Big data in precision agriculture: Weather forecasting for future farming. In Next Generation Computing Technologies (NGCT), 2015 1st International Conference on (pp. 744-750). IEEE.
17. Boccardi, F., Heath, R. W., Lozano, A., Marzetta, T. L., & Popovski, P. (2014). Five disruptive technology directions for 5G. (IEEE, Ed.) IEEE Communications Magazine, 52 (2), 74-80.
18. Bort, J. (2013). How the cdc is using big data to save you from the flu. Business Insider.
19. Center, Intel IT. (2012). Planning guide: Getting started with hadoop. Steps IT Managers can take to move forward with big data analytics.
20. Charney, J. G., Fjörtoft, R., & Neumann, J. v. (1950). Numerical integration of the barotropic vorticity equation. Tellus, 2 (4), 237-254.
21. Chen, C. P., & Zhang, C.-Y. (2014). Data-intensive applications, challenges, techniques and technologies: A survey on Big Data. (Elsevier, Ed.) Information Sciences, 275, 314-347.
22. Chen, M., Mao, S., & Liu, Y. (2014). Big data: A survey. Mobile Networks and Applications, 19 (2).
23. Chen, M., Mao, S., Zhang, Y., & Leung, V. C. (2014). Big data: related technologies, challenges and future prospects. Springer.
24. Cheng, M., Jia, W., Gao, X., Gao, S., & Yang, F. (2004). Mu rhythm-based cursor control: an offline analysis. (Elsevier, Ed.) Clinical Neurophysiology, 115 (4), 745-751.
25. Chiang, I.-J. (2015). Agglomerative Algorithm to Discover Semantics From Unstructured Big Data. In IEEE (Ed.), Big Data (Big Data) (pp. 1556-1563). Taipei.
26. Choi, T.-M., Chan, H. K., & Yue, X. (2017). Recent development in big data analytics for business operations and risk management. (IEEE, Ed.) IEEE transactions on cybernetics, 47 (1), 81-92.
27. Cisco, CVNI. (2015). Source: http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/white_paper_c11-520862.html.
28. Clarke, R. (1997). Introduction to dataveillance and information privacy, and definitions of terms.
29. Cukier, K., & Mayer-Schoenberger, V. (2013). The rise of Big Data: How it's changing the way we think about the world. Foreign Aff., 92, 28.
30. D N, D., B J, S., Chetan, & S, S. (2016). An Efficient Framework of Data Mining and its Analytics on Massive Streams of Big Data Repositories. In IEEE (Ed.), Distributed Computing, VLSI, Electrical Circuits and Robotics (DISCOVER), IEEE (pp. 195-200). Bangalore.
31. Daradoumis, T., Bassi, R., Xhafa, F., & Caballé, S. (2013). A review on massive e-learning (MOOC) design, delivery and assessment. In IEEE (Ed.), P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC), 2013 Eighth International Conference on (pp. 208-213).
32. (2014). Data Science and Big Data Analytics. In Wiley (Ed.), Discovering, Analysing, Visualizing and Presenting Data.
33. Data, IBM What Is Big. (2012). Bring Big Data to the Enterprise.
34. Demchenko, Y., De Laat, C., & Membrey, P. (2014). Defining Architecture Components of the Big Data Ecosystem. In IEEE (Ed.), Collaboration Technologies and Systems (CTS) (pp. 104-112). Amsterdam.
35. Demchenko, Y., Gruengard, E., & Klous, S. (2014). Instructional model for building effective Big Data curricula for online and campus education. In IEEE (Ed.), Cloud Computing Technology and Science (CloudCom), 2014 IEEE 6th International Conference on (pp. 935-941).
36. Dobre, C., & Xhafa, F. (2014). Intelligent services for big data science. Future Generation Computer Systems, 37, 267-281.
37. Dobre, C., & Xhafa, F. (2014). Parallel Programming Paradigms and Frameworks in Big Data Era. (Springer, Ed.) International Journal of Parallel Programming, 42 (5), 710-738.
38. Du, J., Zhou, J., Li, C., & Yang, L. (2016). An Overview of Dynamic Data Mining. In IEEE (Ed.), Informative and Cybernetics for Computational Social Systems (ICCSS), 2016 3rd International Conference on (pp. 331-335). Jinzhou.
39. Elarabi, T., Sharma, B., Pahwa, K., & Deep, V. (2016). Big data analytics concepts and management techniques. In IEEE (Ed.), Inventive Computation Technologies (ICICT) (Vol. 2, pp. 1-6).
40. Esmaeili, M. and Mosavi, A., 2010, April. Notice of Retraction Variable reduction for multi-objective optimization using data mining techniques; application to aerospace structures. In Computer Engineering and Technology (ICCET), 2010 2nd International Conference on (Vol. 5, pp. V5-333). IEEE.
41. Fan, J., & Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. (W. O. Library, Ed.) Journal of the Royal Statistical Society, 70 (5), 849-911.
42. Fan, J., Fang, H., & Liu, H. (2014). Challenges of big data analysis. (O. U. Press, Ed.) National science review, 1 (2), 293-314.
43. Femminella, M., Pergolesi, M., & Reali, G. (2016). IoT, Cloud Services, and Big Data: A Comprehensive Pricing Solution. In IEEE (Ed.), Cloudification of the Internet of Things (CIoT) (pp. 1-5). Perugia.
44. Gadepally, V., Herr, T., Johnson, L., Milechin, L., Milosavljevic, M., & Miller, B. A. (2015). Sampling Operations on Big Data. In IEEE (Ed.), Signals, Systems and Computers (pp. 1515-1519). Massachusetts.
45. Gandomi, A., & Haider, M. (2015). Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information Management, 35 (2), 137-144.
46. Gerhardt, B., Griffin, K., & Klemann, R. (2012). Unlocking value in the fragmented world of big data analytics. Cisco Internet Business Solutions Group.
47. Ghasemi, E., & Chow, P. (2016). Accelerating Apache Spark Big Data Analysis with FPGAs. In IEEE (Ed.), Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld), 2016 Intl IEEE Conferences (pp. 737-744).
48. Gopalani, S., & Arora, R. (2015). Comparing apache spark and map reduce with performance analysis using k-means. (F. o. Science, Ed.) International Journal of Computer Applications, 113 (1).
49. Gubbi, J., Buyya, R., Marusic, S., & Palaniswami, M. (2013). Internet of Things (IoT): A vision, architectural elements, and future directions. (Elsevier, Ed.) Future generation computer systems, 29 (7), 1645-1660.
50. Gubbi, J., Buyya, R., Marusic, S., & Palaniswami, M. (2013). Internet of Things (IoT): A vision, architectural elements, and future directions. (Elsevier, Ed.) Future generation computer systems, 29 (7), 1645-1660.
51. Guo, Y., Feng, S., Li, K., Mo, W., Liu, Y., & Wang, Y. (2016). Big data processing and analysis platform for condition monitoring of electric power system. In Control (CONTROL), 2016 UKACC 11th International Conference on (pp. 1-6). IEEE.
52. Guo, Y., Feng, S., Li, K., Mo, W., Liu, Y., & Wang, Y. (2016). Big data processing and analysis platform for condition monitoring of electric power system. In Control (CONTROL), 2016 UKACC 11th International Conference on (pp. 1-6). IEEE.
53. Guo, Y., Zhang, J., & Zhang, Y. (2016). An algorithm for analyzing the city residents’ activity information through mobile big data mining. In IEEE (Ed.), Trustcom/BigDataSE/ISPA, 2016 IEEE (pp. 2133-2138). Tianjin: IEEE.
54. Harvard Business Review. (2014). How Big Data Impacts Healthcare. Harvard Business Review.
55. Haupt, S. E., & Kosovic, B. (2015). Big Data and Machine Learning for Applied Weather Forecasts: Forecasting Solar Power for Utility Operations. In Computational Intelligence, 2015 IEEE Symposium Series on (pp. 496-501). IEEE.
56. He, X., Ai, Q., Qiu, R. C., Huang, W., Piao, L., & Liu, H. (2015). A big data architecture design for smart grids based on random matrix theory. (IEEE, Ed.) IEEE Transactions on Smart Grid.
57. He, Y., Fei, R., Zhao, N., Yin, H., Yao, H., & Qiu, R. C. (2016). Big data analytics in mobile cellular networks. (IEEE, Ed.) IEEE Access, 4, 1985-1996.
58. Ho, N., Vo, H., & Vu, M. (2016). An Adaptive Information-Theoretic Approach for Identifying Temporal Correlations in Big Data Sets. In IEEE (Ed.), Big Data (Big Data), 2016 IEEE International Conference on (pp. 666-675). New York: IEEE.
59. Hogarth, R. M., & Soyer, E. (2015). Using simulated experience to make sense of big data. (C. M. Massachusetts Institute of Technology, Ed.) MIT Sloan Management Review, 56 (2), 49.
60. Hurwitz, J., Nugent, A., Halper, F., & Kaufman, M. (2013). Big Data For Dummies. Canada: John Wiley & Sons, Inc.
61. Imran, A., Zoha, A., & Abu-Dayya, A. (2014). Challenges in 5G: how to empower SON with big data for enabling 5G. (IEEE, Ed.) IEEE Network, 28 (6), 27-33.
62. Islam, R., & Islam, E. (2014). An approach to provide security to unstructured Big. In IEEE (Ed.), Software, Knowledge, Information Management and Applications (SKIMA), 2014 8th International Conference on (pp. 1-5). Bangladesh.
63. Islam, S. R., Kwak, D., Kabir, M. H., Hossain, M., & Kwak, K.-S. (2015). The internet of things for health care: a comprehensive survey. IEEE Access, 3, 678-708.
64. Ismail, K. A., Majid, M. A., Zain, J. M., & Bakar, N. A. (2016). Big Data prediction framework for weather Temperature based on MapReduce algorithm. In Open Systems (ICOS), 2016 IEEE Conference on (pp. 13-17). IEEE.
65. Jangade, R., & Chauhan, R. (2016). Big data with integrated cloud computing for healthcare analytics. In IEEE (Ed.), Computing for Sustainable Global Development (INDIACom), 2016 3rd International Conference on (pp. 4068-4071).
66. Jaseena, K., & David, J. M. (2014). Issues, challenges, and solutions: big data mining. NeTCoM, CSIT, GRAPH-HOC, SPTM-2014, 131-140.
67. Jati, G., Kusuma, I., MH, H., & Jatmiko, W. (2016). Big Data Compression using SPIHT in Hadoop. In IEEE (Ed.), Big Data and Information Security (IWBIS) (pp. 133-139). Indonesia.
68. Jeon, J., Lee, W., Cho, H. J., & Lee, H. (2015). A big data system design to predict the vehicle slip. In Control, Automation and Systems (ICCAS), 2015 15th International Conference on (pp. 592-596). IEEE.
69. Jia, B., WiktorWlodarczyk, T., & Rong, C. (2010). Performance Considerations of Data Acquisition in Hadoop System. In IEEE (Ed.), Cloud Computing Technology and Science (CloudCom), 2010 IEEE Second International Conference on (pp. 545-549). Stavanger.
70. Jin, J., Gubbi, J., Marusic, S., & Palaniswami, M. (2014). An information framework for creating a smart city through internet of things. (IEEE, Ed.) IEEE Internet of Things Journal, 1 (2), 112-121.
71. Jonnagaddala, J., Dai, H. a., & Liaw, S. (2016). Mining electronic health records to guide and support clinical decision support systems. (I. Global, Ed.) Improving health management through clinical decision support systems, 252-269.
72. Joseph, R. C., & Johnson, N. A. (2013). Big data and transformational government. IT Professional, 16 (6), 43-48.
73. Jourdan, Z., Rainer, R. K., & Marshall, T. E. (2008). Business intelligence: An analysis of the literature 1. (Taylor & Francis, Ed.) Information Systems Management, 25 (2), 121-131.
74. Kadam, S. D., Motwani, D., & Vaidya, S. A. (2016). Big data analytics-recommendation system with Hadoop Framework. In IEEE (Ed.), Inventive Computation Technologies (ICICT), International Conference on (Vol. 3, pp. 1-5).
75. Kagermann, H., Helbig, J., Hellinger, A., & Wahlster, W. (2013). Recommendations for Implementing the strategic initiative INDUSTRIE 4.0: securing the future of German manufacturing industry; final report of the Industrie 4.0 working group. (Forschungsunion, Ed.)
76. Kagermann, H., Helbig, J., Hellinger, A., & Wahlster, W. (2013). Recommendations for Implementing the strategic initiative INDUSTRIE 4.0: securing the future of German manufacturing industry; final report of the Industrie 4.0 working group. Forschungsunion.
77. Keshavamurthy, B., & Ashraf, M. (2016). Conceptual design of proactive SONs based on the Big Data framework for 5G cellular networks: A novel Machine Learning perspective facilitating a shift in the SON paradigm. System Modeling & Advancement in Research Trends (SMART), International Conference, 298-304.
78. Kim, G.-H., Trimi, S., & Chung, J.-H. (2014). Big-data applications in the government sector. Communications of the ACM, 57 (3), 78-85.
79. Kitchin, R. (2014). Big Data, New epistemologies. (S. Publications, Ed.) Big Data & Society, 1 (1), 2-6.
80. Kitchin, R., & Lauriault, T. P. (2014). Towards critical data studies.
81. Knorr, F., Baselt, D., Schreckenberg, M., & Mauve, M. (2012). Reducing traffic jams via VANETs. IEEE Transactions on Vehicular Technology, 61 (8), 3490-3498.
82. Kowalczyk, M., & Buxmann, P. (2014). Big Data and Information Processing in Organizational Decision Processes. (Springer, Ed.) Business & Information Systems Engineering, 6 (5), 267-278.
83. Kulcu, S., Dogdu, E., & Ozbayoglu, M. A. (2016). A Survey on Semantic Web and Big Data Technologies for Social Network Analysis. In IEEE (Ed.), Big Data (Big Data), 2016 IEEE International Conference on (pp. 1768-1777). Ankara: IEEE.
84. Kumar T K, A., Liu, H., & Thomas, J. P. (2014). Efficient Structuring of data in Big Data. In IEEE (Ed.), Efficient structuring of data in big data (pp. 1-5). Oklahoma.
85. Kwon, O., Lee, N., & Shin, B. (2014). Data quality management, data usage experience and acquisition intention of big data analytics. (Elsevier, Ed.) International Journal of Information Management, 34 (3), 387-394.
86. Larkou, G., Mintzis, M., Andreou, P. G., Konstantinidis, A., & Zeinalipour-Yazti, D. (2014). Managing big data experiments on smartphones. (Springer, Ed.) Distributed and Parallel Databases, 34 (1), 33-64.
87. Larminie, J., & Lowry, J. (2004). Electric vehicle technology explained. John Wiley & Sons.
88. Latinović, T. S., Preradović, D. M., Barz, C. R., Latinović, M. T., Petrica, P. P., & Pop-Vadean, A. (2016). Big Data in industry. In I. Publishing (Ed.), IOP Conference Series: Materials Science and Engineering (Vol. 144, p. 012006). Bosnia and Herzegovina.
89. Lelwala, N. (2016). Ensemble inference based framework for creating knowledge from big data in IoT. In IEEE (Ed.), Advances in ICT for Emerging Regions (ICTer), 2016 Sixteenth International Conference on (pp. 245-245).
90. Leung, C. K., Braun, P., Enkhee, M., Pazdor, A. G., Sarumi, O. A., & Tran, K. (2016). Knowledge Discovery from Big Social Key-Value Data. In IEEE (Ed.), Computer and Information Technology (CIT) (pp. 484-491). Winnipeg.
91. Li, N., Li, T., & Venkatasubramanian, S. (2007). t-closeness: Privacy beyond k-anonymity and l-diversity. In Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on (pp. 106-115). IEEE.
92. Liang, J., Yang, J., Wu, Y., Li, C., & Zheng, L. (2016). Big Data Application in Education: Dropout Prediction in Edx MOOCs. In IEEE (Ed.), Multimedia Big Data (BigMM), 2016 IEEE Second International Conference on (pp. 440-443).
93. Liaw, B. Y., & Dubarry, M. (2007). From driving cycle analysis to understanding battery performance in real-life electric hybrid vehicle operation. Journal of power sources, 174 (1), 76-88.
94. Liu, J., Lui, F., & Ansari, N. (2014). Monitoring and analyzing big traffic data of a large-scale cellular network with Hadoop. (IEEE, Ed.) IEEE Network, 28 (4), 32-39.
95. Luhn, H. P. (1958). A business intelligence system. IBM Journal of Research and Development, 314-319.
96. Marjani, M., Nasaruddin, F., Gani, A., Karim, A., Hashem, I., Siddiqa, A., et al. (2017). Big IoT Data Analytics: Architecture, Opportunities, and Open Research Challenges. (IEEE, Ed.) IEEE Access.
97. Marjanovic, O., Ariyachandra, T., & Dinter, B. (2015). Introduction to Organizational Issues for Big Data, Business Analytics, and Business Intelligence Minitrack. In IEEE (Ed.), System Sciences (HICSS), 2015 48th Hawaii International Conference on (pp. 4710-4711).
98. Markus, M. L., & Topi, H. (2015). Big data, big decisions for science, society, and business: report on a research agenda setting workshop.
99. McAfee, A., Brynjolfsson, E., Davenport, T. H., Patil, D., & Barton, D. (2012). Big Data. The management revolution. Harvard Bus Rev, 90 (10), 61-67.
100. McBratney, A., Whelan, B., Ancev, T., & Bouma, J. (2005). Future directions of precision agriculture. Precision agriculture, 6 (1), 7-23.
101. McKeown, N., Anderson, T., Balakrishnan, H., Parulkar, G., Peterson, L., Rexford, J., et al. (2008). OpenFlow: enabling innovation in campus networks. ACM SIGCOMM Computer Communication Review, 38 (2), 69-74.
102. Meng, X., Bradley, J., Yavuz, B., Sparks, E., Venkataraman, S., Liu, D., et al. (2016). Mllib: Machine learning in apache spark. Journal of Machine Learning Research, 17 (34), 1-7.
103. Meng, X., Bradley, J., Yavuz, B., Sparks, E., Venkataraman, S., Liu, D., et al. (2016). Mllib: Machine learning in apache spark. Journal of Machine Learning Research, 17 (34), 1-7.
104. Michalik, P., Štofa, J., & Zolotová, I. (2014). Concept Definition for Big Data Architecture. In IEEE (Ed.), Applied Machine Intelligence and Informatics (SAMI) (pp. 331-334). Košice.
105. Miloslavskaya, N., & Tolstoy, A. (2016). Concepts to Information Security Issues. In ACM (Ed.), Proceedings of the 2nd international conference on Security of information and networks (pp. 93-97). Moscow.
106. Mohanty, H., Bhuyan, P., & Chenthati, D. (2015). Big Data (Vol. 11). Warsaw, Poland: Springer.
107. Mosavi, A. (2010). Multiple criteria decision-making preprocessing using data mining tools. arXiv preprint arXiv:1004.3258.
108. Mosavi, A., Visual Analytics, Obuda University, 2016.
109. Mosavi, A., Optimal Engineering Design Tech. Rep. 2013. University of Debrecen, Hungary, 2013.
110. Mosavi, A., 2013. Brain-computer optimization for solving complicated geometrical decision-making problems. In Proceedings of PEME VI. Ph. D. Conference.
111. Mosavi, A., 2014. Application of data mining in multiobjective optimization problems. International Journal for Simulation and Multidisciplinary Design Optimization, 5, p.A15
112. Motau, M., & Kalema, B. M. (2016). Big Data Analytics readiness: A South African public sector perspective. In Emerging Technologies and Innovative Business Practices for the Transformation of Societies (EmergiTech), IEEE International Conference on (pp. 265-271). IEEE.
113. Nagar, P., Atriwal, L., Mehra, H., & Tayal, S. (2016). Comparison of generalized and big data business intelligence tools. In IEEE (Ed.), Computing for Sustainable Global Development (INDIACom), 2016 3rd International Conference on (pp. 3585-3588).
114. Nasser, T., & Tariq, R. (2015). Big data challenges. J Comput Eng Inf Technol 4: 3. doi: http://dx.doi.org/10.4172/2324 , 9307, 2.
115. Negash, S. (2004). Business intelligence. Communications of the Association, 8 (1), 177-195.
116. Nejra, H., Dzenana, D., & Nijaz, H. (2012). Climate Data Analysis Using Clustering Data Mining Techniques. International Conference on Applied Informatics and Computer Theory (AICT’12) (pp. 94-101). Barcelona: ISBN.
117. Nevo, D., Nevo, S., Kumar, N., Braasch, J., & Mathews, K. (2015). ENHANCING THE VISUALIZATION OF BIG DATA TO SUPPORT COLLABORATIVE DECISION-MAKING. In IEEE (Ed.), System Sciences (HICSS), 2015 48th Hawaii International Conference on (pp. 121-130). Hawaii.
118. Noorwali, I., Arruda, D., & Madhavji, N. H. (2016). Understanding Quality Requirements in the Context of Big Data Systems. In ACM (Ed.), Proceedings of the 2nd International Workshop on BIG Data Software Engineering (pp. 76-79). London, Canada: 2nd International Workshop on BIG Data Software Engineering.
119. Olaronke, I., & Oluwaseun, O. (2016). Big data in healthcare: Prospects, challenges and resolutions. In IEEE (Ed.), Future Technologies Conference (FTC) (pp. 1152-1157).
120. Pang, Y., Wang, T., & Wang, N. (2014). MOOC Data from Providers. In IEEE (Ed.), Enterprise Systems Conference (ES), 2014 (pp. 87-90).
121. Papazoglou, M. P. (2006). E-business: organizational and technical foundations, John Wiley and sons. (London, Ed.) 88-90.
122. Parwez, M. S., Rawat, D., & Garuba, M. (2017). Big Data Analytics for User Activity Analysis and User Anomaly Detection in Mobile Wireless Network. (IEEE, Ed.) IEEE Transactions on Industrial Informatics.
123. Petit, J., & Shladover, S. E. (2015). Potential cyberattacks on automated vehicles. IEEE Transactions on Intelligent Transportation Systems, 16 (2), 546-556.
124. Pfaffl, M. W. (2001). A new mathematical model for relative quantification in real-time RT-PCR. (O. U. Press, Ed.) Nucleic acids research, 29 (9), e45-e45.
125. Pondel, M. (2015). A concept of enterprise Big Data and BI workflow driven platform. In IEEE (Ed.), Computer Science and Information Systems (FedCSIS) (pp. 1699-1704). Wrocław.
126. Popovic, T., & Kezunovic, M. (2012). Measures of value: data analytics for automated fault analysis. IEEE Power and Energy Magazine, 10 (5), 58-59.
127. Radhika, T. V., Gouda, K. C., & Kumar, S. S. (2016). Big data research in climate science. In Communication and Electronics Systems (ICCES), International Conference on (pp. 1-6). IEEE.
128. Raghupathi, W., & Raghupathi, V. (2014). Big data analytics in healthcare: promise and potential. (B. Central, Ed.) Health Information Science and Systems, 2 (1), 3.
129. Raghupathi, W., & Raghupathi, V. (2014). Big data analytics in healthcare: promise and potential. (B. Central, Ed.) Health Information Science and Systems, 2 (1).
130. Rahimi-Eichi, H., & Chow, M.-Y. (2014). Big-data framework for electric vehicle range estimation. In Industrial Electronics Society, IECON 2014-40th Annual Conference of the IEEE (pp. 5628-5634). IEEE.
131. Reichert, P. (2014). Comarch EDI platform case study: The advanced electronic data interchange hub as a supply-chain performance booster. In Logistics Operations, Supply Chain Management and Sustainability (pp. 143-155). Springer.
132. Reshmy, A. K., & Paulraj, D. (2015). In IEEE (Ed.), Circuit, Power and Computing Technologies (ICCPCT), 2015 International Conference on (pp. 1-7). Chennai-Tamil Nadu.
133. Richtárik, P., & Takáč, M. (2016). Parallel coordinate descent methods for big data optimization. (Springer, Ed.) Mathematical Programming, 156 (1-2), 433-484.
134. Robak, S., Franczyk, B., & Franczyk, B. (2016). Business process optimization with big data analytics under consideration of privacy. In IEEE (Ed.), Computer Science and Information Systems (FedCSIS), 2016 Federated Conference on (pp. 1199-1204).
135. Robak, S., Franczyk, B., & Robak, M. (2013). Applying Big Data and Linked Data Concepts in Supply Chains Management. In IEEE (Ed.), Computer Science and Information Systems (FedCSIS), 2013 Federated Conference on (pp. 1215–1221). Poland.
136. Sagiroglu, S., & Sinanc, D. (2013). Big data: A review. In IEEE (Ed.), Collaboration Technologies and Systems (CTS), 2013 International Conference on (pp. 42-47).
137. Sahoo, P. K., Mohapatra, S. K., & Wu, S.-L. (2016). Analyzing Healthcare Big Data With Prediction for Future Health Condition. (IEEE, Ed.) IEEE Access, 4, 9786-9799.
138. Schermann, M., Krcmar, H., Hemsen, H., Markl, V., Buchmüller, C., Bitter, T., et al. (2014). Big Data, An Interdisciplinary Opportunity for Information Systems Research. (Springer, Ed.) Business & Information Systems Engineering, 6 (5), 261-266.
139. Shchetko, N. (2014). Laser eyes pose price hurdle for driverless cars. The Wall Street Journal, 21.
140. Sherif, A., Rabieh, K., Mahmoud, M., & Liang, X. (2016). Privacy-Preserving Ride Sharing Scheme for Autonomous Vehicles in Big Data Era. IEEE Internet of Things Journal.
141. Shim, K. (2012). MapReduce algorithms for big data analysis. (V. Endowment, Ed.) Proceedings of the VLDB Endowment, 5 (12), 2016-2017.
142. Singh, K., & Kaur, R. (2014). Hadoop: Addressing Challenges of Big Data. In IEEE (Ed.), Advance Computing Conference (IACC), 2014 IEEE International (pp. 686-689). Jalandhar.
143. Sivaraman, E., & Manickachezian, R. (2014). High Performance and Fault Tolerant Distributed File. In IEEE (Ed.), Intelligent Computing Applications (ICICA), 2014 International Conference on (pp. 32-36). Coimbatore.
144. Song, H., Basanta-Val, P., Steed, A., Jo, M., & others. (2017). Next-generation big data analytics: State of the art, challenges, and future research topics. IEEE Transactions on Industrial Informatics.
145. Sowmya, R., & Suneetha, K. R. (2017). Data Mining with Big Data. In IEEE (Ed.), Intelligent Systems and Control (ISCO) (pp. 246-250). Bengaluru, Karnataka.
146. Sri, P. A., & Anusha, M. (2016). Big Data-Survey. Indonesian Journal of Electrical Engineering and Informatics (IJEEI), 4 (1), 74-80.
147. Srivastava, S., & Chaudhari, N. (2016). Appraising a Decade of Research in the Field of Big Data "The Next Big Thing". In IEEE (Ed.), Computing for Sustainable Global Development (INDIACom), 2016 3rd International Conference on (pp. 2171-2175). Greater Noida.
148. Stephanie Baum. (2013). A remote monitor embedded in insulin pen caps could help personalize diabetes treatment.
149. Sun, N., Morris, J., Xu, J., Zhu, X., & Xie, M. (2014). iCARE: A framework for big data-based banking customer analytics. IBM Journal of Research and Development, 58 (5), 1-4.
150. Trachtler, A. (2004). Integrated vehicle dynamics control using active brake, steering and suspension systems. International Journal of Vehicle Design, 36 (1), 1-12.
151. Trnka, A. (2014). Big data analysis. European Journal of Science and Theology, 10 (1), 143-148.
152. U.S. Government. (n.d.). Data.gov. Retrieved April 4, 2015, from http://www.data.gov
153. Van Oort, N., & Cats, O. (2015). Improving public transport decision making, planning and operations by using Big Data. Delft: IEEE.
154. Vyatkin, V., Salcic, Z., Roop, P. S., & Fitzgerald, J. (2007). Now that's smart! (IEEE, Ed.) IEEE Industrial Electronics Magazine, 1 (4), 17-29.
155. Wan, K., & Alagar, V. (2016). Characteristics and Classification of Big Data in Health Care Sector. In IEEE (Ed.), Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD) (pp. 1439-1446). Suzhou, China; Montreal, Canada.
156. Wang, D., & Low, C. B. (2008). Modeling and analysis of skidding and slipping in wheeled mobile robots: Control design perspective. IEEE Transactions on Robotics, 24 (3), 676-687.
157. Wielki, J. (2013). Implementation of the Big Data concept in organizations – possibilities, impediments and challenges. In IEEE (Ed.), Computer Science and Information Systems (FedCSIS) (pp. 985–989). Opole.
158. Wright, T. (2004). Security, privacy, and anonymity. Crossroads, 11 (2), 5.
159. Wu, X., Zhu, X., Wu, G.-Q., & Ding, W. (2014). Data Mining with Big Data. (IEEE, Ed.) IEEE Transactions on Knowledge and Data Engineering, 26 (1), 97-107.
160. Xenopoulos, P., Daniel, J., Matheson, M., & Sukumar, S. (2016). Big data analytics on HPC architectures: Performance and cost. In IEEE (Ed.), Big data analytics on HPC architectures: Performance and cost (pp. 2286-2295).
161. Xin, R., Rosen, J., Zaharia, M., Franklin, M. J., Shenker, S., & Stoica, I. (2013). Shark: SQL and rich analytics at scale. In ACM (Ed.), Proceedings of the 2013 ACM SIGMOD International Conference on Management of data (pp. 13-14).
162. Xinhua, E., Han, J., Wang, Y., & Liu, L. (2013). Big Data-as-a-Service: Definition and architecture. In IEEE (Ed.), Communication Technology (ICCT) (pp. 738-742). Beijing.
163. Yan, Y.-Z., Liu, R.-H., Yang, C.-T., & Chen, S.-T. (2015). Cloud City Traffic State Assessment System Using a Novel Architecture of Big Data. In Cloud Computing and Big Data (CCBD), 2015 International Conference on (pp. 252-259). IEEE.
164. Yang, S. J., & Huang, C. S. (2016). Taiwan Digital Learning Initiative and Big Data Analytics in Education Cloud. In Advanced Applied Informatics (IIAI-AAI), 2016 5th IIAI International Congress on (pp. 366-370). IEEE.
165. Yin, S., & Kaynak, O. (2015). Big Data for Modern Industry: Challenges and Trends. (IEEE, Ed.) Proceedings of the IEEE, 103 (2), 143-146.
166. Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., et al. (2012). Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In U. Association (Ed.), Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation (pp. 2-2).
167. Zhang, X.-P. S., & Wang, F. (2017). Signal Processing for Finance, Economics, and Marketing: Concepts, framework, and big data applications. IEEE Signal Processing Magazine, 34 (3), 14-35.
168. Zhang, Y., Qiu, M., Tsai, C.-W., Hassan, M. M., & Alamri, A. (2015). Health-CPS: Healthcare cyber-physical system assisted by cloud and big data. (IEEE, Ed.) IEEE Systems Journal.
169. Zhang, Y., Ren, J., Liu, J., Xu, C., Guo, H., & Liu, Y. (2017). A Survey on Emerging Computing Paradigms for Big Data. Chinese Journal of Electronics, 26 (1).
170. Zhang, Y., Wang, W., Kobayashi, Y., & Shirai, K. (2012). Remaining driving range estimation of electric vehicle. In Electric Vehicle Conference (IEVC), 2012 IEEE International (pp. 1-7). IEEE.
171. Zheng, K., Yang, Z., Zhang, K., Chatzimisios, P., Yang, K., & Xiang, W. (2016). Big data-driven optimization for mobile networks toward 5G. (IEEE, Ed.) IEEE Network, 30 (1), 44-51.
172. Zheng, Y. (2015). Methodologies for cross-domain data fusion: An overview. IEEE Transactions on Big Data, 1 (1), 16-34.
173. Zheng, Z., Du, Z., Li, L., & Guo, Y. (2014). Big Data-Oriented Open Scalable Relational Data Model. In IEEE (Ed.), Big Data (BigData Congress) (pp. 398-405). Henan.
174. Zhou, A., Liu, L., Jiang, M., & Guo, X. (2016). Network traffic measurement algorithm based on sampling for big network data. In Advanced Cloud and Big Data (CBD), 2016 International Conference on (pp. 240-250). IEEE.