DAIPEX Project 2012-5-111-R
Deliverable D4.2: A Survey of Complex Event Processing Engines
30 June 2014
Public Document

Project acronym: DAIPEX
Project full title: Data and Algorithms for Integrated Transportation Planning and Execution
Work package: 4
Document number: D4.2
Document title: A Survey of Complex Event Processing Engines
Version: 1
Editor(s) / lead beneficiary: Huiye Ma (TU/e)
Author(s): Huiye Ma (TU/e), Remco Dijkman (TU/e)
Executive summary

This deliverable presents state-of-the-art engines for handling complex events, in order to select the most appropriate engine for the Data and Algorithms for Integrated Transportation Planning and Execution (DAIPEX) project. The most important selection requirement was that the engine can be used as an embeddable component in Java and is therefore suitable for integration into any Java process. We used a systematic review to search for the execution engines available in practice. The search returned 8 execution engines. To create a shortlist from these 8 systems, we applied the following criteria: (1) Does the system have an executable CEP engine? (2) Does the system have an open source license? (3) Does the system have in-depth documentation? (4) Does the system support XML files as input? (5) Has the system been implemented in Java? (6) Does the system have an active developer community for communication? (7) Does the system handle historical data? Four candidates met most of these criteria. We reviewed those candidates in more detail and selected Esper as the engine for further development, because it has an open source license and because the Esper development team gives continuous and active support online.
Contents

1 Introduction
2 CEPs
3 Search Methodology
  3.1 Search Strategy
  3.2 Selection Criteria
4 Results
5 Comparison of Engines
  5.1 Esper
  5.2 Oracle Complex Event Processing
  5.3 WSO2 Complex Event Processor
  5.4 SASE
6 Prototype
  6.1 Scenario 1
  6.2 Scenario 2
  6.3 Scenario 3
7 Conclusion
References
Appendix A
1 Introduction

In this report, we review candidate Complex Event Processing (CEP) engines in order to select the most appropriate one for the DAIPEX project. For this purpose, we use a systematic review approach to create a list of all CEP engines that can be integrated into the project and extended with the ability to detect and correct inaccurate and incomplete data; data mining solutions will be developed for this purpose. In addition, we are interested in extending CEP technology with the ability to create views on a data stream, such that the information relevant to a view can be aggregated. We consider CEP systems from a practical point of view. From the resulting long list we create a shortlist by applying predefined criteria, and finally we select the CEP engine that we intend to use in the project. We also experimented with one of the analyzed engines in an early prototype of the DAIPEX data processing system, in order to gain experience in working with such an engine and to learn how it can be integrated.

The remainder of this report is structured as follows. Section 2 explains which CEP engines are desirable for the DAIPEX project. Section 3 presents the search and selection methodology. Section 4 gives the search results. Section 5 evaluates the shortlisted candidates in terms of their applicability to the DAIPEX project. Section 6 briefly reviews the early prototype and tests three scenarios. Finally, Section 7 presents the conclusions of this report.
2 CEPs

Applications that require (near) real-time processing are changing the way that traditional data processing infrastructures operate. They push the limits of current processing systems by demanding higher throughput at the lowest possible latency. The main problems to be solved in recent years are not primarily in raw data processing, but rather in the high-level intelligence that can be extracted from the data. In response, systems were developed that can filter, aggregate, correlate and correct data, and notify end users about the results or about interesting facts. The latest advance in such systems is the development of high-performance complex event processing (CEP) engines that are capable of detecting patterns of activity in continuously arriving data.

The development of the area started with data stream management systems (DSMSs), e.g., TelegraphCQ (Chandrasekaran et al. 2003), which focused on managing continuous data streams. CEP systems, on the other hand, are event processing systems that combine data from multiple sources to infer events or patterns that suggest more complicated situations. These systems are represented broadly by traditional content-based publish-subscribe systems like Rapide (Luckham and Frasca 1998). As these systems evolved, DSMSs and CEP systems converged, so that the two fields now overlap.

Our main goal is to find CEP engines that can be applied as a component in business process management systems. Hence these CEP engines should be embeddable in the data processing software and should serve the purpose of business process management systems. Moreover, it is important to have CEP engines that can process historical data as well as real-time data streams.

When handling real-time events, CEP engines can filter, correct and aggregate real-time event data in a low-latency environment. When combined with a historical database and a data management solution, CEP engines help users test strategies on historical and real-time data, so that they can determine whether their strategies will perform as predicted once deployed. In the case of processing transportation data, the data processing software should offer additional functionality besides the CEP engine, such as a component for creating views on a data stream in such a way that the information relevant to a view can be aggregated.
3 Search Methodology

This section describes the steps taken in our search procedure. We used a systematic review, which is a research method to collect and assess the available knowledge on a specific topic based on a set of criteria. The review protocol used in this report was developed and executed according to the guidelines and hints provided by Kitchenham et al. (2009). In the remainder of this section, we first present our search strategy, then outline our selection criteria, and finally assess these criteria based on the number of results.
3.1 Search Strategy

The strategy used in this review is to find as many relevant systems as possible and then narrow down the results by applying predefined selection criteria. In order to build a long list of CEP engines and BPM systems, we search for both of these terms and for their technical synonyms. We also search for survey papers, in case the literature offers other alternatives. Table 1 presents our search terms.

Table 1 - Search Terms

CEP engine
CEP engine implementation
CEP engine architecture
CEP engine historical data
CEP engine and BPM system integration
Survey of complex-event processing models
The next step was to define the search sources. In this report, we focus on practical systems rather than academic literature. Therefore, we used the general search engine Google, together with Google Scholar, instead of the academic databases that are usually recommended for systematic reviews.
3.2 Selection Criteria

This section shows how we narrowed down the long list in order to remove the systems that were not pertinent to the research goal. We used a number of inclusion criteria (IC) that define when we consider a found work relevant. The inclusion criteria are summarized in Table 2.

Table 2 - Inclusion and Exclusion criteria

Criteria Id   Criteria
IC1           Does the system have an executable CEP engine?
IC2           Does the system have an open source license?
IC3           Does the system have in-depth documentation?
IC4           Does the system support XML files as input?
IC5           Has the system been implemented in Java?
IC6           Does the system have an active developer community for communication?
IC7           Does the system handle historical data?
IC1 was the primary requirement. This criterion only admitted systems with an executable CEP engine. In other words, it excluded systems that offer, for example, only process designer functionality to model processes, or functionality to analyze the business, without any execution engine.
IC2 includes the systems that are freely available and can be extended by other developers. This requirement has one nuance: some systems offer both a community edition for developers and a commercial proprietary version for businesses. Although we are aware of possible bugs in open source systems, we prefer a system with a full open source license, because functionality is usually limited in community editions.

IC3 assesses the documentation of the system, which is a very important factor for further development.

IC4 considers the supported file format. We prefer a system that accepts XML input files, since our goal is to run a CEP engine that can directly read the XML files generated by the logistics partners in the project.

IC5 evaluates the programming language in which the system has been implemented. We prefer a Java-based system for further development.

IC6 is a very important criterion, particularly for open source systems. Since we expect some difficulties during the implementation phase, we prefer to be able to communicate with the core developers of the system in order to resolve bugs.

IC7 is a valuable criterion from a practical perspective. When combined with a historical database and a data management solution, CEP engines help users test strategies on historical and real-time data, so that they can determine whether their strategies will perform as predicted once deployed.
4 Results

After executing the search protocol and applying the first two criteria (IC1 and IC2), we found 8 systems that contain execution engines. For each CEP engine, we provide:
- its name
- the name of its developer
- the web site where it can be found
- its license
- which input format it supports
- which programming language it uses
- when it was last updated, or whether it is still under active development

These results are presented in Appendix A. Some well-known companies do not appear in this list, since their systems did not meet the first two inclusion criteria (IC1 and IC2).

Table 3 - Result Summary

Aspect           Open Source edition   JAVA-Based   Active
CEP Engine (8)   7                     7            6
Table 3 summarizes our results. All 8 systems that we found have a CEP engine, and one of them also supports Business Process Management. Only one of the 8 does not use Java. Seven of them can be freely downloaded and six of them are still under active development. Based on this analysis, we select the 4 engines that have all of the properties specified in Table 3.
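The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the engine records are hypothetical placeholders (they are not the survey data of Appendix A), and the property names `open_source`, `java_based` and `active` are names chosen here to mirror the columns of Table 3.

```python
def shortlist(engines, required=("open_source", "java_based", "active")):
    """Return the names of the engines for which every required property holds."""
    return [e["name"] for e in engines if all(e[prop] for prop in required)]


# Hypothetical placeholder records, NOT the survey data of Appendix A.
engines = [
    {"name": "engine-a", "open_source": True, "java_based": True, "active": True},
    {"name": "engine-b", "open_source": True, "java_based": False, "active": True},
    {"name": "engine-c", "open_source": False, "java_based": True, "active": True},
    {"name": "engine-d", "open_source": True, "java_based": True, "active": False},
]

selected = shortlist(engines)
```

An engine that misses a single property is dropped, which is why the shortlist can be shorter than any of the per-property counts in Table 3.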
5 Comparison of Engines

In this section, we review the advantages and limitations of the four selected systems in order to select the most appropriate candidate for the CEP engine component. The candidates are summarized in Table 4.

Table 4 - Final shortlist of candidates

Name | Web Site | License | Prog. Lang. | Service | Last update
Esper | http://www.espertech.com/products/ | Free download | JAVA | Online documentation & community | Active
Oracle CEP | http://www.oracle.com/technetwork/middleware/complex-event-processing/downloads/downloads-086608.html | Free download | JAVA | Online documentation & community | Active
WSO2 Complex Event Processor | http://wso2.com/products/complex-event-processor/ | Free download | JAVA | Online documentation | Active
SASE | http://sase.cs.umass.edu/ | Free download | JAVA | Documentation | Active
5.1 Esper

Goal: Esper combines event series processing and complex event processing in one self-contained solution. Its capabilities for event series filtering, continuous queries, aggregation and joins, together with its event pattern recognition, support rapid implementation of business-critical situation detection scenarios. They turn large volumes of disparate event series into actionable intelligence (Esper 2014).

Advantages: Esper is an open source component available under the GNU GPL v2 license (Esper 2014). Its open source nature helps in tailoring the event processing language and other community-driven features. Esper is an embeddable component written in Java and is therefore suitable for integration into any Java process. The component can run standalone in any development environment, which makes development and testing much easier, while in the target production environment it can be tailored to what is really needed or is already in place. Esper accepts XML files as an input format for events.

Limitations: The Esper event processing component cannot yet handle incorrect or missing data. Historical data must be preprocessed before Esper can handle it.
5.2 Oracle Complex Event Processing

Goal: Oracle Complex Event Processing, or Oracle CEP for short, is a low-latency, Java-based middleware framework for event-driven applications. It is a lightweight application server that connects to high-volume data feeds and has a complex event processing (CEP) engine to match events based on user-defined rules.
Advantages: Oracle CEP can deploy user Java code that contains the business logic. Running the business logic within Oracle CEP provides a highly tuned framework for time-driven and event-driven applications (Oracle CEP 2014).

Limitations: Oracle CEP is provided for evaluators under the OTN License Agreement.
5.3 WSO2 Complex Event Processor

Goal: WSO2 Complex Event Processor identifies the most meaningful events within the event cloud, analyzes their impact, and acts on them in real time. Built to be extremely high performing and massively scalable, it offers significant time savings and affordable acquisition. It is powered by WSO2 Siddhi (WSO2 CEP 2014).

Advantages: WSO2 Complex Event Processor (CEP) is a lightweight, easy-to-use, open source complex event processing server available under the Apache Software License v2.0 (WSO2 CEP 2014). It has a powerful and extensible query language for temporal event stream processing and supports a rich event model. WSO2 CEP is built on the award-winning WSO2 Carbon platform, which is based on the OSGi framework and thereby enables better modularity in a service-oriented architecture (SOA).

Limitations: WSO2 Complex Event Processor does not claim to be able to process incomplete data. More importantly, it cannot be embedded into other Java processes.
5.4 SASE

Goal: The SASE (Stream-based And Shared Event processing) research project is conducted by UC Berkeley and the University of Massachusetts Amherst. SASE aims to design and develop an efficient, robust RFID stream processing system that executes complex event queries over real-time streams of RFID readings encoded as events. It is designed to handle the data-information mismatch, incomplete and noisy data, and high data volume. In addition, it enables real-time tracking and monitoring.

Advantages: SASE provides the following features (SASE PROJECT 2014):
- A rich declarative event language
- Formal semantics of the event language
- Theoretical underpinnings of CEP
- An efficient automata-based implementation

Limitations: SASE is not well supported in terms of documentation, discussion and community. The released version of SASE is also quite old, and SASE has not been widely adopted by others.
5.5 Conclusion

Based on our analysis, we conclude that SASE, Oracle CEP and Esper fulfill our requirements. We selected Esper as the engine for further development, because it has an open source license that is not restricted to evaluation purposes, and because the Esper development team gives continuous and active support online.
6 Prototype

We implemented a preliminary prototype version of our data processing software that embeds the CEP engine, in order to gain insight into the integration of an executable Esper engine with other data processing components. This exercise helped us to see the real difficulties of this integration for processing transportation data.

The Esper engine has been developed to address the requirements of applications that analyze and react to events. What these applications have in common is the requirement to process events (or messages) in real time or near real time. The Esper engine was designed to make it easier to build and extend CEP applications. The difficulty in processing transportation data arose from the fact that, at the current stage, the transportation data are not real-time data but historical data. Hence real-time event processing alone would not help in our case. Instead, we converted the historical transportation data into an event stream as understood by Esper.

In the prototype, we considered three scenarios to process the transportation data.
- Scenario 1: select the interesting parameters from all the transportation parameters, so that we can filter the data and focus on the interesting parameters.
- Scenario 2: cut the long transportation data into instances (where one instance is defined as one trip of a truck to pick up or drop off something), so that we can look into instances in future tasks in the project.
- Scenario 3: match patterns in the transportation data, so that we can generate aggregated events based on the matched patterns in future steps.

The overall procedure is shown in Figure 1 and the current user interface is given in Figure 2. In the following subsections, we focus on these three scenarios and give the results as well.
[Figure 1: flowchart. Read in the transportation data in one pass; put the data into a stream; filter parameters, cut instances, or match a pattern; output the result.]

Figure 1. The procedure to integrate the Esper CEP engine with our data processing prototype. The procedure can easily be extended with other components in future steps.
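The conversion of historical records into an event stream can be illustrated with a small sketch. It is written in plain Python and does not use the Esper API; it only shows the idea of replaying stored rows in timestamp order as if they arrived one by one. The comma delimiter, the reduced column set, the ASCII hyphens and the day-month-year reading of the `datetime` column are assumptions made for this example, and `replay_as_stream` is a hypothetical helper name.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample in the spirit of Table 5 (only the columns used here).
SAMPLE_CSV = """truck,activity,datetime
4442,0,2-5-2013 14:35
4442,0,1-5-2013 0:24
4442,13_DR,2-5-2013 14:54
"""


def replay_as_stream(csv_text):
    """Yield historical rows one by one, oldest first, as if they arrived live."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        # Assumption: timestamps are written as day-month-year hour:minute.
        row["timestamp"] = datetime.strptime(row["datetime"], "%d-%m-%Y %H:%M")
    # sorted() is stable, so rows sharing a timestamp keep their file order.
    for row in sorted(rows, key=lambda r: r["timestamp"]):
        yield row


events = list(replay_as_stream(SAMPLE_CSV))
```

Each yielded row could then be handed to whatever component consumes the stream, for example a filter, an instance cutter or a pattern matcher.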
Figure 2. The user interface, which supports the three scenarios.
6.1 Scenario 1

The original input data to the data processing software is a CSV file. The table below gives an example of the contents of the file.

Table 5 - An example of the data input file groupedtrucks.csv.

truck  latitude   longitude  activity  datetime        activity-explanation
4442   5.187.730  456.853    0         1-5-2013 0:24   Basic record
4442   5.193.292  453.752    0         1-5-2013 0:29   Basic record
4442   5.194.012  446.693    0         1-5-2013 0:34   Basic record
4442   5.196.495  440.486    0         1-5-2013 0:39   Basic record
4442   5.202.170  436.611    0         1-5-2013 0:45   Basic record
4442   5.207.255  439.304    0         1-5-2013 0:50   Basic record
4442   5.206.676  438.073    0         2-5-2013 14:35  Basic record
4442   5.201.670  431.509    0         2-5-2013 14:40  Basic record
4442   5.201.014  429.140    51        2-5-2013 14:43  Start of peak RPM limit violation
4442   5.201.013  429.135    61        2-5-2013 14:43  End of peak RPM limit violation
4442   5.199.126  423.857    0         2-5-2013 14:49  Basic record
4442   5.199.697  423.795    21        2-5-2013 14:51  Task Accepted
4442   5.199.475  423.754    13_DR     2-5-2013 14:54  End of Drive
4442   5.199.475  423.754    10_1_LAD  2-5-2013 14:54  Start of Load
4442   5.199.475  423.754    23        2-5-2013 14:54  Task Busy
4442   5.199.466  423.775    0         2-5-2013 14:59  Basic record
4442   5.199.466  423.775    72        2-5-2013 14:59  Contact OFF
4442   5.199.466  423.775    71        2-5-2013 14:59  Contact ON
4442   5.199.466  423.775    0         2-5-2013 15:14  Basic record
When the user selects the interesting parameters in the user interface, shown in Figure 3, the CEP engine inside the data processing software filters the transportation data according to the chosen parameters. The output is shown in Figure 4.

Figure 3. The user can choose the parameters of interest from the list.

Figure 4. The console output for the filtered parameters: truck id, activity code, and timestamp.
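The filtering of Scenario 1 amounts to a projection: every event keeps only the chosen fields. The following plain Python sketch illustrates that idea and is not the prototype's Esper code; `select_parameters` is a hypothetical helper, and the two sample events reuse rows of Table 5 with ASCII hyphens in the dates.

```python
def select_parameters(events, chosen):
    """Keep only the chosen fields of every event (a projection)."""
    return [{name: event[name] for name in chosen} for event in events]


# Two rows taken from Table 5, represented as dictionaries.
events = [
    {"truck": "4442", "latitude": "5.201.014", "longitude": "429.140",
     "activity": "51", "datetime": "2-5-2013 14:43"},
    {"truck": "4442", "latitude": "5.199.475", "longitude": "423.754",
     "activity": "13_DR", "datetime": "2-5-2013 14:54"},
]

filtered = select_parameters(events, ["truck", "activity", "datetime"])
```

In an SQL-like event query language this would presumably correspond to listing the three fields in a select clause; the sketch keeps to ordinary dictionaries so that it can be run without any engine.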
6.2 Scenario 2

Given the chosen parameters and the filtered transportation data, we can focus on these interesting parameters and start to cut the continuous events into instances. Each instance contains events of a single truck id, and the time gap between any two consecutive events in it is at most 60 minutes. After the user clicks the cut instance button in the user interface, the console output is as shown in Figure 5. In the figure, the bold dashed red line marks the place where the CEP engine divides the continuous data into two or more instances. The event above the red line and the event below it are more than 60 minutes apart, although both belong to the same truck id, 4442.
Figure 5. Part of the console output after cutting the transportation data into instances.
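The cutting rule of Scenario 2 can be sketched as follows, assuming the events arrive in time order as (truck, activity, datetime) tuples. This is plain Python rather than the Esper statement used in the prototype; `cut_instances` and `parse` are hypothetical helper names, and the day-month-year date format is an assumption. The sample timestamps are taken from Table 5.

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(minutes=60)


def parse(ts):
    # Assumption: timestamps are written as day-month-year hour:minute.
    return datetime.strptime(ts, "%d-%m-%Y %H:%M")


def cut_instances(events, max_gap=MAX_GAP):
    """Split time-ordered (truck, activity, datetime) events into instances.

    A new instance starts when a truck is seen for the first time or when the
    gap to that truck's previous event is larger than max_gap.
    """
    instances = []
    last_seen = {}  # truck id -> (time of its latest event, its open instance)
    for truck, activity, ts in events:
        when = parse(ts)
        previous = last_seen.get(truck)
        if previous is None or when - previous[0] > max_gap:
            current = []
            instances.append(current)
        else:
            current = previous[1]
        current.append((truck, activity, ts))
        last_seen[truck] = (when, current)
    return instances


events = [
    ("4442", "0", "1-5-2013 0:45"),
    ("4442", "0", "1-5-2013 0:50"),
    ("4442", "0", "2-5-2013 14:35"),
    ("4442", "0", "2-5-2013 14:40"),
]
instances = cut_instances(events)
```

With these four events the cut falls between 1-5-2013 0:50 and 2-5-2013 14:35, which corresponds to the kind of boundary that the red line marks in Figure 5.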
6.3 Scenario 3

In the transportation data, we have found several patterns of events that explain more clearly what a truck or driver is doing. For example, we noticed that the activity of stopping driving (activity=13_DR) sometimes happens before the activity of starting loading (activity=10_1_LAD). This event pattern tells us that the driver stops the truck and starts to load goods. The pattern matching result is shown in Figure 6. The pattern is matched twice. In the first match, stopping driving (activity=13_DR) happens right before starting loading (activity=10_1_LAD). In the second match, stopping driving happens some time before starting loading, which tells us that other activities may occur between the two. Based on this first experience with pattern matching, we expect that the chosen engine can easily apply other event patterns in future stages.
Figure 6. The console output of matching the pattern activity=13_DR happening before activity=10_1_LAD.
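One possible reading of the "happens before" pattern of Scenario 3 is sketched below in plain Python. It is not the Esper pattern statement used in the prototype, and `match_followed_by` is a hypothetical helper. Each 13_DR event is paired with the next 10_1_LAD event that follows it, whether or not other activities occur in between. The activity sequence is a hypothetical example for one truck, using codes that appear in Table 5.

```python
def match_followed_by(activities, first="13_DR", then="10_1_LAD"):
    """Return (i, j) index pairs where `first` at i is followed by `then` at j."""
    matches = []
    pending = []  # positions of `first` that still wait for a `then`
    for index, activity in enumerate(activities):
        if activity == first:
            pending.append(index)
        elif activity == then:
            matches.extend((start, index) for start in pending)
            pending.clear()
    return matches


# Hypothetical activity codes of one truck, in time order.
activities = ["21", "13_DR", "10_1_LAD", "23", "0", "13_DR", "0", "10_1_LAD"]
matches = match_followed_by(activities)
```

Here the first pair is adjacent, while the second pair has one other activity between the two events, mirroring the two kinds of match described for Figure 6.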
In summary, the integrated Esper CEP engine provides the following features:
- Filtering the transportation data according to the parameters chosen through the user interface;
- Cutting continuous transportation data into instances;
- Matching any event pattern of interest;
- Being extensible to more scenarios.
7 Conclusion

In this report, we have reviewed complex event processing engines from a practical point of view. Based on a search protocol, we found eight engines in total. Four engines were compared in more detail, for example on whether they are implemented in Java and whether they are open source. Among these engines, only one combines an embeddable component written in Java, which makes it suitable for integration into any Java process, with an open source license, sound documentation and good support. Therefore we have chosen this CEP engine, Esper, as our final candidate. In addition, we have built a prototype of our data processing software with the Esper engine integrated. The event processing capability of the Esper engine has been demonstrated in three scenarios.
References

Chandrasekaran, S., Cooper, O., Deshpande, A., Franklin, M. J., Hellerstein, J. M., Hong, W., Krishnamurthy, S., Madden, S. R., Reiss, F. & Shah, M. A. (2003). TelegraphCQ: continuous dataflow processing. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, pp. 668-668. ACM.

Esper. (2014). Esper [Online]. Available: http://www.espertech.com/products/. Accessed 2014-06-19.

Kitchenham, B., Pearl Brereton, O., Budgen, D., Turner, M., Bailey, J. & Linkman, S. (2009). Systematic literature reviews in software engineering – a systematic literature review. Information and Software Technology, 51, 7-15.

Luckham, D. C. & Frasca, B. (1998). Complex event processing in distributed systems. Technical Report CSL-TR-98-754. URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.876.

Oracle CEP. (2014). Oracle Complex Event Processing [Online]. Available: http://www.oracle.com/technetwork/middleware/complex-event-processing/downloads/downloads-086608.html. Accessed 2014-06-09.

SASE PROJECT. (2014). SASE [Online]. Available: http://avid.cs.umass.edu/sase/index.php?page=home. Accessed 2014-06-10.

WSO2 CEP. (2014). WSO2 Complex Event Processor [Online]. Available: http://wso2.com/products/complex-event-processor/. Accessed 2014-06-10.
Appendix A

Name | Developer | Web Site | License | Prog. Lang. | Service | Last update
Esper | EsperTech | http://www.espertech.com/products/ | Free download | JAVA | Online documentation & community | Active
Oracle CEP | Oracle | http://www.oracle.com/technetwork/middleware/complex-event-processing/downloads/downloads-086608.html | Free download | JAVA | Online documentation & community | Active
Triceps | Sergey Babkin | http://triceps.sourceforge.net/ | Free download | C++ and Perl | No official documentation | 23/06/2013
WSO2 Complex Event Processor | WSO2 | http://wso2.com/products/complex-event-processor/ | Free download | JAVA | Online documentation | Active
Siddhi | University of Moratuwa | http://siddhi.sourceforge.net/ | Apache License | JAVA | | 10/07/2012
SASE | University of Massachusetts | http://sase.cs.umass.edu/ | Free download | JAVA | Documentation | 03/2011
StreamBase CEP | StreamBase Systems | http://www.streambase.com/products/streambasecep/?doing_wp_cron=1401971625.5564069747924804687500 | Trial kit, restricted download | JAVA | Online documentation | Active
streamdrill | streamdrill | https://streamdrill.com/ | Demo, free download | JAVA | Online documentation | Active