SmartSociety Hybrid and Diversity-Aware Collective Adaptive Systems When People Meet Machines to Build a Smarter Society Grant Agreement No. 600584
Deliverable D6.2 Working Package WP6
Static social orchestration: implementation and evaluation
Dissemination Level (Confidentiality): PU (Public)
Delivery Date in Annex I: 31/12/2014
Actual Delivery Date: 18/12/2014
Status: F (Final)
Total Number of pages: 47
Keywords: compositionality, social orchestration, abstract architecture
Dissemination levels: PU: Public; RE: Restricted to Group; PP: Restricted to Programme; CO: Consortium Confidential as specified in the Grant Agreement.
Status codes: F: Final; D: Draft; RD: Revised Draft.
© SmartSociety Consortium 2013-2017
http://www.smart-society-project.eu
Disclaimer
This document contains material, which is the copyright of SmartSociety Consortium parties, and no copying or distributing, in any form or by any means, is allowed without the prior written agreement of the owner of the property rights. The commercial use of any information contained in this document may require a license from the proprietor of that information. Neither the SmartSociety Consortium as a whole, nor a certain party of the SmartSociety Consortium warrant that the information contained in this document is suitable for use, nor that the use of the information is free from risk, and accepts no liability for loss or damage suffered by any person using this information. This document reflects only the authors’ view. The European Community is not liable for any use that may be made of the information contained herein.
Full project title: SmartSociety: Hybrid and Diversity-Aware Collective Adaptive Systems: When People Meet Machines to Build a Smarter Society
Project Acronym: SmartSociety
Grant Agreement Number: 600854
Number and title of workpackage: WP6 Compositionality and Social Orchestration
Document title: Static social orchestration: implementation and evaluation
Work-package leader: Michael Rovatsos, UEDIN
Deliverable owner: Michael Rovatsos, UEDIN
Quality Assessor: XXX, DFKI
List of Contributors
Partner Acronym    Contributor
UEDIN              Dimitrios I. Diochnos
UEDIN              Michael Rovatsos
TUW                Ognjen Šćekić
UOXF               Kevin Page
UEDIN              Pavlos Andreadis
Executive Summary
This document summarises the work performed in WP6 of the SmartSociety project during the second year of the project toward achieving a first implementation and evaluation of a static social orchestration architecture based on the description in deliverable D6.1. We describe the design and implementation of a SmartSociety orchestrator peer that follows a specific model of orchestration but is generic in terms of the domain in which it might be deployed. A preliminary experimental evaluation shows that our design is promising in terms of scalability and robustness. Our report includes extensive detail on the actual implementation, including a full description of the orchestration API.
Table of Contents
1 Introduction
2 Orchestration Architecture
2.1 The SmartSociety Event Loop
2.2 Models, Resources, Interactions, and Versioning
2.2.1 Versioning
2.3 Orchestration Queues
2.4 Platform Jobs
2.4.1 Job Profiling
2.4.2 IDs, Enqueueing Platform Jobs and Activity Token Update
2.5 Provenance by Design
2.6 Priorities, Semaphores and Synchronisation
2.6.1 Semaphores
2.6.2 Lifecycle of Orchestration Resources
2.6.3 Synchronisation Example
2.7 Scalability
3 Orchestration Workflows
3.1 Common Platform Jobs across Workflows
3.2 Asynchronous Read-Only Operations
3.3 Creating Task Requests and Initiating Composition
3.4 Composition
3.5 Negotiation
3.6 Deletion
3.7 Execution
3.8 A Default Smart Society Orchestrator
4 Evaluation
5 Conclusion
A Further Operations
A.1 Administrative Monitoring Operations
A.2 Tunnels
A.3 Delegated Services
B Orchestration Patterns
B.1 Composition Patterns
B.2 Deletion Patterns
B.2.1 Deletion in Full-Negotiation Scenarios
B.2.2 Deletion in Crowdsourcing Scenarios
B.3 Negotiation Patterns
B.4 Execution Patterns
C Orchestration API
C.1 Application Orchestration
C.1.1 Task Requests
C.1.2 Tasks
C.1.3 Task Records
C.2 Composition Manager
C.3 Negotiation Manager
C.4 Deletion Manager
C.5 Execution Manager
C.6 Further Remarks and Functionality
C.6.1 Administration and Monitoring
C.7 The Current Setup in the SmartShare prototype
1 Introduction
This report summarises the outcomes of the WP6 work on the implementation and evaluation of a static social orchestration architecture in year 2 of the project. This work is a continuation of the more conceptual and theoretical investigation into such architectures performed in year 1, and documented in deliverable D6.1. The main objective of both efforts was to develop methods that allow a lightweight composition of collaboration processes in HDA-CAS, which will provide the “scaffolding” which connects component methods delivered by WPs 2, 3, 4, and 5 at a conceptual level, and feeds into the systems and software engineering work of WPs 7 and 8 by providing solid conceptual and algorithmic underpinnings for the software implementations and programming methods developed in those work packages. Before proceeding with the exposition of our new research results, it is worth recalling the main elements of our abstract model introduced in D6.1 and their motivation. There, we started from a purely abstract model of social computation, where different human and machine computations are performed in a network of interconnected peers in order to provide an overall systems functionality. Based on this very general model, we then proceeded to identify a specific architecture that identifies a number of more concrete computations that can be used to orchestrate such a social computation, i.e. peer discovery, task assignment, task execution and feedback. We introduced a mathematical signature for this orchestration model, and linked it to the decision-making situation of the autonomous peers involved in the computation, thus relating it to preferences, incentives, and motivations. This provided the blueprint for the orchestration architecture we would realise in subsequent work, and indeed provide in the present report. 
The second major contribution of the first phase of WP6 research was the development of design principles for social orchestration systems that were intended to maximise scalability and robustness. These principles, summarised in what we called the “play by data” architecture, provide requirements for orchestration architectures that enable compositionality in large-scale systems at the level of systems interoperability, communication, and asynchronous, distributed processing.

In this report, we describe the design of a SmartSociety orchestration manager (or orchestrator, for short), which can be viewed both as a (machine) peer that is in charge of organising social computations using the conceptual model of an orchestration architecture provided in our abstract specification, and as a platform that enables organising collaboration among other peers (including task peers that contribute to the execution of tasks, and manager peers that contribute functionality to the orchestration process itself) by managing the process and data flow between them. This “dual” nature of the orchestrator is important: On the one hand, we view it as an “intelligent software agent” that has to make decisions regarding its own activities, e.g. which communication requests from others to serve at what time, which internal and external sub-procedures to trigger under what circumstances, what control structure to apply over them, and how to manage internal and external computational resources. This provides us with ways of producing adaptive and proactive social orchestration methods in the future (as required for the future tasks of WP6) whose flexibility and range of behaviours exceed that of traditional workflow
orchestration systems. On the other hand, by considering it from a “platform-as-a-peer” perspective, it provides an initial design for a “playnode” in the Play-By-Data sense that contains an integrated programming interface and communication facility for completing the whole orchestration cycle and producing complex collaborative tasks. This means that our orchestrator can be used as a platform that delivers, essentially, the functionality of an entire SmartSociety application through a single Web server which user peers can interact with as “thin” clients (i.e. they can be very minimalistic playnodes that do not offer any services themselves), and which integrates other manager agents (at the moment the peer manager, reputation manager and provenance service as external services, and the negotiation manager and composition manager as internal “sub-orchestration” peers) as simple Web services. This opens up avenues for lightweight hierarchical and parallel composition of social computations, as we could deploy an arbitrary number of instances of our orchestrators (each with their own different local implementations, based on the general design described in this document) which could, in turn, collaborate on higher-level computation tasks, orchestrated by higher-level orchestrators, etc, within a single architecture. In a sense, this results in a “self-similar” design of the overall SmartSociety systems architecture, which supports scale-free composition of social computations in HDA-CAS. It is worth expanding a bit more on this issue, as it is important for understanding the notion of “architecture” in SmartSociety. 
In the context of this document (and, in fact, the whole work package), the term (more specifically, orchestration architecture) refers to the data and process model implemented by our orchestration peer, and the shared communication infrastructure assumed for interactions with, within, and between this peer and other peers required for its functionality. In the broader context of the overall project, it refers to the software framework developed by WP8, which enables the integration of various components from individual technical work packages, and over which WP7 will develop programming abstractions. In this context, the orchestration architecture will be just one component provided by WP6 to the overall framework. Equally fundamental is the notion of “task”, which is closely related to the multilevel compositionality alluded to above. In the context of the orchestration architecture, this is something much more specific than a “general purposeful activity that potentially involves actions contributed by various peers”; it is the central object that represents a collaborative computation and which is explicitly represented as such in an orchestrator. This means that while there may be other activities in the system like internal operations of the orchestrator (which we call platform jobs below, e.g. starting a negotiation process) and third-party services used for orchestrations (e.g. a reputation service), these are not explicitly requested, negotiated, agreed to, or tracked during execution in the same way as a (domain) task, but instead are needed to provide the basic functionality of orchestration. This is not to say at the next level “up” they couldn’t be explicated as tasks themselves (in a reified domain of “orchestration”), but any multi-level hierarchical composition of tasks needs to be “grounded” in a level-0 orchestration backend. 
It is this backend functionality that is provided by the orchestrator peer design presented in this document, which is domain-independent but follows our previously defined single model of orchestration. Some aspects of this design could be reused, in principle, to implement any other orchestration model: At the general processing level, our orchestrator reactively iterates over an asynchronous event processing loop, and serves “jobs” maintained on dynamically adapting process queues. In terms of communication, all interaction is via linked data resources using only basic HTTP operations. Provenance tracking, authentication, and access control are enabled for any of these interaction processes, regardless of whether they involve user peers or third-party services. What is specific to our orchestration model are the workflows used by the orchestrator, and the decision procedure that determines how processing queues are managed.

In collaboration with TUW and UH, we have adapted this backend functionality so that the overall design of orchestration and the orchestration API can better integrate with other platform components. This modified design, which is in line with the formal model of WP2, is general enough to cover the two extremes of the social computation spectrum that we can see at the moment: that of full negotiation scenarios (as in ridesharing) and that of crowdsourcing scenarios (as in Ask Smart Society!). Further, with the same partners, we have implemented orchestration primitives using the SmartCom communication middleware that is described in D7.1.

It is important to point out that what is not provided in this document is a “workflow modelling language”, as we focus much more on the implementation and evaluation of the orchestration process than on the formal modelling of the process. The formal computational models for these processes will be provided by WP2, and an early version that is fully aligned with our presentation here is given in milestone MS4.
This alignment has taken place during the development of the SmartShare ridesharing prototype (of which our orchestrator design is a generalisation, and which we also use for our preliminary evaluation presented below), and knowledge of the formal model of orchestration presented there is necessary to understand the material presented in this document. It is also worth noting that task execution is not managed in detail yet in the orchestration stage. This is simply due to the fact that, at the time of writing, more fine-grained integration with functionality provided by WP3 and WP5 has not taken place yet (though in both cases, arguably some of that functionality concerns more the user interface than the orchestration procedure; but see milestone MS14 for further discussion of how this information could be useful for future adaptive orchestration). Thus, some of the workflows described below serve only as “hooks” for adding this functionality later and have only been realised at a rudimentary level so far. Last but not least, we also investigated alternative collaboration models appropriate for complex tasks involving large-scale collectives of human-based peers.

The remainder of this report is structured as follows: In section 2, we introduce the overall orchestration architecture design in terms of the processing loop it involves, the resources it maintains, and the basic data structures and mechanisms it uses for concurrent processing and synchronisation. Section 3 describes the workflows used in our orchestration model, how they are managed and synchronised, and includes a “default” design for a simple orchestrator logic that determines how different sub-tasks are dynamically prioritised based on the current system state by the orchestrating peer. In section 4, we present preliminary evaluation results obtained in the ridesharing domain through simulation experiments, which mainly assess the scalability and robustness of our platform and have produced very promising initial results.

Three appendices are included at the end of this report that provide more implementation-level detail on our current social orchestration prototype. In appendix A, we cover auxiliary functionality not described in the main body of the report. Basic patterns for the various stages of orchestration are described in appendix B. Appendix C, finally, provides a specification of the overall orchestration API, and a brief description of the components integrated in our implemented SmartShare orchestrator (which has been tested and evaluated in the ridesharing domain).
2 Orchestration Architecture
We start by giving an overview of the orchestration architecture. Before we delve into the specifics of orchestration, we briefly summarise the interactions of peers with the platform. The orchestration service is exposed to the rest of the world through specific URIs; see Appendix C for the API. Every incoming HTTP request is translated into a platform job. Each job is loaded onto the appropriate orchestration queue. This is the starting point for the execution of the appropriate workflow so that the job can be served. The orchestrator may have many jobs that are waiting to be served in its queues. Hence, orchestration relies on monitoring those queues and deciding from which queue to pick a platform job. Upon selecting a job from a queue, either orchestration-related computation needs to take place, or the job is delegated to an appropriate service for execution. In either case, the results of the computation are returned to the orchestrator and, depending on the workflow of the service, a new job may be loaded into one of the orchestration queues of the application that is providing the service. Eventually, the appropriate step of the workflow is completed, and the peer/user that posted the original HTTP request to the platform receives the response from the platform.
2.1 The SmartSociety Event Loop
Orchestrating a SmartSociety application happens in an asynchronous event-driven framework so that many connections can be handled concurrently. Different orchestration queues are associated with different events that can occur in a SmartSociety application. These queues are populated by platform jobs, which carry all the information that is needed for execution as well as for tracing provenance. Unless stated otherwise, the queues and the events (functions) where the computation actually takes place have the same name. Hence, there is an event loop that drives the behaviour of the orchestrator; see Figure 1. We note that, in principle, at the end of processing of every platform job, the orchestration manager (or orchestrator, for short) may enqueue more than one platform job for further processing. Further, some of these jobs might be delegated to other components of the platform where the actual job processing is expected to take place. In particular, there are four main sub-components that are used by the orchestrator: the composition manager, the negotiation manager, the deletion manager, and the execution manager; see Figure 2. We will refer to these managers as sub-orchestration managers.
Figure 1: The event loop of a SmartSociety application (the orchestrator fetches a job from the orchestration queues and processes it).

Figure 2: The orchestrator and the sub-orchestration managers of a SmartSociety application (composition manager, negotiation manager, deletion manager, and execution manager).
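The fetch-and-process cycle of the event loop can be illustrated with a minimal sketch. It is not the project's implementation: the two queues, the handler bodies, the fixed priority order in `selectQueue`, and the helper names `step` and `runUntilIdle` are all assumptions made for illustration. The sketch runs synchronously so that it is easy to follow, whereas a real orchestrator would schedule each step asynchronously.

```javascript
// Hypothetical queues; a real orchestrator has many more (see Section 2.3).
const queues = { authentication: [], negotiation: [] };
const served = [];  // record of served jobs, for illustration only

// Each handler processes one job and returns the jobs it triggers (possibly none).
const handlers = {
  authentication: function (job) {
    // Assumed workflow: an authenticated request triggers a negotiation job.
    return [{ kind: "negotiation", purpose: job.purpose, triggeredBy: job }];
  },
  negotiation: function (job) {
    return [];  // end of this toy workflow
  }
};

// Decision procedure (an assumption): serve the first non-empty queue
// in a fixed priority order.
function selectQueue() {
  return ["authentication", "negotiation"].find(function (kind) {
    return queues[kind].length > 0;
  });
}

// One iteration of the loop: fetch a job, process it, enqueue triggered jobs.
// Returns false when all queues are empty.
function step() {
  const kind = selectQueue();
  if (kind === undefined) { return false; }
  const job = queues[kind].shift();
  const triggered = handlers[kind](job);
  served.push(kind + ":" + job.purpose);
  triggered.forEach(function (next) { queues[next.kind].push(next); });
  return true;
}

// Run synchronously until idle; a real server would instead schedule each
// step asynchronously so that new requests can arrive in between.
function runUntilIdle() {
  while (step()) { /* keep serving */ }
}

queues.authentication.push({ kind: "authentication", purpose: "negotiateOnATask", triggeredBy: null });
runUntilIdle();
console.log(served);
```

Keeping the queue selection in its own function mirrors the point made above: the loop itself is generic, while the decision of which queue to serve next is where an orchestration model can differ.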
2.2 Models, Resources, Interactions, and Versioning
Below we describe the models and the resources that are used for orchestration. The documents in the different collections serve different purposes. We now outline how these resources are used and how they are affected by the different interactions that the system allows.

Exposed Resources. The exposed resources contain the documents that are needed for the interactions of peers/clients with a SmartSociety application. These collections are task requests, tasks, and task records. Figure 3 shows these resources. The exposed resources provide the core functionality of SmartSociety applications; please refer to deliverable D2.2 and milestone MS4 for details on the definitions of these notions. At a high level, and without loss of generality, task requests can be seen as expected goals for collaborative work among different peers. Composition blends these task requests and gives rise to specific tasks where some of the details of the goal states may have been altered so that they are actually feasible goal states at the level of the collective that is associated with them (interpreted as a team of participants in our current orchestration model). Thus, negotiation takes place at the level of a task, and the actual execution of the task is recorded on the relevant task record that has been generated by the system once the agreement has been reached. As a side note, changing one’s preferences happens either at the level of the peer/user profile (and thus the peer manager is responsible for storing such a change), or at the level of task requests. Further, the resources are designed in such a way that task requests have single owners, while tasks and task records can be accessed only by the members of the collective that appears at the level of the tasks or task records.

Figure 3: Resources exposed by the orchestrator (task requests, tasks, and task records).

Complementary Resources. The complementary resources contain additional information regarding the exposed resources. Figure 4 gives an overview.
Figure 4: The complementary resources (task request complements, task complements, and task record complements), which hold additional information that might be needed by the orchestrator or the sub-orchestration managers.

Following our approach outlined in deliverable D6.1, SmartSociety aims to provide lightweight applications that follow current standards in Web programming. This implies that we want the exposed resources to be able to provide all the relevant information that is needed to design a client-side logic. However, on the server side, it is often the case that we need to store additional satellite information for the various documents that are exposed to the clients. For example, the composition manager may want to have handy information related to the composition that has been attempted in the past for a specific task request so that in the future, further composition of that task request with other task requests can be sped up. This idea and functionality extends to all the exposed resources by allowing a complement to be stored for each one of them.

Auditing Resources. The auditing resources contain information related to the operations that were performed by the sub-orchestration managers for the jobs that were delegated to them. Figure 5 gives an overview of these.
Figure 5: The resources that are created by the different sub-orchestration managers (compositions, negotiations, deletions, and executions). The information contained in them is for auditing and provenance purposes.

These resources, one per sub-orchestration manager, are generated so that we can have a full trace of the outcome of the various computations that take place on the different sub-orchestration managers, i.e. they can be seen as summaries of these computations. A property prov on these documents contains all the relevant information that is needed in order to track provenance of the executed processes and their outcomes.

Housekeeping Resources. The housekeeping resources contain information necessary for the functionality of the orchestration managers as well as for profiling platform jobs that are executed on them. Figure 6 gives an overview.
Figure 6: Resources that are needed for various housekeeping jobs of the orchestrator as well as of the sub-orchestration managers (queues, semaphores, activity tokens, and job profiles).

The largest collection in this set is the collection of job profiles. Documents in the job profiles collection contain information regarding the execution of the various jobs; a causal relationship of the workflow that has been executed up to that particular platform job is given together with timings (CPU time, I/O time, queueing time, etc.) for all the events in the workflow that led to that particular job. Such documents can be very useful for administrative purposes such as monitoring the behaviour and the responsiveness of the system, and can ultimately be used for adaptive load-balancing mechanisms that can be implemented on the orchestrator.

Regarding the other resources that fall under this group, only the activity tokens actually persist. The idea is that the various events that are processed by the SmartSociety application are given unique IDs and it is these tokens that store the enumeration of the various activities that take place on the system. Thus, these tokens are useful when a SmartSociety application is brought back online, so that the enumeration can continue at the point where it was taken offline. Such a collection has one document per SmartSociety application, where the various activity tokens are stored in a single document. Finally, for queues and semaphores all that one really needs is a model of what is supported by each orchestrator. We will provide further information on these models below. We have one model for each case (queues and semaphores) per (sub-)orchestration manager.

2.2.1 Versioning
In some situations it is necessary for the system to store the complete history of a document throughout time. For this purpose a property revision appears in such documents, indicating which version of the document one is looking at. By default, all the collections in the exposed resources have equivalent collections with the versioned history of each document. Thus, for example, in the tasks collection, we can find the latest version of the tasks in the system, while in the versioned tasks collection we can find all the different versions of the tasks of the system. At present we keep all the versions of the documents that are found in the collections for complementary resources. Such versions allow enhanced auditing mechanisms and perhaps can prove useful for recording further provenance information in the future. The documents in the collections for auditing resources are never versioned, since they are created once and never modified by the appropriate sub-orchestration managers. This is also the case with job profiles since we have one document per activity that has been executed by the system. Finally, we do not version activity tokens.
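The relationship between a collection and its versioned counterpart can be sketched as follows. This is only an illustration under stated assumptions: the collections are plain in-memory structures rather than a database, the helper names `saveTask` and `history` are hypothetical, and starting the revision count at 1 is a choice made here, not something the report specifies. Only the property name revision comes from the text above.

```javascript
// Latest version of each task, keyed by task id (stands in for the tasks collection).
const tasks = new Map();
// Every version ever stored (stands in for the versioned tasks collection).
const versionedTasks = [];

// Store a new version of a task and bump its revision.
function saveTask(id, fields) {
  const previous = tasks.get(id);
  const revision = previous ? previous.revision + 1 : 1;  // assumed to start at 1
  const doc = Object.assign({}, fields, { id: id, revision: revision });
  tasks.set(id, doc);
  // Push a shallow copy so that later edits to `doc` do not rewrite history.
  versionedTasks.push(Object.assign({}, doc));
  return doc;
}

// All stored versions of one task, oldest first.
function history(id) {
  return versionedTasks.filter(function (doc) { return doc.id === id; });
}

saveTask("task-1", { state: "proposed" });
saveTask("task-1", { state: "agreed" });
console.log(tasks.get("task-1").revision, history("task-1").length);  // prints: 2 2
```

Auditing resources and job profiles would not go through such a function, since they are written once and never modified.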
2.3 Orchestration Queues
The various application queues for orchestration can be divided into the following categories depending on the jobs that need to be served:
1. administration and monitoring, which are, to a large extent, complementary to the housekeeping queues (see below),
2. authentication and privacy,
3. delegated services provided by other peers, e.g. the reputation service,
4. read-only operations,
5. new task requests (see D2.2),
6. validation,
7. negotiation,
8. deletion,
9. execution,
10. composition, where the task requests are used to produce tasks that will be negotiated and eventually executed by human or machine peers,
11. changing preferences,
12. consequences of special services, and
13. housekeeping queues.
The above categories of queues naturally imply templates for provenance reasons, i.e. one template for each category. However, the orchestrator actually deals with a finer level of granularity regarding queues. For example, we have already described two distinct operations that fall naturally under the housekeeping queues which are (i) update activity tokens and (ii) compute and store platform job profiles. We will comment more on this as we proceed with our description.
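The distinction between queue categories and the finer-grained queues the orchestrator actually serves can be sketched as a small model. Everything named here is hypothetical: `queueModel`, `createQueues`, `categoryOf`, the application name, and the camel-case queue names are illustrative assumptions; only the two housekeeping operations are taken from the text above.

```javascript
// Hypothetical model: each category lists the concrete queues that fall under it.
const queueModel = {
  housekeeping: ["updateActivityTokens", "computeJobProfiles"],
  authentication: ["authentication"],
  negotiation: ["negotiation"]
};

// Build the empty queues of one application from the model.
function createQueues(app, model) {
  const queues = {};
  queues[app] = {};
  Object.keys(model).forEach(function (category) {
    model[category].forEach(function (name) { queues[app][name] = []; });
  });
  return queues;
}

// Find the category of a concrete queue, e.g. to select a provenance template.
function categoryOf(model, queueName) {
  return Object.keys(model).find(function (category) {
    return model[category].includes(queueName);
  });
}

const queues = createQueues("rideshare", queueModel);
console.log(Object.keys(queues.rideshare));
console.log(categoryOf(queueModel, "computeJobProfiles"));  // prints: housekeeping
```

With such a mapping, one provenance template per category can be looked up from the name of whichever queue a job was served on.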
2.4 Platform Jobs
Platform jobs are the elements that are stored in the orchestration queues. Figure 7 gives an example of the translation that the platform performs for a request to an exposed URL that allows negotiation on a specific task.

    api.put("/applications/:app/tasks/:taskIndex", function (request, response) {
        // Route parameters; assumed here to be exposed on request.params.
        var app = request.params.app;
        var taskIndex = request.params.taskIndex;
        var newJob = {
            "application"        : app,
            "kind"               : "authentication",
            // Current activity token of the queue; indexed per application,
            // consistent with the increment further below.
            "id"                 : activityTokens[app].authentication,
            "purpose"            : "negotiateOnATask",
            "inputAdapter"       : request,
            "outputAdapter"      : response,
            "user"               : request.remoteUser.username,
            "credentials"        : getCredentials(request),
            "ip"                 : getIP(request),
            "taskRequestIndex"   : null,
            "taskRequestVersion" : null,
            "taskIndex"          : taskIndex,
            "taskVersion"        : getTaskVersion(request),
            "query"              : null,
            "createdOn"          : new Date(),
            "enqueued"           : new Date(),
            "triggeredBy"        : null   // no parent: created directly by a request
        };
        queues[app][newJob.kind].push(newJob);
        activityTokens[app][newJob.kind]++;
        requestActivityTokensUpdateWhenLoadIsLow(function () { orchestrator(); });
    });
Figure 7: Translating an HTTP request to a platform job.

In the simple scenario shown in Figure 7, an HTTP request is translated into a job for the platform and is loaded onto the appropriate queue (here, authentication). The activity tokens that are associated with each queue guarantee that we can identify the jobs uniquely (see Section 2.2) and thus have the ability to trace the entire history of a job. Finally, the orchestrator is called since a job is waiting to be served in one of its queues. In principle, platform jobs give rise to additional platform jobs in the system. Hence, any platform job can be seen as a node in a singly-linked list where the child-parent relationship is given by the triggeredBy property of each platform job; see Figure 7.

Triggered Jobs. We will use the term triggered (platform) job extensively to explain causal relationships in the workflows between different processes. A triggered job is not necessarily served immediately. Rather, it is, in the general case, created and enqueued in the appropriate queue. It is always the responsibility of the orchestrator to select which platform job to serve next, and therefore a call to the orchestrator is made at the end of every process.
Job Profiling
At the end of the execution of every job we can perform job profiling to gather useful information about the job that has just been served by the SmartSociety platform. For example, we can track error messages sent to a client, or we can identify IPs that need to be blacklisted for security purposes. Mostly, however, job profiles will be used to measure processing time for load-balancing purposes, to identify bottlenecks, and for further system optimisation. Figure 8 gives an example of the information that is stored in the job profiles database maintained by the system.
2.4.2
IDs, Enqueueing Platform Jobs and Activity Token Update
Every platform job that is executed by a SmartSociety application has a compound key by which it can be identified uniquely in time. This key is formed by the properties application, kind and id of the platform job. For every platform job profile, the ID is a string that concatenates the individual values of the compound key of the platform job being profiled (that is, the last platform job that appears in the workflow of the profile); see, for example, the property _id in the JSON document shown in Figure 8.

    {
      "_id": "agreeOnTask-0",
      "kind": "agreeOnTask",
      "platform_id": 0,
      "success": true,
      "error": 200,
      "errorMessage": "",
      "user": "SmartAgent",
      "ip": "abc.def.ghi.jkl",
      "createdOn": 1405099139505,
      "time": { "other": 3, "profile": 0, "i/o": 41, "cpu": 24, "auth": 396, "queued": 3, "duration": 467 },
      "workflow": [
        { "_id": "53c01c83927b668c1b94e94c", "other": 0, "profile": 0, "i/o": 0, "cpu": 396, "queued": 1, "duration": 397, "platform_id": 58, "kind": "authentication" },
        { "_id": "53c01c83927b668c1b94e94b", "other": 0, "profile": 0, "i/o": 0, "cpu": 11, "queued": 1, "duration": 12, "platform_id": 0, "kind": "validateTaskDocument" },
        { "_id": "53c01c83927b668c1b94e94a", "other": 0, "profile": 0, "i/o": 0, "cpu": 3, "queued": 1, "duration": 4, "platform_id": 0, "kind": "validateTransition" },
        { "_id": "53c01c83927b668c1b94e949", "other": 0, "profile": 0, "i/o": 41, "cpu": 10, "queued": 0, "duration": 51, "platform_id": 0, "kind": "agreeOnTask" }
      ]
    }

Figure 8: Example of a job profile, for a job in which a user agrees on a task that has been proposed by the system. Note that this provides a complete breakdown and profiling of the workflow that is followed until the agreement job is completed.

We will now revisit Figure 7 to look at its last three commands, which are repeated below.

    queues[app][newJob.kind].push(newJob);
    activityTokens[app][newJob.kind]++;
    requestActivityTokensUpdateWhenLoadIsLow(function () { orchestrator(); });

In this example, the first command pushes the new platform job onto the appropriate queue for orchestration. Since every process that runs on a SmartSociety application has a unique ID, the second command increments the relevant activity token (an integer) for the platform job that was just enqueued. The last call looks at the update activity tokens queue and, if that queue is empty, creates a platform job indicating that the activity tokens of the system have changed and need to be persisted when the load is low. In other words, the update activity tokens queue has size at most 1 at any point in time. Regardless of whether or not such a platform job is loaded onto the update activity tokens queue, the application orchestrator (the function orchestrator) is called at the end through a callback. Thus, in the next step, the orchestrator decides which queue to dequeue a platform job from and performs the necessary computation. This is how profiling the platform jobs becomes a crucial operation for the orchestrator and how load balancing enters the picture: the orchestrator acts as an online algorithm that has to perform job prioritisation and adapt to the load of the system.
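The enqueueing pattern described in this section can be illustrated with a minimal, self-contained sketch. The in-memory layout of `queues` and `activityTokens`, the helper `enqueue`, the application name `demo`, the queue name `updateActivityTokens`, and the extra `app` argument are all assumptions made for illustration; only the behaviour (one token increment per enqueued job, and at most one pending update request) is taken from the text.

```javascript
// Hypothetical in-memory state for a single application called "demo".
const queues = { demo: { authentication: [], updateActivityTokens: [] } };
const activityTokens = { demo: { authentication: 0 } };

// Push a job onto its queue and advance the activity token of that queue.
function enqueue(app, job) {
  job.application = app;
  job.id = activityTokens[app][job.kind]; // the current token becomes the job ID
  job.enqueued = new Date();
  queues[app][job.kind].push(job);
  activityTokens[app][job.kind]++;
  return job;
}

// Ask for the tokens to be persisted later; at most one request is pending.
function requestActivityTokensUpdateWhenLoadIsLow(app, callback) {
  const updateQueue = queues[app].updateActivityTokens;
  if (updateQueue.length === 0) {
    updateQueue.push({ application: app, kind: 'updateActivityTokens', createdOn: new Date() });
  }
  callback(); // the orchestrator is called whether or not a job was added
}
```

Calling `enqueue` twice and requesting an update after each call leaves two authentication jobs with consecutive IDs but only one job on the update queue, which is the "size at most 1" property stated above.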
2.5
Provenance by Design
Platform job descriptions provide all the information that will be used as input for actually processing the job. Additionally, further information is available in every platform job so that the orchestrator can decide which jobs to create and prepare to serve. In order to capture provenance at the level of orchestration we need to account for the following items of information:
• inputs to the various platform jobs,
• ownership and child-parent relationships between jobs,
• unique IDs for the various jobs that are processed,
• time measurements for the various operations while processing platform jobs,
• summaries of the results of the various jobs.
The first two items are captured in the notion of a platform job. Unique IDs can easily be created by using activity tokens. Timing measurements are naturally computed when profiling platform jobs (see section 2.4.1). Finally, the platform jobs, together with the other resources that are designed for and available to the system, capture the results and consequences of processing the various platform jobs. Taken together, this results in a fully provenance-enabled orchestration mechanism.
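As a rough illustration of how the child-parent relationships and unique IDs support provenance, the sketch below reconstructs the history of a job by following its triggeredBy links. It assumes that triggeredBy holds a direct reference to the parent job object and that the compound key is joined with a `-` separator; both are assumptions for illustration, as are the function names `jobKey` and `traceHistory` and the sample jobs.

```javascript
// Build the compound key of a job; the "-" separator is an assumption.
function jobKey(job) {
  return [job.application, job.kind, job.id].join('-');
}

// Follow triggeredBy links back to the job that started the chain.
function traceHistory(job) {
  const history = [];
  for (let current = job; current; current = current.triggeredBy) {
    history.push(jobKey(current));
  }
  return history.reverse(); // oldest job first
}

// Hypothetical chain: authentication -> validateTaskDocument -> agreeOnTask.
const auth = { application: 'demo', kind: 'authentication', id: 58, triggeredBy: null };
const validate = { application: 'demo', kind: 'validateTaskDocument', id: 0, triggeredBy: auth };
const agree = { application: 'demo', kind: 'agreeOnTask', id: 0, triggeredBy: validate };
console.log(traceHistory(agree));
```

Because the links form a singly-linked list, the trace costs one step per ancestor and needs no additional index.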
2.6
Priorities, Semaphores and Synchronisation
As discussed in section 2.3, our orchestration architecture offers a natural partitioning of the queues into different groups. Each queue is associated with a priority value which indicates how urgent it is to process the jobs in that queue. Using a machine learning approach such as decision-tree learning to automate job prioritisation across the various orchestration queues seems natural, and we might pursue this in the future.
In the remainder of this section, we provide a general overview of the synchronisation mechanisms that are used in our architecture in the context of various orchestration processes. Later, in section 3, the different workflows that are supported by the orchestrator will be described in a more systematic fashion, and their definitions will lead to an overall specification of a general orchestration algorithm that allows a SmartSociety orchestrator to deal with the platform jobs that are loaded on the various queues at any point in time. We then combine these components in section 3.8, which presents the “default” concrete decision procedure used by a SmartSociety orchestration manager.
To start with, we present a somewhat more structured formalisation of the job selection algorithm. Let $Q$ be the set of the different orchestration queues, with $|Q| = q$. Moreover, let $p_t : Q \to \mathbb{N}$ be a priority function, where $p_t(i)$ indicates the priority of queue $i$ at time $t$. Further, let $\ell_t(i) \in \mathbb{N}$ indicate the load of queue $i$ at time $t$, and let $\chi_t(i) = 1$ if $\ell_t(i) \geq 1$ and $\chi_t(i) = 0$ otherwise; i.e. $\chi_t : Q \to \{0, 1\}$ is an indicator function which is 1 if there is at least one job waiting to be served in the queue at time $t$. For simplicity we will omit the subscript $t$ in the above functions whenever it is clear from the context. Algorithm 1 presents a first approach to selecting a job to be processed from the orchestration queues, where $p$ is a bijection onto $\{0, 1, \ldots, q-1\}$, i.e. all queues have different priorities (the simplest case). In other words, when a call is made to the orchestrator, a platform job will be processed in the next clock tick if $\sum_{j} \chi(j) \geq 1$, i.e. if there is at least one platform job waiting to be served in some queue.
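The basic selection rule (scan the queues in priority order and return the first one that is not empty) can be sketched as follows. The queue representation, an array of objects with `name`, `priority` and `jobs` fields, and the function name `selectQueue` are assumptions for illustration; smaller priority values are taken to be more urgent, and priorities are assumed to be distinct.

```javascript
// Each queue has a name, a distinct priority in 0..q-1 (0 = most urgent),
// and an array of pending jobs.
function selectQueue(queues) {
  const byPriority = [...queues].sort((a, b) => a.priority - b.priority);
  for (const queue of byPriority) {
    if (queue.jobs.length > 0) return queue.name; // the indicator chi is 1 for this queue
  }
  return null; // NIL: no jobs can be served
}
```

Sorting a copy keeps the caller's array untouched; an implementation that stores the queues already ordered by priority could skip the sort entirely.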
Algorithm 1: A basic job selection algorithm
Input: No input; a call is made to the orchestrator
Output: The queue from which to serve the available job in the next clock tick
    for $i \leftarrow 0, 1, \ldots, (q - 1)$ do    /* iterate over priorities */
        if $\chi(p^{-1}(i))$ then return $p^{-1}(i)$;    /* $p^{-1}$ maps priorities to queues */
    return NIL;    /* No jobs can be served */
2.6.1
Semaphores
Unfortunately, an approach such as the one in algorithm 1 is not sufficient for orchestration purposes. The associations between task requests, tasks and task records are data-driven (all these resources are represented as linked data): changes to a document in one collection may change documents in other collections, and this synchronisation has to be managed explicitly. As a consequence, race conditions can easily arise during the orchestration of different jobs, and we use semaphores to address this issue.
A platform job that is loaded on queue $j$ is said to be a job of type $j$. Let $s(i) \in \mathbb{N}$ be a semaphore associated with queue $i \in \{0, 1, \ldots, q-1\}$, indicating the number of platform jobs of type $i$ that are being processed by the system at time $t$. Further, let $c(i) \in \mathbb{N}^*$, for $i \in \{0, 1, \ldots, q-1\}$, be the maximum number of platform jobs of type $i$ that can be processed in parallel by the application. One can think of $c(i)$ as the number of cores that are dedicated to serving jobs that belong to queue $i$. Instead of querying $\chi(j)$ for every queue $j$ as in algorithm 1, we will use expressions that depend on these parameters and potentially also take parameters referring to other queues into account. In what follows, unless stated otherwise, we will assume for simplicity that $c(i) = 1$ for all $i \in \{0, 1, \ldots, q-1\}$, but our considerations can be adapted to more computing resources per queue.
The following semaphores are provided in our implementation and can be used for orchestration purposes:
• newTaskRequests
• negotiation
• execution
• composition
• deletion
• updateActivityTokens
We explain their use in the following section.
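To make the roles of $s(i)$ and $c(i)$ concrete, the following sketch models each semaphore as a counter bounded by a capacity. The class name `QueueSemaphore` and its methods are hypothetical; only the six semaphore names and the default capacity of 1 come from the text.

```javascript
// Counting semaphore: inUse plays the role of s(i), capacity the role of c(i).
class QueueSemaphore {
  constructor(capacity = 1) {
    this.capacity = capacity;
    this.inUse = 0;
  }
  // Returns true and takes a slot if one is free; never blocks.
  tryAcquire() {
    if (this.inUse >= this.capacity) return false;
    this.inUse++;
    return true;
  }
  // Frees a slot; a release without a matching acquire is ignored.
  release() {
    if (this.inUse > 0) this.inUse--;
  }
}

const semaphoreNames = [
  'newTaskRequests', 'negotiation', 'execution',
  'composition', 'deletion', 'updateActivityTokens'
];
const semaphores = {};
for (const name of semaphoreNames) {
  semaphores[name] = new QueueSemaphore(1); // c(i) = 1, as assumed in the text
}
```

A non-blocking `tryAcquire` suits a single-threaded event loop: a job whose semaphore is taken simply stays in its queue until a later call to the orchestrator.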
2.6.2
Lifecycle of Orchestration Resources
In section 2.2 we gave a brief sketch of the interactions that can occur in the system and how these are associated with various orchestration resources. We will now review some of these interactions in order to indicate the necessity and usefulness of the semaphores that are associated with the various queues. In Section 3 we will examine the workflows that are supported by the orchestrator and which, taken together, constitute a full description of our generic prototype for orchestrating platform jobs.
Lifecycle of Exposed Resources. The three exposed orchestration resources are shown in Figure 3.
Task requests are submitted to the system by peers or users. Assuming the documents contain valid information with respect to the application at hand, they will be stored and trigger a platform job for composition. However, task requests arrive at the system continuously, and as a result existing task requests may be augmented with tasks that are the result of future composition platform jobs triggered by future task requests. In that sense, task requests are in a composition phase until an agreement on a task has been reached. Moreover, composition takes place in parallel for multiple task requests, and it is the orchestrator's responsibility to synchronise them based on new information that becomes available either from other composition platform jobs or from negotiations that may take place in parallel. Once agreement is reached on a task associated with a task request, this task request enters its final state, the agreed phase. After this point execution can be performed and feedback can be submitted on the agreed task(s).
Tasks are treated in a similar way to task requests. Each task is the outcome of some composition job, and can be in five different states. The potential phase indicates that a task satisfies certain “hard” constraints for all the participants involved, but that the quality of the task (as predicted by the orchestrator based on, inter alia, feedback and reputation for the peers involved) does not fulfil the quality expectations of at least one participant. The negotiable phase indicates that a task is ready for negotiation but none of the involved entities has agreed to the task yet. The under negotiation phase indicates that at least one peer has agreed on the task. The agreed phase indicates that the negotiation has been completed successfully, and that the task requests associated with the task now transition from the composition phase to the agreed phase. The other terminal state of a task is the invalid state. A task can be rendered invalid for many reasons, for example if one of the involved participants rejects it during the negotiation process.
Task records contain information on tasks that have been agreed and can be executed in the domain. As the execution phase is not covered by orchestration (yet), we have not introduced different phases for task records. Most likely, there is going to be either a single phase, the execution phase, or several (sequential or not) phases that have been generated by the orchestrator during the composition of the task that is tied to the particular task record.
Lifecycle of Complementary and Auditing Resources. Figures 4 and 5 show the complementary and auditing resources generated during orchestration. We do not distinguish phases for these resources, since they either provide complementary information for the various exposed resources (which peers cannot interact with directly) or, in the case of auditing resources, are never changed after they are generated.
2.6.3
Synchronisation Example
We are now ready to give an example of synchronising resources and of the use of semaphores. Recall that our orchestration platform guarantees that task requests are stored in the system in the order in which they arrive and, moreover, that they obtain unique, sequentially ordered IDs. Now consider the case where an application receives two new task requests almost simultaneously. These requests need to be persisted. For this reason, a temporary ID is given to the first of them upon arrival and persistence is attempted. However, not all such attempts will be successful. For example, some task requests may contain malformed information, lack data that is important for the underlying model of the task request in the specific application, etc. The newTaskRequests semaphore is used in this process: while it is held, the other task request waits to be served in its queue, which will eventually happen when the I/O operation for the first task request finishes. This is important since I/O operations are asynchronous. Without semaphores we would attempt to store both task requests simultaneously, and if persisting the first one failed, the IDs of the stored task requests would no longer be sequential as expected.
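The scenario above can be simulated with a small sketch in which persistence is asynchronous and a boolean flag stands in for the newTaskRequests semaphore. Everything here is hypothetical scaffolding: the names `waiting`, `stored`, `persist` and `serveNext`, the `valid` field used to model a malformed request, and the use of `setImmediate` in place of real database I/O.

```javascript
const waiting = [];   // queue of new task requests, in arrival order
const stored = [];    // successfully persisted task requests
let nextId = 0;       // next sequential task request ID
let busy = false;     // stands in for the newTaskRequests semaphore (c = 1)

// Simulated asynchronous persistence; invalid requests are rejected.
function persist(request, id, done) {
  setImmediate(() => {
    const ok = request.valid === true;
    if (ok) stored.push({ id: id, name: request.name });
    done(ok);
  });
}

// Serve one request at a time; onIdle is called when the queue is empty.
function serveNext(onIdle) {
  if (busy) return;                      // semaphore is held: keep waiting
  const request = waiting.shift();
  if (!request) { onIdle(); return; }
  busy = true;                           // acquire
  const temporaryId = nextId;            // temporary ID given before persistence
  persist(request, temporaryId, (ok) => {
    if (ok) nextId = temporaryId + 1;    // the ID is consumed only on success
    busy = false;                        // release
    serveNext(onIdle);
  });
}
```

Because a request receives its temporary ID only once the previous request has finished, a failed attempt does not leave a hole in the sequence of stored IDs.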
2.7
Scalability
Partitioning platform jobs into different groups suggests possible natural approaches to scaling. For example, jobs that only require read operations can be served by different machine peers, given that they all have access to the peer manager database. However, there are jobs that are more involved like, for example, negotiation, where successful agreement on a task may have an impact on other, seemingly unrelated, tasks that have been produced through composition. In such cases, depending on the platform job workflows that we are going to support, we may find some interesting challenges.
3
Orchestration Workflows
This section provides an overview of the individual workflows involved in our orchestration system. They are first described one by one, at a general level, in separate sections, before we complete our exposition with the integrated design of a “default” SmartSociety orchestration peer that combines them.
In describing workflows we use a graph-based notation as shown in Figure 9. Such graphs use nodes to denote distinct processing steps in the workflow, e.g. process-A, process-B, profile, and prov in this example. Moreover, there is an orchestration queue with the same name for every process that appears in the workflow. The direction of the arrows indicates which platform job triggers which; for example, in Figure 9 process-A triggers process-B.

         profile
            ↑
        process-A −→ process-B
            ↓
          prov
            ↓
         profile

Figure 9: Workflow notation example: once process-A completes execution, process-B is triggered; a process profile is triggered by default on completion of every core workflow process like process-A or process-B. A process prov is triggered at the end of every core workflow process in order to track relevant provenance information, and is itself profiled by a process profile.

In the example, the execution of process-A involves the following steps:
1. Dequeue a platform job from the queue process-A.
2. Record the time of dequeuing and the start time of execution of process-A.
3. Perform the operation of process-A, which may involve further asynchronous operations.
4. Upon completion of process-A, record the end time of execution.
5. Create the following platform jobs and load them into the appropriate queues:
   • If there is another step in the main chain of the workflow, prepare the respective job, record its creation and enqueueing timestamps, and enqueue it in the respective queue of the application orchestrator.
   • A platform job profile to compute and store the (running-time) profile of process-A that was just served.
   • A platform job prov to store the provenance information related to process-A.
6. After loading the platform jobs from the previous step, enqueue, if necessary, one additional platform job requesting an update of the activity tokens. Recall from Section 2.4.2 that the queue for requesting updates of the activity tokens has size at most one, since the actual update of the tokens in the database occurs only when the load on the system is low.
In the workflows discussed below, we will use the following abbreviations for commonly used resources: TR for task requests, TRC for task request complements, T for tasks, TC for task complements, TD for task records, TDC for task record complements, and CR for client (HTTP) requests.
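The six steps of executing a workflow process can be sketched as below. This is a simplified, synchronous illustration: the names `queues`, `enqueue` and `runProcess`, the timestamp fields `dequeued`, `started` and `finished`, and the `input` field are assumptions, and the operation in step 3 is modelled as a plain function call rather than an asynchronous operation.

```javascript
// Hypothetical queues, one per process name used in the example workflow.
const queues = {
  'process-A': [], 'process-B': [], profile: [], prov: [], updateActivityTokens: []
};

// Create a job with creation and enqueueing timestamps and load it onto its queue.
function enqueue(kind, fields) {
  const now = Date.now();
  const job = Object.assign({ kind: kind, createdOn: now, enqueued: now }, fields);
  queues[kind].push(job);
  return job;
}

function runProcess(kind, operation, nextKind) {
  const job = queues[kind].shift();                    // step 1: dequeue
  if (!job) return null;
  job.dequeued = Date.now();                           // step 2: record times
  job.started = job.dequeued;
  const result = operation(job);                       // step 3: perform the operation
  job.finished = Date.now();                           // step 4: record end time
  if (nextKind) {                                      // step 5: triggered jobs
    enqueue(nextKind, { triggeredBy: job, input: result });
  }
  enqueue('profile', { triggeredBy: job });
  enqueue('prov', { triggeredBy: job });
  if (queues.updateActivityTokens.length === 0) {      // step 6: at most one request
    queues.updateActivityTokens.push({ kind: 'updateActivityTokens', createdOn: Date.now() });
  }
  return job;
}
```

In the notation of the example, serving process-A would correspond to `runProcess('process-A', operation, 'process-B')`, which leaves one job each on the process-B, profile and prov queues.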
3.1
Common Platform Jobs across Workflows
There are a number of platform jobs that are common to the different orchestration workflows. These are:
• auth: used for authentication purposes,
• prov: used for submitting provenance information to the provenance store,
• profile: used for computing the profile of a platform job, and
• update activity tokens: used for updating the tokens of various orchestration activities.
At the level of the application orchestrator, auth refers to the authentication that needs to take place for the needs of the orchestrator alone. For example, as we will see in Section 3.3, when a peer posts a task request, the orchestrator first validates, through the peer manager, the authentication credentials that were provided together with the task request. We use auth to refer to such authentication mechanisms that occur for internal purposes inside the platform. It is worth noting that we do not use queues such as priv for access control purposes. Rather, the access control mechanism is implicit in operations that require it, e.g. reading a specific task.
Regarding update activity tokens, we will refrain from showing such calls in the workflows that follow, since such a job is prepared and loaded onto the update activity tokens queue, in a strictly sequential fashion, whenever a job has been loaded onto another queue. The reason for this is that loading a job onto a queue modifies the activity tokens used by the application, and almost all activities in the system have a unique ID for provenance reasons. The only exception to this rule are the update activity tokens jobs themselves.
As concerns priorities for the above queues, we assume that auth < prov < profile < update activity tokens. This ordering implies that the orchestrator will first attempt to process auth jobs, then prov jobs, then profile jobs, and finally update activity tokens jobs (by which point the load on the system is essentially very low).
In what follows, similarly to our omission of update activity tokens processes, we omit profile and prov processes for brevity.
3.2
Asynchronous Read-Only Operations
Our architecture allows users and peers to read various resources in order to observe aspects of the current state of the process. These read-only operations can occur concurrently without any further effects on the system. The workflow for such operations is shown in Figure 10.

    CR −→ auth −→ read-only

Figure 10: Workflow for read-only operations

Regarding the priorities of these read-only operations we can be flexible. We choose to treat ETags as more urgent than the resources they represent; in other words, HEAD operations are considered to be more urgent than GET operations. Beyond these, we also allow users and peers to fetch sets of resources in one step, thus requiring only a single authentication in the process (see appendix C.6). These are typical operations that occur when a client loads for the first time. Many orderings are compatible with the above ideas, but for simplicity we consider serving sets as more urgent than serving individual resources (or their ETags), and serving task requests as more urgent than serving tasks. Hence, with ro indicating read-only, we have the following ordering of priorities: roPersonalTRs < roSetOfTRs < roETagOfTR < roTR < roSetOfTasks < roETagOfTask < roTask. As mentioned above, the access control mechanism is embedded in the read-only process.
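One simple way to encode such an ordering is to keep the queue names in a list, most urgent first, and derive numeric priorities from the list positions. The sketch below does this for the read-only queues; the names `readOnlyOrder`, `readOnlyPriority` and `mostUrgentReadOnly` are hypothetical, while the queue names and their order are those given above.

```javascript
// Read-only queues, most urgent first.
const readOnlyOrder = [
  'roPersonalTRs', 'roSetOfTRs', 'roETagOfTR', 'roTR',
  'roSetOfTasks', 'roETagOfTask', 'roTask'
];
// Position in the list doubles as the priority value (0 = most urgent).
const readOnlyPriority = new Map(readOnlyOrder.map((name, index) => [name, index]));

// Return the most urgent queue among those with pending jobs, or null.
function mostUrgentReadOnly(pendingQueues) {
  let best = null;
  for (const name of pendingQueues) {
    if (!readOnlyPriority.has(name)) continue; // ignore queues outside this group
    if (best === null || readOnlyPriority.get(name) < readOnlyPriority.get(best)) {
      best = name;
    }
  }
  return best;
}
```

Keeping the order in a single list means that changing the policy, for example serving tasks before task requests, only requires reordering the array.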
3.3
Creating Task Requests and Initiating Composition
The workflow for creating task requests is shown in Figure 11. Composition is triggered as a result of a new task request, provided that the request was successfully submitted to the system.

    CR −→ auth −→ new TR and TRC −→ composition

Figure 11: Workflow for new task requests; a successful response is sent back to the client at the end of the process new TR and TRC.

The client (HTTP) request CR is authenticated through the process auth. An access control mechanism is applied inside the process new TR and TRC to check whether this is a permissible operation for the respective client. The process new TR and TRC validates the content sent by the client and attempts to persist the new task request TR and its associated task request complement TRC. For composition and negotiation purposes, it is crucial to store satellite information in the database for every task request, and it is this information that is stored in the complement. Once this process is complete, the response is sent back to the client with a link to the task request that the client has just posted and that was accepted by the system. The system now triggers composition so that the new task request can be matched with other task requests, possibly using additional information from the peer manager and from the reputation service.
In terms of priorities, we have auth < new TR and TRC for this workflow. In terms of synchronisation, when new TR and TRC attempts to lock a computational resource, it attempts to increase the semaphore newTaskRequests by one so that the actual execution of the job can take place. Similarly, at the end of execution, the semaphore newTaskRequests is decreased by one and the computational resource is released.
3.4
Composition
At the level of the application orchestrator, composition can currently be viewed as a single process, as shown in Figure 12. Internally, however, we have calls defined by the orchestration API (see appendix C), where the actual job of composition is delegated to a composition manager.

    new TR and TRC −→ composition

Figure 12: Composition workflow, triggered every time a new task request has successfully been posted to the system.

Composition is expected to be a CPU-intensive operation which may also involve calls to the peer manager as well as to the reputation service. In a system where we use only a single processing thread for all orchestration managers, composition jobs are therefore the least urgent jobs. If, on the other hand, at least one dedicated CPU (or core) is available for composition, delegating the job to the composition manager should have high priority so that the necessary computations can start earlier, making the system more responsive. As regards synchronisation, a composition job will be delegated to the composition manager only if the composition semaphore is below the threshold allowed by the composition manager. This threshold essentially indicates the number of composition jobs that the composition manager can process concurrently at any point in time.
The composition manager matches task requests, and it may retrieve information from the peer and reputation managers for this purpose (e.g. to only match requests with each other if the requesting parties “like” each other, or if their user profiles match the constraints in a request). This is not covered by our current static orchestration architecture, but initial work toward building more involved task recommendation algorithms that could be used by an adaptive composition manager is described in milestone MS14. In our current implementation, composition simply creates all tasks that are possible in principle given the hard constraints listed in the requests and returns all of them as potential solutions to the requesting peers.
3.5
Negotiation
The workflow for negotiation is shown in Figure 13.

    CR −→ auth −→ negotiation

Figure 13: Negotiation workflow

The negotiation process that takes place at the end of the workflow shown in Figure 13 involves a call to the negotiation manager. Synchronisation is treated in the same way as for composition (a semaphore called negotiation is used). The negotiation manager executes one of the two workflows that are shown in Figure 14.

                                                          ↗ accept
    negotiation −→ validate doc −→ validate transition
                                                          ↘ reject

Figure 14: The two possible workflows supported by the negotiation manager

The negotiation manager uses validate doc < validate transition < accept < reject as a priority ordering.
3.6
Deletion
The workflow for deletion is shown in Figure 15.

    CR −→ auth −→ deletion

Figure 15: Deletion workflow

The deletion process that takes place at the end of the workflow shown in Figure 15 involves, per the orchestration API (see appendix C), a call to the deletion manager. Synchronisation is managed in the same way as above, using the deletion semaphore. The deletion manager is responsible for handling deletions and synchronising the consequences of such deletions. This is further explained in deliverable D7.2. An overview of its functionality is given in appendix B.2.
3.7
Execution
The workflow for task execution is shown in Figure 16, and it mainly involves a call to the execution manager (this is very similar to deletion in spirit).

    CR −→ auth −→ execution

Figure 16: Execution workflow

A semaphore called execution is used for synchronisation purposes. The execution manager is responsible for monitoring execution and potentially synchronising the consequences of various executions. Further information can be found in deliverable D7.2.
3.8
A Default Smart Society Orchestrator
We are now ready to present our “default” SmartSociety orchestrator, which assumes that orchestration runs on a single core of a CPU. Algorithm 2 gives a sketch of the orchestration process for deciding which platform job to serve on the next clock tick. Apart from the processes described above, this involves several further tunneling, delegation, and housekeeping processes that are described in the annexes. Note that although the decision procedure looks like a list of choices, it actually represents a more complex decision tree, since many queues can be grouped together under one condition, thus requiring at least one additional branching condition for determining precisely which queue to pick the next job from.
Algorithm 2: A sketch of the default orchestrator.
Input: No input; a call is being made to the orchestrator.
Output: Determining which platform job to serve on the next clock tick.
     1 if (admin or auth job waiting to be served) then process(appropriate job);
     2 else if (tunnel job waiting to be served) then process(appropriate tunnel job);
     3 else if (job waiting to be delegated) then process(appropriate delegated job);
     4 else if (read only job waiting to be served) then process(appropriate read only job);
     5 else if (may process a deletion job) then process(appropriate deletion job);
     6 else if (may process a negotiation job) then process(appropriate negotiation job);
     7 else if (may process a new TR job) then process(new TR job);
     8 else if (profile job waiting to be stored) then store(profile job);
     9 else if (activity tokens update job waiting to be served) then process(update activity tokens job);
    10 else if (may compose new job) then process(composition job);
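The cascade of conditions in the default orchestrator can be written as an ordered list of rules, where the first rule whose condition holds determines the class of job to serve. In the sketch below the function name `chooseNextJob`, the shape of the `state` object, its flag names and the returned labels are all hypothetical; the flags are assumed to have been computed beforehand from the queue loads and semaphores.

```javascript
// Return a label for the class of job to serve next, or null if nothing can be served.
function chooseNextJob(state) {
  const rules = [
    [state.adminOrAuthWaiting, 'admin or auth'],
    [state.tunnelWaiting, 'tunnel'],
    [state.delegationWaiting, 'delegated'],
    [state.readOnlyWaiting, 'read only'],
    [state.mayProcessDeletion, 'deletion'],
    [state.mayProcessNegotiation, 'negotiation'],
    [state.mayProcessNewTR, 'new TR'],
    [state.profileWaiting, 'profile'],
    [state.tokenUpdateWaiting, 'update activity tokens'],
    [state.mayCompose, 'composition']
  ];
  // The first rule whose condition holds wins, mirroring the if / else-if chain.
  const match = rules.find(([condition]) => Boolean(condition));
  return match ? match[1] : null;
}
```

Each label stands for a group of queues, so a full implementation would still need a second decision inside the chosen group to pick the concrete queue.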
Below we give algorithms that can be used to determine the truth value of the conditions in lines 5-7 and 10 of algorithm 2. More specifically, algorithm 4 presents the method used for determining the truth value of the condition in line 5, algorithm 5 does so for line 6, algorithm 3 for line 7, and algorithm 6 for line 10.
Algorithm 3: Determining whether a new task request can be processed. The variable $s_{TR}$ is a boolean semaphore indicating whether a new task request is being served, while $\ell(\text{new TR})$ denotes the load on the queue for new task requests.
    if $s_{TR}$ then return false;
    if $\ell(\text{new TR}) > 0$ then return true;
    return false
Algorithm 4: Determining whether a deletion job can be processed. The variables $s_D$, $s_N$ and $s_C$ are boolean semaphores used for deletion, negotiation and composition respectively, while $\ell(\text{delete})$ denotes the load on the queue for deletions.
    if ($s_D \lor s_N \lor s_C \lor (\ell(\text{delete}) = 0)$) then return false;
    return true
Algorithm 5: Determining whether a negotiation job can be processed. The variables $s_D$, $s_C$ and $s_N$ are boolean semaphores used for deletion, composition and negotiation respectively, while $\ell(x)$ denotes the load on queue $x$.
    if $s_D \lor s_C$ then return false;
    if $\lnot s_N$ then    /* negotiation can start */
        if $\ell(\text{validate doc}) > 0$ then return true;
        else return false;
    else    /* already under negotiation */
        if $(\ell(\text{validate transition}) > 0) \lor (\ell(\text{reject}) > 0) \lor (\ell(\text{accept}) > 0)$ then return true;
        else return false;
Algorithm 6: Determining whether a composition job can be processed. The variables s_TR, s_D, s_N and s_C are boolean semaphores used for new task requests, deletion, negotiation and composition respectively, while ℓ(composition) denotes the load on the composition queue.
1 if s_D ∨ s_N ∨ s_C ∨ s_TR then return false;
2 if (may process new task requests) then return false; /* Algorithm 3 */
3 if (may process deletion) then return false; /* Algorithm 4 */
4 if (may process negotiation) then return false; /* Algorithm 5 */
5 if ℓ(composition) > 0 then return true;
6 return false;
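The guard conditions of Algorithms 3-6 can be illustrated with a minimal Python sketch. This is not the project's Node.js implementation: the class `OrchestratorState`, its field names and the helper `next_guarded_job` are hypothetical names introduced here, and only the four guarded branches of Algorithm 2 (lines 5, 6, 7 and 10) are modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class OrchestratorState:
    """Semaphores and queue loads consulted by Algorithms 3-6 (hypothetical container)."""
    s_tr: bool = False  # a new task request is being served
    s_d: bool = False   # a deletion is in progress
    s_n: bool = False   # a negotiation is in progress
    s_c: bool = False   # a composition is in progress
    load: Dict[str, int] = field(default_factory=dict)  # queue name -> pending jobs

    def l(self, queue: str) -> int:
        """Load on a queue; queues never seen before count as empty."""
        return self.load.get(queue, 0)


def may_process_new_tr(st: OrchestratorState) -> bool:
    # Algorithm 3
    return (not st.s_tr) and st.l("new TR") > 0


def may_process_deletion(st: OrchestratorState) -> bool:
    # Algorithm 4
    return not (st.s_d or st.s_n or st.s_c or st.l("delete") == 0)


def may_process_negotiation(st: OrchestratorState) -> bool:
    # Algorithm 5
    if st.s_d or st.s_c:
        return False
    if not st.s_n:  # negotiation can start
        return st.l("validate doc") > 0
    # already under negotiation
    return any(st.l(q) > 0 for q in ("validate transition", "reject", "accept"))


def may_process_composition(st: OrchestratorState) -> bool:
    # Algorithm 6
    if st.s_d or st.s_n or st.s_c or st.s_tr:
        return False
    if may_process_new_tr(st) or may_process_deletion(st) or may_process_negotiation(st):
        return False
    return st.l("composition") > 0


def next_guarded_job(st: OrchestratorState) -> Optional[str]:
    """Order of lines 5, 6, 7 and 10 of Algorithm 2; the other lines are omitted."""
    if may_process_deletion(st):
        return "deletion"
    if may_process_negotiation(st):
        return "negotiation"
    if may_process_new_tr(st):
        return "new TR"
    if may_process_composition(st):
        return "composition"
    return None
```

Under these assumptions, a state with both a pending new task request and pending compositions selects the new task request first, since composition is only allowed when none of the other three job types can be served.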
4
Evaluation
To establish whether the scalability and robustness improvements we expect from the orchestration model we have developed can actually be observed in real operation, we have evaluated its prototypical implementation experimentally in the ridesharing domain, which uses the team task orchestration model as described in milestone MS4 of WP2. Our experiments below focus on the matchmaking and negotiation part of the protocol, as this involves most dependencies among individual behaviours, and requires the composition manager to solve a complex combinatorial problem that involves calculating exponential numbers of possible rides presented to every driver and passenger. For these experiments, we have built a small distributed application that simulates peers entering the system and posting their tasks. All agents periodically poll the resources they are interested in to check what the currently available rides are, and to determine whether a ride has been agreed/can no longer be agreed. In terms of the orchestration platform, our implementation involves a single server which deals with all operations on a single CPU. The server runs Node.js (footnote 6), which contains a non-blocking event-driven JavaScript library, and implements the processing queues associated with different stages of orchestration as described above.
Our first experiment examines the overall scalability of the platform. We create artificial “groups” of size k in a population of n peers such that all the task requests inside a group “match”, and we can artificially control how many rides will be created (i.e. tasks have to be composed in this “matchmaking” process). This experiment involves up to 10 groups of 6, 9, and 12 peers, i.e. a total of 60, 90, 120 peers, where the ratio of drivers d to passengers p is 1/2 (i.e. $d/p \in \{2/4, 3/6, 4/8\}$ for each group size). Note that the respective number of possible rides generated in each group is $(2^p - 1) \cdot d$, as there is a different proposal for every non-empty subset of passengers, and the rides different drivers may offer to a group overlap. This means that 30/189/1020 rides have to be created for each group, i.e. the system has to deal with up to 10200 rides overall as we keep adding groups. Note also that, since all ride (=task) requests and agreements to rides occur in very close succession, the load of this system is similar to a real-world system that would experience this level of usage every few minutes (in reality, of course, users take much longer to check updates and respond), so it is in fact representative of a very large scale real-world application. Finally, to maximise the number of messages exchanged and the duration of negotiation, drivers accept only the maximally sized ride, and passengers accept all rides. The top two plots in figure 17 show the average time taken in seconds (across all peers, and for 20 repetitions for each experiment, with error bars to indicate standard deviations) for composition and negotiation (all further messages up to the call to the execution manager), respectively.
As can clearly be seen from these plots, even though every user peer has a built-in delay of 2 seconds between any two steps, and even when there are 120 peers in the system, the average time it takes a peer to get information about all rides acceptable to her/complete the negotiation of a ride is around 50s/80s in the largest configurations.
6 http://nodejs.org/
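The ride counts quoted for the first experiment can be checked with a short Python sketch. It assumes, as the text states, that every driver can propose a ride to every non-empty subset of the passengers in its group; the function names are hypothetical and introduced only for this illustration.

```python
from itertools import combinations


def ride_count(passengers: int, drivers: int) -> int:
    """Closed form: one ride per driver and non-empty passenger subset."""
    return (2 ** passengers - 1) * drivers


def enumerate_rides(passengers: int, drivers: int):
    """Explicit enumeration of (driver, passenger subset) pairs."""
    rides = []
    for driver in range(drivers):
        for size in range(1, passengers + 1):
            for subset in combinations(range(passengers), size):
                rides.append((driver, subset))
    return rides


# (drivers, passengers) for group sizes 6, 9 and 12
configurations = [(2, 4), (3, 6), (4, 8)]
rides_per_group = [ride_count(p, d) for d, p in configurations]
total_for_ten_largest_groups = 10 * rides_per_group[-1]
```

The enumeration and the closed form agree, which is why the composition step grows exponentially in the number of passengers per group while only linearly in the number of groups.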
Figure 17: Experimental results
In the second experiment, we investigate the cumulative effect of adding delays and message failures on the total execution time of an entire negotiation for a ride, in order to assess how robust the system is. For this, we artificially increase the delay between any update a peer receives and its successive operation from 2s to 5s, 10s, and 20s. We use these artificial delays also to emulate failure, e.g. when network resources are temporarily unavailable. The bottom plot in figure 17 shows the results for this experiment, for a group size of 9 and 5 groups (45 peers in total), showing measurements for composition, negotiation, and the total lifespan of an agent (from task request creation to agreement). As can be seen, the overall lifespan of an orchestration increases by a factor of 3 to 4 here when the delay increases by a factor of 10, which is a good indication that the system degrades gracefully under increasing perturbation. Moreover, what is interesting is that the time taken for negotiation, which involves the highest number of messages to the orchestrator (as all passengers accept all rides) only increases by a factor between 1.5 and 2. This is because the larger delays require less effort for task composition, and the orchestrator has more time to process negotiation-related messages during these gaps. This nicely illustrates how separating the processing of different queues leads to effective load balancing for any orchestrator that has to engage in different interactions concurrently.
5
Conclusion
This document has provided a detailed description of the static social orchestration architecture developed in SmartSociety. This architecture is purely data-driven, provides a flexible and generic way of organising collaboration in HDA-CAS, and initial evaluation results indicate that it results in robust and scalable implementations of SmartSociety systems. The design, implementation and evaluation of this architecture completes the first cycle of research performed in WP6, and provides the foundation for developing adaptive orchestration architectures in the second cycle.
A
Further Operations
Even though the basic functionality of SmartSociety applications is given by the workflows presented in the main body of this report, real-world applications require many more operations to be orchestrated. For completeness, we present these in this appendix.
A.1
Administrative Monitoring Operations
A SmartSociety application should provide monitoring functionalities that are available only to the administrators of the application. Examples of such operations that are supported at the moment are the following:
• obtaining an overview of the state of the application, the orchestrator and the individual resources,
• obtaining specific job profiles,
• retrieving information by using predefined stored procedures; for example retrieving a five number summary of the job profiles for a specific queue, and
• retrieving previous versions of a specific resource regardless of its owner.
The workflows are similar to the asynchronous read-only operations that we described earlier in section 3.2. The difference is that, apart from versioned task requests, tasks, and task records, the access control mechanisms should only allow administrators of the application to perform such operations (footnote 7). Regarding the priorities of the different jobs in this category, we use the following:
admin_{Overview} < admin_{JobProfile} < admin_{StoredProc}
admin_{NumTQs} < admin_{NumTasks} < admin_{NumTQCs}          (1)
admin_{versionedTQ} < admin_{versionedTask} < admin_{versionedTQC}
Finally, we note that the common platform jobs that were described in section 3.1 are essentially administrative jobs that are taken care of automatically by the system.
A.2
Tunnels
A SmartSociety application can provide tunnels to services provided by third parties. In our current approach we provide such tunnels for authentication and registration of new users. This means that we allow the clients to communicate directly with the peer manager by simply forwarding requests and responses. The workflow for such operations is shown in figure 18.
7 Of course, versioned task requests, versioned tasks, and versioned task records should be accessible not only by administrators, since these are resources that either belong to a user or a peer, or were generated by the system and access is allowed to certain users or peers so that the SmartSociety application can actually work.
Figure 18: Tunneling workflow (CR −→ tunnel)
Regarding the priorities of the two different operations that are allowed through the tunnel (authentication and registration), registration is thought to be more urgent, in the sense that responsiveness is more important for new users who join the platform. This implies tunnel_{registration} < tunnel_{authentication} for the priorities of tunneling jobs.
A.3
Delegated Services
A SmartSharing application may provide services that are actually delegated to third parties. One such instance in our current approach is the reputation service, where the application orchestrator plays the role of the middle man between the client and the reputation service. The workflow for such operations is shown in figure 19.
Figure 19: Workflow for delegated jobs (CR −→ auth −→ delegation)
In principle one can have a queue for each delegated service regardless of the number of API calls that are provided by the third-party service. For illustration purposes, however, we provide an additional refinement on the creation of distinct queues for calls that are using different HTTP verbs. Here, GET operations are mere read operations and are treated as more urgent than POST operations. Hence, regarding priorities we have delegate_{head} < delegate_{get} < delegate_{delete} < delegate_{put} < delegate_{post}. The rationale is similar to the case of tunnels: at one end of the spectrum we have HTTP operations such as HEAD and GET, which are expected to be the most lightweight in terms of the CPU time required for processing them, with HEAD operations being the more lightweight of the two. At the other end of the spectrum, POST operations are expected to be the most costly operations, since new information is provided with them and further operations might be triggered in the light of this new information. Thus, POST operations are treated as the least urgent. PUT operations can be meaningful for negotiation protocols, for example. However, it is expected that PUT operations are less CPU-intensive than POST operations. DELETE operations are considered to be more urgent than PUT and POST operations, since we may actually save CPU time when PUT/POST operations are pending in parallel that would otherwise take the deleted resource into account. Finally, based on this logic, in the case of multiple delegated services we cluster the priorities according to the above inequalities, which are based on the verb that is being used for the communication, rather than on the services themselves.
B
Orchestration Patterns
This section introduces specific patterns that are provided by default for orchestration in our current implementation. Note that finalising these patterns within the overall scope of SmartSociety applications is still work in progress. The material presented here should therefore be read in conjunction with deliverable D7.2.
B.1
Composition Patterns
At the moment we support the composition pattern listed below.
• Create one collective (team) for every subset of matched task requests (i.e. one task for each sub-group of participants involved in a task that is a solution for their task requests).
B.2
Deletion Patterns
The orchestration service supports the following deletion pattern. Delete operations are allowed at the level of task requests. Upon receiving such a deletion request, the property deleted on the task request complement is incremented by 1 (footnote 8).
B.2.1
Deletion in Full-Negotiation Scenarios
Delete operations can be issued by the owners of the task requests or application administrators. The delete operation is accepted, provided that agreement has not been reached in a task that is associated with the task request for which deletion has been requested. All the tasks that were associated with the deleted task request are rendered invalid by changing the type of the links pointing to these tasks to invalid. Note that here we have an instance of weak consistency, since the documents of the affected tasks do not change. However, the links on the parent task requests pointing to these tasks are now characterised as invalid. As a consequence, upon successful deletion, the deletion resource that is created by the Deletion Manager lists the following:
1. The ID of the task request that was deleted.
2. The error code to be returned to the client.
3. The set of tasks that were rendered invalid (note that some tasks may already have been invalid in the deleted task request before the deletion took place).
4. The set (not multiset) of the affected task requests. This is the set of all the parent task requests of all the tasks that were rendered invalid during deletion.
5. A link pointing to the deletion resource that is written down on the task request complement for future retrieval of the outcomes of the delete operation.
In case of unsuccessful deletion, the deletion resource that is created by the Deletion Manager lists the following:
1. The ID of the task request for which deletion was requested.
2. The error code to be returned to the client.
Consequences. Upon successful acceptance of a delete operation by the deletion manager, the orchestrator returns a 404 error code on all subsequent HTTP operations by the resource owner (footnote 9). However, administrators of the SmartSociety application still have access to the resource for HEAD and GET operations. Similarly, client-side HTTP requests of the form GET /applications/:app/taskRequests/?peer=:peer do not include the deleted task requests in the JSON responses. Further, all the tasks of the deleted task request are now invalid and thus negotiation (PUT) can no longer take place in these tasks. Undelete operations are not allowed for any peer, whether regular or admin.
8 A task request that has not received a DELETE signal has value 0.
B.2.2
Deletion in Crowdsourcing Scenarios
The first idea is to follow the paradigm of StackOverflow, where deletions are essentially votes for deletion, and this is why we associate a numerical value with the deleted property in the complement of the task request. Thus, once the deleted property reaches a pre-defined deletion threshold, the task request is locked and only GET operations are subsequently allowed by regular peers. When the task request becomes locked, all associated tasks are characterised as invalid and execution stops. Again we have weak consistency when the task request enters its locked state. As a consequence, upon successful deletion, the deletion resource that is created by the Deletion Manager lists the following:
1. The peer who issued the delete operation.
2. The ID of the task request that was deleted.
3. The error code to be returned to the client.
4. The set of tasks that were rendered invalid (might be empty if the deletion threshold has not been reached yet).
5. The set (not multiset) of the affected task requests (again it might be empty if the deletion threshold has not been reached yet). This is the set of all the parent task requests of all the tasks that were rendered invalid during deletion.
6. A link pointing to the previous deletion resource (might be null) that is associated with the delete operations that are taking place on the specific task request.
7. A link pointing to the deletion resource that is stated in the task request complement so that the outcomes of the delete operation can easily be retrieved in the future.
9 A 403 Forbidden is returned to all other non-admin peers for all HTTP operations anyway.
Depending on the desired level of flexibility, we could keep incrementing the counter deleted forever, or have delete operations be rendered unsuccessful when one attempts to issue a delete operation on a task request that has reached its deletion threshold and is thus already locked.
Consequences. HEAD and GET operations on a locked task request are allowed indefinitely. However, PUT operations are not allowed. The treatment of DELETE operations will depend on the considerations above. Finally, undelete operations can be supported if we want to provide such functionality. In this case, the task request complements should be resources exposed to admins, where GET and PUT operations are allowed only for them.
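The two deletion patterns of this section can be sketched in Python as follows. This is a simplified illustration under several assumptions: the classes `TaskRequest` and `Task`, the dictionary keys of the deletion resource, and the choice of 403 for a refused deletion are all hypothetical; links between deletion resources are omitted; and in the crowdsourcing variant the counter simply keeps incrementing after the threshold, which is only one of the two options discussed above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Task:
    id: str
    parents: List[str]     # IDs of the parent task requests
    agreed: bool = False   # agreement reached in negotiation


@dataclass
class TaskRequest:
    id: str
    task_links: Dict[str, str] = field(default_factory=dict)  # task ID -> "valid" / "invalid"
    deleted: int = 0       # the deleted property of the complement


def invalidate_tasks(request, requests, tasks):
    """Flip links to the request's valid tasks on every parent; task documents stay untouched."""
    invalidated, affected = [], set()
    for task_id, link in list(request.task_links.items()):
        if link == "invalid":
            continue  # already invalid before this deletion
        invalidated.append(task_id)
        for parent_id in tasks[task_id].parents:
            requests[parent_id].task_links[task_id] = "invalid"
            affected.add(parent_id)
    return invalidated, sorted(affected)


def delete_full_negotiation(request_id, requests, tasks):
    request = requests[request_id]
    if any(tasks[t].agreed for t in request.task_links):
        # Refused: agreement already reached (403 is an assumed code).
        return {"taskRequest": request_id, "code": 403}
    request.deleted += 1
    invalidated, affected = invalidate_tasks(request, requests, tasks)
    return {"taskRequest": request_id, "code": 204,
            "invalidTasks": invalidated, "affectedTaskRequests": affected}


def vote_delete_crowdsourcing(peer, request_id, requests, tasks, threshold):
    request = requests[request_id]
    request.deleted += 1
    invalidated, affected = [], []
    if request.deleted == threshold:  # the task request becomes locked now
        invalidated, affected = invalidate_tasks(request, requests, tasks)
    return {"peer": peer, "taskRequest": request_id, "code": 204,
            "invalidTasks": invalidated, "affectedTaskRequests": affected}
```

Note that only the links stored on the parent task requests change, which mirrors the weak consistency described above: a task document itself carries no trace of having been invalidated.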
B.3
Negotiation Patterns
At the moment we support the patterns listed below.
• Any participant of a task may reject the task, thus rendering the task invalid for the entire collective.
• All the participants in a task agree so that we can finalise the agreement for the collective.
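The two negotiation patterns amount to a unanimity rule with a single-participant veto, which can be sketched as below. The function name and the state labels `"invalid"`, `"agreed"` and `"pending"` are hypothetical; the deliverable does not name the states this way.

```python
from typing import Dict, Iterable


def negotiation_outcome(participants: Iterable[str], responses: Dict[str, str]) -> str:
    """Outcome of a task given the responses received so far.

    responses maps a participant to "accept" or "reject"; participants
    without an entry have not responded yet.
    """
    participants = list(participants)
    if any(responses.get(p) == "reject" for p in participants):
        return "invalid"   # one rejection invalidates the task for everyone
    if participants and all(responses.get(p) == "accept" for p in participants):
        return "agreed"    # unanimous acceptance finalises the agreement
    return "pending"


collective = ["driver", "passenger1", "passenger2"]
partial = negotiation_outcome(collective, {"driver": "accept"})
vetoed = negotiation_outcome(collective, {"driver": "accept", "passenger2": "reject"})
unanimous = negotiation_outcome(collective, {p: "accept" for p in collective})
```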
B.4
Execution Patterns
No specific execution patterns have been defined yet.
C
Orchestration API
This final appendix describes the implemented orchestration API.
C.1
Application Orchestration
For convenience we split the API into different sections.
C.1.1
Task Requests
We start with task requests. The most basic operations are listed in the following table:
verb     URI
POST     /applications/:app/taskRequests
GET      /applications/:app/taskRequests/?user=:user
GET      /applications/:app/taskRequests/:taskRequestID
HEAD     /applications/:app/taskRequests/:taskRequestID
GET      /applications/:app/taskRequests/:taskRequestID/v/:version
DELETE   /applications/:app/taskRequests/:taskRequestID
Create Task Request: POST /applications/:app/taskRequests
This is the main URI where new task requests are posted. The JSON object describing the task request is expected in the body of the request. On success a platform call to the composition manager will be made.
Access Control. Any peer or user.
Success. Returns status code 201 together with
• a JSON document of the form { data: aURI } where aURI is the URI where the client can retrieve (assuming authentication and access control policies raise no issues) the latest version of the task request that has been posted, and optionally
• an ETag for the JSON object of the response.
Failure. Returns an error code accompanied by an optional error message explaining the failure; e.g. 403 (forbidden), 500 (internal server error), etc.
Comment. Upon success a call is made to the composition manager that will create the associated tasks based on the current state of the system.
Get Task Requests of User: GET /applications/:app/taskRequests/?user=:user
No parameters are expected apart from authentication information (in the header).
Access Control. The peer user or an admin.
Success. Returns status code 200 together with
• a JSON document of the form { data: [[userTaskRequestsURIs], [associatedETags]] } that has the list of the task requests referring to the specific user together with the associated ETags for those links, and
• an ETag for the JSON object of the response.
Failure. Returns error code 403 together with the error message "Forbidden".
Get a Task Request: GET /applications/:app/taskRequests/:taskRequestID
No parameters are expected apart from authentication information (if needed).
Access Control. The owner of the task request or an admin.
Success. Returns status code 200 together with the JSON document of the latest version of the task request accompanied by the ETag of the document.
Failure. Returns an error code (403, 404) together with an optional error message.
Get the Head of a Task Request: HEAD /applications/:app/taskRequests/:taskRequestID
Similar to GET /applications/:app/taskRequests/:taskRequestID except that the body returned is empty. It just returns the ETag of the latest version of the task request, which indicates whether the document has changed and its latest version therefore needs to be retrieved. The access control policy is the same as above: the owner of the task request or an admin can perform the operation.
Get a Specific Version of a Task Request: GET /applications/:app/taskRequests/:taskRequestID/v/:version
No parameters are expected apart from authentication information (if needed).
Access Control. The owner of the task request or an admin.
Success. Returns status code 200 together with the specific version of the task request.
Failure. Returns an error code together with an optional error message.
Delete a Task Request: DELETE /applications/:app/taskRequests/:taskRequestID
This is the main URI for deleting task requests. No parameters are expected apart from authentication information (in the header). A platform job is prepared and is posted to the deletion manager.
Access Control. The owner of the task request or an admin.
Success. Returns status code 204.
Failure. Returns an error code (403, 404, or 500).
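To illustrate how the task request calls above fit together, the following Python sketch models them with an in-memory store instead of an HTTP server. It is not the platform's implementation: the class `TaskRequestStore`, the `revise` helper (a stand-in for the system producing a new version), the use of SHA-1 for ETags and the zero-based version numbering are all assumptions, and access control (403) is left out.

```python
import hashlib
import json


class TaskRequestStore:
    """In-memory stand-in for the task request resources of one application."""

    def __init__(self, app):
        self.app = app
        self._versions = {}   # task request ID -> list of document versions
        self._deleted = set()
        self._next_id = 1

    @staticmethod
    def etag(document):
        payload = json.dumps(document, sort_keys=True).encode("utf-8")
        return hashlib.sha1(payload).hexdigest()

    def post(self, document):
        request_id = str(self._next_id)
        self._next_id += 1
        self._versions[request_id] = [document]
        uri = f"/applications/{self.app}/taskRequests/{request_id}"
        return 201, {"data": uri}

    def _lookup(self, request_id):
        if request_id not in self._versions or request_id in self._deleted:
            return None
        return self._versions[request_id]

    def revise(self, request_id, document):
        """Hypothetical helper: the system stores a new version."""
        self._versions[request_id].append(document)

    def get(self, request_id, version=None):
        versions = self._lookup(request_id)
        if versions is None:
            return 404, None, None
        if version is None:
            document = versions[-1]
        elif 0 <= version < len(versions):
            document = versions[version]
        else:
            return 404, None, None
        return 200, document, self.etag(document)

    def head(self, request_id):
        status, _, tag = self.get(request_id)
        return status, tag   # same as GET, but without the body

    def delete(self, request_id):
        if self._lookup(request_id) is None:
            return 404
        self._deleted.add(request_id)
        return 204
```

A polling client would typically call `head` repeatedly and only issue a full `get` when the returned ETag differs from the one it last saw, which keeps the read-only queue lightweight.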
C.1.2
Tasks
Tasks are generated through composition (more on that below). The most basic operations related to them are listed in the following table:
verb     URI
GET      /applications/:app/tasks/:taskID
HEAD     /applications/:app/tasks/:taskID
GET      /applications/:app/tasks/:taskID/v/:version
PUT      /applications/:app/tasks/:taskID
Get a Specific Task: GET /applications/:app/tasks/:taskID
Similar to GET /applications/:app/taskRequests/:taskRequestID but referring to tasks. No parameters are expected apart from authentication information (if needed).
Access Control. The participants of the task or an admin.
Success. Returns status code 200 together with the JSON document of the latest version of the task accompanied by the ETag of the document.
Failure. Returns an error code (403, 404) together with an optional error message.
Get the Head of a Task: HEAD /applications/:app/tasks/:taskID
Similar to the above, but the body of the response is empty. Essentially this is an easy way for the clients to figure out if the resource has changed. Same access control policy as above.
Get a Specific Version of a Task: GET /applications/:app/tasks/:taskID/v/:version
No parameters are expected apart from authentication information (if needed).
Access Control. The participants of the task or an admin.
Success. Returns status code 200 together with the specific version of the task.
Failure. Returns an error code together with an optional error message.
Negotiate on a Task: PUT /applications/:app/tasks/:taskID
The main call for negotiation, which will trigger an additional platform call to the negotiation manager. Expects the new version of the document of the task taskID. A platform job for negotiation is prepared and is posted to the negotiation manager.
Access Control. The participants of the task or an admin.
Success. Returns status code 200 together with the new version of the task as dictated by the negotiation manager.
Failure. Returns an error code together with an optional error message.
C.1.3
Task Records
Task records are generated by the orchestrator once execution can start on a specific task. The most basic operations are listed in the following table:
verb     URI
GET      /applications/:app/taskRecords/:taskRecordID
HEAD     /applications/:app/taskRecords/:taskRecordID
GET      /applications/:app/taskRecords/:taskRecordID/v/:version
PUT      /applications/:app/taskRecords/:taskRecordID
Get a Specific Task Record: GET /applications/:app/taskRecords/:taskRecordID
No parameters are expected apart from authentication information (if needed).
Access Control. The participants of the task or an admin.
Success. Returns status code 200 together with the JSON document of the latest version of the task record accompanied by the ETag of the document.
Failure. Returns an error code (403, 404) together with an optional error message.
Get the Head of a Task Record: HEAD /applications/:app/taskRecords/:taskRecordID
No parameters are expected apart from authentication information (if needed). The body of the response is empty. This is another convenience function which gives clients an easy way to figure out if the resource has changed. Same access control policy and error codes as above.
Get a Specific Version of a Task Record: GET /applications/:app/taskRecords/:taskRecordID/v/:version
No parameters are expected apart from authentication information (if needed).
Access Control. The participants of the task or an admin.
Success. Returns status code 200 together with the specific version of the task record.
Failure. Returns an error code together with an optional error message.
Provide Execution Feedback: PUT /applications/:app/taskRecords/:taskRecordID
The main call for execution, which will trigger an additional platform call to the execution manager. Expects the new version of the task record document taskRecordID. A platform job for execution is prepared and is posted to the execution manager.
Access Control. The participants of the task or an admin.
Success. Returns status code 200 together with the new version of the task record as dictated by the execution manager.
Failure. Returns an error code together with an optional error message.
C.2
Composition Manager
The composition manager provides the following functionality:
verb     URI
POST     /applications/:app/compositions
GET      /applications/:app/compositions/:compositionID
Perform Composition: POST /applications/:app/compositions
Expects the platform job with the description, for which the main ingredient is the new task request that has arrived on the platform.
Access Control. The orchestrator for the application app can make such a call.
Returns. The call always succeeds and generates a resource describing the outcome of composition. Upon completion it returns status code 201 and the link to the document with the results of composition. Part of the description of the document with the results of the composition is the error code and message that is returned through the call POST /applications/:app/taskRequests to the client.
Get Composition Results: GET /applications/:app/compositions/:compositionID
No parameters are expected.
Access Control. The orchestrator for the application app or an admin can make such a call.
Success. Returns status code 200, the JSON document with the description of the results of the composition together with the associated ETag for the document.
Failure. Returns an error code (e.g. 404 not found) together with an optional error message.
Comment. Normally such a call is expected to happen only once from the application orchestrator, once the latter has received the 201 status code indicating that the requested composition has been performed.
C.3
Negotiation Manager
The negotiation manager provides the following functionality:
verb     URI
POST     /applications/:app/negotiations
GET      /applications/:app/negotiations/:negotiationID
Perform Negotiation: POST /applications/:app/negotiations
Expects the platform job with the description, for which the main ingredient is the task on which negotiation is being performed.
Access Control. The orchestrator for the application app can make such a call.
Returns. The call always succeeds and generates a resource describing the outcome of negotiation. Upon completion it returns status code 201 and the link to the document with the results of the negotiation. Part of the description of the document with the results of the negotiation is the error code and message that is returned through the call PUT /applications/:app/tasks/:taskID to the client.
Get Negotiation Results: GET /applications/:app/negotiations/:negotiationID
No parameters are expected.
Access Control. The orchestrator for the application app or an admin can make such a call.
Success. Returns status code 200, the JSON document with the description of the results of the negotiation together with the associated ETag for the document.
Failure. Returns an error code (e.g. 404 not found) together with an optional error message.
Comment. Normally such a call is expected to happen only once from the application orchestrator, once the latter has received the 201 status code indicating that the requested negotiation has been performed.
C.4
Deletion Manager
The deletion manager provides the following functionality:
verb     URI
POST     /applications/:app/deletions
GET      /applications/:app/deletions/:deletionID
Perform Deletion: POST /applications/:app/deletions
Expects the platform job with the description, for which the main ingredient is the task request on which deletion is being performed.
Access Control. The orchestrator for the application app can make such a call.
Returns. The call always succeeds and generates a resource describing the outcome of deletion. Upon completion it returns status code 201 and the link to the document with the results of the deletion. Part of the description of the document with the results of the deletion is the error code that is returned through the call DELETE /applications/:app/taskRequests/:taskRequestID to the client.
Get Deletion Results: GET /applications/:app/deletions/:deletionID
No parameters are expected.
Access Control. The orchestrator for the application app or an admin can make such a call.
Success. Returns status code 200, the JSON document with the description of the results of the deletion together with the associated ETag for the document.
Failure. Returns an error code (e.g. 404 not found) together with an optional error message.
Comment. Normally such a call is expected to happen only once from the application orchestrator, once the latter has received the 201 status code indicating that the requested deletion has been performed.
C.5
Execution Manager
The execution manager provides the following functionality:
verb     URI
POST     /applications/:app/executions
GET      /applications/:app/executions/:executionID
Perform Execution: POST /applications/:app/executions/:executionID Expects the platform job with the description, for which the main parameter is the task on which a particular step of the execution is being performed. c SmartSociety Consortium 2013-2017
45 of 47
c SmartSociety Consortium 2013-2017
Access Control.
Deliverable D6.2
The orchestrator for the application app can make such a call.
Returns. The call always succeeds and generates a resource describing the outcome of the execution. Upon completion it returns status code 201 and the link to the document with the results of the execution. Part of that document is the error code and message that is returned to the client through the call PUT /applications/:app/taskRecord/:taskRecordID.

Get Execution Results: GET /applications/:app/executions/:executionID
No parameters are expected.
Access Control. The orchestrator for the application app or an admin can make such a call.
Success. Returns status code 200, the JSON document describing the results of the execution, and the associated ETag for the document.
Failure. Returns an error code (e.g. 404 Not Found) together with an optional error message.
Comment. Normally such a call is expected to happen only once, made by the application orchestrator after it has received the 201 status code indicating that the requested execution has been performed.
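The two calls above follow a "create a result resource, then read it" pattern: the POST always answers 201 with a link, and the actual outcome is recorded inside the linked document. The toy in-memory model below sketches that behaviour. It is not the project's implementation; the class name, the fields "code" and "message", the "task" key of the job, and the use of a SHA-256 digest as ETag are all assumptions made for illustration.

```python
import hashlib
import json
from itertools import count


class ExecutionManager:
    """Toy in-memory stand-in for the execution manager (illustrative only)."""

    def __init__(self, app):
        self.app = app
        self._results = {}      # executionID -> result document
        self._ids = count(1)    # simple sequential identifiers

    def perform(self, platform_job):
        """Model of the POST call: always creates a result resource.

        Returns (201, link); success or failure of the underlying step is
        stored in the result document rather than in the POST status.
        """
        execution_id = str(next(self._ids))
        if "task" in platform_job:
            outcome = {"code": 200, "message": f"executed {platform_job['task']}"}
        else:
            outcome = {"code": 400, "message": "platform job has no task"}
        self._results[execution_id] = outcome
        return 201, f"/applications/{self.app}/executions/{execution_id}"

    def get_results(self, execution_id):
        """Model of the GET call: (200, document, etag) or (404, message, None)."""
        doc = self._results.get(execution_id)
        if doc is None:
            return 404, {"message": "not found"}, None
        body = json.dumps(doc, sort_keys=True)
        etag = hashlib.sha256(body.encode("utf-8")).hexdigest()
        return 200, doc, etag


manager = ExecutionManager("smartshare")
status, link = manager.perform({"task": "/applications/smartshare/tasks/7"})
result_status, result_doc, result_etag = manager.get_results(link.rsplit("/", 1)[1])
```

Keeping the outcome in a separate resource means every execution attempt, including a failed one, leaves a linkable record, which fits the provenance goals described in the text.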
C.6 Further Remarks and Functionality
The above are the basic calls for orchestration (ignoring execution). They are designed so that resources are generated and linked with each other as needed. This paradigm allows a full provenance trace for the orchestration part of a SmartSociety application. Further, access control policies allow admins to follow links and inspect the system, for example to obtain explanations, which can also serve provenance purposes.

In addition, and this is tightly related to privacy and security concerns, clients may want to retrieve sets of task requests or sets of tasks. One approach to accommodate such functionality would be calls of the form:

POST    /applications/:app/taskRequests/?action=getSet
POST    /applications/:app/tasks/?action=getSet
where the client can post a document listing the URIs of tasks or task requests and request the content of all of them at once, instead of retrieving the individual tasks or task requests one by one. However, such an approach indicates bad design from a REST perspective. As we refine and finalise the various details, we will therefore allow GET operations on specific resources from which the server can infer the requested functionality. For example, one functionality that can be provided to clients is the retrieval of all task requests that have not yet been completed.
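As a rough illustration of the last example, the selection that a dedicated GET resource could perform on the server side might look as follows. The record layout, the "status" field and its values are hypothetical, since the text does not specify how completion is represented, and the name of the resource that would expose this collection is deliberately left open.

```python
def incomplete_task_requests(task_requests):
    """Return the URIs of task requests that are not marked as completed.

    Assumes each record is a dict with a "uri" key and an optional
    "status" key (both hypothetical).
    """
    return [tr["uri"] for tr in task_requests if tr.get("status") != "completed"]


# Hypothetical stored task requests for one application.
store = [
    {"uri": "/applications/smartshare/taskRequests/1", "status": "completed"},
    {"uri": "/applications/smartshare/taskRequests/2", "status": "open"},
    {"uri": "/applications/smartshare/taskRequests/3"},
]

pending = incomplete_task_requests(store)
```

A client would then read one collection resource instead of posting a list of URIs, which keeps retrieval on safe, cacheable GET requests.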
C.6.1 Administration and Monitoring
The orchestration paradigm that we have is asynchronous, non-blocking, and allows full profiling of the different steps of the execution of the jobs of the platform. These job profiles are stored and administrators of the application/platform can have access to them. Further, the various administrators may require access to resources for inspection purposes. Such monitoring capabilities are offered under URIs of the form shown below.

GET     /applications/:app/monitoring/?action=getSet
POST    /applications/:app/monitoring
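To make the idea of a stored job profile concrete, the sketch below records the duration of each step of a job and renders the result as a document that a monitoring resource could serve. The class name, the document layout and the step names "compose" and "negotiate" are assumptions for illustration, not the platform's actual profile format.

```python
import time
from contextlib import contextmanager


class JobProfile:
    """Collects per-step timings for one platform job (illustrative only)."""

    def __init__(self, job_id):
        self.job_id = job_id
        self.steps = []  # list of (step name, duration in seconds)

    @contextmanager
    def step(self, name):
        """Time the enclosed block and record it, even if it raises."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.steps.append((name, time.perf_counter() - start))

    def as_document(self):
        """Return a JSON-serialisable description of the profile."""
        return {
            "job": self.job_id,
            "steps": [{"name": n, "seconds": d} for n, d in self.steps],
        }


profile = JobProfile("job-1")
with profile.step("compose"):
    sum(range(1000))          # placeholder for real work
with profile.step("negotiate"):
    pass
profile_doc = profile.as_document()
```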
We already provide such functionality and such function calls (see footnote 10).

C.7 The Current Setup in the SmartShare prototype
Figure 20 gives a schematic representation of the services and the communication that is allowed at the moment in our prototypical SmartShare implementation.
[Diagram with the following components: peer manager, reputation service, GUI service, client, PROV store, orchestration service.]
Figure 20: A schematic representation of the services that are used in SmartShare. The dashed arrows indicate communication for provenance reasons. The link between the client and the PROV store is also used for retrieving explanations from the PROV store.
10 For example, see http://168.144.202.152:3001/monitoring and references therein for versioned resources and, more importantly, the analytics of the platform at http://168.144.202.152:3001/monitoring/analytics.