Energy-Efficient Real-Time Multi-Core Assignment Scheme for Asymmetric Multi-Core Mobile Devices


Received June 7, 2020, accepted June 22, 2020, date of publication June 26, 2020, date of current version July 6, 2020. Digital Object Identifier 10.1109/ACCESS.2020.3005235

DONGHOON KIM 1, YOUNG-BAE KO 2, AND SUNG-HWA LIM 3, (Member, IEEE)

1 TmaxSoft, Seongnam 13595, South Korea
2 Software and Computer Engineering Department, Ajou University, Suwon 16499, South Korea
3 Department of Multimedia, Namseoul University, Cheonan 31020, South Korea

Corresponding author: Sung-Hwa Lim (sunghwa@nsu.ac.kr) This research was supported in part by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT and Future Planning (NRF-2017R1E1A1A03070926), and in part by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2020-2018-0-01431) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).

ABSTRACT The big.LITTLE architecture has been extensively integrated into smart mobile devices for better performance and higher energy efficiency. However, the energy savings expected from the big.LITTLE architecture are not fully achieved because the LITTLE cores are underutilized while running real-time user applications. In this study, an energy-efficient big.LITTLE core assignment algorithm is proposed to reduce the energy consumption of the mobile device by utilizing the LITTLE cores as much as possible while guaranteeing the real-time performance of the mobile application. By applying the proposed multi-core assignment technique on a real test-bed built on an off-the-shelf smartphone, we show that the proposed technique improves the energy saving effect while guaranteeing real-time performance. The energy efficiency of the proposed scheme is compared with that of the legacy scheduler in various environments. In addition, we propose a machine learning-based method to predict the expected processing time of a task more accurately before the task is assigned to one of the cores. The presented prediction method is expected to reduce the chances of missing a deadline when employed with the proposed multi-core assignment scheme.

INDEX TERMS Energy conservation, asymmetric multi-cores, mobile devices, scheduling, real-time systems.

I. INTRODUCTION

In recent times, there has been an increasing demand for processing high workloads in real time on smart mobile devices, such as smartphones or smart-pads [1]. A great number of computationally intensive real-time applications employing video encoding/decoding, machine learning, augmented reality and interactive gaming are widely used on these smart mobile devices. These real-time systems typically assign a deadline to each task and must ensure that every task is completed within its deadline. Satisfying the time constraint, i.e., the deadline, in a real-time system may require higher performance. To cope with the required increase in performance, the processing power has to be enhanced (e.g., by employing a central processing unit with a higher clock speed). Furthermore, to provide real-time multi-tasking capability, multi-core architectures should be employed. Running a multi-core architecture with high-speed CPUs requires high power, which drains the battery, one of the most crucial resources for mobile devices. However, the gap between battery performance and power consumption by the hardware modules is increasing annually [2]. For instance, in the past five years, the battery capacity of off-the-shelf smartphones has increased by only about 25%, whereas the energy consumption has multiplied.¹

¹ The Samsung Galaxy S5, released in February 2014, has a 2800mAh battery, and the Huawei Ascend G7, released in September 2014, has a 3000mAh battery. Five years later, the Samsung Galaxy S10, released in February 2019, has a 3400mAh battery, and the Huawei P30, released in March 2019, has a 3650mAh battery.

The associate editor coordinating the review of this manuscript and approving it for publication was Ilsun You.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

VOLUME 8, 2020


D. Kim et al.: Energy-Efficient Real-Time Multi-Core Assignment Scheme for Asymmetric Multi-Core Mobile Devices

To address this problem, the asymmetric multi-core architecture, which employs multiple processors with different processing powers and energy efficiencies, was introduced into the mobile device environment. One of the most popular asymmetric multi-core architectures is the big.LITTLE architecture developed by ARM [3]. For example, the Samsung Galaxy S7 edge smartphone employs a quad-core Cortex-A53 (i.e., LITTLE cores) and a quad-core Exynos M1 (i.e., big cores), as shown in Figure 1. The ARM big.LITTLE architecture combines the big core cluster, which has higher processing power and energy consumption, with the LITTLE core cluster, which has lower energy consumption and relatively lower performance. The energy efficiency can be increased by assigning to the LITTLE cores those application tasks or threads that do not require high processing power or that have sufficient time left before their deadline.

FIGURE 1. Example of the heterogeneous multi-core architecture for smartphones.

Nevertheless, the desired energy savings of the big.LITTLE architecture may not be achieved unless the LITTLE cores are fully utilized. It has been shown that not only urgent and high-workload tasks but also non-urgent and light-workload ones tend to be assigned to the big cores instead of the LITTLE cores [4]. This is because the criteria for assigning an application's tasks to one of the cores are based on the task's priority and workload. Unfortunately, most user application tasks, such as graphical user interface (GUI) tasks, typically have the highest priorities in order to ensure user satisfaction. Moreover, because a real-time task with a deadline is considered urgent, it generally tends to be run on a big core. Thus, real-time tasks usually consume more energy than non-real-time ones. However, a real-time task does not need to be completed immediately as long as it is completed within its stipulated deadline. Therefore, it is more energy efficient to assign a real-time task to a LITTLE core rather than a big core, provided the task can be completed on the LITTLE core within its deadline. In order to maximize the energy saving effect of the big.LITTLE architecture, a novel energy-efficient multi-core assignment technique supporting real-time tasks is required to increase the utilization of the LITTLE cores.

In this study, the problem of high-power-consuming multi-core assignments on asymmetric multi-core-based smart mobile devices running real-time computationally intensive applications is addressed. By exploiting the characteristics of the real-time tasks, we propose an energy-efficient big.LITTLE core assignment algorithm that estimates the deadline compliance status of the real-time tasks, and assigns to a LITTLE core first those tasks that are guaranteed to be completed on a LITTLE core within the deadline. To evaluate the performance of the proposed scheme, various experiments are conducted by implementing the proposed scheme as a test program on a real test-bed. Compared to a legacy multi-core assignment scheme, the proposed scheme is shown to improve the energy saving effect while guaranteeing real-time performance (i.e., meeting the deadline). Furthermore, as the proposed technique is implemented at the application layer, it can be easily applied to existing mobile devices without kernel-level modifications. A preliminary version of this study was presented in [5]. The main differences and contributions of this study are as follows:

• An energy-efficient multi-core assignment technique supporting real-time applications is proposed to reduce the energy consumption of smart mobile devices by maximizing the utilization of the LITTLE cores while guaranteeing that all the deadlines are satisfied.
• By applying the proposed multi-core assignment technique on a real test-bed using a test program, it is shown that the proposed technique improves the energy saving effect while guaranteeing the real-time performance.
• The performance of the proposed scheme is compared with that of the legacy scheduler, which is extensively employed by Android smartphones, in various environments. To reflect the real environment, the experiments are conducted in different scenarios while varying important system parameters, such as the number of tasks in an application, the deadline laxity of a task, and the parallelism degree of the task graph.
• We present an intelligent method to predict the expected processing time of a task before it is assigned to one of the cores by employing a support vector machine (SVM), which is a machine learning technique. The presented prediction method is expected to reduce the chances of missing a deadline when employed with the proposed multi-core assignment scheme.

This study is organized as follows. Related studies on reducing energy consumption in asymmetric multi-core architectures are discussed in Section 2. The system and computation models used in this paper are described in Section 3. The proposed algorithm and a simple example of its benefits are presented in Section 4. The performance evaluation and the experimental results of the proposed scheme are shown in Section 5. Further discussion on employing a machine learning technique to enhance the prediction accuracy of the expected processing time of a task before it is assigned to one of the cores is presented in Section 6. Finally, the conclusion of the study is presented in Section 7.

II. RELATED WORK

The introduction of the ARM big.LITTLE structure by ARM Holdings has fueled numerous research studies. The white paper [3] introduces the performance and the energy saving effect of the big.LITTLE architecture by comparing it with the legacy symmetric multi-core architecture. The studies in [4] and [6] provide a detailed comparison between the ARM big.LITTLE structure and existing high-performance CPUs, and report a better performance/energy trade-off for big.LITTLE. In [7] and [8], an application-assisted core assignment technique for the ARM big.LITTLE structure is proposed to save energy, especially when running a web browser. The study in [9] proposes a technique for allocating CPU resources according to where the user's attention is concentrated. For this purpose, one application is set to run in the foreground and the others are set to run in the background. Each time the user taps the touchscreen, the priority of the corresponding application increases, and the highest-priority application is run in the foreground. The study in [10] presents an algorithm based on a temperature prediction methodology. The algorithm uses the predicted temperature to dynamically calculate the power budget and to control the type, number and frequency of running CPUs. The study in [11] proposes an energy-efficient heuristic scheme for the big.LITTLE core architecture that schedules multiple applications using offloading.

III. SYSTEM MODEL

A. APPLICATION MODEL

Generally, a smartphone application does not execute only a single task [12]. The execution of an application consists of several tasks with processing interdependencies. Figure 2 shows a directed acyclic task graph $G = (V, E)$ that describes the relationship between the tasks of an application. Each node $V_i \in V$ in $G$ represents a task, and a directed edge $e(V_i, V_j)$ indicates the precedence constraint between tasks $V_i$ and $V_j$, such that task $V_j$ cannot start its execution until its precedent task $V_i$ is completed. For example, there are 8 tasks in total in Figure 2, and tasks V2 and V3 cannot begin until task V1 is completed. These tasks must be run on one of the heterogeneous cores (big or LITTLE). The time required to complete a task differs between big and LITTLE cores. A completion time table maps each task to its execution time on each core type. Table 1 displays the completion time table for the example scenario. For example, task V1 needs 9 seconds on the LITTLE core, whereas it takes 5 seconds on the big core.

FIGURE 2. Example of the task graph representing the relationship between tasks.

TABLE 1. Completion time table for the example scenario.

Furthermore, a real-time system typically has a completion deadline for each task. Hence, a deadline table is used to map the tasks to their deadlines. Table 2 displays the deadline table for the example scenario. For instance, the deadline of task V1 in Table 2 is 10 seconds, which indicates that task V1 should complete its execution within 10 seconds after the application is started. Tasks V2 and V3 also have their deadlines set at 10 seconds. However, this does not mean that tasks V2 and V3 should be completed within 10 seconds after the application is started; rather, their deadlines are counted from the completion of task V1. In other words, the deadline of a task means that the task must complete within the stipulated time once all of its precedent tasks have completed.

TABLE 2. Deadline table for the example scenario.
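As a minimal sketch of the application model, the task graph, the completion time table and the deadline table can be held in plain Python dictionaries. Only the values for V1 (9 s on LITTLE, 5 s on big, deadline 10 s), the 10 s deadlines of V2 and V3, and the edges from V1 to V2 and V3 come from the text; the completion times of V2 and V3 and all names (`EDGES`, `CT_TABLE`, `D_TABLE`, `predecessors`, `meets_deadline`) are assumptions made for illustration.

```python
# Directed edges (Vi, Vj): Vj cannot start until Vi has completed.
# Only the edges V1 -> V2 and V1 -> V3 are stated in the text.
EDGES = [("V1", "V2"), ("V1", "V3")]

# Completion time table in seconds per core type.
# V1 is taken from the text; V2 and V3 are hypothetical placeholders.
CT_TABLE = {
    "V1": {"LITTLE": 9, "big": 5},
    "V2": {"LITTLE": 8, "big": 4},    # assumed
    "V3": {"LITTLE": 12, "big": 6},   # assumed
}

# Deadline table in seconds, counted from the moment all
# precedent tasks of a task have completed.
D_TABLE = {"V1": 10, "V2": 10, "V3": 10}


def predecessors(task, edges=EDGES):
    """Return the set of tasks that must finish before `task` can start."""
    return {src for src, dst in edges if dst == task}


def meets_deadline(task, core_type):
    """True if `task`, started as soon as it is ready, finishes in time."""
    return CT_TABLE[task][core_type] < D_TABLE[task]


print(predecessors("V2"))               # {'V1'}
print(meets_deadline("V1", "LITTLE"))   # True: 9 s < 10 s
print(meets_deadline("V3", "LITTLE"))   # False with the assumed 12 s
```

The strict comparison mirrors the deadline test used in Algorithm 1; a non-strict test would be an equally plausible reading of the prose.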

As there are interdependencies amongst the tasks within an application, a task must be in the ready state before it can be assigned to a core. A task is said to be in the ready state only when all its precedent tasks are completed. Thus, using the task graph and the deadline table, a ready state table can be created. Table 3 presents the ready state table for the example scenario. The ready state value $RS_j$ of the task $V_j$ can be obtained as follows:

$$RS_j = -n(\mathrm{pred}_{V_{i,j}}), \tag{1}$$

$$\mathrm{pred}_{V_{i,j}} = \{\, i \mid i \text{ is one of the precedent tasks of task } V_j \,\}, \tag{2}$$

where $\mathrm{pred}_{V_{i,j}}$ is the set of the precedent tasks that must be completed before the task $V_j$ begins executing, $V_i$ is an arbitrary task from this set, and $n(S)$ denotes the number of elements in the set $S$. Thus, $RS_j$ is the negative of the number of elements in $\mathrm{pred}_{V_{i,j}}$.

TABLE 3. Ready state table for the example scenario.

B. COMPUTATION MODEL

The task $V_j$ is able to begin executing only when the value of $RS_j$ is zero, which means that all its precedent tasks have been completed. Let $RT_j$ be the ready time of task $V_j$. The value of $RT_j$ is the finish time of the most recently completed task in $\mathrm{pred}_{V_{i,j}}$:

$$RT_j = \max_{V_i \in \mathrm{pred}_{V_{i,j}}} (FT_i) \tag{3}$$

Here, $FT_i$ is the finish time of task $V_i$, which is one of the immediately preceding tasks of task $V_j$. Amongst all the tasks in $\mathrm{pred}_{V_{i,j}}$, the highest finish time is chosen as the value of $RT_j$. Utilizing the ready time obtained from Equation (3), the shortest ready time $RT_{\min}$ amongst all the ready tasks (i.e., those whose ready state value equals zero) is obtained as follows:

$$RT_{\min} = \min_{1 \le i \le I} RT_i \tag{4}$$

where $1 \le i \le I$ indexes all the ready tasks, and the smallest value of $RT_i$ is the value of $RT_{\min}$.

IV. MULTI-CORE ASSIGNMENT ALGORITHM

The fundamental principle is that if a task can finish its execution on a LITTLE core before its deadline, then that task is assigned to the LITTLE core. The important input data items are: 1) a task graph for the target application and 2) the waiting queue for all the tasks in the task graph. If there is only one application, the task at the top of the task graph, which has a waiting time of 0, is scheduled first, followed by the task with the shortest waiting time. If there are multiple applications, the topmost tasks of the task graph of each application are scheduled first, and then the task with the least ready time is scheduled, regardless of the application. Here, the queue is simply the order in which the tasks come into the scheduler, and it does not affect the scheduling except when waiting times are equal. The scheduling algorithm works as follows:

1) Initialization: The readiness of each task is checked in order of its arrival. If the task is not ready, it is enqueued into the waitList. If the task is ready, i.e., its ready state value is zero, the schedule procedure is initiated.
2) ScheduleTask: Once the task scheduling process begins, the scheduler looks for a core that can satisfy the deadline Di of the task Vi with reference to the D-table. The search proceeds from the LITTLE cores to the big cores. As soon as the scheduler finds a suitable core, it stops searching and immediately assigns the task to that core.
3) ScheduleWaitList: The task with the shortest ready time is identified amongst all the tasks in the waitList, and step 2) is repeated until there are no more tasks in the waitList.
4) TaskDone: For real-time scheduling, the TaskDone callback function is invoked when a task scheduled on a core is completed. If the task was completed within its deadline, the ready time and ready state of its succeeding tasks are updated. If the finish time of the task exceeds its deadline, a deadline miss is recorded.

The proposed multi-core allocation algorithm described in Algorithm 1 is compared with the legacy scheme employed in several off-the-shelf smartphones. The scheduling sequence is performed using the task graph shown in Figure 2. In addition, the completion time table (Table 1), the deadline table (Table 2) and the ready state table (Table 3) are built. The priority queue is as follows: (V1, V2, V3, V4, V5, V6, V7). It is assumed that one task is assigned for execution to one core and should be completed within the stipulated deadline. In this comparison example, two big cores and two LITTLE cores are utilized. Figure 3 shows the scheduling sequence of a conventional smartphone.
Initially, the task V1 is assigned to one of the cores while the other tasks wait for execution, because none of the tasks except V1 is ready to be assigned to a core, i.e., their ready states are not zero. In the legacy scheme, the operating system schedules tasks to maximize performance rather than energy saving [7]. Therefore, big cores are usually preferred to LITTLE cores. Moreover, in the legacy scheme, big/LITTLE core selections are made depending on a task's priority and workload. More important and more heavily loaded tasks tend to be assigned to big cores rather than LITTLE cores. The GUI-related tasks in Android applications usually have the highest priority in Linux systems. Hence, the tasks of Android user applications may be assigned to big cores rather than LITTLE cores. LITTLE cores are assigned to tasks only when all of the big cores are occupied. In the example of Figure 3, the task V6 is assigned to a LITTLE core because all of the big cores are already occupied by other tasks (i.e., V4 and V5). The legacy scheme cannot take full advantage of the ARM big.LITTLE architecture due to the low utilization of the LITTLE cores.

FIGURE 3. Example of the legacy scheduling. It took 28 seconds for all the tasks in the application to complete.

Figure 4 shows how the proposed scheme operates on the same task set shown in Figure 2. The proposed scheduling, unlike the traditional scheduling, assigns as many tasks to the LITTLE cores as possible. From task V1 to task V8, all tasks except V6 are assigned to LITTLE cores, as they can be completed within their deadlines. The task V6 cannot be completed on a LITTLE core within its deadline; hence, it is assigned to a big core.

FIGURE 4. Example of the proposed scheduling. It took 34 seconds for all the tasks in the application to complete.

TABLE 4. Example of remaining deadline table for scheduling comparison.

Table 4 presents the remaining deadlines of each of the tasks in the example scenario using both the legacy scheduler and the proposed scheduler. Compared with the legacy scheduler, the proposed scheduling scheme leaves a smaller remaining deadline for every task except V6. Consequently, the proposed scheduling scheme can provide a significant energy benefit while still meeting the same deadlines as the legacy scheme.
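The LITTLE-first policy of Algorithm 1 can be illustrated with a small event-driven simulation. This is a simplified sketch, not the authors' implementation: the four-task graph and all completion times except those of V1 are hypothetical, the names (`schedule`, `CORES`, and so on) are assumptions, and two behaviours that Algorithm 1 leaves open are filled in by assumption (a task that fits no idle core waits for a release, or falls back to the last idle core when nothing is running; successors are released even after a deadline miss).

```python
import heapq


def schedule(edges, ct_table, d_table, cores):
    """Simulate LITTLE-first assignment; `cores` lists LITTLE cores before big ones.

    Returns ({task: (core_index, start, finish)}, [tasks that missed a deadline]).
    """
    tasks = list(ct_table)
    preds = {t: {s for s, d in edges if d == t} for t in tasks}
    succs = {t: [d for s, d in edges if s == t] for t in tasks}
    rs = {t: -len(preds[t]) for t in tasks}   # ready state, Eq. (1)
    rt = {t: 0 for t in tasks}                # ready time, Eq. (3)
    ready = [t for t in tasks if rs[t] == 0]
    free = [True] * len(cores)
    running = []                              # heap of (finish, task, core)
    placed, misses, now = {}, [], 0

    while ready or running:
        # Serve ready tasks in order of smallest ready time, Eq. (4).
        for task in sorted(ready, key=lambda t: rt[t]):
            limit = rt[task] + d_table[task]  # absolute deadline
            idle = [k for k in range(len(cores)) if free[k]]
            fits = [k for k in idle
                    if now + ct_table[task][cores[k]] < limit]
            if fits:
                k = fits[0]                   # first fit: LITTLE before big
            elif idle and not running:
                k = idle[-1]                  # assumed fallback, will miss
            else:
                continue                      # assumed: wait for a release
            finish = now + ct_table[task][cores[k]]
            free[k] = False
            placed[task] = (k, now, finish)
            heapq.heappush(running, (finish, task, k))
            ready.remove(task)
        if not running:
            break
        # TaskDone: advance time to the next completion.
        now, done, k = heapq.heappop(running)
        free[k] = True
        if now >= rt[done] + d_table[done]:
            misses.append(done)
        for nxt in succs[done]:
            rs[nxt] += 1
            rt[nxt] = max(rt[nxt], now)
            if rs[nxt] == 0:
                ready.append(nxt)
    return placed, misses


# Hypothetical diamond-shaped graph; only V1's values come from the text.
EDGES = [("V1", "V2"), ("V1", "V3"), ("V2", "V4"), ("V3", "V4")]
CT = {"V1": {"LITTLE": 9, "big": 5}, "V2": {"LITTLE": 8, "big": 4},
      "V3": {"LITTLE": 12, "big": 6}, "V4": {"LITTLE": 7, "big": 3}}
D = {"V1": 10, "V2": 10, "V3": 10, "V4": 10}
CORES = ["LITTLE", "LITTLE", "big", "big"]

placed, misses = schedule(EDGES, CT, D, CORES)
for task, (core, start, finish) in placed.items():
    print(task, CORES[core], start, finish)
```

With these assumed values, V1, V2 and V4 run on a LITTLE core, while V3 (12 s on LITTLE against a 10 s deadline) is the only task sent to a big core, and no deadline is missed.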

Algorithm 1: Multi-Core Assignment Algorithm
input : a task graph G, a queue PQ of all tasks, a D-table, a CT-table, a RS-table
output: Schedule sequence S

Function Initialization:
    Waiting list waitList
    Schedule sequence S
    while PQ != 0 do
        Task Vi ← PQ
        RSi ← Eq.(1)
        if RSi == 0 then
            ScheduleTask(Vi)
        else
            Add Vi into waitList
        end
        Delete Vi from PQ
    end
    ScheduleWaitList(waitList)

Function ScheduleWaitList(waitList):
    while waitList != 0 do
        Vi ← FindMinReadyTime(waitList)
        ScheduleTask(Vi)
        Delete Vi from waitList
    end

Function FindMinReadyTime(waitList):
    Compute RTmin ← Eq.(4)
    return Vi

Function ScheduleTask(Vi):
    for k = 1 to K do
        if k-th core is available then
            Ti,k ← CT-table
            RTi ← RT-table
            Di ← D-table
            Now ← current time
            if (Now + Ti,k) < (RTi + Di) then
                Assign Task Vi onto k-th core
                Update S
                Break
            end
        end
    end

Function TaskDone(Vi):
    FTi ← current time
    if FTi < (RTi + Di) then
        for each Task Vj that follows Vi do
            RSj++
            RTj ← Eq.(3)
        end
    else
        Deadline miss
    end

V. PERFORMANCE EVALUATION

The performance of the proposed scheme is evaluated by implementing a test program on an off-the-shelf smartphone. A randomly generated task graph and a static task graph are used for the evaluation, and the scheduling is carried out according to the proposed Algorithm 1. The proposed scheduling scheme is compared with Android's default CPU scheduler.² The configuration of the experiments is presented first, followed by the evaluation results.

² Completely Fair Scheduler (CFS) [13] is widely used in modern operating systems, including Android, so we utilize it as the default scheduler.

A. EXPERIMENTAL SETUP

The experiments are conducted with the proposed scheduler on the Samsung Galaxy S7 edge [14], an off-the-shelf smartphone equipped with four big cores and four LITTLE cores. The clock speed of the LITTLE cores is set at 1.6 GHz and that of the big cores at 2.3 GHz, which is the default setting. The Monsoon HV power monitor [15] is used to measure the energy consumption of the smartphone during the experiment, as shown in Figure 5. For more accurate measurements, the smartphone's battery is removed and the phone is powered directly by the Monsoon power monitor. The smartphone is kept in airplane mode for the duration of the experiment to prevent other functions from working and affecting the measurements. The performance of the proposed scheme is evaluated while the number of tasks within an application varies. Each task has the same size and runs the Linpack benchmark [16]. Linpack is one of the most extensively used benchmarks for measuring CPU performance; it solves a dense system of linear equations. To compare the proposed scheme with the legacy scheme, experiments are conducted in various environments. The main system parameters, which may greatly affect the energy consumption in the experiments, are as follows:

FIGURE 5. Experimental equipment.

• The number of tasks: An application consists of a set of tasks. As the total number of tasks increases, the workload of the application increases, and consequently the total consumed energy also increases.
• Deadline of tasks: The deadline of a task is the time by which the task must be completed. In the proposed scheme, if the task's deadline is short, the scheduler will try to run the task faster, which consumes more energy.
• Parallelism degree: This is the number of tasks running simultaneously in the application. As the parallelism degree increases, the utilization of the cores increases.
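The energy values reported in the following experiments are derived from the power monitor's readings. As a hedged sketch, assuming the monitor's log can be exported as time-stamped power samples (the function name `energy_joules` and the sample format are assumptions, not part of the Monsoon tooling), the consumed energy can be obtained by trapezoidal integration of power over time:

```python
def energy_joules(timestamps_s, power_w):
    """Integrate power samples (watts) over time (seconds), trapezoidal rule."""
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have the same length")
    energy = 0.0
    for k in range(1, len(timestamps_s)):
        dt = timestamps_s[k] - timestamps_s[k - 1]
        energy += 0.5 * (power_w[k] + power_w[k - 1]) * dt
    return energy


# Hypothetical samples: a constant 2 W drawn for 3 s.
print(energy_joules([0.0, 1.0, 2.0, 3.0], [2.0, 2.0, 2.0, 2.0]))  # 6.0
```

Averaging such per-run totals over repeated runs would then give one data point per task graph.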

First, to show the effects of the aforementioned three system parameters intuitively, an experiment is conducted with simple task graphs, in which the tasks of an application are executed simultaneously up to the parallelism degree. Subsequently, to reflect the actual environment, a second experiment is conducted with random task graphs. The random task graph of the application program is generated using Task Graphs For Free (TGFF) [17]. In this case, the maximum parallelism degree is used instead of the parallelism degree. Each experimental result is averaged over 10 repetitions for each task graph. In the result graphs, the legacy scheduling technique is labeled "Legacy" for convenience and the proposed algorithm is labeled "Proposed".

B. EXPERIMENT RESULTS

1) EXPERIMENT WITH FIXED GRAPHS OF SIMPLE TASK SETS

To demonstrate the effects of the three main system parameters, i.e., the parallelism degree, the deadline laxity and the number of tasks, experiments with simple static task graphs are conducted, in which the tasks of an application are executed simultaneously up to the parallelism degree. For example, if an application has six tasks and a parallelism degree of two, then a task graph of width two and depth three is generated.

Figure 6 shows the change in the energy consumption when varying the number of tasks where the deadline is tight (i.e., deadline=3), normal (i.e., deadline=6), and loose (i.e., deadline=9). For the same parallelism degree (i.e., parallelism degree = 4), when the deadline laxity increases from 3 (Figure 6 (a)) to 9 (Figure 6 (c)), more tasks in the application can be assigned to LITTLE cores. In the three figures, the energy consumption follows a similar trend with respect to the number of tasks: as the number of tasks increases, the amount of work in the application increases, which increases the energy consumption. However, as can be seen from the graphs, Proposed significantly reduces the energy consumption compared to Legacy due to the high utilization of the LITTLE cores. For example, as shown in Figure 6 (c), the energy consumption is reduced by 45.33% compared to Legacy when the number of tasks in the application is 40.

FIGURE 6. Energy consumption by varying the number of tasks within an application with fixed graphs.

To examine the energy consumption according to the deadline laxity in more detail, the following experiment is conducted. Figure 7 shows the change in the energy consumption when varying the deadline laxity where the number of tasks is 32 and the parallelism degree is four. The experiment is conducted for the scenarios where the deadline of each task is tight (i.e., deadline=3), normal (i.e., deadline=6), and loose (i.e., deadline=9). As the deadline relaxes, Proposed gradually reduces the energy consumption, whereas Legacy shows little difference in energy consumption as the deadline changes. This is because Legacy assigns most of the tasks to big cores regardless of the deadline laxity. In contrast, Proposed assigns a task to a big core only if the prediction states that the task cannot be completed within its deadline on a LITTLE core. Otherwise, the task is assigned to a LITTLE core, provided there is enough time to satisfy its deadline. For this reason, as the deadline is relaxed, Proposed reduces the energy consumption even further. When the deadline is sufficiently relaxed, i.e., deadline = 9 s, Proposed reduces the energy consumption by 42.99% compared to Legacy.

FIGURE 7. Energy consumption by varying task deadline with fixed graphs.

If the deadlines of all tasks are tight and the workloads are high, the number of deadline misses may greatly increase. To show the real-time performance (i.e., the guaranteeing of deadlines) of Proposed, we measure the occurrences of deadline misses under intensive real-time environments (i.e., various workloads with a tight deadline laxity). Figure 8 shows 1) the average number of deadline misses and 2) the consumed energy when varying the size of the workload (i.e., 24, 32, and 40 tasks in an application) where the deadline laxity is very tight (i.e., the deadline is 2). In both Proposed and Legacy, the number of deadline misses rises as the workload (i.e., the number of tasks) increases. This is due to the physical limitation of the 8 cores (big + LITTLE) in handling the tasks within their deadlines. Proposed incurs slightly fewer deadline misses than Legacy while consuming up to 32% less energy.
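The fixed task graphs used in this subsection (for example, six tasks with a parallelism degree of two giving a graph of width two and depth three) could be generated along the following lines. The text does not state how consecutive layers are connected, so this sketch assumes that every task waits for all tasks of the previous layer; the function name `fixed_task_graph` is likewise hypothetical.

```python
def fixed_task_graph(num_tasks, parallelism):
    """Arrange tasks in layers of width `parallelism`; return (layers, edges)."""
    if num_tasks <= 0 or parallelism <= 0:
        raise ValueError("num_tasks and parallelism must be positive")
    names = [f"V{i}" for i in range(1, num_tasks + 1)]
    layers = [names[i:i + parallelism]
              for i in range(0, num_tasks, parallelism)]
    # Assumption: each task depends on every task of the previous layer.
    edges = [(src, dst)
             for upper, lower in zip(layers, layers[1:])
             for src in upper
             for dst in lower]
    return layers, edges


layers, edges = fixed_task_graph(6, 2)
print(len(layers), max(len(layer) for layer in layers))  # 3 2
```

Under this assumption the depth of the graph is the number of tasks divided by the parallelism degree, rounded up, which is why a higher parallelism degree shortens the critical path for a fixed number of tasks.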

FIGURE 8. Deadline misses and energy consumption by varying workloads under tight deadlines.

FIGURE 9. Energy consumption by varying the parallelism degree.

To show the effect of task parallelism, we measure the energy consumption of Proposed and Legacy when varying the parallelism degree where the number of tasks is 16 and the deadline is 9, as shown in Figure 9. Both Legacy and Proposed show a reduction in the energy consumption as the parallelism increases from two to four. This indicates that increased parallelism shortens the total processing time of all the tasks (i.e., reduces the depth of the task graph), which results in reduced energy consumption. When the parallelism is equal to four, as shown in the graph, Proposed significantly reduces the energy consumption compared to Legacy. For example, as shown in Figure 9, with parallelism=2, the energy consumption is reduced by 48.28% compared to Legacy. In Proposed, the energy consumption starts to increase when the parallelism degree exceeds four, i.e., at a parallelism degree of five, because the utilization of the big cores also increases. However, the energy consumption starts to decrease again at a parallelism degree of six and continues to decrease as the parallelism increases further. It is assumed that the power management of the big.LITTLE cores in the test-bed is cluster-based: once one of the big cores runs a task, the energy consumption greatly increases because the whole big-core cluster (i.e., 4 big cores) is activated. The parallelism gain is obtained again once the parallelism exceeds five, as shown in the graph.

To see the effect of the numbers of big cores and LITTLE cores and their ratio, we measure the energy consumption with various combinations of big cores and LITTLE cores, as shown in Figure 10. It is assumed that the maximal parallelism is two, the number of tasks is 8, and the deadline of each task is 9. As the results show, Proposed achieves better performance (i.e., less power consumption) than Legacy as the number of LITTLE cores increases. As the number of LITTLE cores increases, more tasks may be allocated to LITTLE cores, which yields a larger energy saving.
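The jump in consumption at a parallelism degree of five can be illustrated with a toy power model of cluster-based power management. The model and every number in it are hypothetical and in arbitrary units (they are not measurements from the test-bed): a cluster draws nothing while idle, and a fixed static power plus a per-core power once at least one of its cores is busy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    static_power: float  # drawn whenever the cluster is switched on (assumed)
    core_power: float    # extra power per busy core (assumed)


# Hypothetical values in arbitrary units, chosen only for illustration.
LITTLE = Cluster(static_power=1.0, core_power=0.5)
BIG = Cluster(static_power=4.0, core_power=2.0)


def cluster_power(cluster, busy_cores):
    """Zero when the cluster is idle, static plus per-core power otherwise."""
    if busy_cores == 0:
        return 0.0
    return cluster.static_power + busy_cores * cluster.core_power


def total_power(busy_little, busy_big):
    return cluster_power(LITTLE, busy_little) + cluster_power(BIG, busy_big)


# Four parallel tasks fit on the LITTLE cluster alone.
print(total_power(4, 0))  # 3.0
# A fifth parallel task wakes the whole big cluster for a single core.
print(total_power(4, 1))  # 9.0
```

In this toy model the fifth task costs three times the per-core power of a big core because it also pays the static cost of the big cluster; further big-core tasks share that static cost, which is consistent with the parallelism gain reappearing at higher parallelism degrees.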

FIGURE 10. Energy consumption by varying the ratio of big cores and LITTLE cores.

2) EXPERIMENT WITH A RANDOM TASK GRAPH

To emulate an actual commercial application, we conducted experiments in an environment where an application has a random task graph. The task graph of an application program was randomly generated using TGFF for each number of tasks (i.e., 8, 16, 24, 32, and 40). Two task graphs may have different shapes and show different performance even though they have the same system parameters (i.e., the number of tasks, the deadline laxity, and the maximum parallelism degree). For example, if two applications have six tasks and a tight deadline, one can have a depth of four and the other a depth of three.

Figure 11 shows the change in energy consumption as the number of tasks varies, for tight (i.e., deadline=3), normal (i.e., deadline=6), and loose (i.e., deadline=9) deadlines. For the same maximum parallelism degree (i.e., maximum parallelism degree = 4), when the deadline laxity increases from 3 to 9, more tasks in the application are assigned to LITTLE cores rather than big cores. The results are similar to those shown in Figure 6. In the three graphs (i.e., Figure 11(a), (b), and (c)), the trend of the energy consumption with respect to the number of tasks in the application program is similar. As the number of tasks increases, the amount of work in the application increases, which increases the energy consumption. Proposed significantly reduces the energy consumption compared to Legacy owing to the high utilization of the LITTLE cores. For example, as shown in Figure 11(c), the energy consumption is reduced by up to 46.37% compared to Legacy when the number of tasks in the application program is 40.

Figure 12 shows the change in energy consumption as the deadline laxity varies, with the number of tasks fixed at 32 and the maximum parallelism degree at four. The experiment is conducted for scenarios where the deadline of each task is tight (i.e., deadline=3), normal (i.e., deadline=6), and loose (i.e., deadline=9). Across all deadline laxities, Proposed outperforms Legacy. As the deadline is relaxed, Proposed reduces the energy consumption even further. When the deadline is sufficiently relaxed (deadline = 9), Proposed reduces the energy consumption by 40.59% compared to Legacy.

Figure 13 shows the change in energy consumption as the maximum parallelism degree varies, with the number of tasks fixed at 16 and the same completion time constraint (i.e., deadline=9). The results are similar to those shown in Figure 9. In all cases, Proposed consumes less energy than Legacy, especially when the parallelism degree is smaller than five. For example, with a maximum parallelism of two, Proposed consumes 51.8% less energy than Legacy. At a maximum parallelism of five and above, Proposed also increases the utilization of the big cores as well as the LITTLE cores. This is why Proposed does not show a markedly better result than Legacy where the maximum parallelism exceeds four.

VI. FURTHER DISCUSSION

A multi-core assignment scheme is designed for real-time applications in a multi-core mobile device environment, as shown in Algorithm 1. However, the proposed scheme is heavily influenced by the completion time table. If a value in the completion time table is larger than the actual completion time of the task on that core, the task may be assigned to a big core instead of a LITTLE core, although it could also have been completed within the deadline on the LITTLE core. Conversely, if the value in the completion time table is smaller than the actual completion time of the task on that core, a deadline miss can occur. In real-time systems, missing a deadline has a severe negative impact on the system. Hence, the proposed algorithm should employ a more accurate completion time table. An intelligent completion time prediction method is created by exploiting a machine learning technique, as shown in Figure 14. The predicted completion time table can greatly help the proposed algorithm work more accurately, as shown in Fig. 15.
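The two failure modes described above can be illustrated with a small sketch. It is not Algorithm 1: it assumes a simplified rule that picks the LITTLE core whenever the table entry for that core meets the deadline, and the function names, the dictionary layout, and all time values are hypothetical.

```python
from typing import Dict, Tuple

# A row of the completion time table: core type -> completion time.
Times = Dict[str, float]


def assign_core(predicted: Times, deadline: float) -> str:
    """Prefer the LITTLE core when its predicted completion time meets the deadline."""
    return "LITTLE" if predicted["LITTLE"] <= deadline else "big"


def evaluate(predicted: Times, actual: Times, deadline: float) -> Tuple[str, bool, bool]:
    """Return (chosen core, deadline missed, big core used unnecessarily)."""
    core = assign_core(predicted, deadline)
    missed = actual[core] > deadline
    wasted = core == "big" and actual["LITTLE"] <= deadline
    return core, missed, wasted


DEADLINE = 9.0  # hypothetical deadline

# Accurate table: the task runs on the LITTLE core and meets the deadline.
accurate = evaluate({"LITTLE": 8.0, "big": 4.0}, {"LITTLE": 8.0, "big": 4.0}, DEADLINE)

# Overestimated entry: the task is pushed to a big core although LITTLE would do.
overestimated = evaluate({"LITTLE": 10.0, "big": 4.0}, {"LITTLE": 8.0, "big": 4.0}, DEADLINE)

# Underestimated entry: the task stays on the LITTLE core and misses the deadline.
underestimated = evaluate({"LITTLE": 8.0, "big": 4.0}, {"LITTLE": 10.0, "big": 5.0}, DEADLINE)

print(accurate, overestimated, underestimated)
```

An overestimate costs energy but keeps the deadline, whereas an underestimate breaks the real-time guarantee, which is why the accuracy of the table matters more on the optimistic side.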


A. SVM MODEL

To make the completion time table more accurate, a support vector machine (SVM) model with a sigmoid function kernel is employed [18]. The SVM model has been utilized in several recent studies because it can achieve high performance with low computing power compared with other techniques such as kNN and ANN [19], [20]. The SVM model can be built with the open-source LIBSVM library [21]. For an SVM model with a sigmoid function kernel, the parameters gamma (γ) and C must be set. The best-performing pair of γ and C can be identified by performing a grid search [22]. The input of the proposed model is the set of tasks in an application. The output of the model is the completion time table, which indicates the expected completion time of the tasks on each core. To build and utilize the SVM model, the following steps should be executed: (1) collection of raw data, (2) data pre-processing, and (3) precision evaluation.

FIGURE 11. Energy consumption by varying the number of tasks within an application with random task graphs.

FIGURE 12. Energy consumption by varying task deadline with random task graphs.

FIGURE 13. Energy consumption by varying the maximum parallelism degree.

B. RAW DATA

The SVM model is trained on five to ten applications per category, selected from those most used by mobile users as ranked in the Worldwide Mobile App User Behavior Dataset [23]. The categories include Games, Social networking, Utilities, Music, and Photo & video. The applications are executed with different process settings for each category of application tasks. The category of the application, the completion time, and the energy consumption per processor are recorded using the Android Studio CPU Profiler. Additionally, the task graphs of the applications are extracted.

C. DATA PRE-PROCESSING

Data pre-processing is required to convert the raw data into a format acceptable to the SVM.

1) FEATURE SELECTION

Feature selection is performed using information from the raw data about the category of the application, the task graph, the energy consumption of the core (i.e., one of the multiple cores), and the completion time. To predict the completion time of a task on a core, the workload of the task is traced back from the completion time and the energy consumption on that core, and it is used as a feature. In summary, the features employed are the category of the application, the task graph, the workload of the task, and the energy consumption for each core³. Information gain, which is a filter method, is used to select the three highest-ranked features, which are subsequently applied to the SVM model.

2) FEATURE NORMALIZATION

Feature normalization is the process of matching data distributions by converting the range of the selected feature data. To fit the distribution of the selected feature data, the values of each selected feature are scaled to the range between 0 and 1.

D. PRECISION EVALUATION

The coefficient of determination ($r^2$) and the mean square error (MSE) are employed to evaluate the model's performance. The values of $r^2$ and MSE are indicators of the precision and accuracy of the model [24]. A higher $r^2$ and a lower MSE indicate that the model has higher precision, accuracy, and stability.

$$\mathrm{MSE} = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2} \tag{5}$$
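The two precision metrics can be computed directly from a validation set. The following is a minimal sketch in plain Python, not the authors' evaluation code: the function names and the sample values are illustrative, the MSE uses the n − 2 denominator of Eq. (5), and $r^2$ is taken as one minus the ratio of the residual sum of squares to the total sum of squares.

```python
from typing import Sequence


def mse(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean square error with the n - 2 denominator used in Eq. (5)."""
    n = len(measured)
    if n != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    if n <= 2:
        raise ValueError("at least three validation points are required")
    residual = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    return residual / (n - 2)


def r_squared(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    mean = sum(measured) / len(measured)
    residual = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    total = sum((y - mean) ** 2 for y in measured)
    if total == 0:
        raise ValueError("r^2 is undefined when all measured values are equal")
    return 1.0 - residual / total


# Hypothetical completion times (arbitrary units) for four validation tasks.
measured_times = [1.0, 2.0, 3.0, 4.0]
predicted_times = [1.0, 2.0, 3.0, 5.0]

print(mse(measured_times, predicted_times), r_squared(measured_times, predicted_times))
```

A perfect predictor gives an MSE of 0 and an $r^2$ of 1, so the two values move in opposite directions as the predictions improve.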

³ More detailed characteristics of each task may also be included, depending on the type of the application.


FIGURE 14. Training the predictor with SVM.

FIGURE 15. Improving the performance of our algorithm using predictor.

$$r^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{6}$$

where $y_i$, $\bar{y}$, and $\hat{y}_i$ denote the measured values in the validation set, their average, and the predicted values, respectively, and $n$ is the number of points in the validation set.

VII. CONCLUSION

In this study, an energy-efficient multi-core assignment scheme is proposed that processes real-time tasks on asymmetric multi-core mobile devices while satisfying the deadline of each task. By implementing the proposed multi-core assignment technique on a real test-bed and comparing it with the legacy scheme in various environments, we showed that the proposed scheme can improve the energy saving of an asymmetric multi-core mobile device while guaranteeing real-time performance. The experimental results showed that the proposed scheme reduced the energy consumption by up to 48.28% compared to the legacy scheme for the simple task sets. In addition, the proposed scheme reduced the energy consumption by up to 51.8% compared to the legacy scheme for the random task graphs. Based on the experimental results, the proposed scheme achieves high energy efficiency while guaranteeing the required performance.

ACKNOWLEDGMENT

The authors would like to thank Editage (www.editage.co.kr) for English language editing.

REFERENCES

[1] M. Fan, “Real-time scheduling of embedded applications on multi-core platforms,” Ph.D. dissertation, Florida Int. Univ., Miami, FL, USA, 2014.
[2] J. A. Paradiso and T. Starner, “Energy scavenging for mobile and wireless electronics,” IEEE Pervas. Comput., vol. 4, no. 1, pp. 18–27, Jan. 2005.
[3] “Big.little technology: The future of mobile,” ARM Ltd., Cambridge, U.K., White Paper, 2013.
[4] W. Seo, D. Im, J. Choi, and J. Huh, “Big or little: A study of mobile interactive applications on an asymmetric multi-core platform,” in Proc. IEEE Int. Symp. Workload Characterization, Oct. 2015, pp. 1–11.
[5] D. H. Kim, Y.-B. Ko, and S.-H. Lim, “Power-efficient big.little core assignment scheme for real-time smartphone applications,” in Proc. 4th Int. Symp. Mobile Internet Secur. (MobiSec), 2019, pp. 1–7.
[6] E. L. Padoin, M. Castro, L. L. Pilla, P. O. A. Navaux, F. Z. Boito, and J.-F. Méhaut, “Performance/energy trade-off in scientific computing: The case of ARM big.little and intel sandy bridge,” IET Comput. Digit. Techn., vol. 9, no. 1, pp. 27–35, Jan. 2015.
[7] Y. Zhu and V. J. Reddi, “High-performance and energy-efficient mobile Web browsing on big/little systems,” in Proc. IEEE 19th Int. Symp. High Perform. Comput. Archit. (HPCA), Feb. 2013, pp. 13–24.
[8] D. H. Bui, Y. Liu, H. Kim, I. Shin, and F. Zhao, “Rethinking energy-performance trade-off in mobile Web page loading,” in Proc. 21st Annu. Int. Conf. Mobile Comput. Netw. (MobiCom), 2015, pp. 14–26.
[9] P. C. Hsiu, P. H. Tseng, W. M. Chen, C. C. Pan, and T. W. Kuo, “User-centric scheduling and governing on mobile devices with big.little processors,” ACM Trans. Embedded Comput. Syst., vol. 15, no. 1, p. 17, Feb. 2013.
[10] G. Singla, G. Kaur, A. K. Unver, and U. Y. Ogras, “Predictive dynamic thermal and power management for heterogeneous mobile platforms,” in Proc. Design, Automat. Test Eur. Conf. Exhib. (DATE), 2015, pp. 960–965.
[11] Y. Geng, Y. Yang, and G. Cao, “Energy-efficient computation offloading for multicore-based mobile devices,” in Proc. IEEE Conf. Comput. Commun. (INFOCOM), Apr. 2018, pp. 46–54.
[12] Y. Li, M. Chen, W. Dai, and M. Qiu, “Energy optimization with dynamic task scheduling mobile cloud computing,” IEEE Syst. J., vol. 11, no. 1, pp. 96–105, Mar. 2017.
[13] R. Love, Linux Kernel Development. London, U.K.: Pearson, 2010.
[14] J. Park, K. Kiseong, H. Yeo, J.-H. Lee, J. Chung, D. Choi, and M. Lee, “Mobile phone,” U.S. Patent 29 577 834, Apr. 11, 2017.
[15] Monsoon Solutions. (Mar. 2019). High Voltage Power Monitor. [Online]. Available: https://www.msoon.com
[16] J. J. Dongarra, P. Luszczek, and A. Petitet, “The LINPACK benchmark: Past, present and future,” Concurrency Comput., Pract. Exper., vol. 15, no. 9, pp. 803–820, 2003.
[17] R. P. Dick, D. L. Rhodes, and W. Wolf, “TGFF: Task graphs for free,” in Proc. 6th Int. Workshop Hardw./Softw. Codesign. (CODES/CASHE), 1998, pp. 97–101.
[18] V. Vapnik, The Nature of Statistical Learning Theory. Berlin, Germany: Springer, 2013.
[19] L. J. Cao and F. E. H. Tay, “Support vector machine with adaptive parameters in financial time series forecasting,” IEEE Trans. Neural Netw., vol. 14, no. 6, pp. 1506–1518, Nov. 2003.
[20] Y. Shynkevich, T. M. McGinnity, S. A. Coleman, A. Belatreche, and Y. Li, “Forecasting price movements using technical indicators: Investigating the impact of varying input window length,” Neurocomputing, vol. 264, pp. 71–88, Nov. 2017.
[21] C.-C. Chang and C.-J. Lin, “LIBSVM: A library for support vector machines,” ACM Trans. Intell. Syst. Technol., vol. 2, no. 3, p. 27, 2011.
[22] C.-W. Hsu, C.-C. Chang, and C.-J. Lin, “A practical guide to support vector classification,” Dept. Comput. Sci., Nat. Taiwan Univ., Taipei, Taiwan, 2003.
[23] S. L. Lim, “Worldwide mobile app user behavior dataset,” Harvard Dataverse, Tech. Rep., 2014.
[24] L. O. Tedeschi, “Assessment of the adequacy of mathematical models,” Agricult. Syst., vol. 89, nos. 2–3, pp. 225–247, Sep. 2006.

DONGHOON KIM received the B.S. degree in software and the M.S. degree from the School of Computer Engineering, Ajou University, South Korea, in 2018 and 2020, respectively. He is currently a Researcher with TmaxSoft, Seongnam, South Korea. His research interests include wireless communication, energy conservation, mobile devices, scheduling, and network security.


YOUNG-BAE KO received the Ph.D. degree in computer science from Texas A&M University, College Station, TX, USA. In 2002, he joined the Department of Ubiquitous Networking and Security, IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA, as a Research Staff Member. He is currently a Professor with the Department of Software and Computer Engineering, Ajou University, South Korea, also leading the Intelligence of Connected Systems (iCONS) Laboratory funded by the Brain Korea 21 Plus (BK21+) National Project. His recent research areas are wireless ad hoc/mesh networks, the Intelligent IoT, indoor position systems, and trustworthy tactical combat networks.


SUNG-HWA LIM (Member, IEEE) received the B.S., M.S., and Ph.D. degrees in computer engineering from Ajou University, South Korea, in 1999, 2001, and 2008, respectively. He was a Postdoctoral Researcher with the Coordinated Science Laboratory, University of Illinois at Urbana–Champaign (UIUC), from 2008 to 2009. He is currently an Associate Professor with the Department of Multimedia, Namseoul University. His research interests include the Internet of Things, power-aware computing, and real-time systems.
