Enabling Flexible Resource Allocation in Mobile Deep Learning Systems
Abstract: Deep learning opens new opportunities for mobile applications to achieve higher performance than before. However, deep learning implementations on today's mobile devices demand expensive resource overheads, imposing a significant burden on battery life and the limited memory space. Existing methods either rely on cloud or edge infrastructure that requires uploading user data, which results in a risk of privacy leakage and large data transfers, or adopt compressed deep models, which degrades algorithm accuracy. This paper presents DeepShark, a platform that enables flexible resource allocation on mobile devices when using commercial off-the-shelf (COTS) deep learning systems. Compared to existing approaches, DeepShark seeks a balance between time and memory efficiency according to user requirements: it breaks a sophisticated deep model down into a stream of code blocks and executes those blocks incrementally on the system-on-chip (SoC). As a result, DeepShark requires significantly less memory on the mobile device while preserving the model's default accuracy. In addition, all user data involved in model processing is handled locally, avoiding unnecessary data transfer and network latency. DeepShark is currently built on two COTS deep learning systems, Caffe and TensorFlow. The experimental evaluations demonstrate its effectiveness in terms of memory footprint and energy cost.
Existing system: We deployed DeepShark on a Samsung S5 smartphone as a case study. On this basis, four of the most popular convolutional neural network (CNN) models (i.e., VGG, CaffeNet, GoogleNet, and AlexNet), which are originally unaffordable on a smartphone, can now be carried out efficiently on mobile devices by using DeepShark. The evaluation results show that, in most cases, DeepShark uses less than 300 MB of smartphone RAM and achieves an average 70% reduction in memory consumption for a single image-recognition run. Compared to existing mobile deep learning systems that use compressed models, DeepShark can run sophisticated deep inference without any accuracy loss, thus providing higher learning quality and offering more adaptive trade-offs between time efficiency and memory-usage efficiency. Moreover, DeepShark costs as little as 0.2% of the battery's energy, which is preferable for mobile users. Proposed system: In practice, mobile applications are often time-sensitive; however, our proposed technique extends inference time from the order of seconds to minutes. To show the significance of DeepShark, we demonstrate several real-world mobile application cases. First, within a face recognition application, e.g., automatic screen unlock, DeepShark lets mobile devices incrementally extend the user database on the backend with lower resource (energy and memory) overheads. Second, DeepShark enables effective deep learning systems on top of low-end IoT devices. Third, DeepShark is naturally suited to time-insensitive applications, e.g., album classification, which makes our proposed technique even more significant. Taken together, these factors show that designing DeepShark atop a COTS deep learning system is applicable here. Advantages: We profile the execution time of both DeepShark and the smartphone SoC itself for a comprehensive efficiency evaluation, because the I/O efficiency of the SoC is non-negligible in scenarios with frequent I/O requests (as Section 5.4.1 illustrates).
With DeepShark, existing deep learning on mobile devices can use the developed technique to relieve resource-hungry performance. This section discusses the potential performance of DeepShark in scenarios that might interest readers. Any future experiments with DeepShark will be accompanied by a close investigation into even larger-scale datasets and advanced neural network models for mobile sensing. Disadvantages: Although using an edge device might address these two problems, it still demands significant time to deliver data (including the deep model and the user data) and thus results in inefficient system performance. A natural strategy is to run the deep learning system on the mobile device itself, which mitigates both the data transfer and privacy issues but leads to a long model inference time. Deep learning systems have achieved significant success in many application areas, e.g., image and speech recognition. Figure 1 shows a convolutional neural network (CNN) based deep learning procedure, where the CNN model consists of multiple layers comprising thousands of parameters. Modules: Deep learning system: Deep learning has achieved remarkable results, and this success lets us envision mobile devices becoming our brilliant assistants. Continuous efforts have been devoted to processing mobile user data, e.g., photos, by leveraging existing deep learning systems. These efforts provide users with a high quality of experience and more intelligent applications such as album classification. However, high-quality deep learning models are often computation-intensive and hard to apply on commercial off-the-shelf (COTS) mobile devices due to constrained resources and limited battery life. As a result, existing cloud deep learning systems, e.g., Amazon Rekognition, usually require users to upload their personal data to a remote machine for high-quality inference processing.
However, uploading all of the relevant data to a remote machine can lead to large data transfers and potential privacy risks. Although using an edge device might address these two problems, it still demands significant time to deliver data (including the deep model and the user data) and thus results in inefficient system performance. A natural strategy is to run the deep learning system on the mobile device, which mitigates both the data transfer and privacy issues but leads to a long model inference time. Mobile devices: As described above under Existing system, we deployed DeepShark on a Samsung S5 smartphone as a case study. Four of the most popular CNN models (VGG, CaffeNet, GoogleNet, and AlexNet), originally unaffordable on a smartphone, can now be carried out efficiently on mobile devices: in most cases DeepShark uses less than 300 MB of RAM, achieves an average 70% reduction in memory consumption for a single image-recognition run, runs sophisticated deep inference without any accuracy loss, and costs as little as 0.2% of the battery's energy. DeepShark manager: The manager maintains device-wide or user-specific views of the available resources. For the former, the DeepShark manager reads the available memory space from "/proc/meminfo"; for the latter, the user can specify the available memory and thereby trade off time efficiency against memory-usage efficiency. DeepShark then dispatches processing tasks within the available memory space and coordinates the deep learning process so that the integrated model is executed incrementally, code block by code block, on the SoC. With this manager, DeepShark can carry out deep learning inference with a pre-trained model in a resource-efficient way. Specifically, in our design, memory requests from applications can be maintained (cleaned and collected) because we periodically check for out-of-use memory space.
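The following is a minimal sketch, in Python and for illustration only, of how such a manager loop could behave; the function names, the 10% headroom threshold, and the layer-block decomposition are our own assumptions rather than DeepShark's actual implementation. It reads the device-wide budget from /proc/meminfo (or accepts a user-specified one) and then executes a pre-trained model's code blocks incrementally, reclaiming out-of-use buffers along the way.

import gc

def available_memory_kb():
    # Read device-wide available memory from /proc/meminfo (Linux/Android).
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1])   # value is reported in kB
    return 0

def run_inference(layer_blocks, image, user_budget_kb=None):
    # layer_blocks: ordered callables, one per code block of the pre-trained
    # model (hypothetical decomposition of the integrated model).
    # user_budget_kb: optional user-specified limit that trades time
    # efficiency for memory-usage efficiency.
    budget_kb = user_budget_kb or available_memory_kb()
    activations = image
    for block in layer_blocks:
        # Before dispatching the next block, reclaim out-of-use buffers if
        # free memory has dropped close to the chosen budget's headroom.
        if available_memory_kb() < 0.1 * budget_kb:
            gc.collect()
        activations = block(activations)   # only one block resident at a time
    return activations                     # final inference result

In this sketch, intermediate activations replace their inputs as each block finishes, which is what keeps the peak memory footprint bounded by a single block rather than by the whole model.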
Extensibility: DeepShark aims to be an extensible platform that does not require large-scale code rewriting for any application. DeepShark also provides an easily accessible API library (Table 2), so researchers and developers can execute the deep learning process on smartphones simply by calling the APIs. In addition, DeepShark offers adaptive trade-offs so that users can decide whether they want time efficiency or memory-usage efficiency. Note that this paper does not target deep model training on smartphones. In practice, the training process is usually the most time-consuming and computationally demanding task: in our experimental experience, it can take up to 78 hours to train the CaffeNet model on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012 dataset (Section 5.1.1), even on the machine used in this paper, which is equipped with four Pascal-architecture GTX 1080 GPUs. Today, most sophisticated deep learning models are pre-trained for commercial use. Thus, DeepShark focuses on the inference (testing) step on mobile devices with pre-trained models, a step that is supposed to bring rich applications to users but actually suffers from mobile resource constraints.
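As an illustration of the intended developer experience, the snippet below sketches how an application might call such an API. Since Table 2 is not reproduced here, the module name deepshark, the functions load_model and infer, and the prefer parameter are hypothetical placeholders; the real API may differ.

from deepshark import load_model, infer   # hypothetical module and functions

# Inference only: the model is assumed to be pre-trained offline on GPU machines.
model = load_model("caffenet.caffemodel", framework="caffe")

# The caller chooses the trade-off: "memory" favors a small RAM footprint,
# "time" favors faster inference (both option values are assumptions).
scores = infer(model, "photo.jpg", prefer="memory")
print(scores[:5])   # e.g., top-5 class scores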