Source: Deep Learning on Medium
Alibaba Open-Source and Lightweight Deep Learning Inference Engine — Mobile Neural Network (MNN)
By Mobile Neural Network (MNN) Team
Recently, Alibaba officially published the source code of its lightweight mobile-side deep learning inference engine — Mobile Neural Network (MNN) — on GitHub.
Jia Yangqing, a famous AI scientist, commented that “compared with general-purpose frameworks like TensorFlow and Caffe2 that cover both training and inference, MNN focuses on the acceleration and optimization of inference and solves efficiency problems during model deployment so that services behind models can be implemented more efficiently on the mobile side. This is actually in line with ideas in server-side inference engines like TensorRT. In large-scale machine learning applications, the number of computations for inference are usually 10+ times more than that for training. Therefore, optimization for inference is especially important.”
How is the technical framework behind MNN designed? What are future plans regarding MNN? Today, let’s have a closer look at MNN.
1. What Is MNN?
Mobile Neural Network (MNN) is a lightweight mobile-side deep learning inference engine that focuses on the running and inference of deep neutral network model. MNN covers the optimization, conversion, and inference of deep neutral network models. Currently, MNN has been adopted in more than 20 apps such as Mobile Taobao, Mobile Tmall, Youku, Juhuasuan, UC, Fliggy, and Qianniu, covering live broadcast, short video capture, search recommendation, product searching by image, interactive marketing, equity distribution, security risk control and other scenarios. MNN stably runs more than 100 million times per day. In addition, MNN is also applied in IoT devices like Cainiao will-call cabinets. During the 2018 Double 11 event, MNN was used in scenarios like smiley face red envelopes, scans, and a finger-guessing game.
MNN has already been made an open-source project on GitHub. Follow the official WeChat account “Alibaba Technology” and enter “MNN” in the dialog box to learn more and obtain the GitHub download link to this project.
2. Advantages of MNN
MNN loads network models, does inference, make predictions and returns relevant results. The inference process consists of loading and parsing models, scheduling computational graphs, and running models efficiently on heterogeneous back-end devices. MNN has four advantages: lightweight, versatility, high performance, and ease of use.
- Supports popular frameworks like TensorFlow, Caffe, ONNX, and supports common neural networks such as CNN, RN
- Supports 86 TensorFlow ops, 34 Caffe ops; MNN ops: 71 for CPU, 55 for Metal, 40 for OpenCL, and 35 for Vulkan.
- Supports iOS 8.0+, Android 4.3+, and embedded devices with POSIX interface.
- Supports hybrid computing on multiple devices. Currently supports CPU and GPU. The GPU op plugin can be loaded dynamically to replace default (CPU) op implementation.
- MNN is customized and optimized for mobile devices and has no dependencies. It can be easily deployed to mobile devices and a variety of embedded devices.
- iOS platform: Static library size for armv7+arm64 platforms is about 5 MB, size increase of linked executables is about 620 KB, and metallib file is about 600 KB.
- Android platform: so size is about 400 KB, OpenCL so is about 400 KB, Vulkan so is about 400 KB.
- MNN is independent from any third-party computing libraries. It implements core operations by relying on large amounts of handwritten assembly code to make full use of the ARM CPU.
- For iOS, GPU acceleration (Metal) can be enabled. MNN supports iOS later than version 8.0. The common model in MNN is faster than Apple’s native CoreML.
- For Android, OpenCL, Vulkan, and OpenGL are available to meet as many device requirements as possible. It is deeply tuned for mainstream GPUs (Adreno and Mali).
- Convolution and transposition convolution algorithms are efficient and stable. The Winograd convolution algorithm is widely used to efficiently implement convolutions like 3×3 -> 7×7.
- Additional optimizations have been made for the new ARM v8.2 architecture with half-precision calculation support available for further acceleration.
Ease of use:
- Lots of documents and instances are available.
- It features an efficient image processing module that covers common features such as transformation and conversion. In general, additional libyuv or opencv libraries are not required for image processing.
- Callbacks are supported to conveniently extract data or precisely control the execution.
- It is supported to run some paths in the network model or run CPU and GPU in parallel.
3. Core Content of MNN
3.1 Module Design
As shown in the preceding figure, MNN consists of two parts: Converter and Interpreter.
Converter consists of Frontends and Graph Optimize. The former supports different training frameworks. Currently, MNN supports TensorFlow Lite, Caffe, and ONNX. The latter optimizes graphs by operator fusion, operator substitution, and layout adjustment.
Interpreter consists of Engine and Backends. The former is responsible for loading the model and the scheduling calculation graph; the latter includes the memory allocation and the op implementation under each computing device. In Engine and Backends, MNN applies a variety of optimization schemes, including applying the Winograd algorithm in convolution and deconvolution, applying the Strassen algorithm in matrix multiplication, low-precision calculation, Neon optimization, hand-written assembly, multi-thread optimization, memory reuse, and heterogeneous computing.
3.2 Performance Comparison
The following figures compare performance of popular open-source frameworks on MobileNet and SqueezeNet.
MNN shows over 20% better performance than NCNN, Mace, TensorFlow Lite, and Caffe2. We actually focus on optimizing our internal business models and has made in-depth optimizations for models like face detection. Detecting a single frame on iPhone6 only takes about 5 ms.
Note: Mace, TensorFlow Lite, and Caffe2 all use the master branch of the GitHub repository as of March 1, 2019. Due to compilations, NCNN uses the pre-compiled library released on December 28, 2018.
4. History of MNN
4.1 Why do we work on inference on mobile devices?
With the increasing computing power of mobile phones and the rapid development of deep learning, especially the growing maturity of small network models, inference and prediction originally done on the cloud can be implemented on mobile devices. Mobile-side intelligence is about deploying and running AI algorithms on mobile devices. Compared with server-side intelligence, mobile-side intelligence has a series of advantages, including low latency, higher data privacy, and saving on cloud resources. Currently mobile-side intelligence is an increasing trend and has played a big role in scenarios such as AI photography and visual effects.
Mobile Taobao is an e-commerce app that covers many kinds of services. Business scenarios like image search-based online shopping (Pailitao), live short videos, interactive marketing, makeup try-on, personalized recommendation/search require the use of mobile-side intelligence. Mobile-side intelligence can bring users a new interactive experience and facilitate break-through business innovations.
Generally, deep learning on mobile devices can be divided into the following stages:
- Model training. In this stage, model training is performed and tagging data is used to train model files. When a model is designed for mobile devices, the model size and the amount of computing must be taken into consideration.
- Model compression. In this stage, the main work is to optimize the model size. The model size can be reduced by pruning, quantification or other methods so that models can be used better on mobile devices.
- Model deployment. This stage mainly implements model deployment, including model management and deployment, maintenance, and monitoring.
- Mobile-side inference. This stage mainly deals with model inference, that is, loading models and completing all inference-related computations.
The preceding information shows that a mobile-side inference engine is a core module of a mobile-side intelligence application. This core module needs to efficiently use resources and quickly complete the inference under restrictions such as limited computing power and limited memory. It is safe to say that the quality of the mobile-side inference implementation directly decides whether algorithm models can run on mobile devices and whether a business scenario can be implemented. Therefore, we need an excellent mobile-side inference engine.
4.2 Why did we make MNN an open-source project?
Before we started our engine R&D work in early 2017, we had researched system solutions and open-source solutions quite thoroughly and conducted in-depth analysis from the perspective of versatility, lightweightness, performance, and security. CoreML is a system framework used across Apple products. MLKit and NNAPI are two system framework used on Android. The biggest advantage of system frameworks is lightweightness: System frameworks have a relatively low package size restriction. The biggest disadvantage of system frameworks is versatility. CoreML only supports iOS 11+, and MLKit and NNAPI only supports Android 8.1+. These system frameworks apply to only a highly limited range of device models and barely support scenarios where embedded devices are used. In addition, system frameworks only support a relatively small number of network types and op types. They have poor scalability and cannot make full use of the computing power of devices. Model security is another big issue for system frameworks. All the preceding disadvantages indicate that system frameworks are not a good choice. As to open-source projects, TensorFlow Lite was announced but has not been released yet. Caffe is a relatively mature project, but it is not designed and developed for mobile scenarios. NCNN is a recently released project, which is not mature enough. The final conclusion was that we cannot find a simple, efficient and secure mobile-side inference engine that targets different training frameworks and deployment environments.
Therefore, we decided to develop a simple, efficient and secure mobile-side inference engine that can be used in different business algorithm scenarios, different training frameworks, and different deployment environments. The engine is known as MNN later. MNN eliminates the limits imposed by different operating systems (Android and iOS), devices, and training frameworks. It supports fast deployment on mobile devices and allows flexible addition of ops and in-depth performance optimization on CPUs and GPUs based on business models.
Projects such as NCNN, TensorFlow Lite, Mace, and Anakin gradually become open source and get continuously upgraded over time, providing us with good input and meaningful reference. We are also continuously making iterations and optimizations of MNN as our business requirements grow. MNN has been proven in Double 11 events and is relatively mature and full-fledged. Therefore, we decided to make MNN an open source project so that it can benefit more application and IoT developers.
5. Applicable Scenarios
At present, MNN has been integrated into 20+ apps, including Mobile Taobao, Maoke, Youku, Juhuasuan, UC, Fliggy, and Qianniu. It is currently applied in business scenarios such as image search-based online shopping (Pailitao), live short videos, interactive marketing, real-name authentication, trying on makeup, and search recommendation. Currently, MNN runs more than 100 million times each day. During the 2018 Double 11 event, MNN was used in scenarios like smiley face red envelopes, scans, and a finger-guessing game.
Pailitao is an image search and recognition feature in Mobile Taobao. Since its first release in 2014, Pailitao has become an application with more than 10 million UVs over years of development. Technologies behind this feature are also continuously evolving and developing. Originally, to search an item, a photo was taken of an item and then uploaded to the cloud for image recognition. Currently, object recognition and picture editing are first performed on the mobile side and the edited pictures are uploaded, significantly improving the user experience and reducing the computing cost on the server side. Currently it supports recognizing common simple objects and logos in real time by using mobile-side models.
Smiley Face Red Envelopes was the first show performed on the 2018 Tmall Double 11 evening gala. This performance was based on real-time face detection and emotion recognition. Compared with interactive events based on screen touch, this show was a typical example of the transformation from tradition interaction through screen touch to natural interaction by using real-time facial detection and recognition algorithms in the camera. IT brings users a new experience.
Jiwufu (collection of five Fu cards) was one of Chinese new year campaigns in 2019. It was the first time that Mobile Taobao participated in this campaign by scanning goods for the Chinese new year. By scanning the red-colored goods purchased for the Chinese new year, people can obtain Fu cards and have a chance of winning physical rewards including duvets, Wuliangye, Maotai, and king crabs as well as Tmall supermarkets and Tmall Gene coupons with no threshold.
We plan to release a stable version every two months. Our current roadmap is as follows:
- Improve Converter graph optimization
- Improve support for quantization and add support for sparsity
- Add the model flops statistics feature
- Dynamically schedule and run policies based on hardware features of devices
- Continuously optimize existing backends (CPU/OpenGL/OpenCL/Vulkan/Metal)
- Optimize Arm v8.2 Backend and add support for quantitative models
- Use NNAPI to support the NPU backend
- Use fast matrix multiplication and Winograd algorithm to optimize the performance
- Include more documents and examples
- Improve test and benchmark tools
- Support more ops