Source: Deep Learning on Medium
Cloud-based architecture
Here’s what a cloud-based setup would look like; it involves the steps detailed below:
Step 1: Request with input image
There are two possible options here:
- We can send the raw image (RGB or YUV) from the edge device as it is captured from the camera. Raw images are larger and take longer to send to the cloud.
- We can encode the raw image to JPEG/PNG or some other compressed format before sending, and decode it back to a raw image on the cloud before running inference. This approach involves an additional decode step, since most deep learning models are trained on raw images. We will cover different raw image formats in future articles in this series.
To keep the setup simple, the first approach (raw RGB image) is used. HTTP is used as the communication protocol, with the image POSTed to a REST endpoint (http://&lt;ip-address&gt;:&lt;port&gt;/detect).
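Step 1 can be sketched as a short Node.js snippet that prepares the payload the way the benchmark below does: the raw RGB bytes are base64-encoded and wrapped in a multipart body. The 2x2 test frame and the field name "image" are illustrative assumptions, not details from the original setup; the boundary matches the one used in the ab command later on.

```javascript
// Sketch of the client-side payload prep (assumptions: a tiny 2x2 RGB frame
// and the form field name "image"; the boundary matches the ab command below).
const rawRgb = Buffer.from([
  255, 0, 0,      0, 255, 0,      // row 1: red, green pixels
  0, 0, 255,      255, 255, 255,  // row 2: blue, white pixels
]); // 2x2 image, 3 bytes per pixel = 12 bytes

// Base64-encode the raw bytes so they survive a text multipart body.
const encoded = rawRgb.toString('base64');

const boundary = '1234567890';
const body = [
  `--${boundary}`,
  'Content-Disposition: form-data; name="image"; filename="frame.rgb"',
  'Content-Type: application/octet-stream',
  '',
  encoded,
  `--${boundary}--`,
  '',
].join('\r\n');

// `body` is what gets POSTed to http://<ip-address>:<port>/detect.
console.log(body.split('\r\n')[0]);
```

Base64 inflates the payload by roughly a third, which is part of the price paid for sending raw RGB over a text-friendly transport.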
Step 2: Run inference on cloud
- TensorFlow.js is used to run inference on an EC2 (t2.micro) instance, with only a single Node.js worker instance (no load balancing, no failover, etc.).
- The MobileNet version used is hosted here.
- Apache Bench (ab) is used to collect latency numbers for the HTTP requests. In order to use ab, the RGB image is base64-encoded and POSTed to the endpoint; express-fileupload is used to handle the POSTed image.
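The server side of this step can be sketched as a stub handler: the real setup loads MobileNet via TensorFlow.js behind an express-fileupload route, which is not reproduced here. This sketch only shows the base64-to-raw-RGB decode that must happen before inference; the handler name, field shape, and stubbed model are assumptions for illustration.

```javascript
// Stub of the /detect handler (assumptions: express-fileupload hands us
// { data: <Buffer> } holding base64 text of raw RGB bytes; the model step
// is mocked — the real code would run MobileNet via TensorFlow.js here).
function handleDetect(uploadedFile) {
  // Decode base64 text back into the raw RGB byte buffer.
  const rawRgb = Buffer.from(uploadedFile.data.toString('utf8'), 'base64');

  // A real handler would build an input tensor from rawRgb and run the
  // model; here we just report the decoded size as a placeholder.
  const pixelCount = rawRgb.length / 3; // 3 bytes per RGB pixel
  return { pixels: pixelCount, detections: [] };
}

// Example: a 4-pixel (2x2) all-black frame, POSTed as base64.
const fakeUpload = { data: Buffer.from(Buffer.alloc(12).toString('base64')) };
const result = handleDetect(fakeUpload);
console.log(result.pixels);
```

Note that the decode cost runs on the same single Node.js worker that serves HTTP, so it is part of the measured per-request latency.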
Total latency (RGB) = HTTP Request + Inference Time + HTTP Response
```
ab -k -c 1 -n 250 -g out_aws.tsv -p post_data.txt -T "multipart/form-data; boundary=1234567890" http://<ip-address>:<port>/detect

This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking <ip-address> (be patient)
Completed 100 requests
Completed 200 requests
Finished 250 requests

Server Software:
Server Hostname:        <ip-address>
Server Port:            <port>

Document Path:          /detect
Document Length:        22610 bytes

Concurrency Level:      1
Time taken for tests:   170.875 seconds
Complete requests:      250
Failed requests:        0
Keep-Alive requests:    250
Total transferred:      5705000 bytes
Total body sent:        50267500
HTML transferred:       5652500 bytes
Requests per second:    1.46 [#/sec] (mean)
Time per request:       683.499 [ms] (mean)
Time per request:       683.499 [ms] (mean, across all concurrent requests)
Transfer rate:          32.60 [Kbytes/sec] received
                        287.28 kb/s sent
                        319.89 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   5.0      0      79
Processing:   530  683 258.2    606    2751
Waiting:      437  513 212.9    448    2512
Total:        530  683 260.7    606    2771

Percentage of the requests served within a certain time (ms)
  95%   1084
 100%   2771 (longest request)
```
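As a quick sanity check, the summary figures in the report follow directly from its raw totals; ab itself computes them from microsecond-resolution timings, so the division here lands within a millisecond of its 683.499 ms figure.

```javascript
// Recompute ab's summary figures from the totals in the report above.
const totalSeconds = 170.875; // "Time taken for tests"
const completed = 250;        // "Complete requests"

const meanMs = (totalSeconds / completed) * 1000; // mean time per request
const rps = completed / totalSeconds;             // requests per second

console.log(meanMs.toFixed(1), rps.toFixed(2));
```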
As we can see, the 95th-percentile request latency is around 1084 ms.
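For percentiles beyond what ab prints, the per-request timings it wrote via -g out_aws.tsv (the "ttime" column) can be post-processed. A minimal nearest-rank sketch, with made-up sample latencies standing in for the actual TSV contents:

```javascript
// Nearest-rank percentile over per-request total times in ms.
// The sample values are illustrative, NOT the actual out_aws.tsv data.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

const ttimesMs = [530, 555, 580, 606, 630, 660, 720, 810, 1084, 2771];
console.log(percentile(ttimesMs, 95), percentile(ttimesMs, 100));
```

The long tail (a max of 2771 ms against a 606 ms median) is typical for a single unwarmed t2.micro worker, which is exactly why percentiles tell more than the mean here.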