Virtual Machines for TensorFlow 2.0

Source: Deep Learning on Medium


Why risk your current workstation tool chain when you can try things out first?

Photo by Markus Spiske on Unsplash

Since TensorFlow 2.0 is now available, and a period of hectic business travel appears to have passed, I got to thinking about getting back to machine learning, software engineering, and doing what I would do if I weren’t travelling. You can read the announcement about TensorFlow 2.0 in the TensorFlow Medium publication. Just because the TensorFlow 2.0 Beta has emerged does not mean you should jump straight in, though. You need to test the changes, review your code base, and assess any changes that are likely to become necessary in due course. The question is how to do that evaluation without breaking your current set-up. The answer is to use a Virtual Machine (VM). You can either run one on a powerful local host with Red Hat (RHEL 7/8) or just head over to the Cloud. In this article we head over to the AWS cloud and spin up a VM for a road test.
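One low-risk way to start reviewing a code base is a quick scan for TensorFlow 1.x symbols that are no longer in the TensorFlow 2.0 top-level namespace, such as `tf.Session`, `tf.placeholder`, and `tf.contrib`. The sketch below is a hypothetical helper (`scan_source` and `find_tf1_usages` are not TensorFlow APIs), and its pattern list is an assumption that is far from complete. For an actual migration, the `tf_upgrade_v2` script that ships with TensorFlow is the more thorough tool.

```python
import re
from pathlib import Path

# Hypothetical, incomplete list of TF 1.x symbols that were removed from
# (or moved out of) the TensorFlow 2.0 top-level namespace.
TF1_PATTERNS = {
    "tf.Session": re.compile(r"\btf\.Session\b"),
    "tf.placeholder": re.compile(r"\btf\.placeholder\b"),
    "tf.contrib": re.compile(r"\btf\.contrib\b"),
    "tf.global_variables_initializer": re.compile(r"\btf\.global_variables_initializer\b"),
}


def scan_source(text):
    """Return (line_number, symbol) pairs for each TF 1.x pattern found in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for symbol, pattern in TF1_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, symbol))
    return hits


def find_tf1_usages(root):
    """Walk every .py file under root and map file path -> list of hits."""
    report = {}
    for path in Path(root).rglob("*.py"):
        hits = scan_source(path.read_text(encoding="utf-8", errors="replace"))
        if hits:
            report[str(path)] = hits
    return report
```

Calling `find_tf1_usages("my_project")` on a hypothetical project directory gives a rough list of files to look at first. Note that `tf.compat.v1.Session` is deliberately not flagged, because it still works under 2.0.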

AWS Marketplace Machine image offerings

Anyone who has ever had to configure a workstation for deep learning will know, only too well, how complicated things can get. The first step is a CUDA-capable graphics card, and NVIDIA provides clear information about all of its boards. Next you need to install the CUDA toolkit to match your hardware, and again NVIDIA has very clear instructions. Compiling the NVIDIA examples and running ‘deviceQuery’ can feel like a victory.

https://docs.nvidia.com/cuda/cuda-samples/index.html

Once you can run the ‘deviceQuery’ program, or any other NVIDIA example program, you have a verified CUDA-capable tool chain.
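On a GPU workstation where the NVIDIA driver is installed, `nvidia-smi` offers a quick complementary check alongside ‘deviceQuery’. The sketch below is minimal and assumes `nvidia-smi` is on the `PATH`. The fields it asks for (`name`, `driver_version`, `memory.total`) are standard `--query-gpu` fields, and the helper names are illustrative only.

```python
import shutil
import subprocess

# Ask nvidia-smi for one CSV row per GPU, without a header line.
QUERY = ["nvidia-smi", "--query-gpu=name,driver_version,memory.total",
         "--format=csv,noheader"]


def parse_gpu_csv(output):
    """Turn nvidia-smi CSV rows into a list of dicts, one per GPU."""
    gpus = []
    for row in output.strip().splitlines():
        name, driver, memory = [field.strip() for field in row.split(",")]
        gpus.append({"name": name, "driver_version": driver, "memory_total": memory})
    return gpus


def list_gpus():
    """Return visible GPUs, or an empty list if nvidia-smi is not installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)
```

An empty list from `list_gpus()` means no driver was found. If `nvidia-smi` exists but cannot talk to the driver, `check=True` raises an error instead of hiding the problem.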

That is good news, but next you descend into Python dependency hell. Consider Python 2 versus Python 3, and then the various Python 3.x versions. Then there are cuDNN versions, BLAS library versions, and many other dependencies you will wish you had never heard of. Thankfully, there has been a great answer for quite a while: the AWS Marketplace and its preconfigured machine images.
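A small script can at least make the dependency picture visible before you change anything. The sketch below compares installed package versions against a set of pins. The pins are hypothetical placeholders, not TensorFlow 2.0’s actual requirements. The version lookup uses `importlib.metadata`, which needs Python 3.8 or later.

```python
from importlib import metadata

# Hypothetical pins: replace with the versions your own project needs.
PINS = {"numpy": "1.16", "tensorflow": "2.0"}


def matches_pin(installed, pin):
    """True if the installed version starts with the pinned release components."""
    pin_parts = pin.split(".")
    return installed.split(".")[:len(pin_parts)] == pin_parts


def check_pins(pins, lookup=metadata.version):
    """Map each package to 'ok', 'missing', or 'mismatch (found X)'."""
    report = {}
    for package, pin in pins.items():
        try:
            installed = lookup(package)
        except metadata.PackageNotFoundError:
            report[package] = "missing"
            continue
        report[package] = "ok" if matches_pin(installed, pin) else f"mismatch (found {installed})"
    return report
```

Comparing version components as strings avoids parsing pre-release tags such as `2.0.0b1`. A real project would more likely use the `packaging` library’s specifier matching.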

Bitfusion was always my go-to vendor for Ubuntu-based configured images

With the release of TensorFlow 2.0 I went looking for an image to try the new library out, and I was not disappointed!

https://aws.amazon.com/marketplace/pp/B07MFRDXTB?ref_=aws-mp-console-subscription-detail

The ML Workbench for TensorFlow seemed like a good find. Signing up for the trial takes five minutes once you have an AWS account. Having found a potential configuration, the next step is to deploy the image on a virtual machine.

Running the ML Workbench image

Once you have subscribed to the trial, you can spin up a VM automatically using a button on the subscription page. The VM may take a few minutes to come up the first time, so be patient!
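If you would rather script the launch than click the button, the AWS SDK for Python (boto3) can start an instance from a subscribed AMI. The sketch below makes a few assumptions:

- The AMI ID and key pair name are placeholders. The real ID for your region comes from the Marketplace listing.
- The EC2 client is passed in, so the logic can be exercised without AWS credentials.
- It relies on your account’s default security group, which, unlike the Marketplace one-click launch, may not open the ports the image needs. A `SecurityGroupIds` entry is left out for brevity.

`run_instances` and the `instance_running` waiter are standard boto3 EC2 calls.

```python
def launch_workbench(ec2, ami_id, instance_type="t3.2xlarge", key_name=None, wait=True):
    """Start one instance from ami_id and return its instance ID.

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    params = {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
    }
    if key_name:
        params["KeyName"] = key_name
    response = ec2.run_instances(**params)
    instance_id = response["Instances"][0]["InstanceId"]
    if wait:
        # Block until EC2 reports the instance as running (first boot can take minutes).
        ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    return instance_id
```

A call such as `launch_workbench(boto3.client("ec2"), "ami-0123456789abcdef0", key_name="my-key")` would launch the machine. Both the AMI ID and the key name in that call are made-up placeholders.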

The ML Workbench image running on a t3.2xlarge
Each Amazon Machine Image (AMI) comes with usage instructions, available right from the EC2 console.

Open the Description tab and copy the EC2 instance’s public IP address. Paste that into your browser, and that is all there is to it!
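The same lookup can be scripted with boto3’s `describe_instances`, which returns the address as `PublicIpAddress`. The sketch below is minimal and makes one assumption: it builds an `https://` URL. The browser security warning suggests the image serves HTTPS with a certificate the browser does not trust, but check the AMI’s usage instructions for the actual scheme and port.

```python
def public_url(ec2, instance_id, scheme="https"):
    """Look up the instance's public IPv4 address and build a browser URL.

    `ec2` is expected to be a boto3 EC2 client; the default scheme is an assumption.
    """
    response = ec2.describe_instances(InstanceIds=[instance_id])
    instance = response["Reservations"][0]["Instances"][0]
    ip = instance.get("PublicIpAddress")
    if ip is None:
        # Stopped instances, or instances in subnets without public IPs, have none.
        raise RuntimeError(f"{instance_id} has no public IP address (is it running?)")
    return f"{scheme}://{ip}"
```

The public IP usually changes after a stop and start, so it is worth looking it up again each session rather than bookmarking it.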

Firefox may warn you about a security issue. I just accepted the risk.
The image allows a range of connection options.
Logging in to the Desktop provides an RDP connection.

With a little more exploration, we find that even Visual Studio Code is installed.

RDP connection — working with the Desktop directly

We can also work directly with Jupyter Notebooks, which is my preferred way to work for exploration and proof-of-concept work.

You can work with the MNIST dataset directly from the new machine

Okay, you might say that is a lot of screenshots. If you have never had to configure a workstation for the deep learning tool chain, you might not accept that these screenshots are a revolution for those of us who have suffered.

Once the model trains, you are good to go.
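For reference, the notebook run can be approximated with the MNIST example from the TensorFlow 2.0 beginner quickstart. This is a sketch, not the exact notebook in the screenshot. It assumes TensorFlow 2.x is installed and downloads MNIST on first use. The TensorFlow import sits inside the function so the module loads without TensorFlow. `final_accuracy` is a hypothetical convenience helper, not a Keras API.

```python
def final_accuracy(history):
    """Return the last training accuracy from a Keras History.history dict."""
    # TensorFlow 2.0 logs metrics=["accuracy"] under the key "accuracy".
    return history["accuracy"][-1]


def train_mnist(epochs=5):
    """Train a small dense classifier on MNIST and return (history, test_accuracy)."""
    import tensorflow as tf  # imported lazily so this module loads without TensorFlow

    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(x_train, y_train, epochs=epochs)
    # evaluate returns [loss, accuracy] because one metric was compiled in.
    _, test_accuracy = model.evaluate(x_test, y_test, verbose=2)
    return history.history, test_accuracy
```

Running `history, acc = train_mnist()` in a notebook cell on the VM trains for five epochs. On a CPU-only t3 instance this takes noticeably longer than on a GPU.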

What about costs? Couldn’t I do the same thing on my own computer for free? Let’s discuss costs and then wrap up.

AWS EC2 costs for this example

When I configured my instance, I chose a t3.2xlarge, which has 8 vCPUs and 32 GiB of RAM. You can see the cost schedule for that compute resource below. It costs $0.3341 per hour.

https://aws.amazon.com/ec2/instance-types/t3/
Using the subscription pricing tool

Adding the software subscription (the configuration know-how) at $0.07 per hour brings the full cost to about $0.40 per hour.
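The arithmetic is easy to script when comparing options. This minimal sketch uses the rates quoted above. The break-even figure assumes the $350.00 annual plan replaces only the $0.07 hourly software charge, not the instance charge, which is worth confirming on the listing.

```python
INSTANCE_RATE = 0.3341    # t3.2xlarge, USD per hour (as quoted above)
SOFTWARE_RATE = 0.07      # ML Workbench subscription, USD per hour
ANNUAL_SOFTWARE = 350.00  # assumption: replaces only the hourly software charge


def hourly_cost(instance_rate=INSTANCE_RATE, software_rate=SOFTWARE_RATE):
    """Combined instance + software cost per hour."""
    return instance_rate + software_rate


def session_cost(hours, **rates):
    """Cost of running the instance for the given number of hours."""
    return hours * hourly_cost(**rates)


def hours_for_budget(budget, **rates):
    """How many instance-hours a given budget buys."""
    return budget / hourly_cost(**rates)


def breakeven_hours(annual=ANNUAL_SOFTWARE, software_rate=SOFTWARE_RATE):
    """Hours of use per year after which the annual software plan is cheaper."""
    return annual / software_rate
```

Under that assumption, the annual plan only pays off after about 5,000 hours of use, well over half of the 8,760 hours in a year. A $10.00 budget buys roughly 24 hours at the hourly rates.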

Or just go annual for $350.00

The other interesting thing is that if you need more or less power, it is easy to vary the instance type. I could go from a t3.medium to a t3.2xlarge with just a setting change.

Select an alternative instance type and start up the instance.
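The console steps shown above map onto a handful of boto3 calls. EC2 only lets you change the type of a stopped instance, so the sketch below stops the instance, modifies its `InstanceType` attribute, and starts it again. It assumes the client is a boto3 EC2 client. It also assumes the Marketplace listing supports the new type, since listings can restrict which instance types an AMI runs on.

```python
def change_instance_type(ec2, instance_id, new_type):
    """Stop the instance, switch its type (e.g. "t3.medium"), and start it again."""
    # The instance type can only be modified while the instance is stopped.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2.modify_instance_attribute(InstanceId=instance_id, InstanceType={"Value": new_type})
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
```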

$350.00 might seem like a lot, especially as an annual fee. Could we build our own, or buy, for less?

Rent versus buy versus build

In the past I have built a few systems and completed the entire configuration myself. Once you build one system, the next ones are just variations on a theme, but getting the deep learning tool chain to run is always a challenge. When you build your own, you need to stick to a budget and match the components carefully. A custom-built system normally costs me around $1,500, and buying an off-the-shelf system is far more expensive. Here is a small illustration:

https://lambdalabs.com/deep-learning/laptops/tensorbook

So you could spend $3,355 or more on a laptop. Renting makes sense, because I will not leave the machine running for an entire year. In fact, I only run the machine when I need it, just as I would use the power switch on a physical computer. Just don’t forget to turn the instance off!

The t3 instance has stopped and I only get billed for actual hours used.
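Turning the instance off can also be scripted, which makes it harder to forget. The sketch below calls boto3’s `stop_instances` and reads the reported state. It stops the instance rather than terminating it, so the configured machine survives for the next session. Stopping pauses instance-hour charges, but the attached EBS storage is still billed until the volume is deleted, for example by terminating the instance.

```python
def stop_workbench(ec2, instance_id, wait=True):
    """Stop (not terminate) the instance so compute billing pauses; return its state.

    `ec2` is expected to be a boto3 EC2 client.
    """
    response = ec2.stop_instances(InstanceIds=[instance_id])
    state = response["StoppingInstances"][0]["CurrentState"]["Name"]  # usually "stopping"
    if wait:
        # Block until EC2 reports the instance as fully stopped.
        ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
        state = "stopped"
    return state
```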

The entire evaluation of TensorFlow 2.0 will likely cost me less than $10.00, and I get to keep my current configuration with no risk of a broken tool chain or any other dependency hell.

Conclusion

In this article I have illustrated some of the pain involved in setting up GPU-based deep learning and shown a good, inexpensive way to avoid it. Naturally there are other good ways to avoid the pain, such as Google Colab or Watson Studio. The important thing is to work with the data and models rather than wrestling with tool-chain error messages. I hope this article helps! Now go do some machine learning with a Virtual Machine or in the Cloud. It is way easier!