Source: Deep Learning on Medium
Virtual Machines for TensorFlow 2.0
Why risk your current workstation tool chain when you can try first?
Since TensorFlow 2.0 is now available, and a period of hectic business travel appears to have passed, I got to thinking about getting back to machine learning and software engineering, and to doing what I would do if I weren’t travelling. You can read the announcement about TensorFlow 2.0 in the TensorFlow Medium publication. Now, just because TensorFlow 2.0 Beta has emerged does not mean you should jump straight in. You need to test the changes, review your code base, and assess any changes that are likely to become necessary in due course. The question is how to do such an evaluation without breaking your current set-up. The answer is to use a Virtual Machine (VM): you could run one on a powerful local host, with Red Hat (RHEL 7/8), or just head over to the cloud. In this article we head to the AWS cloud and spin up a VM for a road test.
AWS Marketplace Machine image offerings
Anyone who has ever had to configure a workstation for deep learning knows, only too well, how complicated things can get. First you need a CUDA-capable graphics card. NVIDIA provide clear information about all their boards, and checking this is the first step: the workstation must have a CUDA-capable card. Next you need to install the CUDA toolkit to match your hardware. Again, there are very clear instructions over at NVIDIA. Compiling the NVIDIA examples and running ‘deviceQuery’ can feel like a victory.
Once you can run the ‘deviceQuery’ program, or any other NVIDIA-provided example, you have a verified CUDA-capable toolchain.
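If you script your workstation setup, you can check the ‘deviceQuery’ result automatically rather than eyeballing the output. A minimal sketch in Python; the path to the compiled sample binary is an assumption, so adjust it for wherever your CUDA samples were built:

```python
import subprocess

def device_query_passed(output: str) -> bool:
    """Return True if deviceQuery output reports a passing result."""
    return "Result = PASS" in output

if __name__ == "__main__":
    # Hypothetical path to the compiled CUDA sample; adjust for your install.
    binary = "./cuda-samples/bin/x86_64/linux/release/deviceQuery"
    try:
        result = subprocess.run([binary], capture_output=True, text=True)
        if device_query_passed(result.stdout):
            print("CUDA toolchain verified")
        else:
            print("deviceQuery ran but did not report PASS")
    except FileNotFoundError:
        print("deviceQuery binary not found; build the CUDA samples first")
```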
That is usually good news, but then you descend into dependency hell with Python. Consider Python 2 versus Python 3, and then the versions of Python 3.x. Next there are cuDNN versions, BLAS library versions, and many other dependencies you will wish you had never heard of. Thankfully, there has been a great answer for quite a while: the AWS Marketplace and its pre-configured machine images.
With the release of TensorFlow 2.0 I went looking for an image to try the new library out, and I was not disappointed!
The ML Workbench for TensorFlow seemed like a good find. Signing up for the trial takes 5 minutes when you have an AWS account. Having found a potential configuration, the next step is to deploy the image to a virtual machine.
Running the ML Workbench image
Once you have subscribed to the trial, you can spin up a VM automatically using a button on the subscription page. The VM may take a few minutes to come up the first time. Be patient!
Visit the description tab and retrieve the EC2 instance’s external IP address. Drop that into the browser and that is all there is to it!
With a little more exploration we find that Visual Studio Code is installed too.
We can also work directly with Jupyter Notebooks, which is my preferred way to work for exploration and proof-of-concept work.
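The first cells I run in a fresh notebook just confirm the TensorFlow version and GPU visibility. A minimal sketch, assuming a TensorFlow 2.0 environment; the helper degrades gracefully if TensorFlow is absent, and the `tf.config.experimental` call is the TF 2.0 spelling (later releases expose it as `tf.config.list_physical_devices`):

```python
def tf_environment_summary():
    """Return (version, gpu_names) for the installed TensorFlow, or None if absent."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # TF 2.0 API; deprecated-but-present alias in later releases.
    gpus = tf.config.experimental.list_physical_devices("GPU")
    return tf.__version__, [g.name for g in gpus]

if __name__ == "__main__":
    summary = tf_environment_summary()
    if summary is None:
        print("TensorFlow is not installed in this environment")
    else:
        version, gpus = summary
        print(f"TensorFlow {version}, GPUs: {gpus or 'none visible'}")
```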
Okay, that is a lot of screenshots, and if you have never faced configuring a workstation for the deep learning toolchain, you might not accept that they represent a revolution for those who have suffered.
What about costs? Couldn’t I do the same thing on my own computer for free? Let’s discuss costs and then wrap up.
AWS EC2 costs for this example
When I configured my instance, I chose a t3.2xlarge, which has 8 vCPUs and 32 GB of RAM. You can see the cost schedule for that compute resource below. It costs $0.3341 per hour.
Adding the software subscription, which pays for the configuration know-how, at $0.07 per hour brings the full cost to roughly $0.40 per hour.
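The arithmetic is simple enough to sketch. The two rates below are the on-demand figures quoted above; the 24-hour evaluation window is my own assumption about how much actual run time the road test needs:

```python
# On-demand rates quoted above (USD per hour).
INSTANCE_RATE = 0.3341   # t3.2xlarge compute
SOFTWARE_RATE = 0.07     # ML Workbench software subscription

def hourly_cost() -> float:
    """Combined hourly rate: instance plus software subscription."""
    return INSTANCE_RATE + SOFTWARE_RATE

def evaluation_cost(hours: float) -> float:
    """Total cost of keeping the instance running for the given hours."""
    return hours * hourly_cost()

if __name__ == "__main__":
    print(f"Hourly: ${hourly_cost():.4f}")
    # Assumed: roughly 24 hours of actual use for the whole evaluation.
    print(f"24-hour evaluation: ${evaluation_cost(24):.2f}")
```

At that rate, a full day of accumulated run time comes in under ten dollars.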
The other interesting thing is that, if you need more or less power, it is easy to change the instance type. I could go from t3.medium to t3.2xlarge with just a setting change.
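In the console that setting change is a drop-down on the stopped instance. If you prefer to script it, here is a hedged sketch built around boto3’s `modify_instance_attribute` call; the instance ID is a placeholder, and the instance must be stopped before you resize it:

```python
def resize_request(instance_id: str, new_type: str) -> dict:
    """Build the keyword arguments for an EC2 instance-type change."""
    return {"InstanceId": instance_id, "InstanceType": {"Value": new_type}}

if __name__ == "__main__":
    kwargs = resize_request("i-0123456789abcdef0", "t3.2xlarge")  # placeholder ID
    print(kwargs)
    # To apply the change (requires boto3, AWS credentials, and a stopped instance):
    # import boto3
    # ec2 = boto3.client("ec2")
    # ec2.modify_instance_attribute(**kwargs)
    # ec2.start_instances(InstanceIds=[kwargs["InstanceId"]])
```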
$350 might seem like a lot, especially on an annual basis. Could we build our own, or buy one, for less?
Rent versus buy versus build
In the past I have built a few systems and completed the entire configuration myself. Once you build one system, the next ones are just variations on a theme. Getting the deep learning toolchain to run is always a challenge. When you build your own, you need to keep to a budget and match the components up carefully. A custom-built system normally costs me around $1,500. Buying an off-the-shelf system is far more expensive. Here is a small illustration.
So you could spend $3,355 on a laptop, or more. Renting makes sense, and I will not leave the machine running for an entire year. In fact, I only run the machine when I need to, which is like using the power switch on a physical computer. Just don’t forget to turn the instance off!
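To see where renting stops making sense, we can compare the combined hourly rate (instance plus software subscription, as quoted earlier) with the hardware prices above. A quick sketch; the $1,500 build and $3,355 laptop figures come from the text, and the break-even ignores electricity and resale value:

```python
HOURLY_RATE = 0.3341 + 0.07  # t3.2xlarge + software subscription, USD per hour

def break_even_hours(purchase_price: float, hourly_rate: float = HOURLY_RATE) -> float:
    """Hours of rented use before the rental bill matches buying outright."""
    return purchase_price / hourly_rate

if __name__ == "__main__":
    for label, price in [("custom build", 1_500), ("off-the-shelf laptop", 3_355)]:
        hours = break_even_hours(price)
        print(f"{label}: ~{hours:,.0f} hours (~{hours / 24:.0f} days of 24/7 use)")
```

For intermittent use, measured in tens of hours, the rented instance is far ahead of either purchase.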
Most likely the entire evaluation of TensorFlow 2.0 will cost me less than $10.00, and I get to keep my current configurations with no risk of a broken toolchain or other dependency-hell surprises.
In this article I have illustrated some of the pain involved in GPU-based deep learning and offered a good option to help you avoid that pain cheaply. Naturally there are other good ways to avoid it, such as Google Colab or Watson Studio. The important thing is to work with the data and models rather than wrestling with toolchain-related error messages. I hope this article helps! Now go do some machine learning with a virtual machine or in the cloud. It is way easier!