Source: Deep Learning on Medium
What's the deal?
Kaggle kernels and Google Colab give data science practitioners a chance to work with deep learning libraries preinstalled. These services are free, and since last year they also provide free GPU access … though for a limited amount of time. Google silently rolled out TPU support as well; though TPUs are better than GPUs in some cases, mostly we do not get any performance gain over GPUs.
For Kaggle it makes even more sense, because it becomes a very fair game between students and those who have costly GPUs available. When I used to participate a few years ago, I never tried running neural networks for any competition, as I could not afford a cloud GPU and did not own a graphics card myself. That made a huge difference in competitions in those days. Even hyperparameter tuning for XGBoost was a difficult task, as with only CPUs it could take a whole day!
Limitations in free services and cloud solutions
Before purchasing a new laptop with a Max-Q GPU, I tried out these services. They're free, so why not! But there are limitations which I, at least, cannot afford. One major limitation I faced is installing a new library beyond those that come preinstalled … it is very painful and most of the time not possible.
The second limitation is that notebooks become inactive after a few hours on Google Colab. This is a major issue, as deep learning algorithms take hours to train and you won't always be working in the notebook or doing something to keep it active. A workaround is to save the model every few epochs so that training can resume from there. But I hate this, as most of the time I start a run at night and go to sleep, and if the code stops after a few hours I have already lost a lot of time by the time I wake up. On Kaggle, when you commit and the code takes more than 1 hour, the script or notebook run fails. Sometimes it also fails randomly!
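The save-and-resume workaround can be sketched roughly as below. This is a toy loop using only the standard library, not the code I actually ran: the `train` function, the `weights` value and the `stop_after` switch (which simulates the notebook being killed) are all made up for illustration. In Keras the saving part is usually handled by the `ModelCheckpoint` callback instead.

```python
import json
import os
import tempfile


def train(total_epochs, ckpt_path, save_every=2, stop_after=None):
    """Toy training loop that checkpoints every `save_every` epochs.

    state["weights"] is a stand-in for real model weights.
    `stop_after` simulates the notebook dying after that many epochs.
    """
    # Resume from the last checkpoint if one exists, otherwise start fresh.
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)
    else:
        state = {"epoch": 0, "weights": 0.0}

    epochs_this_session = 0
    while state["epoch"] < total_epochs:
        if stop_after is not None and epochs_this_session >= stop_after:
            break  # simulated disconnect: unsaved epochs are lost

        state["weights"] += 0.5  # placeholder for one epoch of training
        state["epoch"] += 1
        epochs_this_session += 1

        if state["epoch"] % save_every == 0 or state["epoch"] == total_epochs:
            # Write to a temp file first so a crash mid-write
            # cannot corrupt the previous checkpoint.
            tmp_path = ckpt_path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump(state, f)
            os.replace(tmp_path, ckpt_path)
    return state


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
    train(10, path, save_every=2, stop_after=5)  # session dies at epoch 5
    final = train(10, path, save_every=2)        # resumes from epoch 4
    print("finished at epoch", final["epoch"])
```

Note that the session that dies at epoch 5 only has epoch 4 on disk, so one epoch is repeated on resume. That is exactly the lost time I was complaining about, just on a smaller scale.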
I wanted a solution that I could run overnight without any bother and with similar performance. Also, I do not always want to be connected to the internet. So I chose a laptop based on the GTX 1070 Max-Q, the most affordable I could find. I am aware that 8 GB of memory could be a limitation in some cases, when the data is huge and I have to limit the batch size to fit the training data in GPU memory. But for proof-of-concept work and experimentation that is not a limitation at all. Still, I wanted to check how this mobile GPU stacks up in performance against the 12 GB K80 GPU given for free by Google Colab and Kaggle kernels.
Hardware specs comparison
The K80 is an older "Kepler" architecture, but it has a 384-bit memory bus and 2496 CUDA cores. Don't be fooled by the 2x here: the K80 is a dual-GPU card, and free services like Kaggle and Google give you only one of its GPUs, with 12 GB of VRAM … actually it is 11 GB if you check yourself. Nonetheless, good for a free GPU.
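One way to check the memory yourself is to ask `nvidia-smi`, roughly as in this sketch. The helper names `parse_gpu_memory` and `visible_gpus` are my own for illustration, and the script assumes `nvidia-smi` is on the PATH; it simply prints nothing if it is not.

```python
import shutil
import subprocess

# nvidia-smi query that prints one "name, total memory in MiB" line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Turn 'name, MiB' lines into a list of (name, GiB) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, mib = line.rsplit(",", 1)
        gpus.append((name.strip(), int(mib) / 1024))
    return gpus


def visible_gpus():
    """Return the GPUs nvidia-smi can see, or [] if it is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True)
    if result.returncode != 0:
        return []
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    for name, gib in visible_gpus():
        print(f"{name}: {gib:.1f} GiB total")
```

In a notebook the same query can be run directly as a shell command; the Python wrapper is only there so the number can be logged alongside the benchmark results.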
In comparison, the GTX 1070 Max-Q has a newer architecture but a narrower 256-bit memory bus and fewer CUDA cores than the K80. But if you check the specs above, its core clocks and memory clocks are way better … almost double!
But will they even matter?
I chose the Keras examples to check the performance of these two GPUs. These examples are neatly written and easy to run. I ran the examples with a small number of epochs, and took only the MNIST examples this time. I used a Kaggle kernel notebook and compared it with my laptop GPU.
Here are the results of the runs. I ran each example multiple times and took the best result on both platforms.
As you can see in the comparison above, I noted down the validation accuracy and loss for each program where possible. I did not change the code in any way … just ran it. Accuracy and loss are nearly the same, as expected, but I did see that on the K80 the results varied across multiple runs. It was a similar case on Google Colab, where multiple runs gave slightly different accuracy. Something to bear in mind is that these platforms, though free, are shared. On my GTX 1070 the results were consistent.
The most notable thing in the chart above is the time to run each epoch. Check the Speedup column: the GTX 1070 Max-Q was 2 to 3 times faster every time. That was very surprising to me. My perception of GPU performance until now was that more memory bandwidth (a wider memory bus) and more CUDA cores are always better. But here higher memory and core clock speeds easily outperformed the K80. It may be down to the newer-generation architecture … I am not a hardware expert.
This experiment changes many things for me.
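For anyone repeating the comparison, the Speedup figure is just the K80's best time per epoch divided by the laptop's best time per epoch. A minimal sketch follows; the function names are my own and the timings are placeholders, not the measurements from my runs.

```python
def best_epoch_time(runs):
    """Best (lowest) seconds-per-epoch across repeated runs."""
    return min(runs)


def speedup(baseline_runs, candidate_runs):
    """How many times faster the candidate is than the baseline."""
    return best_epoch_time(baseline_runs) / best_epoch_time(candidate_runs)


if __name__ == "__main__":
    # Placeholder timings in seconds per epoch, NOT the numbers from this post.
    k80_runs = [12.0, 13.5, 12.6]
    gtx_runs = [5.0, 5.2, 5.1]
    print(f"speedup: {speedup(k80_runs, gtx_runs):.1f}x")
```

Taking the minimum over runs matches the "best result" rule I used; on a shared platform the slower runs mostly reflect what other users are doing, not the GPU itself.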
Real world scenarios
I have worked on neural-network-based projects before, and unlike Kaggle there are more challenges involved. The data was huge and run times, even with a GPU, were several hours. In most scenarios I was given a MacBook and an AWS p2.xlarge to work with. A p2.xlarge instance costs around 1 USD per hour, and a MacBook Pro costs a fortune (literally :P). So everything was a costly affair.
Experimentation and testing with this setup eventually gets costly. Why am I talking so much about cost? Because it matters when you work professionally. Your contribution and the project outcome are judged against the resources provided, and the IT team and management keep a keen eye on them.
Now, coming back to the main topic … in a nutshell, using a laptop with a GTX GPU makes more sense than using a MacBook Pro and an AWS GPU instance, both money-wise and performance-wise. In production the cloud has no replacement, but for everyday work it makes a lot of difference. Imagine running a forecast model in 1 hour instead of 2.5 hours … it gives you the flexibility to finish up daily tasks and a breather in case of any failure. Also, model tuning and testing can be 2–3 times faster, which is a huge improvement, for me at least!
I am not suggesting anything here … just showing some hard numbers. The comparison above was run on the Kaggle kernel service with a GPU, but the results should be similar for Google Colab and p2.xlarge, as all of them use the same GPU and there is no CPU bottleneck here. I also made sure that I was getting all 11 GB to use. At least now I am happy about my choice to buy a laptop.
Give data scientists a workstation or at least a GPU-based laptop to work with. It will certainly lift their spirits, and project delivery times will be shorter too.