Data Science with Python: Getting Started

Original article can be found here (source): Artificial Intelligence on Medium

If you’re new to all this deep learning stuff, don’t worry: I’ll take you through it step by step. If you’re an old hand, you might want to skip ahead a few posts. I do, however, assume that you’ve been coding for at least a year and that, if you haven’t used Python before, you’ll put in the extra time to learn whatever Python you need as you go.

If you have a computer, an internet connection, and the will to put in the work, that’s about all you require. You don’t need much data, you don’t need university-level math, and you don’t need a giant data centre.

You’ll be surprised how easy it is to get started!

Do you need a GPU?

GPUs (Graphics Processing Units) are specialized computer hardware created to render images at high frame rates. Since graphics texturing and shading require more matrix and vector operations executed in parallel than a CPU (Central Processing Unit) can reasonably handle, GPUs were made to perform these calculations more efficiently.

It so happens that deep learning also requires very fast matrix computations. So researchers put two and two together and started training models on GPUs, and the rest is history. Training speed in deep learning is largely determined by how many floating-point operations per second (FLOPS) the hardware can perform, and GPUs are highly optimized for exactly that.

Source: fast.ai

In the chart above, you can see that GPUs (red/green) can theoretically do 10–15x the operations of CPUs (in blue). This speedup very much applies in practice too.
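To make "floating-point operations per second" a bit more concrete, here is a rough sketch of my own (not taken from the fast.ai chart) that counts the operations in a naive matrix multiply and times it. The function names `matmul`, `matmul_flops` and `measure_flops_per_second` are made up for illustration. Keep in mind that this measures pure Python on one CPU core, so the figure it prints will be far below what your hardware can actually do with an optimized library.

```python
import time


def matmul(a, b):
    """Multiply two matrices given as lists of rows (naive triple loop)."""
    n, k, m = len(a), len(b), len(b[0])
    result = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            total = 0.0
            for p in range(k):
                total += a[i][p] * b[p][j]  # one multiply + one add
            result[i][j] = total
    return result


def matmul_flops(n, k, m):
    """Approximate FLOP count: one multiply and one add per inner step."""
    return 2 * n * k * m


def measure_flops_per_second(size=60):
    """Time one size x size multiply and return the achieved FLOP/s."""
    a = [[float(i + j) for j in range(size)] for i in range(size)]
    b = [[float(i - j) for j in range(size)] for i in range(size)]
    start = time.perf_counter()
    matmul(a, b)
    # Guard against a zero reading on very coarse timers.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return matmul_flops(size, size, size) / elapsed


if __name__ == "__main__":
    print(f"{measure_flops_per_second():,.0f} FLOP/s in pure Python")
```

The inner loop is where all the work happens, and every iteration is independent of the others. That independence is what lets a GPU run thousands of them at once.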

If you would like to train anything meaningful in deep learning, a GPU is what you need, specifically an NVIDIA GPU, which is currently the fastest and most widely supported option for deep learning.

But as attractive as GPUs seem, you DON’T need one while you’re getting started. Unless your project is advanced enough to require a huge number of calculations, your CPU can handle it. However, if you do want a GPU and your computer doesn’t have one built in, I would suggest renting access to a computer that already has everything you need pre-installed and ready to go. Costs can be as little as US$0.25 per hour while you’re using it.
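If you’re not sure whether your machine already has an NVIDIA GPU, one quick way to check is to look for the `nvidia-smi` tool that ships with NVIDIA’s drivers. The sketch below is only a heuristic and rests on an assumption: if the tool is missing, the function (a hypothetical helper I’ve called `nvidia_gpu_names`) reports no GPU, even though a card without drivers installed would also end up there.

```python
import shutil
import subprocess


def nvidia_gpu_names():
    """Return names of visible NVIDIA GPUs, or an empty list if none are found."""
    # nvidia-smi is installed alongside the NVIDIA driver; no tool, no answer.
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        completed = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=10,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        # The tool exists but failed (e.g. driver problem): treat as "no GPU".
        return []
    return [line.strip() for line in completed.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = nvidia_gpu_names()
    print(gpus if gpus else "No NVIDIA GPU detected; the CPU will do for now.")
```

An empty list is perfectly fine for this series. It just means you’ll be running on the CPU until you decide to rent a GPU machine.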

Code Editors & Environments

Visual Studio Code is my go-to code editor

In Data Science, the general advice (especially if you’re a beginner) is to use a beginner-friendly environment like Jupyter or Anaconda, but I use VS Code, which I’ve configured to support my Data Science projects.

Prior Knowledge on Python

This mini-series on Data Science does assume you’ve been coding for at least a year. It doesn’t matter which language — as long as you’ve had good experience with programming, you should be fine. If you aren’t familiar at all with Python, don’t fret! I’ll link helpful resources along the way.

If you haven’t had any experience with code, I’d recommend learning Python. It’s very easy to pick up, and it’s the programming language we’ll be using in this Data Science mini-series.

Helpful Resources

Quick Resources to gain (or refresh) your Python knowledge

Intermediate programmers:

Advanced programmers (but maybe new to Python):

Python numeric programming:

This is worth a read whether you’re a beginner or advanced programmer. We’ll be using a lot of numeric programming throughout this series.