Source: Deep Learning on Medium
An easy, step-by-step guide to installing Python, an IDE, deep learning frameworks (TensorFlow/Theano/PyTorch), Keras, OpenCV, dlib and other popular data science packages, with a short description of each.
Anyone who is new to data science and uses Windows often faces problems setting up a laptop for machine learning/deep learning. So in this article, I will share my personal experience of setting up a system for typical data science tasks.
First and foremost, let us begin with installing Python. Python is a powerful programming language for many different applications. In recent years Python has become popular in data science for its data analysis capabilities and its ever-growing community. So let’s begin with installing Python.
Click here to download Python 3.6.6 64-bit (Recommended*) or click here to download the latest version of Python. Make sure to tick “Add Python 3.6 to PATH” while installing, as this adds Python to the Windows PATH environment variable. Follow the screen below:
*The recommended version is 3.6.6, but you can always choose the latest version available.
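Once the installer finishes, a quick sanity check can confirm which interpreter Windows actually picks up and whether it is the 64-bit build. The sketch below uses only the standard library; the helper name `describe_python` is made up for this illustration.

```python
import shutil
import struct
import sys


def describe_python():
    """Collect basic facts about the interpreter running this script."""
    return {
        # Version as a dotted string, e.g. "3.6.6".
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        # Pointer size in bits: 64 on a 64-bit build, 32 on a 32-bit one.
        "bits": struct.calcsize("P") * 8,
        # Full path of the interpreter running this code.
        "executable": sys.executable,
        # What the shell finds for "python" on PATH (None if nothing is found).
        "on_path": shutil.which("python"),
    }


if __name__ == "__main__":
    for key, value in describe_python().items():
        print(key, "->", value)
```

If `on_path` comes back as `None`, the PATH option was probably not selected during installation.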
There are plenty of IDEs available on the internet, but PyCharm by JetBrains is the coolest IDE I’ve ever come across. It saves lots of coding time by assisting you with code completion, debugging, code navigation, etc. So let’s begin with installing PyCharm.
Click here to download PyCharm community.
After installing PyCharm (64-bit), let’s set it up and integrate it with the base Python we have just installed. (For simplicity, we are not creating a virtual environment here.) Follow along with the screen below:
Steps 1–3 are pretty straightforward: just create a new project to get started.
Steps 4–9 set up your project interpreter with the base Python on your system.
The Python environment is now set up, but we aren’t done yet. You need to install the data science packages too.
Python can import functions from a library (package), but the library has to be installed before its functions can be used. This is where pip comes in handy: pip is a package management system used to install Python libraries, and luckily it was installed along with Python.
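Because several Python installations can coexist on one machine, it sometimes helps to call pip through the interpreter itself, the same `python -m pip` form used for the pip upgrade command later in this article. A minimal sketch of that idea follows; `build_pip_install` and `pip_is_available` are hypothetical helper names, not part of pip.

```python
import importlib.util
import sys


def build_pip_install(package):
    """Return the command that installs `package` into the running interpreter."""
    # sys.executable is the full path of the current Python, so the package
    # lands in this installation rather than in another one found on PATH.
    return [sys.executable, "-m", "pip", "install", package]


def pip_is_available():
    """True if the pip module can be found by the running interpreter."""
    return importlib.util.find_spec("pip") is not None


if __name__ == "__main__":
    print("pip available:", pip_is_available())
    # Only print the command here; pass the list to subprocess.check_call(...)
    # to actually run the installation.
    print(" ".join(build_pip_install("numpy")))
```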
NumPy is the foundation library for scientific computing in Python. It lets us do mathematical and matrix computations on arrays far faster than plain Python loops, and it is widely used by analysts and data scientists.
Open a Command Prompt window, type “pip install numpy” and hit Enter to install NumPy. Yes! It’s that simple.
#Update pip with the command below, then install numpy
python -m pip install --upgrade pip
pip install numpy
Validate the installation by running the ‘import’ command followed by the library name (for example, import numpy) in the Python console.
Pandas provides Excel-like tables and high-performance tools for data manipulation and analysis, built around its DataFrame data structure. The name Pandas is derived from “panel data”, an econometrics term for multidimensional data sets.
In Command Prompt, type the command below to install pandas
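The same check can be scripted so that several libraries are verified in one go. This is only a sketch: `check_packages` is a name invented here, and it relies on the common (but not universal) convention that a package exposes a `__version__` attribute.

```python
import importlib


def check_packages(names):
    """Map each import name to its version string, or None if the import fails."""
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            results[name] = None
            continue
        # Most packages expose __version__, but not all of them do.
        results[name] = getattr(module, "__version__", "unknown")
    return results


if __name__ == "__main__":
    for name, version in check_packages(["numpy"]).items():
        print(name, "->", version if version else "NOT INSTALLED")
```

Note that the import name can differ from the pip name: scikit-learn is imported as `sklearn` and opencv-python as `cv2`.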
pip install pandas
The SciPy library builds on NumPy’s N-dimensional arrays and provides convenient, fast routines for mathematics and science, such as optimization, integration, linear algebra and statistics.
In Command Prompt, type the command below to install SciPy
pip install scipy
The scikit-learn library provides a range of supervised and unsupervised learning algorithms.
pip install scikit-learn
Deep Learning Frameworks:
TensorFlow, released by Google in 2015, is one of the world’s biggest open source projects and helps us design and build deep learning models. It performs numerical computations in the form of a dataflow graph.
pip install tensorflow
On a GPU-enabled machine, try TensorFlow for GPU; it is typically much faster than the CPU version for training deep learning models. You can find the performance test here. Install or update the NVIDIA driver, CUDA Toolkit and cuDNN before installing TensorFlow for GPU.
#only for GPU users
pip install pycuda scikit-cuda pygpu
pip install tensorflow-gpu
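Before choosing the GPU build, it may be worth checking that the NVIDIA driver tools are visible at all. The sketch below looks for `nvidia-smi`, the command-line utility that ships with the NVIDIA driver; `find_nvidia_smi` is a hypothetical helper name. Treat a negative result as a hint only, since the tool can be installed in a folder that is not on PATH.

```python
import shutil


def find_nvidia_smi():
    """Return the path to nvidia-smi if it is on PATH, else None."""
    # nvidia-smi is installed together with the NVIDIA display driver.
    return shutil.which("nvidia-smi")


if __name__ == "__main__":
    path = find_nvidia_smi()
    if path:
        print("NVIDIA driver tools found at", path)
    else:
        print("nvidia-smi not found on PATH; check the driver installation")
```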
Theano allows us to define, optimize, and efficiently evaluate mathematical expressions involving multi-dimensional arrays. Just like TensorFlow, it runs efficiently on both CPU and GPU architectures.
pip install Theano
PyTorch is a deep learning research platform that provides maximum flexibility and speed. It’s quite similar to TensorFlow.
pip install http://download.pytorch.org/whl/cpu/torch-0.4.1-cp36-cp36m-win_amd64.whl
pip install torchvision
The above pip command only works with Python 3.6 and installs the CPU version; for a different version of Python or for GPU support, follow this link.
Keras is a high-level neural networks library that runs on top of TensorFlow or Theano and is easy to understand.
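The Python 3.6 restriction is visible in the wheel file name itself: the `cp36` part is the Python tag. A small sketch, with helper names invented for this example, compares that tag with the interpreter you are running:

```python
import sys


def current_python_tag():
    """CPython tag of the running interpreter, e.g. 'cp36' for Python 3.6."""
    return "cp{}{}".format(sys.version_info.major, sys.version_info.minor)


def wheel_python_tag(wheel_name):
    """Extract the Python tag from a wheel file name.

    Wheel names follow name-version-pythontag-abitag-platformtag.whl,
    so the Python tag is the third field from the end.
    """
    stem = wheel_name[:-len(".whl")] if wheel_name.endswith(".whl") else wheel_name
    return stem.split("-")[-3]


if __name__ == "__main__":
    wheel = "torch-0.4.1-cp36-cp36m-win_amd64.whl"
    print("wheel built for:", wheel_python_tag(wheel))
    print("this interpreter:", current_python_tag())
```

If the two tags differ, pip will refuse the wheel, and a build matching your Python version is needed instead.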
pip install keras
OpenCV (Open Source Computer Vision Library), as the name suggests, is a computer vision and machine learning library. The library has more than 2500 optimized algorithms. These algorithms can be used to detect and recognize faces, identify objects, track moving objects, etc.
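#install only one of the two: opencv-contrib-python already includes the main modules
pip install opencv-python
#or, for the main modules plus the contrib (extra) modules
pip install opencv-contrib-python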
Dlib is written in C++ and also has a Python API. It’s mainly used for machine learning algorithms, including computer vision.
pip install dlib==19.5.1
I faced some issues with the latest version of dlib, so I suggest version 19.5.1.
Matplotlib is a visualization library used to make 2D graphics in a variety of interactive environments.
pip install matplotlib
Seaborn is a visualization library built on top of matplotlib. It provides a high-level interface for drawing attractive statistical graphics and is closely integrated with pandas data frames.
pip install seaborn
Natural Language Processing (NLP)
NLTK (Natural Language Toolkit) is one of the most powerful NLP libraries. Tokenization, stemming, lemmatization, character counts and classification modeling are just a few of its functionalities.
pip install nltk
Gensim is used for robust semantic analysis, topic modeling and vector-space modeling.
pip install gensim