Install TensorFlow GPU with CUDA 10.0 and cuDNN 7.4 for Python on Windows 10

Source: Deep Learning on Medium

Believe it or not, sometimes it takes a huge amount of time to get a particular dependency working properly. I was stuck for almost two days trying to install the latest versions of tensorflow and tensorflow-gpu along with CUDA, as most tutorials focus on CUDA 9.0. So, I decided to take a step forward and build it from source.

This version is dedicated to Windows, but let me know in the comments below if you want one for *NIX.

This tutorial focuses on installing tensorflow, tensorflow-gpu, CUDA, and cuDNN. At the time of writing this post, the latest version of TensorFlow was 1.12.

For simplicity, I will be using my D: drive for cloning TensorFlow, as some users might run into access-permission issues on the C: drive. Also, make sure you have at least 15 GB of free space.
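If you want to check programmatically that the target drive has enough room before you start, here is a minimal sketch; the function name and the 15 GB threshold come from this guide, and the drive letter in the comment is just an example:

```python
import shutil

def enough_free_space(path, need_gb=15):
    """Return True if the drive holding `path` has at least `need_gb` GB free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= need_gb

# On Windows, point it at the drive you will clone to, e.g.:
# print(enough_free_space("D:\\"))
```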

To be able to build TF from source, you first need a CUDA-compatible GPU, which you can check under Device Manager -> Display Adapters.

Device Manager | Display Adapters


Download and install the following Microsoft Products:

  • Visual C++ 2015 Redistributable Update 3 (x64)
  • Microsoft Build Tools 2015 Update 3

You can find the above software here:


Download and install CUDA. If you have previously installed any CUDA products, I would strongly recommend removing all existing CUDA drivers and rebooting the system.

Download CUDA here. Reboot the system after installation. You can check the installation by running the following command in cmd: nvcc --version
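If you would rather check from Python than eyeball the cmd output, here is a hedged sketch that parses the release number out of nvcc's banner; the regex assumes the usual "release X.Y" wording nvcc prints, and the commented subprocess call shows how you would feed it real output:

```python
import re
import subprocess

def cuda_version_from_nvcc(output):
    """Extract the release number (e.g. "10.0") from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None

# banner = subprocess.run(["nvcc", "--version"], stdout=subprocess.PIPE,
#                         universal_newlines=True).stdout
# print(cuda_version_from_nvcc(banner))
```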

You should get an output similar to this:

NVCC Version


Install MSYS2. It provides a set of basic Linux commands on Windows.
Download MSYS2 here and install it with all default settings selected.

Start MSYS by executing the following command in cmd:

"C:\msys64\msys2_shell.cmd" -use-full-path

It needs to be updated before first use, so copy and paste the commands below one by one.

pacman -Syu

Restart MSYS by closing and reopening it, then run:

pacman -Su

pacman -S git patch unzip zip ed

Confirm each installation prompt with [Y].


Install Bazel on Windows

Start MSYS and then run the following commands:

cd /c/

mkdir bazel

cd bazel/

wget -O bazel.exe


Install cuDNN

Download cuDNN from here (may require you to sign up).

Extract the archive to some location. Go to the extracted folder, copy all the files and folders (bin, include, lib, etc.), and paste them into "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0".
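The copy step above can also be scripted. This is only a sketch, not the official install method: the source path in the comment is a hypothetical extract location, and on a real system you would run it from an elevated prompt, since Program Files is protected:

```python
import os
import shutil

def merge_copy(src, dst):
    """Recursively copy everything under src into dst, merging with existing dirs."""
    for root, dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target = dst if rel == "." else os.path.join(dst, rel)
        os.makedirs(target, exist_ok=True)
        for name in files:
            shutil.copy2(os.path.join(root, name), os.path.join(target, name))

# Hypothetical paths -- adjust to wherever you extracted the cuDNN archive:
# merge_copy(r"D:\Downloads\cudnn\cuda",
#            r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0")
```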


Install the following Python dependencies before proceeding to download TensorFlow, as this is the last prerequisite step (😅):

pip install pip six numpy wheel mock
pip install keras_applications
pip install keras_preprocessing


Configuring TensorFlow: cloning the source and running the build configuration.

Again, start MSYS if it's not running and run the following commands:

export PATH=/c/bazel:$PATH
cd /d/
git clone
cd tensorflow/
git checkout r1.12

Now run python ./configure.py to start the build configuration. Specify the Python path and library location, or press Enter to accept the defaults.

Please specify the location of python. [Default is C:\Users\xyz\AppData\Local\Programs\Python\Python36\python.exe]: {press enter}
Found possible Python library paths:
Please input the desired Python library path to use.  Default is [C:\Users\xyz\AppData\Local\Programs\Python\Python36\lib\site-packages] {press enter}
Do you wish to build TensorFlow with Apache Ignite support? [Y/n]: Y

Do you wish to build TensorFlow with XLA JIT support? [y/N]: N

Do you wish to build TensorFlow with ROCm support? [y/N]: N

Do you wish to build TensorFlow with CUDA support? [y/N]: Y

Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 9.0]: 10.0
Please specify the location where CUDA 10.0 toolkit is installed. Refer to README.md for more details. [Default is C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.0]: {press enter}
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]:
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.0]: {press enter}

You also need to check your GPU's compute capability. You can look up the number here. (The GTX 1060, 1070, and 1080 have a compute capability of 6.1.)

Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5, 7.0]: 6.1
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is /arch:AVX]: {press enter}
Would you like to override eigen strong inline for some C++ compilation to reduce the compilation time? [Y/n]: Y
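As a quick reference for the compute-capability prompt, here are a few common NVIDIA cards; I have double-checked these values against NVIDIA's published table, but verify your exact model there:

```python
# Compute capability for a few common NVIDIA cards.
COMPUTE_CAPABILITY = {
    "GeForce GTX 1060": "6.1",
    "GeForce GTX 1070": "6.1",
    "GeForce GTX 1080": "6.1",
    "Titan V": "7.0",
    "Tesla V100": "7.0",
}

print(COMPUTE_CAPABILITY.get("GeForce GTX 1060"))  # 6.1
```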


There’s a small bug in the source code that makes compilation of Eigen’s half-precision (float16) support fail. So, you need to edit a few files before proceeding to the next step.

  • Download eigen_half.patch.
  • Paste the file to: D:\tensorflow\third_party
  • Open the following file: D:\tensorflow\tensorflow\workspace.bzl in a text editor.
  • Search for name = "eigen_archive",
    and add the following line in that block:
    patch_file = clean_dep("//third_party:eigen_half.patch"),
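After the edit, the eigen_archive entry in workspace.bzl should look roughly like this (the other fields are omitted here and stay as they are in your checkout; field order does not matter in Starlark):

```python
tf_http_archive(
    name = "eigen_archive",
    ...
    patch_file = clean_dep("//third_party:eigen_half.patch"),
)
```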


Build TensorFlow using Bazel. This process takes a long time; for me it was roughly 70 minutes, but it depends heavily on your system configuration.

bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

This will generate the package-builder binary in the location below. The exact path is printed in the last couple of lines of output once the compilation is successful and the build is ready.


Navigate to your build folder and issue the following command:

build_pip_package tensorflow_pkg

The above command will generate a tensorflow_pkg folder containing a wheel file. You can install it using pip install {wheel_file_name_here}
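If you want to script the final install step, here is a small sketch for locating the freshly built wheel; the tensorflow_pkg folder name matches the command above, and the pip invocation is shown commented since it modifies your environment:

```python
import glob
import os

def newest_wheel(pkg_dir):
    """Return the most recently modified .whl in pkg_dir, or None if none exist."""
    wheels = glob.glob(os.path.join(pkg_dir, "*.whl"))
    return max(wheels, key=os.path.getmtime) if wheels else None

# wheel = newest_wheel("tensorflow_pkg")
# then install it, e.g. (needs `import subprocess, sys`):
# subprocess.check_call([sys.executable, "-m", "pip", "install", wheel])
```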


Sample code to run on a CUDA GPU:

import tensorflow as tf
from datetime import datetime

# Device to use: "cpu" or "gpu"
device_to_use = "gpu"

i = 1000
shape = (i, i)

if device_to_use == "gpu":
    device_name = "/gpu:0"
else:
    device_name = "/cpu:0"

# Build the graph on the chosen device: a random matrix, its Gram product,
# and a reduction so the whole computation collapses to one scalar.
with tf.device(device_name):
    random_matrix = tf.random_uniform(shape=shape, minval=0, maxval=1)
    dot_operation = tf.matmul(random_matrix, tf.transpose(random_matrix))
    sum_operation = tf.reduce_sum(dot_operation)

startTime = datetime.now()
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as session:
    result = session.run(sum_operation)

print("\n" * 3)
print("Shape:", shape, "Device:", device_name)
print("Result:", result)
print("Time taken:", datetime.now() - startTime)
print("\n" * 3)

Hopefully the above code runs fine. I was getting an average time of 4.10 seconds on the GPU and 46.5 seconds on the CPU with a matrix shape of (15000, 15000).