Can we run multiple deep learning models on the same GPU?

Source: Deep Learning on Medium

Yes, as long as you have enough memory to host all the models, though the GPU's fans will be roaring like a jet. In TensorFlow, you can instantiate an individual session for each model and limit each session's GPU memory use so that they all fit on the same GPU. You can then call them simultaneously, as long as you're using multiple threads. If you want to retrain your model, though, you'd have to host a second copy of it, save the weights, and then reload them in your prediction session.
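
Here is a rough sketch of how that memory budgeting might look. The helper names `plan_memory_fractions` and `make_session` and the 10% headroom are my own assumptions for illustration, not part of TensorFlow. `per_process_gpu_memory_fraction` is the TensorFlow 1.x-style option, reachable through `tensorflow.compat.v1`. Note that the option is named per process, so this sketch does not promise that several sessions inside one process each get an independent cap.

```python
from typing import Dict


def plan_memory_fractions(
    total_mib: int, model_mib: Dict[str, int], headroom: float = 0.1
) -> Dict[str, float]:
    """Turn each model's memory need into a fraction of one GPU's memory.

    `headroom` is an assumed safety margin kept free on the GPU.
    """
    usable_mib = total_mib * (1.0 - headroom)
    needed_mib = sum(model_mib.values())
    if needed_mib > usable_mib:
        raise ValueError(
            f"models need {needed_mib} MiB but only {usable_mib:.0f} MiB is usable"
        )
    return {name: mib / total_mib for name, mib in model_mib.items()}


def make_session(fraction: float):
    """Create a TF 1.x-style session that asks for at most `fraction` of GPU memory."""
    # Imported lazily so the planning helper works without TensorFlow installed.
    import tensorflow.compat.v1 as tf

    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=fraction)
    config = tf.ConfigProto(gpu_options=gpu_options)
    # Each model gets its own graph so the models do not share variables.
    return tf.Session(graph=tf.Graph(), config=config)


if __name__ == "__main__":
    # Hypothetical sizes: an 8 GiB card hosting two models.
    plan = plan_memory_fractions(8192, {"classifier": 2048, "detector": 4096})
    print(plan)
```

The planner fails early when the models cannot all fit, which matches the "as long as you have enough memory" condition above.
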
But before you try this stunt, note that NVIDIA doesn't support GPU virtualization on consumer cards. To split one GPU among several tasks running at the same time, there are two possible ways:

1. Either divide the GPU's cores into multiple groups and divide the GPU memory accordingly. This is difficult, because CUDA only supports exclusive use of GPU memory, which means you would have to rewrite a big chunk of CUDA. GPU memory is also very important for deep learning training: with less of it, each model typically has to use smaller batches, which slows training down.

2. Or time-share the GPU, so that each model trains during its own time slice. Time sharing gives each running task the whole GPU memory.
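
The time-sharing idea in the second option can be sketched in plain Python. This is only an illustration under my own assumptions: the function name `run_time_shared` is hypothetical, a lock stands in for exclusive use of the GPU, and a real version would run an actual training step where the comment indicates.

```python
import threading
from typing import Dict, List, Tuple


def run_time_shared(steps_per_model: Dict[str, int]) -> List[Tuple[str, int]]:
    """Run each model's steps in its own thread, one step on the GPU at a time."""
    gpu_lock = threading.Lock()  # stands in for exclusive access to the one GPU
    log: List[Tuple[str, int]] = []

    def worker(name: str, steps: int) -> None:
        for step in range(steps):
            # Only the thread holding the lock may use the GPU, so during
            # its slice a model has the whole GPU memory to itself.
            with gpu_lock:
                # A real training step for `name` would run here.
                log.append((name, step))

    threads = [
        threading.Thread(target=worker, args=(name, steps))
        for name, steps in steps_per_model.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log


if __name__ == "__main__":
    for name, step in run_time_shared({"model_a": 3, "model_b": 2}):
        print(f"{name} ran step {step}")
```

The order in which the two models take turns is up to the thread scheduler, but each model's own steps always run in sequence and never overlap with the other model's.
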

But I would suggest buying another GPU. That could be a GTX 1060 or 1070, which cost around $500 to $700.

On the other hand, you should take a look at services like AWS SageMaker, which take care of autoscaling for both prediction and training and can seamlessly retrain the model without interrupting prediction. This will invariably use a second GPU instance for training, but it gets rid of the headaches related to prediction downtime.