Online vs. Offline Machine Learning: Which One to Use?

So you have decided to implement machine learning in your app and want to know all of your options? Or do you just want to learn more about the different approaches? Then this article is for you.

If you are reading this article, you probably know, or at least have some idea of, what machine learning is, but let's review it, shall we?

Machine learning has many definitions, but the most accepted and accurate one is:

Machine learning is the ability for computers to use statistical techniques in order to “learn” with data, without being explicitly programmed.

This definition has three main parts: statistical techniques, data, and not being explicitly programmed. But what do these really mean?

Well, based on this definition, most machine learning solutions combine three main steps:

  1. Providing the Data
  2. Training a model from the provided data
  3. Using the trained model to make predictions

Let's use an example to make this more understandable. Say you decide to create an app that can recognize different types of flowers, such as Aconitum, Anthurium, Narcissus, etc.


For that, you first need to gather a data set including pictures of all the different flowers the app should recognize.

Then, in the next step, you need to train a model with the specific purpose of recognizing flower types, one that returns exactly the information the app requires.

At last, with the information provided by the model, the app can predict the answer.
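The three steps above can be sketched in a few lines of Python. This is only a toy illustration, not a real flower classifier: the "pictures" are reduced to made-up (petal length, petal width) feature pairs, and the "model" is a simple nearest-centroid rule.

```python
# Toy sketch of the data -> train -> predict pipeline.
# Features are hypothetical (petal length, petal width) pairs, not real data.

# Step 1: provide the data (feature vector -> flower label).
training_data = [
    ((1.4, 0.2), "Narcissus"),
    ((1.3, 0.3), "Narcissus"),
    ((4.7, 1.4), "Anthurium"),
    ((4.5, 1.5), "Anthurium"),
]

# Step 2: "train" a model; here, just the centroid of each class.
def train(data):
    sums, counts = {}, {}
    for (x, y), label in data:
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {label: (sx / counts[label], sy / counts[label])
            for label, (sx, sy) in sums.items()}

# Step 3: use the trained model to make a prediction.
def predict(model, point):
    def dist2(c):
        return (c[0] - point[0]) ** 2 + (c[1] - point[1]) ** 2
    return min(model, key=lambda label: dist2(model[label]))

model = train(training_data)
print(predict(model, (1.5, 0.25)))  # -> Narcissus
```

A real solution would of course use image pixels and a proper learning algorithm, but the three-step shape stays the same.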

There are many important decisive factors when it comes to implementing machine learning: How accurate should it be? What size of data set is needed? Should it require network connectivity? What sort of model do you need? Based on your app, you then decide whether you want to:

  1. Use an existing model or train your own?
  2. Train on your own computer or use the cloud?
  3. Access the trained model and do the predictions locally on device (offline) or in the cloud (online)?

So now let's compare the different approaches, following the order of the questions above, one by one.

Use an existing model

One of the main questions is whether you need to train your own model, or whether that isn't strictly necessary and you can use one that already exists.

Quickest and easiest way to start

If you don't need your own model, then one of the quickest and easiest ways to start is to use someone else's model that is already trained and ready to use. (There is another easy and cheap way: using an offline machine learning SDK like Core ML or ML Kit, which we will discuss in detail later.)

There are many companies providing machine learning packages designed for a specific purpose, such as image classification, text analysis, speech recognition, etc., that you can make use of.

Some of the most popular service providers are:

If these services cover one of the specific tasks your app is supposed to do, they are definitely worth taking a look at and considering.

You should note that by using these services you cannot access their models directly; you can only use them through an API.

architecture of how fully managed machine learning works

How does it work? Your app simply needs to send an HTTPS request to the web service with the required data, like the picture of the flower in our example, and within a matter of seconds the service replies with the requested information.
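As a sketch, such a request might be built like this in Python's standard library. The endpoint URL, API key, and JSON payload shape here are hypothetical; each provider documents its own format, and its SDK usually hides this step entirely.

```python
import base64
import json
import urllib.request

# Hypothetical endpoint and key: substitute the real values from your
# chosen provider's documentation.
API_URL = "https://api.example-vision.com/v1/classify"
API_KEY = "YOUR_API_KEY"

def build_classify_request(image_bytes: bytes) -> urllib.request.Request:
    """Package an image as a JSON payload for an HTTPS prediction call."""
    payload = json.dumps({
        "image": base64.b64encode(image_bytes).decode("ascii"),
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

# To actually call the service (requires a valid URL and key):
# req = build_classify_request(open("flower.jpg", "rb").read())
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```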

How do you keep the model up to date? One of the biggest advantages of using a web service is that you don't need to worry about the model or deal with all the headaches of re-training and updating it, as it is all done automatically by the web service itself. These providers are constantly updating their models, so the best part is that you don't have to know anything about machine learning in order to use them.

Do I have to pay? Of course. With these services you usually pay per request. Most of them provide an SDK which makes it really easy to call the service's API endpoint.

What do I need to know to start? Nothing; that's the best thing about it. You don't need any kind of background in machine learning: you simply call the API and get the information you are looking for. You just need to read the documentation of the web service you choose to learn how to call its API endpoints.

Pros:

  • It is very simple and easy to use, and you don't need any kind of background in machine learning.
  • You don't need to deal with the headaches of re-training or updating your model; it is all done automatically.

Cons:

  • You need to pay for the service, typically around $1 per 1,000 requests.
  • These services only work with the most common data types, such as pictures, videos, and speech, so if your app uses some kind of unique data type then this won't be the best option.
  • You cannot have direct access to the trained model, which means that if you want to do inference locally on the device, you cannot.
  • There is a small delay between sending a request and getting the result.
  • Your app won't work without network connectivity.

You should note that some of these services actually allow you to do a bit of training. For instance, with Clarifai you can create your own model by uploading custom training images. Also, with ML Kit for Firebase (in beta) you can upload your own existing TensorFlow Lite model.

Train on your own computer or use the cloud?

(If you already have your trained model and don't need to create one, skip this part.)

Training your model

Training is usually a really difficult and expensive process, and it only makes sense for big applications and businesses. Small applications more often use someone else's model, as discussed in the previous part. But if your data is unique, or the existing solutions don't satisfy your needs, then you will need to train your own model.

The accuracy and success rate of your trained model depend on the amount of data you provide for training. Basically, in most cases, the more data you provide, the better the chance that the model makes the right prediction.

It also depends on your app, the range of possible predictions, and how important it is to get the right answer all the time at the cost of much more work.

Usually, training the model with the provided data set is done in one of three ways: on your own computer, in the cloud, or on the device itself.

When should I use my own computer, and when the cloud? Well, it totally depends on the amount of data provided.

  • Small data set: when you have a small data set, you can train your model on your own computer or any other machine you have.
  • Large data set: as training requires much more power, you need a machine with multiple GPUs; it is really a job for a high-performance computer.

Providing high-performance machines for such work is usually really expensive and out of reach for most developers, unless you have your own data center. For that reason, in most cases it makes more sense to just rent computing power, as there are many cloud platforms happy to provide it. In the long run, though, if you re-train your model often or your data changes a lot, it might actually be better for big businesses to run their own data center.

In the end, it all depends on your situation: would renting be cheaper than actually buying?

Then what about training on device? In the case of a small data set, if all the data needed to train the model is available on the user's device, then there is the option to do all the work locally on the device itself. But again, this is only possible for a small data set that doesn't require much computing power. As the computing power of smartphones increases day by day, and with the recent introduction of SDKs such as Core ML and ML Kit, this might even become the best option.

Training in the cloud

If you decide to choose the cloud for training your model, then you will be facing two options.

Either you can use hosted machine learning, or you can use general-purpose cloud computing.

In most cases, renting cloud capacity seems like a much better and cheaper option compared to actually buying a high-performance computer. When you choose the cloud, your hands are open and it is very flexible depending on your needs. For instance, if you want more compute power, you can simply upgrade your current package for that period of time.

So first, let's start with the most affordable and suitable option, which is renting hosted machine learning.

Renting hosted machine learning:

architecture of how hosted machine learning works

If the first option, using a fully managed machine learning service, isn't suitable for you because you have special data that the service cannot handle, then this option might be the best one for you.

This is a middle ground between using fully managed machine learning and doing everything yourself in the cloud.

In this scenario you don't need any knowledge of how machine learning works or how to train your model. You simply upload your data set to the service, it handles everything automatically, and it lets you use your trained model by calling it through an API.

What this means is that in most cases you won't be able to download your trained model, so you can't do inference on the device (offline); you are locked into that service, and you can only access your trained model through it.

You should note that there is an exception to this, which is Google's new Cloud Machine Learning Engine. Unlike all the other service providers, this cloud service lets you not only train your model but also export it, which lets you do predictions offline.

Pros:

  • You don't need to deal with training the model. Just upload the data, and that's it.
  • It's very simple to integrate these services into your app.

Cons:

  • In most cases you are locked into the service and cannot export your trained model, which means you won't be able to do inference locally on device.
  • You not only need to pay per hour for the compute used to train the model, but you also need to buy storage to upload your data, SQL services, etc., and pay for each of these individually.
  • It is less flexible, as in some cases the service provider only supports a limited number of models to choose from.

General purpose cloud computing:

architecture of how general purpose cloud computing works

With this option you rent cloud computing from one of the service providers and simply pay for compute hours.

You are able to do any kind of task and you don’t have any limitations with this option. You can run your favorite training software and upload your own data set.

Some of the service providers are:

When choosing this option you need to know what you are doing and how to train a model; your hands are open, and if you don't know what to do, it will just be a waste of money.

Also, when using a cloud computing service, you not only need to pay for the compute hours for training your model but also for the storage to upload your data.

Once training is complete, you can delete the instances and use your trained model either locally on device (offline) or in the cloud (online).

Pros:

  • It is very flexible, your hands are open, and you can use any kind of training software to train any type of data.
  • You only need to pay for compute hours, and since training is done over a limited period, you only pay for a short time; then, when you want to update or re-train your model, you pay again.

Cons:

  • You really need to have full confidence in what you are doing and complete knowledge about it. Otherwise, you should hire someone to do the training for you.
  • Not only do you have to pay for the service, but you also need to pay for the storage to upload your data.

Training on your own computer

architecture of how training on your own computer works

This is similar to the process for cloud computing; the only difference with training on your own computer is that you use one or more of your own physical machines. Similarly, your hands are open to run any of your preferred training software for any type of data you have.

It is very flexible, but again you should keep in mind that when choosing this option you need knowledge of model training and of how machine learning really works.
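To give a flavor of what "training on your own machine" involves, here is a minimal sketch that fits a tiny logistic-regression model with plain gradient descent on a made-up data set. In practice you would run a real framework (TensorFlow, scikit-learn, etc.) on far more data; this only illustrates the kind of loop that framework runs for you.

```python
import math

# Minimal sketch: logistic regression fitted by gradient descent
# on a toy, made-up data set of 2-feature points with 0/1 labels.

def train_logistic(data, epochs=500, lr=0.5):
    """data: list of ((x1, x2), label) pairs with label 0 or 1."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in data:
            # Sigmoid prediction, then one gradient step on log-loss.
            p = 1.0 / (1.0 + math.exp(-(w1 * x1 + w2 * x2 + b)))
            err = p - y
            w1 -= lr * err * x1
            w2 -= lr * err * x2
            b -= lr * err
    return w1, w2, b

def classify(params, x1, x2):
    w1, w2, b = params
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

toy_data = [((0.1, 0.2), 0), ((0.2, 0.1), 0),
            ((0.9, 0.8), 1), ((0.8, 0.9), 1)]
params = train_logistic(toy_data)
print(classify(params, 0.15, 0.15), classify(params, 0.85, 0.85))  # -> 0 1
```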

It is also a really suitable option if your data is sensitive and you need full assurance that it is safe. Even though cloud companies usually have really good security, it won't be as safe as keeping the data locally on your own computer.

Pros:

  • It is very flexible, your hands are open, and you can use any kind of training software to train any type of data.
  • You have complete control over all of your data.
  • You don't pay for renting a cloud computer or cloud storage.

Cons:

  • You really need to have full confidence in what you are doing and complete knowledge about it. Otherwise, you should hire someone to do the training for you.
  • You need to buy a high-performance computer if you don't have access to one.
  • You need to pay for the software, the electricity bills, and everything else that keeps your computer running.

Summary for training options

In the end, it all comes down to whether you know how to train a model, and whether you want to have your model in your own hands, which enables you to deploy it any way you want (online, or offline on the device).

There is one more consideration: how often you would like to re-train your model. If you are planning to re-train frequently, then you should really think about whether the cloud would be the better option or whether training on your own computer would be cheaper after all.

So based on those two main questions let’s compare them.

Do you have the knowledge to train a model?

If you don't know how to train a model and don't have any background in that area, then you should consider using hosted machine learning, as you only need to upload your data set to the service and it takes care of training the model by itself. By using these services you will be locked into their system and can only access your model through API requests; the exception to this is Cloud Machine Learning Engine.

If you do know how to train a model then your options are:

  • Training on your own computer
  • Use general-purpose cloud computing for training

Then, based on their advantages and disadvantages, decide which one is more suitable for your use case.

Do you want the option to export your model?

If you need the ability to export your trained model, which lets you use it in any way you like (either locally on device or in the cloud), then your options are:

  • Training on your own computer
  • Use general-purpose cloud computing for training

And if that is not your main concern, then hosted machine learning can also be considered.

Do the predictions locally on device (offline) or in the cloud (online)?

When it comes to using your trained model to make predictions, you have two options:

  • Locally on device (offline)
  • In the cloud (online)

Certainly, choosing the right option here is one of the most important decisions when implementing your machine learning, as it can have a big impact on speed, power, privacy, and cost.

Each of these options has its own tradeoffs. For instance, by choosing the cloud option, the app must be connected to the network to work; on the other hand, if you do the predictions locally on device, you are always limited by hardware restrictions and cannot perform every type of machine learning, due to RAM and CPU limitations.

But first, what are these options really, and how does each of them work?

As you have seen, the first steps in implementing machine learning are gathering all the data you need for your model and, using one of the options we discussed, training a model. These steps are common to both inference on device (offline) and inference in the cloud (online).

From here if you choose:

  • On device (offline): in most cases, once your trained model is in use for predictions, you cannot update that model easily. Over time, when your model gets out of date and no longer works the way it should, you need to re-train it with more or newer data and then ship an app update that includes the new model.
  • In the cloud (online): you don't need to worry about re-training your model or how to publish the new one. Your model can be updated continuously. Also, if you decide to re-train your model for any reason, it won't be a problem: you simply update the model in the cloud, and everyone benefits from it.
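For the offline case, one common workaround for the update problem is to have the app check a version manifest and download a newer model file when one is available. A rough sketch, with entirely hypothetical URLs and file names (a real app would point these at its own backend):

```python
import json
import urllib.request
from pathlib import Path

# Hypothetical URLs; substitute your own backend's endpoints.
MANIFEST_URL = "https://example.com/models/manifest.json"
MODEL_URL = "https://example.com/models/flowers.tflite"

def needs_update(local_version: str, remote_version: str) -> bool:
    """Compare dotted version strings numerically, e.g. '1.2' < '1.10'."""
    parse = lambda v: [int(part) for part in v.split(".")]
    return parse(remote_version) > parse(local_version)

def update_model_if_stale(model_dir: Path, local_version: str) -> bool:
    """Download a newer model if the server's manifest advertises one."""
    with urllib.request.urlopen(MANIFEST_URL) as resp:
        manifest = json.load(resp)
    if not needs_update(local_version, manifest["version"]):
        return False
    urllib.request.urlretrieve(MODEL_URL, model_dir / "model.tflite")
    return True
```

On mobile you would write the same logic in the platform's own language; the flow (check version, download, swap the model file) is what matters here.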

Locally on device (offline)

architecture of how inference locally on device works

Whether to choose this option depends entirely on your use case. If your model is not too heavy and huge, then inference locally on device can be an option to consider.

One of the biggest examples of this is the Core ML framework from Apple. When you look at the documentation of this framework on Apple's website, you can see this part:

Core ML is optimized for on-device performance, which minimizes memory footprint and power consumption. Running strictly on the device ensures the privacy of user data and guarantees that your app remains functional and responsive when a network connection is unavailable.

This text refers to two of the main reasons to use this method: ensuring the privacy of the user's data, and no need for a network connection.

But what are the reasons in general? They fall into three categories: speed, cost, and privacy.

Speed: since you don't need to rely on network connectivity and all the requests are handled locally on the device itself, speed is the main advantage of this method. It is much faster and more reliable compared to inference in the cloud, and you can use it without any worries. It also allows you to do things like real-time predictions on big data types, since you won't need to send the data over the network, which is impossible when you use inference in the cloud.

Cost: you use the user's device itself for making predictions and don't need to pay for the cloud and all the other small things that come with it. This has a huge impact on big applications: as the number of requests increases, you can keep the cost low and stay in control of it.

Privacy: this is about the privacy of the user's data, as it is all handled inside the device and nothing is uploaded anywhere. In one sentence: data never leaves the device.

Pros:

  • There is no need for any type of network connectivity for it to work.
  • It is much faster, and in some cases allows you to do things which are impossible when you use the cloud.
  • You don't need to deal with server-side problems. For instance, when your app becomes more popular, there is no need for you to scale up.
  • The users' data is safe.
  • Since you don't use the cloud, you don't need to pay for anything.

Cons:

  • There are always hardware limitations, and if your trained model is big it is almost impossible to do inference on the device, due to the performance limitations of the device.
  • Including the model in the app bundle will increase the size of the app download significantly, often by many megabytes.
  • It is usually really difficult to update the model. Users need to update their app in order to benefit from the new model, or the app needs to download it automatically.
  • It is hard to use on other platforms; you need to implement the inference for each platform individually.
  • Other developers can dig around inside your app bundle. It's easy to copy the learned parameters, and if you're including a TensorFlow graph definition or caffemodel file, it's very simple for unscrupulous people to steal your entire model.

In the cloud (online)

architecture of how inference in the cloud works

Once you have the trained model, you can set up a server, either on your own computer or on one rented from a cloud service provider. After that, you upload your trained model to it, and the server gives you API endpoints that your app can call over a network connection.

When you choose this option, most of the complexity lives on the server side and the app itself stays simple. The model can be updated continuously, or you can re-train and improve it with new features and deploy it on the server. With this option you won't need to update your actual app every time you have a new model; the app automatically uses the new one through the server.

You should note that instead of rolling your own API from scratch, you can also use existing tools, such as TensorFlow Serving.
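If you do roll your own, a prediction endpoint can be surprisingly small. Here is a sketch using only Python's standard library; the `predict` function is a trivial stand-in for wherever your real model's inference would run:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    """Stand-in for real model inference; replace with your trained model."""
    return "Narcissus" if sum(features) < 3.0 else "Anthurium"

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body the app sent, e.g. {"features": [1.4, 0.2]}.
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        out = json.dumps({"label": predict(body["features"])}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(out)

def serve(port=8080):
    """Run the endpoint; the app then POSTs JSON to http://host:8080/."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()

# serve()  # start this on your server machine
```

A production server would add authentication, batching, and error handling, which is exactly the complexity tools like TensorFlow Serving take off your hands.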

What are the main reasons to choose this method? There are two main categories of reasons why you may consider choosing this over inference on device: convenience and performance.

Convenience: there are a few reasons why it can be more convenient to use a server. The first is that since your inference runs in the cloud, you can easily use it on any platform without implementing the inference individually for each one, over and over. The second is that you don't need to obfuscate the model, since normally nobody can steal it, which makes it more secure for you as a developer.

Performance: since the model is in the cloud, it can continuously improve and get better without requiring an app update, and it can be much smarter and more accurate. Also, since there are no hardware limitations, you can do any type of prediction, and your hands are open to do anything.

Pros:

  • You can update the model anytime you want.
  • Since the machine learning logic lives on the server, you can easily port the app to different platforms such as iOS, Android, the web, etc.
  • It is more secure for you as a developer, since normally no one can steal your trained model and parameters.
  • The app will be much simpler, as all the logic and complexity is on the server; this is especially nice if you already use a back end for your app, as you can integrate with it.

Cons:

  • The user always has to be connected to a network in order to use the service.
  • You need to deal with all the typical server headaches, such as protecting against hackers, denial-of-service attacks, preventing downtime, etc.
  • You need to pay for it. As users constantly send files and data to the server, you need to pay for the bandwidth, and it can be hard to keep the cost under control.
  • If your app becomes very popular, you may need to scale up to many servers. It's not good for business if your app goes down because the servers are overloaded.

Conclusion
So, there are many options to choose from and many ways to start implementing and using machine learning. Each of these ways has its own advantages and disadvantages, and in the end it all depends on you and your use case.

It is really hard to recommend one of these approaches for all cases, as the best option for one person won't be the best for another. But based on the conclusions and comparisons in each of the parts above, we can say:

  • If you just want to get started, you don't know anything about machine learning or how to train a model, and cost isn't your main concern, then you should consider fully managed machine learning.
  • If you have exported your trained model, having the best speed and availability without a network connection is crucial, you don't plan to implement really huge machine learning, and you want to deploy the app to at most one or two platforms, then your best bet is to do the predictions locally on device (offline).
  • And if you are planning to do something performance-heavy, or you want to deploy the app to multiple platforms, then you should take a closer look at inference in the cloud (online).

I hope this article has been useful and has helped you get a better idea of how to start implementing machine learning in your new app.

Good luck with your new app and have a nice coding time ;)


Source: Deep Learning on Medium