Difference between scikit-learn and tensorflow

Source: Artificial Intelligence on Medium

1. Different functions

Scikit-learn (sklearn) is positioned as a general-purpose machine learning library, while TensorFlow (tf) is positioned as a deep learning library.

An obvious difference: tf does not provide sklearn's powerful feature engineering tools, such as dimensionality reduction and feature selection. The root cause, I think, is that the two kinds of model process data in different ways:

  • Traditional machine learning: uses feature engineering, where a person manually refines and cleans the data
  • Deep learning: uses representation learning, where the model itself learns to refine the data

Sklearn expects users to process the data themselves, for example by selecting features, reducing dimensions, and converting formats. It is a traditional machine learning library. Deep learning libraries such as tf aim to extract useful features from the data automatically, with much less manual work, so they do not provide comparable functions.
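To make the feature engineering point concrete, here is a minimal sketch of dimensionality reduction and feature selection in sklearn. The Iris data set, PCA, and SelectKBest are my own choices for illustration; they are not taken from the original article.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

# Iris: 150 samples, 4 numeric features, 3 classes (bundled with scikit-learn).
X, y = load_iris(return_X_y=True)

# Dimensionality reduction: project the 4 features onto 2 principal components.
X_pca = PCA(n_components=2).fit_transform(X)

# Feature selection: keep the 2 original features with the highest ANOVA F-score.
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

print(X.shape, X_pca.shape, X_selected.shape)
print(selector.get_support())  # boolean mask over the 4 original features
```

Both steps are decisions the user makes by hand, which is exactly the kind of work a deep network is expected to absorb into its learned layers.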

2. Different degrees of freedom

The modules in scikit-learn are highly abstract. Almost any classifier can be trained in 3–5 lines, and all transformers (such as scalers) follow the same fixed interface. This abstraction limits the user's freedom, but it makes modeling more efficient and makes batching and standardization easier (through the use of pipelines).
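As a rough illustration of the "3–5 lines" claim, the sketch below chains a scaler and a classifier in a pipeline. The data set, the choice of LogisticRegression, and the split parameters are assumptions made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Transformers expose fit/transform and estimators expose fit/predict,
# so a pipeline can chain them behind a single interface.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

Swapping in a different classifier only means changing the last step of the pipeline; the surrounding code stays the same.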

Tf is different. Although it is a deep learning library, it offers a high degree of freedom. You can still use it to do what traditional machine learning does, at the cost of implementing the algorithms yourself. For that reason, comparing tf directly with scikit-learn is not quite appropriate. Keras, a high-level library that wraps tf, is more like the scikit-learn of the deep learning world.
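The sketch below shows what "implementing the algorithm yourself" can look like: a linear regression fitted with hand-written gradient descent in TensorFlow 2. The synthetic data, learning rate, and step count are made up for the example.

```python
import numpy as np
import tensorflow as tf

# Synthetic, noise-free data from y = 3x + 2 (values chosen for illustration).
x = tf.constant(np.linspace(-1.0, 1.0, 50), dtype=tf.float32)
y = 3.0 * x + 2.0

w = tf.Variable(0.0)
b = tf.Variable(0.0)
learning_rate = 0.1

for _ in range(200):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(w * x + b - y))  # mean squared error
    grad_w, grad_b = tape.gradient(loss, [w, b])
    # Plain gradient descent, written out by hand.
    w.assign_sub(learning_rate * grad_w)
    b.assign_sub(learning_rate * grad_b)

print(float(w.numpy()), float(b.numpy()))
```

In sklearn the same model is a single LinearRegression().fit(X, y) call. Here the loss, the gradients, and the update rule are all visible and can be changed freely, which is the trade-off described above.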

In terms of freedom, tf ranks higher; in terms of abstraction and encapsulation, sklearn ranks higher; in terms of ease of use, sklearn also ranks higher.

3. Different groups and projects

sklearn is mainly suited to small and medium-sized, practical machine learning projects, especially those with a modest amount of data where the user needs to process the data manually and choose an appropriate model. Such projects can often be completed on a CPU and have low hardware requirements.
tf is mainly suited to projects that clearly call for deep learning and need little manual data processing. Such projects often have a large amount of data, demand higher final accuracy, and generally need GPU acceleration. For learning deep learning, you can use keras for quick experiments on small sampled data sets. If you have not seen it before, the keras sample code below shows why keras is the deep learning counterpart of sklearn.

from keras.models import Sequential
from keras.layers import Dense

# x_train, y_train, x_test, y_test are assumed to be loaded already
model = Sequential()  # define the model
model.add(Dense(units=64, activation='relu', input_dim=100))  # define the network structure
model.add(Dense(units=10, activation='softmax'))  # define the network structure
model.compile(loss='categorical_crossentropy',  # define the loss function, optimizer, and metrics
              optimizer='sgd',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, batch_size=32)  # train the model
loss_and_metrics = model.evaluate(x_test, y_test, batch_size=128)  # evaluate the model
classes = model.predict(x_test, batch_size=128)  # predict with the trained model

It is not hard to see that sklearn and tf are very different. Although sklearn also has neural network modules, you cannot rely on it for serious, large-scale deep learning. Conversely, although tf can be used for traditional machine learning, including data cleaning, doing so usually takes far more effort for less return.
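For reference, this is roughly what the neural network module in sklearn looks like in use. It is a small sketch on the bundled digits data set; the hidden layer size and iteration limit are arbitrary choices for the example.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)  # 8x8 grayscale digit images, flattened to 64 features
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# A single hidden layer of 64 units, trained on the CPU.
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0),
)
mlp.fit(X_train, y_train)
mlp_accuracy = mlp.score(X_test, y_test)
print(f"MLP test accuracy: {mlp_accuracy:.3f}")
```

This is convenient for small tabular or toy image problems. It offers fully connected layers only, though, with none of the custom layers or training loops that large-scale work usually needs.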

4. Using scikit-learn and TensorFlow together

In more common cases, you can use sklearn together with tf, or even keras. sklearn handles basic data cleaning, keras is used for small-scale experiments to verify ideas, and tf is used for serious parameter tuning ("alchemy") on the complete data.
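A minimal sketch of that division of labor, assuming TensorFlow 2 with its bundled Keras: sklearn splits and scales the data, and a small Keras model runs a quick experiment on it. The data set, layer sizes, and epoch count are placeholders chosen for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow import keras

# sklearn side: load, split, and standardize the data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
scaler = StandardScaler().fit(X_train)  # fit on training data only
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# keras side: a small network for a quick experiment.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(3, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy",  # integer class labels
              optimizer="adam",
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=50, batch_size=16, verbose=0)
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"test accuracy: {accuracy:.3f}")
```

The scaler is fitted on the training split only, so no information from the test split leaks into preprocessing.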
Looked at on its own, sklearn has particularly good documentation. Beginners who work through the features sklearn supports will pick up a basic understanding of many areas of machine learning. For example, the documentation often summarizes individual topics, such as simple anomaly detection. So sklearn is not just a tool library; its documentation also works as a beginner's guide.
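As one example of such a topic, simple anomaly detection takes only a few lines. The sketch uses IsolationForest, one of the outlier detectors in sklearn; the synthetic data and the contamination value are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
# Three made-up points placed far away from the main cluster.
outliers = np.array([[8.0, 8.0], [-9.0, 7.5], [10.0, -8.0]])
X = np.vstack([inliers, outliers])

# contamination is the assumed fraction of anomalies in the data.
detector = IsolationForest(contamination=0.02, random_state=0).fit(X)
labels = detector.predict(X)  # +1 for inliers, -1 for anomalies
print(labels[-3:])
```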
Therefore, traditional machine learning libraries represented by sklearn (general-purpose but highly abstract, like a Swiss Army knife) and the more targeted deep learning libraries represented by tf (highly flexible but more laborious to use, like Lego) are both tools that machine learning practitioners must understand.

But sklearn is still worth learning

In theory, deep learning is itself a part of machine learning. Learning traditional machine learning methods helps a great deal in understanding deep learning in depth. Knowing the conditions under which a model is convex helps you understand the non-convexity of neural networks. Knowing the strengths of traditional models helps you see that deep learning is not a panacea. There are also many problems and scenarios where applying deep learning directly hits bottlenecks that traditional methods are needed to resolve.
In practice, deep learning methods generally require many GPU machines, and even large companies have limited GPU resources. Deep learning is usually considered only when it gives far better results than traditional methods and greatly improves the business. Tasks such as speech recognition and image recognition now mostly use deep learning. In the NLP field, apart from machine translation, most other tasks still use traditional methods more often. Traditional methods are also generally more interpretable, which helps a lot when inspecting and debugging a model. The industry generally prefers to hire people who can solve problems rather than people who have merely mastered the hottest technology. So while learning deep learning, it pays to learn traditional methods as well.

Conclusion

To be honest, even now that deep learning is popular, you often still have to solve problems with traditional machine learning methods. First, not everyone has a powerful computer or server. Second, most problems really do not need a deep network. Finally, programmers who only know how to call a toolkit are not good machine learning practitioners.