.NET Developers Can Write Machine Learning Code Too: The Case for and Against ML.NET

If you are working on machine learning projects in the real world, you live and die by Python. Over the years, the Python ecosystem has slowly built a rich assembly of frameworks, research and tools that make it the favorite destination for data scientists. However, more and more we are seeing new frameworks and tools that attempt to bring machine learning capabilities to existing large developer communities. In the case of .NET, Microsoft has been slowly complementing its impressive machine learning infrastructure platform with frameworks and libraries that make machine learning more accessible to .NET developers.

Earlier this week, Microsoft announced a new version of its ML.NET stack, which attempts to enable a simple experience for .NET developers building machine learning applications. Originally developed by Microsoft Research, ML.NET provides C# and F# programming models that enable the creation, training and execution of machine learning models. The core architecture of ML.NET can be divided into four fundamental components:

· Data Transforms: These are the components in a machine learning pipeline that transform data before it reaches a learner. The current version of ML.NET supports several types of data transforms, such as combiners and segregators, featurizers, row filters and many others.

· Learners: These are the basic machine learning models included in ML.NET. The algorithm portfolio is still relatively basic but it includes some of the fundamental machine learning models such as linear regression or k-means.

· Misc: These are utility components that are necessary to build machine learning capabilities such as optimization or regularization. Examples of components in this category include evaluators, calibrators and several others.

· Extensions: ML.NET applications use extensions to leverage different underlying runtimes such as TensorFlow, Accord.NET and Microsoft’s own Cognitive Toolkit.

Using ML.NET

The first step to get started with ML.NET is to install the ML.NET NuGet package using the following command:

dotnet add package Microsoft.ML

After that, we can create an instance of the LearningPipeline class, which is the main component used for the data loading, featurization and training steps of a machine learning model.

var pipeline = new LearningPipeline();

Using the pipeline, we start assembling our machine learning application. For instance, loading data from a text file can be easily accomplished as follows:

pipeline.Add(new TextLoader(_dataPath).CreateFrom<SentimentData>());
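The SentimentData type passed to CreateFrom, and the SentimentPrediction type used later in the pipeline, are not shown in this walkthrough. A minimal sketch of what they might look like follows. It assumes the early 0.x releases of ML.NET (where the Column and ColumnName attributes live in Microsoft.ML.Runtime.Api) and an input file whose first column is the label and whose second column is the text; both the column order and the field names are assumptions for illustration.

```csharp
// Namespace of the Column/ColumnName attributes in the early 0.x releases (assumed version).
using Microsoft.ML.Runtime.Api;

// One row of the input file: a label followed by the text to classify.
// The column order here is an assumption about the file layout.
public class SentimentData
{
    // Mapped to the "Label" column that the learner trains against.
    [Column(ordinal: "0", name: "Label")]
    public float Sentiment;

    // Raw text that the featurizer later turns into numeric features.
    [Column(ordinal: "1")]
    public string SentimentText;
}

// What the trained model returns for each input row.
public class SentimentPrediction
{
    // "PredictedLabel" is the output column produced by the binary classifier.
    [ColumnName("PredictedLabel")]
    public bool Sentiment;
}
```

The attributes are what tie the plain C# fields to the named columns the pipeline works with, so a change in the file layout should only require changing the ordinals.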

In most machine learning scenarios, data needs to be pre-processed and cleaned before training. Part of that work is feature engineering, which is another element we can add to our pipeline. Here, a TextFeaturizer converts the SentimentText column into a numeric Features column:

pipeline.Add(new TextFeaturizer("Features", "SentimentText"));

At this point, we can select the algorithms we are going to use and set the appropriate configuration of hyperparameters.

pipeline.Add(new FastTreeBinaryClassifier() { NumLeaves = 5, NumTrees = 5, MinDocumentsInLeafs = 2 });

The model is trained by calling the Train operation on the pipeline, which returns a PredictionModel typed by the input and output classes.

PredictionModel<SentimentData, SentimentPrediction> model =
    pipeline.Train<SentimentData, SentimentPrediction>();

Finally, we can execute our model and evaluate the results.

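// load held-out test data (_testDataPath is an assumed field, analogous to _dataPath)
var testData = new TextLoader(_testDataPath).CreateFrom<SentimentData>();
// evaluate the model
var evaluator = new BinaryClassificationEvaluator();
BinaryClassificationMetrics metrics = evaluator.Evaluate(model, testData);
// get the predictions (sentiments is assumed to be an IEnumerable<SentimentData> of new inputs)
IEnumerable<SentimentPrediction> predictions = model.Predict(sentiments);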
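Once the metrics object is available, the usual next step is to inspect it. The sketch below prints the three headline numbers. MetricsReport is a hypothetical helper, not part of ML.NET, and the Microsoft.ML.Models namespace assumes the same early 0.x releases used in the rest of this walkthrough.

```csharp
using System;
// Namespace of BinaryClassificationMetrics in the early 0.x releases (assumed version).
using Microsoft.ML.Models;

// Hypothetical helper for illustration; not part of ML.NET.
public static class MetricsReport
{
    // Prints the headline binary classification metrics as percentages.
    public static void Print(BinaryClassificationMetrics metrics)
    {
        Console.WriteLine($"Accuracy: {metrics.Accuracy:P2}");
        Console.WriteLine($"AUC:      {metrics.Auc:P2}");
        Console.WriteLine($"F1 score: {metrics.F1Score:P2}");
    }
}
```

Calling MetricsReport.Print(metrics) after the Evaluate step gives a quick read on whether the trained model is worth keeping before any predictions are used.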

As you can see, ML.NET provides a very basic entry point for .NET developers to enter the world of machine learning. However, as with many other “bridge machine learning frameworks”, we should set the right expectations for the use of ML.NET.

What ML.NET Is Not

ML.NET is a fantastic vehicle for .NET developers to get started implementing basic machine learning applications. However, the framework has some very tangible limitations that make it largely impractical for many of the machine learning scenarios we encounter in real-world applications. In its current form, there are several challenges that developers should be aware of before embarking on using ML.NET.

· For starters, the algorithm library is fairly limited, which reduces the possible architectures that we can build on a specific pipeline.

· The simple nature of the Pipeline model is also one of its main limitations, as most machine learning workflows hardly follow the 4-step sequential structure of ML.NET Pipelines.

· There are virtually no optimization or regularization tools that currently support ML.NET, which limits its viability in real-world applications.

· The idea that ML.NET is meant to be used for implementing basic machine learning is challenged by the fact that there is no clear programming model for transitioning to more sophisticated stacks such as the Cognitive Toolkit.

Overall, ML.NET is a viable effort to bridge the .NET developer and machine learning communities. If the .NET community embraces ML.NET, then the new framework can become an interesting force to be reckoned with in the machine learning ecosystem. For now, there is still a lot of work to be done to bring ML.NET on par with competitive alternatives.

Source: Deep Learning on Medium