Introduction to Machine Learning in Python
List of Abbreviations
ML – Machine Learning is a data analysis method that automates the building of analytical models.
DS – Data Science is a field that unifies statistics, data analysis, machine learning and their related methods.
MOOC – Massive Open Online Course is an online course with unlimited participation and open web access.
Software technologies are becoming more and more involved in our daily routines. Robots and automated devices are replacing human employees in parts of the labour market, smart voice assistants (e.g. Echo Alexa, which is considered later in this publication) answer the questions of curious consumers, and health care devices provide vital aid to older patients in hospitals; the list is far from complete. These and many other developments are good reasons to learn, or at least become well acquainted with, ML technologies and their use in contemporary digital solutions.
Objectives of the publication
The aim of the publication is to increase awareness of ML in general and, specifically, of the possibilities that the Python programming language offers in neural networks and DS. To set the stage for further study of these technologies, the publication first describes the main types of ML, how they are grouped, and which algorithms and techniques a machine uses to learn and to evaluate the decisions it is supposed to make.
The article can also be regarded as an illustration of the design concepts behind ML applications. It is also meant to improve knowledge of the application development process, which is generally beneficial in the field of software engineering.
Finally, space is given to working, successful examples of ML and DS in use today. Looking at existing, deployed digital solutions helps to build, on their fundamentals, a view of technologies that are yet to come.
Basic concepts of Machine Learning
ML is a subfield of Artificial Intelligence that has become extremely popular in the last decade. It draws on statistical models and computer algorithms to accomplish tasks without relying on explicit, detailed instructions, using patterns inferred from data instead. To infer those patterns, an algorithm must first analyze sample information relevant to the specific task, called “training data”, so that it can make the right decisions when it analyzes data in the production phase. There are various learning approaches, the most common of which are supervised and unsupervised learning. [1, 1–17] An illustrative schema can be seen in Figure 3–1.
Types of learning algorithms
Supervised learning relies on the computer’s ability to identify elements based on labeled example data. The computer learns from these examples so that it can recognize brand new data on the basis of data it has already studied. Supervised learning algorithms may be implemented through different techniques, such as classification and regression. [1, 19–23]
In classification, data is sorted under different labels according to its parameters, so that a suitable label can be predicted for new input data. [1, 57–66] Sorting emails into spam and non-spam is an example of a classification algorithm at work. The curved line in Figure 3–2 shows how this technique works.
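The spam example can be made concrete with a minimal sketch. It assumes a tiny naive Bayes word-count classifier, which is only one of many possible classification techniques and is not taken from the cited book; the training messages and the function names `train` and `classify` are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical labelled training messages (invented for this sketch).
TRAINING = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("cheap offer win money", "spam"),
    ("meeting agenda for monday", "not spam"),
    ("lunch on monday with the team", "not spam"),
    ("please review the project agenda", "not spam"),
]


def train(messages):
    """Count how often each word occurs under each label."""
    word_counts = {}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the highest log-probability for the text."""
    vocabulary = {word for counts in word_counts.values() for word in counts}
    total_messages = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Start from the share of training messages carrying this label.
        score = math.log(label_counts[label] / total_messages)
        total_words = sum(counts.values())
        for word in text.split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


word_counts, label_counts = train(TRAINING)
print(classify("claim your free prize", word_counts, label_counts))
print(classify("agenda for the team meeting", word_counts, label_counts))
```

The classifier never sees the two test messages during training; it labels them only from the word statistics of the labelled examples, which is the essence of supervised classification.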
In ML, regression algorithms model trends in order to estimate numeric, continuous values. Comparing known and estimated values reveals the difference between the actual and the predicted value as a numeric error rather than a true-or-false answer. [1, 25–40] Estimating a person’s income from age, years spent at university and similar factors is an example of a regression task, which is visually represented by the line in Figure 3–3.
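As a rough illustration of the income example, the sketch below fits a straight line with ordinary least squares, one common regression technique. The data points and the names `fit_line` and `predict` are assumptions made for this example, not values from the cited source.

```python
# Hypothetical observations (invented for this sketch):
# years spent at university and yearly income in thousands.
YEARS = [0, 2, 3, 4, 6]
INCOME = [21.0, 25.0, 30.0, 31.0, 38.0]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Estimate income for a given number of years."""
    return slope * x + intercept


slope, intercept = fit_line(YEARS, INCOME)

# The error of each prediction is a number, not a true/false value.
residuals = [y - predict(x, slope, intercept) for x, y in zip(YEARS, INCOME)]
mse = sum(r ** 2 for r in residuals) / len(residuals)

print(f"income ~= {slope:.2f} * years + {intercept:.2f}")
print("residuals:", [round(r, 2) for r in residuals])
print(f"mean squared error: {mse:.2f}")
```

The residuals and the mean squared error measure how far the fitted line is from the known values, which is how a regression model is usually judged.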
In unsupervised learning, the machine receives only a set of input data, without labels. From this, the machine can learn to tell the data it was given apart from other data that does not fit. In contrast to supervised learning, where the machine is given labeled “training data” to learn from, unsupervised learning means that the machine itself finds patterns and relationships in the data. Unsupervised learning is usually implemented using clustering techniques. [1, 19–23]
The task of a clustering algorithm is to group the most similar elements into a given number of clusters. [1, 67–76] For example, if an algorithm is given the numeric values 1, 2, 5, 6, 8, 9 and told to form 3 clusters, it tends to assign values 1 and 2 to the first cluster, 5 and 6 to the second, and the rest to the last. The diagram in Figure 3–4 shows the result of applying a clustering technique.
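The numeric example above can be reproduced with a short sketch. It assumes a basic one-dimensional k-means, which is one common clustering technique; the function name `kmeans_1d` and the way the starting centres are chosen are illustrative choices, not something prescribed by the cited source.

```python
def kmeans_1d(values, k, max_iterations=100):
    """Group numbers into k clusters with a basic one-dimensional k-means."""
    if not 2 <= k <= len(values):
        raise ValueError("k must be between 2 and the number of values")
    ordered = sorted(values)
    # Assumption: start from k values spread evenly over the sorted data.
    centroids = [float(ordered[i * (len(ordered) - 1) // (k - 1)]) for i in range(k)]
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: each value joins its nearest centre.
        clusters = [[] for _ in range(k)]
        for value in ordered:
            nearest = min(range(k), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Update step: each centre moves to the mean of its cluster.
        new_centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # Nothing moved, so the grouping is stable.
        centroids = new_centroids
    return clusters


print(kmeans_1d([1, 2, 5, 6, 8, 9], 3))  # prints [[1, 2], [5, 6], [8, 9]]
```

K-means is sensitive to its starting centres, so a different initialisation can settle on a different grouping; the evenly spread start is used here only to keep the example deterministic.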
Application development process in Machine Learning
The purpose of this section is to break down and consider in more detail one of the most common application development processes relevant to ML. First, it is necessary to establish what technology adoption means in ML, what its phases are and how it appears in the industry.
Technology adoption is a term that refers to the integration and inclusion of a new technology. It passes through several phases in any sector of ML-based applications: quick applications, early applications, assisted applications and independent operations. [2, 2]
The quick applications phase is where the business applies ML technology to the goals that are most easily achieved. In the early applications phase, the business may also look to improve the total worth of its existing operations. Here it also looks at revising the costs of the various business operations carried out by its employees. [2, 3–4]
An assisted application is an application of machine intelligence whose purpose is to assist specialists with complicated issues. The main aim here is to expand human capacity for business growth and advancement. The benefit lies in using the evaluation of business requirements to boost the business. In this phase, the company gathers information about its customers, employees and operations and, as a result, attempts to understand its business issues. [2, 4–5]
The final phase is independent operations. In this phase, the machine reaches its maximum possible independence in making decisions and in continuing its self-learning process. As a result, this transition brings the replacement of the human workforce by robotization. It also means that the ML capability can be used all the way up to its peak performance. The machine has also developed the capability to find and use hidden patterns and as yet unknown trends to increase the efficiency of overall production. [2, 5–6]
Figure 4–1 below shows the aforementioned phases in the order of their execution, with a brief description of each.
Representative Python Machine Learning projects
This section illustrates ML projects that have been successfully implemented and have proved beneficial to their users. Many successful projects have been deployed to production, but I would like to take a closer look at those which, in my opinion, stand out among the others.
ML features and technologies have been introduced into many segments of our lives, and the healthcare segment is no exception. The healthcare industry constantly requires more and more staff for its endless tasks, such as reminding patients to take their pills on time, searching records urgently and swiftly, and suggesting improvements to a patient’s daily routine.
Such tasks can be handled by Echo Alexa, an artificial tabletop bot. Alexa is developed by Amazon using ML and the Amazon Web Services (AWS) cloud infrastructure. Under the hood of Alexa is ML code written in Python that helps it learn from completed tasks through a built-in feedback mechanism. Echo Alexa is available to any Python developer for building their own programs and products on Amazon’s platform. [2, 8–10] The working principle of the device is described in more detail in Figure 5–1.
Dropout probability in MOOCs
Another study worth considering aimed to show the tutors of a MOOC how likely each participant is to drop out of, succeed in or fail the upcoming course.
First, the program loops through data provided by Stanford University Center and archived in the form of SQL tables. It extracts the data, explores and analyzes it to avoid misinterpreting the parameters, and finally selects the most suitable entries from the tables. The extracted and formatted data is then passed to five different ML algorithms for evaluation. Once the evaluation results have been compared with one another, the operators of the program can see similarities in the behavior of learners who dropped out of the MOOCs. The entire lifecycle of the program can be seen in Figure 5–2.
This system achieves an accuracy of 95.8 per cent by using the association technique together with the Python programming language, while automating work that would otherwise be done by many human researchers.
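The general shape of such a workflow (read learner records from SQL tables, pass them to several models, compare the results) can be sketched as follows. This is not the cited authors' implementation: the table `activity`, its columns, the rows and the two toy models are all hypothetical stand-ins for the real Stanford data and the five algorithms used in the study.

```python
import sqlite3

# Hypothetical rows (invented): (videos_watched, quizzes_done, dropped_out).
ROWS = [
    (1, 0, 1), (2, 0, 1), (0, 1, 1), (3, 1, 1),
    (2, 2, 1), (9, 5, 0), (8, 6, 0), (10, 4, 0),
    (2, 1, 1), (9, 6, 0), (6, 4, 0), (1, 1, 1),
]


def load_examples():
    """Store the rows in an in-memory SQL table and read them back."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE activity "
        "(videos_watched INTEGER, quizzes_done INTEGER, dropped_out INTEGER)"
    )
    connection.executemany("INSERT INTO activity VALUES (?, ?, ?)", ROWS)
    rows = connection.execute(
        "SELECT videos_watched, quizzes_done, dropped_out "
        "FROM activity ORDER BY rowid"
    ).fetchall()
    connection.close()
    return [((videos, quizzes), label) for videos, quizzes, label in rows]


def majority_baseline(train):
    """Toy model 1: always predict the most frequent training label."""
    labels = [label for _, label in train]
    majority = max(set(labels), key=labels.count)
    return lambda features: majority


def nearest_centroid(train):
    """Toy model 2: predict the label whose mean feature vector is closest."""
    centroids = {}
    for label in {label for _, label in train}:
        points = [features for features, point_label in train if point_label == label]
        centroids[label] = tuple(sum(axis) / len(points) for axis in zip(*points))

    def predict(features):
        return min(
            centroids,
            key=lambda label: sum(
                (a - b) ** 2 for a, b in zip(features, centroids[label])
            ),
        )

    return predict


def accuracy(model, test):
    """Share of test examples the model labels correctly."""
    return sum(model(features) == label for features, label in test) / len(test)


examples = load_examples()
train, test = examples[:8], examples[8:]  # simple hold-out split
results = {
    "majority baseline": accuracy(majority_baseline(train), test),
    "nearest centroid": accuracy(nearest_centroid(train), test),
}
for name, score in results.items():
    print(f"{name}: {score:.2f}")
```

Comparing every model on the same held-out examples is what makes the scores comparable; with so few invented rows, the numbers printed here say nothing about real MOOC data.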
ML is a concept that can be implemented through various approaches. All of them require confident handling of at least one programming language and proficiency in certain mathematical topics, but I think one of the most fundamental requirements is open-mindedness and sincere curiosity about the subject. In this publication, I chose to illustrate the possibilities that the Python programming language provides as a means of implementing ML, and to review the basic concepts of ML and its basic workflow. Incorporating Python into ML applications has many advantages, such as relative simplicity, a low entry threshold, effectiveness and noticeable popularity in the developer community, which makes issues that arise during development easier to resolve.
[1] Gopinath Rebala, An Introduction to Machine Learning, Springer International Publishing, 2019
[2] Puneet Mathur, Machine Learning Applications Using Python: Cases Studies from Healthcare, Retail, and Finance, 2018
[3] Youssef Mourdi, Mohamed Sadgal, Hamada El Kabtane, Wafaa Berrada Fathi, A Machine Learning-based methodology to predict learners’ dropout, success or failure in MOOCs, 2019