State of Financial Inclusion in India: A Deep Learning & AI with Python way

Source: Deep Learning on Medium

Financial inclusion and related entities

The World Bank defines financial inclusion as individuals and businesses having access to useful and affordable financial products and services meeting their needs — transactions, payments, savings, credit, and insurance — delivered in a responsible and sustainable way. For years now, financial inclusion has been one of the key development objectives of various developing countries across the world. The Global Findex Survey provides a holistic view of data related to financial inclusion for most of these developing countries.

The Findex Survey further suggests that access to financial services among previously unbanked people has improved significantly over the last several years, mainly due to initiatives taken by national governments, such as the Pradhan Mantri Jan Dhan Yojana (PMJDY) scheme in India, to boost financial inclusion. However, according to the Bill & Melinda Gates Foundation (BMGF), about 1.7 billion people worldwide are still excluded from formal financial services such as savings, payments, insurance, and credit (Global Findex data).

With most developing countries still facing significant implementation-level challenges in their development initiatives, it is vital to have a national-level strategy framework for achieving financial inclusion objectives. The Alliance for Financial Inclusion (AFI) believes most of its member countries see national financial inclusion strategies as essential for providing a strategic framework that facilitates policy reforms in the financial inclusion space.

Financial inclusion in India

In the Indian context, the term ‘financial inclusion’ was used for the first time in April 2005 in the Annual Policy Statement presented by the then governor of the Reserve Bank of India. Since then, the Government of India and the Reserve Bank of India have taken several initiatives to achieve the goals of financial inclusion. More recently, the PMJDY scheme launched by the Government of India had a groundbreaking effect on providing millions of people with access to financial services.

Regression model, variables, data, and relevance of the variables

In statistical modelling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable and one or more independent variables. In this case, four independent variables are used for the regression model.

According to the I-SIP toolkit from the Consultative Group to Assist the Poor (CGAP), the percentage of adults having an account at a formal financial institution is a measurable indicator for access to financial services. The dependent variable in our model will be the number of savings bank accounts of scheduled commercial banks in India. This data is obtained from the Reserve Bank’s database.

Similarly, according to a recent survey conducted by the European Microfinance Platform (e-MFP), Small and Medium Enterprise (SME) Finance is one of the five most important new areas of focus for financial inclusion. As financing businesses is an integral part of financial inclusion, the total amount of credit provided by banks is the first independent variable of our model. Bank loans (in crores) data is obtained from the Reserve Bank’s database.

Like access, usage of financial services and products is an important factor in determining the maturity of a country’s financial system. Payments and transactions are considered measures of usage. So, payments volume (in lakhs) is the next independent variable in our model. This data is obtained from the Reserve Bank’s database.

Access to financial services also depends on the number of formal financial institutions present in the vicinity. So, the number of commercial banks in India is the third independent variable of our model. This data is obtained from the Reserve Bank’s database.

Finally, without good financial literacy among people, it is very difficult to realize the goals of financial inclusion. Financial literacy depends on the general literacy of a country’s population, which in turn depends on the number of people finishing matriculation, and that depends on the availability of secondary schools in the country. The fourth and final independent variable of our model is therefore the number of secondary schools in India. This data is obtained from the Government of India’s Ministry of Statistics and Programme Implementation.

Artificial neural networks and scikit-learn

An Artificial Neural Network (ANN) is an information processing model composed of a large number of highly interconnected processing elements (neurons) working simultaneously to solve a specific problem. ANNs are loosely modelled on the way a human brain processes information.

Scikit-learn is a free, open-source machine learning library for the Python programming language. It contains many efficient tools for machine learning and statistical modelling. The MLPRegressor class in scikit-learn’s neural_network module implements a multi-layer perceptron (MLP) that trains using backpropagation with no activation function in the output layer. The output is therefore a set of continuous values.

MLPRegressor trains iteratively. At each time-step, the partial derivatives of the loss function with respect to the model parameters are computed to update the parameters. It can also have a regularization term added to the loss function that shrinks model parameters to prevent overfitting.

The model optimizes the squared loss using L-BFGS or stochastic gradient descent (SGD), including the Adam variant of SGD, which is the default solver.

MLPRegressor also supports multi-output regression.
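
As a rough illustration of the points above, the sketch below fits an MLPRegressor to two targets at once. The data is synthetic (it is not the dataset used in this article), and the layer size, solver, and alpha value are arbitrary choices made only for the example.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic data for illustration only (not the RBI figures).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
# Two continuous targets, to show multi-output regression.
Y = np.column_stack([X.sum(axis=1), X[:, 0] - X[:, 1]])

model = MLPRegressor(
    hidden_layer_sizes=(16,),  # one hidden layer with 16 neurons (arbitrary)
    solver="lbfgs",            # "sgd" and "adam" are the other options
    alpha=1e-3,                # strength of the L2 regularization term
    max_iter=2000,
    random_state=0,
)
model.fit(X, Y)

pred = model.predict(X[:3])
print(pred.shape)             # one row per sample, one column per target
print(model.out_activation_)  # activation used in the output layer
```

The out_activation_ attribute reports the output-layer activation, which for a regressor is the identity function, matching the "no activation function in the output layer" description.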

Python program and the neural network model

NumPy is the fundamental package for scientific computing with Python. It is a general-purpose array-processing package. NumPy also provides a large collection of high-level mathematical functions to operate on the arrays. Pandas is an open-source library that provides easy-to-use data structures and data analysis tools for the Python programming language. These two libraries are loaded for data processing before loading the neural network package.
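
A minimal sketch of this data-preparation step might look like the following. The numbers are placeholders, not the actual Reserve Bank or Ministry figures, and the column names simply follow the Y, X1 to X4 naming used in this article.

```python
import numpy as np
import pandas as pd

# Placeholder numbers for illustration; NOT the actual RBI or MOSPI figures.
data = pd.DataFrame(
    {
        "Y": [100.0, 112.0, 127.0],   # number of savings bank accounts
        "X1": [50.0, 58.0, 66.0],     # bank loans (in crores)
        "X2": [20.0, 26.0, 35.0],     # payments volume (in lakhs)
        "X3": [10.0, 10.0, 11.0],     # number of commercial banks
        "X4": [200.0, 202.0, 204.0],  # number of secondary schools
    }
)

# Separate the independent variables (2-D array) from the dependent one (1-D).
X = data[["X1", "X2", "X3", "X4"]].to_numpy()
y = data["Y"].to_numpy()
print(X.shape, y.shape)
```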

The dependent variable is Y (number of savings bank accounts). The four independent variables are X1 (bank loans in crores), X2 (payments volume in lakhs), X3 (number of commercial banks in India), and X4 (number of secondary schools in India). For the fourth independent variable, X4, data is available only until March 2016. For the years 2017, 2018, and 2019, values are estimated by assuming a year-on-year (YoY) growth of 1.03% (the YoY growth for 2016).
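
The extrapolation of X4 can be sketched as below. The 2016 base value is a hypothetical placeholder, and applying the 1.03% growth as a compounding rate each year is an assumption about how the estimate was made.

```python
YOY_GROWTH = 0.0103   # 1.03% year-on-year growth, as stated in the text
x4_2016 = 100_000.0   # hypothetical placeholder, not the real school count

# Compound the 2016 value forward one year at a time (assumed method).
x4_by_year = {2016: x4_2016}
for year in (2017, 2018, 2019):
    x4_by_year[year] = x4_by_year[year - 1] * (1 + YOY_GROWTH)

for year, value in x4_by_year.items():
    print(year, round(value))
```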

Various parameters, such as the number of neurons in the hidden layers, the activation function, the learning rate, and the random state, can be set when creating an MLPRegressor object; any that are omitted take scikit-learn’s default values. In this case, the configured model is stored in the object ‘reg’.
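
Creating the ‘reg’ object might look like the sketch below. The specific parameter values are assumptions chosen for illustration; the article does not state which values were used.

```python
from sklearn.neural_network import MLPRegressor

# All values below are illustrative assumptions, not the article's settings.
reg = MLPRegressor(
    hidden_layer_sizes=(10, 10),  # two hidden layers with 10 neurons each
    activation="relu",            # activation function for the hidden layers
    solver="adam",                # learning_rate_init applies to "adam" and "sgd"
    learning_rate_init=0.001,     # initial learning rate
    max_iter=1000,                # upper bound on training iterations
    random_state=1,               # fixes weight initialization for repeatability
)

# Parameters that were not passed keep scikit-learn's defaults.
print(reg.get_params()["alpha"])
```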

After that, the neural network model is fit to the data, with the four independent variables as inputs and the dependent variable as the target.
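
A minimal sketch of the fitting step, using 11 synthetic rows in place of the real data, is shown below. The StandardScaler step is an addition not mentioned in the article; it is included because the inputs are on very different scales (crores, lakhs, counts), which multi-layer perceptrons tend to be sensitive to.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 11 observations (not the real data).
rng = np.random.default_rng(42)
X = rng.uniform(1.0, 100.0, size=(11, 4))
y = X @ np.array([0.5, 0.2, 1.0, 0.1]) + rng.normal(scale=0.5, size=11)

# Scale the inputs, then fit the network (scaling is an assumed extra step).
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(8,), solver="lbfgs",
                 max_iter=5000, random_state=1),
)
model.fit(X, y)

fitted = model[-1]  # the trained MLPRegressor inside the pipeline
print(fitted.n_layers_)                   # input + hidden + output layers
print([w.shape for w in fitted.coefs_])   # weight matrix shape per layer
```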

Once the model is fit, we can predict the corresponding values of the dependent variable from the independent variables’ values. We can then compare the predicted values with the original values to see how well the model performs. The parameter values can be tweaked to see whether the predictions improve.
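
The comparison of predicted and original values could be sketched as follows, again on synthetic data rather than the article's dataset. The column names in the comparison table are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in for the 11 observations (not the real data).
rng = np.random.default_rng(7)
X = rng.normal(size=(11, 4))
y = X @ np.array([1.0, 0.5, -0.3, 0.2]) + 2.0

reg = MLPRegressor(hidden_layer_sizes=(8,), solver="lbfgs",
                   max_iter=5000, random_state=1)
reg.fit(X, y)

# Predict on the same rows and place predictions next to the originals.
predicted = reg.predict(X)
comparison = pd.DataFrame(
    {
        "actual": y,
        "predicted": predicted,
        "abs_error": np.abs(y - predicted),
    }
)
print(comparison.round(3))
```

Because the predictions here are made on the same rows used for training, small errors show only that the model fits the data, not that it generalizes.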

Limitations and ways of enhancing the model

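With only 11 data points in the dataset, it is difficult to split it into train and test sets. With a larger dataset (as a rough rule of thumb, at least 30 data points), we could split it into training and testing sets and then check the R-squared value of the model on the test set using the r2_score function from scikit-learn’s metrics module. With more than one independent variable, the adjusted R-squared value is more appropriate; scikit-learn does not provide it directly, but it can be computed from R-squared, the number of samples, and the number of independent variables.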
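
With a larger dataset, the evaluation described above might be sketched like this. The 40 synthetic rows and the 25% test share are assumptions, and adjusted_r2 is a hypothetical helper written for this example using the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1).

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor


def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R-squared for n_samples rows and n_features predictors."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Synthetic data: 40 rows, 4 independent variables (not the real data).
rng = np.random.default_rng(3)
X = rng.normal(size=(40, 4))
y = X @ np.array([1.0, 0.5, -0.3, 0.2]) + rng.normal(scale=0.1, size=40)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

reg = MLPRegressor(hidden_layer_sizes=(8,), solver="lbfgs",
                   max_iter=5000, random_state=1)
reg.fit(X_train, y_train)

# Score on held-out rows only, then penalize for the number of predictors.
r2 = r2_score(y_test, reg.predict(X_test))
adj = adjusted_r2(r2, n_samples=len(y_test), n_features=X.shape[1])
print(f"R-squared: {r2:.3f}, adjusted R-squared: {adj:.3f}")
```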

Other tools such as Minitab or IBM SPSS can also help us find the R-squared value of our model. However, for models with a small number of samples like the one we currently have, it is very important to check whether the data points satisfy the normality conditions before applying statistical modelling techniques.
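
In Python, one possible way to check normality is the Shapiro-Wilk test from SciPy. The article does not say which test was intended, so the choice of test, the synthetic sample, and the 0.05 threshold below are all assumptions.

```python
import numpy as np
from scipy import stats

# Synthetic sample of 11 values standing in for one variable of the dataset.
rng = np.random.default_rng(11)
sample = rng.normal(loc=50.0, scale=5.0, size=11)

# Shapiro-Wilk test: the null hypothesis is that the sample is normal.
statistic, p_value = stats.shapiro(sample)
print(f"W = {statistic:.3f}, p = {p_value:.3f}")

# A p-value above the chosen threshold means normality is not rejected.
looks_normal = p_value > 0.05
print("Normality not rejected" if looks_normal else "Normality rejected")
```

With so few data points the test has little power, so a non-rejection is weak evidence at best.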

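An R-squared value close to 1 means the model explains most of the variation in the dependent variable and is likely to predict well, provided the value is measured on data the model was not trained on.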