 # Studying Suicide Rates from 1985 to 2016 : Prediction using Machine Learning

Source: Deep Learning on Medium

The World Health Organization (WHO) has called on nations throughout the globe to make suicide prevention a “Global Imperative.”

# Introduction

Let us begin by briefly understanding the technology we are using. Books describe Machine Learning as “a subset of artificial intelligence which focuses mainly on machine learning from their experience and making predictions based on its experience.”

In layman's terms, we find a way to enable computers or machines to make data-driven decisions rather than being explicitly programmed to carry out a certain task. These programs or algorithms are designed so that they learn and improve over time as they are exposed to new data.

# Dataset

In our problem, the data fed to the machine so that it can decide and predict effectively has to be a measure of variability in depressive symptoms, along with other relevant factors such as younger age, mood disorders, childhood abuse, and personal and parental history of suicide attempts.

## Columns in the CSV file containing the overview from 1985–2016:

Our model follows Supervised Learning, which consists in learning the link between two datasets: the observed data `X` and an external variable `y` that we are trying to predict, usually called “target” or “labels”. Most often, `y` is a 1D array of length `n_samples`.

All supervised estimators in scikit-learn implement a `fit(X, y)` method to fit the model and a `predict(X)` method that, given unlabeled observations `X`, returns the predicted labels `y`.
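As a minimal sketch of this shared interface, the snippet below fits two different scikit-learn estimators on the same data through the same `fit(X, y)` and `predict(X)` calls. The arrays are synthetic values made up for illustration and are not taken from the suicide-rates dataset; the choice of estimators is likewise an assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption for illustration): 6 samples, 2 features.
X = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 0.0],
              [3.0, 0.0], [4.0, 1.0], [5.0, 0.0]])
# The target is a 1D array of length n_samples.
y = np.array([1.0, 3.0, 4.0, 6.0, 9.0, 10.0])

# Unlabeled observations for which we want predicted values.
X_new = np.array([[1.5, 1.0], [4.5, 0.0]])

# Both estimators are trained and queried through the same two methods.
predictions = {}
for estimator in (LinearRegression(), DecisionTreeRegressor(random_state=0)):
    estimator.fit(X, y)
    predictions[type(estimator).__name__] = estimator.predict(X_new)

for name, predicted in predictions.items():
    print(name, predicted)
```

Because every supervised estimator exposes the same two methods, swapping one model for another usually means changing only the line that constructs it.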

When assigning values to `X`, we drop the columns that we do not need or that are less relevant to predicting the output.
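One way this step might look with pandas is sketched below. The toy table and its column names (`country`, `year`, `population`, `suicides_no`) are hypothetical stand-ins, and which columns count as "less relevant" is an assumption made only for the example.

```python
import pandas as pd

# Toy frame; the column names and values are hypothetical stand-ins.
df = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year": [1985, 1986, 1985, 1986],
    "population": [1000, 1100, 2000, 2100],
    "suicides_no": [5, 6, 12, 11],
})

target_column = "suicides_no"   # what we want to predict
columns_to_drop = ["country"]   # assumed to be less relevant in this sketch

# y is the target; X is everything else minus the dropped columns.
y = df[target_column]
X = df.drop(columns=columns_to_drop + [target_column])

print(list(X.columns))
```

Removing the target column from `X` matters as much as dropping the irrelevant ones: leaving it in would let the model read the answer directly from its inputs.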

# Investigating Correlation

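Correlation is a technique for investigating the relationship between two quantitative, continuous variables, for example, age and number of suicides. A categorical variable such as sex has to be encoded as numbers before it can enter a correlation analysis.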

In practice, this means comparing the scatterplot of the bivariate data with the numerical value of the correlation coefficient.
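To make the correlation coefficient concrete, here is a small sketch that computes the Pearson coefficient from its definition and checks it against NumPy's `np.corrcoef`. The paired values are made up for illustration, and the helper name `pearson_r` is hypothetical.

```python
import numpy as np

# Made-up paired observations (not from the dataset).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])


def pearson_r(a, b):
    """Pearson correlation: covariance scaled by both standard deviations."""
    a_centered = a - a.mean()
    b_centered = b - b.mean()
    numerator = (a_centered * b_centered).sum()
    denominator = np.sqrt((a_centered ** 2).sum() * (b_centered ** 2).sum())
    return numerator / denominator


r_manual = pearson_r(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]  # off-diagonal entry of the 2x2 matrix

print(r_manual, r_numpy)
```

A value close to 1 corresponds to points lying near an upward-sloping straight line in the scatterplot, a value close to -1 to a downward-sloping one, and a value near 0 to no linear pattern.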

The rest of the dataset is preprocessed as required by the respective models deployed.

# Splitting the Dataset

A machine learning algorithm works with a dataset in two stages: training and testing. We split the data roughly 80%-20% between the training and testing stages.
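A split like this can be sketched with scikit-learn's `train_test_split`. The arrays below are placeholders, and the `random_state` value is an assumption added only to make the split reproducible; the article does not say which seed, if any, was used.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 50 samples with 2 features each.
X = np.arange(100).reshape(50, 2)
y = np.arange(50)

# Hold out 20% of the rows for testing; the rest is used for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # seed is an assumed value
)

print(X_train.shape, X_test.shape)
```

The model is fitted only on the training portion, so its score on the held-out portion estimates how well it handles data it has never seen.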

# Trying Linear Regression

Linear Regression is a supervised machine learning algorithm that performs a regression task: it predicts the value of a dependent variable (y) from a given independent variable (x). To do so, it finds a linear relationship between x (input) and y (output).
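As an illustration of what "finding a linear relationship" means, the sketch below fits scikit-learn's `LinearRegression` to noiseless synthetic points generated from y = 3x + 2 and reads back the slope and intercept. The data and the chosen line are assumptions for the example, not results from the suicide-rates model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic points lying exactly on the line y = 3x + 2 (assumed for the demo).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X = x.reshape(-1, 1)  # scikit-learn expects a 2D array: (n_samples, n_features)
y = 3.0 * x + 2.0

model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]        # learned weight for the single feature
intercept = model.intercept_  # learned constant term
prediction = model.predict(np.array([[10.0]]))[0]

print(slope, intercept, prediction)
```

On real data the points do not fall exactly on a line, so the fitted slope and intercept are the values that minimize the squared prediction error rather than an exact recovery.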