Source: Deep Learning on Medium
1. Stock Analysis
In this project, the data is fetched using the Yahoo API. The data contains stock information for some top multinational companies (MNCs). The aim of the project is to provide a detailed analysis of these stocks along with visualizations.
The plot of opening prices clearly shows that Tesla stock opens the market at high prices.
The next plot shows the distribution of trading volume across the companies' stocks.
After that, we calculated the total traded value for each company as opening price multiplied by volume, and plotted it.
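A minimal sketch of this step, assuming each company's data sits in a pandas DataFrame with Open and Volume columns. The frame name `tesla`, the new column name and the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the real project fetches prices from Yahoo.
tesla = pd.DataFrame(
    {"Open": [100.0, 102.0, 101.0], "Volume": [1000, 1500, 1200]},
    index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
)

# Total traded value: opening price times the number of shares traded that day.
tesla["Total Traded"] = tesla["Open"] * tesla["Volume"]
```

With matplotlib installed, calling `tesla["Total Traded"].plot()` would draw the series over time.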
Then we calculated the moving average of the opening prices over 50-day and 200-day windows.
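The moving averages can be sketched with a pandas rolling window. The helper name `add_moving_averages` and the toy prices are assumptions, and the demo call uses tiny windows only so that the short sample produces values.

```python
import pandas as pd

def add_moving_averages(df, short=50, long=200):
    """Return a copy of df with rolling means of the Open column added."""
    out = df.copy()
    out[f"MA{short}"] = out["Open"].rolling(window=short).mean()
    out[f"MA{long}"] = out["Open"].rolling(window=long).mean()
    return out

# Hypothetical toy prices; real data would span hundreds of trading days.
prices = pd.DataFrame({"Open": [10.0, 12.0, 14.0, 16.0, 18.0]})
demo = add_moving_averages(prices, short=2, long=3)
```

The first `window - 1` rows of each average are NaN because a full window is not yet available.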
We also plotted the daily opening price of each company.
Then we calculated the daily returns and the cumulative returns.
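One common way to compute these, shown here as a sketch. It assumes returns are taken on the opening price (the project may use closing prices instead) and that the cumulative return is expressed as the growth of one unit invested. The sample prices are made up.

```python
import pandas as pd

# Hypothetical opening prices for three consecutive days.
opens = pd.Series([100.0, 110.0, 99.0])

# Daily return: percentage change from the previous day.
returns = opens.pct_change()

# Cumulative return: compounded growth of 1 unit invested on the first day.
cumulative = (1 + returns).cumprod()
```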
Project Link: https://github.com/harshbansal1999/Stock-Analysis
2. Credit Card Fraud Detection
For this project, refer to my story: https://medium.com/@bansalh944/credit-card-fraud-detection-c66d1399c0b7
3. Suicide Rate Prediction
We were provided with the dataset, most probably from Kaggle. The dataset contains the number of suicides in different regions along with GDP, population count, sex, generation, age and year.
country 27820 non-null object
year 27820 non-null int64
sex 27820 non-null object
age 27820 non-null object
suicides_no 27820 non-null int64
population 27820 non-null int64
suicides/100k pop 27820 non-null float64
country-year 27820 non-null object
HDI for year 8364 non-null float64
gdp_for_year 27820 non-null object
gdp_per_capita 27820 non-null int64
generation 27820 non-null object
The data cleaning step involves removing some columns and applying LabelEncoder to categorical columns such as generation.
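A small sketch of this cleaning step with scikit-learn. Which columns were actually dropped is not stated, so the two removed here are an assumption, and the rows are invented sample values.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical rows using column names from the dataset summary above.
df = pd.DataFrame({
    "country-year": ["Albania1987", "Albania1987", "Albania1988"],
    "HDI for year": [None, None, None],
    "sex": ["male", "female", "male"],
    "generation": ["Generation X", "Silent", "Boomers"],
    "suicides_no": [21, 14, 9],
})

# Assumption: drop a redundant column and a mostly empty one.
df = df.drop(columns=["country-year", "HDI for year"])

# Replace each category with an integer code (classes are sorted alphabetically).
for col in ["sex", "generation"]:
    df[col] = LabelEncoder().fit_transform(df[col])
```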
We calculated the correlation matrix of the dataset to see how the explanatory variables are related to the response variable.
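As a sketch, the correlation matrix comes straight from pandas. Here suicides_no is assumed to be the response variable, and the numbers are toy values chosen to give perfect correlations.

```python
import pandas as pd

# Hypothetical numeric data with deliberately perfect linear relationships.
df = pd.DataFrame({
    "population": [1000, 2000, 3000, 4000],
    "gdp_per_capita": [500, 400, 300, 200],
    "suicides_no": [10, 20, 30, 40],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()

# Correlation of every explanatory variable with the assumed response variable.
target_corr = corr["suicides_no"].drop("suicides_no").sort_values(ascending=False)
```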
Then we split the dataset, trained various ML regression models and compared their accuracy.
We found that the random forest regressor is the best-suited algorithm for this purpose.
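The comparison could look roughly like the sketch below. It runs on synthetic data rather than the suicide dataset, the choice of models is an assumption, and "accuracy" is taken to mean the R² value returned by each regressor's `score` method.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the cleaned features and target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 3))
y = X[:, 0] * X[:, 1] + np.sin(X[:, 2]) + rng.normal(0, 0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(random_state=0),
    "forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Fit each model and record its R^2 on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

best = max(scores, key=scores.get)
```

Which model ends up in `best` depends on the data, so this sketch does not by itself reproduce the result reported above.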
Project Link: https://github.com/harshbansal1999/Suicide
4. Movie Recommendation System
This is a natural language processing project. The dataset used in this project is from Kaggle and contains various information about movies: the genre, whether the movie is rated, id, title, release date, vote average and vote count. Our aim is to recommend movies based on the genres provided.
We calculate the average vote across all movies and filter out those that do not meet the 90th-percentile criterion.
After that, we calculate a score for each movie from its vote count and vote average.
This is the formula:
q_movies['score'] = (q_movies['vote_count'] / (q_movies['vote_count'] + c) * q_movies['vote_average']) + (c / (c + q_movies['vote_count']) * mean_rating)
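A runnable sketch of the score calculation. It assumes that c is the 90th percentile of vote_count, which the text does not spell out, and that mean_rating is the average of vote_average over all movies. The movie rows are invented.

```python
import pandas as pd

# Hypothetical movies; the real data comes from the Kaggle dataset.
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "vote_count": [10, 100, 1000, 5000],
    "vote_average": [9.0, 8.0, 7.0, 6.0],
})

mean_rating = movies["vote_average"].mean()

# Assumption: c is the vote count a movie needs to qualify (90th percentile).
c = movies["vote_count"].quantile(0.90)

# Keep only the movies that reach the cutoff.
q_movies = movies[movies["vote_count"] >= c].copy()

# Weighted score: pulls ratings with few votes toward the overall mean.
q_movies["score"] = (
    q_movies["vote_count"] / (q_movies["vote_count"] + c) * q_movies["vote_average"]
    + c / (c + q_movies["vote_count"]) * mean_rating
)
```

The weighting means a movie with few votes scores close to the overall mean, while a heavily voted movie scores close to its own average.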
Then we import TfidfVectorizer, which converts text into a numerical form that a computer can work with. We apply it to the overview of each movie.
Then we calculate the cosine similarity between these vectors to determine how similar the movies are to each other.
Then we wrote a function that finds movies with similar overviews, and we recommend those.
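The three steps above can be sketched with scikit-learn as follows. The function name `get_recommendations`, the titles and the overviews are hypothetical and only illustrate the idea.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical movies with short overviews.
movies = pd.DataFrame({
    "title": ["Dream Heist", "Mind Thief", "Farm Life"],
    "overview": [
        "A thief enters dreams to steal secrets",
        "A thief steals secrets from the dreams of others",
        "A family raises cows on a quiet farm",
    ],
})

# Turn each overview into a TF-IDF vector, ignoring English stop words.
tfidf = TfidfVectorizer(stop_words="english")
tfidf_matrix = tfidf.fit_transform(movies["overview"].fillna(""))

# Pairwise cosine similarity between all overviews.
cosine_sim = cosine_similarity(tfidf_matrix)

# Map each title to its row position.
indices = pd.Series(movies.index, index=movies["title"])

def get_recommendations(title, top_n=10):
    """Return the titles whose overviews are most similar to the given one."""
    idx = indices[title]
    ranked = sorted(enumerate(cosine_sim[idx]), key=lambda p: p[1], reverse=True)
    ranked = [p for p in ranked if p[0] != idx][:top_n]
    return movies["title"].iloc[[i for i, _ in ranked]].tolist()
```

Calling `get_recommendations("Dream Heist", top_n=1)` returns the one other title whose overview shares words with it.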
For example, passing the movie Inception to this function returns a list of similar movies.
Then we apply topic modelling using LDA. This determines the top topics on which most movies are based. To achieve that, we first have to clean the text, for example by removing stopwords and applying lemmatization.
5. Delhi Election Analysis
In this project, we study data from the elections held in Delhi over the past 10 years. This includes both Vidhan Sabha and Lok Sabha elections. We had to fetch the data from the Delhi government site, but it is only available as PDF files, so we had to extract the data tables from the PDFs. The data contains the list of candidates participating in each election, along with the votes each candidate received at each polling station.
We calculate detailed statistics from this data.
After getting the data into the required form, we built a dataframe containing a detailed analysis of each candidate.
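Once the tables are out of the PDFs, the per-candidate summary could be built along these lines. The column names, the statistics chosen and the vote counts are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical extracted table: one row per candidate per polling station.
votes = pd.DataFrame({
    "candidate": ["A", "B", "A", "B", "A", "B"],
    "polling_station": [1, 1, 2, 2, 3, 3],
    "votes": [120, 80, 60, 90, 100, 100],
})

# Aggregate the station-level votes into one row per candidate.
summary = votes.groupby("candidate")["votes"].agg(
    total="sum", mean="mean", max="max"
)

# Share of all votes cast that each candidate received.
summary["vote_share"] = summary["total"] / summary["total"].sum()
```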
We performed this detailed analysis for each election.
Project Link: https://github.com/harshbansal1999/Delhi-Election