MARKET BASKET ANALYSIS USING ASSOCIATION RULE MINING WITH APRIORI, ECLAT AND FP-GROWTH ALGORITHMS

Original article was published by Karan Choudhary on Deep Learning on Medium


ABOUT DATASET

The dataset consists of items bought by users across many transactions. The goal of the competition is to predict which products will be in a user’s next order. The dataset is anonymized and contains a sample of over 7501 rows, with more than 20 unique items across the transactions. For each transaction we have the sequence of products purchased in that order. From this we want to work out which products sell better when offered together with other products.

Library

The libraries used here are:

1) NumPy

2) pandas

3) Matplotlib

4) apyori

5) mlxtend

6) pyfpgrowth

Association rule mining

Association rule mining finds interesting associations and relationships among large sets of data items. An association rule shows how frequently an itemset occurs in the transactions. A typical example is Market Basket Analysis.

Market Basket Analysis is one of the key techniques used by large retailers to show associations between items. It allows retailers to identify relationships between the items that people frequently buy together.

Given a set of transactions, we can find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction.

In this dataset we will use several algorithms to find the relations between itemsets and to identify rules with strong confidence, so that the associated items can be placed together to improve sales and profit.

ALGORITHMS FOR ASSOCIATION RULE MINING

We will be implementing 3 algorithms for prediction:

1. Apriori

2. ECLAT

3. FP-growth

Each algorithm needs the data in a different form, so we prepare the data according to the needs of each algorithm and then analyse the results using the lift score and other measures, to get the most out of the market basket analysis.

Data Pre-processing

We import the libraries and then load the dataset through pandas, so that we can look for relations between the items.

We then analyse the dataset with pandas and NumPy to extract useful information from it, using the info, describe and null-check functions to get a better understanding of the data.
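The loading and inspection steps could look roughly like the sketch below. The CSV content here is a small made-up sample standing in for the real file, whose name and exact layout are not given in this article; the assumption is one transaction per row, no header row, and a varying number of items per row.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real dataset file: one transaction per row,
# no header, short baskets padded with empty fields.
sample_csv = io.StringIO(
    "milk,bread,butter\n"
    "bread,butter,\n"
    "milk,bread,\n"
    "eggs,,\n"
)

# header=None because the first row is already a transaction, not column names.
df = pd.read_csv(sample_csv, header=None)

df.info()                 # column dtypes and non-null counts
print(df.describe())      # count, unique, top and freq for each column
print(df.isnull().sum())  # missing values per column (short baskets leave NaN)

# Turn each row into a list of items, dropping the NaN padding.
transactions = [
    [str(item) for item in row if pd.notna(item)]
    for row in df.values.tolist()
]
print(transactions)
```

The list-of-lists form produced at the end is a convenient common starting point, since each algorithm can then reshape it as needed.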

Apriori

Apriori is an algorithm used for association rule mining. It searches for frequent sets of items in the dataset and builds on associations and correlations between the itemsets. It is the algorithm behind the “You may also like” suggestions commonly seen on recommendation platforms.

General Process of the Apriori algorithm
The entire algorithm can be divided into two steps:

Step 1: Apply minimum support to find all the frequent sets with k items in a database.

Step 2: Use the self-join rule to find the frequent sets with k+1 items with the help of the frequent k-itemsets. The two steps are repeated with increasing k until no new frequent itemsets are found.
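The two steps above can be sketched in plain Python as follows. This is a minimal illustration on made-up transactions, not the implementation used by apyori or mlxtend; the function name and the 0.5 support threshold are assumptions chosen for the example.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(1 for b in baskets if itemset <= b) / n

    # Step 1: apply minimum support to the single items (k = 1).
    items = {item for b in baskets for item in b}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        # Step 2: self-join the frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k + 1}
        # Prune candidates that have an infrequent k-item subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k))
        }
        # Apply minimum support again to the surviving candidates.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        k += 1
    return frequent


# Made-up transactions for illustration only.
transactions = [
    ["milk", "bread", "butter"],
    ["bread", "butter"],
    ["milk", "bread"],
    ["eggs"],
]
frequent = apriori_frequent_itemsets(transactions, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The pruning step is what keeps the search manageable: a set of k+1 items can only be frequent if all of its k-item subsets are frequent, so most candidates are discarded before their support is ever counted.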

We have different hyperparameters for association rule mining.

The hyperparameters chosen for this training are:

min_support: the number of transactions (rows) in which the items are bought, divided by the total number of transactions.

min_confidence: at least 20%. min_lift: a minimum of 2 (less than that is too low) and a maximum of 5, depending on the need.

min_length: we want at least 2 items, or more according to need, to be associated. There is no point in having a single item in the result.

lift: confidence(X -> Y) / support(Y)

conviction: (1 - support(Y)) / (1 - confidence(X -> Y)). It compares how often the rule X -> Y would be wrong if X and Y were independent with how often it is actually wrong in the dataset.
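As a rough illustration of these measures, the sketch below computes support, confidence, lift and conviction for a single rule X -> Y using the standard definitions. The transactions and the function name are made up for the example.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Compute support, confidence, lift and conviction for the rule X -> Y."""
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    x, y = set(antecedent), set(consequent)

    support_x = sum(1 for b in baskets if x <= b) / n
    support_y = sum(1 for b in baskets if y <= b) / n
    support_xy = sum(1 for b in baskets if (x | y) <= b) / n
    if support_x == 0:
        raise ValueError("antecedent never occurs, confidence is undefined")

    confidence = support_xy / support_x
    lift = confidence / support_y
    # A rule that is never wrong has infinite conviction.
    if confidence == 1:
        conviction = float("inf")
    else:
        conviction = (1 - support_y) / (1 - confidence)

    return {
        "support": support_xy,
        "confidence": confidence,
        "lift": lift,
        "conviction": conviction,
    }


# Made-up transactions for illustration only.
transactions = [
    ["milk", "bread", "butter"],
    ["bread", "butter"],
    ["milk", "bread"],
    ["eggs"],
]
metrics = rule_metrics(transactions, ["bread"], ["butter"])
print(metrics)
```

A lift above 1 means X and Y appear together more often than they would if they were independent, which is why a minimum lift is used as a filter.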

Hyperparameters

We have taken minimum support = 3, minimum confidence = 70% and minimum length = 3 to get better output, and we found the results satisfactory.

We then generate the results for these hyperparameters, which gives the itemsets that satisfy them together with their confidence.

Next we compute the lift and the association rules among the itemsets and print them. The itemset with the highest lift is taken as the optimal itemset for market basket analysis.

Then we build a data frame of the lift and association rules and sort it in ascending order of lift, to get a better understanding of the lift for each rule X -> Y.
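A simplified sketch of this rule table is shown below. It only considers rules between single items (one item in X, one in Y), which is a simplification and not what apyori does. The transactions, thresholds and function name are made up for the example and are not the values used in this article. The result is a plain list of dictionaries, which pandas.DataFrame accepts directly.

```python
from itertools import permutations


def pairwise_rules(transactions, min_support, min_confidence, min_lift):
    """Build rules X -> Y between single items, sorted by ascending lift."""
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    items = sorted({item for b in baskets for item in b})
    count = {item: sum(1 for b in baskets if item in b) for item in items}

    rows = []
    for x, y in permutations(items, 2):
        both = sum(1 for b in baskets if x in b and y in b)
        support = both / n
        if support < min_support:
            continue
        confidence = both / count[x]
        lift = confidence / (count[y] / n)
        if confidence >= min_confidence and lift >= min_lift:
            rows.append(
                {"X": x, "Y": y, "support": support,
                 "confidence": confidence, "lift": lift}
            )

    # Ascending order of lift, so the strongest rule ends up last.
    rows.sort(key=lambda r: r["lift"])
    return rows


# Made-up transactions and thresholds for illustration only.
transactions = [
    ["milk", "bread", "butter"],
    ["bread", "butter"],
    ["milk", "bread"],
    ["eggs", "milk"],
    ["bread"],
]
rules = pairwise_rules(transactions, min_support=0.2, min_confidence=0.5, min_lift=1.0)
for r in rules:
    print(f'{r["X"]} -> {r["Y"]}: lift={r["lift"]:.3f}, confidence={r["confidence"]:.2f}')

best = rules[-1]  # the rule with the highest lift
```

With the rows sorted in ascending order, the last row is the rule with the highest lift, which is the one picked out as the strongest association.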

The association rules generated with these hyperparameters are as follows.

RESULTS for better understanding