
Bhushan0097/03.CAPSTONE.ML.Classification--Email-Campaign-Effectiveness-Prediction


Email Campaign Effectiveness Prediction


Overview

This repository contains the code and resources for a supervised machine learning project aimed at predicting whether a delivered email will be read, acknowledged, or ignored. The dataset used is data_email_campaign.csv.

Table of Contents

Introduction

This project uses machine learning classification techniques to predict the effectiveness of an email campaign, that is, whether a delivered email will be read, acknowledged, or ignored by the customer.

Dataset

The dataset data_email_campaign.csv is included in the 📁 data directory. It contains information about emails sent to customers, including the email type, subject hotness score, campaign type, customer location, and other relevant features.

The data_email_campaign.csv file contains the following columns:

  • Email_Id: Email ID of the customer
  • Email_Type: Type of the email, with 2 categories: 1 and 2. These can be assumed to be types such as promotional email or sales email
  • Subject_Hotness_Score: Score of the email's subject, based on how good and effective the content is
  • Email_Source_Type: Source of the email, such as a sales, marketing, or product type email
  • Email_Campaign_Type: Campaign type of the email
  • Customer_Location: Categorical data describing the demographic location of the customer
  • Total_Past_Communications: Total number of previous emails from the same source
  • Time_Email_sent_Category: Time of day when the email was sent
  • Word_Count: Total number of words in the email
  • Total_links: Total number of links in the email
  • Total_Images: Total number of images in the email
  • Email_Status: The target variable, which indicates whether the email was read, acknowledged, or ignored
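As a quick orientation, the sketch below shows one way to load and inspect a file with these columns using Pandas. The rows are made-up stand-ins so the snippet runs on its own, the path in the comment is inferred from the data directory mentioned above, and the integer coding of Email_Status (0, 1, 2) is an assumption rather than something this README states.

```python
import io

import pandas as pd

# Hypothetical stand-in rows; only a subset of the columns listed above.
# The values and the 0/1/2 coding of Email_Status are assumptions.
sample_csv = io.StringIO(
    "Email_Id,Email_Type,Subject_Hotness_Score,"
    "Total_Past_Communications,Word_Count,Email_Status\n"
    "E001,1,2.2,33,440,0\n"
    "E002,2,0.5,,504,1\n"
    "E003,1,1.5,36,962,0\n"
)

# With the repository checked out, this would presumably be:
# df = pd.read_csv("data/data_email_campaign.csv")
df = pd.read_csv(sample_csv)

print(df.dtypes)                            # inferred column types
print(df.isna().sum())                      # missing values per column
print(df["Email_Status"].value_counts())    # class balance of the target
```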

Dependencies

The project is developed using Python and relies on the following libraries:

  • NumPy
  • Pandas
  • Matplotlib
  • Seaborn
  • Scikit-learn

Documentation

The project involves the following steps:

  1. Data Cleaning and Preparation
  2. Exploratory Data Analysis
  3. Visualization and Insights
  4. Hypothesis Testing
  5. Feature Engineering & Data Pre-processing
  6. ML Model Training, Implementation and Evaluation

Data Cleaning and Preparation

The first step in this project involves cleaning and preparing the data. This includes checking for missing data, removing duplicates, and converting data types. Some of the specific tasks involved in this step include:

  • Handling missing data
  • Removing duplicates
  • Converting data types
  • TimeSeries Analysis
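The first three tasks above can be sketched in Pandas as follows. This is a minimal illustration on made-up rows, not the notebook's actual code; filling numeric gaps with the median and categorical gaps with the mode is an assumed strategy.

```python
import numpy as np
import pandas as pd

# Made-up rows: one exact duplicate (E001) and one row with missing values.
raw = pd.DataFrame({
    "Email_Id": ["E001", "E001", "E002", "E003", "E004"],
    "Customer_Location": ["A", "A", None, "B", "B"],
    "Total_Past_Communications": [33.0, 33.0, np.nan, 36.0, 40.0],
    "Email_Type": [1, 1, 2, 1, 2],
})

# Removing duplicates
clean = raw.drop_duplicates().copy()

# Handling missing data (assumed strategy: median for numeric, mode for categorical)
median_comms = clean["Total_Past_Communications"].median()
clean["Total_Past_Communications"] = clean["Total_Past_Communications"].fillna(median_comms)
most_common_location = clean["Customer_Location"].mode()[0]
clean["Customer_Location"] = clean["Customer_Location"].fillna(most_common_location)

# Converting data types: Email_Type is a category label, not a quantity
clean["Email_Type"] = clean["Email_Type"].astype("category")

print(clean)
```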

Exploratory Data Analysis

The next step in the project is to conduct exploratory data analysis. This involves examining the data to understand its distribution, central tendencies, and correlations between variables.
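A minimal sketch of these checks with Pandas is shown here. The rows are made up, and the integer coding of Email_Status is an assumption.

```python
import pandas as pd

# Made-up numeric columns; Total_links is deliberately a linear function of Word_Count.
df = pd.DataFrame({
    "Word_Count": [400, 500, 600, 700],
    "Total_links": [4, 5, 6, 7],
    "Email_Status": [0, 0, 1, 0],
})

summary = df.describe()    # count, mean, std, min, quartiles, max per column
corr = df.corr()           # pairwise Pearson correlations
status_share = df["Email_Status"].value_counts(normalize=True)  # class proportions

print(summary)
print(corr)
print(status_share)
```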

Hypothesis Testing

Hypothesis testing is a statistical method used to make inferences about a population based on a sample of data. To perform hypothesis testing on the 'data_email_campaign.csv' dataset, we start with a null hypothesis (H0) and an alternative hypothesis (H1), then use statistical tests to determine whether there is enough evidence to reject the null hypothesis in favor of the alternative hypothesis.

Below is a general step-by-step guide to performing hypothesis testing on a dataset like data_email_campaign.csv:

  1. Define the Hypotheses
  2. Choose a Significance Level (α)
  3. Select the Test
  4. Perform the Test
  5. Analyze the Results
  6. Draw Conclusions
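To make the six steps concrete, here is a small worked example of a chi-square test of independence between Email_Type and Email_Status. The counts are invented for illustration, and the choice of test is an assumption; this README does not say which tests the project ran. The statistic is computed by hand so the snippet needs only the standard library. In practice a library routine such as scipy.stats.chi2_contingency would usually be used instead.

```python
import math

# Step 1: H0 = Email_Status is independent of Email_Type; H1 = it is not.
# Hypothetical counts: rows = Email_Type (1, 2),
# columns = Email_Status (ignored, read, acknowledged).
observed = [
    [300, 50, 20],
    [250, 90, 40],
]

# Step 2: significance level
alpha = 0.05

# Steps 3 and 4: chi-square statistic, sum of (O - E)^2 / E over all cells,
# where E = row_total * column_total / grand_total.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2 += (obs - expected) ** 2 / expected

dof = (len(observed) - 1) * (len(observed[0]) - 1)

# Step 5: for exactly 2 degrees of freedom, the chi-square survival
# function has the closed form exp(-x / 2), so no statistics library is needed.
assert dof == 2
p_value = math.exp(-chi2 / 2)

# Step 6: reject H0 when the p-value falls below alpha
reject_h0 = p_value < alpha
print(f"chi2={chi2:.3f}, dof={dof}, p={p_value:.6f}, reject H0: {reject_h0}")
```

The closed-form p-value only holds for a 2 x 3 table like this one; other table shapes need a general chi-square distribution function.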

Feature Engineering & Data Pre-processing

  1. Handling Missing Values
  2. Handling Outliers
  3. Label Encoding
  4. Textual Data Preprocessing
  5. Feature Manipulation & Selection
  6. Data Transformation
  7. Data Scaling
  8. Dimensionality Reduction
  9. Data Splitting
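Three of these steps (label encoding, data scaling, and data splitting) can be sketched with Scikit-learn as below. The rows are made up, and the sketch deliberately splits before scaling so that the scaler is fitted on the training rows only; that ordering is a choice made for this illustration and differs from the order of the list above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Made-up rows using a few of the dataset's column names.
df = pd.DataFrame({
    "Customer_Location": ["A", "B", "C", "A", "B", "C", "A", "B"],
    "Word_Count": [400, 520, 610, 700, 450, 830, 560, 900],
    "Total_links": [4, 6, 8, 10, 5, 12, 7, 14],
    "Email_Status": [0, 0, 1, 0, 1, 0, 0, 1],
})

# Label encoding: map location strings to integer codes
df["Customer_Location"] = LabelEncoder().fit_transform(df["Customer_Location"])

X = df.drop(columns="Email_Status")
y = df["Email_Status"]

# Data splitting: hold out 25% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Data scaling: fit on the training rows only, then apply to both parts
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```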

ML Model Training and Evaluation

The dependent variable, Email_Status, is a categorical variable. Hence, classification ML algorithms are used to train the model to predict the dependent variable.
The model is trained with the following ML algorithms:

  1. Logistic Regression
  2. Random Forest Classification
  3. XGBoost Classification

| Sr No. | Model Name | Test Accuracy | Test Recall | Test Precision | Test F1score | Test AUC | Train Accuracy | Train Recall | Train Precision | Train F1score | Train AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Logistic Regression | 0.5424 | 0.5424 | 0.5272 | 0.5173 | 0.7299 | 0.5831 | 0.5831 | 0.6088 | 0.5837 | 0.7666 |
| 1 | Logistic Regression + GridSearchCV | 0.5420 | 0.5420 | 0.5269 | 0.5163 | 0.7297 | 0.5828 | 0.5828 | 0.6086 | 0.5831 | 0.7665 |
| 2 | Random Forest | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 1.0000 | 0.8087 | 0.8087 | 0.8084 | 0.8083 | 0.9111 |
| 3 | Random Forest + GridSearchCV | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 1.0000 | 0.8092 | 0.8092 | 0.8087 | 0.8087 | 0.9119 |
| 4 | XGboost | 0.8087 | 0.8087 | 0.8095 | 0.8026 | 0.9355 | 0.7766 | 0.7766 | 0.7655 | 0.7658 | 0.8951 |
| 5 | XGboost + GridSearchCV | 0.9993 | 0.9993 | 0.9993 | 0.9993 | 1.0000 | 0.8247 | 0.8247 | 0.8212 | 0.8223 | 0.9141 |
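
The sketch below shows how metrics like those in the table above could be produced with Scikit-learn. It is an illustration under stated assumptions, not the project's notebook: the data is synthetic (make_classification with three classes), the hyperparameter grid is invented, and weighted averaging is assumed for recall, precision, and F1. That last assumption is at least consistent with the table, since weighted recall always equals accuracy. XGBoost is left out because it is not among the listed dependencies.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic three-class data standing in for the email features and Email_Status.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)


def evaluate(model, X_part, y_part):
    """Return the five metrics from the table for one data partition."""
    pred = model.predict(X_part)
    proba = model.predict_proba(X_part)
    return {
        "Accuracy": accuracy_score(y_part, pred),
        "Recall": recall_score(y_part, pred, average="weighted"),
        "Precision": precision_score(y_part, pred, average="weighted", zero_division=0),
        "F1score": f1_score(y_part, pred, average="weighted"),
        "AUC": roc_auc_score(y_part, proba, multi_class="ovr"),
    }


models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest + GridSearchCV": GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid={"n_estimators": [50, 100], "max_depth": [5, None]},  # invented grid
        cv=3,
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = {
        "Train": evaluate(model, X_train, y_train),
        "Test": evaluate(model, X_test, y_test),
    }

for name, scores in results.items():
    print(name, scores)
```

Comparing the Train and Test entries for each model is a quick way to spot overfitting: a large gap between the two suggests the model has memorized the training rows.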
