Email Campaign Effectiveness Prediction

Overview

This repository contains the code and resources for a supervised machine learning project that predicts whether a delivered email will be read, acknowledged, or ignored. The dataset used is data_email_campaign.csv.

Introduction

Email campaign effectiveness prediction is about anticipating how customers will respond to an email. This project uses machine learning techniques to predict whether a delivered email will be ignored, read, or acknowledged, based on characteristics of the email and of the customer receiving it.

Dataset

The dataset data_email_campaign.csv is included in the 📁 data directory. It contains information about each email in the campaign, including its type, source, campaign type, subject hotness score, word count, links, and images, along with the customer's location and past communications.

The data_email_campaign.csv file contains the following columns:

Email_Id: Email ID of the customer
Email_Type: Email type, with 2 categories (1 and 2). These can be assumed to correspond to types such as promotional or sales emails
Subject_Hotness_Score: Score of the email's subject, based on how good and effective its content is
Email_Source_Type: Source of the email, such as sales, marketing, or product
Email_Campaign_Type: Campaign type of the email
Customer_Location: Categorical data describing the demographic location of the customer
Total_Past_Communications: Total number of previous emails from the same source
Time_Email_sent_Category: Time of day when the email was sent
Word_Count: Total number of words in the email
Total_links: Total number of links in the email
Total_Images: Total number of images in the email
Email_Status: Target variable indicating whether the email was ignored, read, or acknowledged
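
A minimal sketch for loading the file and checking it against the column list above, assuming pandas and a path of data/data_email_campaign.csv inside the data directory (the exact path is an assumption). The helper names load_campaign_data, missing_columns and target_share are hypothetical.

```python
import pandas as pd

# Column names exactly as listed in the dataset description.
EXPECTED_COLUMNS = [
    "Email_Id", "Email_Type", "Subject_Hotness_Score", "Email_Source_Type",
    "Email_Campaign_Type", "Customer_Location", "Total_Past_Communications",
    "Time_Email_sent_Category", "Word_Count", "Total_links", "Total_Images",
    "Email_Status",
]


def load_campaign_data(path="data/data_email_campaign.csv"):
    """Read the campaign CSV. The default path is an assumed location."""
    return pd.read_csv(path)


def missing_columns(df):
    """Return the expected columns that are absent from df."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]


def target_share(df, target="Email_Status"):
    """Fraction of rows in each target class, ordered by class label."""
    return df[target].value_counts(normalize=True).sort_index()
```

Typical use is df = load_campaign_data() followed by missing_columns(df), which should return an empty list, and target_share(df) to see how balanced the Email_Status classes are.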

Dependencies

The project is developed using Python and relies on the following libraries:

NumPy
Pandas
Matplotlib
Seaborn
Scikit-learn

Documentation

The project involves the following steps:

Data Cleaning and Preparation
Exploratory Data Analysis
Visualization and Insights
Hypothesis Testing
Feature Engineering & Data Pre-processing
ML Model Training, Implementation and Evaluation

Data Cleaning and Preparation

The first step in this project involves cleaning and preparing the data. This includes checking for missing data, removing duplicates, and converting data types. Some of the specific tasks involved in this step include:

Handling missing data
Removing duplicates
Converting data types
TimeSeries Analysis
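
The first three cleaning tasks above could look roughly like the following pandas sketch. The fill strategy (median for numeric columns, most frequent value otherwise) is an assumption, since the project does not state how gaps are filled, and clean_campaign_data is a hypothetical helper. The time-series step is not covered here.

```python
import pandas as pd


def clean_campaign_data(df, categorical_cols=()):
    """Remove duplicate rows, fill missing values and set dtypes.

    Assumed strategy (the project's exact choices are not stated):
    numeric gaps get the column median, every other gap gets the mode.
    """
    out = df.drop_duplicates().copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue
        if col not in categorical_cols and pd.api.types.is_numeric_dtype(out[col]):
            fill_value = out[col].median()
        else:
            fill_value = out[col].mode().iloc[0]
        out[col] = out[col].fillna(fill_value)
    # Store category codes as pandas 'category' so they are not treated as quantities.
    for col in categorical_cols:
        out[col] = out[col].astype("category")
    return out.reset_index(drop=True)
```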

Exploratory Data Analysis

The next step in the project is to conduct exploratory data analysis. This involves examining the data to understand its distribution, central tendencies, and correlations between variables.
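
As an illustration of this step, the sketch below computes central tendency and spread for each numeric column, compares feature averages across Email_Status classes, and builds a correlation matrix between the numeric features. The function names are hypothetical, and Pearson correlation is an assumed choice.

```python
import pandas as pd


def summarize_numeric(df):
    """Mean, median, standard deviation and skewness of each numeric column."""
    num = df.select_dtypes(include="number")
    return pd.DataFrame({
        "mean": num.mean(),
        "median": num.median(),
        "std": num.std(),
        "skew": num.skew(),
    })


def feature_means_by_class(df, target="Email_Status"):
    """Average of each numeric feature within each target class."""
    features = df.select_dtypes(include="number").columns.drop(target, errors="ignore")
    return df.groupby(target)[list(features)].mean()


def feature_correlations(df, target="Email_Status"):
    """Pairwise Pearson correlations between the numeric features (target excluded)."""
    num = df.select_dtypes(include="number").drop(columns=[target], errors="ignore")
    return num.corr()
```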

Hypothesis Testing

Hypothesis testing is a statistical method used to make inferences about a population based on a sample of data. To perform hypothesis testing on the data_email_campaign.csv dataset, we first state a null hypothesis (H0) and an alternative hypothesis (H1), then use statistical tests to determine whether there is enough evidence to reject the null hypothesis in favor of the alternative hypothesis.

Below is a general step-by-step guide to performing hypothesis testing on a dataset like data_email_campaign.csv:

Define the Hypotheses
Choose a Significance Level (α)
Select the Test
Perform the Test
Analyze the Results
Draw Conclusions
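
As one concrete instance of these steps, the sketch below tests whether a categorical feature such as Email_Type is independent of Email_Status using a chi-square test of independence at α = 0.05. The choice of test and feature is an assumption, not necessarily what the project ran, and SciPy is assumed to be available (it is installed as a scikit-learn dependency but is not in the list above). chi_square_independence is a hypothetical helper.

```python
import pandas as pd
from scipy.stats import chi2_contingency


def chi_square_independence(df, feature, target="Email_Status", alpha=0.05):
    """Chi-square test of independence between a categorical feature and the target.

    H0: feature and target are independent.
    H1: the target distribution differs across feature categories.
    """
    table = pd.crosstab(df[feature], df[target])  # observed counts
    chi2, p_value, dof, _expected = chi2_contingency(table)
    return {
        "chi2": float(chi2),
        "p_value": float(p_value),
        "dof": int(dof),
        "reject_h0": bool(p_value < alpha),  # conclusion at significance level alpha
    }
```

The same helper can be looped over the other categorical columns (for example Email_Source_Type or Customer_Location) to screen which ones are associated with the target.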

Feature Engineering & Data Pre-processing

Handling Missing Values
Handling Outliers
Label Encoding
Textual Data Preprocessing
Feature Manipulation & Selection
Data Transformation
Data Scaling
Dimensionality Reduction
Data Splitting
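
A sketch of several of these steps with scikit-learn, under stated assumptions: the grouping of columns into CATEGORICAL and NUMERIC is inferred from the dataset description, outliers are capped with IQR (Tukey) fences, and categorical columns are one-hot encoded rather than label encoded. All helper names are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed split of the documented columns into categorical and numeric groups.
CATEGORICAL = ["Email_Type", "Email_Source_Type", "Email_Campaign_Type",
               "Customer_Location", "Time_Email_sent_Category"]
NUMERIC = ["Subject_Hotness_Score", "Total_Past_Communications",
           "Word_Count", "Total_links", "Total_Images"]


def cap_outliers_iqr(df, cols, k=1.5):
    """Clip each column to [Q1 - k*IQR, Q3 + k*IQR] (Tukey fences)."""
    out = df.copy()
    for col in cols:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
    return out


def build_preprocessor(categorical=CATEGORICAL, numeric=NUMERIC):
    """Impute and scale numeric columns, impute and one-hot encode categorical ones.

    Columns not listed (e.g. Email_Id) are dropped by ColumnTransformer's default.
    """
    numeric_steps = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical_steps = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    return ColumnTransformer([
        ("num", numeric_steps, numeric),
        ("cat", categorical_steps, categorical),
    ])


def split_data(df, target="Email_Status", test_size=0.2, seed=42):
    """Stratified split so each class keeps its share in train and test."""
    X = df.drop(columns=[target])
    y = df[target]
    return train_test_split(X, y, test_size=test_size, stratify=y, random_state=seed)
```

Fitting the preprocessor, and computing the outlier fences, on the training split only keeps test-set statistics out of model training.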

ML Model Training and Evaluation

The dependent variable, Email_Status, is categorical: it records whether an email was ignored, read, or acknowledged. Hence classification ML algorithms are used to train the model to predict the dependent variable.
The model is trained with the following ML algorithms:

Logistic Regression
Random Forest Classification
XGBoost Classification
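
A minimal sketch of tuning two of the listed models with scikit-learn's GridSearchCV. The parameter grids and the scoring choice (weighted F1) are hypothetical, since the project's grids are not stated, and tune_models is a hypothetical helper. XGBoost is left out because it needs the separate xgboost package, which is not in the dependency list; if installed, its XGBClassifier could be added to MODELS in the same way.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Hypothetical, deliberately small grids; the project's actual grids are not documented.
MODELS = {
    "Logistic Regression": (
        LogisticRegression(max_iter=1000),
        {"C": [0.1, 1.0, 10.0]},
    ),
    "Random Forest": (
        RandomForestClassifier(random_state=42),
        {"n_estimators": [100, 200], "max_depth": [None, 10]},
    ),
}


def tune_models(X_train, y_train, cv=3, scoring="f1_weighted"):
    """Run a cross-validated grid search per model and return the fitted searches."""
    searches = {}
    for name, (estimator, grid) in MODELS.items():
        search = GridSearchCV(estimator, grid, cv=cv, scoring=scoring)
        search.fit(X_train, y_train)  # refits the best parameters on all of X_train
        searches[name] = search
    return searches
```

Each returned search exposes best_params_ and best_score_, and can be used directly for prediction on the held-out split.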

Test

| Sr No. | Model Name | Accuracy | Recall | Precision | F1 score | AUC |
|---|---|---|---|---|---|---|
| 0 | Logistic Regression | 0.5424 | 0.5424 | 0.5272 | 0.5173 | 0.7299 |
| 1 | Logistic Regression + GridSearchCV | 0.5420 | 0.5420 | 0.5269 | 0.5163 | 0.7297 |
| 2 | Random Forest | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 1.0000 |
| 3 | Random Forest + GridSearchCV | 0.9997 | 0.9997 | 0.9997 | 0.9997 | 1.0000 |
| 4 | XGBoost | 0.8087 | 0.8087 | 0.8095 | 0.8026 | 0.9355 |
| 5 | XGBoost + GridSearchCV | 0.9993 | 0.9993 | 0.9993 | 0.9993 | 1.0000 |

Train

| Sr No. | Model Name | Accuracy | Recall | Precision | F1 score | AUC |
|---|---|---|---|---|---|---|
| 0 | Logistic Regression | 0.5831 | 0.5831 | 0.6088 | 0.5837 | 0.7666 |
| 1 | Logistic Regression + GridSearchCV | 0.5828 | 0.5828 | 0.6086 | 0.5831 | 0.7665 |
| 2 | Random Forest | 0.8087 | 0.8087 | 0.8084 | 0.8083 | 0.9111 |
| 3 | Random Forest + GridSearchCV | 0.8092 | 0.8092 | 0.8087 | 0.8087 | 0.9119 |
| 4 | XGBoost | 0.7766 | 0.7766 | 0.7655 | 0.7658 | 0.8951 |
| 5 | XGBoost + GridSearchCV | 0.8247 | 0.8247 | 0.8212 | 0.8223 | 0.9141 |
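
In the results above, Recall equals Accuracy in every row. That is what weighted-average recall produces for a multiclass problem: weighting each class's recall by its support sums to the overall fraction of correct predictions. So the metrics were plausibly computed with average="weighted", although the project does not say so. The sketch below computes the five reported metrics under that assumption; one-vs-rest weighted AUC is also an assumption, and evaluate_predictions is a hypothetical helper.

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)


def evaluate_predictions(y_true, y_pred, proba):
    """Compute the five reported metrics for a multiclass classifier.

    Assumes weighted averaging for recall, precision and F1, and
    one-vs-rest weighted AUC; proba columns must follow sorted class labels.
    """
    return {
        "Accuracy": accuracy_score(y_true, y_pred),
        "Recall": recall_score(y_true, y_pred, average="weighted"),
        "Precision": precision_score(y_true, y_pred, average="weighted", zero_division=0),
        "F1score": f1_score(y_true, y_pred, average="weighted"),
        "AUC": roc_auc_score(y_true, proba, multi_class="ovr", average="weighted"),
    }
```

With a fitted scikit-learn classifier, it would be called as evaluate_predictions(y, model.predict(X), model.predict_proba(X)), once on the training split and once on the test split.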

Bhushan0097/03.CAPSTONE.ML.Classification--Email-Campaign-Effectiveness-Prediction

Folders and files

Latest commit

History

Repository files navigation

Email Campaign Effectiveness Prediction

Overview

Table of Contents

Introduction

Dataset

Dependencies

Documentation

Data Cleaning and Preparation

Exploratory Data Analysis

Hypothesis Testing

Feature Enginerring & Data Pre-processing

ML Model Training and Evaluation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages