In this project, we will work to understand the results of an A/B test run by an e-commerce website. The test compares the performance of the old version of the website against the new version, using bootstrapping for hypothesis testing.
We will start off by importing our required libraries.
import pandas as pd
import numpy as np
import random
import matplotlib.pyplot as plt
%matplotlib inline
# We set the seed so you get the same answers on the quizzes as we set up
random.seed(42)
# The simulations below use np.random, so seed it as well for reproducibility
np.random.seed(42)
We will now read in the csv file and print out a few of its lines.
df = pd.read_csv('ab_data.csv')
df.head()
df.describe()
We will use the shape attribute to find the number of rows in the dataframe.
df.shape
The number of unique users in the dataset can be found by counting the unique user_id's present in the dataset.
df.nunique()
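Since the question is about users specifically, the same count can also be pulled for that single column:
df['user_id'].nunique()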
We will find the proportion of converted users by building a new dataframe of converts and counting its rows. Then we divide by the total number of unique users to get the proportion of converts.
convert = df.query('converted == 1')
convert.shape
# Proportion of converts among unique users (same ratio as before, without hard-coded counts)
convert.shape[0] / df['user_id'].nunique()
In this step we will find the number of rows where the treatment group did not receive the new page, then the number of rows where the control group received the new page, and add the two to get the total number of mismatched rows.
treatment = df.query('landing_page != "new_page" and group == "treatment"')
treatment.shape
control = df.query('landing_page == "new_page" and group == "control"')
control.shape
treatment.shape[0]+control.shape[0]
In the next step we will check whether any rows have missing values.
df.info()
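df.info() reports non-null counts per column; if helpful, an explicit per-column count of missing values can be obtained as follows:
df.isnull().sum()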
The mismatched rows will be dropped because they are uninterpretable, and the remaining rows will be stored in a new dataframe.
# DataFrame.append was removed in recent pandas; pd.concat is the equivalent
delete = pd.concat([treatment, control]).index
df2=df.drop(delete)
df2.head()
df2.shape
# Double Check all of the correct rows were removed
df2[((df2['group'] == 'treatment') == (df2['landing_page'] == 'new_page')) == False].shape[0]
The number of unique users in the new dataframe is checked with the nunique function.
df2.nunique()
We observe that the number of unique user_id's is one less than the total number of rows, so one of the user_id's is a duplicate.
We will find out which user_id is duplicated and print the details of that row.
duplicate = df2[df2.duplicated(['user_id'])]
duplicate.head()
The duplicated row will be dropped from the dataframe.
# Drop the duplicated row by its index label (2893 in this dataset)
df2.drop(2893, inplace=True)
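As a quick check, the dataframe should now contain no duplicated user_id's:
# Expect 0 duplicated user_id's after the drop
df2[df2.duplicated(['user_id'])].shape[0]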
We will check the converted mean once again for the updated dataframe.
df2.converted.mean()
We will find the conversion rate for the old page.
converted_control = df2[df2['group'] == 'control']['converted'].mean()
print(converted_control)
We will also find the conversion rate for the new page.
converted_treatment = df2[df2['group'] == 'treatment']['converted'].mean()
print(converted_treatment)
Finally, the probability of being served the new page will be found.
new_page_probable = (df2['landing_page'] == 'new_page').mean()
print(new_page_probable)
The two pages are served in nearly equal proportions, so no additional adjustment is needed to correct an imbalance between them.
Observation: These results suggest the data is not sufficient to conclude that the treatment page leads to more conversions, since the conversion probability for the treatment group is lower than that for the control group.
Under the null hypothesis, the conversion rates $p_{new}$ and $p_{old}$ would be the same, and hence both can be estimated by:
df2.converted.mean()
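Stated explicitly, the hypotheses we are simulating (one-tailed, at $\alpha = 0.05$) are:

$$H_0: p_{new} - p_{old} \leq 0$$

$$H_1: p_{new} - p_{old} > 0$$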
We will find $n_{new}$, the number of individuals in the treatment group.
n_new = df2.query('landing_page == "new_page"')
n_new.shape
We will also find $n_{old}$, the number of individuals in the control group.
n_old = df2.query('landing_page == "old_page"')
n_old.shape
We will now simulate conversions for the new page under the null conversion rate (the pooled rate computed below).
convert_mean = df2.converted.mean()
print(convert_mean)
new_page_converted = np.random.choice([0, 1], size=n_new.shape[0], p=[(1 - convert_mean), convert_mean])
Simulate the conversion rate for the old page as well.
old_page_converted = np.random.choice([0, 1], size=n_old.shape[0], p=[(1 - convert_mean), convert_mean])
The difference between the two simulated conversion rates will be found.
new_page_converted.mean() - old_page_converted.mean()
We will now repeat the simulation 10,000 times.
p_diffs = []
for i in range(10000):
    new_page_converted = np.random.choice([0, 1], size=n_new.shape[0], p=[(1 - convert_mean), convert_mean])
    old_page_converted = np.random.choice([0, 1], size=n_old.shape[0], p=[(1 - convert_mean), convert_mean])
    p_diffs.append(new_page_converted.mean() - old_page_converted.mean())
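As a side note, the same 10,000 draws can be produced much faster with a vectorized sketch using np.random.binomial (equivalent under the same null; p_diffs_vec is just an illustrative name):
# Draw the number of conversions for each simulated sample, then convert counts to rates
new_converted_sim = np.random.binomial(n_new.shape[0], convert_mean, 10000) / n_new.shape[0]
old_converted_sim = np.random.binomial(n_old.shape[0], convert_mean, 10000) / n_old.shape[0]
p_diffs_vec = new_converted_sim - old_converted_sim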
Plot the histogram of the p_diffs.
p_diffs = np.asarray(p_diffs)
plt.hist(p_diffs)
plt.title("Simulated Differences in Conversion Rates for Null Hypothesis \n", fontsize=14)
plt.xlabel("\n Difference in Probability", fontsize=12)
plt.axvline(converted_treatment - converted_control, color='red');
We will check what proportion of the p_diffs is greater than the actual observed difference.
obs_diff = converted_treatment - converted_control
(p_diffs > obs_diff).mean()
Results: The p-value calculated is 0.9010, far greater than the typical $\alpha$ level of 0.05 used in business studies. (An $\alpha$ level of 0.05 indicates that we have a 5% chance of committing a Type I error if the null is true.) As such, we fail to reject the null: the data does not provide sufficient evidence that the new page converts better than the old one.
Next, we calculate the number of conversions for the old and new pages.
import statsmodels.api as sm
convert_old = df2.query('group == "control" & converted == 1')['converted'].count()
convert_new = df2.query('group == "treatment" & converted == 1')['converted'].count()
We will now use sm.stats.proportions_ztest to compute our test statistic and p-value.
sm.stats.proportions_ztest([convert_new, convert_old], [n_new.shape[0], n_old.shape[0]], alternative='larger')
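For readability, the returned tuple can be unpacked and printed (same call as above; the variable names are just for illustration):
z_score, p_value = sm.stats.proportions_ztest([convert_new, convert_old], [n_new.shape[0], n_old.shape[0]], alternative='larger')
print(z_score, p_value)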
We observe that the values calculated by the z-test match those found during the bootstrapped hypothesis testing.
The first step would be to create the dummy variables and add an intercept.
# get_dummies orders columns alphabetically ('new_page' before 'old_page'),
# so ab_page receives the new-page indicator
df2[['ab_page', 'old_page']] = pd.get_dummies(df2['landing_page'])
df2['intercept'] = 1
df2.head()
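Because the dummy columns come back in alphabetical order, a quick optional check confirms that ab_page really flags the treatment group:
# True if ab_page == 1 exactly matches membership in the treatment group
(df2['ab_page'] == (df2['group'] == 'treatment')).all()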
The next step would be to instantiate and fit the model.
log_mod = sm.Logit(df2['converted'], df2[['intercept', 'ab_page']])
result = log_mod.fit()
Above, we used statsmodels to instantiate the regression model on the columns we created and fit it to predict whether or not an individual converts. Now we view the summary.
# Workaround for known bug with .summary() with updated scipy
from scipy import stats
stats.chisqprob = lambda chisq, df: stats.chi2.sf(chisq, df)
result.summary()
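To read the ab_page coefficient on a more intuitive scale, it can be exponentiated into an odds ratio (a small sketch using the fitted result above; values near 1 indicate little effect):
np.exp(result.params)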
The z-score tells us how many standard errors an estimated coefficient lies from zero. Here the z-score (-1.311) means the ab_page coefficient is 1.311 standard errors below zero, well within the range we would expect under the null.
The p-value here (0.190) is still above an $\alpha$ level of 0.05, but it differs from the earlier one because the regression performs a two-tailed test. We still fail to reject the null in this situation.
We can verify this against the bootstrapped simulation:
# Calculate the area of the lower tail (below the observed difference)
p_lower = (p_diffs < obs_diff).mean()
# Reflect the observed difference across the center of the null distribution
upper = p_diffs.mean() - obs_diff
# Calculate the area of the upper tail (above the reflected value)
p_upper = (p_diffs > upper).mean()
# Total two-tailed area
p_lower + p_upper
We will now add an additional variable to the model to check whether it is statistically significant to our regression analysis. Adding more variables gives the model a new term to fit and a new coefficient it can vary to force a better fit.
We should keep in mind not to add too many additional variables, as doing so makes the model more and more likely to overfit, which is not desirable.
Additional information is available about each user's country. We will test the model to see whether there is a connection between the user's country and the conversion rates on the old and new pages.
countries_df = pd.read_csv('countries.csv')
df_new = countries_df.set_index('user_id').join(df2.set_index('user_id'), how='inner')
df_new.head()
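A quick shape check helps confirm that the inner join did not drop users (this assumes every user_id in df2 appears in countries.csv):
df_new.shape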
We will confirm the countries of the users.
df_new['country'].unique()
The next step would be to build the dummy variables.
df_new[['CA', 'UK', 'US']] = pd.get_dummies(df_new['country'])
df_new.head()
Instantiate and fit the model.
log_mod = sm.Logit(df_new['converted'], df_new[['intercept', 'CA', 'UK']])
result = log_mod.fit()
result.summary()
Results: Once again, the p-values for the countries are well above a 0.05 $\alpha$ level. So we fail to reject the null and conclude that, on its own, country makes no considerable contribution to the difference in conversion rates between the two pages.
Now we will check for an interaction between country and page, repeating the same steps as above.
df_new['CA_page'] = df_new['CA'] * df_new['ab_page']
df_new['UK_page'] = df_new['UK'] * df_new['ab_page']
df_new.head()
log_mod = sm.Logit(df_new['converted'], df_new[['intercept', 'ab_page', 'CA', 'UK', 'CA_page', 'UK_page']])
result = log_mod.fit()
result.summary()
Results: In this case we can see that none of the variables have significant p-values. Therefore, we fail to reject the null and conclude that there is not sufficient evidence of an interaction between country and page received that would predict whether a user converts.
In the bigger picture, based on this data, we do not have adequate evidence to state that the new page brings about a greater number of conversions than the old page.
from subprocess import call
call(['python', '-m', 'nbconvert', 'Analyze_ab_test_results_notebook.ipynb'])