02_Using Logistic Regression in Marketing to Find the Reasons Behind Customer Engagement
- Load packages
- Generate engage category
- Engagement Rate
- Engage By Renew Offer Type
- Engage By Sales Channel
- Total Claim Amount Distributions
- Income Distributions
- Regression using different features
- All together in logistic regression
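When running a marketing campaign, one of the key metrics to track and analyze is customer engagement with the campaign. In email marketing, for example, engagement can be measured by how many marketing emails a customer opens or ignores. It can also be measured by the number of website visits per customer. A successful campaign draws heavy engagement, whereas an ineffective one not only lowers engagement but can also hurt the business: customers may mark your company's emails as spam or unsubscribe from your mailing list. To understand what affects customer engagement, this chapter discusses how to use explanatory analysis, and more specifically regression analysis. We briefly cover what explanatory analysis is, what regression analysis is, and how to use a logistic regression model for explanatory analysis. We then show how to build a regression model and interpret its results in Python with the statsmodels package. In this post I again use a Kaggle dataset for the demonstration. The data comes from WA_Fn-UseC_-Marketing-Customer-Value-Analysis.csv.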
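Logistic regression is a type of regression analysis used when the output variable is binary (one for a positive outcome and zero for a negative outcome). Like any other linear regression model, a logistic regression model estimates the output from a linear combination of the feature variables. The only difference is what the model estimates. Unlike other linear regression models, a logistic regression model estimates the log odds of an event, in other words the log of the ratio between the probabilities of the positive and negative events:

$$\log\frac{P(y=1)}{1-P(y=1)} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$$

Here $x_1, \dots, x_k$ are the feature variables and $\beta_0, \dots, \beta_k$ are the coefficients. The ratio on the left-hand side is the odds of success: the probability of success divided by the probability of failure. The output of a logistic regression model is simply the inverse of the logit function (not its reciprocal), and it ranges from zero to one. In this chapter we use regression analysis to understand what drives customer engagement, and the output variable is whether a customer responded to a marketing call. Logistic regression suits this case well because the output is a binary variable that can take two values: responded or did not respond. Below we fit a logistic regression on the Kaggle data to see how the statistical analysis is done.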
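To make the relationship between probability, odds, and log odds concrete, here is a minimal sketch using only Python's standard library. The function names `logit` and `inverse_logit` are illustrative choices for this example and are not part of statsmodels.

```python
import math


def logit(p):
    """Return the log odds of a probability p, where 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inverse_logit(z):
    """Map a log-odds value z back to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


p = 0.2                 # e.g. a 20% chance that a customer responds
odds = p / (1.0 - p)    # odds of responding versus not responding
z = logit(p)            # log odds, the quantity the linear part models

print(f"odds = {odds:.2f}, log odds = {z:.4f}")
print(f"back to probability: {inverse_logit(z):.2f}")
```

A 20% response probability corresponds to odds of 0.25 and log odds of about -1.386. Applying the inverse logit to that value recovers the original probability.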
Load packages
```python
import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.api as sm  # provides sm.Logit, used below

%matplotlib inline

df = pd.read_csv('../input/ibm-watson-marketing-customer-value-data/WA_Fn-UseC_-Marketing-Customer-Value-Analysis.csv')
df.head(3)
```

| Customer | State | Customer Lifetime Value | Response | Coverage | Education | Effective To Date | EmploymentStatus | Gender | Income | ... | Months Since Policy Inception | Number of Open Complaints | Number of Policies | Policy Type | Policy | Renew Offer Type | Sales Channel | Total Claim Amount | Vehicle Class | Vehicle Size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BU79786 | Washington | 2763.519279 | No | Basic | Bachelor | 2/24/11 | Employed | F | 56274 | ... | 5 | 0 | 1 | Corporate Auto | Corporate L3 | Offer1 | Agent | 384.811147 | Two-Door Car | Medsize |
| QZ44356 | Arizona | 6979.535903 | No | Extended | Bachelor | 1/31/11 | Unemployed | F | 0 | ... | 42 | 0 | 8 | Personal Auto | Personal L3 | Offer3 | Agent | 1131.464935 | Four-Door Car | Medsize |
| AI49188 | Nevada | 12887.431650 | No | Premium | Bachelor | 2/19/11 | Employed | F | 48767 | ... | 38 | 0 | 2 | Personal Auto | Personal L3 | Offer1 | Agent | 566.472247 | Two-Door Car | Medsize |

3 rows × 24 columns
Generate engage category
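```python
df['Engaged'] = df['Response'].apply(lambda x: 0 if x == 'No' else 1)
df.head(3)
```

| Customer | State | Customer Lifetime Value | Response | Coverage | Education | Effective To Date | EmploymentStatus | Gender | Income | ... | Number of Open Complaints | Number of Policies | Policy Type | Policy | Renew Offer Type | Sales Channel | Total Claim Amount | Vehicle Class | Vehicle Size | Engaged |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BU79786 | Washington | 2763.519279 | No | Basic | Bachelor | 2/24/11 | Employed | F | 56274 | ... | 0 | 1 | Corporate Auto | Corporate L3 | Offer1 | Agent | 384.811147 | Two-Door Car | Medsize | 0 |
| QZ44356 | Arizona | 6979.535903 | No | Extended | Bachelor | 1/31/11 | Unemployed | F | 0 | ... | 0 | 8 | Personal Auto | Personal L3 | Offer3 | Agent | 1131.464935 | Four-Door Car | Medsize | 0 |
| AI49188 | Nevada | 12887.431650 | No | Premium | Bachelor | 2/19/11 | Employed | F | 48767 | ... | 0 | 2 | Personal Auto | Personal L3 | Offer1 | Agent | 566.472247 | Two-Door Car | Medsize | 0 |

3 rows × 25 columns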
Engagement Rate
```python
engagement_rate_df = pd.DataFrame(
    df.groupby('Engaged').count()['Response'] / df.shape[0] * 100.0
)
engagement_rate_df.T
```

| Engaged | 0 | 1 |
|---|---|---|
| Response | 85.679877 | 14.320123 |
Engage By Renew Offer Type
```python
engagement_by_offer_type_df = pd.pivot_table(
    df, values='Response', index='Renew Offer Type', columns='Engaged', aggfunc=len
).fillna(0.0)
engagement_by_offer_type_df.columns = ['Not Engaged', 'Engaged']
engagement_by_offer_type_df
```

| Renew Offer Type | Not Engaged | Engaged |
|---|---|---|
| Offer1 | 3158.0 | 594.0 |
| Offer2 | 2242.0 | 684.0 |
| Offer3 | 1402.0 | 30.0 |
| Offer4 | 1024.0 | 0.0 |
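Raw counts are hard to compare because the offer types were sent to different numbers of customers. As a rough sketch, the counts from the table above can be turned into engagement rates per offer type. The numbers are copied from that table, and the row labels assume the four rows correspond to Offer1 through Offer4 in sorted order.

```python
import pandas as pd

# Counts copied from the pivot table above; the row labels are an assumption
# (pivot_table sorts the index, so the rows should be Offer1..Offer4).
counts = pd.DataFrame(
    {
        "Not Engaged": [3158, 2242, 1402, 1024],
        "Engaged": [594, 684, 30, 0],
    },
    index=pd.Index(["Offer1", "Offer2", "Offer3", "Offer4"], name="Renew Offer Type"),
)

# Share of customers in each offer group who engaged, in percent.
rates = counts["Engaged"] / counts.sum(axis=1) * 100.0
print(rates.round(2))
```

Under that assumption, the second offer type has the highest engagement rate (about 23%), while the fourth has no engaged customers at all.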
Engage By Sales Channel
```python
engagement_by_sales_channel_df = pd.pivot_table(
    df, values='Response', index='Sales Channel', columns='Engaged', aggfunc=len
).fillna(0.0)
engagement_by_sales_channel_df.columns = ['Not Engaged', 'Engaged']
engagement_by_sales_channel_df
```

| Sales Channel | Not Engaged | Engaged |
|---|---|---|
| Agent | 2811 | 666 |
| Branch | 2273 | 294 |
| Call Center | 1573 | 192 |
| Web | 1169 | 156 |
Total Claim Amount Distributions
```python
ax = df[['Engaged', 'Total Claim Amount']].boxplot(
    by='Engaged',
    showfliers=False,  # hide outliers in the plot
    figsize=(7, 5)
)
ax.set_xlabel('Engaged')
ax.set_ylabel('Total Claim Amount')
ax.set_title('Total Claim Amount Distributions by Engagements')
plt.suptitle("")
plt.show()
```

If we don't want to remove outliers:

```python
ax = df[['Engaged', 'Total Claim Amount']].boxplot(
    by='Engaged',
    showfliers=True,  # keep outliers in the plot
    figsize=(7, 5)
)
ax.set_xlabel('Engaged')
ax.set_ylabel('Total Claim Amount')
ax.set_title('Total Claim Amount Distributions by Engagements')
plt.suptitle("")
plt.show()
```

Income Distributions
```python
ax = df[['Engaged', 'Income']].boxplot(
    by='Engaged',
    showfliers=True,
    figsize=(7, 5)
)
ax.set_xlabel('Engaged')
ax.set_ylabel('Income')  # the original set the x label twice
ax.set_title('Income Distributions by Engagements')
plt.suptitle("")
plt.show()

df.groupby('Engaged').describe()['Income'].T
```

| Engaged | 0 | 1 |
|---|---|---|
| count | 7826.000000 | 1308.000000 |
| mean | 37509.190008 | 38544.027523 |
| std | 30752.259099 | 28043.637944 |
| min | 0.000000 | 0.000000 |
| 25% | 0.000000 | 18495.000000 |
| 50% | 34091.000000 | 32234.000000 |
| 75% | 62454.250000 | 60880.000000 |
| max | 99981.000000 | 99845.000000 |
Regression using different features
```python
continuous_vars = [
    'Customer Lifetime Value', 'Income', 'Monthly Premium Auto',
    'Months Since Last Claim', 'Months Since Policy Inception',
    'Number of Open Complaints', 'Number of Policies', 'Total Claim Amount'
]
df['Engaged']
```

```
0       0
1       0
2       0
3       0
4       0
       ..
9129    0
9130    1
9131    0
9132    0
9133    0
Name: Engaged, Length: 9134, dtype: int64
```

```python
logit = sm.Logit(df['Engaged'], df[continuous_vars])
logit_fit = logit.fit()
```

```
Optimization terminated successfully.
         Current function value: 0.421189
         Iterations 6
```

```python
logit_fit.summary()
```

| Dep. Variable: | Engaged | No. Observations: | 9134 |
|---|---|---|---|
| Model: | Logit | Df Residuals: | 9126 |
| Method: | MLE | Df Model: | 7 |
| Date: | Sun, 10 May 2020 | Pseudo R-squ.: | -0.02546 |
| Time: | 16:48:28 | Log-Likelihood: | -3847.1 |
| converged: | True | LL-Null: | -3751.6 |
| Covariance Type: | nonrobust | LLR p-value: | 1.000 |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Customer Lifetime Value | -6.741e-06 | 5.04e-06 | -1.337 | 0.181 | -1.66e-05 | 3.14e-06 |
| Income | -2.857e-06 | 1.03e-06 | -2.766 | 0.006 | -4.88e-06 | -8.33e-07 |
| Monthly Premium Auto | -0.0084 | 0.001 | -6.889 | 0.000 | -0.011 | -0.006 |
| Months Since Last Claim | -0.0202 | 0.003 | -7.238 | 0.000 | -0.026 | -0.015 |
| Months Since Policy Inception | -0.0060 | 0.001 | -6.148 | 0.000 | -0.008 | -0.004 |
| Number of Open Complaints | -0.0829 | 0.034 | -2.424 | 0.015 | -0.150 | -0.016 |
| Number of Policies | -0.0810 | 0.013 | -6.356 | 0.000 | -0.106 | -0.056 |
| Total Claim Amount | 0.0001 | 0.000 | 0.711 | 0.477 | -0.000 | 0.000 |
Looking at this model output, we can see that the Income, Monthly Premium Auto, Months Since Last Claim, Months Since Policy Inception, and Number of Policies variables have significant relationships with the output variable, Engaged, all with p-values below 0.01. Number of Open Complaints (p = 0.015) is also significant at the 0.05 level. For example, the Number of Policies coefficient is significant and negative, which suggests that the more policies customers have, the less likely they are to respond to marketing calls. As another example, the Months Since Last Claim coefficient is significant and negative, which means that the longer it has been since the last claim, the less likely the customer is to respond to marketing calls.
Next we add categorical variables. There are several ways to deal with categorical variables:

factorize

```python
labels, levels = df['Education'].factorize()
labels
```

```
array([0, 0, 0, ..., 0, 1, 1])
```

```python
levels
```

```
Index(['Bachelor', 'College', 'Master', 'High School or Below', 'Doctor'], dtype='object')
```

pandas' Categorical variable series

```python
categories = pd.Categorical(
    df['Education'],
    categories=['High School or Below', 'Bachelor', 'College', 'Master', 'Doctor']
)
categories.categories
```

```
Index(['High School or Below', 'Bachelor', 'College', 'Master', 'Doctor'], dtype='object')
```

```python
categories.codes
```

```
array([1, 1, 1, ..., 1, 2, 2], dtype=int8)
```

Dummy variables
```python
pd.get_dummies(df['Education']).head(10)
```

| Bachelor | College | Doctor | High School or Below | Master |
|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 0 | 0 |
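The combined regression later in this post uses two columns, `GenderFactorized` and `EducationFactorized`, whose construction is not shown. A plausible sketch, assuming they were built with the `factorize` and `pd.Categorical` approaches demonstrated above, is given here on a small made-up DataFrame. The `toy` frame and its values are hypothetical and only illustrate the encoding.

```python
import pandas as pd

# Hypothetical miniature of the real data, for illustration only.
toy = pd.DataFrame({
    "Gender": ["F", "F", "M", "F", "M"],
    "Education": ["Bachelor", "College", "Master", "Doctor", "High School or Below"],
})

# factorize() numbers categories in order of first appearance,
# so here F becomes 0 and M becomes 1.
gender_codes, gender_levels = toy["Gender"].factorize()
toy["GenderFactorized"] = gender_codes

# An explicit category order gives ordinal codes for education level.
education_order = ["High School or Below", "Bachelor", "College", "Master", "Doctor"]
toy["EducationFactorized"] = pd.Categorical(
    toy["Education"], categories=education_order
).codes

print(toy)
```

Because `factorize` assigns codes by first appearance, the meaning of 0 and 1 depends on row order. It is worth checking the returned levels before interpreting the sign of a coefficient on such a column.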
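| Dep. Variable: | Engaged | No. Observations: | 9134 |
|---|---|---|---|
| Model: | Logit | Df Residuals: | 9132 |
| Method: | MLE | Df Model: | 1 |
| Date: | Sun, 10 May 2020 | Pseudo R-squ.: | -0.2005 |
| Time: | 16:54:00 | Log-Likelihood: | -4503.7 |
| converged: | True | LL-Null: | -3751.6 |
| Covariance Type: | nonrobust | LLR p-value: | 1.000 |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| GenderFactorized | -1.1266 | 0.047 | -24.116 | 0.000 | -1.218 | -1.035 |
| EducationFactorized | -0.6256 | 0.021 | -29.900 | 0.000 | -0.667 | -0.585 |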
All together in logistic regression
```python
logit = sm.Logit(
    df['Engaged'],
    df[[
        'Customer Lifetime Value',
        'Income',
        'Monthly Premium Auto',
        'Months Since Last Claim',
        'Months Since Policy Inception',
        'Number of Open Complaints',
        'Number of Policies',
        'Total Claim Amount',
        'GenderFactorized',
        'EducationFactorized'
    ]]
)
logit_fit = logit.fit()
logit_fit.summary()
```

```
Optimization terminated successfully.
         Current function value: 0.420810
         Iterations 6
```

| Dep. Variable: | Engaged | No. Observations: | 9134 |
|---|---|---|---|
| Model: | Logit | Df Residuals: | 9124 |
| Method: | MLE | Df Model: | 9 |
| Date: | Sun, 10 May 2020 | Pseudo R-squ.: | -0.02454 |
| Time: | 16:54:33 | Log-Likelihood: | -3843.7 |
| converged: | True | LL-Null: | -3751.6 |
| Covariance Type: | nonrobust | LLR p-value: | 1.000 |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Customer Lifetime Value | -6.909e-06 | 5.03e-06 | -1.373 | 0.170 | -1.68e-05 | 2.96e-06 |
| Income | -2.59e-06 | 1.04e-06 | -2.494 | 0.013 | -4.63e-06 | -5.55e-07 |
| Monthly Premium Auto | -0.0081 | 0.001 | -6.526 | 0.000 | -0.011 | -0.006 |
| Months Since Last Claim | -0.0194 | 0.003 | -6.858 | 0.000 | -0.025 | -0.014 |
| Months Since Policy Inception | -0.0057 | 0.001 | -5.827 | 0.000 | -0.008 | -0.004 |
| Number of Open Complaints | -0.0813 | 0.034 | -2.376 | 0.017 | -0.148 | -0.014 |
| Number of Policies | -0.0781 | 0.013 | -6.114 | 0.000 | -0.103 | -0.053 |
| Total Claim Amount | 0.0001 | 0.000 | 0.943 | 0.346 | -0.000 | 0.000 |
| GenderFactorized | -0.1500 | 0.058 | -2.592 | 0.010 | -0.263 | -0.037 |
| EducationFactorized | -0.0070 | 0.027 | -0.264 | 0.792 | -0.059 | 0.045 |
Let's take a closer look at this output. The Income, Monthly Premium Auto, Months Since Last Claim, Months Since Policy Inception, Number of Open Complaints, Number of Policies, and GenderFactorized variables are significant at the 0.05 significance level, and all of them have negative coefficients for the output variable, Engaged. Hence, the higher the income, the less likely the customer is to engage with marketing calls. Similarly, the more policies a customer has, the less likely he or she is to engage with marketing calls.

Lastly, the negative coefficient of GenderFactorized indicates that male customers are less likely to engage with marketing calls than female customers. From this regression output we can read off the relationships between the input and output variables and see which customer attributes are positively or negatively related to engagement with marketing calls.
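Logit coefficients are on the log-odds scale, so a common way to read their size is to exponentiate them into odds ratios. The sketch below does this for three coefficients copied from the combined-model summary above. The mapping of coefficients to variable names assumes the rows of that table follow the column order passed to `sm.Logit`.

```python
import math

# Coefficients copied from the combined-model summary above; the mapping of
# rows to variable names is an assumption based on the column order.
coefs = {
    "Months Since Last Claim": -0.0194,
    "Number of Policies": -0.0781,
    "GenderFactorized": -0.1500,
}

# exp(coef) is the multiplicative change in the odds of engaging
# for a one-unit increase in the variable, other variables held fixed.
odds_ratios = {name: math.exp(b) for name, b in coefs.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio = {ratio:.3f}")
```

For example, exp(-0.15) is about 0.86, so under this model the odds of engaging for customers coded 1 on GenderFactorized are roughly 14% lower than for customers coded 0, with the other variables held fixed.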