02_Using Logistic Regression in Marketing to Find the Reasons Behind Customer Engagement

Published: 2023/12/20


- Load packages
- Generate engage category
- Engagement Rate
- Engage By Renew Offer Type
- Engage By Sales Channel
- Total Claim Amount Distributions
- Income Distributions
- Regression using different features
- All together in logistic regression

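When running a marketing campaign, one of the key metrics to track and analyze is customer engagement with the campaign. In email marketing, for example, engagement can be measured by how many marketing emails a customer opens or ignores. It can also be measured by the number of website visits per customer. A successful campaign draws a lot of engagement, while an ineffective one not only lowers engagement but can also hurt the business: customers may mark emails from your company as spam or unsubscribe from your mailing list.

To understand what affects customer engagement, this chapter discusses how to use explanatory analysis, and more specifically regression analysis. We briefly cover what explanatory analysis is, what regression analysis is, and how to use a logistic regression model for explanatory analysis. We then show how to build a regression model in Python with the statsmodels package and how to interpret its results. As before, this post uses a Kaggle dataset for the demonstration: WA_Fn-UseC_-Marketing-Customer-Value-Analysis.csv.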

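Logistic regression is a type of regression analysis used when the output variable is binary (one for a positive outcome and zero for a negative outcome). Like any other linear regression model, a logistic regression model estimates its output from a linear combination of the feature variables. The only difference is what the model estimates. Unlike other linear regression models, a logistic regression model estimates the log odds of an event, in other words the logarithm of the ratio between the probabilities of the positive and negative events.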
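In standard textbook notation, this relationship is usually written as below. The symbols p (probability of the positive event), x_1 ... x_k (feature variables) and beta_0 ... beta_k (coefficients) are assumed here for illustration and are not taken from the original post.

```latex
\log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
```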

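The ratio on the left-hand side is the odds of success: the probability of success divided by the probability of failure. The output of a logistic regression model is simply the inverse of the logit, which ranges from zero to one. In this chapter we use regression analysis to understand what drives customer engagement, and the output variable is whether a customer responded to a marketing call. Logistic regression fits this case well because the output is a binary variable that takes one of two values: responded or did not respond. Below we run a logistic regression on the Kaggle data to see how the statistical analysis is done.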
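To make the link between probability, odds, and log odds concrete, here is a minimal sketch using only the standard library. The function names logit and inverse_logit and the example probability 0.8 are illustrative choices, not part of the original notebook.

```python
import math


def logit(p: float) -> float:
    """Log odds of an event with probability p, for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inverse_logit(z: float) -> float:
    """Sigmoid function: maps any log-odds value back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


p = 0.8                      # illustrative probability of success
odds = p / (1.0 - p)         # success is about four times as likely as failure
z = logit(p)                 # log odds, the quantity the model estimates

print(f"odds={odds:.3f}, log odds={z:.3f}, back to p={inverse_logit(z):.3f}")
```

A log-odds value of zero corresponds to a probability of 0.5, positive values to probabilities above 0.5, and negative values to probabilities below 0.5.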

    # This Python 3 environment comes with many helpful analytics libraries installed
    # It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
    # For example, here's several helpful packages to load

    import numpy as np # linear algebra
    import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

    # Input data files are available in the read-only "../input/" directory
    # For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

    import os
    for dirname, _, filenames in os.walk('/kaggle/input'):
        for filename in filenames:
            print(os.path.join(dirname, filename))

    # You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All"
    # You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

    /kaggle/input/ibm-watson-marketing-customer-value-data/WA_Fn-UseC_-Marketing-Customer-Value-Analysis.csv

Load packages

    import matplotlib.pyplot as plt
    import pandas as pd
    import statsmodels.api as sm  # provides sm.Logit, used below

    %matplotlib inline

    df = pd.read_csv('../input/ibm-watson-marketing-customer-value-data/WA_Fn-UseC_-Marketing-Customer-Value-Analysis.csv')
    df.head(3)

The first three rows (3 rows × 24 columns), shown here with one column per record. Only the fields whose values appear in the output are listed.

    Field                     0                1                2
    Customer                  BU79786          QZ44356          AI49188
    State                     Washington       Arizona          Nevada
    Customer Lifetime Value   2763.519279      6979.535903      12887.431650
    Coverage                  Basic            Extended         Premium
    Education                 Bachelor         Bachelor         Bachelor
    Effective To Date         2/24/11          1/31/11          2/19/11
    EmploymentStatus          Employed         Unemployed       Employed
    Income                    56274                             48767
    ...
    Policy Type               Corporate Auto   Personal Auto    Personal Auto
    Policy                    Corporate L3     Personal L3      Personal L3
    Renew Offer Type          Offer1           Offer3           Offer1
    Sales Channel             Agent            Agent            Agent
    Total Claim Amount        384.811147       1131.464935      566.472247
    Vehicle Class             Two-Door Car     Four-Door Car    Two-Door Car
    Vehicle Size              Medsize          Medsize          Medsize

Generate engage category

    df['Engaged'] = df['Response'].apply(lambda x: 0 if x == 'No' else 1)
    df.head(3)

The output shows the same three rows as before, now with the new Engaged column appended (3 rows × 25 columns).
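The same 0/1 encoding can also be written without a lambda. The sketch below uses a small hypothetical stand-in for the Response column (the frame named toy is not part of the dataset) and shows a vectorised comparison next to an explicit mapping.

```python
import pandas as pd

# Hypothetical stand-in for the Response column of the real dataset.
toy = pd.DataFrame({'Response': ['No', 'Yes', 'No', 'No', 'Yes']})

# Vectorised equivalent of the lambda: anything other than 'No' becomes 1.
toy['Engaged'] = (toy['Response'] != 'No').astype(int)

# Explicit mapping: labels missing from the dict would become NaN,
# which makes unexpected values easier to spot.
toy['EngagedStrict'] = toy['Response'].map({'No': 0, 'Yes': 1})

print(toy)
```

The explicit mapping is the stricter of the two, because a misspelled or missing label does not silently count as engaged.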

Engagement Rate

    engagement_rate_df = pd.DataFrame(
        df.groupby('Engaged').count()['Response'] / df.shape[0] * 100.0
    )
    engagement_rate_df.T

    Engaged           0          1
    Response  85.679877  14.320123
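As a quick sanity check, the engagement rate can be recomputed by hand from the group sizes. The counts 7826 and 1308 are the per-group row counts reported in the Income Distributions section of this post; the variable names are illustrative.

```python
# Row counts per group, as reported by describe() in the Income section.
not_engaged, engaged = 7826, 1308

total = not_engaged + engaged
engaged_rate = engaged / total * 100
not_engaged_rate = not_engaged / total * 100

print(f"total customers: {total}")
print(f"engaged: {engaged_rate:.6f}%  not engaged: {not_engaged_rate:.6f}%")
```

Because Engaged is coded as 0/1, df['Engaged'].mean() * 100 on the DataFrame gives the same engaged percentage.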

Engage By Renew Offer Type

    engagement_by_offer_type_df = pd.pivot_table(
        df, values='Response', index='Renew Offer Type', columns='Engaged', aggfunc=len
    ).fillna(0.0)

    engagement_by_offer_type_df.columns = ['Not Engaged', 'Engaged']
    engagement_by_offer_type_df

                      Not Engaged  Engaged
    Renew Offer Type
    Offer1                 3158.0    594.0
    Offer2                 2242.0    684.0
    Offer3                 1402.0     30.0
    Offer4                 1024.0      0.0
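Raw counts are hard to compare across offers of different sizes. A minimal sketch that turns the counts from the table above into an engagement rate per offer type (the dictionary layout and variable names are illustrative):

```python
# (not engaged, engaged) counts per offer, copied from the pivot table above.
counts = {
    'Offer1': (3158, 594),
    'Offer2': (2242, 684),
    'Offer3': (1402, 30),
    'Offer4': (1024, 0),
}

# Share of customers within each offer type who engaged, in percent.
rates = {
    offer: engaged / (not_engaged + engaged) * 100
    for offer, (not_engaged, engaged) in counts.items()
}

for offer, rate in rates.items():
    print(f"{offer}: {rate:.1f}% engaged")
```

On these counts Offer2 has the highest engagement rate and Offer4 has none at all.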

    engagement_by_offer_type_df.plot(
        kind='pie',
        figsize=(15, 7),
        startangle=90,
        subplots=True,
        autopct=lambda x: '%0.1f%%' % x
    )

    plt.show()

Engage By Sales Channel

    engagement_by_sales_channel_df = pd.pivot_table(
        df, values='Response', index='Sales Channel', columns='Engaged', aggfunc=len
    ).fillna(0.0)

    engagement_by_sales_channel_df.columns = ['Not Engaged', 'Engaged']
    engagement_by_sales_channel_df

                   Not Engaged  Engaged
    Sales Channel
    Agent                 2811      666
    Branch                2273      294
    Call Center           1573      192
    Web                   1169      156
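With subplots=True, the pie charts in this post draw one pie per column, so they show how the engaged (or not engaged) customers are split across channels. To see the engagement rate within each channel instead, the table can be normalised by row. This is a sketch that rebuilds the table above by hand; the name channel_counts is illustrative.

```python
import pandas as pd

# Counts copied from the sales channel pivot table above.
channel_counts = pd.DataFrame(
    {'Not Engaged': [2811, 2273, 1573, 1169], 'Engaged': [666, 294, 192, 156]},
    index=['Agent', 'Branch', 'Call Center', 'Web'],
)

# Divide every row by its own total to get within-channel shares in percent.
row_shares = channel_counts.div(channel_counts.sum(axis=1), axis=0) * 100

print(row_shares.round(1))
```

On these counts the Agent channel has the highest within-channel engagement rate, while the other three channels are close to each other.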

    engagement_by_sales_channel_df.plot(
        kind='pie',
        figsize=(15, 7),
        startangle=90,
        subplots=True,
        autopct=lambda x: '%0.1f%%' % x
    )

    plt.show()

Total Claim Amount Distributions

    ax = df[['Engaged', 'Total Claim Amount']].boxplot(
        by='Engaged',
        showfliers=False,  # hide the outlier points
        figsize=(7, 5)
    )

    ax.set_xlabel('Engaged')
    ax.set_ylabel('Total Claim Amount')
    ax.set_title('Total Claim Amount Distributions by Engagements')

    plt.suptitle("")
    plt.show()

If we don’t want to remove outliers

    ax = df[['Engaged', 'Total Claim Amount']].boxplot(
        by='Engaged',
        showfliers=True,  # draw the outlier points
        figsize=(7, 5)
    )

    ax.set_xlabel('Engaged')
    ax.set_ylabel('Total Claim Amount')
    ax.set_title('Total Claim Amount Distributions by Engagements')

    plt.suptitle("")
    plt.show()

Income Distributions

    ax = df[['Engaged', 'Income']].boxplot(
        by='Engaged',
        showfliers=True,
        figsize=(7, 5)
    )

    ax.set_xlabel('Engaged')
    ax.set_ylabel('Income')  # the y-axis carries Income; the original set the x label twice
    ax.set_title('Income Distributions by Engagements')

    plt.suptitle("")
    plt.show()

    df.groupby('Engaged').describe()['Income'].T

    Engaged             0             1
    count     7826.000000   1308.000000
    mean     37509.190008  38544.027523
    std      30752.259099  28043.637944
    min          0.000000      0.000000
    25%          0.000000  18495.000000
    50%      34091.000000  32234.000000
    75%      62454.250000  60880.000000
    max      99981.000000  99845.000000

Regression using different features

    continuous_vars = [
        'Customer Lifetime Value', 'Income', 'Monthly Premium Auto',
        'Months Since Last Claim', 'Months Since Policy Inception',
        'Number of Open Complaints', 'Number of Policies', 'Total Claim Amount'
    ]

    df['Engaged']

    0       0
    1       0
    2       0
    3       0
    4       0
           ..
    9129    0
    9130    1
    9131    0
    9132    0
    9133    0
    Name: Engaged, Length: 9134, dtype: int64

    logit = sm.Logit(
        df['Engaged'],
        df[continuous_vars]
    )

    logit_fit = logit.fit()

    Optimization terminated successfully.
             Current function value: 0.421189
             Iterations 6

    logit_fit.summary()

    Logit Regression Results
    Dep. Variable:      Engaged            No. Observations:   9134
    Model:              Logit              Df Residuals:       9126
    Method:             MLE                Df Model:           7
    Date:               Sun, 10 May 2020   Pseudo R-squ.:      -0.02546
    Time:               16:48:28           Log-Likelihood:     -3847.1
    converged:          True               LL-Null:            -3751.6
    Covariance Type:    nonrobust          LLR p-value:        1.000

                                         coef    std err        z    P>|z|     [0.025     0.975]
    Customer Lifetime Value        -6.741e-06   5.04e-06   -1.337    0.181  -1.66e-05   3.14e-06
    Income                         -2.857e-06   1.03e-06   -2.766    0.006  -4.88e-06  -8.33e-07
    Monthly Premium Auto              -0.0084      0.001   -6.889    0.000     -0.011     -0.006
    Months Since Last Claim           -0.0202      0.003   -7.238    0.000     -0.026     -0.015
    Months Since Policy Inception     -0.0060      0.001   -6.148    0.000     -0.008     -0.004
    Number of Open Complaints         -0.0829      0.034   -2.424    0.015     -0.150     -0.016
    Number of Policies                -0.0810      0.013   -6.356    0.000     -0.106     -0.056
    Total Claim Amount                 0.0001      0.000    0.711    0.477     -0.000      0.000
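The header values in the summary are related to each other, and the relationship can be checked by hand. The "Pseudo R-squ." that statsmodels reports is McFadden's pseudo R-squared, one minus the ratio of the model log-likelihood to the null (intercept-only) log-likelihood. The sketch below plugs in the rounded numbers printed above; the variable names are illustrative.

```python
# Values copied from the regression output above (rounded as printed).
n_obs = 9134
mean_neg_loglik = 0.421189   # "Current function value" from the optimizer
ll_model = -3847.1           # Log-Likelihood
ll_null = -3751.6            # LL-Null (intercept-only model)

# The optimizer reports the average negative log-likelihood per observation.
ll_from_optimizer = -n_obs * mean_neg_loglik

# McFadden's pseudo R-squared.
pseudo_r2 = 1.0 - ll_model / ll_null

print(f"log-likelihood from optimizer value: {ll_from_optimizer:.1f}")
print(f"pseudo R-squared: {pseudo_r2:.5f}")
```

A negative value means the fitted model has a lower likelihood than the intercept-only null model. That is what one would expect when the design matrix contains no constant column: sm.Logit does not add an intercept automatically, and sm.add_constant is the usual way to include one.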

Looking at this model output, we can see that the Income, Monthly Premium Auto, Months Since Last Claim, Months Since Policy Inception, and Number of Policies variables have significant relationships with the output variable, Engaged (p < 0.01 for each). Number of Open Complaints is also significant at the 0.05 level (p = 0.015). For example, the Number of Policies variable is significant and has a negative coefficient. This suggests that the more policies a customer has, the less likely they are to respond to marketing calls. As another example, the Months Since Last Claim variable is significant and also has a negative coefficient. This means that the longer it has been since the last claim, the less likely the customer is to respond to marketing calls.
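Logistic regression coefficients are on the log-odds scale, so exponentiating a coefficient gives the multiplicative change in the odds for a one-unit increase in that variable, holding the other variables fixed. A small sketch using two coefficients from the table above (the dictionary names are illustrative):

```python
import math

# Coefficients copied from the regression output above.
coefs = {
    'Months Since Last Claim': -0.0202,
    'Number of Policies': -0.0810,
}

# exp(coef) is the odds ratio for a one-unit increase in the variable.
odds_ratios = {name: math.exp(coef) for name, coef in coefs.items()}

for name, ratio in odds_ratios.items():
    pct_change = (ratio - 1.0) * 100
    print(f"{name}: odds ratio {ratio:.4f} ({pct_change:+.1f}% odds per unit)")
```

Read this way, each additional policy is associated with roughly 8% lower odds of responding, and each additional month since the last claim with roughly 2% lower odds, under this model.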

Next we add categorical variables. There are several ways to deal with categorical variables:

factorize

    labels, levels = df['Education'].factorize()

    labels

    array([0, 0, 0, ..., 0, 1, 1])

    levels

    Index(['Bachelor', 'College', 'Master', 'High School or Below', 'Doctor'], dtype='object')

pandas’ Categorical variable series

    categories = pd.Categorical(
        df['Education'],
        categories=['High School or Below', 'Bachelor', 'College', 'Master', 'Doctor']
    )

    categories.categories

    Index(['High School or Below', 'Bachelor', 'College', 'Master', 'Doctor'], dtype='object')

    categories.codes

    array([1, 1, 1, ..., 1, 2, 2], dtype=int8)

Dummy variables

    pd.get_dummies(df['Education']).head(10)

       Bachelor  College  Doctor  High School or Below  Master
    0         1        0       0                     0       0
    1         1        0       0                     0       0
    2         1        0       0                     0       0
    3         1        0       0                     0       0
    4         1        0       0                     0       0
    5         1        0       0                     0       0
    6         0        1       0                     0       0
    7         0        0       0                     0       1
    8         1        0       0                     0       0
    9         0        1       0                     0       0
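Integer codes treat the education levels as equally spaced steps on one scale, while dummy variables give each level its own coefficient. When a model includes an intercept, one level is normally dropped as the reference to avoid perfect collinearity. The sketch below shows this on a hypothetical four-row frame; toy, the Edu prefix, and the choice of reference level are assumptions for illustration.

```python
import pandas as pd

# Hypothetical mini sample of the Education column.
toy = pd.DataFrame({'Education': ['Bachelor', 'College', 'Master', 'Bachelor']})

# Fix the category order so the first level becomes the reference level.
toy['Education'] = pd.Categorical(
    toy['Education'],
    categories=['High School or Below', 'Bachelor', 'College', 'Master', 'Doctor'],
)

# One 0/1 column per level, with the first level dropped as the reference.
dummies = pd.get_dummies(toy['Education'], prefix='Edu', drop_first=True, dtype=int)

print(dummies)
```

Each remaining coefficient would then be read relative to the dropped level, here "High School or Below".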

    gender_values, gender_labels = df['Gender'].factorize()
    df['GenderFactorized'] = gender_values

    categories = pd.Categorical(
        df['Education'],
        categories=['High School or Below', 'Bachelor', 'College', 'Master', 'Doctor']
    )
    df['EducationFactorized'] = categories.codes

    logit = sm.Logit(
        df['Engaged'],
        df[['GenderFactorized', 'EducationFactorized']]
    )

    logit_fit = logit.fit()

    Optimization terminated successfully.
             Current function value: 0.493068
             Iterations 6

    logit_fit.summary()

    Logit Regression Results
    Dep. Variable:      Engaged            No. Observations:   9134
    Model:              Logit              Df Residuals:       9132
    Method:             MLE                Df Model:           1
    Date:               Sun, 10 May 2020   Pseudo R-squ.:      -0.2005
    Time:               16:54:00           Log-Likelihood:     -4503.7
    converged:          True               LL-Null:            -3751.6
    Covariance Type:    nonrobust          LLR p-value:        1.000

                            coef    std err         z    P>|z|    [0.025    0.975]
    GenderFactorized     -1.1266      0.047   -24.116    0.000    -1.218    -1.035
    EducationFactorized  -0.6256      0.021   -29.900    0.000    -0.667    -0.585

All together in logistic regression

    logit = sm.Logit(
        df['Engaged'],
        df[['Customer Lifetime Value',
            'Income',
            'Monthly Premium Auto',
            'Months Since Last Claim',
            'Months Since Policy Inception',
            'Number of Open Complaints',
            'Number of Policies',
            'Total Claim Amount',
            'GenderFactorized',
            'EducationFactorized']]
    )

    logit_fit = logit.fit()
    logit_fit.summary()

    Optimization terminated successfully.
             Current function value: 0.420810
             Iterations 6

    Logit Regression Results
    Dep. Variable:      Engaged            No. Observations:   9134
    Model:              Logit              Df Residuals:       9124
    Method:             MLE                Df Model:           9
    Date:               Sun, 10 May 2020   Pseudo R-squ.:      -0.02454
    Time:               16:54:33           Log-Likelihood:     -3843.7
    converged:          True               LL-Null:            -3751.6
    Covariance Type:    nonrobust          LLR p-value:        1.000

                                         coef    std err        z    P>|z|     [0.025     0.975]
    Customer Lifetime Value        -6.909e-06   5.03e-06   -1.373    0.170  -1.68e-05   2.96e-06
    Income                          -2.59e-06   1.04e-06   -2.494    0.013  -4.63e-06  -5.55e-07
    Monthly Premium Auto              -0.0081      0.001   -6.526    0.000     -0.011     -0.006
    Months Since Last Claim           -0.0194      0.003   -6.858    0.000     -0.025     -0.014
    Months Since Policy Inception     -0.0057      0.001   -5.827    0.000     -0.008     -0.004
    Number of Open Complaints         -0.0813      0.034   -2.376    0.017     -0.148     -0.014
    Number of Policies                -0.0781      0.013   -6.114    0.000     -0.103     -0.053
    Total Claim Amount                 0.0001      0.000    0.943    0.346     -0.000      0.000
    GenderFactorized                  -0.1500      0.058   -2.592    0.010     -0.263     -0.037
    EducationFactorized               -0.0070      0.027   -0.264    0.792     -0.059      0.045

Let's take a closer look at this output. The Income, Monthly Premium Auto, Months Since Last Claim, Months Since Policy Inception, Number of Open Complaints, Number of Policies, and GenderFactorized variables are significant at the 0.05 significance level, and all of them have negative relationships with the output variable, Engaged. Hence, the higher the income, the less likely the customer is to engage with marketing calls. Similarly, the more policies the customer has, the less likely he or she is to engage with marketing calls.
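The list of significant variables can be reproduced mechanically by filtering the p-values at the chosen significance level. The sketch below copies the coefficient and p-value columns from the combined model's summary table; the names results, alpha, and significant are illustrative.

```python
# (coefficient, p-value) pairs copied from the combined model's summary table.
results = {
    'Customer Lifetime Value': (-6.909e-06, 0.170),
    'Income': (-2.59e-06, 0.013),
    'Monthly Premium Auto': (-0.0081, 0.000),
    'Months Since Last Claim': (-0.0194, 0.000),
    'Months Since Policy Inception': (-0.0057, 0.000),
    'Number of Open Complaints': (-0.0813, 0.017),
    'Number of Policies': (-0.0781, 0.000),
    'Total Claim Amount': (0.0001, 0.346),
    'GenderFactorized': (-0.1500, 0.010),
    'EducationFactorized': (-0.0070, 0.792),
}

alpha = 0.05  # significance level used in the text

# Keep only the variables whose p-value is below the threshold.
significant = {name: coef for name, (coef, p) in results.items() if p < alpha}

for name, coef in significant.items():
    direction = 'negative' if coef < 0 else 'positive'
    print(f"{name}: {direction} (coef={coef})")
```

On a fitted statsmodels result, the same filter can be applied directly to logit_fit.pvalues and logit_fit.params instead of hand-copied numbers.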

Lastly, male customers are less likely to engage with marketing calls than female customers, which we can see from the coefficient of GenderFactorized. From this regression analysis output, we can see the relationships between the input and output variables and understand which customer attributes are positively or negatively related to engagement with marketing calls.
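Reading the sign of GenderFactorized requires knowing which gender was coded as 1. factorize assigns codes in order of first appearance, so the mapping depends on the row order of the data. The sketch below uses a hypothetical five-value series, not the real Gender column, to show how the mapping flips when the order changes.

```python
import pandas as pd

# Hypothetical values; the real Gender column may start with either label.
genders = pd.Series(['F', 'F', 'M', 'F', 'M'])

# The first label seen gets code 0, the next new label gets code 1.
codes, uniques = genders.factorize()
print(codes.tolist(), uniques.tolist())

# Reversing the row order changes which label is seen first.
flipped_codes, flipped_uniques = genders.iloc[::-1].factorize()
print(flipped_codes.tolist(), flipped_uniques.tolist())
```

In the notebook, printing gender_labels after the factorize call shows the actual mapping, which confirms whether a negative coefficient refers to male or female customers.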
