RFM Clustering on Credit Card Customers
Recency, Frequency, and Monetary (RFM) analysis is one of the conventional techniques for customer segmentation and has been in use for a long time.
Recency refers to when the customer made their most recent transaction using our product.
Frequency refers to how often the customer makes transactions using our product.
Monetary Value refers to how much the customer spends on our product.
The RFM method is straightforward: we only have to transform our data (usually transactional data) into a data frame with three variables: Recent Transaction, Transaction Frequency, and Transaction Amount (Monetary Value).
Transactional data records every transaction made by customers. Typically, it includes the transaction time, the transaction location, the amount spent, the merchant where the transaction took place, and any other detail that can be recorded at the moment the transaction is made.
Let's look at the transactional dataset that we will use as our case study. It contains credit card transactions made by customers in 2016, with 24 features recorded for every transaction. Although the dataset has many features, we will use only the few that can be transformed into Recency, Frequency, and Monetary Value.
Link to dataset: https://www.kaggle.com/derykurniawan/credit-card-transaction
Fig-1. Transactional Data Features
If we return to our description of the RFM features, we only need to keep customerId, transactionDate, and transactionAmount to create the Recency, Frequency, and Transaction Amount features in a new data frame grouped by customerId.
For the Recency feature, we subtract the maximum value of transactionDate (the latest transaction) from the current date. Since our dataset only contains 2016 transactions, we set 1st January 2017 as our current date.
For the Frequency feature, we count how many transactions every customer made using the n() function in R.
For the Transaction Amount feature, we calculate the sum of transactionAmount for every customer.
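The three steps above can be sketched as follows. This is a minimal sketch, not the original listing: it assumes the dplyr package is installed, uses a small made-up data frame in place of the Kaggle file, and assumes transactionDate has already been parsed as a Date. The names rfm, recency, frequency, monetary, and current_date are illustrative choices.

```r
library(dplyr)

# Toy transactions standing in for the real dataset (values are made up)
transactions <- data.frame(
  customerId        = c(101, 101, 102, 102, 102, 103),
  transactionDate   = as.Date(c("2016-03-01", "2016-12-30", "2016-01-15",
                                "2016-06-20", "2016-11-02", "2016-08-09")),
  transactionAmount = c(50, 25.5, 10, 0, 40, 300)
)

# Reference "current date": the day after the last day covered by the data
current_date <- as.Date("2017-01-01")

rfm <- transactions %>%
  group_by(customerId) %>%
  summarise(
    recency   = as.numeric(current_date - max(transactionDate)),  # days since latest transaction
    frequency = n(),                                              # number of transactions
    monetary  = sum(transactionAmount)                            # total amount spent
  )

print(rfm)
```

If the real transactionDate column is stored as text or as a timestamp, it would need to be converted with as.Date() before the subtraction.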
Import and Transform Transactional Data to RFM Data
Fig-2. First six rows of RFM Dataset
Now we have the three main features for RFM segmentation. As in any other data analysis, the first step is to explore the dataset. Here, we check the distribution of every feature with histograms, using the hist() function in R.
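As a rough illustration of this exploration step, the sketch below draws one histogram per feature with base R. The rfm data frame here is simulated to be right-skewed and only stands in for the real RFM table; its column names are assumptions.

```r
# Hypothetical RFM table; the values are simulated to be right-skewed
set.seed(42)
rfm <- data.frame(
  recency   = rexp(200, rate = 1 / 20),
  frequency = rpois(200, lambda = 3) + 1,
  monetary  = rlnorm(200, meanlog = 5, sdlog = 1)
)

# Draw the three histograms side by side
old_par <- par(mfrow = c(1, 3))
for (feature in c("recency", "frequency", "monetary")) {
  hist(rfm[[feature]], main = feature, xlab = feature)
}
par(old_par)  # restore the previous plot layout
```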
Fig-3. Histogram of RFM Features Data
Our RFM dataset is heavily right-skewed, which is a serious problem for K-Means clustering, because the method uses the distance between points to decide which cluster each point fits best. A log transformation can handle this kind of skewed data, and since the data contain zeros, we use log(n + 1) instead of the ordinary log transformation.
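A minimal sketch of the log(n + 1) transformation, using base R's log1p() function, which computes log(1 + x). The small rfm table is made up for illustration, and rfm_log is a hypothetical name.

```r
# Hypothetical RFM values, including zeros
rfm <- data.frame(
  recency   = c(0, 3, 45, 200),
  frequency = c(1, 2, 15, 120),
  monetary  = c(0, 20, 350, 9000)
)

# log1p(x) computes log(1 + x), so zeros map to 0 instead of -Inf
rfm_log <- as.data.frame(lapply(rfm, log1p))

print(log(0))    # -Inf: the plain log cannot handle zero values
print(rfm_log)
```

The transformation compresses large values far more than small ones, which is what pulls in the long right tail.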
Log-Transformation and Histogram Plot
Fig-4. Histogram of RFM Features Data (Logarithmic Scale)
The logarithmic transformation removes much of the skew in our RFM dataset, which gives the K-Means method better data for finding the best clusters.
K-Means Clustering
K-Means clustering is an unsupervised learning method that divides unlabeled data into groups based on similarity.
In R, K-Means clustering can be done quickly with the kmeans() function. Before creating the K-Means model, however, we have to choose the number of clusters. There are many ways to do this: we can rely on business sense and assign the number directly, or we can use a mathematical criterion that measures how similar the points within each cluster are.
In this example, we use the within-cluster sum of squares, which measures the variability of the observations within each cluster. We calculate the total within-cluster sum of squares for every number of clusters from 1 to 10 and choose the number after which adding another cluster no longer reduces the value significantly. This is often called the Elbow Method.
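The procedure can be sketched as below. This is an illustration under stated assumptions rather than the original listing: the matrix rfm_log is simulated (four loose groups) instead of being the real log-transformed RFM data, and nstart = 10 is an added choice that reruns kmeans() from several random starts to reduce the effect of a poor initialisation.

```r
# Simulated stand-in for the log-transformed RFM data: four loose groups
set.seed(123)
centers <- matrix(c(1, 1, 1,
                    1, 5, 5,
                    5, 1, 5,
                    5, 5, 1), ncol = 3, byrow = TRUE)
rfm_log <- do.call(rbind, lapply(1:4, function(i) {
  noise <- matrix(rnorm(50 * 3, sd = 0.4), ncol = 3)
  noise + matrix(centers[i, ], nrow = 50, ncol = 3, byrow = TRUE)
}))
colnames(rfm_log) <- c("recency", "frequency", "monetary")

# Total within-cluster sum of squares for k = 1, ..., 10
k_values <- 1:10
wss <- sapply(k_values, function(k) {
  kmeans(rfm_log, centers = k, nstart = 10)$tot.withinss
})

# The "elbow" is the k after which the curve flattens out
plot(k_values, wss, type = "b",
     xlab = "Number of clusters",
     ylab = "Total within-cluster sum of squares")
```

Reading the elbow from the plot remains a judgment call; the curve only shows where the gain from an extra cluster becomes small.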
Elbow Method in R
Fig-5. Dataset Elbow Method Visualization (N = 4)
Using the elbow method, we choose four as our number of clusters. With the kmeans() function in R, we only need to pass the number of clusters in the centers parameter and assign the clustering results back to our dataset.
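A minimal sketch of fitting the model and attaching the result, assuming a simulated, already log-transformed rfm table. The column name segment follows the article; the names features and model, and the nstart = 25 setting, are illustrative assumptions.

```r
# Simulated, already log-transformed RFM table (values are made up)
set.seed(2016)
rfm <- data.frame(
  customerId = 1:120,
  recency    = log1p(rexp(120, rate = 1 / 15)),
  frequency  = log1p(rpois(120, lambda = 20)),
  monetary   = log1p(rlnorm(120, meanlog = 6, sdlog = 1))
)

# Cluster on the three RFM features only, never on the customer ID
features <- c("recency", "frequency", "monetary")
model <- kmeans(rfm[, features], centers = 4, nstart = 25)

# model$cluster holds one cluster number per row, in row order
rfm$segment <- model$cluster

print(head(rfm))
print(table(rfm$segment))  # number of customers per segment
```

The cluster numbers returned by kmeans() are arbitrary labels, so the same group can receive a different number on another run unless the seed is fixed.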
K-Means Model and Segment Summary
Fig-6. Dataset after Segment Addition
We have now assigned every customer ID to a group in the segment feature. As the next step, we check the basic RFM profile of every segment by averaging the RFM features per segment number.
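One way to compute such a per-segment profile is base R's aggregate(), sketched below on a tiny made-up table. The values, the segment assignments, and the names profile and customers are all hypothetical and are not the article's results.

```r
# Hypothetical RFM table with a segment column already assigned
rfm <- data.frame(
  customerId = 1:8,
  recency    = c(2, 5, 1, 3, 40, 60, 4, 6),
  frequency  = c(30, 28, 90, 85, 2, 1, 10, 12),
  monetary   = c(500, 450, 4000, 3800, 20, 10, 120, 150),
  segment    = c(1, 1, 2, 2, 4, 4, 3, 3)
)

# Mean of each RFM feature per segment
profile <- aggregate(cbind(recency, frequency, monetary) ~ segment,
                     data = rfm, FUN = mean)

# Number of customers in each segment, matched by segment number
profile$customers <- as.vector(table(rfm$segment)[as.character(profile$segment)])

print(profile)
```

If the clustering was done on log-transformed values, averaging the original (untransformed) features usually gives a profile that is easier to interpret.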
Fig-7. RFM Summary per Segment
So we have four groups. Let's discuss each of them in detail:
Segment-1 (Silver): Middle-class customers with the second-highest transaction frequency and spending amount.
Segment-2 (Gold): The most valuable customers, who spend the most and make the most transactions.
Segment-3 (Bronze): Ordinary customers with low transaction frequency and low spending amount. However, this segment has the largest number of customers.
Segment-4 (Inactive): Inactive or less-active customers whose latest transaction was made more than a month ago. This segment has the fewest customers and the lowest transaction frequency and transaction amount.
Now we have four groups of customers, each with a detailed RFM profile. This information is typically used to design marketing strategies that are well targeted at customers who share similar behaviour. Recency, Frequency, and Monetary Value segmentation is simple, but it is useful for knowing your customers better and for planning an efficient marketing strategy.
Translated from: https://towardsdatascience.com/rfm-clustering-on-credit-card-customers-cdec560281c0