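Association Analysis with the Apriori Algorithm (1)
1. Using the Apriori algorithm to find frequent itemsets
1.1 Association analysis
Association analysis is the task of finding interesting relationships in large datasets. These relationships take two forms: frequent itemsets or association rules. Frequent itemsets are collections of items that often occur together; association rules suggest that a strong relationship may exist between two items.
Support and confidence are the measures used to quantify what counts as "frequent" and what counts as an "interesting" relationship.
- Support of an itemset: the proportion of records in the dataset that contain the itemset. Support is defined on itemsets, so a minimum support can be set and only the itemsets that meet it are kept.
- Confidence: defined for an association rule. For a rule such as {diapers} -> {wine}, the confidence is support({diapers, wine}) / support({diapers}).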
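To make the two measures concrete, here is a minimal Python 3 sketch. The basket data and the helper names `support` and `confidence` are made up for illustration; they are not part of the original listing.

```python
# Hypothetical toy transactions, invented only for this illustration.
transactions = [
    {"diapers", "wine", "milk"},
    {"diapers", "wine"},
    {"diapers", "bread"},
    {"milk", "bread"},
    {"diapers", "wine", "bread"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent | consequent) / support(antecedent)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# {diapers, wine} appears in 3 of the 5 baskets.
sup_pair = support({"diapers", "wine"}, transactions)
# Of the 4 baskets with diapers, 3 also contain wine.
conf_rule = confidence({"diapers"}, {"wine"}, transactions)
print(sup_pair, conf_rule)
```

Support is a property of an itemset, while confidence is a property of a directed rule, which is why `confidence` takes the antecedent and consequent separately.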
1.2 The Apriori principle
For the itemsets in the figure above, even a collection of only 4 items requires 15 passes over the data, one per possible itemset. The number of passes grows sharply as items are added: a dataset of N items has $2^N - 1$ possible itemsets, which takes a long time to compute.
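The count can be checked by brute force. The sketch below (Python 3; `all_itemsets` is an illustrative helper name, not from the original) enumerates every non-empty subset of four items and prints how quickly $2^N - 1$ grows.

```python
from itertools import combinations

def all_itemsets(items):
    """Return every non-empty subset of `items` as a frozenset."""
    items = sorted(items)
    return [frozenset(c)
            for size in range(1, len(items) + 1)
            for c in combinations(items, size)]

four = all_itemsets([0, 1, 2, 3])
print(len(four))  # number of itemsets over 4 items

# For larger N only the count is printed; listing the subsets would be wasteful.
for n in (4, 10, 20, 30):
    print(n, 2 ** n - 1)
```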
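The Apriori principle reduces the computation time. It states that if an itemset is frequent, then all of its subsets are also frequent. Read the other way (the contrapositive): if an itemset is infrequent, then all of its supersets are also infrequent.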
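A small check of the contrapositive, sketched in Python 3 on the four transactions that this post uses as test data. The variable names are illustrative. Item 4 occurs in only one transaction, so at a minimum support of 0.5 it is infrequent, and so is every itemset that contains it.

```python
from itertools import combinations

# Same four transactions as the test dataset used in this post.
dataset = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
min_support = 0.5

def support(itemset):
    """Fraction of transactions containing `itemset`."""
    return sum(1 for t in dataset if itemset <= t) / len(dataset)

infrequent_single = frozenset([4])
others = sorted(set().union(*dataset) - infrequent_single)

# Every superset of {4}: {4} plus any non-empty combination of the other items.
supersets = [infrequent_single | frozenset(c)
             for size in range(1, len(others) + 1)
             for c in combinations(others, size)]

all_infrequent = all(support(s) < min_support for s in supersets)
print(support(infrequent_single), len(supersets), all_infrequent)
```

Because none of these supersets can be frequent, Apriori never needs to count them once {4} has been discarded.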
Association analysis has two goals: finding frequent itemsets and finding association rules. The frequent itemsets must be found first; the association rules are then derived from them.
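Generating candidate itemsets:
For each transaction tran in the dataset:
    For each candidate itemset can:
        Check whether can is a subset of tran:
            If so, increment the count of can
For each candidate itemset:
    If its support is not below the minimum, keep the itemset
Return the list of all frequent itemsets.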
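The pseudocode above can be sketched in Python 3 as follows. This is only an illustration: the name `scan` and the use of `collections.Counter` are choices made here, and unlike the full listing further down it records the support of every counted candidate, not only the frequent ones.

```python
from collections import Counter

def scan(transactions, candidates, min_support):
    """Count each candidate itemset and keep those meeting min_support."""
    counts = Counter()
    for tran in transactions:
        for can in candidates:
            if can.issubset(tran):   # is the candidate part of this transaction?
                counts[can] += 1
    n = len(transactions)
    support = {can: cnt / n for can, cnt in counts.items()}
    frequent = [can for can, s in support.items() if s >= min_support]
    return frequent, support

transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
candidates = [frozenset([i]) for i in (1, 2, 3, 4, 5)]
frequent, support = scan(transactions, candidates, min_support=0.5)
print(frequent)
print(support)
```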
The complete Apriori algorithm:
While the number of itemsets in the list is greater than 0:
    Build a list of candidate itemsets of k items
    Scan the data to confirm that each itemset is frequent
    Keep the frequent itemsets and build a list of candidate itemsets of k+1 items
Code:
# -*- coding: utf-8 -*-
"""
Created on Thu Nov 09 13:16:21 2017
"""
from __future__ import print_function  # keeps the print calls valid on Python 2 and 3


# Simple dataset used for testing
def loadDataSet():
    return [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]


# Build the first candidate list C1
def createC1(dataSet):
    C1 = []  # stores every distinct item
    for transaction in dataSet:  # each transaction in the dataset
        for item in transaction:  # each item in the transaction
            if [item] not in C1:
                C1.append([item])  # add a list holding just this item
                print('list_C1:', C1)
    C1.sort()
    frozenset_C1 = list(map(frozenset, C1))  # map every element of C1 to a frozenset
    return frozenset_C1  # list of frozensets


# Build Lk from Ck
def scanD(D, Ck, minSupport):  # dataset, candidate itemsets, minimum support of interest
    ssCnt = {}
    for tid in D:  # each transaction
        for can in Ck:  # each candidate itemset
            if can.issubset(tid):  # is the candidate contained in the transaction?
                if can not in ssCnt:  # first occurrence: add it, keyed by the itemset
                    ssCnt[can] = 1
                else:  # already present: increment the count
                    ssCnt[can] += 1
    numItems = float(len(D))
    retList = []  # itemsets that meet the minimum support
    supportData = {}
    for key in ssCnt:
        support = ssCnt[key] / numItems  # denominator is the number of transactions
        if support >= minSupport:
            retList.insert(0, key)
            print('retList:', retList)
            supportData[key] = support  # record the support of each kept itemset
    return retList, supportData


# Build the candidate list Ck
def aprioriGen(Lk, k):  # list of frequent itemsets Lk, itemset size k
    print('LK_1:', Lk)
    retList = []
    lenLk = len(Lk)
    for i in range(lenLk):  # each itemset in Lk
        for j in range(i + 1, lenLk):  # compared with every later itemset
            # sort before slicing so the prefix does not depend on set iteration order
            L1 = sorted(Lk[i])[:k-2]
            L2 = sorted(Lk[j])[:k-2]
            print('L1 and L2:', L1, L2)
            if L1 == L2:  # same first k-2 items: merge into one itemset of size k
                print('here !')
                retList.append(Lk[i] | Lk[j])  # | is set union
    return retList


def apriori(dataSet, minSupport=0.5):
    C1 = createC1(dataSet)  # list of candidate 1-itemsets
    print('C1:', C1)
    D = list(map(set, dataSet))
    print('D:', D)
    L1, supportData = scanD(D, C1, minSupport)  # 1-itemsets that meet the minimum support
    print('L1:', L1)
    print('supportData:', supportData)
    L = [L1]
    print('L:', L)
    k = 2
    while len(L[k-2]) > 0:  # stop when the previous level is empty
        Ck = aprioriGen(L[k-2], k)  # candidate k-itemsets
        print('CK:', Ck)
        Lk, supK = scanD(D, Ck, minSupport)  # k-itemsets that meet the minimum support
        print('Lk_2:', Lk)
        print('supK:', supK)
        supportData.update(supK)  # merge into the overall support dictionary
        print('supportData_2:', supportData)
        L.append(Lk)
        print('L_2:', L)
        k += 1
    return L, supportData


# Main
dataSet = loadDataSet()
L, suppData = apriori(dataSet)
print('L:', L)
Output (as printed under Python 2):
list_C1: [[1]]
list_C1: [[1], [3]]
list_C1: [[1], [3], [4]]
list_C1: [[1], [3], [4], [2]]
list_C1: [[1], [3], [4], [2], [5]]
C1: [frozenset([1]), frozenset([2]), frozenset([3]), frozenset([4]), frozenset([5])]
D: [set([1, 3, 4]), set([2, 3, 5]), set([1, 2, 3, 5]), set([2, 5])]
retList: [frozenset([5])]
retList: [frozenset([2]), frozenset([5])]
retList: [frozenset([3]), frozenset([2]), frozenset([5])]
retList: [frozenset([1]), frozenset([3]), frozenset([2]), frozenset([5])]
L1: [frozenset([1]), frozenset([3]), frozenset([2]), frozenset([5])]
supportData: {frozenset([5]): 0.75, frozenset([2]): 0.75, frozenset([3]): 0.75, frozenset([1]): 0.5}
L: [[frozenset([1]), frozenset([3]), frozenset([2]), frozenset([5])]]
LK_1: [frozenset([1]), frozenset([3]), frozenset([2]), frozenset([5])]
L1 and L2: [] []
here !
L1 and L2: [] []
here !
L1 and L2: [] []
here !
L1 and L2: [] []
here !
L1 and L2: [] []
here !
L1 and L2: [] []
here !
CK: [frozenset([1, 3]), frozenset([1, 2]), frozenset([1, 5]), frozenset([2, 3]), frozenset([3, 5]), frozenset([2, 5])]
retList: [frozenset([3, 5])]
retList: [frozenset([2, 3]), frozenset([3, 5])]
retList: [frozenset([2, 5]), frozenset([2, 3]), frozenset([3, 5])]
retList: [frozenset([1, 3]), frozenset([2, 5]), frozenset([2, 3]), frozenset([3, 5])]
Lk_2: [frozenset([1, 3]), frozenset([2, 5]), frozenset([2, 3]), frozenset([3, 5])]
supK: {frozenset([1, 3]): 0.5, frozenset([2, 3]): 0.5, frozenset([3, 5]): 0.5, frozenset([2, 5]): 0.75}
supportData_2: {frozenset([5]): 0.75, frozenset([3]): 0.75, frozenset([3, 5]): 0.5, frozenset([2, 3]): 0.5, frozenset([2, 5]): 0.75, frozenset([1]): 0.5, frozenset([1, 3]): 0.5, frozenset([2]): 0.75}
L_2: [[frozenset([1]), frozenset([3]), frozenset([2]), frozenset([5])], [frozenset([1, 3]), frozenset([2, 5]), frozenset([2, 3]), frozenset([3, 5])]]
LK_1: [frozenset([1, 3]), frozenset([2, 5]), frozenset([2, 3]), frozenset([3, 5])]
L1 and L2: [1] [2]
L1 and L2: [1] [2]
L1 and L2: [1] [3]
L1 and L2: [2] [2]
here !
L1 and L2: [2] [3]
L1 and L2: [2] [3]
CK: [frozenset([2, 3, 5])]
retList: [frozenset([2, 3, 5])]
Lk_2: [frozenset([2, 3, 5])]
supK: {frozenset([2, 3, 5]): 0.5}
supportData_2: {frozenset([5]): 0.75, frozenset([3]): 0.75, frozenset([2, 3, 5]): 0.5, frozenset([3, 5]): 0.5, frozenset([2, 3]): 0.5, frozenset([2, 5]): 0.75, frozenset([1]): 0.5, frozenset([1, 3]): 0.5, frozenset([2]): 0.75}
L_2: [[frozenset([1]), frozenset([3]), frozenset([2]), frozenset([5])], [frozenset([1, 3]), frozenset([2, 5]), frozenset([2, 3]), frozenset([3, 5])], [frozenset([2, 3, 5])]]
LK_1: [frozenset([2, 3, 5])]
CK: []
Lk_2: []
supK: {}
supportData_2: {frozenset([5]): 0.75, frozenset([3]): 0.75, frozenset([2, 3, 5]): 0.5, frozenset([3, 5]): 0.5, frozenset([2, 3]): 0.5, frozenset([2, 5]): 0.75, frozenset([1]): 0.5, frozenset([1, 3]): 0.5, frozenset([2]): 0.75}
L_2: [[frozenset([1]), frozenset([3]), frozenset([2]), frozenset([5])], [frozenset([1, 3]), frozenset([2, 5]), frozenset([2, 3]), frozenset([3, 5])], [frozenset([2, 3, 5])], []]
L: [[frozenset([1]), frozenset([3]), frozenset([2]), frozenset([5])], [frozenset([1, 3]), frozenset([2, 5]), frozenset([2, 3]), frozenset([3, 5])], [frozenset([2, 3, 5])], []]
The output traces each step of the run; read it together with the figure below.
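Notes:
- A frozenset is a "frozen" set: it is immutable, so it cannot be modified after it is created. frozenset must be used here instead of set because these sets are later used as dictionary keys. A frozenset is hashable and can serve as a key, whereas a set cannot.
- Python cannot build a set from a bare integer (frozenset(1) raises a TypeError because an int is not iterable), so each item is wrapped in a list. That is why C1 starts out as a big list of single-item lists. The big list is then sorted, each single-item list is mapped to frozenset(), and the list of frozensets is returned.
- The k-2 above can be confusing, so here is the detail. Building {0,1}, {0,2}, {1,2} from {0}, {1}, {2} simply combines single items. Now suppose {0,1}, {0,2}, {1,2} are to be used to create three-item sets. Merging every pair of sets gives {0,1,2}, {0,1,2}, {0,1,2}: the same set three times, and the list of three-item sets would then have to be scanned to remove the duplicates. The goal is to traverse the list as few times as possible. If instead only the sets whose first element matches are merged, then of {0,1}, {0,2}, {1,2} only {0,1} and {0,2} are combined, which gives {0,1,2} with a single operation and no pass to look for duplicates.
- The two sets are merged into one set of size k with a set union, which in Python is the | operator.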
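The notes on the k-2 prefix, the | operator and frozenset keys can be illustrated with a short Python 3 sketch. `join_step` is an illustrative name for a print-free variant of the candidate-generation step; it sorts each itemset before slicing so that the prefix does not depend on set iteration order.

```python
def join_step(Lk, k):
    """Merge (k-1)-itemsets that share their first k-2 sorted items."""
    candidates = []
    for i in range(len(Lk)):
        for j in range(i + 1, len(Lk)):
            if sorted(Lk[i])[:k - 2] == sorted(Lk[j])[:k - 2]:
                candidates.append(Lk[i] | Lk[j])  # set union
    return candidates

pairs = [frozenset([0, 1]), frozenset([0, 2]), frozenset([1, 2])]

# Naive approach: union every pair, which yields {0, 1, 2} three times.
naive = [a | b for idx, a in enumerate(pairs) for b in pairs[idx + 1:]]

# Prefix join: only {0, 1} and {0, 2} share the prefix [0].
joined = join_step(pairs, 3)
print(len(naive), len(joined))

# frozenset is hashable and can be a dict key; a plain set cannot.
counts = {joined[0]: 1}
try:
    {set([0, 1, 2]): 1}
    set_is_hashable = True
except TypeError:
    set_is_hashable = False
print(set_is_hashable)
```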
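2. Mining association rules from frequent itemsets
The first thing to understand here is how confidence is computed:
The confidence of a rule P -> H is defined as support(P | H) / support(P). Here | denotes set union, written ∪ in mathematics, so P | H contains every item that appears in P or in H. The previous section already computed the support of every frequent itemset, so obtaining a confidence only takes looking up those support values and doing one division.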
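As a minimal Python 3 sketch of that lookup and division, using three support values taken from the run in section 1 (`rule_confidence` is an illustrative name):

```python
# Support values copied from the supportData printed by the first listing.
support_data = {
    frozenset([1]): 0.5,
    frozenset([3]): 0.75,
    frozenset([1, 3]): 0.5,
}

def rule_confidence(P, H, support_data):
    """Confidence of P -> H: support(P | H) / support(P)."""
    return support_data[P | H] / support_data[P]

conf_1_to_3 = rule_confidence(frozenset([1]), frozenset([3]), support_data)
conf_3_to_1 = rule_confidence(frozenset([3]), frozenset([1]), support_data)
print(conf_1_to_3, conf_3_to_1)
```

The two directions share the same numerator but divide by different supports, so a rule and its reverse generally have different confidence.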
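As with frequent itemset generation, many association rules can be produced from each frequent itemset. Reducing the number of rules keeps the problem tractable. The useful observation is that if a rule does not meet the minimum confidence, then no rule built from the same itemset whose left-hand side is a subset of that rule's left-hand side meets it either: a smaller left-hand side has support at least as large, so the ratio cannot rise. In the figure above, suppose the rule 0,1,2 -> 3 does not meet the minimum confidence; then any rule whose left-hand side is a subset of 0,1,2 also fails. Those rules are shaded in the figure. This property of association rules can be used to reduce the number of rules that need to be tested.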
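The pruning property can be checked numerically. The Python 3 sketch below uses the post's four test transactions and the itemset {2, 3, 5}; the helper names are illustrative. It confirms that shrinking the left-hand side never increases the confidence.

```python
from itertools import combinations

dataset = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]

def support(itemset):
    """Fraction of transactions containing `itemset`."""
    return sum(1 for t in dataset if itemset <= t) / len(dataset)

def confidence(antecedent, itemset):
    """Confidence of the rule antecedent -> (itemset - antecedent)."""
    return support(itemset) / support(antecedent)

itemset = frozenset([2, 3, 5])
antecedents = [frozenset(c)
               for size in (1, 2)
               for c in combinations(sorted(itemset), size)]

# A smaller antecedent has support at least as large, so its rule's
# confidence is never higher than that of the larger antecedent.
monotone = all(
    confidence(small, itemset) <= confidence(big, itemset)
    for big in antecedents
    for small in antecedents
    if small < big  # proper subset
)
print(monotone)
```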
Code:
# -*- coding: utf-8 -*-
"""
Created on Thu Nov 09 20:52:41 2017
"""
from __future__ import print_function  # keeps the print calls valid on Python 2 and 3


def loadDataSet():
    return [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]


# Evaluate candidate rules (compute their confidence)
def calcConf(freqSet, H, supportData, brl, minConf=0.7):
    prunedH = []  # consequents whose rules meet the minimum confidence
    for conseq in H:  # each candidate consequent in H
        # supportData[freqSet] is the support of the frequent itemset;
        # freqSet - conseq is a set difference, i.e. the antecedent
        print('P -->H:', freqSet - conseq, '-->', conseq)
        conf = supportData[freqSet] / supportData[freqSet - conseq]  # confidence
        print('conf:', conf)
        if conf >= minConf:  # meets the minimum confidence
            print(freqSet - conseq, '-->', conseq, 'conf:', conf)
            brl.append((freqSet - conseq, conseq, conf))  # add the rule to the list
            prunedH.append(conseq)
    return prunedH


# Generate more association rules
def rulesFromConseq(freqSet, H, supportData, brl, minConf=0.7):  # H: consequents split from freqSet
    m = len(H[0])  # size of the current consequents
    print('m:', m)
    if len(freqSet) > (m + 1):  # try to merge further
        Hmp1 = aprioriGen(H, m + 1)  # non-repeating combinations of H with m+1 items
        print('Hmp1:', Hmp1)
        Hmp1 = calcConf(freqSet, Hmp1, supportData, brl, minConf)
        print('Hmp1_2:', Hmp1)
        if len(Hmp1) > 1:  # recurse to see whether the rules can be combined further
            rulesFromConseq(freqSet, Hmp1, supportData, brl, minConf)


# Rule generation
def generateRules(L, supportData, minConf=0.7):  # frequent itemsets, support dictionary, minimum confidence
    bigRuleList = []
    for i in range(1, len(L)):  # start from the itemsets with two items
        for freqSet in L[i]:
            print('L[i]:', L[i])
            print('freqSet:', freqSet)
            H1 = [frozenset([item]) for item in freqSet]  # single-item sets built with a list comprehension
            print('H1:', H1)
            if i > 1:
                print('i=%d...' % i)
                rulesFromConseq(freqSet, H1, supportData, bigRuleList, minConf)
            else:  # i == 1: the frequent itemset has two items
                print('i=1.....')
                calcConf(freqSet, H1, supportData, bigRuleList, minConf)
            print('bigRuleList:', bigRuleList)  # the brl argument of calcConf
    return bigRuleList  # list of rules with their confidence


# Build the first candidate list C1
def createC1(dataSet):
    C1 = []  # stores every distinct item
    for transaction in dataSet:
        for item in transaction:
            if [item] not in C1:
                C1.append([item])  # add a list holding just this item
    C1.sort()
    return list(map(frozenset, C1))  # list of frozensets


# Build Lk from Ck
def scanD(D, Ck, minSupport):  # dataset, candidate itemsets, minimum support of interest
    ssCnt = {}
    for tid in D:
        for can in Ck:
            if can.issubset(tid):  # is the candidate contained in the transaction?
                if can not in ssCnt:
                    ssCnt[can] = 1
                else:
                    ssCnt[can] += 1
    numItems = float(len(D))
    retList = []  # itemsets that meet the minimum support
    supportData = {}
    for key in ssCnt:
        support = ssCnt[key] / numItems  # denominator is the number of transactions
        if support >= minSupport:
            retList.insert(0, key)
            supportData[key] = support  # record the support of each kept itemset
    return retList, supportData


# Build the candidate list Ck
def aprioriGen(Lk, k):  # list of frequent itemsets Lk, itemset size k
    retList = []
    lenLk = len(Lk)
    for i in range(lenLk):
        for j in range(i + 1, lenLk):
            # sort before slicing so the prefix does not depend on set iteration order
            L1 = sorted(Lk[i])[:k-2]
            L2 = sorted(Lk[j])[:k-2]
            if L1 == L2:  # same first k-2 items: merge into one itemset of size k
                retList.append(Lk[i] | Lk[j])  # | is set union
    return retList


def apriori(dataSet, minSupport=0.5):
    C1 = createC1(dataSet)  # list of candidate 1-itemsets
    D = list(map(set, dataSet))
    L1, supportData = scanD(D, C1, minSupport)  # 1-itemsets that meet the minimum support
    L = [L1]
    k = 2
    while len(L[k-2]) > 0:  # stop when the previous level is empty
        Ck = aprioriGen(L[k-2], k)  # candidate k-itemsets
        Lk, supK = scanD(D, Ck, minSupport)  # k-itemsets that meet the minimum support
        supportData.update(supK)  # merge into the overall support dictionary
        L.append(Lk)
        k += 1
    return L, supportData


# Main
dataSet = loadDataSet()
L, suppData = apriori(dataSet, minSupport=0.5)
print('L:', L)
print('suppData:', suppData)
rules = generateRules(L, suppData, minConf=0.7)
print('rules:', rules)
Output (as printed under Python 2):
L: [[frozenset([1]), frozenset([3]), frozenset([2]), frozenset([5])], [frozenset([1, 3]), frozenset([2, 5]), frozenset([2, 3]), frozenset([3, 5])], [frozenset([2, 3, 5])], []]
suppData: {frozenset([5]): 0.75, frozenset([3]): 0.75, frozenset([2, 3, 5]): 0.5, frozenset([3, 5]): 0.5, frozenset([2, 3]): 0.5, frozenset([2, 5]): 0.75, frozenset([1]): 0.5, frozenset([1, 3]): 0.5, frozenset([2]): 0.75}
L[i]: [frozenset([1, 3]), frozenset([2, 5]), frozenset([2, 3]), frozenset([3, 5])]
freqSet: frozenset([1, 3])
H1: [frozenset([1]), frozenset([3])]
i=1.....
P -->H: frozenset([3]) --> frozenset([1])
conf: 0.666666666667
P -->H: frozenset([1]) --> frozenset([3])
conf: 1.0
frozenset([1]) --> frozenset([3]) conf: 1.0
bigRuleList: [(frozenset([1]), frozenset([3]), 1.0)]
L[i]: [frozenset([1, 3]), frozenset([2, 5]), frozenset([2, 3]), frozenset([3, 5])]
freqSet: frozenset([2, 5])
H1: [frozenset([2]), frozenset([5])]
i=1.....
P -->H: frozenset([5]) --> frozenset([2])
conf: 1.0
frozenset([5]) --> frozenset([2]) conf: 1.0
P -->H: frozenset([2]) --> frozenset([5])
conf: 1.0
frozenset([2]) --> frozenset([5]) conf: 1.0
bigRuleList: [(frozenset([1]), frozenset([3]), 1.0), (frozenset([5]), frozenset([2]), 1.0), (frozenset([2]), frozenset([5]), 1.0)]
L[i]: [frozenset([1, 3]), frozenset([2, 5]), frozenset([2, 3]), frozenset([3, 5])]
freqSet: frozenset([2, 3])
H1: [frozenset([2]), frozenset([3])]
i=1.....
P -->H: frozenset([3]) --> frozenset([2])
conf: 0.666666666667
P -->H: frozenset([2]) --> frozenset([3])
conf: 0.666666666667
bigRuleList: [(frozenset([1]), frozenset([3]), 1.0), (frozenset([5]), frozenset([2]), 1.0), (frozenset([2]), frozenset([5]), 1.0)]
L[i]: [frozenset([1, 3]), frozenset([2, 5]), frozenset([2, 3]), frozenset([3, 5])]
freqSet: frozenset([3, 5])
H1: [frozenset([3]), frozenset([5])]
i=1.....
P -->H: frozenset([5]) --> frozenset([3])
conf: 0.666666666667
P -->H: frozenset([3]) --> frozenset([5])
conf: 0.666666666667
bigRuleList: [(frozenset([1]), frozenset([3]), 1.0), (frozenset([5]), frozenset([2]), 1.0), (frozenset([2]), frozenset([5]), 1.0)]
L[i]: [frozenset([2, 3, 5])]
freqSet: frozenset([2, 3, 5])
H1: [frozenset([2]), frozenset([3]), frozenset([5])]
i=2...
m: 1
Hmp1: [frozenset([2, 3]), frozenset([2, 5]), frozenset([3, 5])]
P -->H: frozenset([5]) --> frozenset([2, 3])
conf: 0.666666666667
P -->H: frozenset([3]) --> frozenset([2, 5])
conf: 0.666666666667
P -->H: frozenset([2]) --> frozenset([3, 5])
conf: 0.666666666667
Hmp1_2: []
bigRuleList: [(frozenset([1]), frozenset([3]), 1.0), (frozenset([5]), frozenset([2]), 1.0), (frozenset([2]), frozenset([5]), 1.0)]
rules: [(frozenset([1]), frozenset([3]), 1.0), (frozenset([5]), frozenset([2]), 1.0), (frozenset([2]), frozenset([5]), 1.0)]
The result contains three rules: 1 -> 3, 5 -> 2 and 2 -> 5. The two rules on 2 and 5 hold in both directions, but the rule on 1 and 3 does not: 3 -> 1 has a confidence of about 0.67, below the 0.7 threshold.
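One detail is visible in the trace: for the three-item set {2, 3, 5}, rulesFromConseq goes straight to two-item consequents, so rules with a single-item consequent such as {2, 3} -> {5} are never evaluated. As a cross-check, the Python 3 sketch below simply tries every non-empty proper subset of each itemset as the antecedent. It is a brute-force alternative written for this comparison (`all_rules` is an illustrative name), not the listing's method, and it skips the pruning described earlier.

```python
from itertools import combinations

# Supports copied from the suppData output above.
supp = {
    frozenset([1]): 0.5, frozenset([2]): 0.75,
    frozenset([3]): 0.75, frozenset([5]): 0.75,
    frozenset([1, 3]): 0.5, frozenset([2, 3]): 0.5,
    frozenset([2, 5]): 0.75, frozenset([3, 5]): 0.5,
    frozenset([2, 3, 5]): 0.5,
}

def all_rules(supp, min_conf=0.7):
    """Test every non-empty proper subset of each itemset as an antecedent."""
    rules = []
    for itemset in supp:
        for size in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), size):
                ante = frozenset(ante)
                conf = supp[itemset] / supp[ante]
                if conf >= min_conf:
                    rules.append((ante, itemset - ante, conf))
    return rules

rules = all_rules(supp)
for ante, cons, conf in rules:
    print(sorted(ante), '-->', sorted(cons), 'conf:', conf)
```

On this data the exhaustive check finds the three rules reported above plus {2, 3} -> {5} and {3, 5} -> {2}, both with confidence 1.0.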
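3. Notes:
(1) Python list insert() method:
list.insert(index, obj):
insert() inserts the given object at the given position in the list. It has no return value (it returns None) and modifies the list in place.
index – the position at which obj is inserted.
obj – the object to insert into the list.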
Example:
>>> aList = [123, 'xyz', 'zara', 'abc']
>>> aList.insert(3, 2009)
>>> aList
[123, 'xyz', 'zara', 2009, 'abc']
>>>
(2) if can.issubset(tid):
In [54]: frozenset([1]).issubset(set([1, 3, 4]))
Out[54]: True
(3) if not [item] in C1: C1.append([item]) gives:
[[1]]  # C1 after each append
[[1], [3]]
[[1], [3], [4]]
[[1], [3], [4], [2]]
[[1], [3], [4], [2], [5]]
In [21]: C1  # the final C1
Out[21]: [frozenset({1}), frozenset({2}), frozenset({3}), frozenset({4}), frozenset({5})]
(4) L1 = list(Lk[i])[:k-2]
In [38]: L1
Out[38]: [frozenset({1}), frozenset({3}), frozenset({2}), frozenset({5})]
In [39]: L=[L1]
In [40]: L
Out[40]: [[frozenset({1}), frozenset({3}), frozenset({2}), frozenset({5})]]
In [41]: L[0]
Out[41]: [frozenset({1}), frozenset({3}), frozenset({2}), frozenset({5})]
In [42]: LK=L[0]
In [43]: LK
Out[43]: [frozenset({1}), frozenset({3}), frozenset({2}), frozenset({5})]
In [44]: list(LK[0])
Out[44]: [1]
In [45]: list(LK[0])[:0]
Out[45]: []
(5) list(Lk[i])[:k-2]
In [47]: bb=[1,2,3]
In [48]: bb[:]
Out[48]: [1, 2, 3]
In [49]: bb[:1]
Out[49]: [1]
In [50]: bb[0]
Out[50]: 1
In [51]: bb[0:2]
Out[51]: [1, 2]
In [52]: bb[0:3]
Out[52]: [1, 2, 3]
In [53]: bb[:2]
Out[53]: [1, 2]
(6) H1 = [frozenset([item]) for item in freqSet]
In [66]: L[1]
Out[66]: [frozenset({1, 3}), frozenset({2, 5}), frozenset({2, 3}), frozenset({3, 5})]
In [67]: [frozenset([item]) for item in frozenset({1, 3})]  # list comprehension
Out[67]: [frozenset({1}), frozenset({3})]
In [68]: [[item] for item in frozenset({1, 3})]
Out[68]: [[1], [3]]
In [69]: [item for item in frozenset({1, 3})]
Out[69]: [1, 3]
In [71]: supportData_2={frozenset([5]): 0.75, frozenset([3]): 0.75, frozenset([2, 3, 5]): 0.5, frozenset([3, 5]): 0.5, frozenset([2, 3]): 0.5, frozenset([2, 5]): 0.75, frozenset([1]): 0.5, frozenset([1, 3]): 0.5, frozenset([2]): 0.75}
In [72]: supportData_2[frozenset({1, 3})]
Out[72]: 0.5
In [74]: supportData_2[frozenset({1, 3})-frozenset({1})]
Out[74]: 0.75
In [75]: frozenset({1, 3})-frozenset({1})  # set difference
Out[75]: frozenset({3})