weka: FCBFSearch
paper:
Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution.

Feature selection method based on correlation measure and relevance and redundancy analysis.
Use in conjunction with an attribute set evaluator.
In other words, it evaluates a feature set by analyzing both its relevance and its redundancy.

// TODO: not fully understood yet
// m_attributeList: the attribute indices. m_attributeList[2] is the index, in the original data, of the attribute at position 2 of the attribute set under evaluation.
// For simplicity, assume m_attributeList[i] == i. Suppose there are 5 attributes:
//   index (m_attributeList):                   0    1    2    3    4
//   merit (m_attributeMerit):                  2.1  2.3  1    1.2  0.5
//   rank (indices sorted by ascending merit):  4    2    3    0    1
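/*
  bestToWorst: simply the attribute indices ordered by merit, from highest to lowest
  index  merit
  1      2.3
  0      2.1
  3      1.2
  2      1
  4      0.5

  m_rankedFCBF[dimension.length][4], initial state
  (columns shown: index, merit, and the index of the attribute that made this one redundant, -1 while undecided;
   the fourth column, which later holds the SU value with that attribute, is not shown)
  index  merit  removedBy
  1      2.3    -1
  0      2.1    -1
  3      1.2    -1
  2      1      -1
  4      0.5    -1
*/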
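The ranking step above can be reproduced in a few lines. This is only a minimal sketch of the idea, not Weka's code: the class name `RankingSketch` and the helper `sortAscending` are made up for illustration, and `sortAscending` merely stands in for the index sort that `Utils.sort` performs in the source.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration class; not part of Weka.
final class RankingSketch {

  // Index sort: returns the positions of merit[] in ascending order of merit.
  // Stand-in (assumption) for the role Utils.sort plays in the Weka source.
  static int[] sortAscending(double[] merit) {
    Integer[] idx = new Integer[merit.length];
    for (int i = 0; i < idx.length; i++) {
      idx[i] = i;
    }
    Arrays.sort(idx, Comparator.comparingDouble((Integer i) -> merit[i]));
    int[] out = new int[idx.length];
    for (int i = 0; i < out.length; i++) {
      out[i] = idx[i];
    }
    return out;
  }

  // Builds the n x 2 bestToWorst array: column 0 is the attribute index,
  // column 1 is its merit, rows ordered from highest to lowest merit.
  static double[][] bestToWorst(int[] attributeList, double[] merit) {
    int[] ranked = sortAscending(merit);
    double[][] result = new double[ranked.length][2];
    for (int i = ranked.length - 1, j = 0; i >= 0; i--, j++) {
      int pos = ranked[i];
      result[j][0] = attributeList[pos];
      result[j][1] = merit[pos];
    }
    return result;
  }

  public static void main(String[] args) {
    int[] attributeList = {0, 1, 2, 3, 4};
    double[] merit = {2.1, 2.3, 1.0, 1.2, 0.5};
    System.out.println(Arrays.toString(sortAscending(merit)));
    for (double[] row : bestToWorst(attributeList, merit)) {
      System.out.println((int) row[0] + " " + row[1]);
    }
  }
}
```

With the five example merits, the index sort gives the `rank` row from the comments, and reversing it gives the `bestToWorst` order 1, 0, 3, 2, 4.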
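FCBFElimination is essentially:
for (i = 0; i < dimension.length; i++)
{
    if (m_rankedFCBF[i][2] != -1)
    { continue; }  // i was already marked redundant by a better-ranked attribute
    m_rankedFCBF[i][2] = m_rankedFCBF[i][0];  // i is kept: it points to itself
    for (j = i + 1; j < dimension.length; j++)
    {
        if (m_rankedFCBF[j][2] != -1)
        { continue; }
        // SUij: correlation between attributes i and j (symmetrical uncertainty in the FCBF paper)
        if (m_rankedFCBF[j][1] <= SUij)  // j is redundant to i, so set m_rankedFCBF[j][2] = m_rankedFCBF[i][0]
        {
            m_rankedFCBF[j][2] = m_rankedFCBF[i][0];
            m_rankedFCBF[j][3] = SUij;
        }
    }
}
Finally, keep the attributes for which m_rankedFCBF[i][0] == m_rankedFCBF[i][2].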
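To see the elimination at work, here is a small self-contained sketch run on the five-attribute example. It is not Weka's implementation: `FcbfEliminationSketch`, `select`, and the pairwise matrix `su` are hypothetical, and the matrix values are invented toy numbers chosen to fit the example merits (real symmetrical uncertainty values lie in [0, 1]). In Weka the pairwise value comes from the attribute set evaluator rather than a lookup table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Weka.
final class FcbfEliminationSketch {

  // ranked: rows of {attribute index, merit}, best merit first.
  // su: assumed precomputed pairwise correlation, su[a][b] for attributes a and b.
  static List<Integer> select(double[][] ranked, double[][] su) {
    int n = ranked.length;
    double[][] fcbf = new double[n][4];
    for (int i = 0; i < n; i++) {
      fcbf[i][0] = ranked[i][0];
      fcbf[i][1] = ranked[i][1];
      fcbf[i][2] = -1; // undecided
    }
    for (int i = 0; i < n; i++) {
      if (fcbf[i][2] != -1) {
        continue; // already removed by a better-ranked attribute
      }
      fcbf[i][2] = fcbf[i][0]; // keep attribute i
      for (int j = i + 1; j < n; j++) {
        if (fcbf[j][2] != -1) {
          continue;
        }
        double suIJ = su[(int) fcbf[i][0]][(int) fcbf[j][0]];
        // j is redundant when its merit does not exceed its correlation with i
        if (fcbf[j][1] < suIJ || Math.abs(suIJ - fcbf[j][1]) < 1e-8) {
          fcbf[j][2] = fcbf[i][0];
          fcbf[j][3] = suIJ;
        }
      }
    }
    List<Integer> kept = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      if (fcbf[i][2] == fcbf[i][0]) {
        kept.add((int) fcbf[i][0]);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    double[][] ranked = {{1, 2.3}, {0, 2.1}, {3, 1.2}, {2, 1.0}, {4, 0.5}};
    double[][] su = { // toy values, symmetric, diagonal unused
      {0.0, 0.3, 0.4, 0.2, 0.6},
      {0.3, 0.0, 0.2, 1.5, 0.1},
      {0.4, 0.2, 0.0, 0.1, 0.1},
      {0.2, 1.5, 0.1, 0.0, 0.1},
      {0.6, 0.1, 0.1, 0.1, 0.0}
    };
    System.out.println(select(ranked, su));
  }
}
```

With these toy values, attribute 3 is removed by attribute 1 (1.2 <= 1.5) and attribute 4 is removed by attribute 0 (0.5 <= 0.6), so attributes 1, 0 and 2 survive.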

The detailed algorithm logic is as follows:

Obtaining the best feature set:
for (i = 0; i < m_attributeList.length; i++) {
  m_attributeMerit[i] = ASEvaluator.evaluateAttribute(m_attributeList[i]);
}
double[][] tempRanked = rankedAttributes();
int[] rankedAttributes = new int[m_selectedFeatures.length];
for (i = 0; i < m_selectedFeatures.length; i++) {
  rankedAttributes[i] = (int)tempRanked[i][0];
}
return rankedAttributes;

rankedAttributes:
public double[][] rankedAttributes () throws Exception {
  int i, j;
  // m_attributeList: the attribute indices
  // m_attributeMerit: the merit of each index
  // ranked: the indices ordered by ascending merit
  if (m_attributeList == null || m_attributeMerit == null) {
    throw new Exception("");
  }
  // Sort the attribute merits (say there are n attributes). This is an index sort:
  // it yields the indices in ascending order of merit.
  int[] ranked = Utils.sort(m_attributeMerit);
  // reverse the order of the ranked indexes
  // bestToWorst is an n x 2 array
  double[][] bestToWorst = new double[ranked.length][2];
  for (i = ranked.length - 1, j = 0; i >= 0; i--) {
    // alan: ranked runs from smallest to largest merit, so walk it backwards
    bestToWorst[j++][0] = ranked[i];
  }
  // convert the indexes to attribute indexes
  for (i = 0; i < bestToWorst.length; i++) {
    int temp = ((int)bestToWorst[i][0]);
    bestToWorst[i][0] = m_attributeList[temp];   // the attribute index
    bestToWorst[i][1] = m_attributeMerit[temp];  // the merit of that attribute
  }
  if (m_numToSelect > bestToWorst.length) {
    throw new Exception("More attributes requested than exist in the data");
  }
  this.FCBFElimination(bestToWorst);
  if (m_numToSelect <= 0) {
    if (m_threshold == -Double.MAX_VALUE) {
      m_calculatedNumToSelect = m_selectedFeatures.length;
    } else {
      determineNumToSelectFromThreshold(m_selectedFeatures);
    }
  }
  return m_selectedFeatures;
}

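The last branch of rankedAttributes decides how many of the surviving features to report. The body of determineNumToSelectFromThreshold is not shown in these notes, so the sketch below rests on an assumption: that it simply counts the features whose merit is strictly above the threshold. The class `ThresholdSketch` and both method names are hypothetical, and returning the requested number unchanged when it is positive is likewise an assumption.

```java
// Hypothetical illustration class; not part of Weka.
final class ThresholdSketch {

  // Assumed behaviour: count the rows whose merit (column 1) exceeds the threshold.
  static int countAboveThreshold(double[][] selected, double threshold) {
    int count = 0;
    for (double[] row : selected) {
      if (row[1] > threshold) {
        count++;
      }
    }
    return count;
  }

  // Mirrors the branch structure at the end of rankedAttributes.
  static int numToSelect(double[][] selected, int requested, double threshold) {
    if (requested > 0) {
      return requested; // assumption: an explicit request is used as given
    }
    if (threshold == -Double.MAX_VALUE) {
      return selected.length; // no threshold set: report every surviving feature
    }
    return countAboveThreshold(selected, threshold);
  }

  public static void main(String[] args) {
    double[][] selected = {{1, 2.3}, {0, 2.1}, {2, 1.0}};
    System.out.println(numToSelect(selected, 0, -Double.MAX_VALUE));
    System.out.println(numToSelect(selected, 0, 1.5));
  }
}
```

Here -Double.MAX_VALUE acts as the "no threshold" sentinel, exactly as in the source condition m_threshold == -Double.MAX_VALUE.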
FCBFElimination:
private void FCBFElimination(double[][] rankedFeatures) throws Exception {
  int i, j;
  // columns: attribute index, merit, index of the attribute that removed it (-1 = undecided), SU with that attribute
  m_rankedFCBF = new double[m_attributeList.length][4];
  int[] attributes = new int[1];
  int[] classAtrributes = new int[1];
  int numSelectedAttributes = 0;
  int startPoint = 0;
  double tempSUIJ = 0;
  AttributeSetEvaluator ASEvaluator = (AttributeSetEvaluator)m_asEval;

  for (i = 0; i < rankedFeatures.length; i++) {
    m_rankedFCBF[i][0] = rankedFeatures[i][0];
    m_rankedFCBF[i][1] = rankedFeatures[i][1];
    m_rankedFCBF[i][2] = -1;
  }

  while (startPoint < rankedFeatures.length) {
    // skip attributes already marked redundant
    if (m_rankedFCBF[startPoint][2] != -1) {
      startPoint++;
      continue;
    }
    // keep this attribute: it points to itself
    m_rankedFCBF[startPoint][2] = m_rankedFCBF[startPoint][0];
    numSelectedAttributes++;

    for (i = startPoint + 1; i < m_attributeList.length; i++) {
      if (m_rankedFCBF[i][2] != -1) {
        continue;
      }
      attributes[0] = (int) m_rankedFCBF[startPoint][0];
      classAtrributes[0] = (int) m_rankedFCBF[i][0];
      tempSUIJ = ASEvaluator.evaluateAttribute(attributes, classAtrributes);
      // attribute i is redundant when its merit <= its correlation with the kept attribute
      if (m_rankedFCBF[i][1] < tempSUIJ || Math.abs(tempSUIJ - m_rankedFCBF[i][1]) < 1E-8) {
        m_rankedFCBF[i][2] = m_rankedFCBF[startPoint][0];
        m_rankedFCBF[i][3] = tempSUIJ;
      }
    }
    startPoint++;
  }

  // collect the attributes that still point to themselves
  m_selectedFeatures = new double[numSelectedAttributes][2];
  for (i = 0, j = 0; i < m_attributeList.length; i++) {
    if (m_rankedFCBF[i][2] == m_rankedFCBF[i][0]) {
      m_selectedFeatures[j][0] = m_rankedFCBF[i][0];
      m_selectedFeatures[j][1] = m_rankedFCBF[i][1];
      j++;
    }
  }
}