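[Data Competitions] Competition Handbook Tricks: An Advanced Blending Strategy Built on Public Submissions
Authors: 塵沙杰少, 櫻落
Competition Handbook Tricks: Blending Based on Public Results
(An easy route to a silver medal)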
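Background
The idea behind this article is simple. You do not need to train any model yourself: applying a two-step "direct optimization" to the submission files that others have already published can produce a result that beats every one of them. Some less industrious Kaggle competitors have used exactly this strategy in the last three days of a competition to take a silver medal.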
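Model blending in two steps
1. Basic blending
Collect every submission file published by the community (suppose there are N results);
Sort all public results by leaderboard score, from weakest to strongest;
Ensemble the M weakest results in some way to obtain a new result, so that the pool now consists of this ensemble plus the remaining N - M results;
Then blend the ensemble with the remaining result whose score is closest to it, and repeat until every result has been blended in.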
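The steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `progressive_blend`, the plain mean used for the first M results, and the single `incoming_weight` knob are all assumptions. In practice, which side deserves the larger weight depends on the leaderboard score of the intermediate blend, which only a submission can reveal.

```python
import numpy as np

def progressive_blend(preds, m, incoming_weight=0.5):
    """Blend predictions that are ordered from weakest to strongest.

    preds: list of 1-D arrays, one per public submission (hypothetical input).
    m: how many of the weakest submissions to ensemble first.
    incoming_weight: weight given to each newly added submission (assumption).
    """
    if not 1 <= m <= len(preds):
        raise ValueError("m must be between 1 and len(preds)")
    # Step 1: ensemble the M weakest results (assumption: plain mean).
    current = np.mean(np.stack(preds[:m]), axis=0)
    # Step 2: fold in the remaining results one at a time, weakest first.
    for incoming in preds[m:]:
        current = incoming_weight * incoming + (1.0 - incoming_weight) * current
    return current

# Toy usage with three made-up prediction vectors.
toy_preds = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
toy_blend = progressive_blend(toy_preds, m=2, incoming_weight=0.5)
```

With the toy inputs, the first two vectors are averaged and the result is then mixed half-and-half with the third.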
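2. Upgraded basic blending
Take the result of the basic blending and correct it row by row (see the case study below for details).
Where the blended value is larger than almost all of the public results, multiply it by a coefficient greater than 1; where it is smaller than almost all of them, multiply it by a coefficient less than 1.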
Case study
This case is excerpted from the Kaggle notebook "[results-driven] Tabular Playground Series - 201": https://www.kaggle.com/somayyehgholami/results-driven-tabular-playground-series-201
1. Collect the public submissions
    import pandas as pd
    import matplotlib.pyplot as plt
    %matplotlib inline

    dfk = pd.DataFrame({
        'Kernel ID': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K'],
        'Score': [0.69864, 0.69846, 0.69836, 0.69824, 0.69813, 0.69795,
                  0.69782, 0.69749, 0.69747, 0.69735, 0.69731],
        'File Path': [
            '../input/tps-jan-2021-gbdts-baseline/submission.csv',
            '../input/pseudo-labelling/submission.csv',
            '../input/v4-baseline-lgb-no-tune/sub_0.6971.csv',
            '../input/tps21-optuna-lgb-fast-hyper-parameter-tunning/submission.csv',
            '../input/gbdts-baseline-prevision-io-for-free/submission.csv',
            '../input/v41-eda-gbdts/res41.csv',
            '../input/v3-ensemble-lgb-xgb-cat/submission.csv',
            '../input/tabular-playground/sub_gbm.csv',
            '../input/v48tabular-playground-series-xgboost-lightgbm/V48-0.69747.csv',
            '../input/xgboost-hyperparameter-tuning-using-optuna/submission.csv',
            '../input/tabular-playground-some-slightly-useful-features/sub_gbm.csv',
        ],
    })
    dfk

| Kernel ID | Score | File Path |
|---|---|---|
| A | 0.69864 | ../input/tps-jan-2021-gbdts-baseline/submissio... |
| B | 0.69846 | ../input/pseudo-labelling/submission.csv |
| C | 0.69836 | ../input/v4-baseline-lgb-no-tune/sub_0.6971.csv |
| D | 0.69824 | ../input/tps21-optuna-lgb-fast-hyper-parameter... |
| E | 0.69813 | ../input/gbdts-baseline-prevision-io-for-free/... |
| F | 0.69795 | ../input/v41-eda-gbdts/res41.csv |
| G | 0.69782 | ../input/v3-ensemble-lgb-xgb-cat/submission.csv |
| H | 0.69749 | ../input/tabular-playground/sub_gbm.csv |
| I | 0.69747 | ../input/v48tabular-playground-series-xgboost-... |
| J | 0.69735 | ../input/xgboost-hyperparameter-tuning-using-o... |
| K | 0.69731 | ../input/tabular-playground-some-slightly-usef... |
2. The blending function
Compute (result that scores better on the leaderboard) * coeff + (result that scores somewhat worse) * (1 - coeff), where coeff is usually greater than 0.5.
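A function of this kind might look like the sketch below. It is an illustration under stated assumptions, not the notebook's own implementation: the name `blend`, the convention that the first column is a row id and every other column holds predictions, and the default `coeff=0.7` are all hypothetical.

```python
import pandas as pd

def blend(main, support, coeff=0.7):
    """Return coeff * main + (1 - coeff) * support for every prediction column.

    Assumption: both frames share the same columns, the first column is a row
    id, and rows appear in the same order in both files.
    """
    if not 0.0 <= coeff <= 1.0:
        raise ValueError("coeff must lie in [0, 1]")
    if list(main.columns) != list(support.columns) or len(main) != len(support):
        raise ValueError("main and support must have the same shape and columns")
    id_col = main.columns[0]
    if (main[id_col].to_numpy() != support[id_col].to_numpy()).any():
        raise ValueError("row ids are not aligned")
    out = main.copy()
    value_cols = main.columns[1:]
    # Weighted average, computed column-wise on the raw arrays.
    out[value_cols] = (coeff * main[value_cols].to_numpy()
                       + (1.0 - coeff) * support[value_cols].to_numpy())
    return out

# Toy usage with made-up frames standing in for two submission files.
toy_main = pd.DataFrame({"id": [1, 2], "target": [1.0, 2.0]})
toy_support = pd.DataFrame({"id": [1, 2], "target": [3.0, 4.0]})
toy_result = blend(toy_main, toy_support, coeff=0.75)
```

Checking that the ids line up before averaging guards against silently mixing predictions for different rows when two public files are sorted differently.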
3. Initial blending
3.1 Blend 1: A-G -> best result 1
[ A: (Score: 0.69864), B: (Score: 0.69846), ... , F: (Score: 0.69795), G: (Score: 0.69782) ] >>> sub1: (Score: 0.69781)
3.2 Blend 2: blend result 1 with the submission of closest score
Note that the submission with the better leaderboard score is main and the weaker one is support. In this case study a lower score is better.
[ H: (Score: 0.69749) , sub1: (Score: 0.69781) ] >>> sub2: (Score: improved)
3.3 Blend 3: blend result 2 with the submission of closest score
[ I: (Score: 0.69747) , sub2: (Score: -----) ] >>> sub3: (Score: improved)
3.4 Blend 4: blend result 3 with the submission of closest score
[ J: (Score: 0.69735) , sub3: (Score: ------) ] >>> sub4: (Score: improved)
3.5 Blend 5: blend result 4 with the submission of closest score
[ K: (Score: 0.69731) , sub4: (Score: -------) ] >>> sub5: (Score: 0.69688)
4. Upgraded blending
Correct the predictions that ended up too low, and those that ended up too high.
sub5: (Score: 0.69688) >>> sub6: (Score: 0.69682)
We first compared the result of the previous step with the result of each kernel used. We looked for rows where the results of all kernels (or the majority of kernels) differed from the result of the previous step in the same direction, either higher or lower. At the same time, we know that the result of the previous step scores better than every kernel used. So we can guess that these rows were suppressed: in the previous steps they were mistakenly increased or decreased. We compensate for these possible errors to some extent by applying the coefficients "pcoeff" and "mcoeff", and only in these rows. The figures in the original notebook illustrate the method well.
    main = sub5  # 0.69688
    comp = main.copy()

    majority = 9       # Hyper parameter
    pcoeff   = 1.0016  # Hyper parameter
    mcoeff   = 0.9984  # Hyper parameter

    # Points collected for plotting: [main value, mean of the compared kernels, corrected value]
    pxy = [[], [], []]
    mxy = [[], [], []]

    for i in main.columns[1:]:
        lm  = main[i].tolist()
        ls  = [[], [], [], [], [], [], [], [], [], [], []]
        res = []

        ## 1. Read every public result
        for n in range(11):
            csv   = pd.read_csv(dfk.iloc[n, 2])
            ls[n] = csv[i].tolist()

        ## 2. Compare main with the public results, row by row
        for j in range(len(main)):
            pcount = 0
            pvalue = 0.0
            mcount = 0
            mvalue = 0.0

            ## 2.1 pcount counts how often main is greater than a public result;
            ##     mcount counts how often it is not (ties land here as well)
            ## 2.2 if pcount exceeds the threshold, multiply main by a coefficient (usually > 1);
            ##     if mcount exceeds the threshold, multiply main by a coefficient (usually < 1)
            for k in range(11):
                if lm[j] > ls[k][j]:
                    pcount += 1
                    pvalue += ls[k][j]
                else:
                    mcount += 1
                    mvalue += ls[k][j]

            if pcount > majority:
                res.append(lm[j] * pcoeff)
                pxy[0].append(lm[j])
                pxy[1].append(pvalue / pcount)
                pxy[2].append(lm[j] * pcoeff)
            elif mcount > majority:
                res.append(lm[j] * mcoeff)
                mxy[0].append(lm[j])
                mxy[1].append(mvalue / mcount)
                mxy[2].append(lm[j] * mcoeff)
            else:
                res.append(lm[j])

        comp[i] = res

    sub6 = comp
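The same correction rule can be written without explicit Python loops. The sketch below is a hedged, vectorized restatement on plain arrays rather than a drop-in replacement for the notebook code: the function name `majority_correct` and its array-based interface are assumptions, and it omits the bookkeeping kept for plotting.

```python
import numpy as np

def majority_correct(main, others, majority, pcoeff, mcoeff):
    """Scale rows of `main` that sit above or below most of `others`.

    main: 1-D array of blended predictions.
    others: 2-D array of shape (n_kernels, n_rows) with the public predictions.
    """
    main = np.asarray(main, dtype=float)
    others = np.asarray(others, dtype=float)
    # Number of kernels that main exceeds, per row.
    pcount = (main > others).sum(axis=0)
    # Everything else, including ties, matching the loop's else branch.
    mcount = others.shape[0] - pcount
    factor = np.where(pcount > majority, pcoeff,
                      np.where(mcount > majority, mcoeff, 1.0))
    return main * factor

# Toy usage: three made-up kernels, exaggerated coefficients for readability.
toy_main = np.array([10.0, 1.0, 5.0])
toy_others = np.array([[9.0, 2.0, 4.0],
                       [8.0, 3.0, 6.0],
                       [7.0, 4.0, 5.0]])
toy_fixed = majority_correct(toy_main, toy_others, majority=2,
                             pcoeff=1.1, mcoeff=0.9)
```

In the toy data the first row lies above all three kernels and is scaled up, the second lies below all three and is scaled down, and the third is left unchanged because neither count exceeds the threshold.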