Kaggle usage notes
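I took part in the acoustic scene classification subtask of the DCASE2018 challenge, which ran one of its leaderboards on Kaggle, so I used the Kaggle API during the competition. These are my notes on using Kaggle.
What is Kaggle?
Kaggle is a platform for data science competitions. Many companies publish problems there that are close to their real business, and data science enthusiasts work on them.
Click Competitions in the navigation bar to see the list of competitions. Official competitions usually come with prize money or job opportunities. There are also Playground competitions for beginners, where you can learn how a competition works and practice before entering an official one.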
How to join
Before competing you need a Kaggle account. Once it is activated, find a competition that interests you and click "Join Competition".
The competition page has these tabs:
Overview: start by reading the problem description carefully. The example competition here asks us to predict house prices. It provides 79 variables that influence the price, and we can predict prices with algorithms such as random forest or gradient boosting.
Data: provides the train set, used to train the model, and the test set, to which the trained model is applied to produce predictions. Those predictions are what you submit for scoring. sample_submission shows the format of the CSV file you finally submit, and your file's columns must match it.
Kernels: code shared by other participants.
Discussion: where participants ask questions and share experience.
Leaderboard: the ranking of participants.
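Workflow
Step 1: download the data from the Data tab. At minimum these are the three files mentioned above. Some competitions add extra files, such as data descriptions.
Step 2: analyze the data, build models, and tune parameters offline. Then write the test-set predictions to a CSV file in the sample_submission format.
Step 3: click the blue 'Submit Predictions' button and drag the CSV file in. The system loads and checks the file, and after a short wait the Leaderboard shows where this submission ranks.
After your first upload, you are entered in the competition.
Note: at the time of writing, each team in an official competition could submit 5 times per day, after which you had to wait 24 hours before submitting again. Playground competitions allowed 9.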
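Format mistakes usually creep in at Step 2. Below is a minimal Python sketch, using only the standard library, that takes the header and row order from sample_submission.csv and fills in your own predictions. It assumes a two-column layout: an id column plus one target column, such as Id,SalePrice in the house price example. `write_submission` and the file names are illustrative and not part of any Kaggle tool.

```python
import csv


def write_submission(sample_path, out_path, predictions):
    """Write `predictions` (dict: id string -> predicted value) to `out_path`,
    reusing the header and row order of the sample submission file.

    Assumes a two-column sample file (id column, target column).
    Returns the number of rows written.
    """
    with open(sample_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        ids = [row[0] for row in reader if row]

    if len(header) != 2:
        raise ValueError(f"expected 2 columns (id, target), got {header}")

    # Fail locally instead of wasting one of the daily submissions.
    missing = [i for i in ids if i not in predictions]
    if missing:
        raise KeyError(f"no prediction for {len(missing)} ids, e.g. {missing[:3]}")

    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)  # column names copied verbatim from the sample
        for i in ids:
            writer.writerow([i, predictions[i]])
    return len(ids)
```

For example, `write_submission("sample_submission.csv", "submission.csv", preds)`, where `preds` maps each test id (as a string, the way it appears in the CSV) to a predicted value.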
Installing and using the Kaggle API
Installation
First make sure Python and the package manager pip are installed.
Run the following command to get command-line access to the Kaggle API:
```shell
# Windows: the default install location is $PYTHON_HOME/Scripts
pip install kaggle

# Mac / Linux
pip install --user kaggle
```
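If the shell reports the `kaggle` command as "not found" after `pip install --user`, the usual cause is that the per-user scripts folder (for example ~/.local/bin on Linux) is not on PATH. The Python sketch below checks PATH first and then the user scripts directory reported by `sysconfig`. `find_kaggle_cli` is an illustrative name. `sysconfig.get_preferred_scheme` requires Python 3.10 or later, and older versions fall back to the `posix_user`/`nt_user` scheme names.

```python
import os
import shutil
import sysconfig


def find_kaggle_cli():
    """Return the path of the `kaggle` executable, or None if it is not found."""
    # 1. Anything already on PATH (the usual case after a plain `pip install`).
    found = shutil.which("kaggle")
    if found:
        return found
    # 2. The per-user scripts directory used by `pip install --user`.
    try:
        scheme = sysconfig.get_preferred_scheme("user")  # Python 3.10+
    except AttributeError:
        scheme = f"{os.name}_user"  # "posix_user" or "nt_user" on older Pythons
    user_scripts = sysconfig.get_path("scripts", scheme)
    return shutil.which("kaggle", path=user_scripts)


if __name__ == "__main__":
    print(find_kaggle_cli() or "kaggle not found: add the user scripts folder to PATH")
```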
Downloading API credentials
To use the Kaggle API, you need an account registered on the Kaggle website.
Go to the 'Account' tab of your user profile and select 'Create API Token'. This downloads kaggle.json, a file containing your API credentials.
- Place this file at ~/.kaggle/kaggle.json (on Windows, C:\Users\<Windows-username>\.kaggle\kaggle.json).
The first time I installed the API, there was no .kaggle folder under C:\Users\<Windows-username>\. After uninstalling with pip uninstall kaggle and reinstalling, the .kaggle folder appeared, and I copied kaggle.json into it. Creating the folder by hand (for example with mkdir) also works.
You can set the shell environment variable KAGGLE_CONFIG_DIR to move this location to $KAGGLE_CONFIG_DIR/kaggle.json (on Windows, %KAGGLE_CONFIG_DIR%\kaggle.json).
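The Python sketch below implements the lookup rule above (use KAGGLE_CONFIG_DIR if it is set, otherwise ~/.kaggle). It adds a helper that copies a freshly downloaded kaggle.json into place. Creating the folder with `mkdir(parents=True)` avoids the reinstall workaround described above. The function names are illustrative. The check for `username` and `key` reflects the two fields the downloaded token file contains. The `chmod` to owner-only access matters on Linux/macOS, where the CLI warns about a key that other users can read.

```python
import json
import os
import shutil
import stat
from pathlib import Path


def kaggle_json_path(env=None):
    """Where kaggle.json is expected: $KAGGLE_CONFIG_DIR if set, else ~/.kaggle."""
    env = os.environ if env is None else env
    config_dir = env.get("KAGGLE_CONFIG_DIR")
    base = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    return base / "kaggle.json"


def install_credentials(downloaded, env=None):
    """Copy a downloaded kaggle.json into place, creating the folder if needed."""
    with open(downloaded) as f:
        creds = json.load(f)
    if not {"username", "key"}.issubset(creds):
        raise ValueError("kaggle.json should contain 'username' and 'key'")
    target = kaggle_json_path(env)
    target.parent.mkdir(parents=True, exist_ok=True)  # creates .kaggle if missing
    shutil.copyfile(downloaded, target)
    # Owner read/write only (chmod 600); on Windows this only clears read-only.
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)
    return target
```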
Commands
The command-line tool supports these commands:
```shell
kaggle competitions {list,files,download,submit,submissions,leaderboard}
kaggle datasets {list,files,download,create,version,init}
kaggle config {view,set,unset}
```
Competitions: the API supports the following commands for Kaggle Competitions.
- List competitions
```text
usage: kaggle competitions list [-h] [-p PAGE] [-s SEARCH] [-v]

optional arguments:
  -h, --help            show this help message and exit
  -p PAGE, --page PAGE  page number
  -s SEARCH, --search SEARCH
                        term(s) to search for
  -v, --csv             print in CSV format
                        (if not set print in table format)
```
Example:
```shell
kaggle competitions list -s health
```
- List competition files
```text
usage: kaggle competitions files [-h] [-c COMPETITION] [-v] [-q]

optional arguments:
  -h, --help            show this help message and exit
  -c COMPETITION, --competition COMPETITION
                        Competition URL suffix (use "kaggle competitions list" to show options)
                        If empty, the default competition will be used (use "kaggle config set competition")
  -v, --csv             Print results in CSV format (if not set print in table format)
  -q, --quiet           Suppress printing information about download progress
```
Example:
```shell
kaggle competitions files -c favorita-grocery-sales-forecasting
```
- Download competition files
```text
usage: kaggle competitions download [-h] [-c COMPETITION] [-f FILE] [-p PATH]
                                    [-w] [-o] [-q]

optional arguments:
  -h, --help            show this help message and exit
  -c COMPETITION, --competition COMPETITION
                        Competition URL suffix (use "kaggle competitions list" to show options)
                        If empty, the default competition will be used (use "kaggle config set competition")
  -f FILE, --file FILE  File name, all files downloaded if not provided
                        (use "kaggle competitions files -c <competition>" to show options)
  -p PATH, --path PATH  Folder where file(s) will be downloaded, defaults to ~/.kaggle
  -w, --wp              Download files to current working path
  -o, --force           Skip check whether local version of file is up to date, force file download
  -q, --quiet           Suppress printing information about download progress
```
Example:
```shell
kaggle competitions download -c favorita-grocery-sales-forecasting
kaggle competitions download -c favorita-grocery-sales-forecasting -f test.csv.7z
```
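Downloaded competition files are often compressed. The example above fetches test.csv.7z, and depending on the API version, a full download may arrive as a single .zip archive. The standard-library Python sketch below unpacks every .zip in a download folder. It skips .7z files because the standard library cannot read them; those need an external tool such as 7-Zip or a third-party package. `extract_zips` and the folder layout are illustrative.

```python
import zipfile
from pathlib import Path


def extract_zips(folder, dest=None):
    """Extract every *.zip in `folder` into `dest` (default: the same folder).

    Returns the sorted list of extracted member names. .7z files are ignored
    because the Python standard library has no 7z support.
    """
    folder = Path(folder)
    dest = Path(dest) if dest else folder
    extracted = []
    for archive in sorted(folder.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
            extracted.extend(zf.namelist())
    return sorted(extracted)
```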
- Submit to a competition
```text
usage: kaggle competitions submit [-h] [-c COMPETITION] -f FILE -m MESSAGE [-q]

required arguments:
  -f FILE, --file FILE  File for upload (full path)
  -m MESSAGE, --message MESSAGE
                        Message describing this submission

optional arguments:
  -h, --help            show this help message and exit
  -c COMPETITION, --competition COMPETITION
                        Competition URL suffix (use "kaggle competitions list" to show options)
                        If empty, the default competition will be used (use "kaggle config set competition")
  -q, --quiet           Suppress printing information about download progress
```
Example:
```shell
kaggle competitions submit -c favorita-grocery-sales-forecasting -f sample_submission_favorita.csv.7z -m "My submission message"
```
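The daily submission limit makes each submission count, so it can help to script the call and check the file before spending a slot. The Python sketch below calls the CLI with exactly the flags from the help text above (-c, -f, -m). With `dry_run=True` it only builds and returns the command, so you can inspect it first. The function name and its checks are illustrative and not part of the Kaggle API.

```python
import subprocess
from pathlib import Path


def submit(competition, file_path, message, dry_run=False):
    """Run `kaggle competitions submit` for one file; return the argv used."""
    path = Path(file_path).resolve()  # the help text asks for a full path
    if not path.is_file():
        raise FileNotFoundError(path)
    if not message.strip():
        raise ValueError("a submission message (-m) is required")
    argv = ["kaggle", "competitions", "submit",
            "-c", competition, "-f", str(path), "-m", message]
    if not dry_run:
        # check=True raises CalledProcessError if the CLI exits with an error.
        subprocess.run(argv, check=True)
    return argv
```

For example, `submit("favorita-grocery-sales-forecasting", "submission.csv", "baseline", dry_run=True)` returns the command without uploading anything.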
- List competition submissions
```text
usage: kaggle competitions submissions [-h] [-c COMPETITION] [-v] [-q]

optional arguments:
  -h, --help            show this help message and exit
  -c COMPETITION, --competition COMPETITION
                        Competition URL suffix (use "kaggle competitions list" to show options)
                        If empty, the default competition will be used (use "kaggle config set competition")
  -v, --csv             Print results in CSV format (if not set print in table format)
  -q, --quiet           Suppress printing information about download progress
```
Example:
```shell
kaggle competitions submissions -c favorita-grocery-sales-forecasting
```
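Several commands (list, files, submissions) accept the `-v/--csv` flag, which makes their output machine-readable. The Python sketch below runs such a command and parses its CSV into dictionaries keyed by whatever header line the CLI prints, without assuming any specific column names. `kaggle_csv` and `parse_csv_output` are illustrative names. If the CLI prints warning text before the CSV, you would need to strip it before parsing.

```python
import csv
import io
import subprocess


def parse_csv_output(text):
    """Turn the CSV printed by a `-v/--csv` command into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def kaggle_csv(*args):
    """Run a kaggle command with -v appended and return its rows as dicts.

    Column names are taken from the CLI's own header line.
    """
    result = subprocess.run(["kaggle", *args, "-v"], check=True,
                            capture_output=True, text=True)
    return parse_csv_output(result.stdout)
```

For example, `kaggle_csv("competitions", "submissions", "-c", "favorita-grocery-sales-forecasting")`.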
- Get competition leaderboard
```text
usage: kaggle competitions leaderboard [-h] [-c COMPETITION] [-s] [-d] [-p PATH] [-q]

optional arguments:
  -h, --help            show this help message and exit
  -c COMPETITION, --competition COMPETITION
                        Competition URL suffix (use "kaggle competitions list" to show options)
                        If empty, the default competition will be used (use "kaggle config set competition")
  -s, --show            Show the top of the leaderboard
  -d, --download        Download entire leaderboard
  -p PATH, --path PATH  Folder where file(s) will be downloaded, defaults to ~/.kaggle
  -q, --quiet           Suppress printing information about download progress
```
Example:
```shell
kaggle competitions leaderboard -c favorita-grocery-sales-forecasting -s
```
Datasets: the API supports the following commands for Kaggle Datasets.
- List datasets
```text
usage: kaggle datasets list [-h] [-p PAGE] [-s SEARCH] [-v]

optional arguments:
  -h, --help            show this help message and exit
  -p PAGE, --page PAGE  Page number for results paging
  -s SEARCH, --search SEARCH
                        Term(s) to search for
  -v, --csv             Print results in CSV format (if not set print in table format)
```
Example:
```shell
kaggle datasets list -s demographics
```
- List files for a dataset
```text
usage: kaggle datasets files [-h] -d DATASET [-v]

required arguments:
  -d DATASET, --dataset DATASET
                        Dataset URL suffix in format <owner>/<dataset-name> (use "kaggle datasets list" to show options)

optional arguments:
  -h, --help            show this help message and exit
  -v, --csv             Print results in CSV format (if not set print in table format)
```
Example:
```shell
kaggle datasets files -d zillow/zecon
```
- Download dataset files
```text
usage: kaggle datasets download [-h] -d DATASET [-f FILE] [-p PATH] [-w] [-o] [-q]

required arguments:
  -d DATASET, --dataset DATASET
                        Dataset URL suffix in format <owner>/<dataset-name> (use "kaggle datasets list" to show options)

optional arguments:
  -h, --help            show this help message and exit
  -f FILE, --file FILE  File name, all files downloaded if not provided
                        (use "kaggle datasets files -d <dataset>" to show options)
  -p PATH, --path PATH  Folder where file(s) will be downloaded, defaults to ~/.kaggle
  -w, --wp              Download files to current working path
  -o, --force           Skip check whether local version of file is up to date, force file download
  -q, --quiet           Suppress printing information about download progress
```
Example:
```shell
kaggle datasets download -d zillow/zecon
kaggle datasets download -d zillow/zecon -f State_time_series.csv
```
- Initialize metadata file for dataset creation
```text
usage: kaggle datasets init [-h] -p FOLDER

required arguments:
  -p FOLDER, --path FOLDER
                        Folder for upload, containing data files and a special metadata.json file
                        (https://github.com/Kaggle/kaggle-api/wiki/Metadata)

optional arguments:
  -h, --help            show this help message and exit
```
Example:
```shell
kaggle datasets init -p /path/to/dataset
```
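`kaggle datasets init` writes a template metadata file, and you have to edit its title and id fields before creating the dataset. The Python sketch below fills in those two fields and leaves every other field untouched. It assumes the file is named dataset-metadata.json, which is an assumption: the help text above calls it metadata.json, and the name has varied between API versions, so check what init actually produced. It also assumes that id uses the same <owner>/<dataset-name> form as the -d flag. `fill_dataset_metadata` is a hypothetical helper.

```python
import json
from pathlib import Path


def fill_dataset_metadata(folder, title, owner, slug,
                          filename="dataset-metadata.json"):
    """Set `title` and `id` in the metadata file created by `kaggle datasets init`.

    `filename` is an assumption; pass the name your API version generated.
    """
    meta_path = Path(folder) / filename
    meta = json.loads(meta_path.read_text(encoding="utf-8"))
    meta["title"] = title
    meta["id"] = f"{owner}/{slug}"  # same <owner>/<dataset-name> form as -d
    meta_path.write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return meta
```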
- Create a new dataset
```text
usage: kaggle datasets create [-h] -p FOLDER [-u] [-q] [-t]

required arguments:
  -p FOLDER, --path FOLDER
                        Folder for upload, containing data files and a special metadata.json file
                        (https://github.com/Kaggle/kaggle-api/wiki/Metadata)

optional arguments:
  -h, --help            show this help message and exit
  -u, --public          Create the Dataset publicly (default is private)
  -q, --quiet           Suppress printing information about download progress
  -t, --keep-tabular    Do not convert tabular files to CSV (default is to convert)
```
Example:
```shell
kaggle datasets create -p /path/to/dataset
```
- Create a new dataset version
```text
usage: kaggle datasets version [-h] -m VERSION_NOTES -p FOLDER [-q] [-t] [-d]

required arguments:
  -m VERSION_NOTES, --message VERSION_NOTES
                        Message describing the new version
  -p FOLDER, --path FOLDER
                        Folder for upload, containing data files and a special metadata.json file
                        (https://github.com/Kaggle/kaggle-api/wiki/Metadata)

optional arguments:
  -h, --help            show this help message and exit
  -q, --quiet           Suppress printing information about download progress
  -t, --keep-tabular    Do not convert tabular files to CSV (default is to convert)
  -d, --delete-old-versions
                        Delete old versions of this dataset
```
Example:
```shell
kaggle datasets version -p /path/to/dataset -m "Updated data"
```
Configuration
- Set the default download path
```text
usage: kaggle config path [-h] [-p PATH]

optional arguments:
  -h, --help            show this help message and exit
  -p PATH, --path PATH  folder where file(s) will be downloaded, defaults to ~/.kaggle
```
Example:
```shell
kaggle config path -p C:\
```
- View current config values
```text
usage: kaggle config view [-h]

optional arguments:
  -h, --help  show this help message and exit
```
Example:
```shell
kaggle config view
```
- Set a configuration value
```text
usage: kaggle config set [-h] -n NAME -v VALUE

required arguments:
  -n NAME, --name NAME  Name of the configuration parameter
                        (one of competition, path, proxy)
  -v VALUE, --value VALUE
                        Value of the configuration parameter, valid values depending on name
                        - competition: Competition URL suffix (use "kaggle competitions list" to show options)
                        - path: Folder where file(s) will be downloaded, defaults to ~/.kaggle
                        - proxy: Proxy for HTTP requests
```
Example:
```shell
kaggle config set -n competition -v titanic
```
- Clear a configuration value
```text
usage: kaggle config unset [-h] -n NAME

required arguments:
  -n NAME, --name NAME  Name of the configuration parameter
                        (one of competition, path, proxy)
```
Example:
```shell
kaggle config unset -n competition
```
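Note (from the Kaggle API documentation at the time of writing): the biggest current limitation is that Kernels are not supported in any way. Support is planned for the near future, though with no ETA. Large datasets (>= 2 GB) are also not currently supported.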
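Given the 2 GB limit quoted above, it is cheap to measure a dataset folder before running datasets create or datasets version. The Python sketch below interprets the limit as 2 GiB. Both that interpretation and the limit itself may not match current API versions. The constant and function names are illustrative.

```python
from pathlib import Path

# Limit quoted in the note above, interpreted here as 2 GiB (an assumption).
MAX_DATASET_BYTES = 2 * 1024**3


def folder_size(folder):
    """Total size in bytes of all regular files under `folder`, recursively."""
    return sum(p.stat().st_size for p in Path(folder).rglob("*") if p.is_file())


def check_upload_size(folder, limit=MAX_DATASET_BYTES):
    """Return the folder size in bytes, raising ValueError if it reaches `limit`."""
    size = folder_size(folder)
    if size >= limit:
        raise ValueError(f"{folder} holds {size} bytes, at or above the {limit}-byte limit")
    return size
```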