Using a notebook's output pickle files as a data source
The benefit is that you can use the pickle files a notebook outputs as a data source, and they do not take up disk space.
####################################Below is the official tutorial (note: now outdated)##################################
The original announcement is reproduced below:
Our data science and engineering teams have some big news to share… you can now use any public kernel’s output files as a data source!
Plug and Play
This new functionality enables code that is more flexible, reusable, and easier to troubleshoot. With kernels as data sources, you can neatly plug together a data polishing script in R, a visualization script, and a model fitting script in Python without messy dependencies.
Follow Along
Cleaner and more compartmentalized code makes kernels an even better learning resource. Now your code can follow better practices, making it easier for new data scientists to follow along and for collaborators to pull out the pieces that will help them iterate effectively.
Adding a Kernel Data Source
Only kernels with data files as output can be used as a data source.
There are two ways to add a kernel as a data source:
1. Click Add a Data Source from within a kernel you are editing.
You’ll see that Kernels is now listed alongside Datasets and Competitions in the pop up. You can search for specific kernels using the search box.
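The announcement does not show how a notebook reads an attached source. The following is a minimal sketch that assumes attached sources are mounted read-only under /kaggle/input/<source-name>/ (the usual Kaggle layout; verify the actual path in your notebook's data panel). It shows how a notebook could locate the pickle files it was given. `find_pickles` is a hypothetical helper name, not a Kaggle API.

```python
from pathlib import Path


def find_pickles(input_root="/kaggle/input"):
    """Return every .pkl / .pickle file under the attached-data directory.

    Assumption: attached data sources are mounted read-only under
    /kaggle/input/<source-name>/. Pass a different root to test locally.
    """
    root = Path(input_root)
    if not root.exists():
        # Not running on Kaggle, or nothing is attached.
        return []
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix in {".pkl", ".pickle"}
    )


if __name__ == "__main__":
    for path in find_pickles():
        print(path)
```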
Aurelio, the lead engineer on this feature, would love to hear what you think!
#########################Below is my own tutorial###################################
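1. In a notebook named IEEE Simple XGBoost, write your code (it must include an output call such as to_pickle that writes files), then commit it.
2. Reopen the notebook's output page:
https://www.kaggle.com/appleyuchi/ieee-simple-xgboost/output
and select the Output tab.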
3. Click New Dataset at the top, and a dialog pops up.
We name the new dataset (which is also a folder) useNewData and click Create; a progress bar then appears.
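4. Then click New Notebook. This creates a new notebook that uses the pickle data you just generated.
5. After a short wait, the new notebook opens, and its right-hand panel shows the attached data.
6. Finally, the next time you open Kaggle, you will see the new dataset under the name useNewData, the name you gave it in step 3.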
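The producing and consuming sides of the steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the producer writes to /kaggle/working (the folder whose contents become the notebook's output after a commit), and the consumer reads from /kaggle/input/usenewdata, which is a guessed mount path for the useNewData dataset; check the data panel of your notebook for the real one. `save_features`, `load_features` and the file name features.pkl are hypothetical.

```python
from pathlib import Path

import pandas as pd


def save_features(df, out_dir="/kaggle/working", name="features.pkl"):
    """Producer notebook (step 1): pickle a DataFrame into the output folder.

    Assumption: files written to /kaggle/working become the notebook's
    output after a commit, and can then be turned into a dataset.
    """
    path = Path(out_dir) / name
    df.to_pickle(path)
    return path


def load_features(dataset_dir="/kaggle/input/usenewdata", name="features.pkl"):
    """Consumer notebook (step 4): read the pickle back from the dataset.

    Assumption: the dataset is mounted at dataset_dir; the default here
    is a guess based on the dataset name, not a confirmed path.
    """
    return pd.read_pickle(Path(dataset_dir) / name)
```

In the producer notebook you would call something like `save_features(train_df)` before committing (`train_df` is a placeholder for your own DataFrame); in the new notebook, `load_features()` returns the same DataFrame, with its column dtypes intact.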
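Summary:
Reading pickled data is typically much faster than reading CSV files, because pickle restores the stored objects (including column dtypes) directly instead of parsing text.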
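To check the speed claim on your own machine, a rough comparison like the following can help. It is a sketch, not a rigorous benchmark. It writes one synthetic DataFrame as CSV and as pickle, times `pd.read_csv` against `pd.read_pickle`, and also shows that pickle keeps a categorical column's dtype while CSV does not. The data and sizes are made up for illustration, and timings will vary by machine and data.

```python
import tempfile
import time
from pathlib import Path

import numpy as np
import pandas as pd


def compare_read_speed(n_rows=200_000, seed=0):
    """Write the same synthetic DataFrame as CSV and pickle, then time reads."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "id": np.arange(n_rows),
        "amount": rng.random(n_rows),
        "category": pd.Categorical(rng.choice(["a", "b", "c"], size=n_rows)),
    })
    with tempfile.TemporaryDirectory() as d:
        csv_path = Path(d) / "data.csv"
        pkl_path = Path(d) / "data.pkl"
        df.to_csv(csv_path, index=False)
        df.to_pickle(pkl_path)

        start = time.perf_counter()
        from_csv = pd.read_csv(csv_path)  # parses text, re-infers dtypes
        csv_seconds = time.perf_counter() - start

        start = time.perf_counter()
        from_pkl = pd.read_pickle(pkl_path)  # restores the stored objects
        pickle_seconds = time.perf_counter() - start

    return {
        "csv_seconds": csv_seconds,
        "pickle_seconds": pickle_seconds,
        "pickle_roundtrip_equal": from_pkl.equals(df),
        "pickle_keeps_category": isinstance(from_pkl["category"].dtype, pd.CategoricalDtype),
        "csv_keeps_category": isinstance(from_csv["category"].dtype, pd.CategoricalDtype),
    }


if __name__ == "__main__":
    r = compare_read_speed()
    print(f"read_csv:    {r['csv_seconds']:.3f} s")
    print(f"read_pickle: {r['pickle_seconds']:.3f} s")
```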
Reference:
https://www.kaggle.com/product-feedback/45472