Using a Saved Output Pickle File as a Data Source

Published: 2023/12/20

The advantage of this feature is that you can use an output pickle file as a data source, and it does not take up any disk space.

####################################Official tutorial below (note: it is outdated)##################################

The original post is reproduced below:

Our data science and engineering teams have some big news to share… you can now use any public kernel’s output files as a data source!

Plug and Play

This new functionality enables code that is more flexible, reusable, and easier to troubleshoot. With kernels as data sources, you can neatly plug together a data polishing script in R, a visualization script, and a model fitting script in Python without messy dependencies.

Follow Along

Cleaner and more compartmentalized code makes kernels an even better learning resource. Now your code can follow better practices, making it easier for new data scientists to follow along and for collaborators to pull out the pieces that will help them iterate effectively.

Adding a Kernel Data Source

Only kernels with data files as output can be used as a data source.

There are two ways to add a kernel as a data source:

1. Click Add a Data Source from within a kernel you are editing.

You’ll see that Kernels is now listed alongside Datasets and Competitions in the pop-up. You can search for specific kernels using the search box.

2. Go to any usable kernel’s Output tab and click “New Kernel Using This Data”.

Aurelio, the lead engineer on this feature, would love to hear what you think!

#########################My own tutorial below###################################

1. In a Notebook named IEEE Simple XGBoost, write your code (it must call an output function such as to_pickle), then commit.

2. Reopen

https://www.kaggle.com/appleyuchi/ieee-simple-xgboost/output

and select the Output tab.

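3. Click New Dataset at the top; a dialog box pops up.

4. Name the new dataset (which is also a folder) useNewData, then click Create. A progress bar appears while the dataset is being created.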

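5. Then click New Notebook. This creates a new Notebook that uses the pickle data you just generated.

6. After a short wait, the new dataset appears in the right-hand panel of the newly created Notebook.

7. Finally, the next time you open Kaggle, the dataset is listed under your account.

useNewData is the name of the dataset you created.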
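The two ends of this workflow can be sketched in code: the first notebook writes a pickle file as output (step 1), and the new notebook reads it back from the attached dataset (steps 5 and 6). This is only a minimal sketch under stated assumptions. The paths /kaggle/working and /kaggle/input/usenewdata, the file name train_features.pkl, the column names, and the helpers save_output and load_pickles are all illustrative and do not come from the original post. A temporary folder stands in for both locations so the example runs outside Kaggle.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Assumed Kaggle locations (not stated in the tutorial above):
# files written to WORKING_DIR become the notebook's output after a commit,
# and an attached dataset is mounted read-only under /kaggle/input/<slug>.
WORKING_DIR = Path("/kaggle/working")
INPUT_DIR = Path("/kaggle/input/usenewdata")  # hypothetical slug for "useNewData"


def save_output(df: pd.DataFrame, out_dir: Path, name: str = "train_features.pkl") -> Path:
    """Write df as a pickle file into out_dir and return the file path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / name
    df.to_pickle(path)
    return path


def load_pickles(input_dir: Path) -> dict:
    """Load every .pkl file found under input_dir, keyed by file name."""
    return {p.name: pd.read_pickle(p) for p in sorted(input_dir.rglob("*.pkl"))}


# Local stand-in folder so the sketch runs anywhere.
# On Kaggle, pass WORKING_DIR to save_output and INPUT_DIR to load_pickles instead.
demo_dir = Path(tempfile.mkdtemp())

# Step 1 (first notebook): hypothetical stand-in for the real feature table.
features = pd.DataFrame(
    {"TransactionID": [1, 2, 3], "TransactionAmt": [68.5, 29.0, 59.0]}
)
saved_path = save_output(features, demo_dir)

# Steps 5-6 (new notebook): read the pickle back from the attached dataset.
frames = load_pickles(demo_dir)
for name, frame in frames.items():
    print(name, frame.shape)
```

Since unpickling can execute arbitrary code, read_pickle should only be used on files you produced yourself or otherwise trust.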

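Summary:

Reading pickle data is typically much faster than reading a CSV file, because the pickle stores the data in binary form and needs no text parsing.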
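A rough way to check the speed difference on your own data is to write the same table in both formats and time the reads. The sketch below uses a made-up table (the column names and row count are assumptions for illustration), and the timings depend on the machine and the data, so no expected numbers are given.

```python
import tempfile
import time
from pathlib import Path

import pandas as pd

tmp = Path(tempfile.mkdtemp())
n = 100_000

# Hypothetical table with an integer, a float, and a datetime column.
df = pd.DataFrame(
    {
        "id": range(n),
        "amount": [i * 0.5 for i in range(n)],
        "when": pd.date_range("2020-01-01", periods=n, freq="min"),
    }
)

# Write the same data in both formats.
df.to_csv(tmp / "data.csv", index=False)
df.to_pickle(tmp / "data.pkl")


def timed(loader, path):
    """Call loader(path) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = loader(path)
    return result, time.perf_counter() - start


from_csv, csv_seconds = timed(pd.read_csv, tmp / "data.csv")
from_pkl, pkl_seconds = timed(pd.read_pickle, tmp / "data.pkl")

print(f"read_csv:    {csv_seconds:.3f} s")
print(f"read_pickle: {pkl_seconds:.3f} s")

# The pickle keeps the original dtypes; the CSV datetime column comes back
# as text unless parse_dates is passed to read_csv.
print(from_csv["when"].dtype, from_pkl["when"].dtype)
```

Besides speed, the pickle round trip preserves column dtypes, which saves re-parsing dates or re-casting columns in the notebook that consumes the data.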

Reference:

https://www.kaggle.com/product-feedback/45472
