python read_csv chunk: Reading Text Chunk by Chunk in Python Data Analysis

Published: 2025/1/21 python 57 豆豆


Background

Chapter 6 of "Python for Data Analysis" covers the read_xxx data-loading functions, which accept a chunksize parameter for loading a file chunk by chunk.

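Testing shows that, in essence, the text is split into several chunks and chunksize rows are processed at a time. The call returns a TextParser object (named TextFileReader in current pandas versions) rather than a DataFrame; iterating over that object lets you compute statistics per chunk and merge them.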
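
To make the chunking behaviour concrete, here is a minimal sketch. It assumes a tiny in-memory CSV (the csv_text string is made up for illustration and is not the book's ex6.csv) and simply records how many rows each chunk holds.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a real file: 5 data rows
csv_text = "key,value\na,1\nb,2\na,3\nc,4\nb,5\n"

# With chunksize set, read_csv returns an iterator instead of a DataFrame
reader = pd.read_csv(io.StringIO(csv_text), chunksize=2)

# Each item yielded by the iterator is a DataFrame with at most 2 rows
chunk_lengths = [len(chunk) for chunk in reader]
print(chunk_lengths)  # [2, 2, 1]
```

The last chunk is shorter because 5 rows do not divide evenly by 2, so code that consumes chunks should not rely on every chunk having exactly chunksize rows.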

Example code

The example code from the book, with annotations, is as follows:

import pandas as pd
from pandas import Series

path = "D:/AStudy2018/pydata-book-2nd-edition/examples/ex6.csv"

# With chunksize set, read_csv returns an iterator (TextParser) instead of a DataFrame
chunker = pd.read_csv(path, chunksize=1000)

# Running total of the key counts; starts out as an empty Series
tot = Series([], dtype="float64")
chunkercount = 0

for piece in chunker:
    print("------------piece[key] value_counts start-----------")
    # piece is a DataFrame with up to chunksize=1000 rows; piece["key"] is a Series
    # with an integer index whose values are the entries of the key column
    print(piece["key"].value_counts())
    print("------------piece[key] value_counts end-------------")
    # value_counts() returns a Series indexed by key value, holding each key's count
    tot = tot.add(piece["key"].value_counts(), fill_value=0)
    chunkercount += 1

# Finally, sort the totals in descending order
# (Series.order from the original listing no longer exists; sort_values replaces it)
tot = tot.sort_values(ascending=False)
print(chunkercount)
print("--------------")

Walkthrough

First, the sample file ex6.csv contains 10,000 rows of data. With chunksize=1000, read_csv returns a TextParser object that yields 10 chunks in total, which is confirmed by printing chunkercount after the loop.

Next, each piece is a DataFrame, and piece["key"] is a Series. It has the default integer index, and its values are the entries of the key column in the csv file, that is, individual strings.
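
The difference between piece["key"] and its value_counts() can be seen in a small sketch. The DataFrame below is hypothetical and only stands in for one chunk of the file.

```python
import pandas as pd

# Hypothetical chunk standing in for one piece of the file
piece = pd.DataFrame({"key": ["a", "b", "a", "c", "a"],
                      "value": [1, 2, 3, 4, 5]})

keys = piece["key"]           # Series: integer index 0..4, string values
counts = keys.value_counts()  # Series: index is the key value, value is its count

print(list(keys.index))   # [0, 1, 2, 3, 4]
print(counts.to_dict())   # a appears 3 times, b and c once each
```

After value_counts(), the strings that used to be values become the index, which is what allows counts from different chunks to be lined up by key.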

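The value_counts result for each chunk is itself a Series. It is added to the running total tot with add, so that after the last chunk tot holds the accumulated count of every key across all chunks.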

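Finally, tot is sorted in descending order, giving the total number of times each key value occurs in the csv file. The book's original listing uses Series.order for this step; that method has been removed from pandas, and sort_values is its replacement.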
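
As a small illustration of the sorting step, the sketch below applies sort_values to a made-up totals Series (the numbers are invented, not taken from ex6.csv).

```python
import pandas as pd

# Hypothetical accumulated totals, not the real counts from ex6.csv
tot = pd.Series({"a": 3.0, "b": 7.0, "c": 5.0})

# Sort by value, largest first; the index labels travel with their values
ranked = tot.sort_values(ascending=False)
print(list(ranked.index))  # ['b', 'c', 'a']
```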

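The clever part is the use of the Series add method. Adding two Series aligns them on their index labels: values that share a key are summed, and a key that is missing from one of the two Series is treated as the fill value 0 instead of producing NaN.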
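
A minimal sketch of that merge, using two invented count Series, shows why fill_value=0 matters compared with the plain + operator.

```python
import pandas as pd

# Hypothetical per-chunk counts; "a" and "c" each appear in only one chunk
first = pd.Series({"a": 3, "b": 1})
second = pd.Series({"b": 2, "c": 4})

# Plain addition yields NaN wherever a key is missing on one side
plain = first + second

# add with fill_value=0 treats the missing side as 0, so nothing is lost
filled = first.add(second, fill_value=0)

print(plain.isna().sum())  # 2 keys ("a" and "c") became NaN
print(filled.to_dict())    # {'a': 3.0, 'b': 3.0, 'c': 4.0}
```

The result is a float Series because index alignment introduces missing values internally before the fill is applied; cast with astype(int) if integer counts are preferred.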

The output is:
