Why is the txt file written by Python so large? How can I reduce the size of a txt file created by Python?
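I have a table on a Netezza server with roughly 2M rows x 70 columns of numeric and categorical data, and I want to dump it to a .txt file using Python.
I have done this before with SAS, and in my test case I got a txt file of 450MB.
I made a few attempts in Python.
# One line at a time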
import datetime
import pyodbc

startTime = datetime.datetime.now().replace(microsecond=0)
cnxn = pyodbc.connect('DSN=NZ_LAB')
cursor = cnxn.cursor()
c = cursor.execute("""SELECT * FROM MYTABLE""")
with open('dump_test_pyodbc.csv', 'w') as csv:
    # Header row, taken from the cursor's column metadata
    csv.write(','.join([g[0] for g in c.description]) + '\n')
    while 1:
        a = c.fetchone()
        if not a:
            break
        # str() keeps every decimal place the driver returns (e.g. 0.00)
        csv.write(','.join([str(g) for g in a]) + '\n')
cnxn.close()
endTime = datetime.datetime.now().replace(microsecond=0)
print("Time elapsed PYODBC:", endTime - startTime)
>>Time elapsed PYODBC: 0:18:20
# Use Pandas chunksize
import datetime
import pyodbc
import pandas.io.sql as psql

startTime = datetime.datetime.now().replace(microsecond=0)
cnxn = pyodbc.connect('DSN=NZ_LAB')
sql = ("""SELECT * FROM MYTABLE""")
df = psql.read_sql(sql, cnxn, chunksize=1000)
for k, chunk in enumerate(df):
    if k == 0:
        # First chunk creates the file and writes the header
        chunk.to_csv('dump_chunk.csv', index=False, mode='w')
    else:
        # Later chunks are appended without repeating the header
        chunk.to_csv('dump_chunk.csv', index=False, mode='a', header=False)
endTime = datetime.datetime.now().replace(microsecond=0)
print("Time elapsed PANDAS:", endTime - startTime)
cnxn.close()
>>Time elapsed PANDAS: 0:29:29
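Now for the sizes:
The Pandas method created a 690MB file, and the other method created a 630MB file.
Both speed and size seem to favor the first (fetchone()) method, but the file is still much larger than the one produced by the original SAS method.
Any ideas on how to improve the Python approach to reduce the output size?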
Edit: added examples ------------------
OK, it looks like SAS handles integers better, which makes sense. I think that is the main cause of the size difference.
SAS:
xxxxxx,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2.49,40.65,63.311249.92...
Pandas:
xxxxxx,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.49,40.65,63.311249.92...
fetchone():
xxxxxx,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,2.49,40.65,63.311249.92...
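To make the effect of the zero formatting concrete, here is a rough sketch that counts the bytes taken by a run of zero-valued fields in each of the three styles. The helper name `row_bytes` and the field count of 19 are assumptions chosen for illustration. The sketch only measures the zero fields, so it does not explain the full size of either Python file.

```python
# Illustration only: bytes used by a run of zero-valued CSV fields
# when each zero is written as "0", "0.0", or "0.00".

def row_bytes(zero_text, n_fields=19):
    """Byte length of n_fields copies of zero_text joined by commas, plus a newline."""
    line = ','.join([zero_text] * n_fields) + '\n'
    return len(line.encode('ascii'))

# Labels refer to the three sample rows shown above.
sizes = {
    'SAS': row_bytes('0'),
    'pandas': row_bytes('0.0'),
    'fetchone': row_bytes('0.00'),
}

for style, size in sizes.items():
    # Ratio relative to the integer-style row
    print(style, size, round(size / sizes['SAS'], 2))
```

For these zero fields alone, the "0.0" style takes twice the bytes of the "0" style and the "0.00" style takes two and a half times as many, which fits the observation that integer-like columns account for much of the gap.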
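Edit 2: solution ---------------------------------
In the end, I removed the unnecessary decimals from the values before writing them out.
This brought the file size down to the SAS level.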
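The exact code used for this step is not shown in the post, so the following is only a sketch of one way to drop unnecessary decimals. It assumes the driver returns `decimal.Decimal` or `float` values for numeric columns. The helper name `compact` is hypothetical.

```python
from decimal import Decimal

def compact(value):
    """Render a value as text without trailing zeros after the decimal point."""
    if isinstance(value, Decimal):
        text = format(value, 'f')          # fixed-point, no exponent
        if '.' in text:
            text = text.rstrip('0').rstrip('.')
        return '0' if text in ('-0', '') else text
    if isinstance(value, float) and value.is_integer():
        return str(int(value))             # 0.0 -> "0", 3.0 -> "3"
    return str(value)                      # strings, ints, other floats unchanged

# Example row resembling the samples above (values are made up)
row = ['xxxxxx', Decimal('0.00'), 0.0, Decimal('2.49'), Decimal('40.50')]
line = ','.join(compact(g) for g in row)
print(line)
```

In the fetchone() loop, this would mean writing `','.join(compact(g) for g in a)` instead of `','.join([str(g) for g in a])`. Note that this also shortens values such as 40.50 to 40.5, which may or may not be wanted if downstream tools expect a fixed number of decimal places.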