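Encoding problems when reading CSV files with Python
I've hit a roadblock trying to read CSV files with Python.
Update: if you just want to skip the offending characters or errors, you can open the file like this: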
with open(os.path.join(directory, file), 'r', encoding="utf-8", errors="ignore") as data_file:
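To illustrate what errors="ignore" does, here is a minimal sketch. The sample bytes and the file name sample.csv are made up for the example and are not from the original question. Note that the option only affects decoding while the file is read: undecodable bytes are dropped silently, which can hide real data loss.

```python
import csv
import os
import tempfile

# Hypothetical sample: one CSV line containing a byte (0xff) that is not valid UTF-8.
raw = b'1,"caf\xc3\xa9","bad\xffbyte"\n'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.csv")
    with open(path, "wb") as f:
        f.write(raw)

    # errors="ignore" silently drops bytes that cannot be decoded as UTF-8.
    with open(path, "r", encoding="utf-8", errors="ignore", newline="") as data_file:
        rows = list(csv.reader(data_file))

# ascii() escapes non-ASCII characters, so this print is safe on any console.
print(ascii(rows))
```

The valid two-byte sequence for "é" is decoded normally; only the stray 0xff byte disappears.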
Here is what I have tried so far.
import csv
import os

# root_dir is the top-level folder containing the CSV files
for directory, subdirectories, files in os.walk(root_dir):
    for file in files:
        with open(os.path.join(directory, file), 'r') as data_file:
            reader = csv.reader(data_file)
            for row in reader:
                print(row)
The error I get is:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 224-225: character maps to <undefined>
I tried
with open(os.path.join(directory, file), 'r', encoding="UTF-8") as data_file:
Error:
UnicodeEncodeError: 'charmap' codec can't encode character '\u2026' in position 223: character maps to <undefined>
Now, if I just print data_file, it reports that the files are cp1252-encoded, but if I try
with open(os.path.join(directory, file), 'r', encoding="cp1252") as data_file:
the error I get is:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 224-225: character maps to <undefined>
I also tried the recommended package.
The error I get is:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 224-225: character maps to <undefined>
The line I am trying to parse is:
2015-11-28 22:23:58,670805374291832832,479174464,"MarkCrawford15","RT @WhatTheFFacts: The tallest man in the world was Robert Pershing Wadlow of Alton, Illinois. He was slighty over 8 feet 11 inches tall.","None
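Any ideas or help would be appreciated.
Solution
I would use csvkit, which automatically detects the appropriate encoding and decodes the file accordingly. For example: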
import csvkit
reader = csvkit.reader(data_file)
As worked out in chat, the solution was:
for directory, subdirectories, files in os.walk(root_dir):
    for file in files:
        with open(os.path.join(directory, file), 'r', encoding="utf-8") as data_file:
            reader = csv.reader(data_file)
            for row in reader:
                data = [i.encode('ascii', 'ignore').decode('ascii') for i in row]
                print(data)
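A UnicodeEncodeError is raised when text is encoded, so the failure in the question most likely happens in print, when the row is written to a console whose code page lacks some characters, rather than while the file is read. The sketch below shows the same ASCII-stripping idea as a reusable helper. The helper name strip_non_ascii, the sample row, and the use of cp437 as a stand-in for the console encoding are all assumptions made for illustration; the actual console code page is not stated in the question.

```python
import io


def strip_non_ascii(row):
    """Return a copy of row with every non-ASCII character removed."""
    return [field.encode("ascii", "ignore").decode("ascii") for field in row]


# Stand-in for a console with a limited code page (cp437 is an assumption).
console = io.TextIOWrapper(io.BytesIO(), encoding="cp437")

# Hypothetical row containing U+2026 (horizontal ellipsis).
row = ["MarkCrawford15", "He was tall\u2026"]

try:
    print(row, file=console)
    raw_print_failed = False
except UnicodeEncodeError:
    # cp437 has no mapping for U+2026, so the unmodified row cannot be written.
    raw_print_failed = True

# After stripping, only ASCII remains, which the limited code page can encode.
cleaned = strip_non_ascii(row)
print(cleaned, file=console)
console.flush()
```

The trade-off is that the stripped characters are lost. If they matter, writing the output to a file opened with encoding="utf-8" is an alternative to printing to the console.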