[Python] Parsing Docx Files
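1. cd D:\ProgramData\Anaconda3
2. pip install python-docx (the win32com module used below for .doc conversion comes from a separate package: pip install pywin32)
3. Process the documents with Python: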
# -*- coding: utf-8 -*-
import os

import docx                        # pip install python-docx
from win32com import client as wc  # pip install pywin32 (Windows + MS Word only)

docs = []


def traverse(f):
    """Recursively collect every .doc/.docx file under directory f into docs."""
    for f1 in os.listdir(f):
        tmp_path = os.path.join(f, f1)
        if os.path.isdir(tmp_path):
            # print('Folder: %s' % tmp_path)
            traverse(tmp_path)
        elif os.path.splitext(tmp_path)[-1].lower() in (".doc", ".docx"):
            # print('File: %s' % tmp_path)
            docs.append(tmp_path)


def parseDoc(f):
    """Print every paragraph of a .docx file, then the paragraph count."""
    doc = docx.Document(f)
    parag_num = 0
    for para in doc.paragraphs:
        print("----------------------------------------------------")
        print(para.text)
        print("----------------------------------------------------")
        parag_num += 1
    print('This document has', parag_num, 'paragraphs')


def doc2docx(full_path):
    """Convert a legacy .doc file to .docx through Word; return the .docx path."""
    full_path = os.path.abspath(full_path)  # Word's Documents.Open needs an absolute path
    newpath = full_path + "x"               # "a.doc" -> "a.docx"
    if os.path.exists(newpath):
        return newpath
    # First convert the .doc into a .docx
    word = wc.Dispatch("Word.Application")
    try:
        # Open the file by its full path (directory + file name)
        doc = word.Documents.Open(full_path)
        # FileFormat 16 saves as .docx; python-docx can only read the file once it is a .docx
        doc.SaveAs(newpath, 16)
        doc.Close()
    finally:
        word.Quit()
    return newpath


path = 'E:/NLP/Docs/'
traverse(path)
for k, v in enumerate(docs):
    if k < 1:  # only process the first file found
        print(k, v)
        if v.lower().endswith(".doc"):
            v = doc2docx(v)  # python-docx cannot open legacy .doc files
        parseDoc(v)
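The recursive `traverse` function can also be written with `os.walk`, which descends into subdirectories on its own. This is a minimal sketch, assuming you would rather get a return value than fill the global `docs` list; `find_word_files` is a hypothetical helper name. It also skips the `~$`-prefixed lock files Word creates next to an open document, which the original `traverse` would collect and python-docx would then fail to open.

```python
import os


def find_word_files(root):
    """Return paths of all .doc/.docx files under root, in os.walk order."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Skip Word's temporary lock files such as "~$report.docx"
            if name.startswith("~$"):
                continue
            # Case-insensitive extension check, like the original traverse()
            if os.path.splitext(name)[1].lower() in (".doc", ".docx"):
                found.append(os.path.join(dirpath, name))
    return found


if __name__ == "__main__":
    for k, v in enumerate(find_word_files("E:/NLP/Docs/")):
        print(k, v)
```

Returning a list instead of appending to a module-level variable means the function can be called repeatedly without stale results piling up.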
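To see roughly what `doc.paragraphs` walks over: a .docx file is a ZIP archive whose main body is `word/document.xml`, where each paragraph is a `w:p` element and its text is split across `w:t` elements inside runs. The sketch below reads that XML with only the standard library. It is an illustration under simplifying assumptions, not a replacement for python-docx: `read_docx_paragraphs` is a hypothetical name, and it ignores tabs, line breaks and fields that python-docx may render into `Paragraph.text`.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used in word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_docx_paragraphs(path):
    """Return the text of each top-level paragraph in a .docx file (stdlib only)."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    body = root.find(W + "body")
    paragraphs = []
    # Only direct w:p children of w:body; paragraphs inside tables are skipped,
    # in the same way python-docx's Document.paragraphs leaves them out
    for p in body.findall(W + "p"):
        # Join the text of all w:t elements in the paragraph's runs
        paragraphs.append("".join(t.text or "" for t in p.iter(W + "t")))
    return paragraphs


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        texts = read_docx_paragraphs(sys.argv[1])
        print("This document has", len(texts), "paragraphs")
```

Empty paragraphs come back as empty strings, so the count matches what the `parag_num` loop in `parseDoc` reports for simple documents.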