PyPDF2 | Splitting a PDF with Python

Published: 2025/3/15

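1. Splitting the PDF

Because of the pandemic I have been forced to take classes online from home, so my textbooks are electronic too. One of them is a scan of two-page spreads, which makes it awkward to read on an iPad, so I decided to look for a tool that could cut each PDF page in half.

Figure 1: The PDF before splitting

After some searching on Baidu, I found that most answers crop the pages with Adobe Acrobat, which is not Pythonic at all. So I looked for a way to process PDF files in Python and eventually found the PyPDF2 library. This article uses it to split the PDF.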

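First, install the library with pip. The code in this article uses PyPDF2's older camelCase API (PdfFileReader, PdfFileWriter, getPage and so on), which was removed in PyPDF2 3.0, so pin a version below 3.0:

    pip install "PyPDF2<3.0"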

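The idea behind the split is simple: once we know the width and height of a page, we crop out its left and right halves and append them to the output in order. PyPDF2 already provides everything needed for this: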

    # PdfFileReader reads an existing PDF
    # PdfFileWriter builds the PDF to be saved
    from PyPDF2 import PdfFileReader, PdfFileWriter

    # 1. Read the PDF
    pdf_input = PdfFileReader(open('xxx.pdf', 'rb'))

    # 2. Create the output PDF object
    pdf_output = PdfFileWriter()

    # 3. Take the first page and read its width and height
    page = pdf_input.getPage(0)
    width = float(page.mediaBox.getWidth())
    height = float(page.mediaBox.getHeight())

    # 4. Get the total number of pages
    page_count = pdf_input.getNumPages()

    # 5. Change the size of one page
    #    (i is a page index; each (x, y) is a placeholder for a corner)
    page = pdf_input.getPage(i)
    page.mediaBox.lowerLeft = (x, y)
    page.mediaBox.lowerRight = (x, y)
    page.mediaBox.upperLeft = (x, y)
    page.mediaBox.upperRight = (x, y)

    # 6. Add the modified page to the output
    pdf_output.addPage(page)

    # 7. After looping over all pages, write the output to a new PDF file
    #    (use a different name from the input, which is still open)
    pdf_output.write(open('xxx_split.pdf', 'wb'))

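Note that for this file PyPDF2 treats the shorter side as the X axis and the longer side as the Y axis. A likely reason is that the scanned pages are stored in portrait orientation and only rotated for display, while mediaBox always describes the unrotated page. The coordinates look like this:

Figure 2: PyPDF2 coordinates for a portrait page

Our PDF, however, is displayed in landscape, as shown below:

Figure 3: Example of a landscape PDF

This is equivalent to:

Figure 4: PyPDF2 coordinates for a landscape page

That is:

Figure 5: PyPDF2 coordinates for a landscape page after rotation

Note how these differ from the coordinates in Figure 2.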
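As a rough aid for reading the figures, the relation between a point in the landscape view and the mediaBox coordinates can be sketched in plain Python. The function name `display_to_page` and the sample page size are hypothetical. The direction of the rotation is an assumption inferred from the corner values this article uses for the two halves (the left edge of the view corresponds to the largest Y, the bottom edge to X = 0); a file rotated the other way would need the opposite mapping.

```python
def display_to_page(u, v, width, height):
    """Map a point in the landscape view to mediaBox coordinates.

    u: distance from the left edge of the view (0 .. height)
    v: distance from the bottom edge of the view (0 .. width)
    width, height: mediaBox size of the unrotated page (width < height)

    Assumption: the left edge of the view is the top edge of the page
    (y = height) and the bottom edge of the view is x = 0.
    """
    if not (0 <= u <= height and 0 <= v <= width):
        raise ValueError("point lies outside the page")
    return (v, height - u)


# Hypothetical page size in points
width, height = 600.0, 800.0
print(display_to_page(0, 0, width, height))           # bottom-left corner of the view
print(display_to_page(height / 2, 0, width, height))  # bottom end of the centre fold
```

Under this assumption the centre fold of the spread sits at y = height / 2, which is why the crop later in the article cuts along that line.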

Now that the PyPDF2 coordinates are clear, we can obtain the left and right halves by adjusting the four corner coordinates. For the left half, the coordinates are:

Figure 6: PyPDF2 coordinates of the left half

So the corners are set as follows:

    page_left.mediaBox.lowerLeft = (0, height/2)
    page_left.mediaBox.lowerRight = (width, height/2)
    page_left.mediaBox.upperLeft = (0, height)
    page_left.mediaBox.upperRight = (width, height)

The coordinates of the right half are:

Figure 7: PyPDF2 coordinates of the right half

The corresponding corner settings are:

    page_right.mediaBox.lowerLeft = (0, 0)
    page_right.mediaBox.lowerRight = (width, 0)
    page_right.mediaBox.upperLeft = (0, height/2)
    page_right.mediaBox.upperRight = (width, height/2)
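The two sets of corner values can also be produced by a small helper, which makes it easy to check them before touching a real file. This is only a sketch in plain Python: the name `half_boxes` and the sample page size are hypothetical, and the function does not call PyPDF2 at all.

```python
def half_boxes(width, height):
    """Return (left_box, right_box) for a page cut along y = height / 2.

    Each box maps a corner name to an (x, y) tuple, using the same
    corner names that the article assigns on mediaBox.
    """
    mid = height / 2
    left_box = {
        "lowerLeft": (0, mid),
        "lowerRight": (width, mid),
        "upperLeft": (0, height),
        "upperRight": (width, height),
    }
    right_box = {
        "lowerLeft": (0, 0),
        "lowerRight": (width, 0),
        "upperLeft": (0, mid),
        "upperRight": (width, mid),
    }
    return left_box, right_box


left_box, right_box = half_boxes(600.0, 800.0)  # hypothetical page size in points
for corner, point in left_box.items():
    print("left ", corner, point)
for corner, point in right_box.items():
    print("right", corner, point)
```

The two boxes share the line y = height / 2 and together cover the whole page, so nothing is lost at the fold.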

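Putting it all together. The file is opened with two readers so that the left and right halves of each page are separate page objects; cropping the same page object twice would leave both output pages with whichever mediaBox was set last.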

    from PyPDF2 import PdfFileReader, PdfFileWriter

    infile = '应用多元统计分析高惠璇.pdf'
    outfile = '应用多元统计分析高惠璇 split.pdf'

    # Keep the input open until the output has been written
    with open(infile, 'rb') as f_left, open(infile, 'rb') as f_right:
        pdf_input_left = PdfFileReader(f_left)
        pdf_input_right = PdfFileReader(f_right)
        pdf_output = PdfFileWriter()

        # Use the first page to read the page size
        page = pdf_input_left.getPage(0)
        width = float(page.mediaBox.getWidth())
        height = float(page.mediaBox.getHeight())
        page_count = pdf_input_left.getNumPages()

        for i in range(page_count):
            # left page
            page_left = pdf_input_left.getPage(i)
            page_left.mediaBox.lowerLeft = (0, height/2)
            page_left.mediaBox.lowerRight = (width, height/2)
            page_left.mediaBox.upperLeft = (0, height)
            page_left.mediaBox.upperRight = (width, height)
            pdf_output.addPage(page_left)

            # right page
            page_right = pdf_input_right.getPage(i)
            page_right.mediaBox.lowerLeft = (0, 0)
            page_right.mediaBox.lowerRight = (width, 0)
            page_right.mediaBox.upperLeft = (0, height/2)
            page_right.mediaBox.upperRight = (width, height/2)
            pdf_output.addPage(page_right)

        with open(outfile, 'wb') as f_out:
            pdf_output.write(f_out)

Let's look at the result. Bingo!

Figure 8: The PDF after conversion

2. Adjusting the margins

After the conversion, the PDF turned out to have black borders. We can reduce them by adjusting the corresponding coordinates:

Figure 9: Black borders in the PDF

    from PyPDF2 import PdfFileReader, PdfFileWriter

    def pdf_split(infile, outfile, left_margin=0, right_margin=0, down_margin=0):
        with open(infile, 'rb') as f_left, open(infile, 'rb') as f_right:
            pdf_input_left = PdfFileReader(f_left)     # reader for the left halves
            pdf_input_right = PdfFileReader(f_right)   # reader for the right halves
            pdf_output = PdfFileWriter()               # the PDF to be saved

            page = pdf_input_left.getPage(0)           # use the first page to read the size
            width = float(page.mediaBox.getWidth())
            height = float(page.mediaBox.getHeight())
            page_count = pdf_input_left.getNumPages()  # number of pages

            for i in range(page_count):
                # crop the left half
                page_left = pdf_input_left.getPage(i)
                page_left.mediaBox.lowerLeft = (down_margin, height/2)
                page_left.mediaBox.lowerRight = (width, height/2)
                page_left.mediaBox.upperLeft = (down_margin, height-left_margin)
                page_left.mediaBox.upperRight = (width, height-left_margin)
                pdf_output.addPage(page_left)

                # crop the right half
                page_right = pdf_input_right.getPage(i)
                page_right.mediaBox.lowerLeft = (down_margin, right_margin)
                page_right.mediaBox.lowerRight = (width, right_margin)
                page_right.mediaBox.upperLeft = (down_margin, height/2)
                page_right.mediaBox.upperRight = (width, height/2)
                pdf_output.addPage(page_right)

            with open(outfile, 'wb') as f_out:         # save the PDF
                pdf_output.write(f_out)
        print('Done!')

    infile = '应用多元统计分析高惠璇.pdf'
    outfile = '应用多元统计分析高惠璇 split.pdf'
    left_margin = 10
    right_margin = 10
    down_margin = 20
    pdf_split(infile, outfile, left_margin, right_margin, down_margin)

Output:

    Done!

Here is the final result:

Figure 10: Black borders after the adjustment
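When tuning the margins it can help to preview the resulting rectangles before writing a file. The sketch below mirrors the coordinate arithmetic of pdf_split in plain Python, without PyPDF2. The name `half_boxes_with_margins`, the sample page size, and the sanity checks on the margins are additions for illustration and are not part of the original script.

```python
def half_boxes_with_margins(width, height,
                            left_margin=0, right_margin=0, down_margin=0):
    """Return (left_rect, right_rect) as (x0, y0, x1, y1) tuples.

    left_margin  trims the top of the Y range (left edge of the view),
    right_margin trims the bottom of the Y range (right edge of the view),
    down_margin  trims the start of the X range (bottom edge of the view).
    """
    mid = height / 2
    if min(left_margin, right_margin, down_margin) < 0:
        raise ValueError("margins must be non-negative")
    if left_margin >= mid or right_margin >= mid or down_margin >= width:
        raise ValueError("margins leave nothing of the page")
    left_rect = (down_margin, mid, width, height - left_margin)
    right_rect = (down_margin, right_margin, width, mid)
    return left_rect, right_rect


# Hypothetical page size in points, with the margins used in the article
left_rect, right_rect = half_boxes_with_margins(600.0, 800.0, 10, 10, 20)
print("left :", left_rect)
print("right:", right_rect)
```

Both halves keep the same X range, so the two output pages have equal height in the landscape view; only the outer edges and the bottom are trimmed.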
