
Merging Data in Python: the concat() function, merge() function, join() method, and combine_first() method

Published: 2023/12/13


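Contents

Stacking data along an axis: the concat() function
- Horizontal stacking with an outer join
- Vertical stacking with an inner join
Merging on keys: the merge() function
- Inner join
- Outer join
- Left join
- Right join
- Other cases
Merging on the row index: the join() method
- The four join types
- When a column overlaps with a row index
Filling overlapping data: the combine_first() method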

Stacking data along an axis: the concat() function

pandas.concat(
    objs: Union[
        Iterable[FrameOrSeriesUnion], Mapping[Optional[Hashable], FrameOrSeriesUnion]
    ],
    axis=0,
    join="outer",
    ignore_index: bool = False,
    keys=None,
    levels=None,
    names=None,
    verify_integrity: bool = False,
    sort: bool = False,
    copy: bool = True,
)

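The commonly used parameters of this function are:

join: the join type. inner keeps only the labels shared by all objects, outer keeps the union of all labels. The default is outer.

ignore_index: a boolean, default False. If True, the existing index along the concatenation axis is discarded and replaced with a fresh 0, 1, ..., n-1 index.

keys: a sequence, used to add an outermost index level.

levels: the specific levels (unique values) used to build the MultiIndex.

names: the names of the levels in the resulting hierarchical index, used once keys and levels are set.

verify_integrity: a boolean, default False. If True, the new concatenated axis is checked for duplicates and an error is raised if any are found.

Depending on the axis (the axis parameter), stacking is either horizontal or vertical. Vertical stacking (axis=0) is the default. The default join type is the outer join (join='outer').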
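The effect of ignore_index, keys, and verify_integrity is easiest to see on a small case. The following is a minimal sketch; the frames s1 and s2 are hypothetical and do not come from the article.

```python
import pandas as pd

# Hypothetical sample frames, used only for illustration.
s1 = pd.DataFrame({"x": [1, 2]})
s2 = pd.DataFrame({"x": [3, 4]})

# Default vertical stacking keeps each frame's own index, so labels repeat.
stacked = pd.concat([s1, s2])
print(stacked.index.tolist())      # [0, 1, 0, 1]

# ignore_index=True discards the old labels and renumbers from 0.
renumbered = pd.concat([s1, s2], ignore_index=True)
print(renumbered.index.tolist())   # [0, 1, 2, 3]

# keys adds an outermost index level that records the source of each row.
keyed = pd.concat([s1, s2], keys=["first", "second"])
print(keyed.index.tolist())

# verify_integrity=True raises ValueError when the new axis has duplicates.
try:
    pd.concat([s1, s2], verify_integrity=True)
    raised = False
except ValueError:
    raised = True
print(raised)                      # True
```

Using ignore_index=True or keys is a common way to avoid the duplicated labels that verify_integrity reports.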

Horizontal stacking with an outer join

When concat() is called with axis=1 and join='outer', the objects are stacked horizontally with an outer join.

Test objects:

left:
    A   B
a  A0  B0
b  A1  B1

right:
    C   D
c  C0  D0
d  C1  D1

Code:

import pandas as pd

left = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']}, index=['a', 'b'])
right = pd.DataFrame({'C': ['C0', 'C1'], 'D': ['D0', 'D1']}, index=['c', 'd'])
# Stack side by side; the row indexes are combined as a union (outer join).
print(pd.concat([left, right], join='outer', axis=1))

Output:

     A    B    C    D
a   A0   B0  NaN  NaN
b   A1   B1  NaN  NaN
c  NaN  NaN   C0   D0
d  NaN  NaN   C1   D1

Positions that have no data after concat() are filled with NaN.
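To see how the join type changes a horizontal stack, it helps to use indexes that overlap only partly. This is a small sketch with hypothetical frames p and q (not from the article) whose indexes share only the label 'b'.

```python
import pandas as pd

# Hypothetical frames; their row indexes overlap only on 'b'.
p = pd.DataFrame({"A": ["A0", "A1"]}, index=["a", "b"])
q = pd.DataFrame({"C": ["C0", "C1"]}, index=["b", "c"])

# Outer join: union of row labels, missing cells become NaN.
outer = pd.concat([p, q], axis=1, join="outer")
# Inner join: only the row labels present in both frames.
inner = pd.concat([p, q], axis=1, join="inner")

print(outer)
print(inner)
```

Here the outer result has rows a, b, and c with two NaN cells, while the inner result keeps only row b.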

Vertical stacking with an inner join

When concat() is called with axis=0 and join='inner', the objects are stacked vertically with an inner join, so only the columns shared by all objects are kept.

Test objects:

df1:
    A   B   C
0  A0  B0  C0
1  A1  B1  C1
2  A2  B2  C2

df2:
    B   C   D
0  B3  C3  D3
1  B4  C4  D4
2  B5  C5  D5

Code:

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                    'B': ['B0', 'B1', 'B2'],
                    'C': ['C0', 'C1', 'C2']})
df2 = pd.DataFrame({'B': ['B3', 'B4', 'B5'],
                    'C': ['C3', 'C4', 'C5'],
                    'D': ['D3', 'D4', 'D5']})
# Stack top to bottom; only the shared columns B and C survive the inner join.
print(pd.concat([df1, df2], join='inner', axis=0))

Output:

    B   C
0  B0  C0
1  B1  C1
2  B2  C2
0  B3  C3
1  B4  C4
2  B5  C5
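For comparison, here is a sketch of the same two frames stacked vertically with the default outer join. Combining it with ignore_index=True is my own choice for the illustration, to avoid the repeated 0, 1, 2 labels seen in the output above.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                    'B': ['B0', 'B1', 'B2'],
                    'C': ['C0', 'C1', 'C2']})
df2 = pd.DataFrame({'B': ['B3', 'B4', 'B5'],
                    'C': ['C3', 'C4', 'C5'],
                    'D': ['D3', 'D4', 'D5']})

# Outer join keeps every column from either frame; gaps become NaN.
# ignore_index=True renumbers the rows 0..5 instead of repeating 0..2.
outer_rows = pd.concat([df1, df2], join='outer', axis=0, ignore_index=True)
print(outer_rows)
```

Column A is NaN for the rows that came from df2, and column D is NaN for the rows that came from df1.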

Merging on keys: the merge() function

A key-based merge joins DataFrame objects on one or more keys. In most cases the columns that the two DataFrame objects have in common are used as the merge keys.

merge(
    left,
    right,
    how: str = "inner",
    on=None,
    left_on=None,
    right_on=None,
    left_index: bool = False,
    right_index: bool = False,
    sort: bool = False,
    suffixes=("_x", "_y"),
    copy: bool = True,
    indicator: bool = False,
    validate=None,
)

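Some of the parameters of this function are:

left: the left DataFrame taking part in the merge

right: the right DataFrame taking part in the merge

how: the join type, default inner. Common values are:
· left: use only the keys from the left DataFrame, like a SQL left outer join
· right: use only the keys from the right DataFrame, like a SQL right outer join
· outer: use the union of the keys from both DataFrames, like a SQL full outer join
· inner: use the intersection of the keys from both DataFrames, like a SQL inner join

on: the column name(s) to join on. They must exist in both the left and the right DataFrame

left_on: the column(s) of the left DataFrame to use as join keys

right_on: the column(s) of the right DataFrame to use as join keys

left_index: use the row index of the left DataFrame as the join key

right_index: use the row index of the right DataFrame as the join key

sort: a boolean, whether to sort the result by the join keys, default False

suffixes: the suffixes appended to overlapping column names, default ("_x", "_y")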
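The parameters left_on, right_on, suffixes, and indicator are not exercised by the examples in this article, so here is a minimal sketch of how they fit together. The orders and customers tables and all of their column names are hypothetical.

```python
import pandas as pd

# Hypothetical tables: the key column has a different name on each side,
# and both tables have a non-key column called 'note'.
orders = pd.DataFrame({"order_id": [1, 2, 3],
                       "cust": ["c1", "c2", "c9"],
                       "note": ["x", "y", "z"]})
customers = pd.DataFrame({"cust_id": ["c1", "c2", "c3"],
                          "note": ["p", "q", "r"]})

merged = pd.merge(
    orders,
    customers,
    how="left",
    left_on="cust",                   # key column in the left table
    right_on="cust_id",               # key column in the right table
    suffixes=("_order", "_cust"),     # disambiguate the two 'note' columns
    indicator=True,                   # add a _merge column with each row's origin
)
print(merged)
```

The _merge column reads both for rows that matched and left_only for the order whose customer is missing from customers.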

Inner join

By default, merge() uses how='inner'.

Test objects:

df1:
    A   B   C
0  A0  B0  C0
1  A1  B1  C1
2  A2  B2  C2

df3:
    B   C   D
0  B0  C0  D3
1  B2  C2  D4
2  B4  C4  D5

Code:

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                    'B': ['B0', 'B1', 'B2'],
                    'C': ['C0', 'C1', 'C2']})
df3 = pd.DataFrame({'B': ['B0', 'B2', 'B4'],
                    'C': ['C0', 'C2', 'C4'],
                    'D': ['D3', 'D4', 'D5']})
# Inner join on the key pair (B, C): only key pairs present in both frames are kept.
print("pd.merge:\n", pd.merge(df1, df3, on=['B', 'C']))

Output:

pd.merge:
    A   B   C   D
0  A0  B0  C0  D3
1  A2  B2  C2  D4

Outer join

Outer join (how='outer'): rows of left and right with the same key values are combined into a single row, and positions with no data are filled with NaN.

Test objects:

df1:
    A   B   C
0  A0  B0  C0
1  A1  B1  C1
2  A2  B2  C2

df3:
    B   C   D
0  B0  C0  D3
1  B2  C2  D4
2  B4  C4  D5

Code:

print("pd.merge(how=outer):\n", pd.merge(df1, df3, on=['B', 'C'], how='outer'))

Output:

pd.merge(how=outer):
     A   B   C    D
0   A0  B0  C0   D3
1   A1  B1  C1  NaN
2   A2  B2  C2   D4
3  NaN  B4  C4   D5

Left join

Left join (how='left'): the left table is the reference. Every row of left is kept, and from right only the rows whose key values match a row in left are shown. Missing values in the merged table are filled with NaN.

Test objects:

df1:
    A   B   C
0  A0  B0  C0
1  A1  B1  C1
2  A2  B2  C2

df3:
    B   C   D
0  B0  C0  D3
1  B2  C2  D4
2  B4  C4  D5

Code:

print("pd.merge(how=left):\n", pd.merge(df1, df3, on=['B', 'C'], how='left'))

Output:

pd.merge(how=left):
    A   B   C    D
0  A0  B0  C0   D3
1  A1  B1  C1  NaN
2  A2  B2  C2   D4

Right join

Right join (how='right'): the right table is the reference. Every row of right is kept, and from left only the rows whose key values match a row in right are shown. Missing values in the merged table are filled with NaN.

Test objects:

df1:
    A   B   C
0  A0  B0  C0
1  A1  B1  C1
2  A2  B2  C2

df3:
    B   C   D
0  B0  C0  D3
1  B2  C2  D4
2  B4  C4  D5

Code:

print("pd.merge(how=right):\n", pd.merge(df1, df3, on=['B', 'C'], how='right'))

Output:

pd.merge(how=right):
     A   B   C   D
0   A0  B0  C0  D3
1   A2  B2  C2  D4
2  NaN  B4  C4  D5
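As a quick check on the four join types, the sketch below rebuilds df1 and df3 from the examples and counts the rows each value of how returns. The row_counts dictionary is just a helper name for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                    'B': ['B0', 'B1', 'B2'],
                    'C': ['C0', 'C1', 'C2']})
df3 = pd.DataFrame({'B': ['B0', 'B2', 'B4'],
                    'C': ['C0', 'C2', 'C4'],
                    'D': ['D3', 'D4', 'D5']})

# Merge on the key pair (B, C) with each join type and record the row count.
row_counts = {}
for how in ['inner', 'outer', 'left', 'right']:
    result = pd.merge(df1, df3, on=['B', 'C'], how=how)
    row_counts[how] = len(result)

print(row_counts)  # {'inner': 2, 'outer': 4, 'left': 3, 'right': 3}
```

Two key pairs appear in both frames, one only in df1 and one only in df3, which accounts for the counts 2, 4, 3, and 3.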

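Other cases

Even when two tables share neither row-index labels nor column names, merge() can still combine them: set both left_index and right_index to True. Because no labels match, how='outer' is also needed to keep the rows; the default inner join would return an empty result.

Test objects:

left:
    A   B
a  A0  B0
b  A1  B1

right:
    C   D
c  C0  D0
d  C1  D1

Code:

print("pd.merge(left_index=right_index=True):\n",
      pd.merge(left, right, how='outer', left_index=True, right_index=True))

Output:

     A    B    C    D
a   A0   B0  NaN  NaN
b   A1   B1  NaN  NaN
c  NaN  NaN   C0   D0
d  NaN  NaN   C1   D1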
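The choice of how matters in this case. The following sketch rebuilds left and right and compares the default inner join with the outer join when merging on the row indexes.

```python
import pandas as pd

left = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']}, index=['a', 'b'])
right = pd.DataFrame({'C': ['C0', 'C1'], 'D': ['D0', 'D1']}, index=['c', 'd'])

# Default how='inner': no index label is shared, so no rows survive.
inner = pd.merge(left, right, left_index=True, right_index=True)
# how='outer': every label from either index is kept.
outer = pd.merge(left, right, how='outer', left_index=True, right_index=True)

print(inner.empty)   # True
print(outer.shape)   # (4, 4)
```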

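Merging on the row index: the join() method

join(self, other, on=None, how="left", lsuffix="", rsuffix="", sort=False)

The commonly used parameters of this method are:

on: the column name(s) in the calling DataFrame to join on the index of other

how: one of 'left', 'right', 'outer', 'inner'; the default is 'left'

lsuffix: a string, the suffix appended to overlapping column names from the left side

rsuffix: a string, the suffix appended to overlapping column names from the right side

sort: a boolean, whether to sort the merged data by the join key, default False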

The four join types

Test objects:

data1:
    A   B   C
a  A0  B0  C0
b  A1  B1  C1
c  A2  B2  C2

data2:
    B   C   D
b  B1  C1  D1
c  B2  C2  D2
d  B3  C3  D3

Code:

data1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                      'B': ['B0', 'B1', 'B2'],
                      'C': ['C0', 'C1', 'C2']}, index=['a', 'b', 'c'])
data2 = pd.DataFrame({'B': ['B1', 'B2', 'B3'],
                      'C': ['C1', 'C2', 'C3'],
                      'D': ['D1', 'D2', 'D3']}, index=['b', 'c', 'd'])
# Columns B and C exist in both frames, so a suffix is required on at least one side.
print("data1.join(data2, how='outer', lsuffix='one'):\n",
      data1.join(data2, how='outer', lsuffix='one'))
print("data1.join(data2, how='inner', rsuffix='two'):\n",
      data1.join(data2, how='inner', rsuffix='two'))
print("data1.join(data2, how='left', lsuffix='one'):\n",
      data1.join(data2, how='left', lsuffix='one'))
print("data1.join(data2, how='right', rsuffix='two'):\n",
      data1.join(data2, how='right', rsuffix='two'))

Output:

data1.join(data2, how='outer', lsuffix='one'):
     A Bone Cone    B    C    D
a   A0   B0   C0  NaN  NaN  NaN
b   A1   B1   C1   B1   C1   D1
c   A2   B2   C2   B2   C2   D2
d  NaN  NaN  NaN   B3   C3   D3
data1.join(data2, how='inner', rsuffix='two'):
    A   B   C Btwo Ctwo   D
b  A1  B1  C1   B1   C1  D1
c  A2  B2  C2   B2   C2  D2
data1.join(data2, how='left', lsuffix='one'):
    A Bone Cone    B    C    D
a  A0   B0   C0  NaN  NaN  NaN
b  A1   B1   C1   B1   C1   D1
c  A2   B2   C2   B2   C2   D2
data1.join(data2, how='right', rsuffix='two'):
     A    B    C Btwo Ctwo   D
b   A1   B1   C1   B1   C1  D1
c   A2   B2   C2   B2   C2  D2
d  NaN  NaN  NaN   B3   C3  D3
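Every call above passes lsuffix or rsuffix because data1 and data2 share the column names B and C. The sketch below shows what happens without a suffix, and what the column names look like when both suffixes are given. The suffix strings '_l' and '_r' are arbitrary choices for the illustration.

```python
import pandas as pd

data1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                      'B': ['B0', 'B1', 'B2'],
                      'C': ['C0', 'C1', 'C2']}, index=['a', 'b', 'c'])
data2 = pd.DataFrame({'B': ['B1', 'B2', 'B3'],
                      'C': ['C1', 'C2', 'C3'],
                      'D': ['D1', 'D2', 'D3']}, index=['b', 'c', 'd'])

# Without a suffix, the overlapping column names B and C make join() fail.
try:
    data1.join(data2)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# With both suffixes, each overlapping column is renamed on its own side.
both = data1.join(data2, how='inner', lsuffix='_l', rsuffix='_r')
print(both.columns.tolist())  # ['A', 'B_l', 'C_l', 'B_r', 'C_r', 'D']
```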

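When a column overlaps with a row index

If the values of a column in one table overlap with the row index of the other table, pass that column name as on to join on it.

Test objects:

join1:
    A   B key
0  A0  B0  K0
1  A1  B1  K1
2  A2  B2  K2

join2:
     C   D
K0  C0  D0
K1  C1  D1
K2  C2  D2

Code:

join1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                      'B': ['B0', 'B1', 'B2'],
                      'key': ['K0', 'K1', 'K2']})
join2 = pd.DataFrame({'C': ['C0', 'C1', 'C2'],
                      'D': ['D0', 'D1', 'D2']}, index=['K0', 'K1', 'K2'])
# Match the 'key' column of join1 against the row index of join2.
print("join1.join(join2, on='key'):\n", join1.join(join2, on='key'))

Output:

join1.join(join2, on='key'):
    A   B key   C   D
0  A0  B0  K0  C0  D0
1  A1  B1  K1  C1  D1
2  A2  B2  K2  C2  D2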
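The same result can be written with merge() by pairing left_on with right_index. The sketch below rebuilds join1 and join2 and compares the two spellings; how='left' is passed to merge() explicitly to match the default of join().

```python
import pandas as pd

join1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                      'B': ['B0', 'B1', 'B2'],
                      'key': ['K0', 'K1', 'K2']})
join2 = pd.DataFrame({'C': ['C0', 'C1', 'C2'],
                      'D': ['D0', 'D1', 'D2']}, index=['K0', 'K1', 'K2'])

# join(): match the 'key' column of join1 against the index of join2.
via_join = join1.join(join2, on='key')
# merge(): the same pairing spelled out with left_on and right_index.
via_merge = pd.merge(join1, join2, how='left', left_on='key', right_index=True)

print(via_join.equals(via_merge))
```

join() is the shorter spelling when the right-hand key is the index; merge() is the more general one.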

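Filling overlapping data: the combine_first() method

combine_first() fills values by matching row and column labels, so when it is used to merge two DataFrame objects, their row indexes and column indexes should overlap.

combine_first(self, other: "DataFrame")

This method takes a single parameter, other, which is the DataFrame used to fill the missing values.

Test objects:

test1:
     A    B
0  NaN   B0
1   A1  NaN
2   A2   B2
3   A3  NaN

test2:
    A   B
1  C0  D0
0  C1  D1
2  C2  D2

Code:

import numpy as np

test1 = pd.DataFrame({'A': [np.nan, 'A1', 'A2', 'A3'],
                      'B': ['B0', np.nan, 'B2', np.nan]})
test2 = pd.DataFrame({'A': ['C0', 'C1', 'C2'],
                      'B': ['D0', 'D1', 'D2']}, index=[1, 0, 2])
# Fill each NaN in test1 with the value at the same row and column label in test2.
print("test1.combine_first(test2):\n", test1.combine_first(test2))

Output:

test1.combine_first(test2):
    A    B
0  C1   B0
1  A1   D0
2  A2   B2
3  A3  NaN

As the output shows, although the row index of test2 is in a different order from that of test1, each NaN in test1 is replaced by the value with the same index label in test2, not the value at the same position. For example, the NaN at row 0, column A of test1 is replaced with "C1", the value at row 0, column A of test2. The NaN at row 3, column B stays NaN because test2 has no row 3.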
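Two further properties of combine_first() are worth a small check: values that are already present in the caller are never overwritten, and the result covers the union of both objects' row and column labels. The frames base and patch below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: patch has an extra row (2) and an extra column (E).
base = pd.DataFrame({'A': [np.nan, 'A1']}, index=[0, 1])
patch = pd.DataFrame({'A': ['P0', 'P1', 'P2'],
                      'E': ['E0', 'E1', 'E2']}, index=[1, 0, 2])

filled = base.combine_first(patch)
print(filled)

print(filled.loc[0, 'A'])  # P1: the NaN is filled from patch, label 0
print(filled.loc[1, 'A'])  # A1: the existing value is kept, not replaced by P0
print(filled.loc[2, 'A'])  # P2: a row that exists only in patch
```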
