The Three Musketeers of Python Data Analysis, Pandas (Part 8): Data Reshaping, Duplicate Handling, and Data Replacement

Published: 2023/12/10

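Articles in the Pandas series:

The Three Musketeers of Python Data Analysis, Pandas (Part 1): Introducing Pandas and Its Series and DataFrame Objects
The Three Musketeers of Python Data Analysis, Pandas (Part 2): Index Objects and Indexing Operations
The Three Musketeers of Python Data Analysis, Pandas (Part 3): Arithmetic and Handling Missing Values
The Three Musketeers of Python Data Analysis, Pandas (Part 4): Function Application, Mapping, Sorting, and Hierarchical Indexing
The Three Musketeers of Python Data Analysis, Pandas (Part 5): Statistical Computation and Descriptive Statistics
The Three Musketeers of Python Data Analysis, Pandas (Part 6): GroupBy Split, Apply, and Combine
The Three Musketeers of Python Data Analysis, Pandas (Part 7): Merging Datasets
The Three Musketeers of Python Data Analysis, Pandas (Part 8): Data Reshaping, Duplicate Handling, and Data Replacement
The Three Musketeers of Python Data Analysis, Pandas (Part 9): Time Series
The Three Musketeers of Python Data Analysis, Pandas (Part 10): Reading and Writing Data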

The NumPy and Matplotlib series are also complete:

NumPy series: https://itrhx.blog.csdn.net/category_9780393.html
Matplotlib series: https://itrhx.blog.csdn.net/category_9780418.html

Recommended learning resources and sites (the author contributed to some of the documentation translations):

NumPy Chinese documentation site: https://www.numpy.org.cn/
Pandas Chinese documentation site: https://www.pypandas.cn/
Matplotlib Chinese documentation site: https://www.matplotlib.org.cn/
NumPy, Matplotlib, and Pandas quick reference tables: https://github.com/TRHX/Python-quick-reference-table

Contents

- 【01x00】Data Reshaping
  - 【01x01】stack
  - 【01x02】unstack
- 【02x00】Duplicate Handling
  - 【02x01】duplicated
  - 【02x02】drop_duplicates
- 【03x00】Data Replacement
  - 【03x01】replace
  - 【03x02】where
  - 【03x03】mask

This article was originally published on CSDN by TRHX. Blog home page: https://itrhx.blog.csdn.net/ Article link: https://itrhx.blog.csdn.net/article/details/106900748 Reproduction without authorization is prohibited.

【01x00】Data Reshaping

Pandas offers a number of basic operations for rearranging tabular data. These are known as reshape or pivot operations. Two methods do most of the work when reshaping a hierarchical index:

stack: turns columns into rows;
unstack: turns rows into columns.
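The relationship between the two methods can be sketched as a round trip. This is a minimal illustration; the names `wide`, `long`, and `restored` and the sample values are made up for the example.

```python
import pandas as pd

# Small wide table: one row per animal, one column per measurement.
wide = pd.DataFrame([[0, 1], [2, 3]],
                    index=['cat', 'dog'],
                    columns=['weight', 'height'])

# stack: the columns become the innermost level of the row index
# (DataFrame -> Series with a two-level MultiIndex).
long = wide.stack()

# unstack: the innermost row level moves back to the columns
# (Series -> DataFrame).
restored = long.unstack()

print(long.loc[('dog', 'weight')])     # value that was at row 'dog', column 'weight'
print(restored.loc['dog', 'height'])   # same cell layout as the original table
```

For data without missing combinations, unstack undoes stack, which is why the two are usually introduced together.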

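【01x01】stack

The stack method turns the columns of the data into rows.

Basic syntax: DataFrame.stack(self, level=-1, dropna=True)

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.stack.html

Parameter	Description

level	The column level(s) to move into the row index, given as a column level index or label, or a list of them. Default -1 (the innermost level)
dropna	bool, whether to drop rows of the reshaped result whose values are all NaN. Default True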

Single level columns:

>>> import pandas as pd
>>> obj = pd.DataFrame([[0, 1], [2, 3]], index=['cat', 'dog'], columns=['weight', 'height'])
>>> obj
     weight  height
cat       0       1
dog       2       3
>>>
>>> obj.stack()
cat  weight    0
     height    1
dog  weight    2
     height    3
dtype: int64

Multi level columns:

>>> import pandas as pd
>>> multicol = pd.MultiIndex.from_tuples([('weight', 'kg'), ('weight', 'pounds')])
>>> obj = pd.DataFrame([[1, 2], [2, 4]], index=['cat', 'dog'], columns=multicol)
>>> obj
    weight
        kg pounds
cat      1      2
dog      2      4
>>>
>>> obj.stack()
            weight
cat kg           1
    pounds       2
dog kg           2
    pounds       4

Missing values: when a DataFrame with multi-level columns is stacked, combinations that do not exist in the original data are filled with NaN.

The level parameter selects which level of the column axis is stacked:

>>> import pandas as pd
>>> multicol = pd.MultiIndex.from_tuples([('weight', 'kg'), ('height', 'm')])
>>> obj = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], index=['cat', 'dog'], columns=multicol)
>>> obj
    weight height
        kg      m
cat    1.0    2.0
dog    3.0    4.0
>>>
>>> obj.stack(level=0)
             kg    m
cat height  NaN  2.0
    weight  1.0  NaN
dog height  NaN  4.0
    weight  3.0  NaN
>>>
>>> obj.stack(level=1)
        height  weight
cat kg     NaN     1.0
    m      2.0     NaN
dog kg     NaN     3.0
    m      4.0     NaN
>>>
>>> obj.stack(level=[0, 1])
cat  height  m     2.0
     weight  kg    1.0
dog  height  m     4.0
     weight  kg    3.0
dtype: float64

If every value in a row of the reshaped result is NaN, that row is dropped by default. Set dropna=False to keep it:

>>> import pandas as pd
>>> multicol = pd.MultiIndex.from_tuples([('weight', 'kg'), ('height', 'm')])
>>> obj = pd.DataFrame([[None, 1.0], [2.0, 3.0]], index=['cat', 'dog'], columns=multicol)
>>> obj
    weight height
        kg      m
cat    NaN    1.0
dog    2.0    3.0
>>>
>>> obj.stack(dropna=False)
        height  weight
cat kg     NaN     NaN
    m      1.0     NaN
dog kg     NaN     2.0
    m      3.0     NaN
>>>
>>> obj.stack(dropna=True)
        height  weight
cat m      1.0     NaN
dog kg     NaN     2.0
    m      3.0     NaN

【01x02】unstack

The unstack method turns the rows of the data into columns.

Basic syntax:

Series.unstack(self, level=-1, fill_value=None)
DataFrame.unstack(self, level=-1, fill_value=None)

Official documentation:

https://pandas.pydata.org/docs/reference/api/pandas.Series.unstack.html
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.unstack.html

Parameter	Description

level	The row index level(s) to move into the columns. Default -1 (the innermost level)
fill_value	Value used in place of NaN for missing entries

Using unstack on a Series:

>>> import pandas as pd
>>> obj = pd.Series([1, 2, 3, 4], index=pd.MultiIndex.from_product([['one', 'two'], ['a', 'b']]))
>>> obj
one  a    1
     b    2
two  a    3
     b    4
dtype: int64
>>>
>>> obj.unstack()
     a  b
one  1  2
two  3  4
>>>
>>> obj.unstack(level=0)
   one  two
a    1    3
b    2    4

As with stack, missing values (NaN) are introduced for combinations that do not exist:

>>> import pandas as pd
>>> obj1 = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])
>>> obj2 = pd.Series([4, 5, 6], index=['c', 'd', 'e'])
>>> obj3 = pd.concat([obj1, obj2], keys=['one', 'two'])
>>> obj3
one  a    0
     b    1
     c    2
     d    3
two  c    4
     d    5
     e    6
dtype: int64
>>>
>>> obj3.unstack()
       a    b    c    d    e
one  0.0  1.0  2.0  3.0  NaN
two  NaN  NaN  4.0  5.0  6.0
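The fill_value parameter listed above offers a way to avoid those NaN entries. The following is a small sketch with made-up data; the names `s`, `with_nan`, and `filled` are illustrative only.

```python
import pandas as pd

# The combination ('two', 'b') is deliberately absent.
idx = pd.MultiIndex.from_tuples([('one', 'a'), ('one', 'b'), ('two', 'a')])
s = pd.Series([1, 2, 3], index=idx)

# Default behaviour: the missing cell becomes NaN.
with_nan = s.unstack()

# With fill_value, the missing cell receives the given value instead.
filled = s.unstack(fill_value=0)

print(with_nan)
print(filled)
```

Choosing a fill value is only appropriate when it cannot be confused with real data; 0 is used here purely for illustration.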

Using unstack on a DataFrame:

>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.DataFrame(np.arange(6).reshape((2, 3)),
...                    index=pd.Index(['Ohio', 'Colorado'], name='state'),
...                    columns=pd.Index(['one', 'two', 'three'], name='number'))
>>> obj
number    one  two  three
state
Ohio        0    1      2
Colorado    3    4      5
>>>
>>> obj2 = obj.stack()
>>> obj2
state     number
Ohio      one       0
          two       1
          three     2
Colorado  one       3
          two       4
          three     5
dtype: int32
>>>
>>> obj3 = pd.DataFrame({'left': obj2, 'right': obj2 + 5},
...                     columns=pd.Index(['left', 'right'], name='side'))
>>> obj3
side             left  right
state    number
Ohio     one        0      5
         two        1      6
         three      2      7
Colorado one        3      8
         two        4      9
         three      5     10
>>>
>>> obj3.unstack('state')
side   left          right
state  Ohio Colorado  Ohio Colorado
number
one       0        3     5        8
two       1        4     6        9
three     2        5     7       10
>>>
>>> obj3.unstack('state').stack('side')
state         Colorado  Ohio
number side
one    left          3     0
       right         8     5
two    left          4     1
       right         9     6
three  left          5     2
       right        10     7

【02x00】Duplicate Handling

duplicated: tests whether each value is a duplicate;
drop_duplicates: removes duplicate values.

【02x01】duplicated

The duplicated method tests whether each value is a duplicate.

Basic syntax:

Series.duplicated(self, keep='first')
DataFrame.duplicated(self, subset: Union[Hashable, Sequence[Hashable], NoneType] = None, keep: Union[str, bool] = 'first') -> 'Series'

Official documentation:

https://pandas.pydata.org/docs/reference/api/pandas.Series.duplicated.html
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.duplicated.html

Parameter	Description

keep

How duplicates are marked, default 'first'
'first': non-duplicates and the first occurrence of each duplicate are marked False, the remaining occurrences True
'last': non-duplicates and the last occurrence of each duplicate are marked False, the remaining occurrences True
False: every occurrence of a duplicate is marked True, non-duplicates False

subset

A column label or sequence of labels, available only on DataFrame.
Restricts the comparison to the given column(s); by default all columns are considered

By default, within each group of equal values the first occurrence is marked False and the later occurrences True; values that occur only once are marked False. This is the same as keep='first':

>>> import pandas as pd
>>> obj = pd.Series(['lama', 'cow', 'lama', 'beetle', 'lama'])
>>> obj
0      lama
1       cow
2      lama
3    beetle
4      lama
dtype: object
>>>
>>> obj.duplicated()
0    False
1    False
2     True
3    False
4     True
dtype: bool
>>>
>>> obj.duplicated(keep='first')
0    False
1    False
2     True
3    False
4     True
dtype: bool

With keep='last', non-duplicates and the last occurrence of each duplicate are marked False and the earlier occurrences True. With keep=False, every occurrence of a duplicate is marked True and all other values False:

>>> import pandas as pd
>>> obj = pd.Series(['lama', 'cow', 'lama', 'beetle', 'lama'])
>>> obj
0      lama
1       cow
2      lama
3    beetle
4      lama
dtype: object
>>>
>>> obj.duplicated(keep='last')
0     True
1    False
2     True
3    False
4    False
dtype: bool
>>>
>>> obj.duplicated(keep=False)
0     True
1    False
2     True
3    False
4     True
dtype: bool

On a DataFrame, the subset parameter restricts the comparison to the given column(s); by default all columns are considered. The data2 column below is random, so the values will differ from run to run:

>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.DataFrame({'data1': ['a'] * 4 + ['b'] * 4, 'data2': np.random.randint(0, 4, 8)})
>>> obj
  data1  data2
0     a      0
1     a      0
2     a      0
3     a      3
4     b      3
5     b      3
6     b      0
7     b      2
>>>
>>> obj.duplicated()
0    False
1     True
2     True
3    False
4    False
5     True
6    False
7    False
dtype: bool
>>>
>>> obj.duplicated(subset='data1')
0    False
1     True
2     True
3     True
4    False
5     True
6     True
7     True
dtype: bool
>>>
>>> obj.duplicated(subset='data2', keep='last')
0     True
1     True
2     True
3     True
4     True
5    False
6    False
7    False
dtype: bool
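Because duplicated returns a boolean Series, it can be used directly as a filter or summed to count rows. The sketch below uses fixed, made-up values instead of random ones so the result is reproducible; `df`, `dup_mask`, `dup_rows`, and `n_extra` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'data1': ['a', 'a', 'b', 'b', 'b'],
                   'data2': [0, 0, 1, 2, 1]})

# keep=False flags every member of a duplicated group,
# which is handy for inspecting all copies side by side.
dup_mask = df.duplicated(keep=False)
dup_rows = df[dup_mask]

# With the default keep='first', True marks only the surplus copies,
# so the sum is the number of rows drop_duplicates would remove.
n_extra = int(df.duplicated().sum())

print(dup_rows)
print(n_extra)
```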

【02x02】drop_duplicates

The drop_duplicates method returns the data with duplicate values removed.

Basic syntax:

Series.drop_duplicates(self, keep='first', inplace=False)
DataFrame.drop_duplicates(self, subset: Union[Hashable, Sequence[Hashable], NoneType] = None, keep: Union[str, bool] = 'first', inplace: bool = False, ignore_index: bool = False) -> Union[ForwardRef('DataFrame'), NoneType]

Official documentation:

https://pandas.pydata.org/docs/reference/api/pandas.Series.drop_duplicates.html
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop_duplicates.html

Parameter	Description

keep	How duplicates are removed, default 'first'. 'first': keep non-duplicates and the first occurrence of each duplicate, drop the rest. 'last': keep non-duplicates and the last occurrence of each duplicate, drop the rest. False: drop every occurrence of a duplicate and keep only non-duplicates
inplace	bool, whether to modify the original object instead of returning a new one. Default False; if True, the original data is changed and nothing is returned
subset	A column label or sequence of labels, available only on DataFrame. Restricts the comparison to the given column(s); by default all columns are considered
ignore_index	bool, available only on DataFrame. Whether to discard the original axis labels. Default False; if True, the result is indexed 0, 1, 2, …, n-1

Using the keep parameter:

>>> import pandas as pd
>>> obj = pd.Series(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'], name='animal')
>>> obj
0      lama
1       cow
2      lama
3    beetle
4      lama
5     hippo
Name: animal, dtype: object
>>>
>>> obj.drop_duplicates()
0      lama
1       cow
3    beetle
5     hippo
Name: animal, dtype: object
>>>
>>> obj.drop_duplicates(keep='last')
1       cow
3    beetle
4      lama
5     hippo
Name: animal, dtype: object
>>>
>>> obj.drop_duplicates(keep=False)
1       cow
3    beetle
5     hippo
Name: animal, dtype: object

With inplace=True nothing is returned, but the original object is modified:

>>> import pandas as pd
>>> obj1 = pd.Series(['lama', 'cow', 'lama', 'beetle', 'lama', 'hippo'], name='animal')
>>> obj1
0      lama
1       cow
2      lama
3    beetle
4      lama
5     hippo
Name: animal, dtype: object
>>>
>>> obj2 = obj1.drop_duplicates()
>>> obj2  # a new object is returned
0      lama
1       cow
3    beetle
5     hippo
Name: animal, dtype: object
>>>
>>> obj3 = obj1.drop_duplicates(inplace=True)
>>> obj3  # nothing is returned (None)
>>>
>>> obj1  # the original object has been modified
0      lama
1       cow
3    beetle
5     hippo
Name: animal, dtype: object

Using drop_duplicates on a DataFrame (data2 is random, so the values will differ from run to run):

>>> import numpy as np
>>> import pandas as pd
>>> obj = pd.DataFrame({'data1': ['a'] * 4 + ['b'] * 4, 'data2': np.random.randint(0, 4, 8)})
>>> obj
  data1  data2
0     a      2
1     a      1
2     a      1
3     a      2
4     b      1
5     b      2
6     b      0
7     b      0
>>>
>>> obj.drop_duplicates()
  data1  data2
0     a      2
1     a      1
4     b      1
5     b      2
6     b      0
>>>
>>> obj.drop_duplicates(subset='data2')
  data1  data2
0     a      2
1     a      1
6     b      0
>>>
>>> obj.drop_duplicates(subset='data2', ignore_index=True)
  data1  data2
0     a      2
1     a      1
2     b      0
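The two methods in this section are closely related: dropping duplicates gives the same rows as filtering with the negated duplicated mask. The sketch below shows this with fixed, made-up data and also combines subset (with a list of columns), keep, and ignore_index; the column name data3 and all variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'data1': ['a', 'a', 'b', 'b', 'b'],
                   'data2': [0, 0, 1, 2, 1],
                   'data3': [10, 11, 12, 13, 14]})

key = ['data1', 'data2']   # columns that define what counts as a duplicate

# Same rows either way: keep the first occurrence of each key.
by_method = df.drop_duplicates(subset=key)
by_mask = df[~df.duplicated(subset=key)]
print(by_method.equals(by_mask))

# Keep the last occurrence of each key and renumber the rows from 0.
latest = df.drop_duplicates(subset=key, keep='last', ignore_index=True)
print(latest)
```

Keeping the last occurrence is a common choice when later rows are assumed to be more recent, which is an assumption about the data rather than something pandas checks.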

【03x00】Data Replacement

【03x01】replace

The replace method replaces values based on their content.

Basic syntax:

Series.replace(self, to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad')
DataFrame.replace(self, to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad')

Official documentation:

https://pandas.pydata.org/docs/reference/api/pandas.Series.replace.html
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.replace.html

Common parameters:

Parameter	Description

to_replace	How to find the values to replace. Can be a string, regular expression, list, dict, integer, float, Series, or None; see the official documentation for how each form behaves
value	The value that replaces each match. For a DataFrame, a dict can specify the value to use for each column; regular expressions, strings, and lists or dicts of such objects are also allowed
inplace	bool, whether to modify the original data directly and return nothing. Default False
regex	bool, or the same types as to_replace. Set it to True when to_replace is a regular expression, or pass the pattern through this parameter instead of to_replace

Passing a single value to both to_replace and value replaces one value with another:

>>> import pandas as pd
>>> obj = pd.Series([0, 1, 2, 3, 4])
>>> obj
0    0
1    1
2    2
3    3
4    4
dtype: int64
>>>
>>> obj.replace(0, 5)
0    5
1    1
2    2
3    3
4    4
dtype: int64

Passing several values to to_replace and one value to value replaces all of them with that value:

>>> import pandas as pd
>>> obj = pd.Series([0, 1, 2, 3, 4])
>>> obj
0    0
1    1
2    2
3    3
4    4
dtype: int64
>>>
>>> obj.replace([0, 1, 2, 3], 4)
0    4
1    4
2    4
3    4
4    4
dtype: int64

Passing several values to both to_replace and value replaces them pairwise:

>>> import pandas as pd
>>> obj = pd.Series([0, 1, 2, 3, 4])
>>> obj
0    0
1    1
2    2
3    3
4    4
dtype: int64
>>>
>>> obj.replace([0, 1, 2, 3], [4, 3, 2, 1])
0    4
1    3
2    2
3    1
4    4
dtype: int64

Passing a dict to to_replace:

>>> import pandas as pd
>>> obj = pd.DataFrame({'A': [0, 1, 2, 3, 4], 'B': [5, 6, 7, 8, 9], 'C': ['a', 'b', 'c', 'd', 'e']})
>>> obj
   A  B  C
0  0  5  a
1  1  6  b
2  2  7  c
3  3  8  d
4  4  9  e
>>>
>>> obj.replace(0, 5)
   A  B  C
0  5  5  a
1  1  6  b
2  2  7  c
3  3  8  d
4  4  9  e
>>>
>>> obj.replace({0: 10, 1: 100})
     A  B  C
0   10  5  a
1  100  6  b
2    2  7  c
3    3  8  d
4    4  9  e
>>>
>>> obj.replace({'A': 0, 'B': 5}, 100)
     A    B  C
0  100  100  a
1    1    6  b
2    2    7  c
3    3    8  d
4    4    9  e
>>> obj.replace({'A': {0: 100, 4: 400}})
     A  B  C
0  100  5  a
1    1  6  b
2    2  7  c
3    3  8  d
4  400  9  e
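A common use of the list and dict forms is cleaning placeholder values. The sketch below assumes, purely for illustration, that -999 and -1000 are sentinel codes in the data; the names `s`, `cleaned`, and `mapped` are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements where -999 and -1000 stand for "no reading".
s = pd.Series([1.0, -999.0, 2.0, -999.0, -1000.0, 3.0])

# List form: every listed value is replaced by the same value (NaN here).
cleaned = s.replace([-999, -1000], np.nan)

# Dict form: each key is replaced by its own value.
mapped = s.replace({-999: np.nan, -1000: 0})

print(cleaned)
print(mapped)
```

Converting sentinels to NaN lets the missing-value tools from Part 3 of this series handle them uniformly.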

Passing a regular expression to to_replace:

>>> import pandas as pd
>>> obj = pd.DataFrame({'A': ['bat', 'foo', 'bait'], 'B': ['abc', 'bar', 'xyz']})
>>> obj
      A    B
0   bat  abc
1   foo  bar
2  bait  xyz
>>>
>>> obj.replace(to_replace=r'^ba.$', value='new', regex=True)
      A    B
0   new  abc
1   foo  new
2  bait  xyz
>>>
>>> obj.replace({'A': r'^ba.$'}, {'A': 'new'}, regex=True)
      A    B
0   new  abc
1   foo  bar
2  bait  xyz
>>>
>>> obj.replace(regex=r'^ba.$', value='new')
      A    B
0   new  abc
1   foo  new
2  bait  xyz
>>>
>>> obj.replace(regex={r'^ba.$': 'new', 'foo': 'xyz'})
      A    B
0   new  abc
1   xyz  new
2  bait  xyz
>>>
>>> obj.replace(regex=[r'^ba.$', 'foo'], value='new')
      A    B
0   new  abc
1   new  new
2  bait  xyz
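The patterns above are anchored with ^ and $, so they match whole strings. As a complementary sketch with made-up city names, an unanchored pattern substitutes only the matching part of each string, and the same pattern without regex=True is treated as a literal value:

```python
import pandas as pd

s = pd.Series(['New   York', 'Los  Angeles', 'Boston'])

# regex=True: each run of whitespace inside a string is replaced
# by a single space; the rest of the string is kept.
normalized = s.replace(r'\s+', ' ', regex=True)

# Without regex=True the pattern is compared literally against whole
# values, and no element equals the text '\s+', so nothing changes.
unchanged = s.replace(r'\s+', ' ')

print(normalized.tolist())
print(unchanged.tolist())
```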

【03x02】where

The where method replaces the values for which the condition is False.

Basic syntax:

Series.where(self, cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=False)
DataFrame.where(self, cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=False)

Official documentation:

https://pandas.pydata.org/docs/reference/api/pandas.Series.where.html
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.where.html

Common parameters:

Parameter	Description

cond	The condition. Where cond is True the original value is kept; where it is False the value is replaced by the corresponding value from other
other	The replacement. Where cond is False, the value is taken from this parameter
inplace	bool, whether to modify the original data directly and return nothing. Default False

Using where on a Series:

>>> import pandas as pd
>>> obj = pd.Series(range(5))
>>> obj
0    0
1    1
2    2
3    3
4    4
dtype: int64
>>>
>>> obj.where(obj > 0)
0    NaN
1    1.0
2    2.0
3    3.0
4    4.0
dtype: float64
>>>
>>> obj.where(obj > 1, 10)
0    10
1    10
2     2
3     3
4     4
dtype: int64

Using where on a DataFrame:

>>> import numpy as np
>>> import pandas as pd
>>> obj = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=['A', 'B'])
>>> obj
   A  B
0  0  1
1  2  3
2  4  5
3  6  7
4  8  9
>>>
>>> m = obj % 3 == 0
>>> obj.where(m, -obj)
   A  B
0  0 -1
1 -2  3
2 -4 -5
3  6 -7
4 -8  9
>>>
>>> obj.where(m, -obj) == np.where(m, obj, -obj)
      A     B
0  True  True
1  True  True
2  True  True
3  True  True
4  True  True
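Both cond and other may also be given as callables, which pandas evaluates on the object itself. This is useful in method chains where no variable name is available for the intermediate result. A minimal sketch with made-up values:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# cond as a callable: keep the even values.
# other as a callable: replace the remaining values with ten times the original.
result = s.where(lambda x: x % 2 == 0, lambda x: x * 10)

print(result.tolist())
```

The equivalent non-callable form is s.where(s % 2 == 0, s * 10).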

【03x03】mask

The mask method is the inverse of where: it replaces the values for which the condition is True.

Basic syntax:

Series.mask(self, cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=False)
DataFrame.mask(self, cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=False)

Official documentation:

https://pandas.pydata.org/docs/reference/api/pandas.Series.mask.html
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.mask.html

Common parameters:

Parameter	Description

cond	The condition. Where cond is False the original value is kept; where it is True the value is replaced by the corresponding value from other
other	The replacement. Where cond is True, the value is taken from this parameter
inplace	bool, whether to modify the original data directly and return nothing. Default False

Using mask on a Series:

>>> import pandas as pd
>>> obj = pd.Series(range(5))
>>> obj
0    0
1    1
2    2
3    3
4    4
dtype: int64
>>>
>>> obj.mask(obj > 0)
0    0.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64
>>>
>>> obj.mask(obj > 1, 10)
0     0
1     1
2    10
3    10
4    10
dtype: int64

Using mask on a DataFrame:

>>> import numpy as np
>>> import pandas as pd
>>> obj = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=['A', 'B'])
>>> obj
   A  B
0  0  1
1  2  3
2  4  5
3  6  7
4  8  9
>>>
>>> m = obj % 3 == 0
>>>
>>> obj.mask(m, -obj)
   A  B
0  0  1
1  2 -3
2  4  5
3 -6  7
4  8 -9
>>>
>>> obj.where(m, -obj) == obj.mask(~m, -obj)
      A     B
0  True  True
1  True  True
2  True  True
3  True  True
4  True  True
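A typical practical use of mask is capping values. The sketch below, with a made-up Series and an arbitrary limit of 100, shows the same cap written three ways: with mask, with where and the opposite condition, and with the clip method, which exists specifically for this case.

```python
import pandas as pd

s = pd.Series([5, 120, 40, 300, 99])
limit = 100   # arbitrary threshold chosen for the example

# mask replaces where the condition is True.
capped = s.mask(s > limit, limit)

# where replaces where the condition is False, so the condition is inverted.
via_where = s.where(s <= limit, limit)

# clip expresses the same cap directly.
via_clip = s.clip(upper=limit)

print(capped.tolist())
```

For plain upper or lower bounds clip reads more clearly; mask and where remain the general tools when the condition or the replacement is more involved.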
