
Python Data Analysis Trio: Pandas (4): Function Application, Mapping, Sorting, and Hierarchical Indexing

Published: 2023/12/10


Pandas series:

Python Data Analysis Trio: Pandas (1): Getting to Know Pandas and Its Series and DataFrame Objects
Python Data Analysis Trio: Pandas (2): The Index Object and Indexing Operations
Python Data Analysis Trio: Pandas (3): Arithmetic and Handling Missing Values
Python Data Analysis Trio: Pandas (4): Function Application, Mapping, Sorting, and Hierarchical Indexing
Python Data Analysis Trio: Pandas (5): Statistical Computation and Descriptive Statistics
Python Data Analysis Trio: Pandas (6): GroupBy: Split, Apply, and Combine
Python Data Analysis Trio: Pandas (7): Combining Data Sets
Python Data Analysis Trio: Pandas (8): Reshaping, Handling Duplicates, and Replacing Data
Python Data Analysis Trio: Pandas (9): Time Series
Python Data Analysis Trio: Pandas (10): Reading and Writing Data

The NumPy and Matplotlib series are also complete:

NumPy series: https://itrhx.blog.csdn.net/category_9780393.html
Matplotlib series: https://itrhx.blog.csdn.net/category_9780418.html

Recommended learning resources (the author contributed to translating parts of the documentation):

NumPy official Chinese site: https://www.numpy.org.cn/
Pandas official Chinese site: https://www.pypandas.cn/
Matplotlib official Chinese site: https://www.matplotlib.org.cn/
NumPy, Matplotlib, and Pandas cheat sheets: https://github.com/TRHX/Python-quick-reference-table

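Contents

- 【01x00】Function application and mapping
- 【02x00】Sorting
  - 【02x01】sort_index(): sorting by index
  - 【02x02】sort_values(): sorting by value
  - 【02x03】rank(): ranking elements
- 【03x00】Hierarchical indexing
  - 【03x01】Introducing hierarchical indexes
  - 【03x02】The MultiIndex object
  - 【03x03】Selecting values
  - 【03x04】Swapping levels and sorting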

This article was originally published on CSDN by TRHX. Blog: https://itrhx.blog.csdn.net/ Article: https://itrhx.blog.csdn.net/article/details/106758103 Reproduction without permission is prohibited.

【01x00】Function Application and Mapping

Pandas objects work directly with NumPy ufuncs (element-wise array functions):

```python
>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.DataFrame(np.random.randn(5,4) - 1)
>>> obj
          0         1         2         3
0 -0.228107  1.377709 -1.096528 -2.051001
1 -2.477144 -0.500013 -0.040695 -0.267452
2 -0.485999 -1.232930 -0.390701 -1.947984
3 -0.839161 -0.702802 -1.756359 -1.873149
4  0.853121 -1.540105  0.621614 -0.583360
>>>
>>> np.abs(obj)
          0         1         2         3
0  0.228107  1.377709  1.096528  2.051001
1  2.477144  0.500013  0.040695  0.267452
2  0.485999  1.232930  0.390701  1.947984
3  0.839161  0.702802  1.756359  1.873149
4  0.853121  1.540105  0.621614  0.583360
```
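Because the example above uses random numbers, its output changes on every run. The sketch below uses small, made-up fixed values instead, to show that a ufunc applied to a DataFrame returns another DataFrame with the same row and column labels, and that ufuncs can be chained:

```python
import numpy as np
import pandas as pd

# Illustrative fixed data (not from the original article) so results are reproducible
df = pd.DataFrame([[-1.5, 2.0], [0.25, -4.0]],
                  index=['r1', 'r2'], columns=['x', 'y'])

abs_df = np.abs(df)        # element-wise ufunc; the result is still a DataFrame
sqrt_df = np.sqrt(abs_df)  # ufuncs can be chained; row/column labels are preserved

print(type(abs_df).__name__)  # DataFrame
print(abs_df)
print(sqrt_df)
```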

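Function mapping: the apply method applies a function to each column or each row of a DataFrame. The axis parameter chooses which: the default, axis=0, passes each column to the function, and axis=1 passes each row: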

```python
>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.DataFrame(np.random.randn(5,4) - 1)
>>> obj
          0         1         2         3
0 -0.707028 -0.755552 -2.196480 -0.529676
1 -0.772668  0.127485 -2.015699 -0.283654
2  0.248200 -1.940189 -1.068028 -1.751737
3 -0.872904 -0.465371 -1.327951 -2.883160
4 -0.092664  0.258351 -1.010747 -2.313039
>>>
>>> obj.apply(lambda x : x.max())
0    0.248200
1    0.258351
2   -1.010747
3   -0.283654
dtype: float64
>>>
>>> obj.apply(lambda x : x.max(), axis=1)
0   -0.529676
1    0.127485
2    0.248200
3   -0.465371
4    0.258351
dtype: float64
```
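The function passed to apply does not have to return a single number. A common pattern, sketched below with made-up data, is to return a Series, in which case apply assembles the results into a DataFrame (one column per original column when axis=0). The second call uses axis=1 to compute a per-row value:

```python
import pandas as pd

# Illustrative data, not from the original article
df = pd.DataFrame({'a': [1, 5, 3], 'b': [-2, 0, 4]})

# Returning a Series per column: apply() stacks them into a DataFrame
# whose index is the Series' labels ('min', 'max') and whose columns are 'a', 'b'
summary = df.apply(lambda col: pd.Series({'min': col.min(), 'max': col.max()}))

# axis=1 passes each row instead; here we compute each row's spread
row_range = df.apply(lambda row: row.max() - row.min(), axis=1)

print(summary)
print(row_range)
```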

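applymap, in turn, applies a function to every individual element (in pandas 2.1 and later, applymap is deprecated in favor of the equivalent DataFrame.map):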

```python
>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.DataFrame(np.random.randn(5,4) - 1)
>>> obj
          0         1         2         3
0 -0.772463 -1.597008 -3.196100 -1.948486
1 -1.765108 -1.646421 -0.687175 -0.401782
2  0.275699 -3.115184 -1.429063 -1.075610
3 -0.251734 -0.448399 -3.077677 -0.294674
4 -1.495896 -1.689729 -0.560376 -1.808794
>>>
>>> obj.applymap(lambda x : '%.2f' % x)
       0      1      2      3
0  -0.77  -1.60  -3.20  -1.95
1  -1.77  -1.65  -0.69  -0.40
2   0.28  -3.12  -1.43  -1.08
3  -0.25  -0.45  -3.08  -0.29
4  -1.50  -1.69  -0.56  -1.81
```
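For a single column, the element-wise counterpart is Series.map, which accepts either a function or a dict. The sketch below reuses a few of the values above for the function case; the dict case uses made-up labels and shows that values without a matching key become NaN:

```python
import pandas as pd

s = pd.Series([-0.772463, 0.275699, -1.495896])

# Function form: applied to every element of the Series
formatted = s.map(lambda x: '%.2f' % x)

# Dict form (illustrative labels): elements without a key map to NaN
grades = pd.Series(['a', 'b', 'c'])
labels = grades.map({'a': 'good', 'b': 'ok'})

print(formatted.tolist())  # ['-0.77', '0.28', '-1.50']
print(labels.tolist())     # ['good', 'ok', nan]
```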

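【02x00】Sorting

【02x01】sort_index(): Sorting by Index

Sorting a data set by some criterion is another important built-in operation. To sort by the row or column labels (in lexicographic order), use sort_index, which returns a new, sorted object.

The basic syntax for Series and DataFrame is: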

```python
Series.sort_index(self, axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index: bool = False)
DataFrame.sort_index(self, axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index: bool = False)
```

Official documentation:

https://pandas.pydata.org/docs/reference/api/pandas.Series.sort_index.html
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_index.html

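Commonly used parameters:

| Parameter | Description |
| --- | --- |
| axis | Axis to sort along: 0 or 'index', 1 or 'columns'; 1 / 'columns' is only available on DataFrame |
| ascending | True (default) sorts in ascending order, False in descending order |
| kind | Sorting algorithm: 'quicksort' (default), 'mergesort', or 'heapsort'; see numpy.sort() for details |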

Applied to a Series (sorting by the index):

```python
>>> import pandas as pd
>>> obj = pd.Series(range(4), index=['d', 'a', 'b', 'c'])
>>> obj
d    0
a    1
b    2
c    3
dtype: int64
>>>
>>> obj.sort_index()
a    1
b    2
c    3
d    0
dtype: int64
```

Applied to a DataFrame (sorting by the row index or by the column labels):

```python
>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.DataFrame(np.arange(8).reshape((2, 4)), index=['three', 'one'], columns=['d', 'a', 'b', 'c'])
>>> obj
       d  a  b  c
three  0  1  2  3
one    4  5  6  7
>>>
>>> obj.sort_index()
       d  a  b  c
one    4  5  6  7
three  0  1  2  3
>>>
>>> obj.sort_index(axis=1)
       a  b  c  d
three  1  2  3  0
one    5  6  7  4
>>>
>>> obj.sort_index(axis=1, ascending=False)
       d  c  b  a
three  0  3  2  1
one    4  7  6  5
```
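Since sort_index returns a new object, the original is left unchanged unless inplace=True is passed, in which case the object is sorted in place and the call returns None. A short sketch using the Series from the earlier example:

```python
import pandas as pd

obj = pd.Series(range(4), index=['d', 'a', 'b', 'c'])

desc = obj.sort_index(ascending=False)  # new object, labels in descending order
print(desc.index.tolist())              # ['d', 'c', 'b', 'a']
print(obj.index.tolist())               # original untouched: ['d', 'a', 'b', 'c']

result = obj.sort_index(inplace=True)   # sorts obj itself
print(result)                           # None: in-place calls return nothing
print(obj.index.tolist())               # ['a', 'b', 'c', 'd']
```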

【02x02】sort_values(): Sorting by Value

The basic syntax for Series and DataFrame is:

```python
Series.sort_values(self, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False)
DataFrame.sort_values(self, by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False)
```

Official documentation:

https://pandas.pydata.org/docs/reference/api/pandas.Series.sort_values.html
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html

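Commonly used parameters:

| Parameter | Description |
| --- | --- |
| by | Required for DataFrame (Series has no such parameter): the label or list of labels whose values to sort by; these are column labels by default and row labels when axis=1 |
| axis | Axis to sort along: 0 or 'index', 1 or 'columns'; 1 / 'columns' is only available on DataFrame |
| ascending | True (default) sorts in ascending order, False in descending order |
| kind | Sorting algorithm: 'quicksort' (default), 'mergesort', or 'heapsort'; see numpy.sort() for details |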
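The kind parameter matters mainly when there are ties: 'mergesort' is a stable sort, so equal values keep their original relative order, whereas quicksort makes no such guarantee. A minimal sketch with made-up data (per the pandas documentation, kind is only applied when sorting on a single column or label):

```python
import pandas as pd

# Illustrative data with two pairs of tied values
s = pd.Series([2, 1, 2, 1], index=['w', 'x', 'y', 'z'])

# Stable sort: 'x' stays before 'z' (both 1), 'w' before 'y' (both 2)
stable = s.sort_values(kind='mergesort')
print(stable.index.tolist())  # ['x', 'z', 'w', 'y']
```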

Applied to a Series, sort_values sorts by value; any missing values are placed at the end by default (na_position='last'):

```python
>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.Series([4, 7, -3, 2])
>>> obj
0    4
1    7
2   -3
3    2
dtype: int64
>>>
>>> obj.sort_values()
2   -3
3    2
0    4
1    7
dtype: int64
>>>
>>> obj = pd.Series([4, np.nan, 7, np.nan, -3, 2])
>>> obj
0    4.0
1    NaN
2    7.0
3    NaN
4   -3.0
5    2.0
dtype: float64
>>>
>>> obj.sort_values()
4   -3.0
5    2.0
0    4.0
2    7.0
1    NaN
3    NaN
dtype: float64
```
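The placement of missing values is controlled by na_position. The sketch below reuses the Series with NaN from the example above: na_position='first' moves the NaN entries to the front, and a descending sort still leaves them at the end because na_position defaults to 'last':

```python
import numpy as np
import pandas as pd

obj = pd.Series([4, np.nan, 7, np.nan, -3, 2])

first = obj.sort_values(na_position='first')  # NaN entries moved to the front
desc = obj.sort_values(ascending=False)       # NaN entries still last by default

print(first.index.tolist())  # [1, 3, 4, 5, 0, 2]
print(desc.index.tolist())   # [2, 0, 5, 4, 1, 3]
```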

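With a DataFrame, you often want to sort by the values in one or more columns. Pass the column name, or a list of names, to the by parameter of sort_values(). With several columns, rows are sorted by the first column; rows that tie on the first column are then ordered by the second column, and so on: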

```python
>>> import pandas as pd
>>> obj = pd.DataFrame({'a': [4, 4, -3, 2], 'b': [0, 1, 0, 1], 'c': [6, 4, 1, 3]})
>>> obj
   a  b  c
0  4  0  6
1  4  1  4
2 -3  0  1
3  2  1  3
>>>
>>> obj.sort_values(by='c')
   a  b  c
2 -3  0  1
3  2  1  3
1  4  1  4
0  4  0  6
>>>
>>> obj.sort_values(by='c', ascending=False)
   a  b  c
0  4  0  6
1  4  1  4
3  2  1  3
2 -3  0  1
>>>
>>> obj.sort_values(by=['a', 'b'])
   a  b  c
2 -3  0  1
3  2  1  3
0  4  0  6
1  4  1  4
```

With axis=1, by names a row label instead, and the columns are reordered by the values in that row:

```python
>>> import pandas as pd
>>> obj = pd.DataFrame({'a': [4, 4, -3, 2], 'b': [0, 1, 0, 1], 'c': [6, 4, 1, 3]}, index=['A', 'B', 'C', 'D'])
>>> obj
   a  b  c
A  4  0  6
B  4  1  4
C -3  0  1
D  2  1  3
>>>
>>> obj.sort_values(by='B', axis=1)
   b  a  c
A  0  4  6
B  1  4  4
C  0 -3  1
D  1  2  3
```
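When sorting by several columns, ascending can also be a list with one flag per key, so each column gets its own direction. A sketch on the same DataFrame: 'a' ascending, and within tied values of 'a', 'b' descending:

```python
import pandas as pd

obj = pd.DataFrame({'a': [4, 4, -3, 2], 'b': [0, 1, 0, 1], 'c': [6, 4, 1, 3]})

# One direction per key: 'a' ascending, then 'b' descending among ties in 'a'
res = obj.sort_values(by=['a', 'b'], ascending=[True, False])
print(res)
print(res.index.tolist())  # [2, 3, 1, 0]
```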

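【02x03】rank(): Ranking Elements

rank() returns an object of the same shape whose values are the ranks of the original values, that is, their 1-based positions in sorted order (not their index labels). How tied values are ranked is controlled by the method parameter.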
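To make concrete what rank() computes, here is a hand-written sketch of the default 'average' method in plain Python, checked against pandas. The helper average_rank is hypothetical and only illustrates the idea; it is not how pandas implements rank():

```python
import pandas as pd

def average_rank(values):
    """Hypothetical helper: 1-based positions in sorted order, with each
    group of tied values sharing the mean of the positions it occupies."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # extend 'end' over the run of values equal to values[order[pos]]
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + 1 + end + 1) / 2  # mean of the 1-based positions pos..end
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

data = [7, -5, 7, 4, 2, 0, 4]
print(average_rank(data))               # [6.5, 1.0, 6.5, 4.5, 3.0, 2.0, 4.5]
print(pd.Series(data).rank().tolist())  # same result from pandas
```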

The basic syntax for Series and DataFrame is:

```python
Series.rank(self: ~FrameOrSeries, axis=0, method: str = 'average', numeric_only: Union[bool, NoneType] = None, na_option: str = 'keep', ascending: bool = True, pct: bool = False)
DataFrame.rank(self: ~FrameOrSeries, axis=0, method: str = 'average', numeric_only: Union[bool, NoneType] = None, na_option: str = 'keep', ascending: bool = True, pct: bool = False)
```

Official documentation:

https://pandas.pydata.org/docs/reference/api/pandas.Series.rank.html
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rank.html

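Commonly used parameters:

| Parameter | Description |
| --- | --- |
| axis | Axis to rank along: 0 or 'index', 1 or 'columns'; 1 / 'columns' is only available on DataFrame |
| method | How tied values are ranked: 'average' (default): the mean of the ranks the group occupies; 'min': the lowest rank in the group; 'max': the highest rank in the group; 'first': ranks assigned in order of appearance; 'dense': like 'min', but ranks always increase by exactly 1 between groups (see the examples below) |
| ascending | True (default) ranks in ascending order, False in descending order |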

Applied to a Series, rank() ranks the values; by default (na_option='keep') missing values receive a rank of NaN:

```python
>>> import pandas as pd
>>> obj = pd.Series([7, -5, 7, 4, 2, 0, 4])
>>> obj
0    7
1   -5
2    7
3    4
4    2
5    0
6    4
dtype: int64
>>>
>>> obj.rank()
0    6.5    # entries 0 and 2 (both 7) occupy ranks 6 and 7; the default takes their mean, 6.5
1    1.0
2    6.5
3    4.5    # entries 3 and 6 (both 4) occupy ranks 4 and 5; the mean is 4.5
4    3.0
5    2.0
6    4.5
dtype: float64
>>>
>>> obj.rank(method='first')
0    6.0    # entries 0 and 2 occupy ranks 6 and 7, assigned in order of appearance
1    1.0
2    7.0
3    4.0    # entries 3 and 6 occupy ranks 4 and 5, assigned in order of appearance
4    3.0
5    2.0
6    5.0
dtype: float64
>>>
>>> obj.rank(method='dense')
0    5.0    # like 'min', but groups are spaced exactly 1 apart, so the 7s get rank 5
1    1.0
2    5.0
3    4.0    # the group of 4s takes its lowest rank, 4
4    3.0
5    2.0
6    4.0
dtype: float64
>>>
>>> obj.rank(method='min')
0    6.0    # the group of 7s (ranks 6 and 7) takes its lowest rank, 6
1    1.0
2    6.0
3    4.0    # the group of 4s (ranks 4 and 5) takes its lowest rank, 4
4    3.0
5    2.0
6    4.0
dtype: float64
```
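The examples above do not show method='max' or descending ranking. The sketch below puts all five tie-breaking methods side by side for the same Series, plus a descending 'min' ranking, in which the largest value gets rank 1:

```python
import pandas as pd

obj = pd.Series([7, -5, 7, 4, 2, 0, 4])

# One column per tie-breaking method, for side-by-side comparison
table = pd.DataFrame({m: obj.rank(method=m)
                      for m in ['average', 'min', 'max', 'first', 'dense']})
table.insert(0, 'value', obj)
print(table)

# ascending=False: the largest value ranks first; ties take the lowest rank
desc_min = obj.rank(ascending=False, method='min')
print(desc_min.tolist())  # [1.0, 7.0, 1.0, 3.0, 5.0, 6.0, 3.0]
```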

With a DataFrame, the axis parameter selects whether to rank within each column (the default) or within each row:

```python
>>> import pandas as pd
>>> obj = pd.DataFrame({'b': [4.3, 7, -3, 2], 'a': [0, 1, 0, 1], 'c': [-2, 5, 8, -2.5]})
>>> obj
     b  a    c
0  4.3  0 -2.0
1  7.0  1  5.0
2 -3.0  0  8.0
3  2.0  1 -2.5
>>>
>>> obj.rank()
     b    a    c
0  3.0  1.5  2.0
1  4.0  3.5  3.0
2  1.0  1.5  4.0
3  2.0  3.5  1.0
>>>
>>> obj.rank(axis='columns')
     b    a    c
0  3.0  2.0  1.0
1  3.0  1.0  2.0
2  1.0  2.0  3.0
3  3.0  2.0  1.0
```
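Two parameters from the rank() signature that the examples do not exercise are na_option and pct. The sketch below, with made-up data, shows how na_option places a missing value ('keep' leaves it as NaN, 'top' ranks it first, 'bottom' ranks it last), and how pct=True expresses ranks as fractions of the number of ranked values:

```python
import numpy as np
import pandas as pd

s = pd.Series([3.0, np.nan, 1.0, 2.0])

keep = s.rank()                        # default na_option='keep': NaN stays NaN
top = s.rank(na_option='top')          # NaN gets the smallest rank
bottom = s.rank(na_option='bottom')    # NaN gets the largest rank
print(keep.tolist(), top.tolist(), bottom.tolist())

# pct=True divides each rank by the number of ranked values (3 here)
pct = pd.Series([10, 30, 20]).rank(pct=True)
print(pct.tolist())
```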

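【03x00】Hierarchical Indexing

【03x01】Introducing Hierarchical Indexes

A Series can be created with an index built from two sub-lists: the first sub-list becomes the outer level of the index and the second sub-list becomes the inner level.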
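A minimal sketch of such a Series, using made-up labels and the values 0 to 5 rather than random numbers; passing a list of two lists as index produces a two-level MultiIndex:

```python
import pandas as pd

# Outer list -> outer level, inner list -> inner level (illustrative labels)
obj = pd.Series(
    range(6),
    index=[['a', 'a', 'a', 'b', 'b', 'b'],
           [0, 1, 2, 0, 1, 2]],
)
print(obj)
print(obj.index.nlevels)  # 2
print(obj.index[3])       # ('b', 0)
```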

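【03x02】The MultiIndex Object

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.MultiIndex.html

Checking the type of such a Series' index shows a MultiIndex object. Its levels attribute lists the distinct labels in each level, and its codes attribute records, for every position, which label of each level it carries (as integer positions into levels):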

```python
>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.Series(np.random.randn(12),index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd'], [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]])
>>> obj
a  0    0.035946
   1   -0.867215
   2   -0.053355
b  0   -0.986616
   1    0.026071
   2   -0.048394
c  0    0.251274
   1    0.217790
   2    1.137674
d  0   -1.245178
   1    1.234972
   2   -0.035624
dtype: float64
>>>
>>> type(obj.index)
<class 'pandas.core.indexes.multi.MultiIndex'>
>>>
>>> obj.index
MultiIndex([('a', 0),
            ('a', 1),
            ('a', 2),
            ('b', 0),
            ('b', 1),
            ('b', 2),
            ('c', 0),
            ('c', 1),
            ('c', 2),
            ('d', 0),
            ('d', 1),
            ('d', 2)],
           )
>>> obj.index.levels
FrozenList([['a', 'b', 'c', 'd'], [0, 1, 2]])
>>>
>>> obj.index.codes
FrozenList([[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3], [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]])
```
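The levels/codes pair is a compact encoding: each distinct label is stored once per level, and codes are integer positions into that level. The sketch below, on a small made-up index, decodes each level by indexing it with its codes and compares the result with get_level_values():

```python
import pandas as pd

idx = pd.MultiIndex.from_arrays([['a', 'a', 'b', 'b'], [0, 1, 0, 1]])

# Indexing a level with its codes recovers the full column of labels
outer = idx.levels[0][idx.codes[0]]
inner = idx.levels[1][idx.codes[1]]

print(list(outer))                    # ['a', 'a', 'b', 'b']
print(list(inner))                    # [0, 1, 0, 1]
print(list(idx.get_level_values(0)))  # same as outer
```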

The from_arrays() method is commonly used to convert arrays into a MultiIndex:

```python
>>> import pandas as pd
>>> arrays = [[1, 1, 2, 2], ['red', 'blue', 'red', 'blue']]
>>> pd.MultiIndex.from_arrays(arrays, names=('number', 'color'))
MultiIndex([(1,  'red'),
            (1, 'blue'),
            (2,  'red'),
            (2, 'blue')],
           names=['number', 'color'])
```
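Level names pay off when selecting data: a level can be referred to by name rather than position. A sketch that attaches the index above to a Series with made-up values and selects all 'red' entries via xs() on the 'color' level:

```python
import pandas as pd

arrays = [[1, 1, 2, 2], ['red', 'blue', 'red', 'blue']]
mi = pd.MultiIndex.from_arrays(arrays, names=('number', 'color'))

s = pd.Series([10, 20, 30, 40], index=mi)  # illustrative values

# Select by level name; the 'color' level is dropped from the result
red = s.xs('red', level='color')
print(red)
print(list(mi.names))  # ['number', 'color']
```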

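Other commonly used methods are listed below (see the official documentation for more):

| Method | Description |
| --- | --- |
| from_arrays(arrays[, sortorder, names]) | Converts arrays to a MultiIndex |
| from_tuples(tuples[, sortorder, names]) | Converts a list of tuples to a MultiIndex |
| from_product(iterables[, sortorder, names]) | Builds a MultiIndex from the Cartesian product of several iterables |
| from_frame(df[, sortorder, names]) | Converts a DataFrame to a MultiIndex |
| set_levels(self, levels[, level, inplace, …]) | Sets new levels on the MultiIndex |
| set_codes(self, codes[, level, inplace, …]) | Sets new codes on the MultiIndex |
| sortlevel(self[, level, ascending, …]) | Sorts by the given level |
| droplevel(self[, level]) | Removes the given level |
| swaplevel(self[, i, j]) | Swaps level i and level j (by default the two innermost levels, i.e. the outer and inner index of a two-level index) |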
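Several constructors from the table can produce the same index. The sketch below builds the number/color index from the from_arrays() example three ways and checks that they are equal; from_product() is the most concise when every combination of labels is present. It also shows droplevel() and swaplevel() on the result:

```python
import pandas as pd

idx_a = pd.MultiIndex.from_arrays([[1, 1, 2, 2], ['red', 'blue', 'red', 'blue']])
idx_t = pd.MultiIndex.from_tuples([(1, 'red'), (1, 'blue'), (2, 'red'), (2, 'blue')])
idx_p = pd.MultiIndex.from_product([[1, 2], ['red', 'blue']])  # Cartesian product

print(idx_a.equals(idx_t), idx_a.equals(idx_p))  # True True

print(idx_p.droplevel(0).tolist())  # ['red', 'blue', 'red', 'blue']
print(idx_p.swaplevel(0, 1)[0])     # ('red', 1)
```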

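【03x03】Selecting Values

For an object with a multi-level index, passing a single key selects from the outer level and returns all of the matching inner entries; passing two keys selects by the outer label first and then by the inner label.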
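A sketch of these selection rules on a Series shaped like the one in the MultiIndex section, but with the values 0 to 11 instead of random numbers so the results are easy to check. It also shows selecting several outer labels with loc and selecting on the inner level only with xs():

```python
import pandas as pd

obj = pd.Series(range(12),
                index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd'],
                       [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]])

outer_b = obj['b']            # one key: outer level; inner labels become the index
one_value = obj['b', 1]       # two keys: outer label, then inner label -> one value
subset = obj.loc[['a', 'c']]  # several outer labels; both levels are kept
inner_1 = obj.xs(1, level=1)  # select on the inner level only

print(outer_b.tolist(), outer_b.index.tolist())  # [3, 4, 5] [0, 1, 2]
print(one_value)                                 # 4
print(subset.tolist())                           # [0, 1, 2, 6, 7, 8]
print(inner_1.tolist())                          # [1, 4, 7, 10]
```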

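【03x04】Swapping Levels and Sorting

swaplevel() swaps the outer and inner index levels (it is available on MultiIndex as well as on Series and DataFrame). The MultiIndex method sortlevel() sorts by the outer level first and then by the inner level, in ascending order by default; set ascending=False for descending order. sortlevel() returns a tuple: the sorted index and the integer positions that produce that order. Example: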

```python
>>> import pandas as pd
>>> import numpy as np
>>> obj = pd.Series(np.random.randn(12),index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd'], [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]])
>>> obj
a  0   -0.110215
   1    0.193075
   2   -1.101706
b  0   -1.325743
   1    0.528418
   2   -0.127081
c  0   -0.733822
   1    1.665262
   2    0.127073
d  0    1.262022
   1   -1.170518
   2    0.966334
dtype: float64
>>>
>>> obj.swaplevel()
0  a   -0.110215
1  a    0.193075
2  a   -1.101706
0  b   -1.325743
1  b    0.528418
2  b   -0.127081
0  c   -0.733822
1  c    1.665262
2  c    0.127073
0  d    1.262022
1  d   -1.170518
2  d    0.966334
dtype: float64
>>>
>>> obj.swaplevel().index.sortlevel()
(MultiIndex([(0, 'a'),
            (0, 'b'),
            (0, 'c'),
            (0, 'd'),
            (1, 'a'),
            (1, 'b'),
            (1, 'c'),
            (1, 'd'),
            (2, 'a'),
            (2, 'b'),
            (2, 'c'),
            (2, 'd')],
           ), array([ 0,  3,  6,  9,  1,  4,  7, 10,  2,  5,  8, 11], dtype=int32))
```
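Note that calling sortlevel() on the index only sorts the labels; the Series data is not reordered. To reorder the data itself, one option is Series.sort_index() after swaplevel(). The sketch below uses the values 0 to 11 instead of random numbers and checks that applying the positions returned by sortlevel() with iloc gives the same result as sort_index():

```python
import pandas as pd

obj = pd.Series(range(12),
                index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd'],
                       [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]])

swapped = obj.swaplevel()        # labels are now (inner, outer)
by_index = swapped.sort_index()  # reorders the data, not just the labels

sorted_idx, indexer = swapped.index.sortlevel()  # index-only variant
print(indexer.tolist())   # [0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8, 11]
print(by_index.tolist())  # same order: values equal their original positions here
```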
