
A Very Detailed Guide to Core Series Operations

Published: 2023/12/20 | Programming Q&A | 22 | 豆豆

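This article, collected and organized by 生活随笔, gives a detailed introduction to core Series operations. The editor found it quite good and shares it here for reference.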

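Contents

Introduction
1 Creation
- 1.1 Creating from a dictionary
- 1.2 Creating from a NumPy array
- 1.3 Creating from a scalar
2 Data access
- 2.1 Access by position
- 2.2 Access by index label
- 2.3 Access by slicing
- 2.4 Boolean indexing
3 Index operations
- Grouping by data
- 3.1 Index attributes
- 3.2 Accessing the index
4 Basic operations
- 4.1 Adding data
- 4.2 Deleting data
- 4.3 Modifying data
- 4.4 Viewing data
- 4.5 Reindexing
- 4.6 Data alignment
5 Data statistics
- 5.1 Overview
- 5.2 Code demonstration
6 Notes
References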

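Introduction

Pandas is a very powerful library for working with two-dimensional (tabular) data. A two-dimensional DataFrame is made up of multiple one-dimensional Series, and a Series has the following components:

data: a list, dictionary, NumPy array, or scalar value.
index: the labels for the data. Index values must be hashable and are usually (though not necessarily) unique; the index must have the same length as the data.
dtype: the data type of the Series.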
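Every Series exposes these three components directly. The minimal sketch below uses made-up labels and values. It also shows that pandas accepts duplicate index labels, although unique labels are the usual case:

```python
import pandas as pd

# Hypothetical sample data; the label 'a' appears twice on purpose
s = pd.Series([10, 20, 30], index=['a', 'b', 'a'], name='demo')

print(s.values)           # the data, stored as a NumPy array
print(s.index)            # the index labels
print(s.dtype)            # the data type of the values
print(s.index.is_unique)  # False: duplicate labels are allowed
print(s['a'])             # a duplicated label selects every matching row
```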

This article introduces the basic usage of Series.

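1 Creation

1.1 Creating from a dictionary

The example below shows that:

A Series can be created directly from a dictionary.
The keys become the Series index, and the values become its data.
The values can have different types, unlike a typical NumPy array; a Series holding mixed types has dtype object.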

import numpy as np
import pandas as pd

data = {'1': 1, '2': 2, '3': 3, '4': 'hello', '5': 'python', 'list': [1, 2]}
s1 = pd.Series(data)
print(s1, type(s1))

# Output
1            1
2            2
3            3
4        hello
5       python
list    [1, 2]
dtype: object <class 'pandas.core.series.Series'>
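If you pass a dictionary together with an explicit index, pandas uses only the listed keys, in the listed order. Any label that is not a key of the dictionary gets NaN. A minimal sketch with made-up data:

```python
import pandas as pd

data = {'a': 1, 'b': 2, 'c': 3}

# Only the labels in index are used, in that order; 'z' is not a key,
# so its value is NaN, which also turns the dtype into float64
s = pd.Series(data, index=['c', 'a', 'z'])
print(s)
```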

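1.2 Creating from a NumPy array

Data is the core of a Series, so it is the first argument passed to pd.Series(). The example below uses np.random.rand(5) to generate an ndarray of 5 random numbers and initializes the Series with it. index is the index and must have the same length as the data. name is the name of the Series and is shown when it is printed. Both index and name are optional.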

import numpy as np
import pandas as pd

# The three arguments are the data, the index, and the Series name
s = pd.Series(np.random.rand(5), index=list('abcde'), name='test')
print(s, type(s))

# Output
a    0.478839
b    0.517298
c    0.854202
d    0.543885
e    0.032623
Name: test, dtype: float64 <class 'pandas.core.series.Series'>
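The index and the data must have the same length, so pandas rejects a mismatch when it builds the Series. If you omit index, pandas creates a default RangeIndex. In the small sketch below, only the exception type is relied on, because the exact error message varies between pandas versions:

```python
import numpy as np
import pandas as pd

mismatch_error = None
try:
    # 3 values but only 2 labels: pandas refuses to build the Series
    pd.Series(np.arange(3), index=['a', 'b'])
except ValueError as exc:
    mismatch_error = exc
    print('ValueError:', exc)

# Without an index, pandas creates a default RangeIndex 0..n-1
s = pd.Series(np.arange(3), name='test')
print(s.index)
```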

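1.3 Creating from a scalar

A scalar is a single constant value. As shown below, the number 3 can be used to create a Series of length 5. The index determines the length. Because the data is the single scalar 3, that value is repeated to fill all 5 positions.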

import numpy as np
import pandas as pd

s = pd.Series(3, index=list('abcde'))
print(s)

# Output
a    3
b    3
c    3
d    3
e    3
dtype: int64
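When no index is given, a scalar produces a Series of length 1, and the dtype follows the type of the scalar. A minimal sketch with made-up values:

```python
import pandas as pd

one = pd.Series(3)                           # no index given: length 1, label 0
five = pd.Series(2.5, index=list('abcde'))   # the scalar is repeated for every label
print(len(one), one.iloc[0])
print(five)
```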

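2 Data access

Data access means retrieving the values stored in a Series. A Series is one-dimensional, so values are usually accessed either by index label or by position (offset).

2.1 Access by position

Access by position is the most common approach. You can index a Series with a subscript just like an array, and positions start at 0.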

import numpy as np
import pandas as pd

s = pd.Series(np.random.rand(5))
print(s, '\n')
print('s[2]:', s[2], type(s[2]), s[2].dtype)

# Output
0    0.949404
1    0.400692
2    0.660859
3    0.295815
4    0.680184
dtype: float64

s[2]: 0.6608588265235231 <class 'numpy.float64'> float64
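With the default index, positions and labels are the same, so s[2] gives the same result either way. When the index holds other integers, plain [] looks up labels. In that case, .iloc (by position) and .loc (by label) are the unambiguous choices. A minimal sketch with made-up data:

```python
import pandas as pd

# Integer labels that do not match the positions
s = pd.Series([10, 20, 30], index=[2, 1, 0])

print(s.iloc[0])  # by position: the first element
print(s.loc[0])   # by label: the element labelled 0
```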

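2.2 Access by index label

Access by index label uses the Series index to retrieve the corresponding data, much like looking up a value by its key in a dictionary. It is more flexible than a dictionary, though. Besides a single key, you can pass a list of keys to retrieve several values at once.

Accessing a single key that does not exist raises a KeyError. With a list of keys, older pandas versions returned NaN for the missing keys and issued a FutureWarning, as in the output below. Since pandas 1.0, a missing key in the list also raises a KeyError, and .reindex() is the recommended alternative.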

import numpy as np
import pandas as pd

s = pd.Series(np.random.rand(5), index=list('abcde'))
print(s)
print('-' * 10, '\n')
print("s['a']:", s['a'], '\n')
print("--- s[['b','e', 'f']] ---")
# 'f' does not exist: pandas < 1.0 returns NaN with a FutureWarning,
# pandas >= 1.0 raises KeyError (use s.reindex(['b', 'e', 'f']) instead)
print(s[['b', 'e', 'f']])

# Output
a    0.977675
b    0.128278
c    0.110421
d    0.413023
e    0.568087
dtype: float64
----------

s['a']: 0.9776748201255117

--- s[['b','e', 'f']] ---
b    0.128278
e    0.568087
f         NaN
dtype: float64
# s['f'] does not exist: the first run shows a warning, but execution continues
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py:1152: FutureWarning:
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike
  return self.loc[key]
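On pandas 1.0 and later, a label list containing the missing 'f' raises KeyError. The alternative named in the warning, .reindex(), returns NaN for missing labels instead. The sketch below uses fixed values so the result is reproducible, and it assumes pandas 1.0 or newer:

```python
import pandas as pd

s = pd.Series([0.1, 0.2, 0.3, 0.4, 0.5], index=list('abcde'))

# reindex() keeps the existing labels and inserts NaN for the missing 'f'
r = s.reindex(['b', 'e', 'f'])
print(r)

# Plain [] with a missing label in the list raises KeyError on pandas >= 1.0
raised_key_error = False
try:
    s[['b', 'e', 'f']]
except KeyError as exc:
    raised_key_error = True
    print('KeyError:', exc)
```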

2.3 Access by slicing

import numpy as np
import pandas as pd

s1 = pd.Series(np.random.rand(5), list('abcde'))
print(s1, '\n')
print(s1['a':'c'], '\n')  # slicing by label includes the end label
print(s1[0:2], '\n')      # slicing by position works like list slicing: the end is excluded

# Output
a    0.634454
b    0.132619
c    0.211219
d    0.559798
e    0.424643
dtype: float64

a    0.634454
b    0.132619
c    0.211219
dtype: float64

a    0.634454
b    0.132619
dtype: float64
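The explicit accessors follow the same rules. .loc slices by label and includes the end label, while .iloc slices by position and excludes the end. A sketch with fixed values:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=list('abcde'))

by_label = s.loc['a':'c']   # labels a, b, c (end label included)
by_position = s.iloc[0:2]   # positions 0 and 1 (end position excluded)
print(by_label, by_position, sep='\n\n')
```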

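2.4 Boolean indexing

A Boolean condition produces a new Series of Boolean values, which you can then use to select elements.
.isnull() and .notnull() test whether each value is missing. None represents an empty value, and NaN represents an undefined or invalid numeric value. Both are treated as missing.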

import numpy as np
import pandas as pd

s = pd.Series([0.2, 0.5, None])
print(s, '\n')
print(s > 50, '\n')
print(s.isnull(), '\n')
print(s.notnull(), '\n')
print(s[s > 50])

# Output
0    0.2
1    0.5
2    NaN
dtype: float64

0    False
1    False
2    False
dtype: bool

0    False
1    False
2     True
dtype: bool

0     True
1     True
2    False
dtype: bool

Series([], dtype: float64)
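In the example above, no value exceeds 50, so the result is empty. A threshold inside the data range makes the filtering visible. You can combine conditions with & (and) and | (or), putting each condition in parentheses; comparisons with NaN evaluate to False. A sketch with the same data:

```python
import pandas as pd

s = pd.Series([0.2, 0.5, None])       # None is stored as NaN

big = s[s > 0.3]                       # keeps 0.5 only
present = s[s.notnull()]               # drops the missing value
both = s[(s > 0.1) & (s < 0.4)]        # combined condition: keeps 0.2
print(big, present, both, sep='\n\n')
```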

3 Index operations

Besides operations on the data, operations on the index are also important. Some common index operations follow.

Grouping by data

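3.1 Index attributes

Besides the data, you can also access the index itself and check its type. The code below shows three common index types:
- RangeIndex, created from range()
- Index, created from a list of labels
- DatetimeIndex, created with pd.date_range()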

import numpy as np
import pandas as pd

s = pd.Series(np.random.rand(5), index=range(5))
print('type(s.index): ', type(s.index), '\n')
print('s.index:', s.index, '\n')

s = pd.Series(np.random.rand(5), index=list('abcde'))
print('type(s.index): ', type(s.index), '\n')
print('s.index:', s.index, '\n')

s = pd.Series(np.random.rand(5), index=pd.date_range('2018-01-01', periods=5))
print('type(s.index): ', type(s.index), '\n')
print('s.index:', s.index, '\n')

# Output
type(s.index):  <class 'pandas.core.indexes.range.RangeIndex'>
s.index: RangeIndex(start=0, stop=5, step=1)
type(s.index):  <class 'pandas.core.indexes.base.Index'>
s.index: Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
type(s.index):  <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
s.index: DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
               '2018-01-05'],
              dtype='datetime64[ns]', freq='D')

3.2 Accessing the index

Like a list, an index supports access by position and slicing with a range.

import numpy as np
import pandas as pd

s = pd.Series(np.random.rand(5), index=range(5))
print('type(s)', type(s), '\n')
print(s, '\n')

# View the index
print(s.index, '\n')

# View the index positions in the range [1, 3)
print(s.index[1:3], '\n')

# Use the selected index range to view the corresponding data in s
print(s[s.index[1:3]])

# Iterate over the index and print the value for each label
for label in s.index:
    print(s[label])

# Output
type(s) <class 'pandas.core.series.Series'>

0    0.138492
1    0.285440
2    0.280471
3    0.245737
4    0.996996
dtype: float64

RangeIndex(start=0, stop=5, step=1)

RangeIndex(start=1, stop=3, step=1)

1    0.285440
2    0.280471
dtype: float64
0.13849193313381447
0.2854401610542934
0.280470887359729
0.2457365359030208
0.996996040313859
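Index objects are immutable: assigning to a single element raises TypeError. To relabel a Series, assign a complete new index of the same length. A minimal sketch with made-up values:

```python
import pandas as pd

s = pd.Series([1, 2, 3])

immutable = False
try:
    s.index[0] = 10            # element assignment is not allowed
except TypeError as exc:
    immutable = True
    print('TypeError:', exc)

s.index = ['x', 'y', 'z']      # replace the whole index instead
print(s)
```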

4 Basic operations

4.1 Adding data

import numpy as np
import pandas as pd

s1 = pd.Series(np.random.rand(2))
print('s1')
print(s1)

s1[3] = 100    # add data by assigning to a new index label
s1['a'] = 200
print('\ns1 after adding two values')
print(s1, '\n')

s2 = pd.Series(np.random.rand(2), index=['value1', 'value2'])
print('\ns2')
print(s2)

# Series.append() was deprecated in pandas 1.4 and removed in 2.0; pd.concat() replaces it
s3 = pd.concat([s1, s2])
print('\npd.concat([s1, s2])')
print(s3)

# Output
s1
0    0.981331
1    0.555244
dtype: float64

s1 after adding two values
0      0.570088
1      0.835804
3    100.000000
a    200.000000
dtype: float64

s2
value1    0.089712
value2    0.399171
dtype: float64

pd.concat([s1, s2])
0           0.570088
1           0.835804
3         100.000000
a         200.000000
value1      0.089712
value2      0.399171
dtype: float64
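Here are the same additions with deterministic data. Assigning through .loc with a new label enlarges the Series in place. pd.concat() takes over from the removed Series.append(), and ignore_index=True renumbers the result from 0. A minimal sketch with made-up values:

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0])
s1.loc[3] = 100.0                  # new label 3 is appended to s1
s2 = pd.Series([7.0, 8.0], index=['value1', 'value2'])

combined = pd.concat([s1, s2])                        # original labels kept
renumbered = pd.concat([s1, s2], ignore_index=True)   # labels 0..4
print(combined, renumbered, sep='\n\n')
```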

4.2 Deleting data

import numpy as np
import pandas as pd

s = pd.Series(np.random.rand(5), index=list('abcde'))
print('s')
print(s)

del s['a']    # delete in place with del
print("\nDelete one value: del s['a']")
print(s, '\n')

s1 = s.drop(['c', 'd'])    # .drop() returns a new Series; pass a list to delete several labels
print("\nDelete several values: s1 = s.drop(['c','d'])")
print(s1)

# Output
s
a    0.687421
b    0.938094
c    0.391408
d    0.667542
e    0.245056
dtype: float64

Delete one value: del s['a']
b    0.938094
c    0.391408
d    0.667542
e    0.245056
dtype: float64

Delete several values: s1 = s.drop(['c','d'])
b    0.938094
e    0.245056
dtype: float64
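Two related options are sketched below with made-up data. pop() removes a label in place and returns its value. drop() normally raises KeyError for a label that does not exist, unless you pass errors='ignore'.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

value = s.pop('b')                         # removes 'b' from s and returns its value
unchanged = s.drop('z', errors='ignore')   # 'z' does not exist: no error is raised
print(value, list(s.index), list(unchanged.index))
```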

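4.3 Modifying data

To modify data, assign new values to the selected index labels or positions. You can change values individually or in batches.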

import numpy as np
import pandas as pd

s = pd.Series(np.random.rand(5), index=list('abcde'))
print(s, '\n')
s.iloc[1] = 100          # direct assignment by position (s[1] = 100 relies on a positional fallback deprecated in recent pandas)
print(s, '\n')
s[['c', 'd']] = 200      # batch assignment with a list of labels
print(s)

# Output
a    0.317819
b    0.359241
c    0.662112
d    0.087609
e    0.940697
dtype: float64

a      0.317819
b    100.000000
c      0.662112
d      0.087609
e      0.940697
dtype: float64

a      0.317819
b    100.000000
c    200.000000
d    200.000000
e      0.940697
dtype: float64
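Besides single labels and label lists, a Boolean mask can select the values to overwrite. Using .loc makes it explicit that you mean a label. A sketch with fixed values:

```python
import pandas as pd

s = pd.Series([1, 5, 10], index=['a', 'b', 'c'])

s[s > 4] = 0            # every value greater than 4 becomes 0
s.loc['a'] = -1         # assignment by label
print(s)
```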

4.4 Viewing data

Like the Linux head and tail commands, s.head(n) and s.tail(n) return the first and last n elements of a Series.

import numpy as np
import pandas as pd

s = pd.Series(np.random.rand(10))
print(s.head(2), '\n')
print(s.tail(3))

# Output
0    0.140628
1    0.768699
dtype: float64

7    0.255628
8    0.535300
9    0.324614
dtype: float64
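Both methods default to n=5. A negative n counts from the other end: head(-2) returns all but the last two elements, and tail(-7) returns all but the first seven. A short sketch:

```python
import pandas as pd

s = pd.Series(range(10))

first = s.head()        # default n=5
all_but_last_2 = s.head(-2)
all_but_first_7 = s.tail(-7)
print(first, all_but_last_2, all_but_first_7, sep='\n\n')
```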

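4.5 Reindexing

.reindex(new_labels, fill_value=...) reorders the data to follow the new labels. Labels that are not in the original index get NaN by default; fill_value sets the value to use for these new labels instead.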

import numpy as np
import pandas as pd

s = pd.Series(np.random.rand(3), index=['a', 'b', 'c'])
print(s, '\n')
s1 = s.reindex(['c', 'b', 'a', 'A'], fill_value=100)
print(s1)

# Output
a    0.692466
b    0.757568
c    0.181863
dtype: float64

c      0.181863
b      0.757568
a      0.692466
A    100.000000
dtype: float64
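Instead of a constant fill_value, reindex() can fill new labels from neighbouring values. Use method='ffill' for a forward fill or method='bfill' for a backward fill; either one requires a monotonic index. A minimal sketch with made-up values:

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=[0, 2, 4])

# Labels 1 and 3 are new; forward fill copies the previous value into them
filled = s.reindex(range(5), method='ffill')
print(filled)
```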

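4.6 Data alignment

When two Series are combined in an operation such as addition, pandas matches their values by index label. This is called data alignment. The result's index is the union of both indexes, and labels that appear in only one of the Series produce NaN.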

import numpy as np
import pandas as pd

s1 = pd.Series(np.random.rand(3), index=['a', 'b', 'c'])
s2 = pd.Series(np.random.rand(3), index=['a', 'c', 'A'])
print(s1, '\n')
print(s2, '\n')
print(s1 + s2)

# Output
a    0.414064
b    0.599441
c    0.579188
dtype: float64

a    0.163382
c    0.095508
A    0.521609
dtype: float64

A         NaN
a    0.577446
b         NaN
c    0.674696
dtype: float64
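If you do not want NaN for labels that exist on only one side, use the method form add(). Its fill_value argument stands in for the missing value before the addition. A sketch with fixed values:

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])
s2 = pd.Series([10.0, 20.0, 30.0], index=['a', 'c', 'A'])

print(s1 + s2)                      # 'b' and 'A' become NaN
total = s1.add(s2, fill_value=0)    # a label missing on one side counts as 0
print(total)
```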

5 Data statistics

5.1 Overview

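The table below lists common statistical functions:

Function	Meaning
aggregate()	Aggregate with one or more functions, e.g. s.aggregate(['min', 'max']).
all()	True if all elements are truthy (logical AND).
any()	True if any element is truthy (logical OR).
idxmin()	Index label of the minimum value.
idxmax()	Index label of the maximum value.
count()	Number of non-missing values; None/NaN are not counted.
cumsum()	Cumulative sum.
cumprod()	Cumulative product.
cov(other)	Covariance with another Series.
corr(other)	Correlation coefficient with another Series.
describe()	Descriptive statistics: returns several common statistics at once.
groupby(by)	Group the data by a key.
kurt()	Kurtosis (unbiased; Fisher's definition, so a normal distribution gives 0).
max()	Maximum.
mean()	Mean.
median()	Median.
min()	Minimum.
mode()	Mode(s); returns a Series because there may be several.
pct_change()	Percentage change from the previous element: (current - previous) / previous.
quantile(q=0.5)	Quantile at q; the median by default.
size	Number of elements, including missing ones (an attribute, not a method).
skew()	Skewness (unbiased).
std()	Standard deviation (sample, ddof=1).
sum()	Sum.
value_counts()	Frequency count: groups equal values and returns the number of elements in each group.
var()	Variance (sample, ddof=1).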

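5.2 Code demonstration

The following code demonstrates the main functions.
Note: the commented-out calls failed in the author's tests because of how they were called:
- aggregate() expects a function or function name rather than a number.
- cov() and corr() need a second Series.
- groupby() needs a valid grouping key.
- size is an attribute rather than a method.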

import numpy as np
import pandas as pd

data = [1, 2, 3, 4, 5, 5, 6, 8, 1, 3, 5, 2, 5, 2]
s = pd.Series(data)
print(s)

# print('s.aggregate()', s.aggregate(3), '\n')
print('s.all()', s.all(), '\n')
print('s.any()', s.any(), '\n')
print('s.idxmin()', s.idxmin(), '\n')
print('s.idxmax()', s.idxmax(), '\n')
print('s.count()', s.count(), '\n')
print('s.cumsum()', s.cumsum(), '\n')
print('s.cumprod()', s.cumprod(), '\n')
# print('s.cov()', s.cov(), '\n')
# print('s.corr()', s.corr(), '\n')
print('s.describe()', s.describe(), '\n')
# print('s.groupby()', s.groupby(5), '\n')
print('s.kurt()', s.kurt(), '\n')
print('s.max()', s.max(), '\n')
print('s.mean()', s.mean(), '\n')
print('s.median()', s.median(), '\n')
print('s.min()', s.min(), '\n')
print('s.mode()', s.mode(), '\n')
print('s.pct_change()', s.pct_change(), '\n')
print('s.quantile()', s.quantile(), '\n')
# print('s.size()', s.size(), '\n')
print('s.skew()', s.skew(), '\n')
print('s.std()', s.std(), '\n')
print('s.sum()', s.sum(), '\n')
print('s.value_counts()', s.value_counts(), '\n')
print('s.var()', s.var(), '\n')

# Output
0     1
1     2
2     3
3     4
4     5
5     5
6     6
7     8
8     1
9     3
10    5
11    2
12    5
13    2
dtype: int64
s.all() True
s.any() True
s.idxmin() 0
s.idxmax() 7
s.count() 14
s.cumsum() 0      1
1      3
2      6
3     10
4     15
5     20
6     26
7     34
8     35
9     38
10    43
11    45
12    50
13    52
dtype: int64
s.cumprod() 0           1
1           2
2           6
3          24
4         120
5         600
6        3600
7       28800
8       28800
9       86400
10     432000
11     864000
12    4320000
13    8640000
dtype: int64
s.describe() count    14.000000
mean      3.714286
std       2.054210
min       1.000000
25%       2.000000
50%       3.500000
75%       5.000000
max       8.000000
dtype: float64
s.kurt() -0.33190548058712066
s.max() 8
s.mean() 3.7142857142857144
s.median() 3.5
s.min() 1
s.mode() 0    5
dtype: int64
s.pct_change() 0          NaN
1     1.000000
2     0.500000
3     0.333333
4     0.250000
5     0.000000
6     0.200000
7     0.333333
8    -0.875000
9     2.000000
10    0.666667
11   -0.600000
12    1.500000
13   -0.600000
dtype: float64
s.quantile() 3.5
s.skew() 0.4487734149006034
s.std() 2.054210364052382
s.sum() 52
s.value_counts() 5    4
2    3
3    2
1    2
8    1
6    1
4    1
dtype: int64
s.var() 4.21978021978022
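The commented-out calls above fail because of how they are called; the methods themselves exist. The sketch below shows working calls on the same data. The second Series, other, is made up for illustration:

```python
import pandas as pd

data = [1, 2, 3, 4, 5, 5, 6, 8, 1, 3, 5, 2, 5, 2]
s = pd.Series(data)
other = pd.Series(range(len(data)))     # hypothetical second Series

agg = s.aggregate(['min', 'max', 'sum'])   # one or more function names
cov = s.cov(other)                         # cov() and corr() need a second Series
corr = s.corr(other)
counts = s.groupby(s).size()               # group by value, count each group
print(agg, cov, corr, s.size, counts, sep='\n\n')  # size is an attribute
```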

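6 Notes

In arithmetic, combining a missing value (None, stored as NaN) with any value gives NaN.
count() and similar functions do not count missing values (None/NaN).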
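The short sketch below illustrates both notes with made-up data. It also shows a related detail: arithmetic operators propagate NaN, while reduction methods such as sum() skip missing values by default (skipna=True).

```python
import pandas as pd

s = pd.Series([1.0, None, 3.0])    # None is stored as NaN

plus = s + 1                        # NaN stays NaN in element-wise arithmetic
print(plus)
print(s.sum())                      # missing values skipped by default
print(s.sum(skipna=False))          # NaN once skipping is turned off
print(s.count(), s.size)            # non-missing count vs. total element count
```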

References

[1] pandas time-series operations: pd.date_range(), https://blog.csdn.net/missyougoon/article/details/83958749
[2] Usage of pd.Series, https://www.cnblogs.com/sparkingplug/p/11409365.html
[3] Pandas time series: generating dates in a specified range, https://blog.csdn.net/bqw18744018044/article/details/80920356
