

pandas: Series and DataFrame

Published: 2023/11/29


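pandas is a powerful Python package that provides a large set of functions and methods for processing and analyzing data.

Before using pandas, install the package and import it with import pandas as pd. The examples below also use NumPy, imported with import numpy as np.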

I. Series

A Series is a one-dimensional labeled array; the labels form its index.

1. Creating a Series

A Series is created with: s = pd.Series(obj, index=..., name=...)

If name is not passed when the Series is created, the name defaults to None; it is not taken from the variable on the left of the = sign (s here).

① From a dict

When a Series is created from a dict, the dict keys become the index. If a key is repeated in the dict, only the last value given for it is kept, so that is the value the Series receives.

    # Creating a Series from a dict
    dic = {'name': 'Alice', 'age': 23, 'age': 20, 'age': 25, 'hobby': 'dance'}
    s = pd.Series(dic, name='dic_Series')
    print(s)
    # name     Alice
    # age         25
    # hobby    dance
    # Name: dic_Series, dtype: object

② From a one-dimensional array, list, or tuple

With this method, the index defaults to integers starting at 0 when index is not given. If index is given, its length must equal the number of elements in the Series, otherwise an error is raised.

    # Creating a Series from a 1-D array, list, or tuple
    arr = np.arange(1, 6)
    s1 = pd.Series(arr)
    s2 = pd.Series(arr, index=list('abcde'), name='iter_Series')
    print(s1.name, s2.name)
    print(s1)
    print('-------------')
    print(s2)
    # None iter_Series
    # 0    1
    # 1    2
    # 2    3
    # 3    4
    # 4    5
    # dtype: int32   (platform-dependent; often int64)
    # -------------
    # a    1
    # b    2
    # c    3
    # d    4
    # e    5
    # Name: iter_Series, dtype: int32
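The length rule above can be checked directly. The following is a minimal sketch, with illustrative variable names, that passes three labels for five values and catches the resulting error instead of letting it stop the script.

```python
import numpy as np
import pandas as pd

values = np.arange(1, 6)  # five elements

try:
    # Only three labels for five values: the lengths do not match.
    pd.Series(values, index=list('abc'))
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

# Show which exception type was raised and its message.
print(type(mismatch_error).__name__, mismatch_error)
```

The exact wording of the message may differ between pandas versions; the exception type is the part worth relying on.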

③ From a scalar

When a Series is created from a scalar, obj is a single fixed value used for every element. In this case index must be given, and its length determines the number of elements.

    # Creating a Series from a scalar
    s = pd.Series('hi', index=list('abc'), name='s_Series')
    print(s)
    # a    hi
    # b    hi
    # c    hi
    # Name: s_Series, dtype: object

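2. Indexing a Series

① Positional indexing

Positions start at 0 and -1 refers to the last element; a slice [m:n] includes m and excludes n. Each element of a numeric Series is a NumPy scalar, for example <class 'numpy.int64'>.

A list of positions, [[m, n, x]], returns the elements at positions m, n, and x; plain Python lists and tuples do not support this. The example uses .iloc, which is always positional. Older pandas versions also accepted s[1] or s[[0, 4]] on a non-integer index, but that positional fallback is deprecated in pandas 2.x.

    # Positional indexing of a Series
    s = pd.Series([1, 2, 3, 4, 5], index=list('abcde'))
    print(s.iloc[1], type(s.iloc[1]))
    print(s.iloc[-2])
    print(s.iloc[1:3])
    print(s.iloc[[0, 4]])
    # 2 <class 'numpy.int64'>
    # 4
    # b    2
    # c    3
    # dtype: int64
    # a    1
    # e    5
    # dtype: int64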

② Label indexing

Unlike positional slicing, a label slice [m:n] includes both m and n. A list of labels, [[m, n, x]], returns the elements labeled m, n, and x.

    # Label indexing of a Series
    s = pd.Series([1, 2, 3, 4, 5], index=list('abcde'))
    print(s['b'])
    print(s['c':'d'])
    print(s[['a', 'e']])
    # 2
    # c    3
    # d    4
    # dtype: int64
    # a    1
    # e    5
    # dtype: int64

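Note that when the labels of a Series are themselves integers, plain [] becomes confusing: in the example below s[3] is looked up by label, while the slice s[2:4] is taken by position. For that reason, custom integer labels are best avoided.

    s = pd.Series([1, 2, 3, 4, 5], index=[1, 2, 3, 4, 5])
    print(s)
    print('------------')
    print(s[3])      # label 3
    print('------------')
    print(s[2:4])    # positions 2 and 3
    # 1    1
    # 2    2
    # 3    3
    # 4    4
    # 5    5
    # dtype: int64
    # ------------
    # 3
    # ------------
    # 3    3
    # 4    4
    # dtype: int64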
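One way to avoid the ambiguity with integer labels is to state the intent explicitly: .loc always works by label and .iloc always works by position. The sketch below uses made-up values (10 to 50) so that labels and values cannot be confused.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=[1, 2, 3, 4, 5])

by_label = s.loc[3]            # element whose label is 3
by_position = s.iloc[3]        # fourth element (position 3)
label_slice = s.loc[2:4]       # labels 2, 3 and 4; the end label is included
position_slice = s.iloc[2:4]   # positions 2 and 3; the end position is excluded

print(by_label, by_position)
print(label_slice)
print(position_slice)
```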

③ Boolean indexing

    # Boolean indexing of a Series
    s = pd.Series([1, 2, 3, 4, 5], index=list('abcde'))
    m = s > 3
    print(m)
    print(s[m])
    # a    False
    # b    False
    # c    False
    # d     True
    # e     True
    # dtype: bool
    # d    4
    # e    5
    # dtype: int64
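Boolean masks can also be combined. A short sketch, assuming the same Series as in the boolean example: the element-wise operators are & (and), | (or) and ~ (not), and each comparison needs its own parentheses because these operators bind more tightly than the comparisons.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=list('abcde'))

middle = s[(s > 1) & (s < 5)]     # both conditions must hold
edges = s[(s == 1) | (s == 5)]    # either condition may hold
not_large = s[~(s > 3)]           # negation of a mask

print(middle)
print(edges)
print(not_large)
```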

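3. Inspecting a Series and common methods

① head() and tail()

The argument defaults to 5, so these return the first 5 and the last 5 elements; a different count can be passed.

    # head() and tail()
    s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    print(s.head(2))
    print(s.tail(3))
    # 0    1
    # 1    2
    # dtype: int64
    # 7     8
    # 8     9
    # 9    10
    # dtype: int64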


② tolist() (also available as to_list())

Converts a Series to a list.

    s = pd.Series(np.random.randint(1, 10, 10))
    print(s.tolist())
    # e.g. [3, 8, 8, 9, 8, 2, 2, 7, 7, 7]   (the values are random)


③ reindex(index, fill_value=NaN)

reindex returns a new Series. For each label in the index argument, the value is kept if the label exists in the original Series; otherwise the value is filled with fill_value, which defaults to NaN.

    # reindex()
    arr = np.arange(1, 6)
    s1 = pd.Series(arr, index=list('abcde'))
    s2 = s1.reindex(['a', 'd', 'f', 'h'], fill_value=0)
    print(s1)
    print(s2)
    # a    1
    # b    2
    # c    3
    # d    4
    # e    5
    # dtype: int32
    # a    1
    # d    4
    # f    0
    # h    0
    # dtype: int32
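For comparison, here is a minimal sketch of the default case where fill_value is left out. Missing labels then receive NaN, and because NaN is a floating-point value, an integer Series comes back as float64.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.arange(1, 6), index=list('abcde'))

# 'f' does not exist in s1, so it is filled with NaN.
s2 = s1.reindex(['a', 'd', 'f'])

print(s2)
print(s2.dtype)   # the integer values were upcast to float to hold NaN
```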

④ + and -

Adding a single value to a Series, or subtracting one, applies the operation to every element.

Adding or subtracting two Series aligns them by index: values with matching labels are combined, and labels that appear in only one of the two are kept with the value NaN.

    # Adding and subtracting Series
    s1 = pd.Series(np.arange(1, 4), index=list('abc'))
    s2 = pd.Series(np.arange(5, 8), index=list('bcd'))
    print(s1 + s2)
    print('--------')
    print(s2 - s1)
    print('--------')
    print(s1 + 10)
    # a    NaN
    # b    7.0
    # c    9.0
    # d    NaN
    # dtype: float64
    # --------
    # a    NaN
    # b    3.0
    # c    3.0
    # d    NaN
    # dtype: float64
    # --------
    # a    11
    # b    12
    # c    13
    # dtype: int32
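When NaN for the non-matching labels is not wanted, the method forms add() and sub() accept a fill_value that stands in for the missing side. A small sketch using the same two Series:

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.arange(1, 4), index=list('abc'))
s2 = pd.Series(np.arange(5, 8), index=list('bcd'))

# A label missing from one Series is treated as 0 before the operation.
total = s1.add(s2, fill_value=0)
difference = s2.sub(s1, fill_value=0)

print(total)
print(difference)
```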

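⑤ Adding elements

Assigning to a new label adds an element and modifies the original Series in place. Assigning by position past the end (the commented-out s[3] = 10 below) raised an out-of-bounds index error in the pandas version used for this example.

    # Adding an element to a Series
    s = pd.Series(np.arange(1, 4), index=list('abc'))
    # s[3] = 10
    s['p'] = 15
    print(s)
    # a     1
    # b     2
    # c     3
    # p    15
    # dtype: int64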

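s1.append(s2) returned a new Series without modifying s1 or s2. Series.append was removed in pandas 2.0; pd.concat([s1, s2]) is the equivalent and is used below.

    # Concatenating two Series (replacement for append())
    s1 = pd.Series(np.arange(1, 3), index=list('ab'))
    s2 = pd.Series(np.arange(3, 5), index=list('mn'))
    a = pd.concat([s1, s2])   # before pandas 2.0: a = s1.append(s2)
    print(s1)
    print(s2)
    print(a)
    # a    1
    # b    2
    # dtype: int32
    # m    3
    # n    4
    # dtype: int32
    # a    1
    # b    2
    # m    3
    # n    4
    # dtype: int32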

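⑥ Removing elements with drop()

Usage: drop(labels, inplace=False) removes the elements whose index labels are given. By default it returns a new Series without those elements and leaves the original unchanged. With inplace=True it modifies the original Series directly and returns None.

    # Removing elements with drop()
    s1 = pd.Series(np.arange(1, 4), index=list('abc'))
    s2 = s1.drop(['a', 'c'])
    print(s1)
    print(s2)
    s3 = pd.Series(np.arange(5, 8), index=list('lmn'))
    s4 = s3.drop('m', inplace=True)   # modifies s3; s4 is None
    print(s3)
    print(s4)
    # a    1
    # b    2
    # c    3
    # dtype: int32
    # b    2
    # dtype: int32
    # l    5
    # n    7
    # dtype: int32
    # None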


II. DataFrame

A DataFrame is a tabular data structure: a two-dimensional labeled array, and the most commonly used data structure in pandas. For a DataFrame named df,

df.index is the row index, df.columns is the column index, and df.values holds the actual values.

    # A sample DataFrame
    dic = {'name': ['alice', 'Bob', 'Jane'], 'age': [23, 26, 25]}
    df = pd.DataFrame(dic)
    print(df)
    print(type(df))
    print(df.index)
    print(df.columns)
    print(df.values)
    #     name  age
    # 0  alice   23
    # 1    Bob   26
    # 2   Jane   25
    # <class 'pandas.core.frame.DataFrame'>
    # RangeIndex(start=0, stop=3, step=1)
    # Index(['name', 'age'], dtype='object')
    # [['alice' 23]
    #  ['Bob' 26]
    #  ['Jane' 25]]

1. Creating a DataFrame

① From a dict, or from a list of dicts

With this method, the dict keys become the column index, and the row index defaults to integers starting at 0.

    # Creating a DataFrame from a list of dicts or a dict of lists
    dic1 = [{'name': 'Alice', 'age': 23}, {'name': 'Bob', 'age': 26}, {'name': 'Jane', 'age': 25}]
    dic2 = {'name': ['alice', 'Bob', 'Jane'], 'age': [23, 26, 25]}
    df1 = pd.DataFrame(dic1)
    df2 = pd.DataFrame(dic2)
    print(df1)
    print('---------------')
    # print(pd.DataFrame(df1, columns=['name', 'age']))
    print(df2)
    #     name  age
    # 0  Alice   23
    # 1    Bob   26
    # 2   Jane   25
    # (older pandas versions sorted these columns alphabetically: age, name)
    # ---------------
    #     name  age
    # 0  alice   23
    # 1    Bob   26
    # 2   Jane   25

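The row index can be set with index at creation time, but the number of labels must equal the number of rows in the DataFrame, otherwise an error is raised.

The column index can be set with columns, and it does not have to match the columns in the data: columns present in both are kept, columns in the original dict or list that are not listed are dropped, and listed columns that are not in the data are kept and filled with NaN.

    # Specifying the row index and the column index
    dic = {'name': ['alice', 'Bob', 'Jane'], 'age': [23, 26, 25]}
    df1 = pd.DataFrame(dic, columns=['name', 'hobby'])
    df2 = pd.DataFrame(dic, index=['a', 'b', 'c'])
    print(df1)
    print(df2)
    #     name hobby
    # 0  alice   NaN
    # 1    Bob   NaN
    # 2   Jane   NaN
    #     name  age
    # a  alice   23
    # b    Bob   26
    # c   Jane   25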
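The two rules above behave differently when the labels do not match the data, which the following sketch illustrates with the same dict (variable names are illustrative): a columns list may name columns that are absent, while an index of the wrong length is rejected.

```python
import pandas as pd

data = {'name': ['alice', 'Bob', 'Jane'], 'age': [23, 26, 25]}

# 'name' is dropped, 'age' is kept, 'hobby' is created and filled with NaN.
subset = pd.DataFrame(data, columns=['age', 'hobby'])
print(subset)

try:
    # Two row labels for three rows of data.
    pd.DataFrame(data, index=['a', 'b'])
    index_error = None
except ValueError as exc:
    index_error = exc

print(type(index_error).__name__)
```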

② From Series

When a DataFrame is created from Series, the Series may have different lengths. The DataFrame is sized to the longest Series, and the missing positions are filled with NaN.

    # Creating a DataFrame from Series
    dic1 = {'one': pd.Series(np.arange(2)), 'two': pd.Series(np.arange(3))}
    dic2 = {'one': pd.Series(np.arange(2), index=['a', 'b']),
            'two': pd.Series(np.arange(3), index=['a', 'b', 'c'])}
    print(pd.DataFrame(dic1))
    print('------------')
    print(pd.DataFrame(dic2))
    #    one  two
    # 0  0.0    0
    # 1  1.0    1
    # 2  NaN    2
    # ------------
    #    one  two
    # a  0.0    0
    # b  1.0    1
    # c  NaN    2

③ From a two-dimensional array

Usage: DataFrame(arr, index=..., columns=...). If index and columns are omitted, both default to integers starting at 0. If they are given, their lengths must equal the number of rows and columns of the array, otherwise an error is raised.

    # Creating a DataFrame from a 2-D array
    arr = np.arange(12).reshape(3, 4)
    df = pd.DataFrame(arr, index=['a', 'b', 'c'], columns=['col1', 'col2', 'col3', 'col4'])
    print(df)
    #    col1  col2  col3  col4
    # a     0     1     2     3
    # b     4     5     6     7
    # c     8     9    10    11

④ From a nested dict

With this method, the outer keys become the column index and the inner keys become the row index.

    # Creating a DataFrame from a nested dict
    dic = {'Chinese': {'Alice': 92, 'Bob': 95, 'Jane': 93},
           'Math': {'Alice': 96, 'Bob': 98, 'Jane': 95}}
    print(pd.DataFrame(dic))
    #        Chinese  Math
    # Alice       92    96
    # Bob         95    98
    # Jane        93    95

2. Indexing a DataFrame

.values returns the content without the index and columns, as a two-dimensional array.

    # Getting the content with .values
    arr = np.arange(12).reshape(3, 4)
    df = pd.DataFrame(arr, index=['a', 'b', 'c'], columns=['one', 'two', 'three', 'four'])
    print(df.values)
    # [[ 0  1  2  3]
    #  [ 4  5  6  7]
    #  [ 8  9 10 11]]

① Column indexing

A single column is selected with df['column']. The result is a Series whose name is the column label and whose index is the index of the original DataFrame.

Several columns are selected with df[['column1', 'column2', ...]]. The result is a DataFrame whose columns are the requested labels and whose index is the index of the original DataFrame.

    # Column indexing of a DataFrame
    arr = np.arange(12).reshape(3, 4)
    df = pd.DataFrame(arr, index=['a', 'b', 'c'], columns=['one', 'two', 'three', 'four'])
    print(df)
    print('-------------')
    print(df['one'], type(df['one']))
    print('-------------')
    print(df[['one', 'three']])
    #    one  two  three  four
    # a    0    1      2     3
    # b    4    5      6     7
    # c    8    9     10    11
    # -------------
    # a    0
    # b    4
    # c    8
    # Name: one, dtype: int32 <class 'pandas.core.series.Series'>
    # -------------
    #    one  three
    # a    0      2
    # b    4      6
    # c    8     10

② Row indexing

A single row is selected with df.loc['row']. The result is a Series whose name is the row label and whose index is the columns of the original DataFrame.

Several rows are selected with df.loc[['row1', 'row2', ...]]. The result is a DataFrame whose columns are those of the original DataFrame and whose index is the requested row labels.

    # Row indexing of a DataFrame with loc[]
    arr = np.arange(12).reshape(3, 4)
    df = pd.DataFrame(arr, index=['a', 'b', 'c'], columns=['one', 'two', 'three', 'four'])
    print(df.loc['a'], type(df.loc['a']))
    print(df.loc[['a', 'c']])
    # one      0
    # two      1
    # three    2
    # four     3
    # Name: a, dtype: int32 <class 'pandas.core.series.Series'>
    #    one  two  three  four
    # a    0    1      2     3
    # c    8    9     10    11

Rows can also be selected with iloc[]. loc[] uses labels as the row index, while iloc[] uses positions (the row number, counted from 0).

    # Row indexing of a DataFrame with iloc[]
    arr = np.arange(12).reshape(3, 4)
    df = pd.DataFrame(arr, index=['a', 'b', 'c'], columns=['one', 'two', 'three', 'four'])
    print(df.iloc[1], type(df.iloc[1]))
    print(df.iloc[[0, 2]])
    # one      4
    # two      5
    # three    6
    # four     7
    # Name: b, dtype: int32 <class 'pandas.core.series.Series'>
    #    one  two  three  four
    # a    0    1      2     3
    # c    8    9     10    11
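The label versus position distinction also affects slices, in the same way as for a Series. A brief sketch on the same 3x4 frame: a loc slice includes its end label, while an iloc slice excludes its end position.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=['a', 'b', 'c'],
                  columns=['one', 'two', 'three', 'four'])

rows_by_label = df.loc['a':'b']          # rows a and b (end label included)
rows_by_position = df.iloc[0:1]          # row a only (end position excluded)
cols_by_label = df.loc[:, 'one':'three'] # columns one, two and three

print(rows_by_label)
print(rows_by_position)
print(cols_by_label)
```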

③ Cell and block indexing

A single cell can be reached in three ways: df['column'].loc['row'], df.loc['row']['column'], or df.loc['row', 'column'].

Block indexing: df[['column1', 'column2', ...]].loc[['row1', 'row2', ...]], df.loc[['row1', 'row2', ...]][['column1', 'column2', ...]], or df.loc[['row1', 'row2', ...], ['column1', 'column2', ...]].

    # Cell and block indexing of a DataFrame
    arr = np.arange(12).reshape(3, 4)
    df = pd.DataFrame(arr, index=['a', 'b', 'c'], columns=['one', 'two', 'three', 'four'])
    print(df)
    print('--------------------------')
    print(df['two'].loc['b'], df.loc['b']['two'], df.loc['b', 'two'])
    print('--------------------------')
    print(df.loc[['a', 'c'], ['one', 'four']])
    #    one  two  three  four
    # a    0    1      2     3
    # b    4    5      6     7
    # c    8    9     10    11
    # --------------------------
    # 5 5 5
    # --------------------------
    #    one  four
    # a    0     3
    # c    8    11
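All three cell forms read the same value, but for assignment the single-call form df.loc[row, column] is the dependable one, since chained forms may end up writing to a temporary object. The sketch below assigns through loc and reads back with the scalar accessors at (by label) and iat (by position); the values 50 and 0 are arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=['a', 'b', 'c'],
                  columns=['one', 'two', 'three', 'four'])

df.loc['b', 'two'] = 50                    # assign one cell
df.loc[['a', 'c'], ['one', 'four']] = 0    # assign a block

cell = df.at['b', 'two']     # scalar access by label
same_cell = df.iat[1, 1]     # scalar access by position
print(df)
print(cell, same_cell)
```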

④ Boolean indexing

Boolean indexing on a single column returns the rows where the mask for that column is True.

    # Boolean indexing on a single column
    arr = np.arange(12).reshape(3, 4)
    df = pd.DataFrame(arr, index=['a', 'b', 'c'], columns=['one', 'two', 'three', 'four'])
    m1 = df['one'] > 5
    print(df)
    print('------------------------')
    print(m1)        # the value for index c is True
    print('------------------------')
    print(df[m1])    # shows the row for index c, with all columns
    #    one  two  three  four
    # a    0    1      2     3
    # b    4    5      6     7
    # c    8    9     10    11
    # ------------------------
    # a    False
    # b    False
    # c     True
    # Name: one, dtype: bool
    # ------------------------
    #    one  two  three  four
    # c    8    9     10    11

Boolean indexing on several columns, or on the whole DataFrame, returns a DataFrame with the same shape as the original. Cells that satisfy the condition show their actual value; all other cells show NaN.

    # Boolean indexing on several columns
    arr = np.arange(12).reshape(3, 4)
    df = pd.DataFrame(arr, index=['a', 'b', 'c'], columns=['one', 'two', 'three', 'four'])
    m1 = df[['one', 'three']] > 5
    print(m1)
    print(df[m1])    # matching cells in columns one and three keep their value; everything else is NaN
    #      one  three
    # a  False  False
    # b  False   True
    # c   True   True
    #    one  two  three  four
    # a  NaN  NaN    NaN   NaN
    # b  NaN  NaN    6.0   NaN
    # c  8.0  NaN   10.0   NaN

    # Boolean indexing on the whole DataFrame
    arr = np.arange(12).reshape(3, 4)
    df = pd.DataFrame(arr, index=['a', 'b', 'c'], columns=['one', 'two', 'three', 'four'])
    m = df > 5
    print(m)
    print(df[m])
    #      one    two  three   four
    # a  False  False  False  False
    # b  False  False   True   True
    # c   True   True   True   True
    #    one  two  three  four
    # a  NaN  NaN    NaN   NaN
    # b  NaN  NaN    6.0   7.0
    # c  8.0  9.0   10.0  11.0

(Boolean indexing built from a row raises pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).)
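The error arises because a mask built from a row is labeled by column names, which cannot be aligned with the row index that df[mask] expects. One workaround, sketched here under the assumption that the goal is to select columns by a condition on one row, is to pass the mask as the column selector of loc.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=['a', 'b', 'c'],
                  columns=['one', 'two', 'three', 'four'])

# Mask over the columns, built from row b: True where the value exceeds 5.
row_mask = df.loc['b'] > 5

# Keep every row, but only the columns where the mask is True.
selected = df.loc[:, row_mask]
print(row_mask)
print(selected)
```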

3. Common DataFrame methods

① .T (transpose)

Transposing a DataFrame turns the original columns into the index and the original index into the columns. In the example below, which uses a single integer dtype and the author's pandas version, the transposed frame shares its data with the original, so a change to either one shows up in the other. This sharing should not be relied on: it does not apply to frames with mixed dtypes or when pandas Copy-on-Write is enabled.

    # Transposing a DataFrame
    arr = np.arange(12).reshape(3, 4)
    df1 = pd.DataFrame(arr, index=['a', 'b', 'c'], columns=['one', 'two', 'three', 'four'])
    df2 = df1.T
    df1.loc['a', 'one'] = 100
    print(df1)
    print(df2)
    df2.loc['two', 'b'] = 500
    print(df1)
    print(df2)
    #    one  two  three  four
    # a  100    1      2     3
    # b    4    5      6     7
    # c    8    9     10    11
    #          a  b   c
    # one    100  4   8
    # two      1  5   9
    # three    2  6  10
    # four     3  7  11
    #    one  two  three  four
    # a  100    1      2     3
    # b    4  500      6     7
    # c    8    9     10    11
    #          a    b   c
    # one    100    4   8
    # two      1  500   9
    # three    2    6  10
    # four     3    7  11
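When the transposed frame must stay independent of the original, taking an explicit copy is one option. A minimal sketch: after .T.copy(), a later change to df1 does not reach df2.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12).reshape(3, 4),
                   index=['a', 'b', 'c'],
                   columns=['one', 'two', 'three', 'four'])

df2 = df1.T.copy()           # independent copy of the transposed data
df1.loc['a', 'one'] = 100    # change the original afterwards

print(df1.loc['a', 'one'])   # the original holds the new value
print(df2.loc['one', 'a'])   # the copy still holds the old value
```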

② Adding and modifying

Adding a column: df['new column'] = [...]. The number of elements must equal the number of rows in the DataFrame, otherwise an error is raised.

Adding a row: df.loc['new row'] = [...]. The number of elements must equal the number of columns in the DataFrame, otherwise an error is raised.

To modify a DataFrame, select the cell or block with the indexing methods from the previous section and assign to it.

    # Adding a row or a column to a DataFrame
    arr = np.arange(12).reshape(3, 4)
    df = pd.DataFrame(arr, index=['a', 'b', 'c'], columns=['one', 'two', 'three', 'four'])
    print(df)
    df['five'] = [11, 22, 33]                    # length must equal the number of rows
    print(df)
    df.loc['d'] = [100, 200, 300, 400, 500]      # length must equal the number of columns
    print(df)
    #    one  two  three  four
    # a    0    1      2     3
    # b    4    5      6     7
    # c    8    9     10    11
    #    one  two  three  four  five
    # a    0    1      2     3    11
    # b    4    5      6     7    22
    # c    8    9     10    11    33
    #    one  two  three  four  five
    # a    0    1      2     3    11
    # b    4    5      6     7    22
    # c    8    9     10    11    33
    # d  100  200    300   400   500

③ Deleting

del df['column'] deletes a column from the original DataFrame directly.

df.drop(labels, axis=0, inplace=False) can delete rows or columns. axis defaults to 0, which deletes rows; axis=1 deletes columns. If the given label does not exist on that axis, an error is raised.

By default (inplace=False) drop returns a new DataFrame and leaves the original unchanged. With inplace=True no new DataFrame is returned (the result is None) and the original is modified directly.

    # Deleting rows or columns from a DataFrame
    arr = np.arange(12).reshape(3, 4)
    df = pd.DataFrame(arr, index=['a', 'b', 'c'], columns=['one', 'two', 'three', 'four'])
    print(df)
    del df['four']
    print(df)          # del removed the column from the original DataFrame
    f = df.drop('c')
    print(f)
    print(df)
    f = df.drop('three', axis=1, inplace=True)
    print(f)
    print(df)
    #    one  two  three  four
    # a    0    1      2     3
    # b    4    5      6     7
    # c    8    9     10    11
    #    one  two  three
    # a    0    1      2
    # b    4    5      6
    # c    8    9     10
    #    one  two  three
    # a    0    1      2
    # b    4    5      6
    #    one  two  three
    # a    0    1      2
    # b    4    5      6
    # c    8    9     10
    # None
    #    one  two
    # a    0    1
    # b    4    5
    # c    8    9
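The error for a missing label is a KeyError. The sketch below shows it, along with two conveniences of drop that the text does not cover and that are added here only as illustration: errors='ignore' skips labels that do not exist, and the columns= keyword names the axis without axis=1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=['a', 'b', 'c'],
                  columns=['one', 'two', 'three', 'four'])

try:
    df.drop('z')              # no row is labeled 'z'
    missing_error = None
except KeyError as exc:
    missing_error = exc

unchanged = df.drop('z', errors='ignore')       # missing label is skipped
fewer_cols = df.drop(columns=['two', 'four'])   # same as axis=1

print(type(missing_error).__name__)
print(fewer_cols)
```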

④ Addition

Adding a single value to a DataFrame, or subtracting one, applies the operation to every element.

Adding or subtracting two DataFrames does not require the same index and columns. Cells whose row and column labels match are combined; rows and columns that appear in only one of the two are kept, with every value set to NaN.

    # Adding or subtracting DataFrames
    arr1 = np.arange(12).reshape(3, 4)
    arr2 = np.arange(12).reshape(4, 3)
    df1 = pd.DataFrame(arr1, index=['a', 'b', 'c'], columns=['one', 'two', 'three', 'four'])
    df2 = pd.DataFrame(arr2, index=['a', 'b', 'c', 'd'], columns=['one', 'two', 'three'])
    print(df1 + 1)
    print(df1 + df2)
    #    one  two  three  four
    # a    1    2      3     4
    # b    5    6      7     8
    # c    9   10     11    12
    #    four   one  three   two
    # a   NaN   0.0    4.0   2.0
    # b   NaN   7.0   11.0   9.0
    # c   NaN  14.0   18.0  16.0
    # d   NaN   NaN    NaN   NaN
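As with Series, the method form add() takes a fill_value for cells that exist in only one of the two frames. A sketch with the same two frames; note that a cell missing from both frames (row d, column four here) still comes out as NaN.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(12).reshape(3, 4),
                   index=['a', 'b', 'c'],
                   columns=['one', 'two', 'three', 'four'])
df2 = pd.DataFrame(np.arange(12).reshape(4, 3),
                   index=['a', 'b', 'c', 'd'],
                   columns=['one', 'two', 'three'])

# A cell present in only one frame is combined with 0 instead of giving NaN.
total = df1.add(df2, fill_value=0)
print(total)
```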

⑤ Sorting

Sorting by value: sort_values('column', ascending=True) orders the rows by the values in one column, ascending by default. To sort by several columns, pass a list: ['column1', 'column2', ...].

Sorting by index: sort_index(ascending=True) orders the rows by their index labels, ascending by default.

    # Sorting a DataFrame
    arr = np.random.randint(1, 10, [4, 3])
    df = pd.DataFrame(arr, index=['a', 'b', 'c', 'd'], columns=['one', 'two', 'three'])
    print(df)
    print(df.sort_values(['one', 'three'], ascending=True))
    print(df.sort_index(ascending=False))
    # Example output (the values are random):
    #    one  two  three
    # a    7    7      1
    # b    5    7      1
    # c    1    9      4
    # d    7    9      9
    #    one  two  three
    # c    1    9      4
    # b    5    7      1
    # a    7    7      1
    # d    7    9      9
    #    one  two  three
    # d    7    9      9
    # c    1    9      4
    # b    5    7      1
    # a    7    7      1
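When sorting by several columns, ascending may also be a list with one flag per column, and sort_index(axis=1) sorts the column labels rather than the row labels. The sketch below uses fixed values (copied from the sample output above) so the result is reproducible.

```python
import pandas as pd

df = pd.DataFrame({'one': [7, 5, 1, 7],
                   'two': [7, 7, 9, 9],
                   'three': [1, 1, 4, 9]},
                  index=['a', 'b', 'c', 'd'])

# 'one' descending; ties in 'one' are broken by 'three' ascending.
mixed = df.sort_values(['one', 'three'], ascending=[False, True])

# Sort the column labels alphabetically instead of the rows.
by_columns = df.sort_index(axis=1)

print(mixed)
print(by_columns)
```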

Reposted from: https://www.cnblogs.com/Forever77/p/11209186.html
