pandas Basics

Published: 2024/9/19 · Programming Q&A · 豆豆

1. pandas data structures

pandas has two main data structures:

Series
DataFrame

A Series is a one-dimensional labeled array built on NumPy's ndarray.

A DataFrame is a two-dimensional table, similar to an Excel sheet.
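As a minimal sketch of the two structures described above (the sample values and the column names `name` and `age` are made up for illustration), the following shows that a Series exposes its data as a NumPy array and that each DataFrame column is a Series:

```python
import numpy as np
import pandas as pd

# A Series wraps one-dimensional data plus an index of labels.
s = pd.Series([10, 20, 30], index=['x', 'y', 'z'])
values = s.to_numpy()            # the underlying data as a NumPy ndarray
print(type(values).__name__)     # ndarray
print(s.ndim)                    # 1

# A DataFrame is a two-dimensional table; each column is itself a Series.
df = pd.DataFrame({'name': ['Bob', 'Steve'], 'age': [30, 25]})
print(df.ndim)                   # 2
print(df.shape)                  # (2, 2)
print(type(df['age']).__name__)  # Series
```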

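1.1 Series

Creating a Series

import numpy as np
import pandas as pd

arr = pd.Series([1, 2, -3, 4, -5, np.nan])  # from a list; np.nan marks a missing value
arr2 = pd.Series(np.arange(6))              # from a NumPy array
d = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
arr3 = pd.Series(d)                         # from a dict; the keys become the index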

Series attributes

arr = pd.Series([1, 2, -3, 4, -5, np.nan])
arr.values  # array([ 1.,  2., -3.,  4., -5., nan])
arr.index   # RangeIndex(start=0, stop=6, step=1)
arr.shape   # dimensions: (6,)

Selecting by position

arr = pd.Series([1, 2, -3, 4, -5, np.nan])
arr[3:]       # elements from position 3 onward
arr.iloc[3:]  # same result, with explicitly positional indexing
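With the default integer index, position and label coincide, which hides the difference between the two kinds of lookup. The sketch below uses a made-up Series with string labels to contrast `iloc` (by position, end exclusive) with `loc` (by label, end inclusive):

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

by_position = s.iloc[1:3]   # positions 1 and 2; the end position is excluded
by_label = s.loc['b':'c']   # labels 'b' through 'c'; the end label is included

print(by_position.tolist())  # [20, 30]
print(by_label.tolist())     # [20, 30]
print(s.iloc[-1])            # 40 (negative positions count from the end)
```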

Counting values

obj = pd.Series(['Bob', 'Steve', 'Jeff', 'Ryan', 'Jeff', 'Ryan'])
obj.value_counts()
# Result:
# Jeff     2
# Ryan     2
# Bob      1
# Steve    1
# dtype: int64
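`value_counts` also takes the `normalize` and `dropna` parameters. The sketch below extends the same list of names with one missing entry (an addition made here for illustration) to show how missing values are treated:

```python
import pandas as pd

obj = pd.Series(['Bob', 'Steve', 'Jeff', 'Ryan', 'Jeff', 'Ryan', None])

counts = obj.value_counts()                    # missing values are excluded by default
shares = obj.value_counts(normalize=True)      # fractions of the non-missing entries
with_missing = obj.value_counts(dropna=False)  # count the missing entry as well

print(counts['Jeff'])      # 2
print(counts.sum())        # 6
print(with_missing.sum())  # 7
```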

Sorting

arr = pd.Series([1, 2, -3, 4, -5, np.nan])
arr.sort_values()  # sort by value
arr.sort_index()   # sort by index
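A short sketch of the sorting options, using the same data: `sort_values` returns a new Series rather than sorting in place, and the `ascending` and `na_position` parameters control the order and where NaN ends up.

```python
import numpy as np
import pandas as pd

arr = pd.Series([1, 2, -3, 4, -5, np.nan])

ascending = arr.sort_values()                     # NaN is placed last by default
descending = arr.sort_values(ascending=False)     # NaN is still placed last
nan_first = arr.sort_values(na_position='first')  # move NaN to the front

print(ascending.index.tolist())   # [4, 2, 0, 1, 3, 5]
print(descending.index.tolist())  # [3, 1, 0, 2, 4, 5]
print(arr.index.tolist())         # [0, 1, 2, 3, 4, 5] (the original is unchanged)
```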

1.2 DataFrame

Creating a DataFrame

df1 = pd.DataFrame(np.random.randn(3, 3), index=list('abc'), columns=list('ABC'))
print(df1)  # each row and each column of a DataFrame is a Series
#           A         B         C
# a -0.454419 -0.606726  0.499842
# b -0.666458  1.231203 -1.460624
# c -0.338414 -1.550477 -0.517511
#-----------------------------------------------------------------------------
# Plain [] indexing selects a column first: df['column name']['row label']
type(df1['A'])    # <class 'pandas.core.series.Series'>
df1['A']['a']     # -0.454419
df1['A'].iloc[0]  # -0.454419 (by position; df1['A'][0] is deprecated on a labeled index)
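Chained access such as `df1['A']['a']` performs two lookups. As an alternative sketch (with fixed values instead of random ones so the results are reproducible), `loc` and `at` reach a single cell in one step, which is also the reliable way to assign to it:

```python
import pandas as pd

df1 = pd.DataFrame(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
    index=list('abc'),
    columns=list('ABC'),
)

chained = df1['A']['a']     # column first, then row label
direct = df1.loc['a', 'A']  # row label and column label in one step
scalar = df1.at['a', 'A']   # scalar accessor for a single cell

print(chained, direct, scalar)  # 1.0 1.0 1.0

df1.loc['a', 'A'] = 100.0       # assign through a single loc call
print(df1.at['a', 'A'])         # 100.0
```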

Basic DataFrame attributes

df1.dtypes   # data type of each column
df1.index    # Index(['a', 'b', 'c'], dtype='object')
df1.columns  # Index(['A', 'B', 'C'], dtype='object')

Using iloc

df1 = pd.DataFrame(np.random.randn(3, 3), index=list('abc'), columns=list('ABC'))
print(df1)
#           A         B         C
# a -0.454419 -0.606726  0.499842
# b -0.666458  1.231203 -1.460624
# c -0.338414 -1.550477 -0.517511
#-------------------------------------------------------------
# iloc selects by integer position, rows first: df.iloc[row position, column position].
# It accepts integers, slices, boolean lists, and callables, but not strings,
# so ['row name', 'column name'] does not work.
df1.iloc[0]  # a single row (Series)
# A   -0.454419
# B   -0.606726
# C    0.499842
# Name: a, dtype: float64
type(df1.iloc[0])  # <class 'pandas.core.series.Series'>
#-------------------------------------------------------------
df1.iloc[[0]]  # a single row (DataFrame); a list of positions keeps the result two-dimensional
#           A         B         C
# a -0.454419 -0.606726  0.499842
type(df1.iloc[[0]])  # <class 'pandas.core.frame.DataFrame'>
#-------------------------------------------------------------
df1.iloc[[0, 1]]  # several rows (DataFrame)
#           A         B         C
# a -0.454419 -0.606726  0.499842
# b -0.666458  1.231203 -1.460624
#-------------------------------------------------------------
df1.iloc[0:2]  # several rows by slice (DataFrame)
#           A         B         C
# a -0.454419 -0.606726  0.499842
# b -0.666458  1.231203 -1.460624
type(df1.iloc[0:2])  # <class 'pandas.core.frame.DataFrame'>
#-------------------------------------------------------------
df1.iloc[[True, False, True]]  # a boolean list selects which rows to keep
#           A         B         C
# a -0.454419 -0.606726  0.499842
# c -0.338414 -1.550477 -0.517511
#-------------------------------------------------------------
df1.iloc[0, 2]  # a single value by row and column position
# 0.499842
#-------------------------------------------------------------
df1.iloc[lambda x: x.index == 'a']  # a callable can also pick the rows
#           A         B         C
# a -0.454419 -0.606726  0.499842
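Since `iloc` does not accept names, label-based selection goes through `loc`. The following sketch (with fixed integer data chosen here for reproducibility) selects the same block both ways and filters rows with a boolean mask:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(9).reshape(3, 3), index=list('abc'), columns=list('ABC'))
#    A  B  C
# a  0  1  2
# b  3  4  5
# c  6  7  8

pos = df1.iloc[0:2, [0, 2]]         # rows at positions 0-1, columns at positions 0 and 2
lab = df1.loc['a':'b', ['A', 'C']]  # the same block by label; the label slice includes 'b'
print(pos.equals(lab))              # True

mask = df1['A'] > 0                 # boolean Series aligned on the row index
print(df1.loc[mask].index.tolist())  # ['b', 'c']
```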

Viewing the first or last rows

df1.head()  # first 5 rows by default
df1.tail()  # last 5 rows by default

Other table operations

# Summary statistics
df1.describe()
#               A         B         C
# count  3.000000  3.000000  3.000000
# mean  -0.486430 -0.308667 -0.492765
# std    0.166348  1.414590  0.980467
# min   -0.666458 -1.550477 -1.460624
# 25%   -0.560438 -1.078601 -0.989068
# 50%   -0.454419 -0.606726 -0.517511
# 75%   -0.396417  0.312238 -0.008835
# max   -0.338414  1.231203  0.499842
#-------------------------------------------------------------
df1.sum(axis=0)  # sum of each column
df1.sum(axis=1)  # sum of each row
#-------------------------------------------------------------
df1.T  # transpose
#-------------------------------------------------------------
df1.apply(lambda x: x * 2)  # element-wise arithmetic
#-------------------------------------------------------------
# The remaining snippets assume a DataFrame df with columns such as 'field1', 'field2', and 'price'.
# Most of these methods return a new object; assign the result to keep it.
df.fillna(value=0)                       # fill missing values with zero
df['price'].fillna(df['price'].mean())   # fill missing values with the column mean
#-------------------------------------------------------------
data1 = df['field1'].isnull()   # find missing values (boolean Series)
data2 = df['field2'].notnull()  # find non-missing values (boolean Series)
#-------------------------------------------------------------
data_notnull = df[df['field2'].notnull()]  # drop rows where 'field2' is missing
#-------------------------------------------------------------
index = airline_data['field'] == 0  # find zero values (boolean Series)
#-------------------------------------------------------------
# Apply a function to the data
df['field1'].map(function)  # map: element by element over one column
df.apply(function, axis=0)  # apply: column by column by default (each column is passed as a Series)
# Example: take the square root of every value
# >>> df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
# >>> df
#    A  B
# 0  4  9
# 1  4  9
# 2  4  9
# >>> df.apply(np.sqrt)
#      A    B
# 0  2.0  3.0
# 1  2.0  3.0
# 2  2.0  3.0
#-------------------------------------------------------------
df['field1'] = df['field1'].str.strip()  # strip leading and trailing whitespace
#-------------------------------------------------------------
df['field1'] = df['field1'].str.lower()  # convert to lower case
#-------------------------------------------------------------
df['field1'].astype('int')  # change the data type
#-------------------------------------------------------------
df.rename(columns={'oldname': 'newname'})  # rename a column
#-------------------------------------------------------------
# Rolling-window sum (sum over every three consecutive values)
df['field'].rolling(3).sum()
# rolling(window, min_periods=None, center=False, ...)
# window      : size of the window
# min_periods : minimum number of observations required to produce a value
# The old top-level function pd.rolling_sum(df, window=3) has been removed from pandas;
# use the method form instead:
df.rolling(window=3).sum()
#-------------------------------------------------------------
# Index label of the minimum value in each column
df = pd.DataFrame({'consumption': [10.51, 103.11, 55.48],
                   'co2_emissions': [37.2, 19.66, 1712]},
                  index=['Pork', 'Wheat Products', 'Beef'])
#                 consumption  co2_emissions
# Pork                  10.51          37.20
# Wheat Products       103.11          19.66
# Beef                  55.48        1712.00
df.idxmin()
# consumption                Pork
# co2_emissions    Wheat Products
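The missing-value and rolling-window snippets above refer to a DataFrame that is never defined. Here is a small runnable sketch on made-up data (the column names `price` and `qty` and all values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'price': [10.0, np.nan, 30.0, 40.0],
    'qty': [1, 2, np.nan, 4],
})

zero_filled = df.fillna(value=0)                      # every NaN becomes 0
mean_filled = df['price'].fillna(df['price'].mean())  # mean of 10, 30, 40
complete_price = df[df['price'].notnull()]            # keep rows where price is present

# Rolling sum over 3 consecutive values; the first two windows are incomplete.
rolled = df['price'].fillna(0).rolling(3).sum()

print(zero_filled.isnull().sum().sum())  # 0
print(len(complete_price))               # 3
print(rolled.tolist())                   # [nan, nan, 40.0, 70.0]
```

Passing `min_periods=1` to `rolling` would produce a value for the incomplete windows as well instead of NaN.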

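2. Reading files

import pandas as pd

data = pd.read_csv('filename', encoding='GB18030')  # set the encoding for GB18030-encoded files
df = pd.read_csv('name.csv', header=1)              # header=1: use the second line as the column names
df = pd.read_excel('name.xlsx')
# read_csv and read_excel already return a DataFrame, so wrapping the result in pd.DataFrame() is unnecessary.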
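To try `read_csv` without a file on disk, the sketch below reads from an in-memory string via `io.StringIO` (the CSV content is made up). It also shows what `header=1` does: the second line becomes the column names and the line before it is skipped.

```python
import io

import pandas as pd

csv_text = "name,score\nBob,90\nSteve,85\n"

# Default header=0: the first line supplies the column names.
df = pd.read_csv(io.StringIO(csv_text))
print(type(df).__name__)   # DataFrame
print(list(df.columns))    # ['name', 'score']

# header=1: the second line supplies the column names instead.
df_h1 = pd.read_csv(io.StringIO(csv_text), header=1)
print(list(df_h1.columns))  # ['Bob', '90']
print(len(df_h1))           # 1
```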
