Using pandas Series, and the basics of ix, loc, and iloc
1. Basic usage of the pandas Series
Straight to the examples:
1. Three ways to construct/initialize a Series:
(1) Build a Series from a list
    import pandas as pd

    my_list = [7, 'Beijing', '19大', 3.1415, -10000, 'Happy']
    s = pd.Series(my_list)
    print(type(s))
    print(s)

Output:

    <class 'pandas.core.series.Series'>
    0          7
    1    Beijing
    2        19大
    3     3.1415
    4     -10000
    5      Happy
    dtype: object

By default pandas uses the integers 0 to n-1 as the index of a Series, but you can also supply your own index. The index can be thought of as the keys of a dict.
    s = pd.Series([7, 'Beijing', '19大', 3.1415, -10000, 'Happy'],
                  index=['A', 'B', 'C', 'D', 'E', 'F'])
    print(s)

Output:

    A          7
    B    Beijing
    C        19大
    D     3.1415
    E     -10000
    F      Happy
    dtype: object

(2) Build a Series from a dict, since a Series is itself essentially a key-value structure
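    # None becomes NaN, which is why the result is float64 rather than int64
    cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
              'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
    apts = pd.Series(cities, name='income')
    print(apts)

Output (recent pandas versions keep the dict's insertion order; older versions sorted the keys):

    Beijing      55000.0
    Shanghai     60000.0
    shenzhen     50000.0
    Hangzhou     20000.0
    Guangzhou    45000.0
    Suzhou           NaN
    Name: income, dtype: float64

(3) Build a Series from a NumPy array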
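    import numpy as np

    d = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
    print(d)

Output (the values are random and will differ on every run):

    a   -0.329401
    b   -0.435921
    c   -0.232267
    d   -0.846713
    e   -0.406585
    dtype: float64

All of the above is fairly straightforward.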
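As a small sketch tying the three constructors together (the labels and values below are made up for illustration and are not from the examples above), the same labelled data can be built from a list, a dict, or a NumPy array, and the result behaves much like a dict whose keys are the index:

```python
import numpy as np
import pandas as pd

labels = ['a', 'b', 'c']

# The same labelled data built three ways (values are illustrative).
from_list = pd.Series([1.0, 2.0, 3.0], index=labels)
from_dict = pd.Series({'a': 1.0, 'b': 2.0, 'c': 3.0})
from_array = pd.Series(np.array([1.0, 2.0, 3.0]), index=labels)

# equals() compares both the index and the values.
print(from_list.equals(from_dict))
print(from_list.equals(from_array))

# Like a dict, a Series exposes its keys and can be converted back.
print(list(from_dict.index))
print(from_dict.to_dict())

# Passing index= together with a dict selects by label;
# labels missing from the dict become NaN.
picked = pd.Series({'a': 1.0, 'b': 2.0, 'c': 3.0}, index=['c', 'a', 'z'])
print(picked)
```

The last call shows that when a dict is given, index= acts as a label lookup rather than as a relabelling of the values, which is different from what happens with a list or array.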
2. Selecting data from a Series
(1) A Series can be treated much like a list, including all the usual slicing operations
    import pandas as pd

    cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
              'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
    apts = pd.Series(cities, name='income')
    print('apts:\n', apts)
    # On a string index, recent pandas versions deprecate apts[3] as a
    # positional lookup, so positions go through .iloc
    print('apts.iloc[3]:\n', apts.iloc[3])
    print('apts.iloc[[3,4,1]]:\n', apts.iloc[[3, 4, 1]])
    print('apts[:-1]:\n', apts[:-1])
    print('apts[1:]+apts[:-1]:\n', apts[1:] + apts[:-1])

Output:

    apts:
     Beijing      55000.0
    Shanghai     60000.0
    shenzhen     50000.0
    Hangzhou     20000.0
    Guangzhou    45000.0
    Suzhou           NaN
    Name: income, dtype: float64
    apts.iloc[3]:
     20000.0
    apts.iloc[[3,4,1]]:
     Hangzhou     20000.0
    Guangzhou    45000.0
    Shanghai     60000.0
    Name: income, dtype: float64
    apts[:-1]:
     Beijing      55000.0
    Shanghai     60000.0
    shenzhen     50000.0
    Hangzhou     20000.0
    Guangzhou    45000.0
    Name: income, dtype: float64
    apts[1:]+apts[:-1]:
     Beijing           NaN
    Guangzhou     90000.0
    Hangzhou      40000.0
    Shanghai     120000.0
    Suzhou            NaN
    shenzhen     100000.0
    Name: income, dtype: float64

The last result is aligned by label, not by position: Beijing appears in only one of the two operands and Suzhou is NaN to begin with, so both come out as NaN.

(2) Data can also be selected by label, as with a dict
    import pandas as pd

    cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
              'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
    apts = pd.Series(cities, name='income')
    print(apts['Shanghai'])
    print('Hangzhou' in apts)    # membership is tested against the index
    print('Chongqing' in apts)

Output:

    60000.0
    True
    False

(3) Much like NumPy: the usual functions such as mean, median, max, and min are available
    import pandas as pd

    cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
              'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
    apts = pd.Series(cities, name='income')
    at_most_50000 = (apts <= 50000)   # boolean mask; NaN compares as False
    print(apts[at_most_50000])
    print(apts.mean())                # NaN is skipped by default

Output:

    shenzhen     50000.0
    Hangzhou     20000.0
    Guangzhou    45000.0
    Name: income, dtype: float64
    46000.0

3. Assigning to Series elements
Assign directly through an index label. Boolean indexing can be used in assignments as well.
    import pandas as pd

    cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
              'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
    apts = pd.Series(cities, name='income')
    print(apts)
    print('Old income of shenzhen:{}'.format(apts['shenzhen']))
    apts['shenzhen'] = 70000          # assign by label
    print('New income of shenzhen:{}'.format(apts['shenzhen']), '\n')
    less_than_50000 = (apts < 50000)
    print(less_than_50000)
    apts[less_than_50000] = 40000     # assign through a boolean mask
    print(apts)

Output:

    Beijing      55000.0
    Shanghai     60000.0
    shenzhen     50000.0
    Hangzhou     20000.0
    Guangzhou    45000.0
    Suzhou           NaN
    Name: income, dtype: float64
    Old income of shenzhen:50000.0
    New income of shenzhen:70000.0

    Beijing      False
    Shanghai     False
    shenzhen     False
    Hangzhou      True
    Guangzhou     True
    Suzhou       False
    Name: income, dtype: bool
    Beijing      55000.0
    Shanghai     60000.0
    shenzhen     70000.0
    Hangzhou     40000.0
    Guangzhou    40000.0
    Suzhou           NaN
    Name: income, dtype: float64

4. Simple handling of missing data in a Series
    import pandas as pd

    cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
              'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
    apts = pd.Series(cities, name='income')
    apts['shenzhen'] = 70000
    less_than_50000 = (apts < 50000)
    apts[less_than_50000] = 40000
    print('apts:\n', apts, '\n')
    print(apts.notnull())            # boolean mask: True where a value is present
    print(apts.isnull())             # boolean mask: True where a value is missing
    print(apts[apts.isnull()])       # select the missing entries with the mask
    apts2 = pd.Series({'Beijing': 10000, 'Shanghai': 8000, 'shenzhen': 6000,
                       'Tianjin': 40000, 'Guangzhou': 7000, 'Chongqing': 30000})
    print('apts2:\n', apts2)
    apts3 = apts + apts2             # labels missing from either side give NaN
    print('apts3:\n', apts3)
    apts3[apts3.isnull()] = 300      # fill the missing entries with a placeholder value (300)
    print(apts3)

Output:

    apts:
     Beijing      55000.0
    Shanghai     60000.0
    shenzhen     70000.0
    Hangzhou     40000.0
    Guangzhou    40000.0
    Suzhou           NaN
    Name: income, dtype: float64

    Beijing       True
    Shanghai      True
    shenzhen      True
    Hangzhou      True
    Guangzhou     True
    Suzhou       False
    Name: income, dtype: bool
    Beijing      False
    Shanghai     False
    shenzhen     False
    Hangzhou     False
    Guangzhou    False
    Suzhou        True
    Name: income, dtype: bool
    Suzhou   NaN
    Name: income, dtype: float64
    apts2:
     Beijing      10000
    Shanghai      8000
    shenzhen      6000
    Tianjin      40000
    Guangzhou     7000
    Chongqing    30000
    dtype: int64
    apts3:
     Beijing      65000.0
    Chongqing        NaN
    Guangzhou    47000.0
    Hangzhou         NaN
    Shanghai     68000.0
    Suzhou           NaN
    Tianjin          NaN
    shenzhen     76000.0
    dtype: float64
    Beijing      65000.0
    Chongqing      300.0
    Guangzhou    47000.0
    Hangzhou       300.0
    Shanghai     68000.0
    Suzhou         300.0
    Tianjin        300.0
    shenzhen     76000.0
    dtype: float64

2. The difference between ix, loc, and iloc in pandas
    import pandas as pd
    import numpy as np

    data = pd.Series(np.arange(10), index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])
    print('data:\n', data, '\n')
    print('data.iloc[:3]:\n', data.iloc[:3], '\n')
    print('data.loc[:3]:\n', data.loc[:3], '\n')
    # .ix was removed in pandas 1.0; this line only runs on older versions
    # print('data.ix[:3]:\n', data.ix[:3], '\n')

Output:

    data:
     49    0
    48    1
    47    2
    46    3
    45    4
    1     5
    2     6
    3     7
    4     8
    5     9
    dtype: int64

    data.iloc[:3]:
     49    0
    48    1
    47    2
    dtype: int64

    data.loc[:3]:
     49    0
    48    1
    47    2
    46    3
    45    4
    1     5
    2     6
    3     7
    dtype: int64

On pandas versions that still have .ix, data.ix[:3] prints the same eight rows as data.loc[:3].

loc: indexes by the labels in the index (it looks up the label itself, not a position); a slice includes both start and end.
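iloc: indexes by position (ordinary zero-based offsets); a slice excludes end.
ix: looks up labels first and falls back to positions only when the index is not entirely integer-valued. With an all-integer index, as in the example above, it behaves like loc and includes end; when it falls back to positions it excludes end. ix was deprecated in pandas 0.20 and removed in pandas 1.0.
To avoid ambiguity, use loc and iloc.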
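The rules above can be checked with a short sketch that uses only loc and iloc. The first Series is the integer-labelled one from the example; the second, string-labelled Series (`s`, with labels 'x', 'y', 'z') is an assumption added here to show the two explicit calls that replace the fallback ix used to perform:

```python
import numpy as np
import pandas as pd

data = pd.Series(np.arange(10), index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])

by_position = data.iloc[:3]   # positions 0, 1, 2; end excluded
by_label = data.loc[:3]       # from the first row through label 3; end included

print(list(by_position.index))
print(list(by_label.index))

# A scalar lookup shows the difference most directly.
print(data.loc[3])    # the value stored under label 3
print(data.iloc[3])   # the value at position 3 (the fourth row)

# Hypothetical string-labelled Series: the same inclusive/exclusive rule applies.
s = pd.Series([10, 20, 30], index=['x', 'y', 'z'])
print(list(s.loc['x':'y'].index))   # label slice, 'y' included
print(list(s.iloc[0:1].index))      # position slice, position 1 excluded
```

Writing the intent explicitly (label or position) is what removes the ambiguity: the same `3` selects a different row depending on which accessor is used.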
This post is meant as a set of notes on pandas syntax.
References:
https://blog.csdn.net/cymy001/article/details/78268721
https://blog.csdn.net/zeroder/article/details/54319021