艺赛旗 (RPA): Pandas Library Basics

Published: 2023/12/14

The previous post covered the basics of numpy; this one covers the basics of Pandas.

Series
from pandas import Series, DataFrame
s1 = Series(['a','b','c','d'])
s1
Out[16]:
0 a
1 b
2 c
3 d
When no index is specified, a default integer index from 0 to N-1 is generated.

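Series infers its dtype from the types of the elements in the list passed in. If every element is an integer, the Series has an integer dtype; if at least one element is a float and the rest are numbers, it has a float dtype; if any element is a string, it has dtype object.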

s1 = Series([1,2,3,4])
s1
Out[23]:
0 1
1 2
2 3
3 4
dtype: int64
s2 = Series([1,2,3,4.0])
s2
Out[25]:
0 1.0
1 2.0
2 3.0
3 4.0
dtype: float64
s3 = Series([1,2,3,'4'])
s3
Out[27]:
0 1
1 2
2 3
3 4
dtype: object
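
The three cases above can be collected into one self-contained script. This is a minimal sketch; the variable names are illustrative, and the final pd.to_numeric call is an addition that is not part of the original session. It shows one way to turn an object Series of number-like values back into a numeric one.

```python
import pandas as pd

# The element types of the input list decide the dtype of the Series.
ints = pd.Series([1, 2, 3, 4])
floats = pd.Series([1, 2, 3, 4.0])
mixed = pd.Series([1, 2, 3, "4"])

for name, s in [("ints", ints), ("floats", floats), ("mixed", mixed)]:
    print(name, s.dtype)

# An object Series holding number-like values can be converted explicitly.
converted = pd.to_numeric(mixed)
print(converted.tolist())
```

Checking dtype right after construction is a cheap way to catch a stray string in otherwise numeric data.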
Besides a list, a Series can also be created from a dict.

s1 = Series({'a':1,'b':2,'c':3,'d':4})
s1
Out[37]:
a 1
b 2
c 3
d 4
dtype: int64
When a Series is created from a dict, the keys of the dict become the index of the Series and the values of the dict become its values.

A Series can also be created from a dict together with a list of index labels:

dict1 = {'a':1,'b':2,'c':3,'d':4}
index1 = ['a','b','e']
s1 = Series(dict1,index=index1)
s1
Out[51]:
a 1.0
b 2.0
e NaN
dtype: float64
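In the code above, the index of the new Series s1 is set to index1, that is 'a', 'b', and 'e'. The values of s1 are the values in dict1 whose keys match the labels in index1; a label with no match gets NaN. The label 'e' has no matching key in dict1, so its value is NaN, while 'a' and 'b' both match and get the values 1 and 2. Because NaN is a floating-point value, the result has dtype float64 and the matched values are shown as 1.0 and 2.0.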
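
As a small sketch of how the unmatched labels can be found afterwards (the names prices, missing, and present are illustrative and not from the original), isna() marks the labels that had no key in the dict and dropna() removes them:

```python
import pandas as pd

prices = {"a": 1, "b": 2, "c": 3, "d": 4}
s = pd.Series(prices, index=["a", "b", "e"])

# Labels that had no matching key in the dict hold NaN.
missing = s[s.isna()].index.tolist()

# Dropping the NaN rows makes it possible to go back to an integer dtype.
present = s.dropna().astype("int64")

print(missing)
print(present.tolist())
```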

Accessing Series values by index:

s1 = Series({'a':1,'b':2,'c':3,'d':4})
s1
Out[39]:
a 1
b 2
c 3
d 4
dtype: int64
s1['b']
Out[40]: 2
In the code above, s1['b'] returns the value stored under the index label b.
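
Beyond plain square brackets, pandas also offers explicit accessors. The sketch below goes slightly past the original text: .loc looks a value up by label, .iloc by position, and get() returns a default instead of raising when the label is absent. The variable names are illustrative.

```python
import pandas as pd

s = pd.Series({"a": 1, "b": 2, "c": 3, "d": 4})

by_label = s.loc["b"]      # lookup by index label
by_position = s.iloc[1]    # lookup by integer position
has_e = "e" in s           # membership is tested against the index
fallback = s.get("e", 0)   # default value when the label is missing

print(by_label, by_position, has_e, fallback)
```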

Series supports logical and arithmetic operations:

s1 = Series([2,5,-10,200])
s1 * 2
Out[53]:
0 4
1 10
2 -20
3 400
dtype: int64
s1[s1>0]
Out[54]:
0 2
1 5
3 200
dtype: int64
An arithmetic operation on a Series is applied to every element of the Series.

A comparison on a Series is also evaluated element by element. It returns a new Series of bool values and leaves the original Series unchanged.

s1 = Series([2,5,-10,200])
s1
Out[10]:
0 2
1 5
2 -10
3 200
dtype: int64
s1 > 0
Out[11]:
0 True
1 True
2 False
3 True
dtype: bool
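The boolean result of a comparison can be used to filter out data that does not meet a condition, for example to keep only the elements greater than 0 in the example above: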

s1 = Series([2,5,-10,200])
s1[s1 >0]
Out[23]:
0 2
1 5
3 200
dtype: int64
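
Conditions can also be combined. A minimal sketch, with the upper bound of 100 chosen only for illustration: each comparison needs its own parentheses, & means "and", | means "or", and ~ negates a mask.

```python
import pandas as pd

s = pd.Series([2, 5, -10, 200])

# Each comparison yields a bool Series; & combines them element-wise.
in_range = (s > 0) & (s < 100)

kept = s[in_range]       # rows where the mask is True
dropped = s[~in_range]   # ~ inverts the mask

print(kept.tolist())
print(dropped.tolist())
```

Python's and/or keywords do not work on a Series, which is why the element-wise operators are used here.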
A Series object and its index each have a name attribute, which can be set as follows:

fruit = {0:'apple',1:'orange',2:'banana'}
fruitSeries = Series(fruit)
fruitSeries.name='Fruit'
fruitSeries
Out[27]:
0 apple
1 orange
2 banana
Name: Fruit, dtype: object
fruitSeries.index.name='Fruit Index'
fruitSeries
Out[29]:
Fruit Index
0 apple
1 orange
2 banana
Name: Fruit, dtype: object
The index of a Series can be replaced directly by assigning to its index attribute:

fruitSeries.index=['a','b','c']
fruitSeries
Out[31]:
a apple
b orange
c banana
Name: Fruit, dtype: object
DataFrame
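A DataFrame is a tabular data structure much like a table in a relational database: it is made up of rows and columns and has attributes such as column names and an index.

The columns of a DataFrame can be thought of as the Series objects described above: there is one Series per column, and they all share the same index.

Creating a DataFrame from a dict: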

data = {'fruit':['Apple','Apple','Orange','Orange','Banana'],
'year':[2010,2011,2012,2011,2012],
'sale':[15000,17000,36000,24000,29000]}
frame = DataFrame(data)
frame
Out[12]:
fruit year sale
0 Apple 2010 15000
1 Apple 2011 17000
2 Orange 2012 36000
3 Orange 2011 24000
4 Banana 2012 29000
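When a DataFrame is created this way, the values in the dict should be lists (or other array-like objects) of equal length; if the lengths differ, pandas raises a ValueError. Here, the lists for the keys fruit, year, and sale must all have the same length.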
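
The length requirement can be checked with a small sketch. The data here is deliberately broken (two fruits, three years) and is not from the original example; the exact wording of the error message varies between pandas versions, so it is only printed.

```python
import pandas as pd

# 'fruit' has two entries but 'year' has three, so construction fails.
bad_data = {"fruit": ["Apple", "Orange"], "year": [2010, 2011, 2012]}

try:
    pd.DataFrame(bad_data)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```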

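As with a Series, an index is added automatically when a DataFrame is created.

The columns parameter specifies the order of the columns.

The index of a DataFrame can also be set, again by passing a list:

frame = DataFrame(data,columns=['sale','fruit','year'],index=[4,3,2,1,0])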
frame
Out[22]:
sale fruit year
4 15000 Apple 2010
3 17000 Apple 2011
2 36000 Orange 2012
1 24000 Orange 2011
0 29000 Banana 2012
Passing [4,3,2,1,0] changes the index from 0,1,2,3,4 to 4,3,2,1,0. The rows stay in their original order; only their labels change.
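
A shortened sketch of the same idea, using three rows instead of five: the index list only labels the rows, so the first row is still the first row, and sort_index() is one way to reorder the rows by label afterwards. The use of sort_index() is an addition, not part of the original session.

```python
import pandas as pd

data = {"fruit": ["Apple", "Orange", "Banana"],
        "sale": [15000, 36000, 29000]}

# columns fixes the column order; index supplies the row labels.
frame = pd.DataFrame(data, columns=["sale", "fruit"], index=[2, 1, 0])

first_by_position = frame["fruit"].iloc[0]   # still the first input row
fruit_for_label_2 = frame.loc[2, "fruit"]    # label 2 names that same row

# Sorting by label reorders the rows themselves.
sorted_frame = frame.sort_index()
print(sorted_frame)
```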

Getting a Series from a DataFrame:

frame['year']
Out[26]:
0 2010
1 2011
2 2012
3 2011
4 2012
Name: year, dtype: int64
frame['fruit']
Out[27]:
0 Apple
1 Apple
2 Orange
3 Orange
4 Banana
Name: fruit, dtype: object
Both frame['fruit'] and frame.fruit select a column, and both return a Series.
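
The two access styles are not fully interchangeable. In the sketch below, the column named count is a hypothetical example chosen because it collides with the DataFrame.count method: attribute access then returns the method, while square brackets still return the column. Passing a list of names returns a DataFrame rather than a Series.

```python
import pandas as pd

frame = pd.DataFrame({"fruit": ["Apple", "Orange"], "count": [3, 5]})

# For an ordinary column name both styles give the same Series.
same = frame["fruit"].equals(frame.fruit)
is_series = isinstance(frame["fruit"], pd.Series)

# 'count' is also a DataFrame method, so the attribute is the method.
count_attr_is_method = callable(frame.count)
count_column = frame["count"]

# A list of column names selects a sub-DataFrame.
subset = frame[["fruit", "count"]]
print(type(subset).__name__)
```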

Assigning to a DataFrame means assigning to a column: select the column of the DataFrame, then assign to it to change its values:

data = {'fruit':['Apple','Apple','Orange','Orange','Banana'],
'year':[2010,2011,2012,2011,2012],
'sale':[15000,17000,36000,24000,29000]}
frame = DataFrame(data,columns=['sale','fruit','year','price'])
frame
Out[24]:
sale fruit year price
0 15000 Apple 2010 NaN
1 17000 Apple 2011 NaN
2 36000 Orange 2012 NaN
3 24000 Orange 2011 NaN
4 29000 Banana 2012 NaN
frame['price'] = 20
frame
Out[26]:
sale fruit year price
0 15000 Apple 2010 20
1 17000 Apple 2011 20
2 36000 Orange 2012 20
3 24000 Orange 2011 20
4 29000 Banana 2012 20
frame.price = 40
frame
Out[28]:
sale fruit year price
0 15000 Apple 2010 40
1 17000 Apple 2011 40
2 36000 Orange 2012 40
3 24000 Orange 2011 40
4 29000 Banana 2012 40
import numpy as np  # needed for np.arange
frame.price=np.arange(5)
frame
Out[30]:
sale fruit year price
0 15000 Apple 2010 0
1 17000 Apple 2011 1
2 36000 Orange 2012 2
3 24000 Orange 2011 3
4 29000 Banana 2012 4
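frame['price'] or frame.price selects the price column, and frame['price'] = 20 or frame.price = 20 sets every value in that column to 20.

A column can also be assigned from numpy's arange function, as in the code above.

A Series can also be assigned to a DataFrame column: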

data = {'fruit':['Apple','Apple','Orange','Orange','Banana'],
'year':[2010,2011,2012,2011,2012],
'sale':[15000,17000,36000,24000,29000]}
frame = DataFrame(data,columns=['sale','fruit','year','price'])
frame
Out[6]:
sale fruit year price
0 15000 Apple 2010 NaN
1 17000 Apple 2011 NaN
2 36000 Orange 2012 NaN
3 24000 Orange 2011 NaN
4 29000 Banana 2012 NaN
priceSeries = Series([3.4,4.2,2.4],index = [1,2,4])
frame.price = priceSeries
frame
Out[9]:
sale fruit year price
0 15000 Apple 2010 NaN
1 17000 Apple 2011 3.4
2 36000 Orange 2012 4.2
3 24000 Orange 2011 NaN
4 29000 Banana 2012 2.4
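With this kind of assignment, the index of the DataFrame is matched against the index of the Series automatically. Values are placed at the matching index labels, and positions with no match are filled with the missing value NaN.

When the Series is created without an explicit index, it gets the default index 0, 1, 2, so the values land in the rows with those labels: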

priceSeries = Series([3.4,4.2,2.4])
frame.price = priceSeries
frame
Out[12]:
sale fruit year price
0 15000 Apple 2010 3.4
1 17000 Apple 2011 4.2
2 36000 Orange 2012 2.4
3 24000 Orange 2011 NaN
4 29000 Banana 2012 NaN
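
The label matching is easiest to see when the Series index is out of order. This is a minimal sketch with a three-row frame; the column name price_by_position and the use of to_numpy() to bypass alignment are additions for illustration.

```python
import pandas as pd

frame = pd.DataFrame({"sale": [15000, 17000, 36000]})  # index 0, 1, 2

# Values are placed by label, not by their order in the Series.
frame["price"] = pd.Series([3.4, 4.2], index=[2, 0])

# Converting to a plain array drops the labels, so assignment is positional.
other = pd.Series([1.0, 2.0, 3.0], index=[9, 8, 7])
frame["price_by_position"] = other.to_numpy()

print(frame)
```

Row 1 has no matching label in the first Series, so its price is NaN.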
A DataFrame column can also be assigned from a list or an array, but its length must match the number of rows in the DataFrame:

priceList=[3.4,2.4,4.6,3.8,7.3]
frame.price=priceList
frame
Out[15]:
sale fruit year price
0 15000 Apple 2010 3.4
1 17000 Apple 2011 2.4
2 36000 Orange 2012 4.6
3 24000 Orange 2011 3.8
4 29000 Banana 2012 7.3
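
For contrast, a short sketch of what happens when the list is too short. The three-row frame is illustrative; unlike a Series, a plain list carries no labels, so pandas cannot fill the gap with NaN and raises a ValueError instead.

```python
import pandas as pd

frame = pd.DataFrame({"sale": [15000, 17000, 36000]})

try:
    frame["price"] = [3.4, 2.4]   # two values for three rows
    raised = False
except ValueError:
    raised = True

print(raised, list(frame.columns))
```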
Assigning to a column that does not exist creates a new column:

frame['total'] = 30000
frame
Out[45]:
sale fruit year price total
0 15000 Apple 2010 3.4 30000
1 17000 Apple 2011 2.4 30000
2 36000 Orange 2012 4.6 30000
3 24000 Orange 2011 3.8 30000
4 29000 Banana 2012 7.3 30000
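The example above adds a new column, total, by assigning to a column that does not exist yet. A new column must be created with frame['total']; frame.total = ... does not create one. It only sets an ordinary Python attribute on the DataFrame object, which is why the value can be read back through frame.total but never appears when the DataFrame is printed. The code below assigns first with frame.total and then with frame['total'], starting from a frame that has no total column: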

frame
Out[60]:
sale fruit year price
0 15000 Apple 2010 3.4
1 17000 Apple 2011 2.4
2 36000 Orange 2012 4.6
3 24000 Orange 2011 3.8
4 29000 Banana 2012 7.3
frame.total = 20
frame
Out[62]:
sale fruit year price
0 15000 Apple 2010 3.4
1 17000 Apple 2011 2.4
2 36000 Orange 2012 4.6
3 24000 Orange 2011 3.8
4 29000 Banana 2012 7.3
frame['total'] = 20
frame
Out[64]:
sale fruit year price total
0 15000 Apple 2010 3.4 20
1 17000 Apple 2011 2.4 20
2 36000 Orange 2012 4.6 20
3 24000 Orange 2011 3.8 20
4 29000 Banana 2012 7.3 20
After the frame.total assignment no total column is shown, because none was created; after the frame['total'] assignment the total column is present.
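
The difference can be confirmed by inspecting frame.columns. This is a minimal sketch with a two-row frame; the warnings filter is a precaution, on the assumption that some pandas versions warn about attribute-style column creation in certain cases.

```python
import warnings

import pandas as pd

frame = pd.DataFrame({"sale": [15000, 17000]})

with warnings.catch_warnings():
    warnings.simplefilter("ignore")   # precaution: pandas may warn here
    frame.total = 20                  # sets a Python attribute only

attr_made_column = "total" in frame.columns

frame["total"] = 20                   # creates a real column
item_made_column = "total" in frame.columns

print(attr_made_column, item_made_column)
```

Checking membership in frame.columns is a more reliable test than printing the frame.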

If anything above is wrong, please point it out so that it can be corrected. Thank you!
