
Pandas Basics Review: DataFrame

Published: 2025/6/15 · Programming Q&A · 豆豆

This article was collected and compiled by 生活随笔. It reviews the basics of the Pandas DataFrame and is shared here as a reference.

Data type: DataFrame

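A DataFrame is a tabular data type made up of several Series columns that all share one common row index.
It has both a row index and a column index:
- Row index: identifies each (horizontal) row; called index; axis 0, axis=0
- Column index: identifies each (vertical) column; called columns; axis 1, axis=1
A DataFrame can be viewed as a two-dimensional labeled array.
Each column may hold values of a different type.
Basic operations work like those on a Series and are driven by the row and column labels.
It is normally used for two-dimensional data, but nesting can also represent higher-dimensional data (this is very rarely done).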
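A minimal sketch of the two axes, using a small made-up frame (the labels r1, r2, r3 and the columns x, y, z are illustrative only). Aggregating with axis=0 collapses the rows and gives one result per column; axis=1 collapses the columns and gives one result per row.

```python
import pandas as pd

# Hypothetical example data: three rows, columns of different types
df = pd.DataFrame(
    {"x": [1, 2, 3], "y": [10.0, 20.0, 30.0], "z": ["a", "b", "c"]},
    index=["r1", "r2", "r3"],
)

print(df.index)    # row labels (axis 0)
print(df.columns)  # column labels (axis 1)
print(df.dtypes)   # each column keeps its own dtype

numeric = df[["x", "y"]]
col_totals = numeric.sum(axis=0)  # collapse the rows: one value per column
row_totals = numeric.sum(axis=1)  # collapse the columns: one value per row
print(col_totals)
print(row_totals)
```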

Creating a DataFrame

From a Python list

import pandas as pd
df = pd.DataFrame([True, 1, 2.3, 'a', '你好'])  # 1-D list: a single column
df

	0
0	True
1	1
2	2.3
3	a
4	你好

df = pd.DataFrame([[True, 1, 2.3, 'a', '你好'], [1, 2, 3, 4, 5]])  # 2-D list: one inner list per row
df

	0	1	2	3	4
0	True	1	2.3	a	你好
1	1	2	3.0	4	5

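# 3-D (one more level of nesting), not recommended: each cell ends up holding a whole list
df = pd.DataFrame([[[True, 1, 2.3, 'a', '你好'], [1, 2, 3, 4, 5]],
                   [[True, 1, 2.3, 'a', '你好'], [1, 2, 3, 4, 5]]])
df

	0	1
0	[True, 1, 2.3, a, 你好]	[1, 2, 3, 4, 5]
1	[True, 1, 2.3, a, 你好]	[1, 2, 3, 4, 5]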
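Column types are inferred one column at a time, which is why column 2 of the 2-D example prints 3.0: it mixes 2.3 and 3 and becomes float64. The "3-D" case is still a two-dimensional frame whose cells are Python list objects. A quick check, reusing the article's data:

```python
import pandas as pd

df2d = pd.DataFrame([[True, 1, 2.3, 'a', '你好'], [1, 2, 3, 4, 5]])
print(df2d.dtypes)  # dtype is inferred separately for each column

df3d = pd.DataFrame([[[True, 1, 2.3, 'a', '你好'], [1, 2, 3, 4, 5]],
                     [[True, 1, 2.3, 'a', '你好'], [1, 2, 3, 4, 5]]])
print(df3d.shape)             # still 2 rows x 2 columns
print(type(df3d.iloc[0, 0]))  # each cell is a plain Python list
```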

From a Python dict

df = pd.DataFrame({'one': [1, 2, 3, 4], 'two': [9, 8, 7, 6]})
df

	one	two
0	1	9
1	2	8
2	3	7
3	4	6

# Custom row index
df = pd.DataFrame({'one': [1, 2, 3, 4], 'two': [9, 8, 7, 6]}, index=['a', 'b', 'c', 'd'])
df

	one	two
a	1	9
b	2	8
c	3	7
d	4	6

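# Scalar values are broadcast to every row; at least one list-like value is needed to set the number of rows
df = pd.DataFrame({'A': 1,
                   'B': 2.3,
                   'C': ['x', 'y', 5]})
df

	A	B	C
0	1	2.3	x
1	1	2.3	y
2	1	2.3	5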
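A small sketch of the broadcasting rule: scalars are repeated to the length of the list-like column, and if every value is a scalar, pandas raises a ValueError unless you pass an index to supply the row count. The row labels r1 and r2 are made up for illustration.

```python
import pandas as pd

# Scalars 'A' and 'B' are repeated to match the length of list 'C'
df = pd.DataFrame({'A': 1, 'B': 2.3, 'C': ['x', 'y', 5]})
print(df)

# With only scalars there is no length to broadcast to, so pandas raises
error = None
try:
    pd.DataFrame({'A': 1, 'B': 2.3})
except ValueError as exc:
    error = exc
print("ValueError:", error)

# An explicit index supplies the number of rows
df_scalars = pd.DataFrame({'A': 1, 'B': 2.3}, index=['r1', 'r2'])
print(df_scalars)
```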

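dt = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
      'two': pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])}
dt

{'one': a    1
 b    2
 c    3
 dtype: int64,
 'two': a    9
 b    8
 c    7
 d    6
 dtype: int64}

# The keys one/two become the column index and the labels a, b, c, d become the row index:
# each dict entry is one DataFrame column, and each label inside a Series is one row.
# 'one' has no label d, so that cell is NaN and the column is shown as float.
d = pd.DataFrame(dt)
d

	one	two
a	1.0	9
b	2.0	8
c	3.0	7
d	NaN	6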

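# Data are aligned automatically on the row and column labels: index and columns choose which labels
# to keep, and a label with no data (here 'three') is filled with NaN
d_2 = pd.DataFrame(dt, index=['b', 'c', 'd'], columns=['two', 'three'])
d_2

	two	three
b	8	NaN
c	7	NaN
d	6	NaN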
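The alignment rules above can be checked directly with the article's own dt dictionary. The row index of d is the union of the two Series' labels, and index/columns on d_2 act as a selection whose unknown labels become NaN.

```python
import pandas as pd

dt = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
      'two': pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])}

# Union of the Series labels becomes the row index
d = pd.DataFrame(dt)
print(d)

# index/columns select (and reorder) labels; 'three' is not in dt, so it is all NaN
d_2 = pd.DataFrame(dt, index=['b', 'c', 'd'], columns=['two', 'three'])
print(d_2)
```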

From a NumPy ndarray

import numpy as np
df = pd.DataFrame(np.arange(10).reshape(2, 5))  # row and column labels are generated automatically (0, 1, 2, ...)
df

	0	1	2	3	4
0	0	1	2	3	4
1	5	6	7	8	9

# Custom row and column labels (randn draws random numbers, so your values will differ)
df = pd.DataFrame(np.random.randn(6, 4), index=[1, 2, 3, 4, 5, 6], columns=['a', 'b', 'c', 'd'])
df

	a	b	c	d
1	0.274340	0.296507	0.751198	0.763512
2	0.181134	0.675380	0.553695	0.632163
3	-0.059765	0.347702	1.138297	-0.143998
4	-1.370677	-0.951640	0.135964	-0.665875
5	1.490610	0.420539	0.628784	2.119896
6	-1.669737	1.167765	1.254722	-0.948624
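Because randn returns different numbers on every run, a seeded generator is one way to make the example reproducible; this sketch swaps in np.random.default_rng with an arbitrary seed of 42. It also shows that the labels must match the array's shape.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # arbitrary seed so the values repeat across runs
arr = rng.standard_normal((6, 4))
df = pd.DataFrame(arr, index=[1, 2, 3, 4, 5, 6], columns=['a', 'b', 'c', 'd'])
print(df.shape)

# Labels must match the array's shape: 4 columns cannot take only 3 names
shape_error = None
try:
    pd.DataFrame(arr, columns=['a', 'b', 'c'])
except ValueError as exc:
    shape_error = exc
print(shape_error)
```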

From Series

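# In a list of Series, each Series becomes one row and the union of their labels (0 to 3) becomes the columns
e = pd.DataFrame([pd.Series([1, 2, 3]), pd.Series([9, 8, 7, 6])], index=['a', 'b'])
e

	0	1	2	3
a	1.0	2.0	3.0	NaN
b	9.0	8.0	7.0	6.0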
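The orientation depends on the container: a list of Series stacks them as rows, while a dict of Series (as in the dict section) places them as columns. A side-by-side sketch with the same two Series; the keys 'a' and 'b' are illustrative.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3])
s2 = pd.Series([9, 8, 7, 6])

# List of Series: each Series is a ROW, its labels become the columns
rows = pd.DataFrame([s1, s2], index=['a', 'b'])

# Dict of Series: each Series is a COLUMN, its labels become the rows
cols = pd.DataFrame({'a': s1, 'b': s2})

print(rows)
print(cols)
```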

DataFrame attributes

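The example uses Chinese column names: 姓名 (name), 性別 (gender), 年齡 (age) and 地址 (address).

di = {'姓名': ['張三', '李四', '王五', '趙六'],
      '性別': ['男', '女', '女', '男'],
      '年齡': [12, 22, 32, 42],
      '地址': ['北京', '上海', '廣州', '深圳']}
di

{'地址': ['北京', '上海', '廣州', '深圳'],
 '姓名': ['張三', '李四', '王五', '趙六'],
 '年齡': [12, 22, 32, 42],
 '性別': ['男', '女', '女', '男']}

d = pd.DataFrame(di, index=['d1', 'd2', 'd3', 'd4'])
d

	地址	姓名	年齡	性別
d1	北京	張三	12	男
d2	上海	李四	22	女
d3	廣州	王五	32	女
d4	深圳	趙六	42	男

Note: these outputs come from an older pandas version that sorted dict keys into columns. Since pandas 0.23 (on Python 3.6+), the columns follow the dict's insertion order instead: 姓名, 性別, 年齡, 地址.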

d.head()  # show the first rows (5 by default, so all 4 rows here)

	地址	姓名	年齡	性別
d1	北京	張三	12	男
d2	上海	李四	22	女
d3	廣州	王五	32	女
d4	深圳	趙六	42	男

d.tail(3)  # show the last 3 rows

	地址	姓名	年齡	性別
d2	上海	李四	22	女
d3	廣州	王五	32	女
d4	深圳	趙六	42	男

d.info()  # summary: index, columns, non-null counts, dtypes, memory usage
<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, d1 to d4
Data columns (total 4 columns):
地址    4 non-null object
姓名    4 non-null object
年齡    4 non-null int64
性別    4 non-null object
dtypes: int64(1), object(3)
memory usage: 160.0+ bytes

d.shape  # (number of rows, number of columns)
(4, 4)

d.dtypes  # dtype of each column
地址    object
姓名    object
年齡     int64
性別    object
dtype: object

d.index  # row index
Index(['d1', 'd2', 'd3', 'd4'], dtype='object')

d.columns  # column index
Index(['地址', '姓名', '年齡', '性別'], dtype='object')

d.values  # the data as a NumPy array
array([['北京', '張三', 12, '男'],
       ['上海', '李四', 22, '女'],
       ['廣州', '王五', 32, '女'],
       ['深圳', '趙六', 42, '男']], dtype=object)
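The same attributes on a current pandas version, as a quick sketch with the article's data. Here the column order follows the dict, and to_numpy() is used because the pandas documentation recommends it over .values; exact dtype names for the text columns may vary between versions, so they are not relied on.

```python
import pandas as pd

di = {'姓名': ['張三', '李四', '王五', '趙六'],
      '性別': ['男', '女', '女', '男'],
      '年齡': [12, 22, 32, 42],
      '地址': ['北京', '上海', '廣州', '深圳']}
d = pd.DataFrame(di, index=['d1', 'd2', 'd3', 'd4'])

print(list(d.columns))  # insertion order of the dict keys
print(d.shape)          # (rows, columns)
print(d.head(2))        # first 2 rows
print(d.tail(1))        # last row
arr = d.to_numpy()      # mixed column types give an object array
print(arr)
```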

DataFrame CRUD: read, create, update, delete

Read

List/ndarray-style access

dates = pd.date_range('20130101', periods=10)
dates
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06', '2013-01-07', '2013-01-08',
               '2013-01-09', '2013-01-10'],
              dtype='datetime64[ns]', freq='D')

df = pd.DataFrame(np.random.randn(10, 4), index=dates, columns=['A', 'B', 'C', 'D'])
df

	A	B	C	D
2013-01-01	0.754077	-0.346202	-0.557050	0.778106
2013-01-02	0.103394	-1.051044	-0.413054	0.268955
2013-01-03	0.174730	2.056007	1.781379	1.643397
2013-01-04	-0.950517	-0.226887	-0.097138	-0.442010
2013-01-05	0.076178	-0.518970	1.142290	-0.952401
2013-01-06	1.371702	-1.028873	-1.470106	-0.113098
2013-01-07	0.126720	-0.251519	-2.212507	1.050036
2013-01-08	-1.246918	1.530266	1.761499	0.940741
2013-01-09	0.941099	-2.420932	1.927863	-0.549143
2013-01-10	1.951555	-0.264012	-0.171690	0.869293

# Select a column
df['A']
2013-01-01    0.754077
2013-01-02    0.103394
2013-01-03    0.174730
2013-01-04   -0.950517
2013-01-05    0.076178
2013-01-06    1.371702
2013-01-07    0.126720
2013-01-08   -1.246918
2013-01-09    0.941099
2013-01-10    1.951555
Freq: D, Name: A, dtype: float64

df.A  # attribute access returns the same Series as df['A']

df['A']['2013-01-01']  # column first, then row
0.75407705661157032

df.A['2013-01-01']
0.75407705661157032

df[['A', 'C']]  # a list of column names returns a DataFrame

	A	C
2013-01-01	0.754077	-0.557050
2013-01-02	0.103394	-0.413054
2013-01-03	0.174730	1.781379
2013-01-04	-0.950517	-0.097138
2013-01-05	0.076178	1.142290
2013-01-06	1.371702	-1.470106
2013-01-07	0.126720	-2.212507
2013-01-08	-1.246918	1.761499
2013-01-09	0.941099	1.927863
2013-01-10	1.951555	-0.171690
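A small sketch of the difference between one label and a list of labels in [], on made-up numbers with a daily DatetimeIndex like the one above. The chained form df['A'][label] reads fine, but a single .loc call with row and column together is the form that also works reliably for assignment.

```python
import pandas as pd

# Hypothetical data with a daily DatetimeIndex
dates = pd.date_range('20130101', periods=3)
df = pd.DataFrame({'A': [0.5, -1.0, 2.0], 'C': [1.0, 2.0, 3.0]}, index=dates)

col = df['A']    # single label -> Series
sub = df[['A']]  # list of labels -> DataFrame, even with one column
print(type(col).__name__, type(sub).__name__)

# One .loc call: row label first, then column label
value = df.loc['2013-01-02', 'A']
print(value)
```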

Pandas-specific access: .loc selects data by label (the custom index)

# Select one row
df.loc['2013-01-01']
A    0.754077
B   -0.346202
C   -0.557050
D    0.778106
Name: 2013-01-01 00:00:00, dtype: float64

# Select one column
df.loc[:, 'A']
2013-01-01    0.754077
2013-01-02    0.103394
2013-01-03    0.174730
2013-01-04   -0.950517
2013-01-05    0.076178
2013-01-06    1.371702
2013-01-07    0.126720
2013-01-08   -1.246918
2013-01-09    0.941099
2013-01-10    1.951555
Freq: D, Name: A, dtype: float64

# Select a single value
df.loc['2013-01-01', 'A']  # row first, then column
0.75407705661157032

# Select specific rows/columns
df.loc[[dates[0], dates[2]], :]  # specific rows

	A	B	C	D
2013-01-01	0.754077	-0.346202	-0.557050	0.778106
2013-01-03	0.174730	2.056007	1.781379	1.643397

df.loc[:, ['A', 'B']]  # specific columns

	A	B
2013-01-01	0.754077	-0.346202
2013-01-02	0.103394	-1.051044
2013-01-03	0.174730	2.056007
2013-01-04	-0.950517	-0.226887
2013-01-05	0.076178	-0.518970
2013-01-06	1.371702	-1.028873
2013-01-07	0.126720	-0.251519
2013-01-08	-1.246918	1.530266
2013-01-09	0.941099	-2.420932
2013-01-10	1.951555	-0.264012

df.loc[[dates[0], dates[2]], ['A', 'B']]  # specific rows and columns

	A	B
2013-01-01	0.754077	-0.346202
2013-01-03	0.174730	2.056007

# Slicing
df.loc['2013-01-01':'2013-01-04', :]  # slice rows

	A	B	C	D
2013-01-01	0.754077	-0.346202	-0.557050	0.778106
2013-01-02	0.103394	-1.051044	-0.413054	0.268955
2013-01-03	0.174730	2.056007	1.781379	1.643397
2013-01-04	-0.950517	-0.226887	-0.097138	-0.442010

df.loc[:, 'A':'C']  # slice columns

	A	B	C
2013-01-01	0.754077	-0.346202	-0.557050
2013-01-02	0.103394	-1.051044	-0.413054
2013-01-03	0.174730	2.056007	1.781379
2013-01-04	-0.950517	-0.226887	-0.097138
2013-01-05	0.076178	-0.518970	1.142290
2013-01-06	1.371702	-1.028873	-1.470106
2013-01-07	0.126720	-0.251519	-2.212507
2013-01-08	-1.246918	1.530266	1.761499
2013-01-09	0.941099	-2.420932	1.927863
2013-01-10	1.951555	-0.264012	-0.171690

# Slicing selects a contiguous block (rows, then columns). With .loc, label slices include BOTH endpoints
df.loc['2013-01-01':'2013-01-04', 'A':'C']

	A	B	C
2013-01-01	0.754077	-0.346202	-0.557050
2013-01-02	0.103394	-1.051044	-0.413054
2013-01-03	0.174730	2.056007	1.781379
2013-01-04	-0.950517	-0.226887	-0.097138
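A deterministic check of the .loc rules, on a small made-up frame: the label slice keeps both the start and end labels, a single row label returns a Series, and a list of row labels returns a DataFrame.

```python
import pandas as pd

# Hypothetical data: 6 daily rows, 3 integer columns
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame({'A': range(6), 'B': range(10, 16), 'C': range(20, 26)}, index=dates)

# Label slices with .loc include BOTH endpoints
block = df.loc['2013-01-02':'2013-01-04', 'A':'B']
print(block)

row = df.loc['2013-01-01']              # one label -> Series
rows = df.loc[[dates[0], dates[2]], :]  # list of labels -> DataFrame
print(type(row).__name__, type(rows).__name__)
```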

.iloc selects data by integer position (the default 0-based index)

# Select one row
df.iloc[3]
A   -0.950517
B   -0.226887
C   -0.097138
D   -0.442010
Name: 2013-01-04 00:00:00, dtype: float64

# Select one column
df.iloc[:, 2]
2013-01-01   -0.557050
2013-01-02   -0.413054
2013-01-03    1.781379
2013-01-04   -0.097138
2013-01-05    1.142290
2013-01-06   -1.470106
2013-01-07   -2.212507
2013-01-08    1.761499
2013-01-09    1.927863
2013-01-10   -0.171690
Freq: D, Name: C, dtype: float64

# Select a single value
df.iloc[1, 2]
-0.41305425875508139

# Select specific rows/columns
df.iloc[[1, 2, 4], :]  # specific rows

	A	B	C	D
2013-01-02	0.103394	-1.051044	-0.413054	0.268955
2013-01-03	0.174730	2.056007	1.781379	1.643397
2013-01-05	0.076178	-0.518970	1.142290	-0.952401

df.iloc[:, [0, 2]]  # specific columns

	A	C
2013-01-01	0.754077	-0.557050
2013-01-02	0.103394	-0.413054
2013-01-03	0.174730	1.781379
2013-01-04	-0.950517	-0.097138
2013-01-05	0.076178	1.142290
2013-01-06	1.371702	-1.470106
2013-01-07	0.126720	-2.212507
2013-01-08	-1.246918	1.761499
2013-01-09	0.941099	1.927863
2013-01-10	1.951555	-0.171690

df.iloc[[1, 2, 4], [0, 2]]  # specific rows and columns, row positions first

	A	C
2013-01-02	0.103394	-0.413054
2013-01-03	0.174730	1.781379
2013-01-05	0.076178	1.142290

# Slicing
df.iloc[1:3, :]  # slice rows

	A	B	C	D
2013-01-02	0.103394	-1.051044	-0.413054	0.268955
2013-01-03	0.174730	2.056007	1.781379	1.643397

df.iloc[:, 1:3]  # slice columns

	B	C
2013-01-01	-0.346202	-0.557050
2013-01-02	-1.051044	-0.413054
2013-01-03	2.056007	1.781379
2013-01-04	-0.226887	-0.097138
2013-01-05	-0.518970	1.142290
2013-01-06	-1.028873	-1.470106
2013-01-07	-0.251519	-2.212507
2013-01-08	1.530266	1.761499
2013-01-09	-2.420932	1.927863
2013-01-10	-0.264012	-0.171690

df.iloc[3:5, 0:2]  # contiguous block (rows, then columns); .iloc slices include the start position and exclude the end

	A	B
2013-01-04	-0.950517	-0.226887
2013-01-05	0.076178	-0.518970
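To contrast the two slicing conventions, here is a sketch on a made-up frame with the default integer index: .iloc[3:5] stops before position 5, while the equivalent .loc label slice has to end at 4 because .loc includes its end label.

```python
import pandas as pd

# Hypothetical data with the default RangeIndex 0..5
df = pd.DataFrame({'A': range(6), 'B': range(10, 16), 'C': range(20, 26)})

# Positional slices with .iloc are half-open: start included, stop excluded
block = df.iloc[3:5, 0:2]
print(block)  # rows at positions 3 and 4, columns A and B

# The same rows by label with .loc: the end label is included, so it is 4, not 5
same_rows = df.loc[3:4, 'A':'B']
print(block.equals(same_rows))
```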

Boolean indexing

# Select rows based on the values of one column
df[df.A > 0]

	A	B	C	D
2013-01-01	0.754077	-0.346202	-0.557050	0.778106
2013-01-02	0.103394	-1.051044	-0.413054	0.268955
2013-01-03	0.174730	2.056007	1.781379	1.643397
2013-01-05	0.076178	-0.518970	1.142290	-0.952401
2013-01-06	1.371702	-1.028873	-1.470106	-0.113098
2013-01-07	0.126720	-0.251519	-2.212507	1.050036
2013-01-09	0.941099	-2.420932	1.927863	-0.549143
2013-01-10	1.951555	-0.264012	-0.171690	0.869293
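Under the hood, df.A > 0 is a boolean Series aligned on the row index, and indexing with it keeps the rows where it is True. A sketch on made-up data (labels w, x, y, z are illustrative), also showing the same filter through .loc so that a column can be chosen at the same time:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({'A': [0.5, -1.0, 2.0, -0.1], 'B': [1, 2, 3, 4]},
                  index=['w', 'x', 'y', 'z'])

mask = df['A'] > 0        # boolean Series aligned on the row index
print(mask)
print(df[mask])           # keeps only the rows where mask is True
print(df.loc[mask, 'B'])  # same row filter, restricted to column B
```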

# Select with a whole-frame condition (where-style): the shape is kept and cells that fail the condition become NaN
b = df[df > 0]
b

	A	B	C	D
2013-01-01	0.754077	NaN	NaN	0.778106
2013-01-02	0.103394	NaN	NaN	0.268955
2013-01-03	0.174730	2.056007	1.781379	1.643397
2013-01-04	NaN	NaN	NaN	NaN
2013-01-05	0.076178	NaN	1.142290	NaN
2013-01-06	1.371702	NaN	NaN	NaN
2013-01-07	0.126720	NaN	NaN	1.050036
2013-01-08	NaN	1.530266	1.761499	0.940741
2013-01-09	0.941099	NaN	1.927863	NaN
2013-01-10	1.951555	NaN	NaN	0.869293
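Indexing with a whole-frame boolean mask gives the same result as the explicit DataFrame.where method. A small check with made-up numbers:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({'A': [1.0, -2.0], 'B': [-3.0, 4.0]})

b1 = df[df > 0]        # mask indexing, as in the article
b2 = df.where(df > 0)  # explicit method form
print(b1)
print(b1.equals(b2))   # same values and same NaN positions
```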

# The cells that survive are still floats
type(b['A']['2013-01-01'])
numpy.float64

# Filter with isin()
df2 = df.copy()
df2['E'] = ['one', 'one', 'two', 'three', 'four', 'three', 'five', 'four', 'three', 'five']
df2

	A	B	C	D	E
2013-01-01	0.754077	-0.346202	-0.557050	0.778106	one
2013-01-02	0.103394	-1.051044	-0.413054	0.268955	one
2013-01-03	0.174730	2.056007	1.781379	1.643397	two
2013-01-04	-0.950517	-0.226887	-0.097138	-0.442010	three
2013-01-05	0.076178	-0.518970	1.142290	-0.952401	four
2013-01-06	1.371702	-1.028873	-1.470106	-0.113098	three
2013-01-07	0.126720	-0.251519	-2.212507	1.050036	five
2013-01-08	-1.246918	1.530266	1.761499	0.940741	four
2013-01-09	0.941099	-2.420932	1.927863	-0.549143	three
2013-01-10	1.951555	-0.264012	-0.171690	0.869293	five

df2['E'].isin(['one', 'four'])
2013-01-01     True
2013-01-02     True
2013-01-03    False
2013-01-04    False
2013-01-05     True
2013-01-06    False
2013-01-07    False
2013-01-08     True
2013-01-09    False
2013-01-10    False
Freq: D, Name: E, dtype: bool

df2[df2['E'].isin(['one', 'four'])]

	A	B	C	D	E
2013-01-01	0.754077	-0.346202	-0.557050	0.778106	one
2013-01-02	0.103394	-1.051044	-0.413054	0.268955	one
2013-01-05	0.076178	-0.518970	1.142290	-0.952401	four
2013-01-08	-1.246918	1.530266	1.761499	0.940741	four
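isin builds the same kind of boolean Series as a comparison, so it can be inverted with ~ to keep everything except the listed values. A sketch on made-up data:

```python
import pandas as pd

# Hypothetical data
df2 = pd.DataFrame({'A': [1, 2, 3, 4], 'E': ['one', 'two', 'four', 'three']})

keep = df2['E'].isin(['one', 'four'])  # True where E is one of the listed values
print(df2[keep])
print(df2[~keep])                      # ~ inverts the mask: all other rows
```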

Create

s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range('20130102', periods=6))
s1
2013-01-02    1
2013-01-03    2
2013-01-04    3
2013-01-05    4
2013-01-06    5
2013-01-07    6
Freq: D, dtype: int64

# Add a new column; s1 is aligned to df2 by date, so dates missing from s1 get NaN
df2['F'] = s1
df2

	A	B	C	D	E	F
2013-01-01	0.754077	-0.346202	-0.557050	0.778106	one	NaN
2013-01-02	0.103394	-1.051044	-0.413054	0.268955	one	1.0
2013-01-03	0.174730	2.056007	1.781379	1.643397	two	2.0
2013-01-04	-0.950517	-0.226887	-0.097138	-0.442010	three	3.0
2013-01-05	0.076178	-0.518970	1.142290	-0.952401	four	4.0
2013-01-06	1.371702	-1.028873	-1.470106	-0.113098	three	5.0
2013-01-07	0.126720	-0.251519	-2.212507	1.050036	five	6.0
2013-01-08	-1.246918	1.530266	1.761499	0.940741	four	NaN
2013-01-09	0.941099	-2.420932	1.927863	-0.549143	three	NaN
2013-01-10	1.951555	-0.264012	-0.171690	0.869293	five	NaN
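Assigning a Series to a new column matches rows by index label rather than by position. A sketch with made-up data, where s1 starts one day later and runs one day past the end of df; for comparison it also adds a scalar column (broadcast to every row) and a plain list column (which must have one value per row). Column names G and H are illustrative.

```python
import pandas as pd

# Hypothetical data
idx = pd.date_range('20130101', periods=4)
df = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0]}, index=idx)

# s1 starts one day later than df and ends one day after it
s1 = pd.Series([10, 20, 30, 40], index=pd.date_range('20130102', periods=4))

df['F'] = s1                    # aligned by index label, not by position
df['G'] = 0                     # a scalar is broadcast to every row
df['H'] = ['p', 'q', 'r', 's']  # a list must match the number of rows
print(df)
```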

Update

# Update a whole column
df2.loc[:, 'D']
2013-01-01    0.778106
2013-01-02    0.268955
2013-01-03    1.643397
2013-01-04   -0.442010
2013-01-05   -0.952401
2013-01-06   -0.113098
2013-01-07    1.050036
2013-01-08    0.940741
2013-01-09   -0.549143
2013-01-10    0.869293
Freq: D, Name: D, dtype: float64

# The output below is from an older pandas, which replaced D with integers.
# Since pandas 2.0, .loc[:, 'D'] = 5 writes into the existing float column in place, so D shows 5.0;
# use df2['D'] = 5 to replace the column outright.
df2.loc[:, 'D'] = 5
df2

	A	B	C	D	E	F
2013-01-01	0.754077	-0.346202	-0.557050	5	one	NaN
2013-01-02	0.103394	-1.051044	-0.413054	5	one	1.0
2013-01-03	0.174730	2.056007	1.781379	5	two	2.0
2013-01-04	-0.950517	-0.226887	-0.097138	5	three	3.0
2013-01-05	0.076178	-0.518970	1.142290	5	four	4.0
2013-01-06	1.371702	-1.028873	-1.470106	5	three	5.0
2013-01-07	0.126720	-0.251519	-2.212507	5	five	6.0
2013-01-08	-1.246918	1.530266	1.761499	5	four	NaN
2013-01-09	0.941099	-2.420932	1.927863	5	three	NaN
2013-01-10	1.951555	-0.264012	-0.171690	5	five	NaN

df2.iloc[1, 3]  # a single value by position (row 1, column D)
5

df2.iloc[1, 3] = 10.1
df2

	A	B	C	D	E	F
2013-01-01	0.754077	-0.346202	-0.557050	5.0	one	NaN
2013-01-02	0.103394	-1.051044	-0.413054	10.1	one	1.0
2013-01-03	0.174730	2.056007	1.781379	5.0	two	2.0
2013-01-04	-0.950517	-0.226887	-0.097138	5.0	three	3.0
2013-01-05	0.076178	-0.518970	1.142290	5.0	four	4.0
2013-01-06	1.371702	-1.028873	-1.470106	5.0	three	5.0
2013-01-07	0.126720	-0.251519	-2.212507	5.0	five	6.0
2013-01-08	-1.246918	1.530266	1.761499	5.0	four	NaN
2013-01-09	0.941099	-2.420932	1.927863	5.0	three	NaN
2013-01-10	1.951555	-0.264012	-0.171690	5.0	five	NaN
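Single cells can be updated by label or by position. Besides .loc and .iloc, pandas has .at and .iat, which accept exactly one row and one column. A sketch on made-up data; the row labels r0 to r2 are illustrative.

```python
import pandas as pd

# Hypothetical data
df2 = pd.DataFrame({'D': [5.0, 5.0, 5.0], 'E': ['one', 'one', 'two']},
                   index=['r0', 'r1', 'r2'])

df2.iloc[1, 0] = 10.1       # by position: row 1, column 0 ('D')
df2.loc['r2', 'D'] = 7.5    # by label
df2.at['r0', 'E'] = 'zero'  # .at: label-based, single value only
df2.iat[2, 1] = 'TWO'       # .iat: position-based, single value only
print(df2)
```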

# Update through a boolean mask (where-style): negate every positive value, so every cell ends up negative
df3 = df.copy()
df3[df3 > 0] = -df3
df3

	A	B	C	D
2013-01-01	-0.754077	-0.346202	-0.557050	-0.778106
2013-01-02	-0.103394	-1.051044	-0.413054	-0.268955
2013-01-03	-0.174730	-2.056007	-1.781379	-1.643397
2013-01-04	-0.950517	-0.226887	-0.097138	-0.442010
2013-01-05	-0.076178	-0.518970	-1.142290	-0.952401
2013-01-06	-1.371702	-1.028873	-1.470106	-0.113098
2013-01-07	-0.126720	-0.251519	-2.212507	-1.050036
2013-01-08	-1.246918	-1.530266	-1.761499	-0.940741
2013-01-09	-0.941099	-2.420932	-1.927863	-0.549143
2013-01-10	-1.951555	-0.264012	-0.171690	-0.869293
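The masked assignment above is an in-place version of DataFrame.mask, which replaces values where the condition is True. For this particular rule (negate the positives), the result also equals -df.abs(). A check on made-up data:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({'A': [1.5, -2.0], 'B': [-0.5, 3.0]})

df3 = df.copy()      # work on a copy so df stays unchanged
df3[df3 > 0] = -df3  # positive cells get their negated value
print(df3)

# The same result without in-place assignment
alt = df.mask(df > 0, -df)
print(df3.equals(alt), df3.equals(-df.abs()))
```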
