Becoming a Wall Street Financial Titan, Lesson 3:
Pandas 2: Learning to Use the pandas DataFrame
import pandas as pd
import numpy as np
I. DataFrame overview and creation: a two-dimensional data object
A DataFrame can be thought of as an Excel spreadsheet.
Creation method 1: from a dictionary
pd.DataFrame({'one': [1, 2, 3], 'two': [4, 5, 6]})
As with a Series, we can specify an index for the rows.
pd.DataFrame({'one': [1, 2, 3], 'two': [4, 5, 6]}, index=['a', 'b', 'c'])
Creation method 2: from Series objects
pd.DataFrame({'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
              'two': pd.Series([1, 2, 3, 4], index=['d', 'c', 'a', 'b'])})
   one  two
a  1.0    3
b  2.0    4
c  3.0    2
d  NaN    1
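As the output shows, a DataFrame automatically aligns the indexes of its inputs when it is created: rows are matched by label, and a label that is missing from one column gets NaN there.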
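A minimal sketch of this alignment, reusing the two Series from creation method 2. The names `one` and `two` for the intermediate Series are chosen here for illustration only.

```python
import pandas as pd

# Two Series whose labels overlap only partly and appear in different orders.
one = pd.Series([1, 2, 3], index=["a", "b", "c"])
two = pd.Series([1, 2, 3, 4], index=["d", "c", "a", "b"])

df = pd.DataFrame({"one": one, "two": two})

# Rows are matched by label, not by position; "d" has no entry in `one`.
print(df)
print(df.loc["a", "two"])            # value of `two` stored under label "a"
print(int(df["one"].isna().sum()))   # number of labels missing from `one`
```

Because NaN is a floating-point value, the `one` column is stored as float64 even though its inputs were integers.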
Creation method 3: from a CSV file
pd.read_csv('test.csv')
Saving to a CSV file
df = pd.DataFrame({'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
                   'two': pd.Series([1, 2, 3, 4], index=['d', 'c', 'a', 'b'])})
df
   one  two
a  1.0    3
b  2.0    4
c  3.0    2
d  NaN    1
df.to_csv('test2.csv')
II. Common DataFrame attributes
1. The index, columns, and values attributes
Purpose: index returns the row labels, columns returns the column labels, and values returns the data as a NumPy array.
df = pd.DataFrame({'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
                   'two': pd.Series([1, 2, 3, 4], index=['d', 'c', 'a', 'b'])})
df
   one  two
a  1.0    3
b  2.0    4
c  3.0    2
d  NaN    1
df.index
Index(['a', 'b', 'c', 'd'], dtype='object')
df.columns
Index(['one', 'two'], dtype='object')
df.values
array([[ 1.,  3.],
       [ 2.,  4.],
       [ 3.,  2.],
       [nan,  1.]])
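The three attributes can be checked side by side. The sketch below also calls `to_numpy()`, which this lesson does not use; it is included as an assumption-free alternative that returns the same array as `values`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
                   "two": pd.Series([1, 2, 3, 4], index=["d", "c", "a", "b"])})

rows = list(df.index)      # row labels
cols = list(df.columns)    # column labels
arr = df.to_numpy()        # 2-D NumPy array holding the same data as df.values

print(rows, cols)
print(arr.shape, arr.dtype)
```

The array has a single dtype, so the integer column `two` is converted to float to sit alongside the NaN in `one`.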
2. The T attribute
Purpose: transposes the DataFrame
df.T
       a    b    c    d
one  1.0  2.0  3.0  NaN
two  3.0  4.0  2.0  1.0
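3. The describe() method
Purpose: returns summary statistics for each column
df.describe()
       one       two
count  3.0  4.000000
mean   2.0  2.500000
std    1.0  1.290994
min    1.0  1.000000
25%    1.5  1.750000
50%    2.0  2.500000
75%    2.5  3.250000
max    3.0  4.000000
count: number of non-missing values in the column
mean: mean of the column
std: standard deviation of the column
min: smallest value in the column
25%: the value at the 25% position when the column is sorted in ascending order (first quartile)
50%: median of the column
75%: the value at the 75% position when the column is sorted in ascending order (third quartile)
max: largest value in the column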
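Each row of the describe() table can be reproduced with a dedicated method. The sketch below rebuilds the two columns as standalone Series (an arrangement chosen here for illustration) and compares a few entries.

```python
import numpy as np
import pandas as pd

one = pd.Series([1.0, 2.0, 3.0, np.nan], index=list("abcd"), name="one")
two = pd.Series([3, 4, 2, 1], index=list("abcd"), name="two")

summary = two.describe()
q25 = two.quantile(0.25)     # same value as summary["25%"]
median = two.median()        # same value as summary["50%"]
sample_std = two.std()       # sample standard deviation (divides by n - 1)
non_missing = one.count()    # NaN is not counted

print(summary)
print(q25, median, sample_std, non_missing)
```

This also explains why count is 3.0 for `one` but 4.0 for `two` in the table above: the NaN under label d is skipped.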
III. DataFrame indexing and slicing
df = pd.DataFrame({'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
                   'two': pd.Series([1, 2, 3, 4], index=['d', 'c', 'a', 'b'])})
   one  two
a  1.0    3
b  2.0    4
c  3.0    2
d  NaN    1
A value can be read with df[x][y], which takes row y of column x. Note the contrast with NumPy: here the first pair of brackets selects the column.
e.g. read the 1.0 in column 'one', row 'a':
df['one']['a']
1.0
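Although this makes it easy to get the value we want, we generally avoid it, because it runs into the same kind of problem as integer indexing on a Series.
So in general we use loc and iloc to read values.
With loc, the part before the comma is the row and the part after it is the column, which is similar to NumPy.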
df.loc['a', 'one']
1.0
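The lesson names iloc without showing it. A small sketch of how label-based and position-based lookups differ; the frame `num`, with integer row labels, is a hypothetical example added here to show the ambiguity that integer labels create.

```python
import pandas as pd

df = pd.DataFrame({"one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
                   "two": pd.Series([1, 2, 3, 4], index=["d", "c", "a", "b"])})

by_label = df.loc["a", "one"]    # row label "a", column label "one"
by_position = df.iloc[0, 0]      # first row, first column
print(by_label, by_position)

# Hypothetical frame whose integer row labels are not in positional order.
num = pd.DataFrame({"x": [10, 20, 30]}, index=[2, 0, 1])
print(num.loc[0, "x"])    # the row whose label is 0
print(num.iloc[0, 0])     # the row at position 0
```

With loc the 0 is always a label and with iloc it is always a position, so neither call depends on how the index happens to be numbered.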
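! Important: a DataFrame is in effect made up of n Series objects, one per column. A column can therefore be read directly by its column label, but a row cannot be read directly by its row label. To get a row, use loc with a slice.
df['one']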
a 1.0
b 2.0
c 3.0
d NaN
Name: one, dtype: float64
df['a']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
c:\users\lenovo\appdata\local\programs\python\python37\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2894             try:
-> 2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'a'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-34-9637ce7feee6> in <module>
      1 # Reading a row directly by its row label is not allowed
----> 2 df['a']

c:\users\lenovo\appdata\local\programs\python\python37\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2904             if self.columns.nlevels > 1:
   2905                 return self._getitem_multilevel(key)
-> 2906             indexer = self.columns.get_loc(key)
   2907             if is_integer(indexer):
   2908                 indexer = [indexer]

c:\users\lenovo\appdata\local\programs\python\python37\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:
-> 2897                 raise KeyError(key) from err
   2898 
   2899         if tolerance is not None:

KeyError: 'a'
df.loc['a', :]
one 1.0
two 3.0
Name: a, dtype: float64
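In my experiments the ':' slice can be omitted, giving df.loc['a',], and even the comma can be dropped, giving df.loc['a']. Both return the same row, but the shorter forms look like the direct row lookup that the note above warns against, so it is clearer to keep the comma and the slice.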
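A short check of the claims above, under the same example frame. The list form `df.loc[['a']]` is an addition not covered in the lesson; it keeps the result as a one-row DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
                   "two": pd.Series([1, 2, 3, 4], index=["d", "c", "a", "b"])})

row_full = df.loc["a", :]     # explicit: row "a", all columns
row_short = df.loc["a"]       # shorthand for the same row
same = row_full.equals(row_short)

as_frame = df.loc[["a"]]      # a list of labels returns a DataFrame, not a Series
is_column = "a" in df.columns # df['a'] looks among the columns, hence the KeyError

print(same, as_frame.shape, is_column)
```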
df = pd.DataFrame({'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
                   'two': pd.Series([1, 2, 3, 4], index=['d', 'c', 'a', 'b'])})
df
   one  two
a  1.0    3
b  2.0    4
c  3.0    2
d  NaN    1
Besides a single label, the row part and the column part can each be a slice, a boolean index, or a fancy index (a list of labels), in any combination.
e.g.:
df.loc[['a', 'b'], 'one':'two']
Reminder: as with a Series, a slice over labels is a closed interval, so both the start label and the end label are included.
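A sketch contrasting the closed label slice with the half-open positional slice, plus one boolean-mask selection. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
                   "two": pd.Series([1, 2, 3, 4], index=["d", "c", "a", "b"])})

# Label slices include both ends: rows a, b, c and columns one, two.
by_label = df.loc["a":"c", "one":"two"]

# Position slices follow Python's half-open rule: rows 0, 1, 2 and column 0 only.
by_position = df.iloc[0:3, 0:1]

# Boolean mask on rows combined with a list of column labels.
mask = df["two"] > 2
picked = df.loc[mask, ["one", "two"]]

print(by_label)
print(by_position)
print(picked)
```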
IV. Data alignment and missing-value handling
df = pd.DataFrame({'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
                   'two': pd.Series([1, 2, 3, 4], index=['d', 'c', 'a', 'b'])})
df
   one  two
a  1.0    3
b  2.0    4
c  3.0    2
d  NaN    1
df2 = pd.DataFrame({'two': [1, 2, 3, 4], 'one': [5, 6, 7, 8]}, index=['d', 'c', 'a', 'b'])
df2
df + df2
    one  two
a   8.0    6
b  10.0    8
c   9.0    4
d   NaN    2
DataFrames follow the data-alignment principle: in arithmetic, the rows and the columns are each aligned by label before the operation is applied.
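One way to see the alignment is to reorder df2 by hand and confirm the sum is unchanged. The sketch also shows `add` with `fill_value`, which the lesson does not cover; it treats a value missing from one operand as the fill value instead of propagating NaN.

```python
import pandas as pd

df = pd.DataFrame({"one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
                   "two": pd.Series([1, 2, 3, 4], index=["d", "c", "a", "b"])})
df2 = pd.DataFrame({"two": [1, 2, 3, 4], "one": [5, 6, 7, 8]},
                   index=["d", "c", "a", "b"])

total = df + df2

# Reordering df2 to df's row and column order does not change the result.
aligned = df2.reindex(index=df.index, columns=df.columns)
same = total.equals(df + aligned)

# NaN in df at (d, one) is treated as 0 before adding df2's 5.
filled = df.add(df2, fill_value=0)

print(total)
print(same)
print(filled.loc["d", "one"])
```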
df
   one  two
a  1.0    3
b  2.0    4
c  3.0    2
d  NaN    1
Missing-value handling method 1: filling missing values
df.fillna(0)
   one  two
a  1.0    3
b  2.0    4
c  3.0    2
d  0.0    1
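A constant is not the only possible fill. As a sketch, the example below also fills each column with its own mean, a choice made here for illustration rather than one the lesson prescribes, and confirms that fillna returns a new object.

```python
import pandas as pd

df = pd.DataFrame({"one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
                   "two": pd.Series([1, 2, 3, 4], index=["d", "c", "a", "b"])})

zero_filled = df.fillna(0)
mean_filled = df.fillna(df.mean())   # df.mean() is a Series keyed by column name

# fillna does not modify df in place unless asked to.
still_missing = int(df["one"].isna().sum())

print(zero_filled.loc["d", "one"], mean_filled.loc["d", "one"], still_missing)
```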
Missing-value handling method 2: dropping missing values
1. The dropna() method
df.dropna()
2. The how parameter of dropna()
df
   one  two
a  1.0    3
b  2.0    4
c  3.0    2
d  NaN    1
df.loc['c', 'two'] = np.nan
df.loc['d', 'two'] = np.nan
df
   one  two
a  1.0  3.0
b  2.0  4.0
c  3.0  NaN
d  NaN  NaN
df.dropna(how='all')
   one  two
a  1.0  3.0
b  2.0  4.0
c  3.0  NaN
3. The axis parameter of dropna()
axis=0 works on rows and axis=1 works on columns, so df.dropna(axis=1) drops every column that contains a missing value. The default is axis=0.
df2
df2.loc['a', 'two'] = np.nan
df2
   two  one
d  1.0    5
c  2.0    6
a  NaN    7
b  4.0    8
df2.dropna(axis=1)
df2.isnull()
     two    one
d  False  False
c  False  False
a   True  False
b  False  False
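The dropna variants above can be compared on one frame. In this sketch the frame is built directly with its NaN values, and the `subset` argument is an extra that the lesson does not mention; it restricts the check to the listed columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0, np.nan],
                   "two": [3.0, 4.0, np.nan, np.nan]},
                  index=["a", "b", "c", "d"])

any_rows = df.dropna()                  # default how='any': drop rows with any NaN
all_rows = df.dropna(how="all")         # drop rows only if every value is NaN
no_nan_cols = df.dropna(axis=1)         # drop columns with any NaN
by_subset = df.dropna(subset=["one"])   # only look at column "one"
nan_counts = df.isnull().sum()          # NaN count per column

print(list(any_rows.index), list(all_rows.index))
print(no_nan_cols.shape, list(by_subset.index))
print(nan_counts)
```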
V. Common DataFrame functions
1. Computing the mean
df = df2
df
   two  one
d  1.0    5
c  2.0    6
a  NaN    7
b  4.0    8
df.mean()
two    2.333333
one    6.500000
dtype: float64
mean(): ignores missing values, computes the mean of each column, and returns a Series.
axis parameter: axis=1 computes the mean of each row; the default, axis=0, computes the mean of each column.
df.mean(axis=1)
d    3.0
c    4.0
a    7.0
b    6.0
dtype: float64
sum(), std(), and similar methods work the same way as mean().
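A sketch of sum() and std() on the same frame, together with `skipna=False`, a parameter not discussed in the lesson that turns off the default skipping of missing values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"two": [1.0, 2.0, np.nan, 4.0], "one": [5, 6, 7, 8]},
                  index=["d", "c", "a", "b"])

col_sum = df.sum()                              # per column, NaN skipped
col_std = df.std()                              # sample standard deviation per column
row_mean = df.mean(axis=1)                      # per row, NaN skipped
row_mean_strict = df.mean(axis=1, skipna=False) # NaN propagates to the result

print(col_sum)
print(col_std)
print(row_mean["a"], row_mean_strict["a"])
```

Row a shows the difference: with the default the mean is taken over the single available value, while with `skipna=False` the result is NaN.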
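2. Sorting by value: sort_values()
df
   two  one
d  1.0    5
c  2.0    6
a  NaN    7
b  4.0    8
df.sort_values(by='two')
   two  one
d  1.0    5
c  2.0    6
b  4.0    8
a  NaN    7
df.sort_values(by='two', ascending=False)
   two  one
b  4.0    8
c  2.0    6
d  1.0    5
a  NaN    7
df.sort_values(by='d', ascending=False, axis=1)
   one  two
d    5  1.0
c    6  2.0
a    7  NaN
b    8  4.0
Summary:
1. df.sort_values() sorts by value.
2. by parameter: the column (or row) to sort by.
3. ascending parameter: True sorts in ascending order and False in descending order; the default is True.
4. axis parameter: axis=0 (the default) reorders the rows according to the values in a column, and axis=1 reorders the columns according to the values in a row. If by is a row label, axis=1 must be given.
5. Missing values (NaN): they take no part in the ordering and by default are placed at the end, whether the sort is ascending or descending.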
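The summary can be checked on the example frame. The sketch also uses `na_position='first'`, which is not mentioned in the lesson, to show that the placement of NaN is a default rather than a fixed rule.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"two": [1.0, 2.0, np.nan, 4.0], "one": [5, 6, 7, 8]},
                  index=["d", "c", "a", "b"])

asc = df.sort_values(by="two")
desc = df.sort_values(by="two", ascending=False)
nan_first = df.sort_values(by="two", na_position="first")

# by is a row label here, so axis=1 is required; the columns get reordered.
by_row = df.sort_values(by="d", axis=1, ascending=False)

print(list(asc.index), list(desc.index), list(nan_first.index))
print(list(by_row.columns))
```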
3. Sorting by index: sort_index()
df
   two  one
d  1.0    5
c  2.0    6
a  NaN    7
b  4.0    8
df.sort_index()
   two  one
a  NaN    7
b  4.0    8
c  2.0    6
d  1.0    5
df.sort_index(ascending=False)
   two  one
d  1.0    5
c  2.0    6
b  4.0    8
a  NaN    7
df.sort_index(ascending=True, axis=1)
   one  two
d    5  1.0
c    6  2.0
a    7  NaN
b    8  4.0
The two sort_index parameters used here, ascending and axis, work the same way as in sort_values().
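A compact sketch of the two directions of sort_index(), which also confirms that the original frame keeps its order because a new object is returned.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"two": [1.0, 2.0, np.nan, 4.0], "one": [5, 6, 7, 8]},
                  index=["d", "c", "a", "b"])

rows_sorted = df.sort_index()         # order rows by row label
cols_sorted = df.sort_index(axis=1)   # order columns by column label

print(list(rows_sorted.index))
print(list(cols_sorted.columns))
print(list(df.index), list(df.columns))   # df itself is unchanged
```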
VI. OMG, this is amazing: DataFrame time series
1. Handling time objects in pandas
pd.to_datetime(['2021-01-10', '2021/MAY/1'])
DatetimeIndex(['2021-01-10', '2021-05-01'], dtype='datetime64[ns]', freq=None)
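2. Generating time objects automatically
The start, end, and periods parameters of pd.date_range()
start: start time
end: end time
periods: the number of timestamps to generate. You can give start and periods instead of end, which generates periods timestamps beginning at start (one per day at the default frequency). Likewise, you can give end and periods instead of start.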
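The three ways of bounding a range described above can be compared directly. This sketch uses the same dates as the lesson; only the variable names are new.

```python
import pandas as pd

by_bounds = pd.date_range(start="2021-01-01", end="2021-03-01")
from_start = pd.date_range(start="2021-01-01", periods=60)
to_end = pd.date_range(end="2021-03-01", periods=60)

# 31 days in January + 28 in February + 1 March = 60 daily timestamps.
print(len(by_bounds))
print(by_bounds.equals(from_start), from_start.equals(to_end))
```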
pd.date_range('2021-01-01', '2021-03-01')
DatetimeIndex(['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04','2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08','2021-01-09', '2021-01-10', '2021-01-11', '2021-01-12','2021-01-13', '2021-01-14', '2021-01-15', '2021-01-16','2021-01-17', '2021-01-18', '2021-01-19', '2021-01-20','2021-01-21', '2021-01-22', '2021-01-23', '2021-01-24','2021-01-25', '2021-01-26', '2021-01-27', '2021-01-28','2021-01-29', '2021-01-30', '2021-01-31', '2021-02-01','2021-02-02', '2021-02-03', '2021-02-04', '2021-02-05','2021-02-06', '2021-02-07', '2021-02-08', '2021-02-09','2021-02-10', '2021-02-11', '2021-02-12', '2021-02-13','2021-02-14', '2021-02-15', '2021-02-16', '2021-02-17','2021-02-18', '2021-02-19', '2021-02-20', '2021-02-21','2021-02-22', '2021-02-23', '2021-02-24', '2021-02-25','2021-02-26', '2021-02-27', '2021-02-28', '2021-03-01'],dtype='datetime64[ns]', freq='D')
pd.date_range('2021-01-01', periods=60)
DatetimeIndex(['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04','2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08','2021-01-09', '2021-01-10', '2021-01-11', '2021-01-12','2021-01-13', '2021-01-14', '2021-01-15', '2021-01-16','2021-01-17', '2021-01-18', '2021-01-19', '2021-01-20','2021-01-21', '2021-01-22', '2021-01-23', '2021-01-24','2021-01-25', '2021-01-26', '2021-01-27', '2021-01-28','2021-01-29', '2021-01-30', '2021-01-31', '2021-02-01','2021-02-02', '2021-02-03', '2021-02-04', '2021-02-05','2021-02-06', '2021-02-07', '2021-02-08', '2021-02-09','2021-02-10', '2021-02-11', '2021-02-12', '2021-02-13','2021-02-14', '2021-02-15', '2021-02-16', '2021-02-17','2021-02-18', '2021-02-19', '2021-02-20', '2021-02-21','2021-02-22', '2021-02-23', '2021-02-24', '2021-02-25','2021-02-26', '2021-02-27', '2021-02-28', '2021-03-01'],dtype='datetime64[ns]', freq='D')
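The freq parameter of pd.date_range():
freq: the spacing between generated timestamps. The default is 'D' (day); other values include 'H' (hour) and 'W' (week).
'W' (week) has variants: 'W-MON' outputs every Monday on or after start, and plain 'W' means 'W-SUN'.
'B': business days only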
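A short sketch of the weekly and business-day frequencies, using three periods so the output stays readable. 2021-01-01 was a Friday, which is why the weekly ranges do not start on the start date itself.

```python
import pandas as pd

mondays = pd.date_range("2021-01-01", periods=3, freq="W-MON")
sundays = pd.date_range("2021-01-01", periods=3, freq="W")     # same as 'W-SUN'
workdays = pd.date_range("2021-01-01", periods=3, freq="B")    # skips Sat and Sun

print(list(mondays.strftime("%Y-%m-%d")))
print(list(sundays.strftime("%Y-%m-%d")))
print(list(workdays.strftime("%Y-%m-%d")))
```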
pd.date_range('2021-01-01', periods=60, freq='H')
DatetimeIndex(['2021-01-01 00:00:00', '2021-01-01 01:00:00','2021-01-01 02:00:00', '2021-01-01 03:00:00','2021-01-01 04:00:00', '2021-01-01 05:00:00','2021-01-01 06:00:00', '2021-01-01 07:00:00','2021-01-01 08:00:00', '2021-01-01 09:00:00','2021-01-01 10:00:00', '2021-01-01 11:00:00','2021-01-01 12:00:00', '2021-01-01 13:00:00','2021-01-01 14:00:00', '2021-01-01 15:00:00','2021-01-01 16:00:00', '2021-01-01 17:00:00','2021-01-01 18:00:00', '2021-01-01 19:00:00','2021-01-01 20:00:00', '2021-01-01 21:00:00','2021-01-01 22:00:00', '2021-01-01 23:00:00','2021-01-02 00:00:00', '2021-01-02 01:00:00','2021-01-02 02:00:00', '2021-01-02 03:00:00','2021-01-02 04:00:00', '2021-01-02 05:00:00','2021-01-02 06:00:00', '2021-01-02 07:00:00','2021-01-02 08:00:00', '2021-01-02 09:00:00','2021-01-02 10:00:00', '2021-01-02 11:00:00','2021-01-02 12:00:00', '2021-01-02 13:00:00','2021-01-02 14:00:00', '2021-01-02 15:00:00','2021-01-02 16:00:00', '2021-01-02 17:00:00','2021-01-02 18:00:00', '2021-01-02 19:00:00','2021-01-02 20:00:00', '2021-01-02 21:00:00','2021-01-02 22:00:00', '2021-01-02 23:00:00','2021-01-03 00:00:00', '2021-01-03 01:00:00','2021-01-03 02:00:00', '2021-01-03 03:00:00','2021-01-03 04:00:00', '2021-01-03 05:00:00','2021-01-03 06:00:00', '2021-01-03 07:00:00','2021-01-03 08:00:00', '2021-01-03 09:00:00','2021-01-03 10:00:00', '2021-01-03 11:00:00'],dtype='datetime64[ns]', freq='H')
pd.date_range('2021-01-01', periods=60, freq='W')
DatetimeIndex(['2021-01-03', '2021-01-10', '2021-01-17', '2021-01-24','2021-01-31', '2021-02-07', '2021-02-14', '2021-02-21','2021-02-28', '2021-03-07', '2021-03-14', '2021-03-21','2021-03-28', '2021-04-04', '2021-04-11', '2021-04-18','2021-04-25', '2021-05-02', '2021-05-09', '2021-05-16','2021-05-23', '2021-05-30', '2021-06-06', '2021-06-13','2021-06-20', '2021-06-27', '2021-07-04', '2021-07-11','2021-07-18', '2021-07-25', '2021-08-01', '2021-08-08','2021-08-15', '2021-08-22', '2021-08-29', '2021-09-05','2021-09-12', '2021-09-19', '2021-09-26', '2021-10-03','2021-10-10', '2021-10-17', '2021-10-24', '2021-10-31','2021-11-07', '2021-11-14', '2021-11-21', '2021-11-28','2021-12-05', '2021-12-12', '2021-12-19', '2021-12-26','2022-01-02', '2022-01-09', '2022-01-16', '2022-01-23','2022-01-30', '2022-02-06', '2022-02-13', '2022-02-20'],dtype='datetime64[ns]', freq='W-SUN')
pd.date_range('2021-01-01', periods=60, freq='W-Fri')
DatetimeIndex(['2021-01-01', '2021-01-08', '2021-01-15', '2021-01-22','2021-01-29', '2021-02-05', '2021-02-12', '2021-02-19','2021-02-26', '2021-03-05', '2021-03-12', '2021-03-19','2021-03-26', '2021-04-02', '2021-04-09', '2021-04-16','2021-04-23', '2021-04-30', '2021-05-07', '2021-05-14','2021-05-21', '2021-05-28', '2021-06-04', '2021-06-11','2021-06-18', '2021-06-25', '2021-07-02', '2021-07-09','2021-07-16', '2021-07-23', '2021-07-30', '2021-08-06','2021-08-13', '2021-08-20', '2021-08-27', '2021-09-03','2021-09-10', '2021-09-17', '2021-09-24', '2021-10-01','2021-10-08', '2021-10-15', '2021-10-22', '2021-10-29','2021-11-05', '2021-11-12', '2021-11-19', '2021-11-26','2021-12-03', '2021-12-10', '2021-12-17', '2021-12-24','2021-12-31', '2022-01-07', '2022-01-14', '2022-01-21','2022-01-28', '2022-02-04', '2022-02-11', '2022-02-18'],dtype='datetime64[ns]', freq='W-FRI')
pd.date_range('2021-01-01', periods=60, freq='B')
DatetimeIndex(['2021-01-01', '2021-01-04', '2021-01-05', '2021-01-06','2021-01-07', '2021-01-08', '2021-01-11', '2021-01-12','2021-01-13', '2021-01-14', '2021-01-15', '2021-01-18','2021-01-19', '2021-01-20', '2021-01-21', '2021-01-22','2021-01-25', '2021-01-26', '2021-01-27', '2021-01-28','2021-01-29', '2021-02-01', '2021-02-02', '2021-02-03','2021-02-04', '2021-02-05', '2021-02-08', '2021-02-09','2021-02-10', '2021-02-11', '2021-02-12', '2021-02-15','2021-02-16', '2021-02-17', '2021-02-18', '2021-02-19','2021-02-22', '2021-02-23', '2021-02-24', '2021-02-25','2021-02-26', '2021-03-01', '2021-03-02', '2021-03-03','2021-03-04', '2021-03-05', '2021-03-08', '2021-03-09','2021-03-10', '2021-03-11', '2021-03-12', '2021-03-15','2021-03-16', '2021-03-17', '2021-03-18', '2021-03-19','2021-03-22', '2021-03-23', '2021-03-24', '2021-03-25'],dtype='datetime64[ns]', freq='B')
pd.date_range('2021-01-01', periods=60, freq='1h20min')
DatetimeIndex(['2021-01-01 00:00:00', '2021-01-01 01:20:00','2021-01-01 02:40:00', '2021-01-01 04:00:00','2021-01-01 05:20:00', '2021-01-01 06:40:00','2021-01-01 08:00:00', '2021-01-01 09:20:00','2021-01-01 10:40:00', '2021-01-01 12:00:00','2021-01-01 13:20:00', '2021-01-01 14:40:00','2021-01-01 16:00:00', '2021-01-01 17:20:00','2021-01-01 18:40:00', '2021-01-01 20:00:00','2021-01-01 21:20:00', '2021-01-01 22:40:00','2021-01-02 00:00:00', '2021-01-02 01:20:00','2021-01-02 02:40:00', '2021-01-02 04:00:00','2021-01-02 05:20:00', '2021-01-02 06:40:00','2021-01-02 08:00:00', '2021-01-02 09:20:00','2021-01-02 10:40:00', '2021-01-02 12:00:00','2021-01-02 13:20:00', '2021-01-02 14:40:00','2021-01-02 16:00:00', '2021-01-02 17:20:00','2021-01-02 18:40:00', '2021-01-02 20:00:00','2021-01-02 21:20:00', '2021-01-02 22:40:00','2021-01-03 00:00:00', '2021-01-03 01:20:00','2021-01-03 02:40:00', '2021-01-03 04:00:00','2021-01-03 05:20:00', '2021-01-03 06:40:00','2021-01-03 08:00:00', '2021-01-03 09:20:00','2021-01-03 10:40:00', '2021-01-03 12:00:00','2021-01-03 13:20:00', '2021-01-03 14:40:00','2021-01-03 16:00:00', '2021-01-03 17:20:00','2021-01-03 18:40:00', '2021-01-03 20:00:00','2021-01-03 21:20:00', '2021-01-03 22:40:00','2021-01-04 00:00:00', '2021-01-04 01:20:00','2021-01-04 02:40:00', '2021-01-04 04:00:00','2021-01-04 05:20:00', '2021-01-04 06:40:00'],dtype='datetime64[ns]', freq='80T')
3. Time series
What is a time series?
A time series is a Series or DataFrame whose index is made of time objects. For example:
sr = pd.Series(np.arange(1000), index=pd.date_range('2020-01-01', periods=1000))
sr
2020-01-01 0
2020-01-02 1
2020-01-03 2
2020-01-04 3
2020-01-05 4
...
2022-09-22 995
2022-09-23 996
2022-09-24 997
2022-09-25 998
2022-09-26 999
Freq: D, Length: 1000, dtype: int32
sr.index
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04','2020-01-05', '2020-01-06', '2020-01-07', '2020-01-08','2020-01-09', '2020-01-10',...'2022-09-17', '2022-09-18', '2022-09-19', '2022-09-20','2022-09-21', '2022-09-22', '2022-09-23', '2022-09-24','2022-09-25', '2022-09-26'],dtype='datetime64[ns]', length=1000, freq='D')
What makes a time series special
A time series can be queried directly by year, month, or day, and it even supports slicing with date strings.
sr['2020']
2020-01-01 0
2020-01-02 1
2020-01-03 2
2020-01-04 3
2020-01-05 4
...
2020-12-27 361
2020-12-28 362
2020-12-29 363
2020-12-30 364
2020-12-31 365
Freq: D, Length: 366, dtype: int32
sr['2020-3']
2020-03-01 60
2020-03-02 61
2020-03-03 62
2020-03-04 63
2020-03-05 64
2020-03-06 65
2020-03-07 66
2020-03-08 67
2020-03-09 68
2020-03-10 69
2020-03-11 70
2020-03-12 71
2020-03-13 72
2020-03-14 73
2020-03-15 74
2020-03-16 75
2020-03-17 76
2020-03-18 77
2020-03-19 78
2020-03-20 79
2020-03-21 80
2020-03-22 81
2020-03-23 82
2020-03-24 83
2020-03-25 84
2020-03-26 85
2020-03-27 86
2020-03-28 87
2020-03-29 88
2020-03-30 89
2020-03-31 90
Freq: D, dtype: int32
sr['2020-3-19']
78
sr['2020-03':'2021-05-1']
2020-03-01 60
2020-03-02 61
2020-03-03 62
2020-03-04 63
2020-03-05 64
...
2021-04-27 482
2021-04-28 483
2021-04-29 484
2021-04-30 485
2021-05-01 486
Freq: D, Length: 427, dtype: int32
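The same lookups can be written with loc, which is the form used in this sketch as a deliberate substitution for the bare brackets; the selected data is the same. The lengths confirm what each date string covers.

```python
import numpy as np
import pandas as pd

sr = pd.Series(np.arange(1000), index=pd.date_range("2020-01-01", periods=1000))

year = sr.loc["2020"]                   # every day of 2020 (a leap year)
month = sr.loc["2020-03"]               # every day of March 2020
day = sr.loc["2020-03-19"]              # a single value
span = sr.loc["2020-03":"2021-05-01"]   # both ends included

print(len(year), len(month), day, len(span))
```

The slice starts at the first day of March 2020 because a month string on the left edge expands to the start of that month.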
A function for time series: resample(), a powerful tool for grouped statistics
sr
2020-01-01 0
2020-01-02 1
2020-01-03 2
2020-01-04 3
2020-01-05 4
...
2022-09-22 995
2022-09-23 996
2022-09-24 997
2022-09-25 998
2022-09-26 999
Freq: D, Length: 1000, dtype: int32
sr.resample('W').sum()
2020-01-05 10
2020-01-12 56
2020-01-19 105
2020-01-26 154
2020-02-02 203
...
2022-09-04 6818
2022-09-11 6867
2022-09-18 6916
2022-09-25 6965
2022-10-02 999
Freq: W-SUN, Length: 144, dtype: int32
sr.resample('W').mean()
2020-01-05 2.0
2020-01-12 8.0
2020-01-19 15.0
2020-01-26 22.0
2020-02-02 29.0
...
2022-09-04 974.0
2022-09-11 981.0
2022-09-18 988.0
2022-09-25 995.0
2022-10-02 999.0
Freq: W-SUN, Length: 144, dtype: float64
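resample('W') groups the daily values into weeks ending on Sunday and then applies the aggregation to each group. A sketch that computes several statistics at once with `agg` (an addition not used in the lesson) and checks the first weekly bin by hand:

```python
import numpy as np
import pandas as pd

sr = pd.Series(np.arange(1000), index=pd.date_range("2020-01-01", periods=1000))

weekly = sr.resample("W").agg(["sum", "mean", "count"])
print(weekly.head())

# The first bin ends on Sunday 2020-01-05 and so holds only five days.
first_bin = sr.loc["2020-01-01":"2020-01-05"]
print(first_bin.sum(), first_bin.mean(), len(first_bin))
```

The count column makes the partial weeks visible: the first bin has five days and the last has one, which is why the final weekly sum and mean both equal 999.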
VII. File operations
pd.read_csv('maotai.csv')
             日期        收盤        開盤         高         低   交易量     漲跌幅
0    2021/11/12  1,773.78  1,778.00  1,785.05  1,767.00  1.76M   0.24%
1    2021/11/11  1,769.60  1,752.93  1,769.60  1,741.50  2.27M   0.89%
2    2021/11/10  1,753.99  1,790.01  1,795.00  1,735.00  3.53M  -2.01%
3     2021/11/9  1,790.01  1,819.98  1,827.87  1,782.00  2.74M  -1.65%
4     2021/11/8  1,820.10  1,820.00  1,830.80  1,802.05  1.77M   0.01%
...         ...       ...       ...       ...       ...    ...     ...
239  2020/11/18  1,693.65  1,715.00  1,720.53  1,683.16  3.52M  -1.29%
240  2020/11/17  1,715.80  1,740.00  1,742.35  1,701.07  2.52M  -0.82%
241  2020/11/16  1,730.05  1,711.00  1,730.05  1,697.26  3.06M   1.47%
242  2020/11/13  1,705.00  1,724.00  1,728.88  1,691.00  2.82M  -1.72%
243  2020/11/12  1,734.79  1,730.01  1,750.00  1,722.27  2.35M   0.20%
(Column names in the file: 日期 = date, 收盤 = close, 開盤 = open, 高 = high, 低 = low, 交易量 = volume, 漲跌幅 = percent change.)
244 rows × 7 columns
df = pd.read_csv('maotai.csv', index_col=0)
df
                  收盤        開盤         高         低   交易量     漲跌幅
日期
2021/11/12  1,773.78  1,778.00  1,785.05  1,767.00  1.76M   0.24%
2021/11/11  1,769.60  1,752.93  1,769.60  1,741.50  2.27M   0.89%
2021/11/10  1,753.99  1,790.01  1,795.00  1,735.00  3.53M  -2.01%
2021/11/9   1,790.01  1,819.98  1,827.87  1,782.00  2.74M  -1.65%
2021/11/8   1,820.10  1,820.00  1,830.80  1,802.05  1.77M   0.01%
...              ...       ...       ...       ...    ...     ...
2020/11/18  1,693.65  1,715.00  1,720.53  1,683.16  3.52M  -1.29%
2020/11/17  1,715.80  1,740.00  1,742.35  1,701.07  2.52M  -0.82%
2020/11/16  1,730.05  1,711.00  1,730.05  1,697.26  3.06M   1.47%
2020/11/13  1,705.00  1,724.00  1,728.88  1,691.00  2.82M  -1.72%
2020/11/12  1,734.79  1,730.01  1,750.00  1,722.27  2.35M   0.20%
244 rows × 6 columns
df.index
Index(['2021/11/12', '2021/11/11', '2021/11/10', '2021/11/9', '2021/11/8','2021/11/5', '2021/11/4', '2021/11/3', '2021/11/2', '2021/11/1',...'2020/11/25', '2020/11/24', '2020/11/23', '2020/11/20', '2020/11/19','2020/11/18', '2020/11/17', '2020/11/16', '2020/11/13', '2020/11/12'],dtype='object', name='日期', length=244)
df = pd.read_csv('maotai.csv', index_col=0, thousands=',', parse_dates=True)
df
                 收盤       開盤        高        低   交易量     漲跌幅
日期
2021-11-12  1773.78  1778.00  1785.05  1767.00  1.76M   0.24%
2021-11-11  1769.60  1752.93  1769.60  1741.50  2.27M   0.89%
2021-11-10  1753.99  1790.01  1795.00  1735.00  3.53M  -2.01%
2021-11-09  1790.01  1819.98  1827.87  1782.00  2.74M  -1.65%
2021-11-08  1820.10  1820.00  1830.80  1802.05  1.77M   0.01%
...             ...      ...      ...      ...    ...     ...
2020-11-18  1693.65  1715.00  1720.53  1683.16  3.52M  -1.29%
2020-11-17  1715.80  1740.00  1742.35  1701.07  2.52M  -0.82%
2020-11-16  1730.05  1711.00  1730.05  1697.26  3.06M   1.47%
2020-11-13  1705.00  1724.00  1728.88  1691.00  2.82M  -1.72%
2020-11-12  1734.79  1730.01  1750.00  1722.27  2.35M   0.20%
244 rows × 6 columns
df.index
DatetimeIndex(['2021-11-12', '2021-11-11', '2021-11-10', '2021-11-09','2021-11-08', '2021-11-05', '2021-11-04', '2021-11-03','2021-11-02', '2021-11-01',...'2020-11-25', '2020-11-24', '2020-11-23', '2020-11-20','2020-11-19', '2020-11-18', '2020-11-17', '2020-11-16','2020-11-13', '2020-11-12'],dtype='datetime64[ns]', name='日期', length=244, freq=None)
df = pd.read_csv('maotai.csv', index_col=0, parse_dates=[0])
df.index
DatetimeIndex(['2021-11-12', '2021-11-11', '2021-11-10', '2021-11-09','2021-11-08', '2021-11-05', '2021-11-04', '2021-11-03','2021-11-02', '2021-11-01',...'2020-11-25', '2020-11-24', '2020-11-23', '2020-11-20','2020-11-19', '2020-11-18', '2020-11-17', '2020-11-16','2020-11-13', '2020-11-12'],dtype='datetime64[ns]', name='日期', length=244, freq=None)
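The effect of thousands and parse_dates can be tried without the original file. The sketch below reads a small in-memory CSV; the English column names and the use of `io.StringIO` are assumptions made for this example, and the two data rows are copied from the output above.

```python
import io
import pandas as pd

# Hypothetical stand-in for maotai.csv with English column names.
csv_text = '''date,close,open
2021/11/12,"1,773.78","1,778.00"
2021/11/11,"1,769.60","1,752.93"
'''

# Without extra options the prices stay text and the index stays plain strings.
raw = pd.read_csv(io.StringIO(csv_text), index_col=0)

# thousands=',' strips the separators; parse_dates=True parses the index column.
parsed = pd.read_csv(io.StringIO(csv_text), index_col=0,
                     thousands=",", parse_dates=True)

print(raw.dtypes)
print(parsed.dtypes)
print(type(parsed.index).__name__)
```

A parsed index is what enables the date-string lookups and resample() shown in the time-series section.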
pd.read_csv('maotai.csv', index_col=0, header=None)
                   1         2         3         4      5       6
0
日期                收盤        開盤         高         低    交易量     漲跌幅
2021/11/12  1,773.78  1,778.00  1,785.05  1,767.00  1.76M   0.24%
2021/11/11  1,769.60  1,752.93  1,769.60  1,741.50  2.27M   0.89%
2021/11/10  1,753.99  1,790.01  1,795.00  1,735.00  3.53M  -2.01%
2021/11/9   1,790.01  1,819.98  1,827.87  1,782.00  2.74M  -1.65%
...              ...       ...       ...       ...    ...     ...
2020/11/18  1,693.65  1,715.00  1,720.53  1,683.16  3.52M  -1.29%
2020/11/17  1,715.80  1,740.00  1,742.35  1,701.07  2.52M  -0.82%
2020/11/16  1,730.05  1,711.00  1,730.05  1,697.26  3.06M   1.47%
2020/11/13  1,705.00  1,724.00  1,728.88  1,691.00  2.82M  -1.72%
2020/11/12  1,734.79  1,730.01  1,750.00  1,722.27  2.35M   0.20%
245 rows × 6 columns
pd.read_csv('maotai.csv', index_col=0, header=None, names=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
                   b         c         d         e      f       g
a
日期                收盤        開盤         高         低    交易量     漲跌幅
2021/11/12  1,773.78  1,778.00  1,785.05  1,767.00  1.76M   0.24%
2021/11/11  1,769.60  1,752.93  1,769.60  1,741.50  2.27M   0.89%
2021/11/10  1,753.99  1,790.01  1,795.00  1,735.00  3.53M  -2.01%
2021/11/9   1,790.01  1,819.98  1,827.87  1,782.00  2.74M  -1.65%
...              ...       ...       ...       ...    ...     ...
2020/11/18  1,693.65  1,715.00  1,720.53  1,683.16  3.52M  -1.29%
2020/11/17  1,715.80  1,740.00  1,742.35  1,701.07  2.52M  -0.82%
2020/11/16  1,730.05  1,711.00  1,730.05  1,697.26  3.06M   1.47%
2020/11/13  1,705.00  1,724.00  1,728.88  1,691.00  2.82M  -1.72%
2020/11/12  1,734.79  1,730.01  1,750.00  1,722.27  2.35M   0.20%
245 rows × 6 columns
pd.read_csv('maotai.csv', index_col=0, header=None, names=['a', 'b', 'c', 'd', 'e', 'f', 'g'], na_values=['收盤', '開盤'])
                   b         c         d         e      f       g
a
日期               NaN       NaN         高         低    交易量     漲跌幅
2021/11/12  1,773.78  1,778.00  1,785.05  1,767.00  1.76M   0.24%
2021/11/11  1,769.60  1,752.93  1,769.60  1,741.50  2.27M   0.89%
2021/11/10  1,753.99  1,790.01  1,795.00  1,735.00  3.53M  -2.01%
2021/11/9   1,790.01  1,819.98  1,827.87  1,782.00  2.74M  -1.65%
...              ...       ...       ...       ...    ...     ...
2020/11/18  1,693.65  1,715.00  1,720.53  1,683.16  3.52M  -1.29%
2020/11/17  1,715.80  1,740.00  1,742.35  1,701.07  2.52M  -0.82%
2020/11/16  1,730.05  1,711.00  1,730.05  1,697.26  3.06M   1.47%
2020/11/13  1,705.00  1,724.00  1,728.88  1,691.00  2.82M  -1.72%
2020/11/12  1,734.79  1,730.01  1,750.00  1,722.27  2.35M   0.20%
245 rows × 6 columns
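The outputs above show 245 rows instead of 244 because header=None makes pandas treat the header line as data. A sketch of header, names, and na_values on a tiny in-memory CSV; the file content, the column names, and the marker string "missing" are all hypothetical.

```python
import io
import pandas as pd

# Hypothetical two-row file; "missing" is a made-up marker for absent data.
csv_text = "date,close\n2021/11/12,1773.78\n2021/11/11,missing\n"

default = pd.read_csv(io.StringIO(csv_text), index_col=0)

# header=None: the first line is read as a data row; names supplies the labels.
no_header = pd.read_csv(io.StringIO(csv_text), index_col=0,
                        header=None, names=["a", "b"])

# na_values: extra strings to be read as NaN.
with_na = pd.read_csv(io.StringIO(csv_text), index_col=0, na_values=["missing"])

print(len(default), len(no_header))
print(no_header.index[0], no_header.index.name)
print(with_na["close"])
```

With the marker declared as missing, the close column can be parsed as floats instead of staying text.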