數據類型-DataFrame
- DataFrame是由多個Series數據列組成的表格數據類型,每行Series值都增加了一個共用的索引
- 既有行索引,又有列索引
- 行索引,表明不同行,橫向索引,叫index,0軸,axis=0
- 列索引,表名不同列,縱向索引,叫columns,1軸,axis=1
- DataFrame數據類型可視為:二維 帶標簽 數組
- 每列值的類型可以不同
- 基本操作類似Series,依據行列索引操作
- 常用于表達二維數據,但也可以表達多維數據(Dataframe嵌套,極少用)
DataFrame數據類型創(chuàng)建
Python list列表 創(chuàng)建DataFrame
import pandas
as pddf = pd.DataFrame([
True,
1,
2.3,
'a',
'你好'])
df
df = pd.DataFrame([[
True,
1,
2.3,
'a',
'你好'],[
1,
2,
3,
4,
5]])
df
01234
0| True | 1 | 2.3 | a | 你好 |
1| 1 | 2 | 3.0 | 4 | 5 |
df = pd.DataFrame([[[
True,
1,
2.3,
'a',
'你好'],[
1,
2,
3,
4,
5]],[[
True,
1,
2.3,
'a',
'你好'],[
1,
2,
3,
4,
5]]])
df
01
0| [True, 1, 2.3, a, 你好] | [1, 2, 3, 4, 5] |
1| [True, 1, 2.3, a, 你好] | [1, 2, 3, 4, 5] |
Python 字典 創(chuàng)建DataFrame
df = pd.DataFrame({
'one':[
1,
2,
3,
4],
'two':[
9,
8,
7,
6]})
df
df = pd.DataFrame({
'one':[
1,
2,
3,
4],
'two':[
9,
8,
7,
6]},index = [
'a',
'b',
'c',
'd'])
df
df = pd.DataFrame({
'A' :
1,
'B' :
2.3,
'C' : [
'x',
'y',
5]
})
df
dt = {
'one' : pd.Series([
1,
2,
3],index=[
'a',
'b',
'c']),
'two' : pd.Series([
9,
8,
7,
6],index=[
'a',
'b',
'c',
'd',])
}
dt {'one': a 1b 2c 3dtype: int64, 'two': a 9b 8c 7d 6dtype: int64}
d = pd.DataFrame(dt)
d
onetwo
a| 1.0 | 9 |
b| 2.0 | 8 |
c| 3.0 | 7 |
d| NaN | 6 |
d_2 = pd.DataFrame(dt,index=[
'b',
'c',
'd'],columns=[
'two',
'three'])
d_2
ndarray數組 創(chuàng)建DataFrame
import numpy
as npdf = pd.DataFrame(np.arange(
10).reshape(
2,
5))
df
df = pd.DataFrame(np.random.randn(
6,
4),index=[
1,
2,
3,
4,
5,
6],columns=[
'a',
'b',
'c',
'd'])
df
abcd
1| 0.274340 | 0.296507 | 0.751198 | 0.763512 |
2| 0.181134 | 0.675380 | 0.553695 | 0.632163 |
3| -0.059765 | 0.347702 | 1.138297 | -0.143998 |
4| -1.370677 | -0.951640 | 0.135964 | -0.665875 |
5| 1.490610 | 0.420539 | 0.628784 | 2.119896 |
6| -1.669737 | 1.167765 | 1.254722 | -0.948624 |
Series 創(chuàng)建DataFrame
e = pd.DataFrame([pd.Series([
1,
2,
3]),pd.Series([
9,
8,
7,
6])],index=[
'a',
'b'])
e
0123
a| 1.0 | 2.0 | 3.0 | NaN |
b| 9.0 | 8.0 | 7.0 | 6.0 |
DataFrame屬性
di = {
'姓名':[
'張三',
'李四',
'王五',
'趙六'],
'性別':[
'男',
'女',
'女',
'男'],
'年齡':[
12,
22,
32,
42],
'地址':[
'北京',
'上海',
'廣州',
'深圳']
}
di {'地址': ['北京', '上海', '廣州', '深圳'],'姓名': ['張三', '李四', '王五', '趙六'],'年齡': [12, 22, 32, 42],'性別': ['男', '女', '女', '男']}
d = pd.DataFrame(di,index=[
'd1',
'd2',
'd3',
'd4'])
d
地址姓名年齡性別
d1| 北京 | 張三 | 12 | 男 |
d2| 上海 | 李四 | 22 | 女 |
d3| 廣州 | 王五 | 32 | 女 |
d4| 深圳 | 趙六 | 42 | 男 |
d.head()
地址姓名年齡性別
d1| 北京 | 張三 | 12 | 男 |
d2| 上海 | 李四 | 22 | 女 |
d3| 廣州 | 王五 | 32 | 女 |
d4| 深圳 | 趙六 | 42 | 男 |
d.tail(
3)
地址姓名年齡性別
d2| 上海 | 李四 | 22 | 女 |
d3| 廣州 | 王五 | 32 | 女 |
d4| 深圳 | 趙六 | 42 | 男 |
d.info() <class 'pandas.core.frame.DataFrame'>
Index: 4 entries, d1 to d4
Data columns (total 4 columns):
地址 4 non-null object
姓名 4 non-null object
年齡 4 non-null int64
性別 4 non-null object
dtypes: int64(1), object(3)
memory usage: 160.0+ bytes
d.shape (4, 4)
d.dtypes 地址 object
姓名 object
年齡 int64
性別 object
dtype: object
d.index Index(['d1', 'd2', 'd3', 'd4'], dtype='object')
d.columns Index(['地址', '姓名', '年齡', '性別'], dtype='object')
d.values array([['北京', '張三', 12, '男'],['上海', '李四', 22, '女'],['廣州', '王五', 32, '女'],['深圳', '趙六', 42, '男']], dtype=object)
DataFrame查增改刪
查 Read
類list/ndarray數據訪問方式
dates = pd.date_range(
'20130101',periods=
10)
dates DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04','2013-01-05', '2013-01-06', '2013-01-07', '2013-01-08','2013-01-09', '2013-01-10'],dtype='datetime64[ns]', freq='D')
df = pd.DataFrame(np.random.randn(
10,
4),index=dates,columns=[
'A',
'B',
'C',
'D'])
df
ABCD
2013-01-01| 0.754077 | -0.346202 | -0.557050 | 0.778106 |
2013-01-02| 0.103394 | -1.051044 | -0.413054 | 0.268955 |
2013-01-03| 0.174730 | 2.056007 | 1.781379 | 1.643397 |
2013-01-04| -0.950517 | -0.226887 | -0.097138 | -0.442010 |
2013-01-05| 0.076178 | -0.518970 | 1.142290 | -0.952401 |
2013-01-06| 1.371702 | -1.028873 | -1.470106 | -0.113098 |
2013-01-07| 0.126720 | -0.251519 | -2.212507 | 1.050036 |
2013-01-08| -1.246918 | 1.530266 | 1.761499 | 0.940741 |
2013-01-09| 0.941099 | -2.420932 | 1.927863 | -0.549143 |
2013-01-10| 1.951555 | -0.264012 | -0.171690 | 0.869293 |
df[
'A'] 2013-01-01 0.754077
2013-01-02 0.103394
2013-01-03 0.174730
2013-01-04 -0.950517
2013-01-05 0.076178
2013-01-06 1.371702
2013-01-07 0.126720
2013-01-08 -1.246918
2013-01-09 0.941099
2013-01-10 1.951555
Freq: D, Name: A, dtype: float64
df.A 2013-01-01 0.754077
2013-01-02 0.103394
2013-01-03 0.174730
2013-01-04 -0.950517
2013-01-05 0.076178
2013-01-06 1.371702
2013-01-07 0.126720
2013-01-08 -1.246918
2013-01-09 0.941099
2013-01-10 1.951555
Freq: D, Name: A, dtype: float64
df[
'A'][
'2013-01-01'] 0.75407705661157032
df.A[
'2013-01-01'] 0.75407705661157032
df[[
'A',
'C']]
AC
2013-01-01| 0.754077 | -0.557050 |
2013-01-02| 0.103394 | -0.413054 |
2013-01-03| 0.174730 | 1.781379 |
2013-01-04| -0.950517 | -0.097138 |
2013-01-05| 0.076178 | 1.142290 |
2013-01-06| 1.371702 | -1.470106 |
2013-01-07| 0.126720 | -2.212507 |
2013-01-08| -1.246918 | 1.761499 |
2013-01-09| 0.941099 | 1.927863 |
2013-01-10| 1.951555 | -0.171690 |
Pandas專用的數據訪問方式 — .loc 通過自定義索引獲取數據
df.loc[
'2013-01-01'] A 0.754077
B -0.346202
C -0.557050
D 0.778106
Name: 2013-01-01 00:00:00, dtype: float64
df.loc[:,
'A'] 2013-01-01 0.754077
2013-01-02 0.103394
2013-01-03 0.174730
2013-01-04 -0.950517
2013-01-05 0.076178
2013-01-06 1.371702
2013-01-07 0.126720
2013-01-08 -1.246918
2013-01-09 0.941099
2013-01-10 1.951555
Freq: D, Name: A, dtype: float64
df.loc[
'2013-01-01',
'A'] 0.75407705661157032
df.loc[[dates[
0],dates[
2]],:]
ABCD
2013-01-01| 0.754077 | -0.346202 | -0.557050 | 0.778106 |
2013-01-03| 0.174730 | 2.056007 | 1.781379 | 1.643397 |
df.loc[:,[
'A',
'B']]
AB
2013-01-01| 0.754077 | -0.346202 |
2013-01-02| 0.103394 | -1.051044 |
2013-01-03| 0.174730 | 2.056007 |
2013-01-04| -0.950517 | -0.226887 |
2013-01-05| 0.076178 | -0.518970 |
2013-01-06| 1.371702 | -1.028873 |
2013-01-07| 0.126720 | -0.251519 |
2013-01-08| -1.246918 | 1.530266 |
2013-01-09| 0.941099 | -2.420932 |
2013-01-10| 1.951555 | -0.264012 |
df.loc[[dates[
0],dates[
2]],[
'A',
'B']]
AB
2013-01-01| 0.754077 | -0.346202 |
2013-01-03| 0.174730 | 2.056007 |
df.loc[
'2013-01-01':
'2013-01-04',:]
ABCD
2013-01-01| 0.754077 | -0.346202 | -0.557050 | 0.778106 |
2013-01-02| 0.103394 | -1.051044 | -0.413054 | 0.268955 |
2013-01-03| 0.174730 | 2.056007 | 1.781379 | 1.643397 |
2013-01-04| -0.950517 | -0.226887 | -0.097138 | -0.442010 |
df.loc[:,
'A':
'C']
ABC
2013-01-01| 0.754077 | -0.346202 | -0.557050 |
2013-01-02| 0.103394 | -1.051044 | -0.413054 |
2013-01-03| 0.174730 | 2.056007 | 1.781379 |
2013-01-04| -0.950517 | -0.226887 | -0.097138 |
2013-01-05| 0.076178 | -0.518970 | 1.142290 |
2013-01-06| 1.371702 | -1.028873 | -1.470106 |
2013-01-07| 0.126720 | -0.251519 | -2.212507 |
2013-01-08| -1.246918 | 1.530266 | 1.761499 |
2013-01-09| 0.941099 | -2.420932 | 1.927863 |
2013-01-10| 1.951555 | -0.264012 | -0.171690 |
df.loc[
'2013-01-01':
'2013-01-04',
'A':
'C']
ABC
2013-01-01| 0.754077 | -0.346202 | -0.557050 |
2013-01-02| 0.103394 | -1.051044 | -0.413054 |
2013-01-03| 0.174730 | 2.056007 | 1.781379 |
2013-01-04| -0.950517 | -0.226887 | -0.097138 |
.iloc 通過默認索引獲取數據
df.iloc[
3] A -0.950517
B -0.226887
C -0.097138
D -0.442010
Name: 2013-01-04 00:00:00, dtype: float64
df.iloc[:,
2] 2013-01-01 -0.557050
2013-01-02 -0.413054
2013-01-03 1.781379
2013-01-04 -0.097138
2013-01-05 1.142290
2013-01-06 -1.470106
2013-01-07 -2.212507
2013-01-08 1.761499
2013-01-09 1.927863
2013-01-10 -0.171690
Freq: D, Name: C, dtype: float64
df.iloc[
1,
2] -0.41305425875508139
df.iloc[[
1,
2,
4],:]
ABCD
2013-01-02| 0.103394 | -1.051044 | -0.413054 | 0.268955 |
2013-01-03| 0.174730 | 2.056007 | 1.781379 | 1.643397 |
2013-01-05| 0.076178 | -0.518970 | 1.142290 | -0.952401 |
df.iloc[:,[
0,
2]]
AC
2013-01-01| 0.754077 | -0.557050 |
2013-01-02| 0.103394 | -0.413054 |
2013-01-03| 0.174730 | 1.781379 |
2013-01-04| -0.950517 | -0.097138 |
2013-01-05| 0.076178 | 1.142290 |
2013-01-06| 1.371702 | -1.470106 |
2013-01-07| 0.126720 | -2.212507 |
2013-01-08| -1.246918 | 1.761499 |
2013-01-09| 0.941099 | 1.927863 |
2013-01-10| 1.951555 | -0.171690 |
df.iloc[[
1,
2,
4],[
0,
2]]
AC
2013-01-02| 0.103394 | -0.413054 |
2013-01-03| 0.174730 | 1.781379 |
2013-01-05| 0.076178 | 1.142290 |
df.iloc[
1:
3,:]
ABCD
2013-01-02| 0.103394 | -1.051044 | -0.413054 | 0.268955 |
2013-01-03| 0.174730 | 2.056007 | 1.781379 | 1.643397 |
df.iloc[:,
1:
3]
BC
2013-01-01| -0.346202 | -0.557050 |
2013-01-02| -1.051044 | -0.413054 |
2013-01-03| 2.056007 | 1.781379 |
2013-01-04| -0.226887 | -0.097138 |
2013-01-05| -0.518970 | 1.142290 |
2013-01-06| -1.028873 | -1.470106 |
2013-01-07| -0.251519 | -2.212507 |
2013-01-08| 1.530266 | 1.761499 |
2013-01-09| -2.420932 | 1.927863 |
2013-01-10| -0.264012 | -0.171690 |
df.iloc[
3:
5,
0:
2]
AB
2013-01-04| -0.950517 | -0.226887 |
2013-01-05| 0.076178 | -0.518970 |
Boolean索引
df[df.A >
0]
ABCD
2013-01-01| 0.754077 | -0.346202 | -0.557050 | 0.778106 |
2013-01-02| 0.103394 | -1.051044 | -0.413054 | 0.268955 |
2013-01-03| 0.174730 | 2.056007 | 1.781379 | 1.643397 |
2013-01-05| 0.076178 | -0.518970 | 1.142290 | -0.952401 |
2013-01-06| 1.371702 | -1.028873 | -1.470106 | -0.113098 |
2013-01-07| 0.126720 | -0.251519 | -2.212507 | 1.050036 |
2013-01-09| 0.941099 | -2.420932 | 1.927863 | -0.549143 |
2013-01-10| 1.951555 | -0.264012 | -0.171690 | 0.869293 |
b = df[df >
0]
b
ABCD
2013-01-01| 0.754077 | NaN | NaN | 0.778106 |
2013-01-02| 0.103394 | NaN | NaN | 0.268955 |
2013-01-03| 0.174730 | 2.056007 | 1.781379 | 1.643397 |
2013-01-04| NaN | NaN | NaN | NaN |
2013-01-05| 0.076178 | NaN | 1.142290 | NaN |
2013-01-06| 1.371702 | NaN | NaN | NaN |
2013-01-07| 0.126720 | NaN | NaN | 1.050036 |
2013-01-08| NaN | 1.530266 | 1.761499 | 0.940741 |
2013-01-09| 0.941099 | NaN | 1.927863 | NaN |
2013-01-10| 1.951555 | NaN | NaN | 0.869293 |
type(b[
'A'][
'2013-01-01']) numpy.float64
df2 = df.copy()
df2[
'E'] = [
'one',
'one',
'two',
'three',
'four',
'three',
'five',
'four',
'three',
'five']
df2
ABCDE
2013-01-01| 0.754077 | -0.346202 | -0.557050 | 0.778106 | one |
2013-01-02| 0.103394 | -1.051044 | -0.413054 | 0.268955 | one |
2013-01-03| 0.174730 | 2.056007 | 1.781379 | 1.643397 | two |
2013-01-04| -0.950517 | -0.226887 | -0.097138 | -0.442010 | three |
2013-01-05| 0.076178 | -0.518970 | 1.142290 | -0.952401 | four |
2013-01-06| 1.371702 | -1.028873 | -1.470106 | -0.113098 | three |
2013-01-07| 0.126720 | -0.251519 | -2.212507 | 1.050036 | five |
2013-01-08| -1.246918 | 1.530266 | 1.761499 | 0.940741 | four |
2013-01-09| 0.941099 | -2.420932 | 1.927863 | -0.549143 | three |
2013-01-10| 1.951555 | -0.264012 | -0.171690 | 0.869293 | five |
df2[
'E'].isin([
'one',
'four']) 2013-01-01 True
2013-01-02 True
2013-01-03 False
2013-01-04 False
2013-01-05 True
2013-01-06 False
2013-01-07 False
2013-01-08 True
2013-01-09 False
2013-01-10 False
Freq: D, Name: E, dtype: bool
df2[df2[
'E'].isin([
'one',
'four'])]
ABCDE
2013-01-01| 0.754077 | -0.346202 | -0.557050 | 0.778106 | one |
2013-01-02| 0.103394 | -1.051044 | -0.413054 | 0.268955 | one |
2013-01-05| 0.076178 | -0.518970 | 1.142290 | -0.952401 | four |
2013-01-08| -1.246918 | 1.530266 | 1.761499 | 0.940741 | four |
增 Create
s1 = pd.Series([
1,
2,
3,
4,
5,
6], index=pd.date_range(
'20130102', periods=
6))
s1 2013-01-02 1
2013-01-03 2
2013-01-04 3
2013-01-05 4
2013-01-06 5
2013-01-07 6
Freq: D, dtype: int64
df2[
'F'] = s1
df2
ABCDEF
2013-01-01| 0.754077 | -0.346202 | -0.557050 | 0.778106 | one | NaN |
2013-01-02| 0.103394 | -1.051044 | -0.413054 | 0.268955 | one | 1.0 |
2013-01-03| 0.174730 | 2.056007 | 1.781379 | 1.643397 | two | 2.0 |
2013-01-04| -0.950517 | -0.226887 | -0.097138 | -0.442010 | three | 3.0 |
2013-01-05| 0.076178 | -0.518970 | 1.142290 | -0.952401 | four | 4.0 |
2013-01-06| 1.371702 | -1.028873 | -1.470106 | -0.113098 | three | 5.0 |
2013-01-07| 0.126720 | -0.251519 | -2.212507 | 1.050036 | five | 6.0 |
2013-01-08| -1.246918 | 1.530266 | 1.761499 | 0.940741 | four | NaN |
2013-01-09| 0.941099 | -2.420932 | 1.927863 | -0.549143 | three | NaN |
2013-01-10| 1.951555 | -0.264012 | -0.171690 | 0.869293 | five | NaN |
改 Update
df2.loc[:,
'D'] 2013-01-01 0.778106
2013-01-02 0.268955
2013-01-03 1.643397
2013-01-04 -0.442010
2013-01-05 -0.952401
2013-01-06 -0.113098
2013-01-07 1.050036
2013-01-08 0.940741
2013-01-09 -0.549143
2013-01-10 0.869293
Freq: D, Name: D, dtype: float64
df2.loc[:,
'D'] =
5
df2
ABCDEF
2013-01-01| 0.754077 | -0.346202 | -0.557050 | 5 | one | NaN |
2013-01-02| 0.103394 | -1.051044 | -0.413054 | 5 | one | 1.0 |
2013-01-03| 0.174730 | 2.056007 | 1.781379 | 5 | two | 2.0 |
2013-01-04| -0.950517 | -0.226887 | -0.097138 | 5 | three | 3.0 |
2013-01-05| 0.076178 | -0.518970 | 1.142290 | 5 | four | 4.0 |
2013-01-06| 1.371702 | -1.028873 | -1.470106 | 5 | three | 5.0 |
2013-01-07| 0.126720 | -0.251519 | -2.212507 | 5 | five | 6.0 |
2013-01-08| -1.246918 | 1.530266 | 1.761499 | 5 | four | NaN |
2013-01-09| 0.941099 | -2.420932 | 1.927863 | 5 | three | NaN |
2013-01-10| 1.951555 | -0.264012 | -0.171690 | 5 | five | NaN |
df2.iloc[
1,
3] 5
df2.iloc[
1,
3] =
10.1
df2
ABCDEF
2013-01-01| 0.754077 | -0.346202 | -0.557050 | 5.0 | one | NaN |
2013-01-02| 0.103394 | -1.051044 | -0.413054 | 10.1 | one | 1.0 |
2013-01-03| 0.174730 | 2.056007 | 1.781379 | 5.0 | two | 2.0 |
2013-01-04| -0.950517 | -0.226887 | -0.097138 | 5.0 | three | 3.0 |
2013-01-05| 0.076178 | -0.518970 | 1.142290 | 5.0 | four | 4.0 |
2013-01-06| 1.371702 | -1.028873 | -1.470106 | 5.0 | three | 5.0 |
2013-01-07| 0.126720 | -0.251519 | -2.212507 | 5.0 | five | 6.0 |
2013-01-08| -1.246918 | 1.530266 | 1.761499 | 5.0 | four | NaN |
2013-01-09| 0.941099 | -2.420932 | 1.927863 | 5.0 | three | NaN |
2013-01-10| 1.951555 | -0.264012 | -0.171690 | 5.0 | five | NaN |
df3 = df.copy()
df3[df3 >
0] = -df3
df3
ABCD
2013-01-01| -0.754077 | -0.346202 | -0.557050 | -0.778106 |
2013-01-02| -0.103394 | -1.051044 | -0.413054 | -0.268955 |
2013-01-03| -0.174730 | -2.056007 | -1.781379 | -1.643397 |
2013-01-04| -0.950517 | -0.226887 | -0.097138 | -0.442010 |
2013-01-05| -0.076178 | -0.518970 | -1.142290 | -0.952401 |
2013-01-06| -1.371702 | -1.028873 | -1.470106 | -0.113098 |
2013-01-07| -0.126720 | -0.251519 | -2.212507 | -1.050036 |
2013-01-08| -1.246918 | -1.530266 | -1.761499 | -0.940741 |
2013-01-09| -0.941099 | -2.420932 | -1.927863 | -0.549143 |
2013-01-10| -1.951555 | -0.264012 | -0.171690 | -0.869293 |
總結
以上是生活随笔為你收集整理的Pandas基础复习-DataFrame的全部內容,希望文章能夠幫你解決所遇到的問題。
如果覺得生活随笔網站內容還不錯,歡迎將生活随笔推薦給好友。