pandas 机器学习_机器学习的PANDAS
pandas 機器學習
Pandas is one of the tools in Machine Learning which is used for data cleaning and analysis. It has features which are used for exploring, cleaning, transforming and visualizing from data.
Pandas是機器學習中用于數據清理和分析的工具之一。 它具有用于根據數據進行探索,清理,轉換和可視化的功能。
Steps in Machine Learning Process機器學習過程中的步驟Now let us understand and learn more about Pandas-
現在,讓我們了解并了解有關Pandas的更多信息-
1.Introduction to Pandas
1,熊貓簡介
Pandas is an open-source python package built on top of Numpy developed by Wes McKinney. It is used as one of the most important data cleaning and analysis tool. It provides fast, flexible, and expressive data structures.
Pandas是一個由Wes McKinney開發的基于Numpy的開源python軟件包。 它被用作最重要的數據清理和分析工具之一。 它提供了快速,靈活和富有表現力的數據結構。
Pandas is derived from the term “Panel-data-s” an econometrics term for data sets include observations over multiple time periods for the same individuals. -source Wikipedia
大熊貓從術語“ 泛 EL- DA TA- 的 ”計量經濟學術語用于數據集包括在用于同一個體多個時間段的觀察而得。 源維基百科
2. How to install Pandas?
2.如何安裝熊貓?
You can install Pandas by using the following commands:
您可以使用以下命令安裝Pandas:
# to install pandas in terminal or command line use one of the commandspip install pandas
# or
conda install pandas
# To install pandas in jupyter notebook use this command
!pip install pandas
3. How to import Pandas?
3.如何進口大熊貓?
import pandas as pdBy using the above command you can easily import pandas library.
通過使用以上命令,您可以輕松導入pandas庫。
4.Pandas Data Structures
4,熊貓數據結構
Pandas deals with three types of data structures :
熊貓處理三種類型的數據結構:
- Series 系列
- DataFrame 數據框
- Panel 面板
a)Series is a one-dimensional array-like structure with homogeneous data. The size of the series is immutable(cannot be changed) but its values are mutable.
a)系列是具有同類數據的一維數組狀結構。 系列的大小是不可變的(無法更改),但是其值是可變的。
Examples of Series系列實例b) DataFrame is a two-dimensional array-like structure with heterogeneous data. Data is aligned in a tabular manner(Rows &Columns form). The size and values of DataFrame are mutable.
b)DataFrame是具有異類數據的二維數組狀結構。 數據以表格形式對齊(行和列形式)。 DataFrame的大小和值是可變的。
Creating DataFrame from Series從系列創建DataFramec) The panel is a three-dimensional data structure with heterogeneous data. It is hard to represent the panel in graphical representation. But it can be illustrated as a container of DataFrame. The size and values of a Panel are mutable.
c)面板是具有異構數據的三維數據結構。 很難用圖形表示面板。 但這可以說明為DataFrame的容器。 面板的大小和值是可變的。
Representation of Series, DataFrame & Panel系列,DataFrame和面板的表示Note: Series and DataFrame are the most widely used data structure in pandas. Therefore, I will only discuss these two data structures later.
注意:Series和DataFrame是熊貓中使用最廣泛的數據結構。 因此,我將僅在稍后討論這兩個數據結構。
5.Creating Series and Data Frames
5,創建系列和數據框
Series :
系列:
pandas.Series(data=None , index=None , dtype=None,copy = false)
pandas.Series(數據=無,索引=無,dtype =無,復制=假)
parameters:
參數:
data: data takes various forms like ndarray, list, constants
數據:數據采用各種形式,例如ndarray,list,常量
index: index values must be unique and hashable, the same length as data. Default np.arrange(n) if no index is passed.
index:索引值必須是唯一且可哈希的,且長度與數據相同。 如果未傳遞索引,則默認為np.arrange(n)。
dtype: dtype is for data type. If None, the data type will be inferred
dtype:dtype用于數據類型。 如果為None,則將推斷數據類型
copy: Copy input data. default false
復制:復制輸入數據。 默認為假
a)Creating Series from ndarray
a)從ndarray創建系列
#import the pandas library and aliasing as pdimport pandas as pdimport numpy as np
data = np.array(['a','b','c','d'])
example_1 = pd.Series(data,index=[0,1,2,3)
print example_1
out[1]: 0 a
1 b
2 c
3 d
b)Creating Series from Dictionary
b)從字典創建系列
import pandas as pdimport numpy as np
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data)
print s
out:
a 0.0
b 1.0
c 2.0
dtype: float64Note:Dictionary keys are used to construct index here.
c)Creating Series from scalar
c)從標量創建系列
import pandas as pdimport numpy as np
s = pd.Series(5, index=[0, 1, 2, 3])
print s
out:
0 5
1 5
2 5
3 5
dtype: int64
Dataframe:
數據框:
pandas.DataFrame(data=None , index=None , dtype=None,copy = false)
pandas.DataFrame(data = None,index = None,dtype = None,copy = false)
parameters:
參數:
data: data takes various forms like ndarray, list, constants
數據:數據采用各種形式,例如ndarray,列表,常量
index-: index values must be unique and hashable, the same length as data. Default np.arrange(n) if no index is passed.
index-:索引值必須是唯一的且可哈希的,與數據的長度相同。 如果未傳遞索引,則默認為np.arrange(n) 。
dtype: dtype is for data type. If None, the data type will be inferred
dtype:dtype用于數據類型。 如果為None,則將推斷數據類型
copy: Copy input data. default false
復制:復制輸入數據。 默認為假
a)Creating DataFrame from List
a)從列表創建DataFrame
Example-1:import pandas as pddata = [1,2,3,4,5]
df = pd.DataFrame(data)
print df
out:
0
0 1
1 2
2 3
3 4
4 5Example-2-
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'],dtype=float)
print df
out:
Name Age
0 Alex 10.0
1 Bob 12.0
2 Clarke 13.0Note: the dtype parameter changes the type of Age column to floating point.
b)Create a DataFrame from Dict of ndarrays / Lists
b)從ndarrays / Lists的Dict創建一個DataFrame
All the ndarrays must be of the same length. If the index is passed, then the length of the index should equal to the length of the arrays.
所有ndarray的長度必須相同。 如果傳遞了索引,則索引的長度應等于數組的長度。
If no index is passed, then by default, the index will be range(n), where n is the array length.
如果沒有傳遞索引,則默認情況下,索引將是range(n),其中n是數組長度。
Example-1import pandas as pd
data ={'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age',[28,34,29,42]}
df = pd.DataFrame(data)
print df
out:
Age Name
0 28 Tom
1 34 Jack
2 29 Steve
3 42 RickyNote: Here,0,1,2,3 are the default index assinged each using the function range(n).Example-2
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])
print df
out:
Age Name
rank1 28 Tom
rank2 34 Jack
rank3 29 Steve
rank4 42 Ricky
6.Pandas: How to read and write files?
6.Pandas:如何讀寫文件?
Pandas can read and write any types of files such as Excel, CSV and many other files, it is one of its crucial features. Methods like df.read_*() and df.to_*() helps you to work with all types of files effectively.
熊貓可以讀寫任何類型的文件,例如Excel,CSV和許多其他文件,這是其重要功能之一。 諸如df.read _ *()和df.to _ *()之類的方法可幫助您有效地處理所有類型的文件。
I will use Iris-dataset for the explanation.
我將使用虹膜數據集進行解釋。
source: https://www.kaggle.com/uciml/iris
來源: https : //www.kaggle.com/uciml/iris
a)Reading .csv file
a)讀取.csv文件
In [1]:import pandas as pdIn [2]:data = pd.read_csv("Iris_dataset.csv")
In [3]:data.head()
out[3]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
1 5.1 3.5 1.4 0.2 Iris-setosa
2 4.9 3.0 1.4 0.2 Iris-setosa
3 4.7 3.2 1.3 0.2 Iris-setosa
4 4.6 3.1 1.5 0.2 Iris-setosa
5 5.0 3.6 1.4 0.2 Iris-setosa
b)Writing file in .xlsx format
b)以.xlsx格式寫入文件
In [4]:import pandas as pdIn [5]:data= pd.read_csv("Iris_dataset.csv")
In [6]:type(data)
out[6]:pandas.core.frame.DataFrame
In [7]:data_csv = data.to_excel('iris_data.xlsx')
Similarly, you can read and write other files also.
同樣,您也可以讀取和寫入其他文件。
7. DataFrame columns and rows(.shape) & Number of dimensions.
7. DataFrame的列和行(.shape)&維數。
a)df.shape function in pandas returns the output as (m,n) where m is a number of rows and n is the number of columns in the data frame.
a) pandas中的df.shape函數以(m,n)返回輸出,其中m是數據幀中的行數,n是數據幀中的列數。
b)df.ndim return an int representing the number of axes/array dimensions. It will return 1 for Series and 2 for DataFrame.
b) df.ndim返回一個表示軸/數組維數的整數。 對于Series,它將返回1,對于DataFrame,它將返回2。
In [8]:data.shapeout[8]:(150,6)
In [9]:data.ndim
out[9]: 2
8. Preview DataFrame with .head()& .tail()
8.使用.head()和.tail()預覽DataFrame
a)df.head(n=no. of rows to be returned) returns the first n rows from the DataFrame.
a) df.head(n =要返回的行數)返回DataFrame的前n行。
b)df.tail(n=no. of rows to be returned) returns the last n rows from the DataFrame.
b) df.tail(n =要返回的行數)返回DataFrame的最后n行。
#df.head()In [3]:data.head(4)
out[3]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
1 5.1 3.5 1.4 0.2 Iris-setosa
2 4.9 3.0 1.4 0.2 Iris-setosa
3 4.7 3.2 1.3 0.2 Iris-setosa
4 4.6 3.1 1.5 0.2 Iris-setosa
Note:in head(n) you can cange value of n which shows n no. of rows in output#df.tail()
In [4]:data.tail()
out[4]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
146 6.7 3.0 5.2 2.3 Iris_virginica
147 6.3 2.5 5.0 1.9 Iris-virginica
148 6.5 3.0 5.2 2.0 Iris-virginica
149 6.2 3.4 5.4 2.3 Iris-virginica
150 5.9 3.0 5.1 1.8 Iris-virginica
9.High-level statistics with pandas
9,大熊貓統計
a)Pandas DataFrame.describe()
a)熊貓DataFrame.describe()
The describe() method is used for calculating some statistical data like percentile, mean and std of the numerical values of the Series or DataFrame. It analyzes both numeric and object series and also the DataFrame column sets of mixed data types.
describe()方法用于計算一些統計數據,例如Series或DataFrame的數值的百分位數,均值和標準差。 它分析數字和對象系列以及混合數據類型的DataFrame列集。
In [5]data.describe()out[5]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm count 150.000000 150.000000 150.000000 150.000000 150.000000
mean 75.500000 5.843333 3.054000 3.758667 1.198667
std 43.445368 0.828066 0.433594 1.764420 0.763161
min 1.000000 4.300000 2.000000 1.000000 0.100000
25% 38.250000 5.100000 2.800000 1.600000 0.300000
50% 75.500000 5.800000 3.000000 4.350000 1.300000
75% 112.750000 6.400000 3.300000 5.100000 1.800000
max 150.000000 7.900000 4.400000 6.900000 2.500000#for one column
In [8]:data['SepalLengthCm'].describe()
out[8]:
count 150.000000
mean 5.843333
std 0.828066
min 4.300000
25% 5.100000
50% 5.800000
75% 6.400000
max 7.900000
Name: SepalLengthCm, dtype: float64
10. .dtypes and .astype() functions in Pandas
10. Pandas中的.dtypes和.astype()函數
To check the data types of columns of the data frame .dtypes function is used. Series have only one type of data types but in a data frame, there can be mixed data types.
檢查數據框列的數據類型.dtypes函數 用來。 系列只有一種類型的數據類型,但是在一個數據幀中,可以有多種數據類型。
#for iris_datasetIn [1]:data.dtypes
out[1]:Id int64
SepalLengthCm float64
SepalWidthCm float64
PetalLengthCm float64
PetalWidthCm float64
Species object
dtype: object
Note: Any column in the data frame having character/string appear as ‘object’ datatypes.
注意:數據框中具有字符/字符串的任何列均顯示為“對象”數據類型。
If you to change the data type of a specific column in the data frame, use the .astype() function.
如果要更改數據框中特定列的數據類型,請使用.astype()函數。
For example, to see the ‘SepalLengthCm’ column as a string
例如,以字符串形式查看“ SepalLengthCm”列
In [2]: data['SepalLengthCm'].astype(str)out[2]: 0 5.1
1 4.9
2 4.7
3 4.6
4 5.0
...
145 6.7
146 6.3
147 6.5
148 6.2
149 5.9
Name: SepalLengthCm, Length: 150, dtype: object
11. Selection of columns and rows in Pandas DataFrame
11.在Pandas DataFrame中選擇列和行
i)Three primary methods for selecting columns from data frames in pandas.
i)從熊貓數據框中選擇列的三種主要方法。
a)df.column_name (Dot notation)
a)df.column_name(點表示法)
In [3]:data.Species.head()out[3]:0 Iris-setosa
1 Iris-setosa
2 Iris-setosa
3 Iris-setosa
4 Iris-setosa
Name: Species, dtype: object
b)df[‘column_name’] (using square braces and the name of the column as a string)
b)df ['column_name'](使用方括號和列名作為字符串)
In [3]:data['Species'].head()out[3]:0 Iris-setosa
1 Iris-setosa
2 Iris-setosa
3 Iris-setosa
4 Iris-setosa
Name: Species, dtype: object
c)data.iloc[0:0, <column_number>](using numeric indexing and the iloc selector)
c)data.iloc [0:0,<column_number>](使用數字索引和iloc選擇器)
In [3]:data.iloc[0:,5].head()out[3]:0 Iris-setosa
1 Iris-setosa
2 Iris-setosa
3 Iris-setosa
4 Iris-setosa
Name: Species, dtype: object
ii)Selecting rows
ii)選擇行
Rows in a DataFrame are selected, typically, using the iloc/loc selection methods, or using logical selectors (selecting based on the value of another column or variable).
通常使用iloc / loc選擇方法或使用邏輯選擇器(基于另一列或變量的值進行選擇)來選擇DataFrame中的行。
#selecting rows using iloc/locExample-1In [7]:data.iloc[0:5,].head()out[7]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
1 5.1 3.5 1.4 0.2 Iris-setosa
2 4.9 3.0 1.4 0.2 Iris-setosa
3 4.7 3.2 1.3 0.2 Iris-setosa
4 4.6 3.1 1.5 0.2 Iris-setosa
5 5.0 3.6 1.4 0.2 Iris-setosaExample-2In [8]:data.iloc[4, : ]
out[8]:Id 5
SepalLengthCm 5
SepalWidthCm 3.6
PetalLengthCm 1.4
PetalWidthCm 0.2
Species Iris-setosa
Name: 4, dtype: object
iii)Selecting both rows and columns at a time
iii)一次選擇行和列
#selecting column number 1 and five number of row's values of that columnMethod-1In [9]:data.iloc[0:5,1].head()
out[9]:0 5.1
1 4.9
2 4.7
3 4.6
4 5.0
Name: SepalLengthCm, dtype: float64Method-2In [10]:data.loc[0:5, ['SepalLengthCm']].head()
out[10]: SepalLengthCm
0 5.1
1 4.9
2 4.7
3 4.6
4 5.0
12.Filtering
12,過濾
In [11]:data[data['Species']=='Iris-virginica'].head()out[11]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
101 6.3 3.3 6.0 2.5 Iris-virginica 102 5.8 2.7 5.1 1.9 Iris-virginica
103 7.1 3.0 5.9 2.1 Iris-virginica
104 6.3 2.9 5.6 1.8 Iris-virginica
105 6.5 3.0 5.8 2.2 Iris-virginicaIn [12]:data[data['SepalLengthCm']>6.3].head()
out[12]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
51 7.0 3.2 4.7 1.4 Iris-versicolor
52 6.4 3.2 4.5 1.5 Iris-versicolor
53 6.9 3.1 4.9 1.5 Iris-versicolor
55 6.5 2.8 4.6 1.5 Iris-versicolor
59 6.6 2.9 4.6 1.3 Iris-versicolorIn [12]:data[data['SepalLengthCm']<6.3].head()
out[12]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
1 5.1 3.5 1.4 0.2 Iris-setosa
2 4.9 3.0 1.4 0.2 Iris-setosa
3 4.7 3.2 1.3 0.2 Iris-setosa
4 4.6 3.1 1.5 0.2 Iris-setosa
5 5.0 3.6 1.4 0.2 Iris-setosa
13. Manipulating DataFrames in Pandas
13.在熊貓中操作數據框
a)Renaming Columns in a data frame
a)重命名數據框中的列
Column renames can be achieved easily in Pandas using the DataFrame rename( ) function.
使用DataFrame named ()函數可以在Pandas中輕松實現列重命名 。
df.rename(columns = {“old_column_name”: “new_column_name”})
df.rename(columns = {“ old_column_name”:“ new_column_name”})
#Renaming one columnIn [13]:data.rename(columns={'SepalLengthCm':'Sepal_Length'})
out[13]:
Id Sepal_Length SepalWidthCm PetalLengthCm PetalWidthCm Species
1 5.1 3.5 1.4 0.2 Iris-setosa
2 4.9 3.0 1.4 0.2 Iris-setosa
3 4.7 3.2 1.3 0.2 Iris-setosa
4 4.6 3.1 1.5 0.2 Iris-setosa
5 5.0 3.6 1.4 0.2 Iris-setosa#Renaming multi-columns
In [14]:data.rename(columns={'SepalLengthCm':'Sepal_Length','SepalWidthCm':'Sepal_Width', 'PetalLengthCm':'Petal_Length','PetalWidthCm':'Petal_Width'})
out[13]:
Id Sepal_Length Sepal_Width Petal_Length Petal_Width Species
1 5.1 3.5 1.4 0.2 Iris-setosa
2 4.9 3.0 1.4 0.2 Iris-setosa
3 4.7 3.2 1.3 0.2 Iris-setosa
4 4.6 3.1 1.5 0.2 Iris-setosa
5 5.0 3.6 1.4 0.2 Iris-setosa
b)Deleting rows and columns in a data frame
b)刪除數據框中的行和列
Pandas use the “drop( )” function to delete rows and columns from DataFrames
熊貓使用“ drop ()”功能從DataFrames中刪除行和列
df.drop(name _or _names ,axis= ‘columns’/1 OR ‘rows’ /0)
df.drop(name或_names,axis ='columns'/ 1 OR'rows'/ 0)
i) Deleting columns from Data Frame.
i)從數據框中刪除列。
#delete the 'Id' column from the dataframeIn [15]:data.drop('Id',axis= 1).head()#other method data.drop(columns="Id").head()out[15]:
SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
5.1 3.5 1.4 0.2 Iris-setosa
4.9 3.0 1.4 0.2 Iris-setosa
4.7 3.2 1.3 0.2 Iris-setosa
4.6 3.1 1.5 0.2 Iris-setosa
5.0 3.6 1.4 0.2 Iris-setosa#Delete multiple columns from the dataframeIn [16]:data.drop(["Id","SepalWidthCm"],axis= 1).head()
out[16]:SepalLengthCm PetalLengthCm PetalWidthCm Species
5.1 1.4 0.2 Iris-setosa
4.9 1.4 0.2 Iris-setosa
4.7 1.3 0.2 Iris-setosa
4.6 1.5 0.2 Iris-setosa
5.0 1.4 0.2 Iris-setosa
ii) To delete rows from a DataFrame using drop () function set axis to 0.
ii)要使用drop()函數從DataFrame刪除行,請將軸設置為0。
# Delete the rows with labels 0,1In [17]:data.drop([0,1],axis=0).head(3)
out[17]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
3 4.7 3.2 1.3 0.2 Iris-setosa
4 4.6 3.1 1.5 0.2 Iris-setosa
5 5.0 3.6 1.4 0.2 Iris-setosa# Delete the first two rows using iloc selector
In[18]:data.iloc[2:,].head(3)
out[18]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
3 4.7 3.2 1.3 0.2 Iris-setosa
4 4.6 3.1 1.5 0.2 Iris-setosa
5 5.0 3.6 1.4 0.2 Iris-setosa
c) Dealing with Missing Values
c)處理缺失的價值觀
Many datasets have missing values, pandas provide multiple ways to deal with it.
許多數據集缺少值,熊貓提供了多種處理方法。
i) Finding missing values
i)尋找缺失的價值
To detect any missing values in a data frame, provide functions like isnull(), isna(). Both of them do the same thing.
要檢測數據幀中的任何缺失值,請提供isull(),isna()之類的函數。 他們倆都做同樣的事情。
In [19]:data.isna().head(3)out[19]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
False False False False False False
False False False False False False
False False False False False False#df.isna() returns the dataframe with boolean values indicating missing values.(True=missing value is present ,False= missing value is not present.)
Note: You can also choose to use notna() which is just the opposite of isna().
注意:您也可以選擇使用notna(),這是ISNA正好相反()。
Using df.isna().sum() you can get the number of missing values in each column.
使用df.isna()。sum()可以獲取每列中缺失值的數量。
In [20]:data.isna().sum()out[20]:Id 0
SepalLengthCm 0
SepalWidthCm 0
PetalLengthCm 0
PetalWidthCm 0
Species 0
dtype: int64
ii) Drop missing values.
ii)刪除缺失值。
To drop a row or column with missing values, pandas provide dropna() function.
要刪除缺少值的行或列,pandas提供dropna()函數。
df.dropna(axis=0 or 1,how = ‘any’ or ‘all’)
df.dropna(axis = 0或1,how ='any'或'all')
- ‘any’: drop if there is any missing value 'any':如果缺少任何值,則丟棄
- ‘all’: drop if all values are missing 'all':如果所有值都缺失,則刪除
iii)Replacing missing values
iii)替換缺失值
To replace any missing value from the data frame, pandas provide fillna() function.
為了替換數據框中的所有缺失值,熊貓提供了fillna()函數。
14. Computation with pandas DataFrame
14.用pandas DataFrame計算
a) Element-By-Element Mathematical Operations in DataFrame
a)DataFrame中的逐元素數學運算
Many operations such as addition, subtraction, multiplication, division e.t.c can be done elementwise in any data frame by the following ways:
可以通過以下方法在任何數據幀中逐元素完成許多操作,例如加,減,乘,除等。
df+ λ (addition where λ is any scalar value)
df +λ (其中λ是任何標量值的加法)
Similarly, df — λ is for subtraction & df*λ is for multiplication
類似地,df-λ用于減法,df *λ用于乘法
Example:#adding 1 element-by-element in Column ="SepalLengthCm"
In [21]:data["SepalLengthCm"] + 1
out[21]:0 6.1
1 5.9
2 5.7
3 5.6
4 6.0
...
145 7.7
146 7.3
147 7.5
148 7.2
149 6.9
Name: SepalLengthCm, Length: 150, dtype: float64
#similarly you can do other operatio
b)Statistical operations in DataFrame
b)DataFrame中的統計操作
i) count( ) : Number of non-null observations
i)count():非空觀察數
Example:In [23]:data["SepalLengthCm"].count()
out[23]:150
ii) sum( ): Sum of values
ii)sum():值的總和
Example:In [24]:data["SepalLengthCm"].sum()
out[24]:876.5
iii) mean(): Mean of values
iii)mean():均值
Example:In [25]:data["SepalLengthCm"].mean()
out[25]:5.843333333333334
Other important satistical functions in pandas are:-
熊貓的其他重要的諷刺功能是:
mad( ): Mean absolute deviation
mad():平均絕對偏差
median( ):Arithmetic median of values
中位數():數值的算術中位數
min( ): Minimum
min():最小值
max( ): Maximum
max():最大值
mode( ): Mode
模式():模式
prod( ): Product of values
prod():價值的乘積
std( ): Bessel-corrected sample standard deviation
std():貝塞爾校正的樣本標準偏差
var( ): Unbiased variance
var():無偏方差
sem( ): Standard error of the mean
sem():平均值的標準誤
skew( ): Sample skewness (3rd moment)
偏斜度():樣本偏斜度(第3瞬間)
Kurt( ): Sample kurtosis (4th moment)
Kurt():峰度樣本(第4刻)
quantile( ): Sample quantile (Value at %)
分位數():樣本分位數(%值)
value_counts( ):Count of unique values
value_counts():唯一值的計數
15. Additional useful functions in Pandas
15.熊貓的其他有用功能
a) groupby()-this function is used for grouping data in pandas
a)groupby()-此函數用于對熊貓中的數據進行分組
Group by a single column in a data frame按數據框中的單個列分組 Example:In [26]:data.groupby("Species").count()
out[26]: Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm
Species
Iris-setosa 50 50 50 50 50
Iris-versicolor 50 50 50 50 50
Iris-virginica 50 50 50 50 50
These are some important functions that are widely used in Machine Learning. You can overwhelm by the number of pandas functions, so stick to the most used pandas functions for better understanding of this beautiful library and get most out of it.
這些是機器學習中廣泛使用的一些重要功能。 您可能會因熊貓函數的數量而感到不知所措,因此請堅持使用最常用的熊貓函數,以便更好地了解這個漂亮的庫并充分利用它。
Happy coding 😃
快樂編碼😃
翻譯自: https://medium.com/mlpoint/pandas-for-machine-learning-53846bc9a98b
pandas 機器學習
總結
以上是生活随笔為你收集整理的pandas 机器学习_机器学习的PANDAS的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 我的世界怎么创建房间
- 下一篇: 文本摘要提取_了解自动文本摘要-1:提取