[Python] 20 Hands-On Pandas Data Examples, Packed with Practical Tips
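Today we look at data filtering in pandas. I have previously written a similar article, but that one focused on filtering text data; feel free to look it up if you are interested.
Below I walk through about 20 examples to explain the data filtering methods in detail. First we create the dataset used throughout. The code is as follows: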

    import pandas as pd

    df = pd.DataFrame({
        "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
        "note": [92, 94, 87, 82, 90],
        "profession": ["Electrical engineer", "Mechanical engineer", "Data scientist", "Accountant", "Athlete"],
        "date_of_birth": ["1998-11-01", "2002-08-14", "1996-01-12", "2002-10-24", "2004-04-05"],
        "group": ["A", "B", "B", "A", "C"],
    })
    df

Output:

        name  note           profession date_of_birth group
    0   John    92  Electrical engineer    1998-11-01     A
    1   Jane    94  Mechanical engineer    2002-08-14     B
    2  Emily    87       Data scientist    1996-01-12     B
    3   Lisa    82           Accountant    2002-10-24     A
    4   Matt    90              Athlete    2004-04-05     C

Selecting several columns
The code is as follows:

    df[["name", "note"]]

Output:

        name  note
    0   John    92
    1   Jane    94
    2  Emily    87
    3   Lisa    82
    4   Matt    90

Selecting several rows as well
Building on the selection above, we additionally keep only some of the rows. The code is as follows:

    df.loc[:3, ["name", "note"]]

Output:

        name  note
    0   John    92
    1   Jane    94
    2  Emily    87
    3   Lisa    82

Filtering data by integer index
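Here we use the iloc method, which selects rows and columns by integer position. The code for this step takes the first three rows of the third column.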
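The two slices used so far behave differently at the end point: loc slices by label and includes the end label, while iloc slices by position and excludes the end position. The following is a minimal sketch of that difference, assuming a small hypothetical frame named `demo` (the name is illustrative and not part of the article's code):

```python
import pandas as pd

# Hypothetical frame, used only to illustrate the slicing rules.
demo = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
    "note": [92, 94, 87, 82, 90],
})

by_label = demo.loc[:3, "name"]    # labels 0 through 3, end label included
by_position = demo.iloc[:3, 0]     # positions 0 through 2, end position excluded

print(len(by_label))         # 4
print(len(by_position))      # 3
print(by_position.tolist())  # ['John', 'Jane', 'Emily']
```

With the default integer index the labels and positions coincide, which is why the same `:3` returns four rows with loc and three with iloc.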

    df.iloc[:3, 2]

Output:

    0    Electrical engineer
    1    Mechanical engineer
    2         Data scientist

Filtering with comparison operators

    df[df.note > 90]

Output:

       name  note           profession date_of_birth group
    0  John    92  Electrical engineer    1998-11-01     A
    1  Jane    94  Mechanical engineer    2002-08-14     B

The dt accessor
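The dt accessor is used to work with datetime data. Values stored as strings (or as other types) must first be converted to a datetime type before they can be processed. The code for this step performs the conversion and then keeps the rows whose birth month is November.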
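As a rough illustration of the convert-then-access order described above, here is a minimal sketch using `pd.to_datetime`, a common alternative to `astype`. The Series name `births` is hypothetical, and its values are three of the article's birth dates:

```python
import pandas as pd

# Hypothetical Series holding dates as plain strings.
births = pd.Series(["1998-11-01", "2002-08-14", "1996-01-12"])

# Convert the strings to a datetime dtype; only then is .dt available.
births = pd.to_datetime(births)

print(births.dt.year.tolist())   # [1998, 2002, 1996]
print(births.dt.month.tolist())  # [11, 8, 1]

# Boolean mask built from a dt attribute, used to filter the Series.
november = births[births.dt.month == 11]
print(len(november))             # 1
```

Calling `.dt` on the original string Series would raise an AttributeError, which is why the conversion has to come first.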

    df.date_of_birth = df.date_of_birth.astype("datetime64[ns]")
    df[df.date_of_birth.dt.month == 11]

Output:

       name  note           profession date_of_birth group
    0  John    92  Electrical engineer    1998-11-01     A

Or we can filter by year:

    df[df.date_of_birth.dt.year > 2000]

Output:

       name  note           profession date_of_birth group
    1  Jane    94  Mechanical engineer    2002-08-14     B
    3  Lisa    82           Accountant    2002-10-24     A
    4  Matt    90              Athlete    2004-04-05     C

Filtering on the intersection of multiple conditions
When several conditions must all hold at once (their intersection), wrap each condition in parentheses and combine them with &.
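The parentheses matter because & binds more tightly than comparison operators such as > and ==. A minimal sketch of the pattern, assuming a hypothetical frame `demo` that reuses the article's note and group values:

```python
import pandas as pd

# Hypothetical frame, used only for illustration.
demo = pd.DataFrame({
    "note": [92, 94, 87, 82, 90],
    "group": ["A", "B", "B", "A", "C"],
})

# Each comparison yields a boolean Series; & keeps a row only when
# both Series are True at that row.
both = demo[(demo["note"] > 85) & (demo["group"] == "B")]

print(both.index.tolist())  # [1, 2]
```

Python's `and` keyword cannot be used here, since it asks for a single truth value of a whole Series and pandas raises a ValueError in that case.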

    df[(df.date_of_birth.dt.year > 2000) & (df.profession.str.contains("engineer"))]

Output:

       name  note           profession date_of_birth group
    1  Jane    94  Mechanical engineer    2002-08-14     B

Filtering on the union of multiple conditions
When a row should be kept if any one of several conditions holds (their union), combine the conditions with |.
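When the conditions being combined are all equality tests on the same column, `isin` is a shorter way to express the same union. The sketch below compares the two forms on a hypothetical frame `demo`; it is an illustration of the idea, not code from the article:

```python
import pandas as pd

# Hypothetical frame, used only for illustration.
demo = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
    "group": ["A", "B", "B", "A", "C"],
})

# Union written out with |, one parenthesized condition per value.
either = demo[(demo["group"] == "A") | (demo["group"] == "C")]

# The same union expressed with isin.
same = demo[demo["group"].isin(["A", "C"])]

print(either.index.tolist())  # [0, 3, 4]
print(either.equals(same))    # True
```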

    df[(df.note > 90) | (df.profession == "Data scientist")]

Output:

        name  note           profession date_of_birth group
    0   John    92  Electrical engineer    1998-11-01     A
    1   Jane    94  Mechanical engineer    2002-08-14     B
    2  Emily    87       Data scientist    1996-01-12     B

Filtering with the query method
The query method in pandas can also filter data; we pass the filter condition to it as a string.
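A query string is another way of writing a boolean mask, and it can refer to a Python variable by prefixing the variable name with @. The sketch below assumes a hypothetical frame `demo` and a hypothetical variable `threshold`, and checks that the query result matches the equivalent mask:

```python
import pandas as pd

# Hypothetical frame, used only for illustration.
demo = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
    "note": [92, 94, 87, 82, 90],
    "group": ["A", "B", "B", "A", "C"],
})

threshold = 89  # hypothetical cutoff, referenced inside the query as @threshold

by_query = demo.query("group == 'A' and note > @threshold")
by_mask = demo[(demo["group"] == "A") & (demo["note"] > threshold)]

print(by_query.index.tolist())   # [0]
print(by_query.equals(by_mask))  # True
```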

    df.query("note > 90")

Output:

       name  note           profession date_of_birth group
    0  John    92  Electrical engineer    1998-11-01     A
    1  Jane    94  Mechanical engineer    2002-08-14     B

Or alternatively:

    df.query("group == 'A' and note > 89")

Output:

       name  note           profession date_of_birth group
    0  John    92  Electrical engineer    1998-11-01     A

Filtering with the nsmallest method
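The nsmallest and nlargest methods in pandas return the rows holding the smallest and the largest values, respectively, of a given column. The code for this step takes the two lowest and then the two highest values of note.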
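One way to read nsmallest is as a shortcut for sorting by the column and keeping the first rows. The sketch below compares the two on a hypothetical frame `demo` whose note values contain no ties (with ties, the order of equal values may differ between the two approaches):

```python
import pandas as pd

# Hypothetical frame, used only for illustration; note has no duplicate values.
demo = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt"],
    "note": [92, 94, 87, 82, 90],
})

lowest = demo.nsmallest(2, "note")
via_sort = demo.sort_values("note").head(2)

highest = demo.nlargest(2, "note")

print(lowest.index.tolist())    # [3, 2]
print(lowest.equals(via_sort))  # True
print(highest.index.tolist())   # [1, 0]
```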

    df.nsmallest(2, "note")

Output:

        name  note      profession date_of_birth group
    3   Lisa    82      Accountant    2002-10-24     A
    2  Emily    87  Data scientist    1996-01-12     B

    df.nlargest(2, "note")

Output:

       name  note           profession date_of_birth group
    1  Jane    94  Mechanical engineer    2002-08-14     B
    0  John    92  Electrical engineer    1998-11-01     A

The isna() method
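The isna() method picks out the rows that contain missing values. First we set one value in the table to missing:

    import numpy as np  # needed for np.nan

    df.loc[0, "profession"] = np.nan
    df[df.profession.isna()]

Output:

       name  note profession date_of_birth group
    0  John    92        NaN    1998-11-01     A

The notna() method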
The notna() method does exactly the opposite of isna() above: it picks out the rows whose values are not missing.
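The two masks are complements of each other, and filtering with notna() on one column gives the same rows as `dropna` restricted to that column. A minimal sketch, assuming a hypothetical three-row frame `demo`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing profession, used only for illustration.
demo = pd.DataFrame({
    "name": ["John", "Jane", "Emily"],
    "profession": [np.nan, "Mechanical engineer", "Data scientist"],
})

missing = demo["profession"].isna()
present = demo["profession"].notna()

print(missing.tolist())              # [True, False, False]
print((present == ~missing).all())   # True

# dropna limited to one column keeps the same rows as the notna() mask.
kept = demo.dropna(subset=["profession"])
print(kept.equals(demo[present]))    # True
```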

    df[df.profession.notna()]

Output:

        name  note           profession date_of_birth group
    1   Jane    94  Mechanical engineer    2002-08-14     B
    2  Emily    87       Data scientist    1996-01-12     B
    3   Lisa    82           Accountant    2002-10-24     A
    4   Matt    90              Athlete    2004-04-05     C

The assign method
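The assign method in pandas adds a column to the dataset. It returns a new DataFrame and leaves the original unchanged, which is why the result is stored in df_1.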
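Besides fixed or random values, assign also accepts a function of the frame, which lets a new column be computed from existing ones. The sketch below assumes a hypothetical frame `demo` and hypothetical column names `bonus` and `total`:

```python
import pandas as pd

# Hypothetical frame, used only for illustration.
demo = pd.DataFrame({"name": ["John", "Jane"], "note": [92, 94]})

# bonus is a scalar broadcast to every row; total is computed by a function
# that receives the frame with bonus already added.
with_bonus = demo.assign(bonus=5, total=lambda d: d["note"] + d["bonus"])

print(list(demo.columns))            # ['name', 'note']
print(list(with_bonus.columns))      # ['name', 'note', 'bonus', 'total']
print(with_bonus["total"].tolist())  # [97, 99]
```

The first print shows that `demo` itself has not gained any columns.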

    df_1 = df.assign(score=np.random.randint(0, 100, size=5))
    df_1

Output (the score values are random, so they will differ from run to run):

        name  note           profession date_of_birth group  score
    0   John    92  Electrical engineer    1998-11-01     A     19
    1   Jane    94  Mechanical engineer    2002-08-14     B     84
    2  Emily    87       Data scientist    1996-01-12     B     68
    3   Lisa    82           Accountant    2002-10-24     A     70
    4   Matt    90              Athlete    2004-04-05     C     39

The explode method
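Translated literally, explode() means to blow something apart. We often encounter datasets like this one:

      Name                                  Hobby
    0  呂布  [basketball, video games, bubble tea]
    1  貂蟬                       [coding, movies]
    2  趙云                       [music, fitness]

Each row of the Hobby column holds its values bundled together as a list. The explode() method splits these bundled values apart so that every list element gets its own row. With df now referring to this hobby dataset, the code is as follows:

    df.explode('Hobby')

Output:

      Name        Hobby
    0  呂布   basketball
    0  呂布  video games
    0  呂布   bubble tea
    1  貂蟬       coding
    1  貂蟬       movies
    2  趙云        music
    2  趙云      fitness

After expanding, the data may contain duplicate rows and the index labels repeat, so we drop the duplicates and reset the index: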

    df.explode('Hobby').drop_duplicates().reset_index(drop=True)

Output:

      Name        Hobby
    0  呂布   basketball
    1  呂布  video games
    2  呂布   bubble tea
    3  貂蟬       coding
    4  貂蟬       movies
    5  趙云        music
    6  趙云      fitness
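The article does not show how the hobby dataset is built, so the following is a minimal end-to-end sketch of the explode step under assumed data. The frame name `hobbies` and its values are hypothetical:

```python
import pandas as pd

# Hypothetical frame whose Hobby column stores a list in every row.
hobbies = pd.DataFrame({
    "Name": ["A", "B"],
    "Hobby": [["basketball", "gaming"], ["coding"]],
})

# explode gives each list element its own row and repeats the original index.
flat = hobbies.explode("Hobby")
print(flat.index.tolist())     # [0, 0, 1]

# reset_index(drop=True) replaces the repeated labels with 0, 1, 2, ...
tidy = flat.reset_index(drop=True)
print(tidy.index.tolist())     # [0, 1, 2]
print(tidy["Hobby"].tolist())  # ['basketball', 'gaming', 'coding']
```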