Pandas Statistical Analysis Basics, Notes 4: Task 4.4 Using Grouped Aggregation for Within-Group Computation

Published: 2025/3/11 · 豆豆

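Contents

pandas_Task 4.4 Using grouped aggregation for within-group computation
4.4.1 Splitting data with the groupby method
- Code 4-51 Grouping the dish order detail table by order ID
- Code 4-52 Computing the mean, standard deviation, and median with the GroupBy class
- Code 4-53 Parameters of the agg and aggregate functions and their descriptions
- Code 4-54 Using agg to compute different statistics for different fields
- Code 4-55 Using the agg method to compute different numbers of statistics for different fields
- Code 4-56 Using a custom function in the agg method
- Code 4-57 A custom function containing NumPy functions in the agg method
- Code 4-58 Simple aggregation with the agg method
- Code 4-59 Using the agg method to apply different aggregation functions to grouped data
4.4.3 Aggregating data with the apply method
- Code 4-60 Basic usage of the apply method
- Code 4-61 Aggregating with the apply method
4.4.4 Aggregating data with the transform method
- Code 4-62 Doubling sales counts and prices with the transform method
- Code 4-63 Within-group min-max normalization with transform
4.4.5 Task implementation
- Code 4-64 Grouping the order details by date
- Code 4-65 Daily mean and median dish price of the grouped order detail table
- Code 4-66 Total daily dish sales in the order detail table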

pandas_Task 4.4 Using grouped aggregation for within-group computation

4.4.1 Splitting data with the groupby method

Code 4-51 Grouping the dish order detail table by order ID

    import pandas as pd
    import numpy as np
    from sqlalchemy import create_engine

    # Connect to the local MySQL database and load the order detail table
    engine = create_engine('mysql+pymysql://root:123456@localhost:3306/zuoye')
    detail = pd.read_sql_table('meal_order_detail1', con=engine)
    # Keep the three columns of interest and group them by order ID
    detailGroup = detail[['order_id', 'counts', 'amounts']].groupby(by='order_id')
    print('The grouped order detail table is:', detailGroup)

Output:

    The grouped order detail table is: <pandas.core.groupby.generic.DataFrameGroupBy object at 0x000002547B8C0DC8>
    D:\Study\anaconda\lib\site-packages\pymysql\cursors.py:170: Warning: (1366, "Incorrect string value: '\\xD6\\xD0\\xB9\\xFA\\xB1\\xEA...' for column 'VARIABLE_VALUE' at row 1")
      result = self._query(query)
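Printing a GroupBy object shows only its memory address, so it can help to look inside it. The following is a minimal sketch on a small in-memory DataFrame; the rows are made up for illustration and merely stand in for meal_order_detail1, and the name detail_group is hypothetical.

```python
import pandas as pd

# Made-up rows standing in for meal_order_detail1 (not the real data).
detail = pd.DataFrame({
    "order_id": ["1002", "1002", "1003", "1003", "1003", "1004"],
    "counts": [1, 1, 2, 1, 1, 1],
    "amounts": [49.0, 48.0, 30.0, 25.0, 13.0, 27.0],
})

detail_group = detail[["order_id", "counts", "amounts"]].groupby(by="order_id")

# Number of distinct order_id values, i.e. the number of groups.
print(detail_group.ngroups)

# Iterating yields (key, sub-DataFrame) pairs, one per group.
for order_id, rows in detail_group:
    print(order_id, len(rows))

# get_group returns the rows that belong to a single key.
order_1003 = detail_group.get_group("1003")
print(order_1003)
```

Nothing is computed until an aggregation such as mean() or size() is called on the grouped object.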

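Code 4-52 Computing the mean, standard deviation, and median with the GroupBy class

    print('Mean of each of the first 5 groups of the grouped order detail table:\n',
          detailGroup.mean().head())
    print('Standard deviation of each of the first 5 groups of the grouped order detail table:\n',
          detailGroup.std().head())
    print('Size of each of the first 5 groups of the grouped order detail table:', '\n',
          detailGroup.size().head())

Output:

    Mean of each of the first 5 groups of the grouped order detail table:
              counts  amounts
    order_id
    1002      1.0000   32.000
    1003      1.2500   30.125
    1004      1.0625   43.875
    1008      1.0000   63.000
    1011      1.0000   57.700
    Standard deviation of each of the first 5 groups of the grouped order detail table:
               counts    amounts
    order_id
    1002      0.00000  16.000000
    1003      0.46291  21.383822
    1004      0.25000  31.195886
    1008      0.00000  64.880660
    1011      0.00000  50.077828
    Size of each of the first 5 groups of the grouped order detail table:
    order_id
    1002     7
    1003     8
    1004    16
    1008     5
    1011    10
    dtype: int64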
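The heading of this listing mentions the median, while the code prints group sizes. As a small sketch of the median on made-up rows (the data and the names group_median and group_size are assumptions for illustration only):

```python
import pandas as pd

# Made-up rows; the real table comes from the database.
detail = pd.DataFrame({
    "order_id": ["1002", "1002", "1002", "1003", "1003"],
    "counts": [1, 1, 2, 1, 3],
    "amounts": [10.0, 20.0, 60.0, 30.0, 50.0],
})
detail_group = detail.groupby("order_id")[["counts", "amounts"]]

group_median = detail_group.median()  # middle value of each column per group
group_size = detail_group.size()      # number of rows per group

print(group_median)
print(group_size)
```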

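Code 4-53 Parameters of the agg and aggregate functions and their descriptions

    # Recent pandas versions prefer the string names 'sum' and 'mean' here
    print('Sum and mean of dish sales counts and prices in the order detail table:\n',
          detail[['counts', 'amounts']].agg([np.sum, np.mean]))

Output:

    Sum and mean of dish sales counts and prices in the order detail table:
               counts        amounts
    sum   3088.000000  125992.000000
    mean     1.111191      45.337172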

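Code 4-54 Using agg to compute different statistics for different fields

    print('Total dish sales count and mean price in the order detail table:\n',
          detail.agg({'counts': np.sum, 'amounts': np.mean}))

Output:

    Total dish sales count and mean price in the order detail table:
    counts     3088.000000
    amounts      45.337172
    dtype: float64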

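Code 4-55 Using the agg method to compute different numbers of statistics for different fields

    print('Total dish sales count, and total and mean price, in the dish order detail table:\n',
          detail.agg({'counts': np.sum, 'amounts': [np.mean, np.sum]}))

Output:

    Total dish sales count, and total and mean price, in the dish order detail table:
          counts        amounts
    mean     NaN      45.337172
    sum   3088.0  125992.000000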
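The three calling styles of agg used so far (a list, a dict with one statistic per column, and a dict with unequal numbers of statistics) can be compared side by side. This is a sketch on made-up numbers, and it passes function names as strings, which is an alternative to the np.sum and np.mean objects used in the listings.

```python
import pandas as pd

# Made-up values for illustration only.
detail = pd.DataFrame({
    "counts": [1, 2, 1, 3],
    "amounts": [10.0, 20.0, 30.0, 40.0],
})

# Same statistics for every column: pass a list of function names.
both = detail.agg(["sum", "mean"])

# One statistic per column: pass a dict; the result is a Series.
per_column = detail.agg({"counts": "sum", "amounts": "mean"})

# Unequal numbers of statistics: the result is a DataFrame, and
# combinations that were not requested are filled with NaN.
mixed = detail.agg({"counts": "sum", "amounts": ["mean", "sum"]})

print(both)
print(per_column)
print(mixed)
```

The NaN in the mixed result has the same origin as the NaN in the output of Code 4-55: no mean was requested for counts.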

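Code 4-56 Using a custom function in the agg method

    # Custom function: twice the sum
    def DoubleSum(data):
        s = data.sum() * 2
        return s

    print('Twice the total dish sales count in the dish order detail table:', '\n',
          detail.agg({'counts': DoubleSum}, axis=0))

Output:

    Twice the total dish sales count in the dish order detail table:
    counts    6176.0
    dtype: float64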

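Code 4-57 A custom function containing NumPy functions in the agg method

    # Same idea as DoubleSum, but built on np.sum
    def DoubleSum1(data):
        s = np.sum(data) * 2
        return s

    print('Twice the total dish sales count in the order detail table:\n',
          detail.agg({'counts': DoubleSum1}, axis=0).head())
    print('Twice the sum of dish sales counts and prices in the order detail table:\n',
          detail[['counts', 'amounts']].agg(DoubleSum1))

Output:

    Twice the total dish sales count in the order detail table:
       counts
    0     2.0
    1     2.0
    2     2.0
    3     2.0
    4     2.0
    Twice the sum of dish sales counts and prices in the order detail table:
    counts       6176.0
    amounts    251984.0
    dtype: float64

Note that the first call returns each value doubled rather than a single total. A likely cause, depending on the pandas version, is that agg on a single column first tries the function on each element, and np.sum accepts a scalar, so the function is never retried on the whole column.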
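A custom function passed to agg on a column subset receives each whole column, which is the form that produced totals in the output above. The sketch below shows that form on made-up values; double_sum is a hypothetical name, and it calls the Series method sum() rather than np.sum.

```python
import pandas as pd

# Made-up values for illustration only.
detail = pd.DataFrame({
    "counts": [1, 2, 1, 3],
    "amounts": [10.0, 20.0, 30.0, 40.0],
})

def double_sum(data):
    """Return twice the sum of one column (data is a whole Series)."""
    return data.sum() * 2

# Each selected column is passed to double_sum in turn.
result = detail[["counts", "amounts"]].agg(double_sum)
print(result)
```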

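Code 4-58 Simple aggregation with the agg method

    print('Mean of each of the first 3 groups of the grouped order detail table:\n',
          detailGroup.agg(np.mean).head(3))
    print('Standard deviation of each of the first 3 groups of the grouped order detail table:\n',
          detailGroup.agg(np.std).head(3))

Output:

    Mean of each of the first 3 groups of the grouped order detail table:
              counts  amounts
    order_id
    1002      1.0000   32.000
    1003      1.2500   30.125
    1004      1.0625   43.875
    Standard deviation of each of the first 3 groups of the grouped order detail table:
               counts    amounts
    order_id
    1002      0.00000  16.000000
    1003      0.46291  21.383822
    1004      0.25000  31.195886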

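Code 4-59 Using the agg method to apply different aggregation functions to grouped data

    print('Total dish count and mean price of each of the first 3 order groups:\n',
          detailGroup.agg({'counts': np.sum, 'amounts': np.mean}).head(3))

Output:

    Total dish count and mean price of each of the first 3 order groups:
              counts  amounts
    order_id
    1002         7.0   32.000
    1003        10.0   30.125
    1004        17.0   43.875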
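When different columns get different statistics, the result columns keep the original column names. pandas also supports named aggregation, which lets the output columns be labeled explicitly. The following sketch uses made-up rows, and the output names total_counts and mean_amount are chosen here for illustration.

```python
import pandas as pd

# Made-up rows for illustration only.
detail = pd.DataFrame({
    "order_id": ["1002", "1002", "1003", "1003", "1003"],
    "counts": [1, 1, 2, 1, 1],
    "amounts": [49.0, 48.0, 30.0, 25.0, 14.0],
})
grouped = detail.groupby("order_id")

# Dict form: one function per column, original column names are kept.
by_dict = grouped.agg({"counts": "sum", "amounts": "mean"})

# Named aggregation: output_name=(input_column, function).
summary = grouped.agg(
    total_counts=("counts", "sum"),
    mean_amount=("amounts", "mean"),
)

print(by_dict)
print(summary)
```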

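4.4.3 Aggregating data with the apply method

Code 4-60 Basic usage of the apply method

    print('Mean dish sales count and price in the order detail table:\n',
          detail[['counts', 'amounts']].apply(np.mean))

Output:

    Mean dish sales count and price in the order detail table:
    counts      1.111191
    amounts    45.337172
    dtype: float64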

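Code 4-61 Aggregating with the apply method

    print('Mean of each of the first 3 groups of the grouped order detail table:', '\n',
          detailGroup.apply(np.mean).head(3))
    print('Standard deviation of each of the first 3 groups of the grouped order detail table:', '\n',
          detailGroup.apply(np.std).head(3))

Output:

    Mean of each of the first 3 groups of the grouped order detail table:
                  order_id  counts  amounts
    order_id
    1002      1.431572e+26  1.0000   32.000
    1003      1.253875e+30  1.2500   30.125
    1004      6.275628e+61  1.0625   43.875
    Standard deviation of each of the first 3 groups of the grouped order detail table:
                counts    amounts
    order_id
    1002      0.000000  14.813122
    1003      0.433013  20.002734
    1004      0.242061  30.205287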
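The standard deviations from apply(np.std) differ from those of std() and agg(np.std) earlier: for the 7-row order 1002, 16.0 × sqrt(6/7) ≈ 14.813. This is consistent with GroupBy.std using the sample formula (ddof=1) while np.std defaults to the population formula (ddof=0). A minimal sketch of the difference on made-up values:

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only.
detail = pd.DataFrame({
    "order_id": ["1002", "1002", "1002", "1003", "1003"],
    "amounts": [10.0, 20.0, 30.0, 5.0, 5.0],
})
amounts_by_order = detail.groupby("order_id")["amounts"]

# GroupBy.std divides by n - 1 (ddof=1).
sample_std = amounts_by_order.std()

# np.std on a plain array divides by n (ddof=0) by default.
population_std = amounts_by_order.apply(lambda s: np.std(s.to_numpy()))

print(sample_std)
print(population_std)
```

Passing ddof explicitly, for example s.std(ddof=0), makes the choice visible in the code.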

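4.4.4 Aggregating data with the transform method

The transform method operates on every element of a DataFrame and returns a result of the same shape. Its main parameter is func, the function applied to the DataFrame.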

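Code 4-62 Doubling sales counts and prices with the transform method

    print('Twice the dish sales counts and prices in the order detail table:\n',
          detail[['counts', 'amounts']].transform(lambda x: x * 2).head(4))

Output:

    Twice the dish sales counts and prices in the order detail table:
       counts  amounts
    0     2.0     98.0
    1     2.0     96.0
    2     2.0     60.0
    3     2.0     50.0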
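The practical difference from agg is that transform keeps one output row per input row, while agg reduces each column to a single value. A short sketch follows; the four rows are chosen so that the doubled values match the output above, and are otherwise an assumption.

```python
import pandas as pd

# Four rows chosen to reproduce the doubled values shown above.
detail = pd.DataFrame({
    "counts": [1.0, 1.0, 1.0, 1.0],
    "amounts": [49.0, 48.0, 30.0, 25.0],
})

# transform: same shape as the input, every element doubled.
doubled = detail.transform(lambda x: x * 2)

# agg: one value per column.
totals = detail.agg("sum")

print(doubled)
print(totals)
```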

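Code 4-63 Within-group min-max normalization with transform

The transform method can also operate on the GroupBy object produced by grouping a DataFrame, which makes operations such as within-group min-max normalization possible.

    # print('First five rows after within-group min-max normalization of the grouped order detail table:\n',
    #       detailGroup.transform(lambda x: (x.mean() - x.min()) / (x.max() - x.min())).head())

If the normalized result contains NaN, it is because the min-max formula has a zero denominator whenever a group's maximum and minimum are equal. pandas and NumPy represent the resulting 0/0 as NaN, whereas dividing plain Python floats by zero raises ZeroDivisionError.
In my run, however, this code raised ZeroDivisionError: float division by zero, meaning the denominator was 0. That is puzzling, because the version in the book runs.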
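Min-max normalization is usually written as (x - min) / (max - min), applied to each element x. The sketch below uses that element-wise form instead of the x.mean() numerator in the commented-out line, so it is a swapped-in variant rather than the book's code. The data and the helper name min_max are made up for illustration.

```python
import pandas as pd

# Made-up values; order 1003 has identical prices, so its range is 0.
detail = pd.DataFrame({
    "order_id": ["1002", "1002", "1002", "1003", "1003"],
    "amounts": [10.0, 20.0, 30.0, 25.0, 25.0],
})

def min_max(x):
    """Scale one group's values to [0, 1]; x is a Series for that group."""
    return (x - x.min()) / (x.max() - x.min())

# The numerator is a Series, so pandas performs the division and
# yields NaN for 0/0 instead of raising an exception.
scaled = detail.groupby("order_id")["amounts"].transform(min_max)
print(scaled)
```

Groups with a zero range come out as NaN here; whether to leave them as NaN or replace them (for example with fillna(0.0)) depends on the analysis.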

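4.4.5 Task implementation

Code 4-64 Grouping the order details by date

    detail = pd.read_sql_table('meal_order_detail1', con=engine)
    # Convert the order time to datetime and extract the calendar date
    detail['place_order_time'] = pd.to_datetime(detail['place_order_time'])
    detail['date'] = [i.date() for i in detail['place_order_time']]
    detailGroup = detail[['date', 'counts', 'amounts']].groupby(by='date')
    print('Number of rows in each of the first 5 groups of the order detail table:\n',
          detailGroup.size().head())

Output:

    Number of rows in each of the first 5 groups of the order detail table:
    date
    2016-08-01    217
    2016-08-02    138
    2016-08-03    157
    2016-08-04    144
    2016-08-05    193
    dtype: int64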
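The list comprehension that extracts the date can also be written with the .dt accessor of a datetime column. A small sketch on made-up timestamps, checking that both forms give the same dates:

```python
import pandas as pd

# Made-up timestamps for illustration only.
detail = pd.DataFrame({
    "place_order_time": [
        "2016-08-01 11:05:00",
        "2016-08-01 19:30:00",
        "2016-08-02 12:15:00",
    ],
    "counts": [1, 2, 1],
    "amounts": [49.0, 30.0, 25.0],
})
detail["place_order_time"] = pd.to_datetime(detail["place_order_time"])

# Vectorized equivalent of [i.date() for i in detail['place_order_time']].
detail["date"] = detail["place_order_time"].dt.date

daily = detail[["date", "counts", "amounts"]].groupby(by="date")
sizes = daily.size()
print(sizes)
```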

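Code 4-65 Daily mean and median dish price of the grouped order detail table

    dayMean = detailGroup.agg({'amounts': np.mean})
    print('Daily mean dish price for the first five groups of the order detail table:\n',
          dayMean.head())
    dayMedian = detailGroup.agg({'amounts': np.median})
    print('Daily median dish price for the first five groups of the order detail table:\n',
          dayMedian.head())

Output:

    Daily mean dish price for the first five groups of the order detail table:
                  amounts
    date
    2016-08-01  43.161290
    2016-08-02  44.384058
    2016-08-03  43.885350
    2016-08-04  52.423611
    2016-08-05  44.927461
    Daily median dish price for the first five groups of the order detail table:
                amounts
    date
    2016-08-01     33.0
    2016-08-02     35.0
    2016-08-03     38.0
    2016-08-04     39.0
    2016-08-05     37.0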
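The mean and the median can also be requested in a single agg call, which returns one column per statistic. A sketch on made-up rows, where the date column is given directly as strings for brevity (an assumption; the listing above uses date objects):

```python
import pandas as pd

# Made-up rows; dates are plain strings here for brevity.
detail = pd.DataFrame({
    "date": ["2016-08-01", "2016-08-01", "2016-08-01", "2016-08-02", "2016-08-02"],
    "amounts": [10.0, 20.0, 60.0, 30.0, 50.0],
})

# One row per day, one column per statistic.
day_stats = detail.groupby("date")["amounts"].agg(["mean", "median"])
print(day_stats)
```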

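Code 4-66 Total daily dish sales in the order detail table

    daySaleSum = detailGroup.apply(np.sum)['counts']
    print('Daily number of dishes sold for the first five groups of the order detail table:\n',
          daySaleSum.head())

Output:

    Daily number of dishes sold for the first five groups of the order detail table:
    date
    2016-08-01    233.0
    2016-08-02    151.0
    2016-08-03    192.0
    2016-08-04    169.0
    2016-08-05    224.0
    Name: counts, dtype: float64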
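apply(np.sum) sums every column of each group before counts is selected. An alternative sketch, on made-up rows, selects the column first and then calls sum(), so only the needed column is aggregated; the name day_sale_sum is hypothetical.

```python
import pandas as pd

# Made-up rows; dates are plain strings here for brevity.
detail = pd.DataFrame({
    "date": ["2016-08-01", "2016-08-01", "2016-08-02"],
    "counts": [1.0, 2.0, 1.0],
    "amounts": [49.0, 30.0, 25.0],
})

# Select the column, then aggregate: one total per day.
day_sale_sum = detail.groupby("date")["counts"].sum()
print(day_sale_sum)
```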
