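Data: link: https://pan.baidu.com/s/1AcyhgmkWqdy-K6gvRZaFsg  extraction code: 1234
Uncovering American Voters' Presidential Preferences with Pandas (Data Cleaning + Visual Analysis)
I. Define the problem:
We want to analyze American voters' preferences among the presidential candidates. What kind of preference? Concretely, this is a reverse analysis. We group the donation records (treated here as "votes") in various ways and rank the results. From those rankings we infer which candidates different kinds of voters favor and lean toward.
II. Break down the problem
From (I), the problem splits into two parts: the rankings under each grouping, and how strongly each kind of voter leans toward each candidate. Take party as an example. The per-party ranking, combined with the party each candidate belongs to, shows which side has the edge. A voter from party A may of course choose party B's candidate. That is a minority case in elections, though, because the president's party always carries more weight, and key legislation also tends to favor the president's party.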
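III. Data analysis and visualization:
1. Getting the data:
Before processing anything, we need to know what the final data should look like. We want to analyze the relationship between candidates and donors, so we want a single table that matches every donor to the candidate they gave to. That means joining the three raw tables into one:
- weball20.txt (candidates)
- ccl.txt (candidate-committee links)
- itcont_2020_20200722_20200820.txt (individual contributions)
Link committees to candidates by joining those two tables on CAND_ID. Then link candidates to donors by joining on CMTE_ID.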
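Here is a minimal sketch of the two joins on a few made-up rows. The IDs, names and amounts are hypothetical and do not come from the FEC files. Committees are first matched to candidates on CAND_ID. Donations are then matched to committees, and through them to candidates, on CMTE_ID.

```python
import pandas as pd

# Hypothetical rows mimicking the three FEC files (all values made up)
candidates = pd.DataFrame({"CAND_ID": ["P1", "P2"],
                           "CAND_NAME": ["ALICE", "BOB"],
                           "CAND_PTY_AFFILIATION": ["DEM", "REP"]})
ccl = pd.DataFrame({"CAND_ID": ["P1", "P2"], "CMTE_ID": ["C1", "C2"]})
itcont = pd.DataFrame({"CMTE_ID": ["C1", "C1", "C2"],
                       "NAME": ["X", "Y", "Z"],
                       "TRANSACTION_AMT": [100, 50, 25]})

# Step 1: attach candidate names and parties to each committee (join on CAND_ID)
cand_cmte = pd.merge(ccl, candidates, on="CAND_ID")
# Step 2: attach the candidate to each donation through its committee (join on CMTE_ID)
donations = pd.merge(cand_cmte, itcont, on="CMTE_ID")

print(donations[["CAND_NAME", "NAME", "TRANSACTION_AMT"]])
```

Both merges are inner joins, which is the pandas default. A donation whose committee is not linked to any candidate is therefore dropped.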
The concrete steps are as follows:
# The candidate-committee linkage file has no candidate names, only candidate IDs (CAND_ID),
# so we look the names up in the candidate file via CAND_ID; the result is the linkage table ccl.
# Import the required package
import pandas as pd

# Join weball20.txt and ccl.txt
# Read the candidate information; the raw file has no header row, so we supply the column names
candidates = pd.read_csv("weball20.txt", sep='|', names=['CAND_ID', 'CAND_NAME', 'CAND_ICI', 'PTY_CD', 'CAND_PTY_AFFILIATION', 'TTL_RECEIPTS', 'TRANS_FROM_AUTH', 'TTL_DISB', 'TRANS_TO_AUTH', 'COH_BOP', 'COH_COP', 'CAND_CONTRIB', 'CAND_LOANS', 'OTHER_LOANS', 'CAND_LOAN_REPAY', 'OTHER_LOAN_REPAY', 'DEBTS_OWED_BY', 'TTL_INDIV_CONTRIB', 'CAND_OFFICE_ST', 'CAND_OFFICE_DISTRICT', 'SPEC_ELECTION', 'PRIM_ELECTION', 'RUN_ELECTION', 'GEN_ELECTION', 'GEN_ELECTION_PRECENT', 'OTHER_POL_CMTE_CONTRIB', 'POL_PTY_CONTRIB', 'CVG_END_DT', 'INDIV_REFUNDS', 'CMTE_REFUNDS'])
# Read the candidate-committee linkage information
df_data = pd.read_csv("ccl.txt", sep='|', names=['CAND_ID', 'CAND_ELECTION_YR', 'FEC_ELECTION_YR', 'CMTE_ID', 'CMTE_TP', 'CMTE_DSGN', 'LINKAGE_ID'])
# Join the two tables on CAND_ID
df_data1 = pd.merge(df_data, candidates, on='CAND_ID')
# Keep only the columns we need: this is the linkage table ccl
df_data1 = df_data1[['CMTE_ID', 'CAND_ID', 'CAND_NAME', 'CAND_PTY_AFFILIATION']]

# Join the merged table ccl above with itcont_2020_20200722_20200820.txt
# Read the individual contributions; the raw file has no header row, so we supply the column names
df_data2 = pd.read_csv('itcont_2020_20200722_20200820.txt', sep='|', names=['CMTE_ID', 'AMNDT_IND', 'RPT_TP', 'TRANSACTION_PGI', 'IMAGE_NUM', 'TRANSACTION_TP', 'ENTITY_TP', 'NAME', 'CITY', 'STATE', 'ZIP_CODE', 'EMPLOYER', 'OCCUPATION', 'TRANSACTION_DT', 'TRANSACTION_AMT', 'OTHER_ID', 'TRAN_ID', 'FILE_NUM', 'MEMO_CD', 'MEMO_TEXT', 'SUB_ID'])
# Merge the candidate-committee table ccl with the individual contributions (itcont) on CMTE_ID
df_data2 = pd.merge(df_data1, df_data2, on='CMTE_ID')
# Keep only the columns we need (.copy() avoids SettingWithCopyWarning when columns are modified later)
df_data2 = df_data2[['CAND_NAME', 'NAME', 'STATE', 'EMPLOYER', 'OCCUPATION',
                     'TRANSACTION_AMT', 'TRANSACTION_DT', 'CAND_PTY_AFFILIATION']].copy()
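Field descriptions:
CAND_NAME – name of the candidate receiving the donation
NAME – donor's name
STATE – donor's state
EMPLOYER – donor's employer
OCCUPATION – donor's occupation
TRANSACTION_AMT – donation amount (USD)
TRANSACTION_DT – date the donation was received
CAND_PTY_AFFILIATION – candidate's party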
This gives us the merged dataset.
2. Data processing
With a usable dataset in hand, we can inspect it in three ways:
- the shape attribute gives its size;
- info() gives the column information;
- describe() gives the distribution.
# Size of the data: number of rows and columns
df_data2.shape
# Overall information: each column's name, non-null count, and data type
df_data2.info()
# Distribution of the numeric columns
df_data2.describe()
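The output shows that the data contains missing values, which need handling. After inspecting the affected columns, we fill the gaps with NOT PROVIDED. This is a judgment call based on what the data actually looks like.
- OCCUPATION has many gaps. Most of these likely belong to ordinary people without a formal job, so NOT PROVIDED fits well.
- An empty EMPLOYER most likely means the donor is their own boss.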
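Before filling, it helps to count the gaps per column to see which fields are affected. Here is a small sketch on hypothetical data that fills all three text columns in one call.

```python
import pandas as pd

# Hypothetical sample with gaps in the text columns
df = pd.DataFrame({"STATE": ["CA", None, "NY"],
                   "EMPLOYER": [None, None, "ACME"],
                   "OCCUPATION": ["RETIRED", None, None],
                   "TRANSACTION_AMT": [10, 20, 30]})

cols = ["STATE", "EMPLOYER", "OCCUPATION"]
print(df[cols].isna().sum())          # missing values per column

# Fill every text column with the same placeholder in one call
df[cols] = df[cols].fillna("NOT PROVIDED")
print(df[cols].isna().sum().sum())    # 0 once all gaps are filled
```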
# Handle missing values: fill them all with NOT PROVIDED
# (assigning the result back avoids chained fillna(..., inplace=True), which newer pandas may not apply)
fill_cols = ['STATE', 'EMPLOYER', 'OCCUPATION']
df_data2[fill_cols] = df_data2[fill_cols].fillna('NOT PROVIDED')
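# Convert TRANSACTION_DT to strings, so that we can reformat the values here
# and so that the dates are not added up as numbers when we sum donations later
df_data2['TRANSACTION_DT'] = df_data2['TRANSACTION_DT'].astype(str)
# Reorder the date to year-month-day, e.g. 7242020 -> 2020724
# (the month's leading zero was lost when the column was read as an integer, so this slicing
#  only works for single-digit months, which covers the July and August data here)
dt = df_data2['TRANSACTION_DT']
df_data2['TRANSACTION_DT'] = dt.str[3:7] + dt.str[0] + dt.str[1:3]
# Look at the first 3 rows to check that this is the data we need
df_data2.head(3)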
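The slicing above relies on every month having a single digit. Below is a more general sketch. It assumes TRANSACTION_DT is stored as MMDDYYYY, as in the FEC bulk files, and does two things:
- restores the lost leading zero with zfill(8);
- parses the string with datetime.strptime.

fec_date and slice_style are hypothetical helper names used only for this comparison.

```python
from datetime import datetime


def fec_date(raw):
    """Parse an MMDDYYYY date that may have lost its leading zero when read as an integer."""
    s = str(raw).zfill(8)                    # 7242020 -> '07242020'
    return datetime.strptime(s, "%m%d%Y").date()


def slice_style(raw):
    """The slicing used above: correct only while the month has a single digit."""
    i = str(raw)
    return i[3:7] + i[0] + i[1:3]


print(fec_date(7242020))      # 2020-07-24
print(fec_date(12012020))     # 2020-12-01
print(slice_style(7242020))   # 2020724
print(slice_style(12012020))  # garbled: a two-digit month breaks the slicing
```

Column-wise, the same idea is pd.to_datetime(s.astype(str).str.zfill(8), format="%m%d%Y"). Later comparisons against strings such as '2020803' would then have to be rewritten to use real dates.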
Next, look at the values in every column to decide on the next step:
for col in df_data2.columns:
    print(df_data2.groupby(col).size())
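The output is too long to show here. The idea is to skim for abnormal values. This time I found that some donation amounts are negative. Treating that as invalid, I processed the amounts separately and set every negative donation to 0.

In FEC data, negative amounts usually record refunds or redesignations of earlier contributions rather than errors. Clipping them to 0 therefore slightly overstates the totals.
# Set negative donations to 0 (equivalent to x if x > 0 else 0)
df_data2['TRANSACTION_AMT'] = df_data2['TRANSACTION_AMT'].clip(lower=0)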
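To see how much the clipping matters, here is a sketch on four hypothetical amounts, assuming the -100 is a refund. Clipping keeps the refunded gift in the total. Summing the signed values nets it out.

```python
import pandas as pd

# Hypothetical donations to one candidate; -100 stands for a refund
amt = pd.Series([250, 100, -100, 50])

clipped = amt.clip(lower=0)   # negatives become 0, as in the article
net = amt.sum()               # keeping the sign nets the refund out

print(clipped.sum(), net)     # 400 300
```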
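3. Data visualization and analysis:
The analyses below cover:
1. Total "votes" (number of donations) of the top 25 candidates, as a word cloud
2. The donations each candidate received
3. The top 10 candidates by total donations, and the 10 occupations that donated the most to Biden
4. The parties of the voters
5. The share of Biden's donations coming from each state
6. The top 10 states by total donations, their numbers of donors, and their per-capita donations
7. How the daily donation total changed over the past month
8. Daily donation inflow over the past month for the top 10 states by total donations
9. Daily donation inflow over the past month for the top 10 occupations by total donations
10. Total donors and total donations on the top 10 days by total donations
11. The causes of the anomaly found in item 10

1. Total votes of the top 25 candidates, as a word cloud
Data visualization comes down to two steps. First, get the data to plot; this is usually x and y data, which I name df_data_x/df_data_y/data_x/data_y. Second, feed that data into a chart.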
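As a minimal illustration of the x/y step, the sketch below turns a column of made-up candidate names into the (name, count) pairs that the word cloud consumes. It uses value_counts(), which is equivalent to groupby(...).size().sort_values(ascending=False) except that it may order ties differently.

```python
import pandas as pd

# Hypothetical donation rows; only the candidate name matters here
df = pd.DataFrame({"CAND_NAME": ["A", "B", "A", "C", "A", "B"]})

counts = df["CAND_NAME"].value_counts().head(25)   # donations per candidate, largest first
x_NAME = counts.index.tolist()                      # x data: names
y_NAME = counts.values.tolist()                     # y data: counts
data_pair = list(zip(x_NAME, y_NAME))               # (word, weight) pairs for the word cloud

print(data_pair)   # [('A', 3), ('B', 2), ('C', 1)]
```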
from pyecharts import options as opts
# Word cloud (Boxplot is used in the next step)
from pyecharts.charts import WordCloud, Boxplot
from pyecharts.globals import ThemeType, SymbolType

# Ranked by number of donations ("votes")
top25 = df_data2.groupby('CAND_NAME').size().sort_values(ascending=False).head(25)
y_NAME = top25.values.tolist()
x_NAME = top25.index.tolist()
data_111 = list(zip(x_NAME, y_NAME))
# print(data_111)
word = (
    WordCloud(init_opts=opts.InitOpts(width="1900px", height="1000px", theme=ThemeType.DARK))
    .add(series_name="Top 25 candidates by votes", data_pair=data_111,
         word_size_range=[20, 100], shape=SymbolType.DIAMOND)
    .set_global_opts(
        title_opts=opts.TitleOpts(title="ballots_top-25",
                                  title_textstyle_opts=opts.TextStyleOpts(font_size=23)),
        tooltip_opts=opts.TooltipOpts(is_show=True),
    )
)
# word.render_notebook() shows the chart in Jupyter; word.render() writes an HTML file
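The chart clearly shows Biden ahead of the other candidates in "votes". So we go on to the total donations each candidate received. Unlike elections in many other countries, US elections are extremely expensive, so an advantage in funding tends to bring an advantage in the election.
2. Donations received by each candidate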
Y_container = []
for i in x_NAME:  # box-plot data for each of the top 25 candidates
    Y_container.append(df_data2[df_data2.CAND_NAME == i]["TRANSACTION_AMT"].values.tolist())

# Look separately at the two standout candidates (2nd and 3rd); we only care about the
# amounts donated to them, the donors' employers, and the donors' occupations
for name in x_NAME[1:3]:
    print(name + ':')
    subset = df_data2[df_data2.CAND_NAME == name]
    print(subset.loc[:, ['TRANSACTION_AMT', 'EMPLOYER', 'OCCUPATION']]
          .sort_values('TRANSACTION_AMT', ascending=False).head(25))
    print(subset.groupby('EMPLOYER')['TRANSACTION_AMT'].sum()
          .sort_values(ascending=False).head(25))

box_plot = Boxplot(init_opts=opts.InitOpts(width="2400px", height="1500px", theme=ThemeType.CHALK))
box_plot = (
    box_plot.add_xaxis(xaxis_data=x_NAME)
    .add_yaxis(series_name="", y_axis=box_plot.prepare_data(Y_container),
               itemstyle_opts=opts.ItemStyleOpts(color='blue'))
    .set_global_opts(
        datazoom_opts=opts.DataZoomOpts(is_show=True),
        title_opts=opts.TitleOpts(pos_left="center", title="CAND_NAME_TOP25: donations received"),
        tooltip_opts=opts.TooltipOpts(trigger="item", axis_pointer_type="shadow"),
        xaxis_opts=opts.AxisOpts(
            # type_="category",
            boundary_gap=True,
            splitarea_opts=opts.SplitAreaOpts(is_show=False),
            # axislabel_opts=opts.LabelOpts(formatter="expr {value}"),
            splitline_opts=opts.SplitLineOpts(is_show=False),
        ),
        yaxis_opts=opts.AxisOpts(
            type_='value',  # value axis (the default is usually a category axis)
            name="Donations received",
            splitarea_opts=opts.SplitAreaOpts(is_show=True, areastyle_opts=opts.AreaStyleOpts(opacity=1)),
        ),
    )
    .set_series_opts(tooltip_opts=opts.TooltipOpts(formatter="{b}: {c}"))
)
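A box plot draws five numbers per candidate: the minimum, first quartile, median, third quartile and maximum. The sketch below computes them with pandas for a hypothetical list of donations. One large donor stretches the maximum far beyond the rest, which is why a couple of such candidates can squash the whole chart. pyecharts' prepare_data may use a different quartile method, so its numbers can differ slightly.

```python
import pandas as pd

# Hypothetical donations to one candidate, including one very large gift
amounts = pd.Series([5, 10, 20, 25, 50, 100, 2800])

# Five-number summary (pandas' default linear interpolation for the quartiles)
summary = [amounts.min(), amounts.quantile(0.25), amounts.median(),
           amounts.quantile(0.75), amounts.max()]
print([float(v) for v in summary])   # [5.0, 15.0, 25.0, 75.0, 2800.0]
```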
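The 2nd and 3rd candidates, SULLIVAN, DAN and JACOBS, CHRISTOPHER L., have very large maximum values. These squash the other boxes and make the overall comparison hard to read. So we take those two out and look at the donations of the remaining candidates.
Remove the two outliers: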
# Keep the 1st candidate and drop the 2nd and 3rd
x_NAME1 = [x_NAME[0]] + x_NAME[3:]
Y_container1 = [Y_container[0]] + Y_container[3:]

box_plot1 = Boxplot(init_opts=opts.InitOpts(width="1900px", height="1100px", theme=ThemeType.CHALK))
box_plot1 = (
    box_plot1.add_xaxis(xaxis_data=x_NAME1)
    .add_yaxis(series_name="", y_axis=box_plot1.prepare_data(Y_container1),
               itemstyle_opts=opts.ItemStyleOpts(color='blue'))
    .set_global_opts(
        datazoom_opts=opts.DataZoomOpts(is_show=True),
        title_opts=opts.TitleOpts(pos_left="center", title="CAND_NAME_TOP25: donations received"),
        tooltip_opts=opts.TooltipOpts(trigger="item", axis_pointer_type="shadow"),
        xaxis_opts=opts.AxisOpts(
            # type_="category",
            boundary_gap=True,
            splitarea_opts=opts.SplitAreaOpts(is_show=False),
            # axislabel_opts=opts.LabelOpts(formatter="expr {value}"),
            splitline_opts=opts.SplitLineOpts(is_show=False),
        ),
        yaxis_opts=opts.AxisOpts(
            type_='value',  # value axis (the default is usually a category axis)
            name="Donations received",
            splitarea_opts=opts.SplitAreaOpts(is_show=True, areastyle_opts=opts.AreaStyleOpts(opacity=1)),
        ),
    )
    .set_series_opts(tooltip_opts=opts.TooltipOpts(formatter="{b}: {c}"))
)
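Biden's donations turn out to be smaller than the other candidates' in both the lower and the upper half of the distribution. That is a letdown: the word cloud had made us bullish on Biden because of his numbers. But perhaps this is exactly where the answer lies.
3. The top 10 candidates by total donations, and the 10 occupations that donated the most to Biden
The previous analysis showed Biden with no advantage at all in the size of individual donations, yet he has many donors. Could Biden be winning on volume with small amounts? With that question in mind, we continue the analysis.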
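The "many donors, small amounts" hypothesis can be checked by putting the count, sum and mean of each candidate's donations side by side. Here is a sketch with two hypothetical candidates, MANY_SMALL and FEW_LARGE.

```python
import pandas as pd

# Hypothetical candidates: many small gifts versus a few large ones
df = pd.DataFrame({
    "CAND_NAME": ["MANY_SMALL"] * 60 + ["FEW_LARGE"] * 2,
    "TRANSACTION_AMT": [25] * 60 + [500, 700],
})

stats = (df.groupby("CAND_NAME")["TRANSACTION_AMT"]
           .agg(["size", "sum", "mean"])
           .sort_values("sum", ascending=False))
print(stats)
```

In this toy case the candidate with the far smaller mean still has the larger total. That is the pattern the next chart looks for in the real data.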
# The previous result is intriguing: Biden ranks first, yet both his maximum and his average are smaller than others'
# (ascending=True at the end so that the largest bar ends up on top of the horizontal chart)
sum_data = (df_data2.groupby('CAND_NAME')['TRANSACTION_AMT'].sum()
            .sort_values(ascending=False).head(10).sort_values(ascending=True))
sum_data_y = sum_data.values.tolist()
sum_data_x = sum_data.index.tolist()

from pyecharts.charts import Bar, Grid
from pyecharts.globals import ThemeType

bar_1 = (
    Bar(init_opts=opts.InitOpts(width="1700px", height="1000px", theme=ThemeType.CHALK))
    .add_xaxis(sum_data_x)
    .add_yaxis('Total donations received', sum_data_y, bar_width='40%')
    .reversal_axis()
    .set_global_opts(
        title_opts=opts.TitleOpts(title='Top 10 candidates by total donations received', pos_left='left'),
        legend_opts=opts.LegendOpts(pos_left='left', is_show=False),
    )
)

# Break down the donations to Biden (x_NAME[0]) by occupation
OCCUPATION_data = (df_data2[df_data2.CAND_NAME == x_NAME[0]]
                   .groupby('OCCUPATION')['TRANSACTION_AMT'].sum()
                   .sort_values(ascending=False).head(10).sort_values(ascending=True))
OCCUPATION_data_y = OCCUPATION_data.values.tolist()
OCCUPATION_data_x = OCCUPATION_data.index.tolist()
bar_2 = (
    Bar(init_opts=opts.InitOpts(width="1700px", height="1000px", theme=ThemeType.CHALK))
    .add_xaxis(OCCUPATION_data_x)
    .add_yaxis('Total donations', OCCUPATION_data_y, bar_width='40%')
    .reversal_axis()
    .set_global_opts(
        title_opts=opts.TitleOpts(title='Top 10 occupations by total donations to Biden', pos_right='center'),
        legend_opts=opts.LegendOpts(pos_right='right', is_show=False),
    )
)

# Put the two bar charts side by side
grid_1 = (
    Grid(init_opts=opts.InitOpts(width='1900px', height='1000px', theme=ThemeType.CHALK))
    .add(bar_1, grid_opts=opts.GridOpts(pos_right='55%'))
    .add(bar_2, grid_opts=opts.GridOpts(pos_left='55%'))
)
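The result seems to match our guess. Thanks to his large number of donors, Biden also receives the most money in total.

I then looked at the 10 occupations that donated the most to Biden. Most of the money comes from ordinary people, which explains why Biden's average and maximum donations trail those of the other candidates. It suggests that Biden's supporters are mostly ordinary people rather than capitalists.

The two outliers removed earlier show the opposite pattern. Their large donations came from companies and business owners, and that capital is what pushes their maximum and average values up.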
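4. The parties of the voters
Party plays a decisive role in elections. Here it is measured by the party of the candidate each donation went to (CAND_PTY_AFFILIATION). Broadly, the party with more votes usually corresponds to the winning candidate.

The exception is when several candidates split one party's votes. That is possible, but it is not considered here: support is usually lopsided, and even when it is split, the split is rarely large.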
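Each row is one donation record, so the party counts below measure donations, not distinct voters. Here is a small sketch with made-up party codes that shows the counts and each party's share in percent.

```python
import pandas as pd

# Hypothetical party codes, one per donation record
party = pd.Series(["DEM", "DEM", "REP", "DEM", "LIB"])

counts = party.value_counts()                                 # records per party
share = (party.value_counts(normalize=True) * 100).round(1)   # percent of all records
print(counts.to_dict())
print(share.to_dict())
```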
# Number of donations ("votes") per candidate party, top 5
party_counts = df_data2.groupby('CAND_PTY_AFFILIATION').size().sort_values(ascending=False).head(5)
Y_CAND_PTY_AFFILIATION = party_counts.values.tolist()
X_CAND_PTY_AFFILIATION = party_counts.index.tolist()

from pyecharts.charts import Bar
Bar_YX = (
    Bar(init_opts=opts.InitOpts(width='1900px', height='1000px', theme=ThemeType.CHALK))
    .add_xaxis(X_CAND_PTY_AFFILIATION)
    .add_yaxis('Support ("votes") by candidate party', Y_CAND_PTY_AFFILIATION)
    # .reversal_axis()
    .set_global_opts(title_opts=opts.TitleOpts(title='Candidate party'),
                     toolbox_opts=opts.ToolboxOpts(is_show=True))
)
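The Democratic Party clearly leads, and Biden is the Democratic candidate. So the number of participants per party does appear to be related to which candidate wins. If you are interested, you can check this against many years of past election data; it is not pursued here.
5. The share of Biden's donations coming from each state
This looks at the share of each of the top 10 states in the donations to Biden. It shows which states supply most of his donations.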
data = df_data2[df_data2['CAND_NAME'] == 'BIDEN, JOSEPH R JR']
state_sum = data.groupby('STATE')['TRANSACTION_AMT'].sum().sort_values(ascending=False).head(10)
data_y = state_sum.tolist()
data_x = state_sum.index.tolist()
# Percentages relative to the total of these top 10 states (not of all states)
sum1 = sum(data_y)
data_y = [round(i / sum1 * 100, 1) for i in data_y]

from pyecharts.charts import Pie
from pyecharts.globals import ThemeType

pie = (
    Pie(init_opts=opts.InitOpts(width="1900px", height="1100px", theme=ThemeType.PURPLE_PASSION))
    .add(
        "Share of donations by state",
        [list(z) for z in zip(data_x, data_y)],
        radius=["40%", "50%"],
        center=[600, 450],
        label_opts=opts.LabelOpts(
            position="outside",
            # {a}: series name, {b}: state, {c}: value, {d}: percentage computed by the pie
            formatter="{a|{a}}{abg|}\n{hr|}\n {b|{b}: }{c}  {per|{d}%}  ",
            background_color="#eee",
            border_color="#aaa",
            border_width=1,
            border_radius=4,
            rich={
                "a": {"color": "#999", "lineHeight": 22, "align": "center"},
                "abg": {"backgroundColor": "#e3e3e3", "width": "100%", "align": "right",
                        "height": 22, "borderRadius": [4, 4, 0, 0]},
                "hr": {"borderColor": "#aaa", "width": "100%", "borderWidth": 0.5, "height": 0},
                "b": {"fontSize": 16, "lineHeight": 33},
                "per": {"color": "#eee", "backgroundColor": "#334455", "padding": [2, 4], "borderRadius": 2},
            },
        ),
    )
)
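The percentages in this pie are computed against the total of the top 10 states only. They therefore overstate each state's share of all of Biden's donations. Here is a sketch with hypothetical state totals that compares the two denominators; it uses a top 3 instead of a top 10 to keep it short.

```python
import pandas as pd

# Hypothetical totals donated to one candidate, by state
by_state = pd.Series({"CA": 500, "NY": 300, "TX": 100, "WA": 60, "FL": 40})

top = by_state.sort_values(ascending=False).head(3)
share_of_top = (top / top.sum() * 100).round(1)        # denominator: top states only (as in the pie)
share_of_all = (top / by_state.sum() * 100).round(1)   # denominator: all states

print(share_of_top.to_dict())  # {'CA': 55.6, 'NY': 33.3, 'TX': 11.1}
print(share_of_all.to_dict())  # {'CA': 50.0, 'NY': 30.0, 'TX': 10.0}
```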
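California and Nevada are the main sources of Biden's donations.
6. The top 10 states by total donations, their numbers of donors, and their per-capita donations
This looks at how actively each state takes part in the election, and at the per-capita donation in the top 10 states.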
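A compact way to get all three numbers at once is a named aggregation per state. The sketch below uses hypothetical rows.

Counting rows measures donations, not donors: someone who donated several times is counted several times. Counting distinct NAME values with nunique would approximate the number of donors instead.

```python
import pandas as pd

# Hypothetical donation rows
df = pd.DataFrame({"STATE": ["CA", "CA", "CA", "NY", "NY", "TX"],
                   "TRANSACTION_AMT": [100, 200, 300, 50, 150, 1000]})

per_state = df.groupby("STATE")["TRANSACTION_AMT"].agg(total="sum", donations="size")
per_state["avg_per_donation"] = per_state["total"] / per_state["donations"]
print(per_state.sort_values("donations", ascending=False))
```

Note how TX, with a single donation, gets the highest average. Averages over few records are volatile, which is one reason to restrict the comparison to the states with the most donations.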
from pyecharts.charts import Bar, Line, Grid
from pyecharts import options as opts
from pyecharts.globals import ThemeType


def mark_lines():
    # Average, minimum and maximum mark lines shared by the three charts below
    return opts.MarkLineOpts(data=[
        opts.MarkLineItem(type_="average", name="Average"),
        opts.MarkLineItem(type_='min', name='Minimum'),
        opts.MarkLineItem(symbol="none", x="90%", y="max"),
        opts.MarkLineItem(symbol="circle", type_="max", name="Maximum"),
    ])


# Top 10 states by total donations, in units of 1e7 USD
state_amt = df_data2.groupby('STATE')['TRANSACTION_AMT'].sum().sort_values(ascending=False).head(10)
x = state_amt.index.tolist()
y = [round(i / 1e7, 3) for i in state_amt.tolist()]
# print(y)
c1 = (
    Bar(init_opts=opts.InitOpts(width="800px", height="600px"))
    .add_xaxis(x)
    .add_yaxis('TRANSACTION_AMT', y, itemstyle_opts=opts.ItemStyleOpts(color='Teal'),
               label_opts=opts.LabelOpts(is_show=True, position='top', formatter="{c}", color='cyan'))
    # .reversal_axis()
    .set_global_opts(
        legend_opts=opts.LegendOpts(pos_left='left', pos_top='top'),
        xaxis_opts=opts.AxisOpts(
            name='STATE', name_location='middle', name_gap=30,  # distance from the x axis line
            name_textstyle_opts=opts.TextStyleOpts(font_family='Microsoft Yahei', font_size=20),
            axistick_opts=opts.AxisTickOpts(is_show=True, is_inside=True),  # ticks point inward
            axisline_opts=opts.AxisLineOpts(symbol='none', linestyle_opts=opts.LineStyleOpts(width=1, color='black')),
            axislabel_opts=opts.LabelOpts(rotate=40, font_size=12, font_family='Arial', font_weight='bold'),
        ),
        yaxis_opts=opts.AxisOpts(
            name='TRANSACTION_AMT_SUM(/1e7)', name_location='middle', name_gap=60, min_=0, max_=3,
            name_textstyle_opts=opts.TextStyleOpts(font_family='Times New Roman', font_size=20, color='black'),
            axistick_opts=opts.AxisTickOpts(is_show=False, is_inside=True),
            axislabel_opts=opts.LabelOpts(font_size=12, font_family='Times New Roman', formatter="{value}"),
            splitline_opts=opts.SplitLineOpts(is_show=True, linestyle_opts=opts.LineStyleOpts(color='black')),  # y grid lines
            axisline_opts=opts.AxisLineOpts(is_show=False),  # hide the y axis line
        ),
        toolbox_opts=opts.ToolboxOpts(is_show=True),
    )
    .set_series_opts(markline_opts=mark_lines())
)

# Top 10 states by number of donations
state_cnt = df_data2.groupby('STATE').size().sort_values(ascending=False).head(10)
x1 = state_cnt.index.tolist()
y1 = state_cnt.tolist()
c2 = (
    Bar(init_opts=opts.InitOpts(width="800px", height="600px"))
    .add_xaxis(x1)
    .add_yaxis('TRANSACTION', y1, itemstyle_opts=opts.ItemStyleOpts(color='Navy'),
               label_opts=opts.LabelOpts(color='cyan'))
    # .reversal_axis()
    .set_global_opts(
        datazoom_opts=opts.DataZoomOpts(is_show=True),
        xaxis_opts=opts.AxisOpts(
            name='STATE', name_location='middle', name_gap=30,  # name_rotate would rotate the axis name
            name_textstyle_opts=opts.TextStyleOpts(font_family='Microsoft Yahei', font_size=20),
            axistick_opts=opts.AxisTickOpts(is_show=True, is_inside=True),
            axisline_opts=opts.AxisLineOpts(linestyle_opts=opts.LineStyleOpts(width=1, color='black')),
            axislabel_opts=opts.LabelOpts(rotate=40, font_size=12, font_family='Arial', font_weight='bold'),
        ),
        yaxis_opts=opts.AxisOpts(
            name='TRANSACTION_PERSON_NUMBER', name_location='middle', name_gap=50,
            name_textstyle_opts=opts.TextStyleOpts(font_family='Times New Roman', font_size=20, color='black'),
            axistick_opts=opts.AxisTickOpts(is_show=False, is_inside=True),
            axislabel_opts=opts.LabelOpts(font_size=12, font_family='Times New Roman', formatter="{value}"),
            splitline_opts=opts.SplitLineOpts(is_show=True, linestyle_opts=opts.LineStyleOpts(color='black')),  # y grid lines
            axisline_opts=opts.AxisLineOpts(is_show=False),  # hide the y axis line
        ),
        toolbox_opts=opts.ToolboxOpts(is_show=True),
    )
    .set_series_opts(markline_opts=mark_lines())
)

# Average donation (total amount / number of donations) in the same top 10 states by number of donations;
# rows are donation records, so a donor who gave several times is counted several times
state_total = df_data2.groupby('STATE')['TRANSACTION_AMT'].sum()
data_1 = x1
data_2 = [round(state_total[s] / state_cnt[s], 2) for s in data_1]
line_1 = (
    Line(init_opts=opts.InitOpts(width="800px", height="500px", theme=ThemeType.PURPLE_PASSION))
    .add_xaxis(data_1)
    .add_yaxis('TRANSACTION_AMT_top10_AVERAGE', y_axis=data_2)
    .set_global_opts(
        # To some extent this reflects each state's economic level
        title_opts=opts.TitleOpts(title="Average donation in the top 10 states by number of donors",
                                  pos_left='left', pos_top='center',
                                  title_textstyle_opts=opts.TextStyleOpts(font_family='Times New Roman', font_size=20, color='black')),
        tooltip_opts=opts.TooltipOpts(trigger="axis"),
        toolbox_opts=opts.ToolboxOpts(is_show=True),
        xaxis_opts=opts.AxisOpts(name='STATE', name_location='middle', name_gap=50, type_="category", boundary_gap=False,
                                 name_textstyle_opts=opts.TextStyleOpts(font_family='Times New Roman', font_size=20, color='black'),
                                 axislabel_opts=opts.LabelOpts(rotate=40)),
        legend_opts=opts.LegendOpts(is_show=False),
        yaxis_opts=opts.AxisOpts(
            name='AVG_TRANSACTION_AMT', name_location='middle', name_gap=50,
            name_textstyle_opts=opts.TextStyleOpts(font_family='Times New Roman', font_size=20, color='black'),
            axistick_opts=opts.AxisTickOpts(is_show=False, is_inside=True),
            axislabel_opts=opts.LabelOpts(font_size=12, font_family='Times New Roman', formatter="{value}"),
            axisline_opts=opts.AxisLineOpts(is_show=False),  # hide the y axis line
        ),
    )
    .set_series_opts(markline_opts=mark_lines())
)

# Combine the three charts (Page(layout=Page.SimplePageLayout) would stack them instead)
p1 = (
    Grid(init_opts=opts.InitOpts(width="1900px", height="1400px", theme=ThemeType.CHALK))
    .add(c1, grid_opts=opts.GridOpts(pos_bottom='60%', pos_right='55%'))
    .add(c2, grid_opts=opts.GridOpts(pos_bottom='60%', pos_left='55%'))
    .add(line_1, grid_opts=opts.GridOpts(pos_top='60%', pos_right='50%'))
)
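California is both the state with the most participants and, among the top 10 states by number of donors, the one with the highest per-capita donation. This hints that California's economy is strong.

I checked the 2020 GDP figures for US states, and California has long been the largest state economy in the US. The well-known Silicon Valley is in California, as are many of the world's top companies, such as Apple, Facebook and Google. New York, second in our data, is also in the top 3 by GDP.

Note, however, that a state with many donations is not necessarily economically strong; its voters may simply be more engaged. That is why per-capita donations are the better indicator: judging a state's economy by them is probably reasonable. (A further analysis restricted to Biden could be added here.)
7. How the daily donation total changed over the past month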
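The calendar needs one [date, value] pair for every day, including days with no donations at all. Below is a sketch that assumes the daily totals are already keyed by ISO dates. Reindexing on a full pd.date_range fills the missing days with 0.

```python
import pandas as pd

# Hypothetical daily totals; 2020-07-23 has no donations at all
daily = pd.Series({"2020-07-22": 100, "2020-07-24": 300})
daily.index = pd.to_datetime(daily.index)

full_range = pd.date_range("2020-07-22", "2020-07-24", freq="D")
daily = daily.reindex(full_range, fill_value=0)   # every calendar day present, gaps become 0

calendar_data = [[d.strftime("%Y-%m-%d"), int(v)] for d, v in daily.items()]
print(calendar_data)   # [['2020-07-22', 100], ['2020-07-23', 0], ['2020-07-24', 300]]
```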
from pyecharts.charts import Calendar
from pyecharts import options as opts
from pyecharts.globals import ThemeType
import datetime
import numpy as np

# Total donations per day, looked up by the 'YYYYMDD' strings built earlier (e.g. '2020724')
daily_amt = df_data2.groupby('TRANSACTION_DT')['TRANSACTION_AMT'].sum()
begin = datetime.date(2020, 7, 22)
end = datetime.date(2020, 8, 20)
data = []
for i in range((end - begin).days + 1):
    day = begin + datetime.timedelta(days=i)
    key = f"{day.year}{day.month}{day.day:02d}"            # same format as TRANSACTION_DT
    data.append([str(day), int(daily_amt.get(key, 0))])     # days without donations count as 0
data_dete = [v for _, v in data]

# Split [min, max] into 7 equal-width ranges for the piecewise visual map
split = np.linspace(min(data_dete), max(data_dete), 8)
end_split = [{'min': split[i], 'max': split[i + 1]} for i in range(7)]

calendar = (
    Calendar(init_opts=opts.InitOpts(width="1900px", height="800px", theme=ThemeType.PURPLE_PASSION))
    .add(
        series_name="",
        yaxis_data=data,
        calendar_opts=opts.CalendarOpts(
            pos_top="120", pos_left="30", pos_right="30", range_="2020",
            yearlabel_opts=opts.CalendarYearLabelOpts(is_show=False),
        ),
        # label_opts=opts.LabelOpts(color='purple'),
    )
    .set_global_opts(
        title_opts=opts.TitleOpts(pos_top="30", pos_left="center", title="Daily donations over the past month"),
        visualmap_opts=opts.VisualMapOpts(
            max_=max(data_dete), min_=min(data_dete), orient="horizontal", is_piecewise=True,
            pos_top="center", pos_left="center",
            # is_inverse=True,
            pieces=end_split,
            textstyle_opts=opts.TextStyleOpts(color='white', font_size=20, font_family='Microsoft YaHei'),
            item_height=20, item_width=30,
        ),
    )
)
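The visual map pieces are simply equal-width bins between the smallest and largest daily total. Below is a stdlib-only sketch of essentially the same split that np.linspace(min, max, 8) produces. equal_pieces is a hypothetical helper name.

```python
def equal_pieces(values, n=7):
    """Split [min(values), max(values)] into n equal-width ranges for a piecewise visual map."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n
    edges = [lo + k * step for k in range(n)] + [hi]   # n + 1 edges, last one exactly hi
    return [{"min": edges[k], "max": edges[k + 1]} for k in range(n)]


pieces = equal_pieces([0, 70, 35, 14], n=7)
print(pieces[0], pieces[-1])   # {'min': 0.0, 'max': 10.0} {'min': 60.0, 'max': 70}
```

Adjacent pieces share their boundaries. With a heavily skewed series like this one, where a single day dominates, most days fall into the lowest piece. Quantile-based edges would spread the colors more evenly.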
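The last day, August 20, has by far the most donations, while the earlier days are fairly flat. This is most likely driven by sentiment. August 20, 2020 was the final day of the Democratic National Convention, when Biden formally accepted the nomination. Candidates and voters alike tend to react emotionally at moments like that.
8. Daily donation inflow over the past month for the top 10 states by total donations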
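The code below uses STATE_top10 and dete_state_y without showing how they are built. Here is a plausible sketch. It assumes the theme river wants [date, value, series name] rows, with dates in YYYY-MM-DD form so that the time axis can parse them. theme_river_data is a hypothetical helper name.

```python
import pandas as pd


def theme_river_data(df, group_col, n=10):
    """Hypothetical helper: top-n groups by total amount plus [date, amount, group] rows."""
    totals = df.groupby(group_col)["TRANSACTION_AMT"].sum()
    top = totals.nlargest(n).index.tolist()
    daily = (df[df[group_col].isin(top)]
             .groupby(["TRANSACTION_DT", group_col])["TRANSACTION_AMT"].sum())
    rows = [[date, int(amount), name] for (date, name), amount in daily.items()]
    return top, rows


# Hypothetical rows with ISO dates
df = pd.DataFrame({"TRANSACTION_DT": ["2020-07-27", "2020-07-27", "2020-07-28", "2020-07-28"],
                   "STATE": ["NY", "CA", "NY", "TX"],
                   "TRANSACTION_AMT": [500, 200, 100, 50]})

STATE_top10, dete_state_y = theme_river_data(df, "STATE", n=2)
print(STATE_top10)    # ['NY', 'CA']
print(dete_state_y)
```

Calling the same helper with "OCCUPATION" would give the occupation counterparts used in section 9.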
from pyecharts.charts import ThemeRiver
from pyecharts import options as opts
from pyecharts.globals import ThemeType

# STATE_top10 (names of the top 10 states) and dete_state_y ([date, amount, state] rows)
# must be prepared beforehand; that step is not shown here
river = (
    ThemeRiver(init_opts=opts.InitOpts(width='1900px', height='1000px', theme=ThemeType.CHALK))
    .add(series_name=STATE_top10, data=dete_state_y,
         singleaxis_opts=opts.SingleAxisOpts(pos_top="50", pos_bottom="50", type_="time"))
    .set_global_opts(
        tooltip_opts=opts.TooltipOpts(trigger="axis", axis_pointer_type="line"),
        title_opts=opts.TitleOpts(title='Daily donation inflow of the top 10 states by total donations'),
        datazoom_opts=opts.DataZoomOpts(is_show=True),
    )
)
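New York's donations swing sharply on July 27. My guess is that a candidate was campaigning in the state that day or that something unusual happened. One candidate, Trump, was born in New York, so voters there may react strongly to his campaign.

The 2020 Republican National Convention, however, ran from August 24 to 27, after the period covered by this data. It cannot be the cause, and the exact trigger of the July 27 spike remains open.
9. Daily donation inflow over the past month for the top 10 occupations by total donations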
# OCCUPATION_top10 (names of the top 10 occupations) and dete_occupation_y ([date, amount, occupation] rows)
# must be prepared beforehand; that step is not shown here
river1 = (
    ThemeRiver(init_opts=opts.InitOpts(width='1900px', height='1000px', theme=ThemeType.CHALK))
    .add(series_name=OCCUPATION_top10, data=dete_occupation_y,
         singleaxis_opts=opts.SingleAxisOpts(pos_top="50", pos_bottom="50", type_="time"))
    .set_global_opts(
        tooltip_opts=opts.TooltipOpts(trigger="axis", axis_pointer_type="line"),
        title_opts=opts.TitleOpts(title='Daily donation inflow of the top 10 occupations by total donations'),
        datazoom_opts=opts.DataZoomOpts(is_show=True),
    )
)
July 27 also stands out in the donations by occupation, especially FOUNDER. I read founders here as capitalists; Trump himself is a businessman.
10. Total donors and total donations on the top 10 days by total donations
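For this chart, both lines must describe the same 10 days. The sketch below uses hypothetical rows. It picks the top days by total amount, takes the donation counts of exactly those days, and sorts them by date. String sorting is chronological here only because every date has the same 7-character form.

```python
import pandas as pd

# Hypothetical donation rows, dates in the article's 'YYYYMDD' string form
df = pd.DataFrame({"TRANSACTION_DT": ["2020803", "2020803", "2020725", "2020808", "2020808", "2020808"],
                   "TRANSACTION_AMT": [1000, 2000, 50, 10, 20, 30]})

daily = df.groupby("TRANSACTION_DT")["TRANSACTION_AMT"].agg(total="sum", donors="size")
top = daily.nlargest(2, "total").sort_index()     # top days by amount, then in date order

data_sum_x = top.index.tolist()
data_sum_y = top["total"].tolist()
data_personsum_y = top["donors"].tolist()         # counts for the same days
print(data_sum_x, data_sum_y, data_personsum_y)   # ['2020803', '2020808'] [3000, 60] [2, 3]
```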
# First take the 10 days with the largest total donations, then sort them by date.
# The donor counts are taken for the same 10 days, so both lines below share one x axis.
daily = df_data2.groupby('TRANSACTION_DT')['TRANSACTION_AMT'].agg(['sum', 'size'])
top10_days = daily.sort_values('sum', ascending=False).head(10).sort_index()  # equal-length date strings sort chronologically
data_sum_x = top10_days.index.tolist()
data_sum_y = top10_days['sum'].tolist()
data_personsum_y = top10_days['size'].tolist()
from pyecharts.charts import Line, Grid
from pyecharts.globals import ThemeType

line = (
    Line(init_opts=opts.InitOpts(width="1500px", height="380px", theme=ThemeType.CHALK))
    .add_xaxis(data_sum_x)
    .add_yaxis('data_personsum', y_axis=data_personsum_y,
               markline_opts=opts.MarkLineOpts(data=[
                   opts.MarkLineItem(type_="average", name="Average"),
                   opts.MarkLineItem(symbol="none", x="90%", y="max"),
                   opts.MarkLineItem(symbol="circle", type_="max", name="Maximum"),
               ]))
    .set_global_opts(
        yaxis_opts=opts.AxisOpts(min_=13000),  # lower bound tuned to this dataset
        title_opts=opts.TitleOpts(title="Total donors / total donations on the top 10 days"),
        tooltip_opts=opts.TooltipOpts(trigger="axis"),
        toolbox_opts=opts.ToolboxOpts(is_show=True),
        xaxis_opts=opts.AxisOpts(type_="category", boundary_gap=False),
    )
)
line1 = (
    Line(init_opts=opts.InitOpts(width="1500px", height="380px"))
    .add_xaxis(data_sum_x)
    .add_yaxis('data_sum', y_axis=data_sum_y,
               markline_opts=opts.MarkLineOpts(data=[
                   opts.MarkLineItem(type_="average", name="Average"),
                   opts.MarkLineItem(symbol="none", x="90%", y="max"),
                   opts.MarkLineItem(symbol="circle", type_="max", name="Maximum"),
               ]))
    .set_global_opts(
        yaxis_opts=opts.AxisOpts(min_=1300000),  # lower bound tuned to this dataset
        # title_opts=opts.TitleOpts(title="Total donations on the top 10 days"),
        tooltip_opts=opts.TooltipOpts(trigger="axis"),
        toolbox_opts=opts.ToolboxOpts(is_show=True),
        xaxis_opts=opts.AxisOpts(type_="category", boundary_gap=False),
    )
)
page = (
    # Page is a top-level container and cannot be embedded in other charts, so Grid is used here
    # Page()
    Grid(init_opts=opts.InitOpts(width="1900px", height="1000px", theme=ThemeType.CHALK))
    .add(line, grid_opts=opts.GridOpts(pos_bottom='55%'))
    .add(line1, grid_opts=opts.GridOpts(pos_top='55%'))
)
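Two anomalies turn up: on August 3 and August 8, the number of donors and the amount donated move in opposite directions. So we analyze those two days separately.
11. The causes of the anomaly
Normally, the more people donate, the larger the total. On these two days, though, the number of donors and the amount diverge, so we look at them in detail. We compare 2020803 and 2020808 with 2020725, an ordinary day used as a control group. Of all the attributes, the donors' occupation looks like the most likely explanation.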
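Here is how the metric used below works. For each of a day's top occupations, it takes the occupation's share of the top-occupation total and divides it by the number of donations from that occupation. A few large donors therefore score high, and many small donors score low. The sketch runs it on one hypothetical day; occupation_share_per_person is a hypothetical helper name.

```python
import pandas as pd


def occupation_share_per_person(day_df, n=10):
    """Each top-n occupation's share (%) of the top-n total, divided by its number of donations."""
    amt = day_df.groupby("OCCUPATION")["TRANSACTION_AMT"].sum().nlargest(n)
    cnt = day_df.groupby("OCCUPATION").size()
    share = amt / amt.sum() * 100
    return (share / cnt[amt.index]).round(4).sort_index()


# Hypothetical day: four retirees giving 25 each, one CEO giving 400
day = pd.DataFrame({"OCCUPATION": ["RETIRED", "RETIRED", "RETIRED", "RETIRED", "CEO"],
                    "TRANSACTION_AMT": [25, 25, 25, 25, 400]})
print(occupation_share_per_person(day).to_dict())  # {'CEO': 80.0, 'RETIRED': 5.0}
```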
def occupation_top10(date):
    # One day's donations: the top 10 occupations by amount (sorted by occupation name),
    # their amounts, and each occupation's share of that top-10 total divided by its
    # number of donations (in %)
    day = df_data2[df_data2.TRANSACTION_DT == date]
    amt = (day.groupby('OCCUPATION')['TRANSACTION_AMT'].sum()
              .sort_values(ascending=False).head(10).sort_index())
    person = day.groupby('OCCUPATION').size()
    total = amt.sum()
    share = [round(v / total * 100 / person[occ], 4) for occ, v in amt.items()]
    return amt.index.tolist(), amt.tolist(), share


x11, y11, y11_bi = occupation_top10('2020803')
x12, y12, y12_bi = occupation_top10('2020808')
x13, y13, y13_bi = occupation_top10('2020725')  # ordinary day used as the control group
l11
= ( Bar ( init_opts
= opts
. InitOpts ( width
= "1900px" , height
= "400px" , theme
= ThemeType
. DARK
) ) . add_xaxis ( x11
) . add_yaxis ( 'data_personsum' , y_axis
= y11
, bar_width
= '50%' ,
itemstyle_opts
= opts
. ItemStyleOpts ( color
= 'blue' ) , markline_opts
= opts
. MarkLineOpts ( data
= [ opts
. MarkLineItem ( type_
= "average" , name
= "平均值" ) , opts
. MarkLineItem ( symbol
= "none" , x
= "90%" , y
= "max" ) , opts
. MarkLineItem ( symbol
= "circle" , type_
= "max" , name
= "最高點" ) , ] ) , ) . set_global_opts ( title_opts
= opts
. TitleOpts ( title
= "過去某10天總捐款人數(shù)2020-08-03" ) , tooltip_opts
= opts
. TooltipOpts ( trigger
= "axis" ) , toolbox_opts
= opts
. ToolboxOpts ( is_show
= True
) , xaxis_opts
= opts
. AxisOpts ( axislabel_opts
= opts
. LabelOpts ( rotate
= 40 ) ) , )
)
l21 = (
    Bar(init_opts=opts.InitOpts(width="1900px", height="400px", theme=ThemeType.DARK))
    .add_xaxis(x12)
    .add_yaxis(
        'data_personsum',
        y_axis=y12,
        bar_width='50%',
        itemstyle_opts=opts.ItemStyleOpts(color='red'),
        markline_opts=opts.MarkLineOpts(
            data=[
                opts.MarkLineItem(type_="average", name="Average"),
                opts.MarkLineItem(symbol="none", x="90%", y="max"),
                opts.MarkLineItem(symbol="circle", type_="max", name="Max"),
            ]
        ),
    )
    .set_global_opts(
        title_opts=opts.TitleOpts(title="Donation amount of the top-10 occupations, 2020-08-08"),
        tooltip_opts=opts.TooltipOpts(trigger="axis"),
        toolbox_opts=opts.ToolboxOpts(is_show=True),
        xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=40)),
    )
)
l31 = (
    Bar(init_opts=opts.InitOpts(width="1900px", height="400px", theme=ThemeType.DARK))
    .add_xaxis(x13)
    .add_yaxis(
        'data_personsum',
        y_axis=y13,
        bar_width='50%',
        itemstyle_opts=opts.ItemStyleOpts(color='green'),
        # yaxis=opts.AxisOpts(min_=1300000),
        markline_opts=opts.MarkLineOpts(
            data=[
                opts.MarkLineItem(type_="average", name="Average"),
                opts.MarkLineItem(symbol="none", x="90%", y="max"),
                opts.MarkLineItem(symbol="circle", type_="max", name="Max"),
            ]
        ),
    )
    .set_global_opts(
        title_opts=opts.TitleOpts(title="Donation amount of the top-10 occupations, 2020-07-25"),
        tooltip_opts=opts.TooltipOpts(trigger="axis"),
        toolbox_opts=opts.ToolboxOpts(is_show=True),
        xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=40)),
    )
)
# Line charts of the per-person share
l1 = (
    Line(init_opts=opts.InitOpts(width="1900px", height="400px", theme=ThemeType.DARK))
    .add_xaxis(x11)
    .add_yaxis(
        'data_personsum',
        y_axis=y11_bi,
        itemstyle_opts=opts.ItemStyleOpts(color='blue'),
        markline_opts=opts.MarkLineOpts(
            data=[
                opts.MarkLineItem(type_="average", name="Average"),
                opts.MarkLineItem(symbol="none", x="90%", y="max"),
                opts.MarkLineItem(symbol="circle", type_="max", name="Max"),
            ]
        ),
    )
    .set_global_opts(
        title_opts=opts.TitleOpts(title="Per-person share of the top-10 occupations by donation amount, 2020-08-03"),
        tooltip_opts=opts.TooltipOpts(trigger="axis"),
        toolbox_opts=opts.ToolboxOpts(is_show=True),
        xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=40)),
    )
)
l2 = (
    Line(init_opts=opts.InitOpts(width="1900px", height="400px", theme=ThemeType.DARK))
    .add_xaxis(x12)
    .add_yaxis(
        'data_personsum',
        y_axis=y12_bi,
        itemstyle_opts=opts.ItemStyleOpts(color='red'),
        markline_opts=opts.MarkLineOpts(
            data=[
                opts.MarkLineItem(type_="average", name="Average"),
                opts.MarkLineItem(symbol="none", x="90%", y="max"),
                opts.MarkLineItem(symbol="circle", type_="max", name="Max"),
            ]
        ),
    )
    .set_global_opts(
        # yaxis_opts=opts.AxisOpts(max_=1.5),
        title_opts=opts.TitleOpts(title="Per-person share of the top-10 occupations by donation amount, 2020-08-08"),
        tooltip_opts=opts.TooltipOpts(trigger="axis"),
        toolbox_opts=opts.ToolboxOpts(is_show=True),
        xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=40)),
    )
)
l3 = (
    Line(init_opts=opts.InitOpts(width="1900px", height="400px", theme=ThemeType.DARK))
    .add_xaxis(x13)
    .add_yaxis(
        'data_personsum',
        y_axis=y13_bi,
        itemstyle_opts=opts.ItemStyleOpts(color='green'),
        # yaxis=opts.AxisOpts(min_=1300000),
        markline_opts=opts.MarkLineOpts(
            data=[
                opts.MarkLineItem(type_="average", name="Average"),
                opts.MarkLineItem(symbol="none", x="90%", y="max"),
                opts.MarkLineItem(symbol="circle", type_="max", name="Max"),
            ]
        ),
    )
    .set_global_opts(
        title_opts=opts.TitleOpts(title="Per-person share of the top-10 occupations by donation amount, 2020-07-25"),
        tooltip_opts=opts.TooltipOpts(trigger="axis"),
        toolbox_opts=opts.ToolboxOpts(is_show=True),
        xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=40)),
    )
)
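# Figure 1: donation amount of the top-10 occupations on each of the three days
page1 = Page().add(l11, l21, l31)
# Figure 2: per-person share of the same occupations
page2 = Page().add(l1, l2, l3)
# Write each page to an HTML file (in Jupyter, page1.render_notebook() displays it inline)
page1.render("page1.html")
page2.render("page2.html")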
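For each day, the code above runs essentially the same filter, groupby and sort chain more than once (for example, once for `y13` and again for `x13`). A minimal sketch of a helper that produces the occupation names, amounts and per-person shares from a single groupby might look like this. The function name `top_occupation_shares` is hypothetical, and it assumes the frame has the `TRANSACTION_DT`, `OCCUPATION` and `TRANSACTION_AMT` columns used above, with dates stored as strings such as '2020725'. The sample rows are made up for illustration and are not FEC data.

```python
import pandas as pd


def top_occupation_shares(df: pd.DataFrame, day: str, n: int = 10):
    """Hypothetical helper: top-n occupations by donation amount on one day.

    Returns (occupations, amounts, shares). Occupations are sorted
    alphabetically, like sort_values('OCCUPATION') above; each share is the
    occupation's percentage of the top-n total divided by its record count.
    """
    day_df = df[df["TRANSACTION_DT"] == day]
    # One groupby yields both the summed amount and the number of records
    stats = day_df.groupby("OCCUPATION")["TRANSACTION_AMT"].agg(["sum", "size"])
    # Keep the n largest totals, then restore alphabetical order for the x-axis
    top = stats.nlargest(n, "sum").sort_index()
    shares = top["sum"] / top["sum"].sum() * 100 / top["size"]
    return top.index.tolist(), top["sum"].tolist(), shares.round(4).tolist()


# Usage with a few made-up rows (not real data)
sample = pd.DataFrame({
    "TRANSACTION_DT": ["2020725"] * 5 + ["2020803"],
    "OCCUPATION": ["RETIRED", "RETIRED", "NOT EMPLOYED", "PRIVATE EQUITY", "RETIRED", "RETIRED"],
    "TRANSACTION_AMT": [100, 50, 30, 500, 20, 999],
})
x, y, shares = top_occupation_shares(sample, "2020725")
print(x, y, shares)
```

The three returned lists could be passed to `add_xaxis` and `add_yaxis` in place of `x13`, `y13` and `y13_bi`. One design caveat: `nlargest` may break ties between equal totals differently from `sort_values(...).head(10)`.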
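Results: Figure 1 shows that on August 3 (8-3), inflow from the main donor occupations (RETIRED, NOT EMPLOYED) increased, and Figure 2 shows that high-income occupations also contributed that day; for example, the per-person share of PRIVATE EQUITY alone reaches 1.48% of that day's top-10 total. We therefore suspect that the inflow of high-income donors explains why, on that day, the number of donors fell while the donation amount rose. Conversely, on August 8 (8-8) the main occupations contributed less, and compared with the relatively normal July 25 (7-25), no occupation's per-person share rose noticeably. As a result, although the number of donors increased that day, the total donation amount decreased.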
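For reference, the per-person share plotted in Figure 2 is what `y13_bi` computes. For one day, let $A_k$ be the summed `TRANSACTION_AMT` of occupation $k$ and $n_k$ its number of donation records (`y13_person`); the sum in the denominator runs over the top-10 occupations only:

$$
s_k = \frac{A_k}{\sum_{j \in \mathrm{Top10}} A_j} \times 100 \times \frac{1}{n_k}
$$

As a made-up illustration: an occupation contributing 1000 of a 14000 top-10 total over only 2 records gets $s_k = 1000/14000 \times 100 / 2 \approx 3.57$, while one contributing 9000 over 300 records gets $s_k \approx 0.21$ despite its much larger total. A few large donations therefore push $s_k$ up, which is the pattern read from the PRIVATE EQUITY value above. Because $n_k$ counts records rather than unique donors, $s_k$ is strictly an average per donation record.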
Summary
The above is the complete content of "Data Analysis: Using pandas to Analyze US Voters' Presidential Preferences (Python Implementation)", collected and compiled by 生活随笔.