20180402-B · US Tuition Costs · ggplot2, 条线图 柱状图 · R 语言数据可视化 案例 源码
生活随笔
收集整理的這篇文章主要介紹了
20180402-B · US Tuition Costs · ggplot2, 条线图 柱状图 · R 语言数据可视化 案例 源码
小編覺得挺不錯的,現在分享給大家,幫大家做個參考.
所有作品合集傳送門: Tidy Tuesday
2018 年合集傳送門: 2018
US Tuition Costs
Average Tuition and Educational Attainment in the United States。
Tidy Tuesday 在 GitHub 上的傳送地址:
Thomas Mock (2022). Tidy Tuesday: A weekly data project aimed at the R ecosystem. https://github.com/rfordatascience/tidytuesday
1. 一些環(huán)境設置
# 設置為國內鏡像, 方便快速安裝模塊 options("repos" = c(CRAN = "https://mirrors.tuna.tsinghua.edu.cn/CRAN/"))2. 設置工作路徑
wkdir <- '/home/user/R_workdir/TidyTuesday/2018/2018-04-02_US_Tuition_Costs/src-b' setwd(wkdir)3. 加載 R 包
library(tidyverse) library(lubridate) library(showtext) # 在 Ubuntu 系統上測試的, 不加這個我畫出來的漢字會亂碼 ~ showtext_auto()4. 加載數據
df_input <- readxl::read_excel("../data/us_avg_tuition.xlsx")# 簡要查看數據內容 glimpse(df_input) ## Rows: 50 ## Columns: 13 ## $ State <chr> "Alabama", "Alaska", "Arizona", "Arkansas", "California", "C…albert ## $ `2004-05` <dbl> 5682.838, 4328.281, 5138.495, 5772.302, 5285.921, 4703.777, … ## $ `2005-06` <dbl> 5840.550, 4632.623, 5415.516, 6082.379, 5527.881, 5406.967, … ## $ `2006-07` <dbl> 5753.496, 4918.501, 5481.419, 6231.977, 5334.826, 5596.348, … ## $ `2007-08` <dbl> 6008.169, 5069.822, 5681.638, 6414.900, 5672.472, 6227.002, … ## $ `2008-09` <dbl> 6475.092, 5075.482, 6058.464, 6416.503, 5897.888, 6284.137, … ## $ `2009-10` <dbl> 7188.954, 5454.607, 7263.204, 6627.092, 7258.771, 6948.473, … ## $ `2010-11` <dbl> 8071.134, 5759.153, 8839.605, 6900.912, 8193.739, 7748.201, … ## $ `2011-12` <dbl> 8451.902, 5762.421, 9966.716, 7028.991, 9436.426, 8315.632, … ## $ `2012-13` <dbl> 9098.069, 6026.143, 10133.503, 7286.580, 9360.574, 8792.856,… ## $ `2013-14` <dbl> 9358.929, 6012.445, 10296.200, 7408.495, 9274.193, 9292.954,… ## $ `2014-15` <dbl> 9496.084, 6148.808, 10413.844, 7606.410, 9186.824, 9298.599,… ## $ `2015-16` <dbl> 9751.101, 6571.340, 10646.278, 7867.297, 9269.844, 9748.188,… # 檢查數據的列名 colnames(df_input) ## [1] "State" "2004-05" "2005-06" "2006-07" "2007-08" "2008-09" "2009-10" ## [8] "2010-11" "2011-12" "2012-13" "2013-14" "2014-15" "2015-16"5. 數據預處理
# 將列名簡化, 方便繪圖展示 df_tidy <- gather(df_input, key = Year, value = Tuition, -State)# 調整年份為 YYYY 格式 df_tidy <- df_tidy %>% mutate(Year = str_sub(Year, start = 1, end = 4))5. 條形圖 (一)
5.1 以 state 為分組取平均值
df_plot_1 <-df_tidy %>% select(State, Tuition) %>% group_by(State) %>% # 最好使用 dplyr::summarise() 形式調用函數, 不然容易和內置函數沖突dplyr::summarise(Mean_State = mean(Tuition)) %>% arrange(desc(Mean_State))5.2. 用 ggplot2 開始繪圖
# PS: 方便講解, 我這里進行了拆解, 具體使用時可以組合在一起 gg <- ggplot(df_plot_1) # geom_bar() 繪制條形圖 gg <- gg + geom_bar(aes(x = reorder(State, Mean_State), y = Mean_State), stat = "identity", position = "dodge", fill = "cadetblue3") # coord_flip() 倒置坐標系 gg <- gg + coord_flip() # theme_minimal() 去坐標軸邊框的最小化主題 gg <- gg + theme_minimal() gg <- gg + labs(title = "美國各個州的大學學費變化",x = NULL,y = NULL) # theme() 實現對非數據元素的調整, 對結果進行進一步渲染, 使之更加美觀 gg <- gg + theme(# panel.grid.major 主網格線, 這一步表示刪除主要網格線panel.grid.major = element_blank(),# panel.grid.minor 次網格線, 這一步表示刪除次要網格線panel.grid.minor = element_blank(),# axis.text 坐標軸刻度文本axis.text = element_text(color = "black", size = 12),# axis.title 坐標軸標題axis.title = element_text(color = "black", size = 10),# plot.title 主標題plot.title = element_text(color = "black", size = 20, face = "bold"),# plot.subtitle 次要標題plot.subtitle = element_text(color = "red", size = 12),# plot.background 圖片背景plot.background = element_rect(fill = "white"))5.3 保存圖片到 PDF 和 PNG
gg filename = '20180402-B-01' ggsave(filename = paste0(filename, ".pdf"), width = 8.6, height = 12, device = cairo_pdf) ggsave(filename = paste0(filename, ".png"), width = 8.6, height = 12, dpi = 100, device = "png")6. 條形圖 (二)
6.1 以 state 為分組取平均值, 并分類
# 以 state 為分組取平均值, 并以總均值劃分為兩類 df_plot_2 <-df_tidy %>% group_by(State) %>% # 建議使用 dplyr::mutate 形式調用函數, 不然容易與 plyr 中的函數沖突 (因為我自己就報錯了...)dplyr::mutate(Mean_State = mean(Tuition)) %>% ungroup() %>% dplyr::mutate(Classify = Mean_State - mean(Mean_State)) %>% dplyr::mutate(Category = ifelse(Classify < 0, "低于平均值", "高于平均值")) %>% arrange(desc(Classify)) %>% select(State, Classify, Category) %>% distinct()6.2. 用 ggplot2 開始繪圖
# PS: 方便講解, 我這里進行了拆解, 具體使用時可以組合在一起 gg <- ggplot(df_plot_2) # geom_bar() 繪制條形圖 gg <- gg + geom_bar(aes(x = reorder(State, -Classify), y = Classify, fill = Category),stat = "identity",position = "dodge") gg <- gg + geom_hline(yintercept = 0, linetype = 2) # coord_flip() 倒置坐標系 gg <- gg + coord_flip() gg <- gg + scale_fill_brewer() # theme_minimal() 去坐標軸邊框的最小化主題 gg <- gg + theme_minimal() gg <- gg + labs(title = "美國各個州的大學學費變化",x = NULL,y = NULL) # theme() 實現對非數據元素的調整, 對結果進行進一步渲染, 使之更加美觀 gg <- gg + theme(# panel.grid.major 主網格線, 這一步表示刪除主要網格線panel.grid.major = element_blank(),# panel.grid.minor 次網格線, 這一步表示刪除次要網格線panel.grid.minor = element_blank(),# axis.text 坐標軸刻度文本axis.text = element_text(color = "black", size = 12),# axis.title 坐標軸標題axis.title = element_text(color = "black", size = 10),# plot.title 主標題plot.title = element_text(color = "black", size = 20, face = "bold"),# plot.subtitle 次要標題plot.subtitle = element_text(color = "red", size = 12),# plot.background 圖片背景plot.background = element_rect(fill = "white"))6.3 保存圖片到 PDF 和 PNG
gg filename = '20180402-B-02' ggsave(filename = paste0(filename, ".pdf"), width = 8.6, height = 12, device = cairo_pdf) ggsave(filename = paste0(filename, ".png"), width = 8.6, height = 12, dpi = 100, device = "png")7. session-info
sessionInfo() ## R version 4.2.1 (2022-06-23) ## Platform: x86_64-pc-linux-gnu (64-bit) ## Running under: Ubuntu 20.04.4 LTS ## ## Matrix products: default ## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 ## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3 ## ## locale: ## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C ## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 ## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 ## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C ## [9] LC_ADDRESS=C LC_TELEPHONE=C ## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C ## ## attached base packages: ## [1] stats graphics grDevices utils datasets methods base ## ## other attached packages: ## [1] showtext_0.9-5 showtextdb_3.0 sysfonts_0.8.8 lubridate_1.8.0 ## [5] forcats_0.5.2 stringr_1.4.1 dplyr_1.0.10 purrr_0.3.4 ## [9] readr_2.1.2 tidyr_1.2.1 tibble_3.1.8 ggplot2_3.3.6 ## [13] tidyverse_1.3.2 ## ## loaded via a namespace (and not attached): ## [1] assertthat_0.2.1 digest_0.6.29 utf8_1.2.2 ## [4] R6_2.5.1 cellranger_1.1.0 backports_1.4.1 ## [7] reprex_2.0.2 evaluate_0.16 highr_0.9 ## [10] httr_1.4.4 pillar_1.8.1 rlang_1.0.5 ## [13] googlesheets4_1.0.1 readxl_1.4.1 rstudioapi_0.14 ## [16] jquerylib_0.1.4 rmarkdown_2.16 textshaping_0.3.6 ## [19] labeling_0.4.2 googledrive_2.0.0 munsell_0.5.0 ## [22] broom_1.0.1 compiler_4.2.1 modelr_0.1.9 ## [25] xfun_0.32 systemfonts_1.0.4 pkgconfig_2.0.3 ## [28] htmltools_0.5.3 tidyselect_1.1.2 fansi_1.0.3 ## [31] crayon_1.5.1 tzdb_0.3.0 dbplyr_2.2.1 ## [34] withr_2.5.0 grid_4.2.1 jsonlite_1.8.0 ## [37] gtable_0.3.1 lifecycle_1.0.1 DBI_1.1.3 ## [40] magrittr_2.0.3 scales_1.2.1 cli_3.3.0 ## [43] stringi_1.7.8 cachem_1.0.6 farver_2.1.1 ## [46] fs_1.5.2 xml2_1.3.3 bslib_0.4.0 ## [49] ragg_1.2.3 ellipsis_0.3.2 generics_0.1.3 ## [52] vctrs_0.4.1 RColorBrewer_1.1-3 tools_4.2.1 ## [55] glue_1.6.2 hms_1.1.2 fastmap_1.1.0 ## [58] yaml_2.3.5 colorspace_2.0-3 gargle_1.2.1 ## [61] rvest_1.0.3 knitr_1.40 haven_2.5.1 ## [64] sass_0.4.2測試數據
配套數據下載:us_avg_tuition.xlsx
總結
以上是生活随笔為你收集整理的20180402-B · US Tuition Costs · ggplot2, 条线图 柱状图 · R 语言数据可视化 案例 源码的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: UnityShader 图片或者颜色混合
- 下一篇: 光一个html文件能加图片吗,网站页面设