R语言“优雅地“进行医学统计分析
本文首發于公眾號:醫學和生信筆記,完美觀看體驗請至公眾號查看本文。
醫學和生信筆記,專注R語言在臨床醫學中的使用,R語言數據分析和可視化。
文章目錄
- 主要函數
- 描述性統計
- 比較均值
- 增強R中的ANOVA
- 事后檢驗(post-hoc)
- 比較比例
- 比較方差
- 計算效應量
- 相關性分析
- 計算相關性
- 重塑相關矩陣
- 相關矩陣取子集
- 可視化相關矩陣
- 添加P值和顯著性標記
- 提取統計信息
- 數據處理輔助函數
- 其他
- 安裝和加載
- 描述性統計
- t檢驗
- 單樣本t檢驗
- 配對t檢驗
- 兩樣本t檢驗
- 分組后進行比較
- 多組間的兩兩比較
- 方差分析
- 完全隨機設計方差分析
- 隨機區組設計資料的方差分析
- 拉丁方設計方差分析
- 兩階段交叉設計資料方差分析
- 析因設計資料的方差分析
- 正交設計資料的方差分析
- 重復測量數據兩因素兩水平的方差分析
- 重復測量數據兩因素多水平的分析
- 相關分析
- 兩個變量
- 一個變量和其他所有變量
- 所有變量間兩兩相關性
- 相關矩陣
rstatix提供一個簡單直觀的管道友好的框架,與整潔的設計理念一致,用于執行基本的統計檢驗,包括t檢驗,Wilcoxon檢驗,方差分析,Kruskal-Wallis和相關性分析。每個分析的輸出會自動轉換成一個整潔的數據框架,以方便可視化。
附加功能可用于重塑,重新排序,操作和可視化相關矩陣。功能還包括析因實驗的分析,包括重復測量設計、析因設計、正交設計等。
可以計算幾個效應大小指標,包括方差分析eta平方,t檢驗的Cohen’s d和分類變量之間的關聯的Cramer’s v。 該軟件包包含用于識別單變量和多變量異常值、評估正態性和方差齊性的輔助函數。
主要函數
描述性統計
- get_summary_stat():計算描述性的統計指標;
- freq_table(): 分類變量的頻率表;
- get_mode(): 眾數;
- identify_outliers(): 使用boxplot鑒別離群值;
- mahalanobis_distance(): 計算Mahalanobi距離和離群點;
- shapiro_test() and mshapiro_test(): 正態性檢驗.
比較均值
- t_test(): 單樣本、配對樣本、獨立樣本t檢驗;
- wilcox_test(): 單樣本、配對樣本、獨立樣本秩和檢驗;
- sign_test(): 符號檢驗;
- anova_test(): 基于car::Anova()改寫,可以做:獨立測量、重復測量、混合anova;
- get_anova_test_table(): 從anova_test() 提取結果,可自動執行球形檢驗.;
- welch_anova_test(): Welch one-Way ANOVA test. 基于stats::oneway.test()改寫;
- kruskal_test(): kruskal-wallis rank sum test;
- friedman_test(): Friedman rank sum test;
- get_comparisons(): 創建需要比較的組;
- get_pvalue_position: 使用ggplot2添加p值時可自動計算添加坐標
增強R中的ANOVA
- factorial_design(): 建立因子化的設計,方便使用car::Anova()進行分析,對于重復測量Anova非常有幫助;
- anova_summary(): 提取美觀的Anova檢驗的結果,包括從car:Anova()或者stats:aov()中,主要結果包含Anova結果表、一般效應量、和一些假設檢驗,比如球形檢驗。
事后檢驗(post-hoc)
- tukey_hsd(): tukey post-hoc tests;
- dunn_test(): 計算Kruskal-Wallis的成對比較;
- games_howell_test(): Games-Howell test;
- emmeans_test(): estimated marginal means
比較比例
- prop_test(), pairwise_prop_test() and row_wise_prop_test(). Performs one-sample and two-samples z-test of proportions. Wrappers around the R base function prop.test() but have the advantage of performing pairwise and row-wise z-test of two proportions, the post-hoc tests following a significant chi-square test of homogeneity for 2xc and rx2 contingency tables.
- fisher_test(), pairwise_fisher_test() and row_wise_fisher_test(): Fisher’s exact test for count data. Wrappers around the R base function fisher.test() but have the advantage of performing pairwise and row-wise fisher tests, the post-hoc tests following a significant chi-square test of homogeneity for 2xc and rx2 contingency tables.
- chisq_test(), pairwise_chisq_gof_test(), pairwise_chisq_test_against_p(): Performs chi-squared tests, including goodness-of-fit, homogeneity and independence tests.
- binom_test(), pairwise_binom_test(), pairwise_binom_test_against_p(): Performs exact binomial test and pairwise comparisons following a significant exact multinomial test. Alternative to the chi-square test of goodness-of-fit-test when the sample.
- multinom_test(): performs an exact multinomial test. Alternative to the chi-square test of goodness-of-fit-test when the sample size is small.
- mcnemar_test(): performs McNemar chi-squared test to compare paired proportions. Provides pairwise comparisons between multiple groups.
- cochran_qtest(): extension of the McNemar Chi-squared test for comparing more than two paired proportions.
- prop_trend_test(): Performs chi-squared test for trend in proportion. This test is also known as Cochran-Armitage trend test
比較方差
- levene_test(): Pipe-friendly framework to easily compute Levene’s test for homogeneity of variance across groups.
- box_m(): Box’s M-test for homogeneity of covariance matrices
計算效應量
- cohens_d(): Compute cohen’s d measure of effect size for t-tests.
- wilcox_effsize(): Compute Wilcoxon effect size ?.
- eta_squared() and partial_eta_squared(): Compute effect size for ANOVA.
- kruskal_effsize(): Compute the effect size for Kruskal-Wallis test as the eta squared based on the H-statistic.
- friedman_effsize(): Compute the effect size of Friedman test using the Kendall’s W value.
- cramer_v(): Compute Cramer’s V, which measures the strength of the association between categorical variables
相關性分析
計算相關性
- cor_test(): correlation test between two or more variables using Pearson, Spearman or Kendall methods.
- cor_mat(): compute correlation matrix with p-values. Returns a data frame containing the matrix of the correlation coefficients. The output has an attribute named “pvalue”, which contains the matrix of the correlation test p-values.
- cor_get_pval(): extract a correlation matrix p-values from an object of class cor_mat().
- cor_pmat(): compute the correlation matrix, but returns only the p-values of the correlation tests.
- as_cor_mat(): convert a cor_test object into a correlation matrix format.
重塑相關矩陣
- cor_reorder(): reorder correlation matrix, according to the coefficients, using the hierarchical clustering method.
- cor_gather(): takes a correlation matrix and collapses (or melt) it into long format data frame (paired list)
- cor_spread(): spread a long correlation data frame into wide format (correlation matrix).
相關矩陣取子集
- cor_select(): subset a correlation matrix by selecting variables of interest.
- pull_triangle(), pull_upper_triangle(), pull_lower_triangle(): pull upper and lower triangular parts of a (correlation) matrix.
- replace_triangle(), replace_upper_triangle(), replace_lower_triangle(): replace upper and lower triangular parts of a (correlation) matrix.
可視化相關矩陣
- cor_as_symbols(): replaces the correlation coefficients, in a matrix, by symbols according to the value.
- cor_plot(): visualize correlation matrix using base plot.
- cor_mark_significant(): add significance levels to a correlation matrix
添加P值和顯著性標記
- adjust_pvalue(): add an adjusted p-values column to a data frame containing statistical test p-values
- add_significance(): add a column containing the p-value significance level
- p_round(), p_format(), p_mark_significant(): rounding and formatting p-values
提取統計信息
- get_pwc_label(): Extract label from pairwise comparisons.
- get_test_label(): Extract label from statistical tests.
- create_test_label(): Create labels from user specified test results
數據處理輔助函數
- df_select(), df_arrange(), df_group_by(): wrappers arround dplyr functions for supporting standard and non standard evaluations.
- df_nest_by(): Nest a tibble data frame using grouping specification. Supports standard and non standard evaluations.
- df_split_by(): Split a data frame by groups into subsets or data panel. Very similar to the function df_nest_by(). The only difference is that, it adds labels to each data subset. Labels are the combination of the grouping variable levels.
- df_unite(): Unite multiple columns into one.
- df_unite_factors(): Unite factor columns. First, order factors levels then merge them into one column. The output column is a factor.
- df_label_both(), df_label_value(): functions to label data frames rows by by one or multiple grouping variables.
- df_get_var_names(): Returns user specified variable names. Supports standard and non standard evaluation
其他
- doo(): alternative to dplyr::do for doing anything. Technically it uses nest() + mutate() + map() to apply arbitrary computation to a grouped data frame.
- sample_n_by(): sample n rows by group from a table
- convert_as_factor(), set_ref_level(), reorder_levels(): Provides pipe-friendly functions to convert simultaneously multiple variables into a factor variable.
- make_clean_names(): Pipe-friendly function to make syntactically valid column names (for input data frame) or names (for input vector).
- counts_to_cases(): converts a contingency table or a data frame of counts into a data frame of individual observations
安裝和加載
if(!require(devtools)) install.packages("devtools") devtools::install_github("kassambara/rstatix")# 或者cran install.packages("rstatix")加載包:
library(rstatix) ## ## 載入程輯包:'rstatix' ## The following object is masked from 'package:stats': ## ## filter library(ggpubr) ## 載入需要的程輯包:ggplot2描述性統計
# 指定列 iris %>% get_summary_stats(Sepal.Length, Sepal.Width, type = "full") ## # A tibble: 2 x 13 ## variable n min max median q1 q3 iqr mad mean sd se ## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 Sepal.Leng~ 150 4.3 7.9 5.8 5.1 6.4 1.3 1.04 5.84 0.828 0.068 ## 2 Sepal.Width 150 2 4.4 3 2.8 3.3 0.5 0.445 3.06 0.436 0.036 ## # ... with 1 more variable: ci <dbl> # 所有列 iris %>% get_summary_stats(type = "common") ## # A tibble: 4 x 10 ## variable n min max median iqr mean sd se ci ## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 Petal.Length 150 1 6.9 4.35 3.5 3.76 1.76 0.144 0.285 ## 2 Petal.Width 150 0.1 2.5 1.3 1.5 1.20 0.762 0.062 0.123 ## 3 Sepal.Length 150 4.3 7.9 5.8 1.3 5.84 0.828 0.068 0.134 ## 4 Sepal.Width 150 2 4.4 3 0.5 3.06 0.436 0.036 0.07 # 分組展示 iris %>% group_by(Species) %>% get_summary_stats(Sepal.Length, type = "mean_sd") ## # A tibble: 3 x 5 ## Species variable n mean sd ## <fct> <chr> <dbl> <dbl> <dbl> ## 1 setosa Sepal.Length 50 5.01 0.352 ## 2 versicolor Sepal.Length 50 5.94 0.516 ## 3 virginica Sepal.Length 50 6.59 0.636 # 眾數 get_mode(iris$Sepal.Length) ## [1] 5t檢驗
以t檢驗為例。
單樣本t檢驗
還是以之前推文中的例3-5的數據:
df <- foreign::read.spss('E:/各科資料/醫學統計學/研究生課程/3總體均數的估計與假設檢驗18-9研/例03-05.sav',to.data.frame = T, reencode = "utf-8") ## re-encoding from utf-8attributes(df)[4] <- NULLhead(df) ## no hb ## 1 1 112 ## 2 2 137 ## 3 3 129 ## 4 4 126 ## 5 5 88 ## 6 6 90單樣本t檢驗:
df %>% t_test(hb ~ 1, mu = 140) ## # A tibble: 1 x 7 ## .y. group1 group2 n statistic df p ## * <chr> <chr> <chr> <int> <dbl> <dbl> <dbl> ## 1 hb 1 null model 36 -2.14 35 0.0397結果和t.test()一樣哦!
配對t檢驗
使用之前推文中的例3-6數據:
library(foreign) df <- foreign::read.spss('E:/各科資料/醫學統計學/研究生課程/3總體均數的估計與假設檢驗18-9研/例03-06.sav',to.data.frame = T, reencode = "utf-8") ## re-encoding from utf-8 attributes(df)[4] <- NULLhead(df) ## no x1 x2 ## 1 1 0.840 0.580 ## 2 2 0.591 0.509 ## 3 3 0.674 0.500 ## 4 4 0.632 0.316 ## 5 5 0.687 0.337 ## 6 6 0.978 0.517進行配對樣本t檢驗:
library(tidyr) st <- df %>% pivot_longer(cols = 2:3, names_to = "X", values_to = "values") %>% t_test(values ~ X, paired = T)st ## # A tibble: 1 x 8 ## .y. group1 group2 n1 n2 statistic df p ## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> ## 1 values x1 x2 10 10 7.93 9 0.0000238結果和t.test()也是一樣的哦!
可視化也可直接完成!
df %>% pivot_longer(cols = 2:3, names_to = "X", values_to = "values") %>%ggpaired(x="X",y="values",color = "X",id="no",line.size = 0.5,line.color = "grey",palette = "lancet")+stat_pvalue_manual(st,label = "配對t檢驗 p = {p}",y.position = 1.3)兩樣本t檢驗
使用之前推文中的例3-8的數據
df <- foreign::read.spss('E:/各科資料/醫學統計學/研究生課程/3總體均數的估計與假設檢驗18-9研/例03-07.sav',to.data.frame = T) df$group <- c(rep('阿卡波糖',20),rep('拜糖平',20)) attributes(df)[3] <- NULLhead(df) ## no x group ## 1 1 -0.7 阿卡波糖 ## 2 2 -5.6 阿卡波糖 ## 3 3 2.0 阿卡波糖 ## 4 4 2.8 阿卡波糖 ## 5 5 0.7 阿卡波糖 ## 6 6 3.5 阿卡波糖進行獨立樣本t檢驗:
dt <- df %>% t_test(x ~ group, paired = F) dt ## # A tibble: 1 x 8 ## .y. group1 group2 n1 n2 statistic df p ## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> ## 1 x 阿卡波糖 拜糖平 20 20 -0.642 36.1 0.525和自帶的結果是一樣的哦!
順便可視化一起做了:
ggboxplot(df, x="group", y="x",color = "group", palette = "lancet")+stat_pvalue_manual(dt, label = "t-test, p = {p}",y.position = 8)分組后進行比較
df <- ToothGrowth df$dose <- as.factor(df$dose) head(df) ## len supp dose ## 1 4.2 VC 0.5 ## 2 11.5 VC 0.5 ## 3 7.3 VC 0.5 ## 4 5.8 VC 0.5 ## 5 6.4 VC 0.5 ## 6 10.0 VC 0.5 stat.test <- df %>%group_by(dose) %>%t_test(len ~ supp) %>%adjust_pvalue() %>%add_significance("p.adj") stat.test ## # A tibble: 3 x 11 ## dose .y. group1 group2 n1 n2 statistic df p p.adj ## * <fct> <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> ## 1 0.5 len OJ VC 10 10 3.17 15.0 0.00636 0.0127 ## 2 1 len OJ VC 10 10 4.03 15.4 0.00104 0.00312 ## 3 2 len OJ VC 10 10 -0.0461 14.0 0.964 0.964 ## # ... with 1 more variable: p.adj.signif <chr> ggboxplot(df, x = "supp", y = "len",color = "supp", palette = "jco", facet.by = "dose",ylim = c(0, 40)) +stat_pvalue_manual(stat.test, label = "p.adj", y.position = 35)多組間的兩兩比較
如果有大于2個分組,會自動進行兩兩比較。
# 兩兩之間做t檢驗,自動完成 pairwise.test <- df %>% t_test(len ~ dose) pairwise.test ## # A tibble: 3 x 10 ## .y. group1 group2 n1 n2 statistic df p p.adj p.adj.signif ## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <chr> ## 1 len 0.5 1 20 20 -6.48 38.0 1.27e- 7 2.54e- 7 **** ## 2 len 0.5 2 20 20 -11.8 36.9 4.4 e-14 1.32e-13 **** ## 3 len 1 2 20 20 -4.90 37.1 1.91e- 5 1.91e- 5 ****可視化:
ggboxplot(df, x = "dose", y = "len", color = "dose", palette = "lancet")+stat_pvalue_manual(pairwise.test, label = "p.adj",y.position = c(29, 35, 39))還可以指定和某一個組進行比較:
# T-test: 都和指定的組進行比較 stat.test <- df %>% t_test(len ~ dose, ref.group = "0.5") stat.test ## # A tibble: 2 x 10 ## .y. group1 group2 n1 n2 statistic df p p.adj p.adj.signif ## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <chr> ## 1 len 0.5 1 20 20 -6.48 38.0 1.27e- 7 1.27e- 7 **** ## 2 len 0.5 2 20 20 -11.8 36.9 4.4 e-14 8.8 e-14 **** # Box plot ggboxplot(df, x = "dose", y = "len", ylim = c(0, 40)) +stat_pvalue_manual(stat.test, label = "p.adj.signif",y.position = c(29, 35))和全部數據進行比較:
stat.test <- df %>% t_test(len ~ dose, ref.group = "all") stat.test ## # A tibble: 3 x 10 ## .y. group1 group2 n1 n2 statistic df p p.adj p.adj.signif ## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <chr> ## 1 len all 0.5 60 20 5.82 56.4 0.00000029 0.00000087 **** ## 2 len all 1 60 20 -0.660 57.5 0.512 0.512 ns ## 3 len all 2 60 20 -5.61 66.5 0.000000425 0.00000087 **** ggboxplot(df, x = "dose", y = "len", fill = "dose", palette = "lancet", alpha = 0.6) +stat_pvalue_manual(stat.test, label = "p.adj.signif", y.position = 35) +geom_hline(yintercept = mean(df$len), linetype = 2)方差分析
完全隨機設計方差分析
使用課本例4-2的數據。
首先是構造數據,本次數據自己從書上摘錄。。
trt<-c(rep("group1",30),rep("group2",30),rep("group3",30),rep("group4",30))weight<-c(3.53,4.59,4.34,2.66,3.59,3.13,3.30,4.04,3.53,3.56,3.85,4.07,1.37,3.93,2.33,2.98,4.00,3.55,2.64,2.56,3.50,3.25,2.96,4.30,3.52,3.93,4.19,2.96,4.16,2.59,2.42,3.36,4.32,2.34,2.68,2.95,2.36,2.56,2.52,2.27,2.98,3.72,2.65,2.22,2.90,1.98,2.63,2.86,2.93,2.17,2.72,1.56,3.11,1.81,1.77,2.80,3.57,2.97,4.02,2.31,2.86,2.28,2.39,2.28,2.48,2.28,3.48,2.42,2.41,2.66,3.29,2.70,2.66,3.68,2.65,2.66,2.32,2.61,3.64,2.58,3.65,3.21,2.23,2.32,2.68,3.04,2.81,3.02,1.97,1.68,0.89,1.06,1.08,1.27,1.63,1.89,1.31,2.51,1.88,1.41,3.19,1.92,0.94,2.11,2.81,1.98,1.74,2.16,3.37,2.97,1.69,1.19,2.17,2.28,1.72,2.47,1.02,2.52,2.10,3.71)data1<-data.frame(trt,weight)head(data1) ## trt weight ## 1 group1 3.53 ## 2 group1 4.59 ## 3 group1 4.34 ## 4 group1 2.66 ## 5 group1 3.59 ## 6 group1 3.13進行完全隨機設計資料的方差分析:
data1 %>% anova_test(weight ~ trt) ## Coefficient covariances computed by hccm() ## ANOVA Table (type II tests) ## ## Effect DFn DFd F p p<.05 ges ## 1 trt 3 116 24.884 1.67e-12 * 0.392隨機區組設計資料的方差分析
使用例4-3的數據。
首先是構造數據,本次數據自己從書上摘錄。。
weight <- c(0.82,0.65,0.51,0.73,0.54,0.23,0.43,0.34,0.28,0.41,0.21,0.31,0.68,0.43,0.24) block <- c(rep(c("1","2","3","4","5"),each=3)) group <- c(rep(c("A","B","C"),5))data4_4 <- data.frame(weight,block,group)head(data4_4) ## weight block group ## 1 0.82 1 A ## 2 0.65 1 B ## 3 0.51 1 C ## 4 0.73 2 A ## 5 0.54 2 B ## 6 0.23 2 C數據一共3列,第一列是小白鼠肉瘤重量,第二列是區組因素(5個區組),第三列是分組(一共3組)
進行隨機區組設計資料的方差分析:
data4_4 %>% anova_test(weight ~ group + block) ## Coefficient covariances computed by hccm() ## ANOVA Table (type II tests) ## ## Effect DFn DFd F p p<.05 ges ## 1 group 2 8 11.937 0.004 * 0.749 ## 2 block 4 8 5.978 0.016 * 0.749拉丁方設計方差分析
使用課本例4-5的數據。
首先是構造數據,本次數據自己從書上摘錄。。
psize <- c(87,75,81,75,84,66,73,81,87,85,64,79,73,73,74,78,73,77,77,68,69,74,76,73,64,64,72,76,70,81,75,77,82,61,82,61) drug <- c("C","B","E","D","A","F","B","A","D","C","F","E","F","E","B","A","D","C","A","F","C","B","E","D","D","C","F","E","B","A","E","D","A","F","C","B") col_block <- c(rep(1:6,6)) row_block <- c(rep(1:6,each=6)) mydata <- data.frame(psize,drug,col_block,row_block) mydata$col_block <- factor(mydata$col_block) mydata$row_block <- factor(mydata$row_block) str(mydata) ## 'data.frame': 36 obs. of 4 variables: ## $ psize : num 87 75 81 75 84 66 73 81 87 85 ... ## $ drug : chr "C" "B" "E" "D" ... ## $ col_block: Factor w/ 6 levels "1","2","3","4",..: 1 2 3 4 5 6 1 2 3 4 ... ## $ row_block: Factor w/ 6 levels "1","2","3","4",..: 1 1 1 1 1 1 2 2 2 2 ...數據一共4列,第一列是皮膚皰疹大小,第二列是不同藥物(處理因素,共5種),第三列是列區組因素,第四列是行區組因素。
進行拉丁方設計的方差分析:
mydata %>% anova_test(psize ~ drug + col_block + row_block) ## Coefficient covariances computed by hccm() ## ANOVA Table (type II tests) ## ## Effect DFn DFd F p p<.05 ges ## 1 drug 5 20 3.906 0.012 * 0.494 ## 2 col_block 5 20 0.500 0.772 0.111 ## 3 row_block 5 20 1.466 0.245 0.268兩階段交叉設計資料方差分析
使用課本例4-6的數據。
首先是構造數據,本次數據自己從書上摘錄。。
contain <- c(760,770,860,855,568,602,780,800,960,958,940,952,635,650,440,450,528,530,800,803) phase <- rep(c("phase_1","phase_2"),10) type <- c("A","B","B","A","A","B","A","B","B","A","B","A","A","B","B","A","A","B","B","A") testid <- rep(1:10,each=2) mydata <- data.frame(testid,phase,type,contain)str(mydata) ## 'data.frame': 20 obs. of 4 variables: ## $ testid : int 1 1 2 2 3 3 4 4 5 5 ... ## $ phase : chr "phase_1" "phase_2" "phase_1" "phase_2" ... ## $ type : chr "A" "B" "B" "A" ... ## $ contain: num 760 770 860 855 568 602 780 800 960 958 ...mydata$testid <- factor(mydata$testid)數據一共4列,第一列是受試者id,第二列是不同階段,第三列是測定方法,第四列是測量值。
進行兩階段交叉設計資料方差分析:
mydata %>% anova_test(contain ~ phase + type + testid) ## Coefficient covariances computed by hccm() ## ANOVA Table (type II tests) ## ## Effect DFn DFd F p p<.05 ges ## 1 phase 1 8 9.925 1.40e-02 * 0.554 ## 2 type 1 8 4.019 8.00e-02 0.334 ## 3 testid 9 8 1240.195 1.32e-11 * 0.999結果和課本一致!
析因設計資料的方差分析
使用課本例11-1的數據,自己手動摘錄:
df11_1 <- data.frame(x1 = rep(c("外膜縫合","束膜縫合"), each = 10),x2 = rep(c("縫合1個月","縫合2個月"), each = 5),y = c(10,10,40,50,10,30,30,70,60,30,10,20,30,50,30,50,50,70,60,30) )str(df11_1) ## 'data.frame': 20 obs. of 3 variables: ## $ x1: chr "外膜縫合" "外膜縫合" "外膜縫合" "外膜縫合" ... ## $ x2: chr "縫合1個月" "縫合1個月" "縫合1個月" "縫合1個月" ... ## $ y : num 10 10 40 50 10 30 30 70 60 30 ...數據一共3列,第1列是縫合方法,第2列是時間,第3列是軸突通過率。
進行析因設計資料的方差分析:
df11_1 %>% anova_test(y ~ x1 * x2) ## Coefficient covariances computed by hccm() ## ANOVA Table (type II tests) ## ## Effect DFn DFd F p p<.05 ges ## 1 x1 1 16 0.600 0.450 0.036 ## 2 x2 1 16 8.067 0.012 * 0.335 ## 3 x1:x2 1 16 0.067 0.800 0.004正交設計資料的方差分析
使用課本例11-4的數據
df11_4 <- data.frame(a = rep(c("5度","25度"),each = 4),b = rep(c(0.5, 5.0), each = 2),c = c(10, 30),d = c(6.0, 8.0,8.0,6.0,8.0,6.0,6.0,8.0),x = c(86,95,91,94,91,96,83,88) )df11_4$a <- factor(df11_4$a) df11_4$b <- factor(df11_4$b) df11_4$c <- factor(df11_4$c) df11_4$d <- factor(df11_4$d)str(df11_4) ## 'data.frame': 8 obs. of 5 variables: ## $ a: Factor w/ 2 levels "25度","5度": 2 2 2 2 1 1 1 1 ## $ b: Factor w/ 2 levels "0.5","5": 1 1 2 2 1 1 2 2 ## $ c: Factor w/ 2 levels "10","30": 1 2 1 2 1 2 1 2 ## $ d: Factor w/ 2 levels "6","8": 1 2 2 1 2 1 1 2 ## $ x: num 86 95 91 94 91 96 83 88進行正交設計資料的方差分析:
df11_4 %>% anova_test(x ~ a + b + c + d + a * b) ## Coefficient covariances computed by hccm() ## ANOVA Table (type II tests) ## ## Effect DFn DFd F p p<.05 ges ## 1 a 1 2 3.2 0.216 0.615 ## 2 b 1 2 7.2 0.115 0.783 ## 3 c 1 2 24.2 0.039 * 0.924 ## 4 d 1 2 1.8 0.312 0.474 ## 5 a:b 1 2 20.0 0.047 * 0.909重復測量數據兩因素兩水平的方差分析
使用課本例12-1的數據,直接讀取:
df12_1 <- foreign::read.spss("E:/各科資料/醫學統計學/研究生課程/析因設計重復測量/9重復測量18-9研/12-1.sav", to.data.frame = T)str(df12_1) ## 'data.frame': 20 obs. of 5 variables: ## $ n : num 1 2 3 4 5 6 7 8 9 10 ... ## $ x1 : num 130 124 136 128 122 118 116 138 126 124 ... ## $ x2 : num 114 110 126 116 102 100 98 122 108 106 ... ## $ group: Factor w/ 2 levels "處理組","對照組": 1 1 1 1 1 1 1 1 1 1 ... ## $ d : num 16 14 10 12 20 18 18 16 18 18 ... ## - attr(*, "variable.labels")= Named chr [1:5] "編號" "治療前血壓" "治療后血壓" "組別" ... ## ..- attr(*, "names")= chr [1:5] "n" "x1" "x2" "group" ... ## - attr(*, "codepage")= int 936數據一共5列(第5列是自己算出來的,其實原始數據只有4列),第1 列是編號,第2列是治療前血壓,第3例是治療后血壓,第4列是分組,第5列是血壓前后差值。
進行重復測量數據兩因素兩水平的方差分析前,先把數據轉換一下格式:
suppressMessages(library(tidyverse)) df12_11 <- df12_1[,1:4] %>% pivot_longer(cols = 2:3,names_to = "time",values_to = "hp") %>% mutate_if(is.character, as.factor)df12_11$n <- factor(df12_11$n)str(df12_11) ## tibble [40 x 4] (S3: tbl_df/tbl/data.frame) ## $ n : Factor w/ 20 levels "1","2","3","4",..: 1 1 2 2 3 3 4 4 5 5 ... ## $ group: Factor w/ 2 levels "處理組","對照組": 1 1 1 1 1 1 1 1 1 1 ... ## $ time : Factor w/ 2 levels "x1","x2": 1 2 1 2 1 2 1 2 1 2 ... ## $ hp : num [1:40] 130 114 124 110 136 126 128 116 122 102 ...進行重復測量數據兩因素兩水平的方差分析:
hp是因變量,time是測量時間(治療前和治療后各測量一次),group是分組因素(兩種治療方法),n是受試者編號
df12_11 %>% anova_test(hp ~ time * group + Error(n/time)) ## ANOVA Table (type II tests) ## ## Effect DFn DFd F p p<.05 ges ## 1 group 1 18 1.574 2.26e-01 0.071 ## 2 time 1 18 55.008 7.08e-07 * 0.278 ## 3 group:time 1 18 18.771 4.01e-04 * 0.116這個結果也是和課本一樣的,只不過沒有顯示組內誤差的情況!
重復測量數據兩因素多水平的分析
使用課本例12-3的數據,直接讀取:
df12_3 <- foreign::read.spss("E:/各科資料/醫學統計學/研究生課程/析因設計重復測量/9重復測量18-9研/例12-03.sav",to.data.frame = T)str(df12_3) ## 'data.frame': 15 obs. of 7 variables: ## $ No : num 1 2 3 4 5 6 7 8 9 10 ... ## $ group: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 2 2 2 2 2 ... ## $ t0 : num 120 118 119 121 127 121 122 128 117 118 ... ## $ t1 : num 108 109 112 112 121 120 121 129 115 114 ... ## $ t2 : num 112 115 119 119 127 118 119 126 111 116 ... ## $ t3 : num 120 126 124 126 133 131 129 135 123 123 ... ## $ t4 : num 117 123 118 120 126 137 133 142 131 133 ... ## - attr(*, "variable.labels")= Named chr [1:7] "序號" "組別" "" "" ... ## ..- attr(*, "names")= chr [1:7] "No" "group" "t0" "t1" ...數據一共7列,第1列是患者編號,第2列是誘導方法(3種),第3-7列是5個時間點的血壓。
首先轉換數據格式:
library(tidyverse)df12_31 <- df12_3 %>% pivot_longer(cols = 3:7, names_to = "times", values_to = "hp")df12_31$No <- factor(df12_31$No) df12_31$times <- factor(df12_31$times)str(df12_31) ## tibble [75 x 4] (S3: tbl_df/tbl/data.frame) ## $ No : Factor w/ 15 levels "1","2","3","4",..: 1 1 1 1 1 2 2 2 2 2 ... ## $ group: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 1 1 1 1 ... ## $ times: Factor w/ 5 levels "t0","t1","t2",..: 1 2 3 4 5 1 2 3 4 5 ... ## $ hp : num [1:75] 120 108 112 120 117 118 109 115 126 123 ...進行重復測量兩因素多水平的方差分析:
df12_31 %>% anova_test(hp ~ times * group + Error(No/(times))) ## ANOVA Table (type II tests) ## ## $ANOVA ## Effect DFn DFd F p p<.05 ges ## 1 group 2 12 5.783 1.70e-02 * 0.430 ## 2 times 4 48 106.558 3.02e-23 * 0.659 ## 3 group:times 8 48 19.101 1.62e-12 * 0.409 ## ## $`Mauchly's Test for Sphericity` ## Effect W p p<.05 ## 1 times 0.293 0.178 ## 2 group:times 0.293 0.178 ## ## $`Sphericity Corrections` ## Effect GGe DF[GG] p[GG] p[GG]<.05 HFe DF[HF] p[HF] ## 1 times 0.679 2.71, 32.58 1.87e-16 * 0.896 3.59, 43.03 4.65e-21 ## 2 group:times 0.679 5.43, 32.58 4.26e-09 * 0.896 7.17, 43.03 2.04e-11 ## p[HF]<.05 ## 1 * ## 2 *這個結果也是和課本一樣的,而且直接給出了球形檢驗的結果!非常方便。
相關分析
以R語言自帶的mtcars數據集為例。
mydata <- mtcars %>% select(mpg, disp, hp, drat, wt, qsec)head(mydata) ## mpg disp hp drat wt qsec ## Mazda RX4 21.0 160 110 3.90 2.620 16.46 ## Mazda RX4 Wag 21.0 160 110 3.90 2.875 17.02 ## Datsun 710 22.8 108 93 3.85 2.320 18.61 ## Hornet 4 Drive 21.4 258 110 3.08 3.215 19.44 ## Hornet Sportabout 18.7 360 175 3.15 3.440 17.02 ## Valiant 18.1 225 105 2.76 3.460 20.22兩個變量
mydata %>% cor_test(wt, mpg, method = "pearson") ## # A tibble: 1 x 8 ## var1 var2 cor statistic p conf.low conf.high method ## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> ## 1 wt mpg -0.87 -9.56 1.29e-10 -0.934 -0.744 Pearson一個變量和其他所有變量
mydata %>% cor_test(mpg, method = "pearson") ## # A tibble: 5 x 8 ## var1 var2 cor statistic p conf.low conf.high method ## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> ## 1 mpg disp -0.85 -8.75 9.38e-10 -0.923 -0.708 Pearson ## 2 mpg hp -0.78 -6.74 1.79e- 7 -0.885 -0.586 Pearson ## 3 mpg drat 0.68 5.10 1.78e- 5 0.436 0.832 Pearson ## 4 mpg wt -0.87 -9.56 1.29e-10 -0.934 -0.744 Pearson ## 5 mpg qsec 0.42 2.53 1.71e- 2 0.0820 0.670 Pearson所有變量間兩兩相關性
mydata %>% cor_test(method = "pearson") ## # A tibble: 36 x 8 ## var1 var2 cor statistic p conf.low conf.high method ## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> ## 1 mpg mpg 1 Inf 0 1 1 Pearson ## 2 mpg disp -0.85 -8.75 9.38e-10 -0.923 -0.708 Pearson ## 3 mpg hp -0.78 -6.74 1.79e- 7 -0.885 -0.586 Pearson ## 4 mpg drat 0.68 5.10 1.78e- 5 0.436 0.832 Pearson ## 5 mpg wt -0.87 -9.56 1.29e-10 -0.934 -0.744 Pearson ## 6 mpg qsec 0.42 2.53 1.71e- 2 0.0820 0.670 Pearson ## 7 disp mpg -0.85 -8.75 9.38e-10 -0.923 -0.708 Pearson ## 8 disp disp 1 Inf 0 1 1 Pearson ## 9 disp hp 0.79 7.08 7.14e- 8 0.611 0.893 Pearson ## 10 disp drat -0.71 -5.53 5.28e- 6 -0.849 -0.481 Pearson ## # ... with 26 more rows可以看到所有的輸出都是整潔格式的,而且都給出了詳細的統計值。
相關矩陣
當然也提供了相關矩陣的方法。
cor.mat <- mydata %>% cor_mat()cor.mat ## # A tibble: 6 x 7 ## rowname mpg disp hp drat wt qsec ## * <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 mpg 1 -0.85 -0.78 0.68 -0.87 0.42 ## 2 disp -0.85 1 0.79 -0.71 0.89 -0.43 ## 3 hp -0.78 0.79 1 -0.45 0.66 -0.71 ## 4 drat 0.68 -0.71 -0.45 1 -0.71 0.091 ## 5 wt -0.87 0.89 0.66 -0.71 1 -0.17 ## 6 qsec 0.42 -0.43 -0.71 0.091 -0.17 1展示P值矩陣:
cor.mat %>% cor_get_pval() ## # A tibble: 6 x 7 ## rowname mpg disp hp drat wt qsec ## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 mpg 0 9.38e-10 0.000000179 0.0000178 1.29e- 10 0.0171 ## 2 disp 9.38e-10 0 0.0000000714 0.00000528 1.22e- 11 0.0131 ## 3 hp 1.79e- 7 7.14e- 8 0 0.00999 4.15e- 5 0.00000577 ## 4 drat 1.78e- 5 5.28e- 6 0.00999 0 4.78e- 6 0.62 ## 5 wt 1.29e-10 1.22e-11 0.0000415 0.00000478 2.27e-236 0.339 ## 6 qsec 1.71e- 2 1.31e- 2 0.00000577 0.62 3.39e- 1 0使用*代替p值:
cor.mat %>% cor_as_symbols() %>% pull_lower_triangle() ## rowname mpg disp hp drat wt qsec ## 1 mpg ## 2 disp * ## 3 hp * * ## 4 drat + + . ## 5 wt * * + + ## 6 qsec . . +星號和相關性一起顯示:
cor.mat %>% cor_mark_significant() ## rowname mpg disp hp drat wt qsec ## 1 mpg ## 2 disp -0.85**** ## 3 hp -0.78**** 0.79**** ## 4 drat 0.68**** -0.71**** -0.45** ## 5 wt -0.87**** 0.89**** 0.66**** -0.71**** ## 6 qsec 0.42* -0.43* -0.71**** 0.091 -0.17簡單的可視化:
cor.mat %>% cor_reorder() %>% cor_plot()這個圖明顯是使用corrplot包畫出來的,可以直接使用此包畫出更好看的圖,后期將會專門介紹corrplot包!
本文首發于公眾號:醫學和生信筆記,完美觀看體驗請至公眾號查看本文。
醫學和生信筆記,專注R語言在臨床醫學中的使用,R語言數據分析和可視化。
總結
以上是生活随笔為你收集整理的R语言“优雅地“进行医学统计分析的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 直播视频app源码的靓号可以怎样实现?
- 下一篇: 基于SSM社区养老院服务系统