當前位置：首頁 > 编程资源 > 编程问答 >内容正文

编程问答

R语言“优雅地“进行医学统计分析

發布時間：2024/1/1 编程问答 37 豆豆

生活随笔收集整理的這篇文章主要介紹了 R语言“优雅地“进行医学统计分析小編覺得挺不錯的,現在分享給大家,幫大家做個參考.

本文首發于公眾號：醫學和生信筆記，完美觀看體驗請至公眾號查看本文。

醫學和生信筆記，專注R語言在臨床醫學中的使用，R語言數據分析和可視化。

文章目錄

- 主要函數
- - 描述性統計
  - 比較均值
  - 增強R中的ANOVA
  - 事后檢驗（post-hoc）
  - 比較比例
  - 比較方差
  - 計算效應量
  - 相關性分析
  - - 計算相關性
    - 重塑相關矩陣
    - 相關矩陣取子集
    - 可視化相關矩陣
  - 添加P值和顯著性標記
  - 提取統計信息
  - 數據處理輔助函數
  - 其他
- 安裝和加載
- 描述性統計
- t檢驗
- - 單樣本t檢驗
  - 配對t檢驗
  - 兩樣本t檢驗
  - 分組后進行比較
  - 多組間的兩兩比較
- 方差分析
- - 完全隨機設計方差分析
  - 隨機區組設計資料的方差分析
  - 拉丁方設計方差分析
  - 兩階段交叉設計資料方差分析
  - 析因設計資料的方差分析
  - 正交設計資料的方差分析
  - 重復測量數據兩因素兩水平的方差分析
  - 重復測量數據兩因素多水平的分析
- 相關分析
- - 兩個變量
  - 一個變量和其他所有變量
  - 所有變量間兩兩相關性
  - 相關矩陣

rstatix提供一個簡單直觀的管道友好的框架，與整潔的設計理念一致，用于執行基本的統計檢驗，包括t檢驗，Wilcoxon檢驗，方差分析，Kruskal-Wallis和相關性分析。每個分析的輸出會自動轉換成一個整潔的數據框架，以方便可視化。

附加功能可用于重塑，重新排序，操作和可視化相關矩陣。功能還包括析因實驗的分析，包括重復測量設計、析因設計、正交設計等。

可以計算幾個效應大小指標，包括方差分析eta平方，t檢驗的Cohen’s d和分類變量之間的關聯的Cramer’s v。該軟件包包含用于識別單變量和多變量異常值、評估正態性和方差齊性的輔助函數。

主要函數

描述性統計

get_summary_stat():計算描述性的統計指標；
freq_table(): 分類變量的頻率表；
get_mode(): 眾數；
identify_outliers(): 使用boxplot鑒別離群值；
mahalanobis_distance(): 計算Mahalanobi距離和離群點；
shapiro_test() and mshapiro_test(): 正態性檢驗.

比較均值

t_test(): 單樣本、配對樣本、獨立樣本t檢驗；
wilcox_test(): 單樣本、配對樣本、獨立樣本秩和檢驗；
sign_test(): 符號檢驗；
anova_test(): 基于car::Anova()改寫，可以做：獨立測量、重復測量、混合anova；
get_anova_test_table(): 從anova_test() 提取結果，可自動執行球形檢驗.；
welch_anova_test(): Welch one-Way ANOVA test. 基于stats::oneway.test()改寫；
kruskal_test(): kruskal-wallis rank sum test；
friedman_test(): Friedman rank sum test；
get_comparisons(): 創建需要比較的組；
get_pvalue_position: 使用ggplot2添加p值時可自動計算添加坐標

增強R中的ANOVA

factorial_design(): 建立因子化的設計，方便使用car::Anova()進行分析，對于重復測量Anova非常有幫助；
anova_summary(): 提取美觀的Anova檢驗的結果，包括從car:Anova()或者stats:aov()中，主要結果包含Anova結果表、一般效應量、和一些假設檢驗，比如球形檢驗。

事后檢驗（post-hoc）

tukey_hsd(): tukey post-hoc tests;
dunn_test(): 計算Kruskal-Wallis的成對比較；
games_howell_test(): Games-Howell test；
emmeans_test(): estimated marginal means

比較比例

prop_test(), pairwise_prop_test() and row_wise_prop_test(). Performs one-sample and two-samples z-test of proportions. Wrappers around the R base function prop.test() but have the advantage of performing pairwise and row-wise z-test of two proportions, the post-hoc tests following a significant chi-square test of homogeneity for 2xc and rx2 contingency tables.
fisher_test(), pairwise_fisher_test() and row_wise_fisher_test(): Fisher’s exact test for count data. Wrappers around the R base function fisher.test() but have the advantage of performing pairwise and row-wise fisher tests, the post-hoc tests following a significant chi-square test of homogeneity for 2xc and rx2 contingency tables.
chisq_test(), pairwise_chisq_gof_test(), pairwise_chisq_test_against_p(): Performs chi-squared tests, including goodness-of-fit, homogeneity and independence tests.
binom_test(), pairwise_binom_test(), pairwise_binom_test_against_p(): Performs exact binomial test and pairwise comparisons following a significant exact multinomial test. Alternative to the chi-square test of goodness-of-fit-test when the sample.
multinom_test(): performs an exact multinomial test. Alternative to the chi-square test of goodness-of-fit-test when the sample size is small.
mcnemar_test(): performs McNemar chi-squared test to compare paired proportions. Provides pairwise comparisons between multiple groups.
cochran_qtest(): extension of the McNemar Chi-squared test for comparing more than two paired proportions.
prop_trend_test(): Performs chi-squared test for trend in proportion. This test is also known as Cochran-Armitage trend test

比較方差

levene_test(): Pipe-friendly framework to easily compute Levene’s test for homogeneity of variance across groups.
box_m(): Box’s M-test for homogeneity of covariance matrices

計算效應量

cohens_d(): Compute cohen’s d measure of effect size for t-tests.
wilcox_effsize(): Compute Wilcoxon effect size ?.
eta_squared() and partial_eta_squared(): Compute effect size for ANOVA.
kruskal_effsize(): Compute the effect size for Kruskal-Wallis test as the eta squared based on the H-statistic.
friedman_effsize(): Compute the effect size of Friedman test using the Kendall’s W value.
cramer_v(): Compute Cramer’s V, which measures the strength of the association between categorical variables

提取統計信息

get_pwc_label(): Extract label from pairwise comparisons.
get_test_label(): Extract label from statistical tests.
create_test_label(): Create labels from user specified test results

數據處理輔助函數

df_select(), df_arrange(), df_group_by(): wrappers arround dplyr functions for supporting standard and non standard evaluations.
df_nest_by(): Nest a tibble data frame using grouping specification. Supports standard and non standard evaluations.
df_split_by(): Split a data frame by groups into subsets or data panel. Very similar to the function df_nest_by(). The only difference is that, it adds labels to each data subset. Labels are the combination of the grouping variable levels.
df_unite(): Unite multiple columns into one.
df_unite_factors(): Unite factor columns. First, order factors levels then merge them into one column. The output column is a factor.
df_label_both(), df_label_value(): functions to label data frames rows by by one or multiple grouping variables.
df_get_var_names(): Returns user specified variable names. Supports standard and non standard evaluation

其他

doo(): alternative to dplyr::do for doing anything. Technically it uses nest() + mutate() + map() to apply arbitrary computation to a grouped data frame.
sample_n_by(): sample n rows by group from a table
convert_as_factor(), set_ref_level(), reorder_levels(): Provides pipe-friendly functions to convert simultaneously multiple variables into a factor variable.
make_clean_names(): Pipe-friendly function to make syntactically valid column names (for input data frame) or names (for input vector).
counts_to_cases(): converts a contingency table or a data frame of counts into a data frame of individual observations

安裝和加載

if(!require(devtools)) install.packages("devtools") devtools::install_github("kassambara/rstatix")# 或者cran install.packages("rstatix")

加載包：

library(rstatix) ## ## 載入程輯包：'rstatix' ## The following object is masked from 'package:stats': ## ## filter library(ggpubr) ## 載入需要的程輯包：ggplot2

描述性統計

# 指定列 iris %>% get_summary_stats(Sepal.Length, Sepal.Width, type = "full") ## # A tibble: 2 x 13 ## variable n min max median q1 q3 iqr mad mean sd se ## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 Sepal.Leng~ 150 4.3 7.9 5.8 5.1 6.4 1.3 1.04 5.84 0.828 0.068 ## 2 Sepal.Width 150 2 4.4 3 2.8 3.3 0.5 0.445 3.06 0.436 0.036 ## # ... with 1 more variable: ci <dbl> # 所有列 iris %>% get_summary_stats(type = "common") ## # A tibble: 4 x 10 ## variable n min max median iqr mean sd se ci ## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 Petal.Length 150 1 6.9 4.35 3.5 3.76 1.76 0.144 0.285 ## 2 Petal.Width 150 0.1 2.5 1.3 1.5 1.20 0.762 0.062 0.123 ## 3 Sepal.Length 150 4.3 7.9 5.8 1.3 5.84 0.828 0.068 0.134 ## 4 Sepal.Width 150 2 4.4 3 0.5 3.06 0.436 0.036 0.07 # 分組展示 iris %>% group_by(Species) %>% get_summary_stats(Sepal.Length, type = "mean_sd") ## # A tibble: 3 x 5 ## Species variable n mean sd ## <fct> <chr> <dbl> <dbl> <dbl> ## 1 setosa Sepal.Length 50 5.01 0.352 ## 2 versicolor Sepal.Length 50 5.94 0.516 ## 3 virginica Sepal.Length 50 6.59 0.636 # 眾數 get_mode(iris$Sepal.Length) ## [1] 5

t檢驗

以t檢驗為例。

單樣本t檢驗

還是以之前推文中的例3-5的數據：

df <- foreign::read.spss('E:/各科資料/醫學統計學/研究生課程/3總體均數的估計與假設檢驗18-9研/例03-05.sav',to.data.frame = T, reencode = "utf-8") ## re-encoding from utf-8attributes(df)[4] <- NULLhead(df) ## no hb ## 1 1 112 ## 2 2 137 ## 3 3 129 ## 4 4 126 ## 5 5 88 ## 6 6 90

單樣本t檢驗：

df %>% t_test(hb ~ 1, mu = 140) ## # A tibble: 1 x 7 ## .y. group1 group2 n statistic df p ## * <chr> <chr> <chr> <int> <dbl> <dbl> <dbl> ## 1 hb 1 null model 36 -2.14 35 0.0397

結果和t.test()一樣哦！

配對t檢驗

使用之前推文中的例3-6數據：

library(foreign) df <- foreign::read.spss('E:/各科資料/醫學統計學/研究生課程/3總體均數的估計與假設檢驗18-9研/例03-06.sav',to.data.frame = T, reencode = "utf-8") ## re-encoding from utf-8 attributes(df)[4] <- NULLhead(df) ## no x1 x2 ## 1 1 0.840 0.580 ## 2 2 0.591 0.509 ## 3 3 0.674 0.500 ## 4 4 0.632 0.316 ## 5 5 0.687 0.337 ## 6 6 0.978 0.517

進行配對樣本t檢驗：

library(tidyr) st <- df %>% pivot_longer(cols = 2:3, names_to = "X", values_to = "values") %>% t_test(values ~ X, paired = T)st ## # A tibble: 1 x 8 ## .y. group1 group2 n1 n2 statistic df p ## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> ## 1 values x1 x2 10 10 7.93 9 0.0000238

結果和t.test()也是一樣的哦！

可視化也可直接完成！

df %>% pivot_longer(cols = 2:3, names_to = "X", values_to = "values") %>%ggpaired(x="X",y="values",color = "X",id="no",line.size = 0.5,line.color = "grey",palette = "lancet")+stat_pvalue_manual(st,label = "配對t檢驗 p = {p}",y.position = 1.3)

兩樣本t檢驗

使用之前推文中的例3-8的數據

df <- foreign::read.spss('E:/各科資料/醫學統計學/研究生課程/3總體均數的估計與假設檢驗18-9研/例03-07.sav',to.data.frame = T) df$group <- c(rep('阿卡波糖',20),rep('拜糖平',20)) attributes(df)[3] <- NULLhead(df) ## no x group ## 1 1 -0.7 阿卡波糖 ## 2 2 -5.6 阿卡波糖 ## 3 3 2.0 阿卡波糖 ## 4 4 2.8 阿卡波糖 ## 5 5 0.7 阿卡波糖 ## 6 6 3.5 阿卡波糖

進行獨立樣本t檢驗：

dt <- df %>% t_test(x ~ group, paired = F) dt ## # A tibble: 1 x 8 ## .y. group1 group2 n1 n2 statistic df p ## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> ## 1 x 阿卡波糖拜糖平 20 20 -0.642 36.1 0.525

和自帶的結果是一樣的哦！

順便可視化一起做了：

ggboxplot(df, x="group", y="x",color = "group", palette = "lancet")+stat_pvalue_manual(dt, label = "t-test, p = {p}",y.position = 8)

分組后進行比較

df <- ToothGrowth df$dose <- as.factor(df$dose) head(df) ## len supp dose ## 1 4.2 VC 0.5 ## 2 11.5 VC 0.5 ## 3 7.3 VC 0.5 ## 4 5.8 VC 0.5 ## 5 6.4 VC 0.5 ## 6 10.0 VC 0.5 stat.test <- df %>%group_by(dose) %>%t_test(len ~ supp) %>%adjust_pvalue() %>%add_significance("p.adj") stat.test ## # A tibble: 3 x 11 ## dose .y. group1 group2 n1 n2 statistic df p p.adj ## * <fct> <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> ## 1 0.5 len OJ VC 10 10 3.17 15.0 0.00636 0.0127 ## 2 1 len OJ VC 10 10 4.03 15.4 0.00104 0.00312 ## 3 2 len OJ VC 10 10 -0.0461 14.0 0.964 0.964 ## # ... with 1 more variable: p.adj.signif <chr> ggboxplot(df, x = "supp", y = "len",color = "supp", palette = "jco", facet.by = "dose",ylim = c(0, 40)) +stat_pvalue_manual(stat.test, label = "p.adj", y.position = 35)

多組間的兩兩比較

如果有大于2個分組，會自動進行兩兩比較。

# 兩兩之間做t檢驗，自動完成 pairwise.test <- df %>% t_test(len ~ dose) pairwise.test ## # A tibble: 3 x 10 ## .y. group1 group2 n1 n2 statistic df p p.adj p.adj.signif ## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <chr> ## 1 len 0.5 1 20 20 -6.48 38.0 1.27e- 7 2.54e- 7 **** ## 2 len 0.5 2 20 20 -11.8 36.9 4.4 e-14 1.32e-13 **** ## 3 len 1 2 20 20 -4.90 37.1 1.91e- 5 1.91e- 5 ****

可視化：

ggboxplot(df, x = "dose", y = "len", color = "dose", palette = "lancet")+stat_pvalue_manual(pairwise.test, label = "p.adj",y.position = c(29, 35, 39))

還可以指定和某一個組進行比較：

# T-test: 都和指定的組進行比較 stat.test <- df %>% t_test(len ~ dose, ref.group = "0.5") stat.test ## # A tibble: 2 x 10 ## .y. group1 group2 n1 n2 statistic df p p.adj p.adj.signif ## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <chr> ## 1 len 0.5 1 20 20 -6.48 38.0 1.27e- 7 1.27e- 7 **** ## 2 len 0.5 2 20 20 -11.8 36.9 4.4 e-14 8.8 e-14 **** # Box plot ggboxplot(df, x = "dose", y = "len", ylim = c(0, 40)) +stat_pvalue_manual(stat.test, label = "p.adj.signif",y.position = c(29, 35))

和全部數據進行比較：

stat.test <- df %>% t_test(len ~ dose, ref.group = "all") stat.test ## # A tibble: 3 x 10 ## .y. group1 group2 n1 n2 statistic df p p.adj p.adj.signif ## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <chr> ## 1 len all 0.5 60 20 5.82 56.4 0.00000029 0.00000087 **** ## 2 len all 1 60 20 -0.660 57.5 0.512 0.512 ns ## 3 len all 2 60 20 -5.61 66.5 0.000000425 0.00000087 **** ggboxplot(df, x = "dose", y = "len", fill = "dose", palette = "lancet", alpha = 0.6) +stat_pvalue_manual(stat.test, label = "p.adj.signif", y.position = 35) +geom_hline(yintercept = mean(df$len), linetype = 2)

方差分析

完全隨機設計方差分析

使用課本例4-2的數據。

首先是構造數據，本次數據自己從書上摘錄。。

trt<-c(rep("group1",30),rep("group2",30),rep("group3",30),rep("group4",30))weight<-c(3.53,4.59,4.34,2.66,3.59,3.13,3.30,4.04,3.53,3.56,3.85,4.07,1.37,3.93,2.33,2.98,4.00,3.55,2.64,2.56,3.50,3.25,2.96,4.30,3.52,3.93,4.19,2.96,4.16,2.59,2.42,3.36,4.32,2.34,2.68,2.95,2.36,2.56,2.52,2.27,2.98,3.72,2.65,2.22,2.90,1.98,2.63,2.86,2.93,2.17,2.72,1.56,3.11,1.81,1.77,2.80,3.57,2.97,4.02,2.31,2.86,2.28,2.39,2.28,2.48,2.28,3.48,2.42,2.41,2.66,3.29,2.70,2.66,3.68,2.65,2.66,2.32,2.61,3.64,2.58,3.65,3.21,2.23,2.32,2.68,3.04,2.81,3.02,1.97,1.68,0.89,1.06,1.08,1.27,1.63,1.89,1.31,2.51,1.88,1.41,3.19,1.92,0.94,2.11,2.81,1.98,1.74,2.16,3.37,2.97,1.69,1.19,2.17,2.28,1.72,2.47,1.02,2.52,2.10,3.71)data1<-data.frame(trt,weight)head(data1) ## trt weight ## 1 group1 3.53 ## 2 group1 4.59 ## 3 group1 4.34 ## 4 group1 2.66 ## 5 group1 3.59 ## 6 group1 3.13

進行完全隨機設計資料的方差分析：

data1 %>% anova_test(weight ~ trt) ## Coefficient covariances computed by hccm() ## ANOVA Table (type II tests) ## ## Effect DFn DFd F p p<.05 ges ## 1 trt 3 116 24.884 1.67e-12 * 0.392

隨機區組設計資料的方差分析

使用例4-3的數據。

首先是構造數據，本次數據自己從書上摘錄。。

weight <- c(0.82,0.65,0.51,0.73,0.54,0.23,0.43,0.34,0.28,0.41,0.21,0.31,0.68,0.43,0.24) block <- c(rep(c("1","2","3","4","5"),each=3)) group <- c(rep(c("A","B","C"),5))data4_4 <- data.frame(weight,block,group)head(data4_4) ## weight block group ## 1 0.82 1 A ## 2 0.65 1 B ## 3 0.51 1 C ## 4 0.73 2 A ## 5 0.54 2 B ## 6 0.23 2 C

數據一共3列，第一列是小白鼠肉瘤重量，第二列是區組因素（5個區組），第三列是分組（一共3組）

進行隨機區組設計資料的方差分析：

data4_4 %>% anova_test(weight ~ group + block) ## Coefficient covariances computed by hccm() ## ANOVA Table (type II tests) ## ## Effect DFn DFd F p p<.05 ges ## 1 group 2 8 11.937 0.004 * 0.749 ## 2 block 4 8 5.978 0.016 * 0.749

拉丁方設計方差分析

使用課本例4-5的數據。

首先是構造數據，本次數據自己從書上摘錄。。

psize <- c(87,75,81,75,84,66,73,81,87,85,64,79,73,73,74,78,73,77,77,68,69,74,76,73,64,64,72,76,70,81,75,77,82,61,82,61) drug <- c("C","B","E","D","A","F","B","A","D","C","F","E","F","E","B","A","D","C","A","F","C","B","E","D","D","C","F","E","B","A","E","D","A","F","C","B") col_block <- c(rep(1:6,6)) row_block <- c(rep(1:6,each=6)) mydata <- data.frame(psize,drug,col_block,row_block) mydata$col_block <- factor(mydata$col_block) mydata$row_block <- factor(mydata$row_block) str(mydata) ## 'data.frame': 36 obs. of 4 variables: ## $ psize : num 87 75 81 75 84 66 73 81 87 85 ... ## $ drug : chr "C" "B" "E" "D" ... ## $ col_block: Factor w/ 6 levels "1","2","3","4",..: 1 2 3 4 5 6 1 2 3 4 ... ## $ row_block: Factor w/ 6 levels "1","2","3","4",..: 1 1 1 1 1 1 2 2 2 2 ...

數據一共4列，第一列是皮膚皰疹大小，第二列是不同藥物（處理因素，共5種），第三列是列區組因素，第四列是行區組因素。

進行拉丁方設計的方差分析：

mydata %>% anova_test(psize ~ drug + col_block + row_block) ## Coefficient covariances computed by hccm() ## ANOVA Table (type II tests) ## ## Effect DFn DFd F p p<.05 ges ## 1 drug 5 20 3.906 0.012 * 0.494 ## 2 col_block 5 20 0.500 0.772 0.111 ## 3 row_block 5 20 1.466 0.245 0.268

兩階段交叉設計資料方差分析

使用課本例4-6的數據。

首先是構造數據，本次數據自己從書上摘錄。。

contain <- c(760,770,860,855,568,602,780,800,960,958,940,952,635,650,440,450,528,530,800,803) phase <- rep(c("phase_1","phase_2"),10) type <- c("A","B","B","A","A","B","A","B","B","A","B","A","A","B","B","A","A","B","B","A") testid <- rep(1:10,each=2) mydata <- data.frame(testid,phase,type,contain)str(mydata) ## 'data.frame': 20 obs. of 4 variables: ## $ testid : int 1 1 2 2 3 3 4 4 5 5 ... ## $ phase : chr "phase_1" "phase_2" "phase_1" "phase_2" ... ## $ type : chr "A" "B" "B" "A" ... ## $ contain: num 760 770 860 855 568 602 780 800 960 958 ...mydata$testid <- factor(mydata$testid)

數據一共4列，第一列是受試者id，第二列是不同階段，第三列是測定方法，第四列是測量值。

進行兩階段交叉設計資料方差分析：

mydata %>% anova_test(contain ~ phase + type + testid) ## Coefficient covariances computed by hccm() ## ANOVA Table (type II tests) ## ## Effect DFn DFd F p p<.05 ges ## 1 phase 1 8 9.925 1.40e-02 * 0.554 ## 2 type 1 8 4.019 8.00e-02 0.334 ## 3 testid 9 8 1240.195 1.32e-11 * 0.999

結果和課本一致！

析因設計資料的方差分析

使用課本例11-1的數據，自己手動摘錄：

df11_1 <- data.frame(x1 = rep(c("外膜縫合","束膜縫合"), each = 10),x2 = rep(c("縫合1個月","縫合2個月"), each = 5),y = c(10,10,40,50,10,30,30,70,60,30,10,20,30,50,30,50,50,70,60,30) )str(df11_1) ## 'data.frame': 20 obs. of 3 variables: ## $ x1: chr "外膜縫合" "外膜縫合" "外膜縫合" "外膜縫合" ... ## $ x2: chr "縫合1個月" "縫合1個月" "縫合1個月" "縫合1個月" ... ## $ y : num 10 10 40 50 10 30 30 70 60 30 ...

數據一共3列，第1列是縫合方法，第2列是時間，第3列是軸突通過率。

進行析因設計資料的方差分析：

df11_1 %>% anova_test(y ~ x1 * x2) ## Coefficient covariances computed by hccm() ## ANOVA Table (type II tests) ## ## Effect DFn DFd F p p<.05 ges ## 1 x1 1 16 0.600 0.450 0.036 ## 2 x2 1 16 8.067 0.012 * 0.335 ## 3 x1:x2 1 16 0.067 0.800 0.004

正交設計資料的方差分析

使用課本例11-4的數據

df11_4 <- data.frame(a = rep(c("5度","25度"),each = 4),b = rep(c(0.5, 5.0), each = 2),c = c(10, 30),d = c(6.0, 8.0,8.0,6.0,8.0,6.0,6.0,8.0),x = c(86,95,91,94,91,96,83,88) )df11_4$a <- factor(df11_4$a) df11_4$b <- factor(df11_4$b) df11_4$c <- factor(df11_4$c) df11_4$d <- factor(df11_4$d)str(df11_4) ## 'data.frame': 8 obs. of 5 variables: ## $ a: Factor w/ 2 levels "25度","5度": 2 2 2 2 1 1 1 1 ## $ b: Factor w/ 2 levels "0.5","5": 1 1 2 2 1 1 2 2 ## $ c: Factor w/ 2 levels "10","30": 1 2 1 2 1 2 1 2 ## $ d: Factor w/ 2 levels "6","8": 1 2 2 1 2 1 1 2 ## $ x: num 86 95 91 94 91 96 83 88

進行正交設計資料的方差分析：

df11_4 %>% anova_test(x ~ a + b + c + d + a * b) ## Coefficient covariances computed by hccm() ## ANOVA Table (type II tests) ## ## Effect DFn DFd F p p<.05 ges ## 1 a 1 2 3.2 0.216 0.615 ## 2 b 1 2 7.2 0.115 0.783 ## 3 c 1 2 24.2 0.039 * 0.924 ## 4 d 1 2 1.8 0.312 0.474 ## 5 a:b 1 2 20.0 0.047 * 0.909

重復測量數據兩因素兩水平的方差分析

使用課本例12-1的數據，直接讀取：

df12_1 <- foreign::read.spss("E:/各科資料/醫學統計學/研究生課程/析因設計重復測量/9重復測量18-9研/12-1.sav", to.data.frame = T)str(df12_1) ## 'data.frame': 20 obs. of 5 variables: ## $ n : num 1 2 3 4 5 6 7 8 9 10 ... ## $ x1 : num 130 124 136 128 122 118 116 138 126 124 ... ## $ x2 : num 114 110 126 116 102 100 98 122 108 106 ... ## $ group: Factor w/ 2 levels "處理組","對照組": 1 1 1 1 1 1 1 1 1 1 ... ## $ d : num 16 14 10 12 20 18 18 16 18 18 ... ## - attr(*, "variable.labels")= Named chr [1:5] "編號" "治療前血壓" "治療后血壓" "組別" ... ## ..- attr(*, "names")= chr [1:5] "n" "x1" "x2" "group" ... ## - attr(*, "codepage")= int 936

數據一共5列（第5列是自己算出來的，其實原始數據只有4列），第1 列是編號，第2列是治療前血壓，第3例是治療后血壓，第4列是分組，第5列是血壓前后差值。

進行重復測量數據兩因素兩水平的方差分析前，先把數據轉換一下格式：

suppressMessages(library(tidyverse)) df12_11 <- df12_1[,1:4] %>% pivot_longer(cols = 2:3,names_to = "time",values_to = "hp") %>% mutate_if(is.character, as.factor)df12_11$n <- factor(df12_11$n)str(df12_11) ## tibble [40 x 4] (S3: tbl_df/tbl/data.frame) ## $ n : Factor w/ 20 levels "1","2","3","4",..: 1 1 2 2 3 3 4 4 5 5 ... ## $ group: Factor w/ 2 levels "處理組","對照組": 1 1 1 1 1 1 1 1 1 1 ... ## $ time : Factor w/ 2 levels "x1","x2": 1 2 1 2 1 2 1 2 1 2 ... ## $ hp : num [1:40] 130 114 124 110 136 126 128 116 122 102 ...

進行重復測量數據兩因素兩水平的方差分析:

hp是因變量，time是測量時間（治療前和治療后各測量一次），group是分組因素（兩種治療方法），n是受試者編號

df12_11 %>% anova_test(hp ~ time * group + Error(n/time)) ## ANOVA Table (type II tests) ## ## Effect DFn DFd F p p<.05 ges ## 1 group 1 18 1.574 2.26e-01 0.071 ## 2 time 1 18 55.008 7.08e-07 * 0.278 ## 3 group:time 1 18 18.771 4.01e-04 * 0.116

這個結果也是和課本一樣的，只不過沒有顯示組內誤差的情況！

重復測量數據兩因素多水平的分析

使用課本例12-3的數據，直接讀取：

df12_3 <- foreign::read.spss("E:/各科資料/醫學統計學/研究生課程/析因設計重復測量/9重復測量18-9研/例12-03.sav",to.data.frame = T)str(df12_3) ## 'data.frame': 15 obs. of 7 variables: ## $ No : num 1 2 3 4 5 6 7 8 9 10 ... ## $ group: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 2 2 2 2 2 ... ## $ t0 : num 120 118 119 121 127 121 122 128 117 118 ... ## $ t1 : num 108 109 112 112 121 120 121 129 115 114 ... ## $ t2 : num 112 115 119 119 127 118 119 126 111 116 ... ## $ t3 : num 120 126 124 126 133 131 129 135 123 123 ... ## $ t4 : num 117 123 118 120 126 137 133 142 131 133 ... ## - attr(*, "variable.labels")= Named chr [1:7] "序號" "組別" "" "" ... ## ..- attr(*, "names")= chr [1:7] "No" "group" "t0" "t1" ...

數據一共7列，第1列是患者編號，第2列是誘導方法（3種），第3-7列是5個時間點的血壓。

首先轉換數據格式：

library(tidyverse)df12_31 <- df12_3 %>% pivot_longer(cols = 3:7, names_to = "times", values_to = "hp")df12_31$No <- factor(df12_31$No) df12_31$times <- factor(df12_31$times)str(df12_31) ## tibble [75 x 4] (S3: tbl_df/tbl/data.frame) ## $ No : Factor w/ 15 levels "1","2","3","4",..: 1 1 1 1 1 2 2 2 2 2 ... ## $ group: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 1 1 1 1 ... ## $ times: Factor w/ 5 levels "t0","t1","t2",..: 1 2 3 4 5 1 2 3 4 5 ... ## $ hp : num [1:75] 120 108 112 120 117 118 109 115 126 123 ...

進行重復測量兩因素多水平的方差分析：

df12_31 %>% anova_test(hp ~ times * group + Error(No/(times))) ## ANOVA Table (type II tests) ## ## $ANOVA ## Effect DFn DFd F p p<.05 ges ## 1 group 2 12 5.783 1.70e-02 * 0.430 ## 2 times 4 48 106.558 3.02e-23 * 0.659 ## 3 group:times 8 48 19.101 1.62e-12 * 0.409 ## ## $`Mauchly's Test for Sphericity` ## Effect W p p<.05 ## 1 times 0.293 0.178 ## 2 group:times 0.293 0.178 ## ## $`Sphericity Corrections` ## Effect GGe DF[GG] p[GG] p[GG]<.05 HFe DF[HF] p[HF] ## 1 times 0.679 2.71, 32.58 1.87e-16 * 0.896 3.59, 43.03 4.65e-21 ## 2 group:times 0.679 5.43, 32.58 4.26e-09 * 0.896 7.17, 43.03 2.04e-11 ## p[HF]<.05 ## 1 * ## 2 *

這個結果也是和課本一樣的，而且直接給出了球形檢驗的結果！非常方便。

相關分析

以R語言自帶的mtcars數據集為例。

mydata <- mtcars %>% select(mpg, disp, hp, drat, wt, qsec)head(mydata) ## mpg disp hp drat wt qsec ## Mazda RX4 21.0 160 110 3.90 2.620 16.46 ## Mazda RX4 Wag 21.0 160 110 3.90 2.875 17.02 ## Datsun 710 22.8 108 93 3.85 2.320 18.61 ## Hornet 4 Drive 21.4 258 110 3.08 3.215 19.44 ## Hornet Sportabout 18.7 360 175 3.15 3.440 17.02 ## Valiant 18.1 225 105 2.76 3.460 20.22

兩個變量

mydata %>% cor_test(wt, mpg, method = "pearson") ## # A tibble: 1 x 8 ## var1 var2 cor statistic p conf.low conf.high method ## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> ## 1 wt mpg -0.87 -9.56 1.29e-10 -0.934 -0.744 Pearson

一個變量和其他所有變量

mydata %>% cor_test(mpg, method = "pearson") ## # A tibble: 5 x 8 ## var1 var2 cor statistic p conf.low conf.high method ## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> ## 1 mpg disp -0.85 -8.75 9.38e-10 -0.923 -0.708 Pearson ## 2 mpg hp -0.78 -6.74 1.79e- 7 -0.885 -0.586 Pearson ## 3 mpg drat 0.68 5.10 1.78e- 5 0.436 0.832 Pearson ## 4 mpg wt -0.87 -9.56 1.29e-10 -0.934 -0.744 Pearson ## 5 mpg qsec 0.42 2.53 1.71e- 2 0.0820 0.670 Pearson

所有變量間兩兩相關性

mydata %>% cor_test(method = "pearson") ## # A tibble: 36 x 8 ## var1 var2 cor statistic p conf.low conf.high method ## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> ## 1 mpg mpg 1 Inf 0 1 1 Pearson ## 2 mpg disp -0.85 -8.75 9.38e-10 -0.923 -0.708 Pearson ## 3 mpg hp -0.78 -6.74 1.79e- 7 -0.885 -0.586 Pearson ## 4 mpg drat 0.68 5.10 1.78e- 5 0.436 0.832 Pearson ## 5 mpg wt -0.87 -9.56 1.29e-10 -0.934 -0.744 Pearson ## 6 mpg qsec 0.42 2.53 1.71e- 2 0.0820 0.670 Pearson ## 7 disp mpg -0.85 -8.75 9.38e-10 -0.923 -0.708 Pearson ## 8 disp disp 1 Inf 0 1 1 Pearson ## 9 disp hp 0.79 7.08 7.14e- 8 0.611 0.893 Pearson ## 10 disp drat -0.71 -5.53 5.28e- 6 -0.849 -0.481 Pearson ## # ... with 26 more rows

可以看到所有的輸出都是整潔格式的，而且都給出了詳細的統計值。

總結

以上是生活随笔為你收集整理的R语言“优雅地“进行医学统计分析的全部內容，希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網站內容還不錯，歡迎將生活随笔推薦給好友。

上一篇：直播视频app源码的靓号可以怎样实现？
下一篇：基于SSM社区养老院服务系统

编程问答

R语言“优雅地“进行医学统计分析

文章目錄

主要函數

描述性統計

比較均值

增強R中的ANOVA

事后檢驗（post-hoc）

比較比例

比較方差

計算效應量

相關性分析

計算相關性

重塑相關矩陣

相關矩陣取子集

可視化相關矩陣

添加P值和顯著性標記

提取統計信息

數據處理輔助函數

其他

安裝和加載

描述性統計

t檢驗

單樣本t檢驗

配對t檢驗

兩樣本t檢驗

分組后進行比較

多組間的兩兩比較

方差分析

完全隨機設計方差分析

隨機區組設計資料的方差分析

拉丁方設計方差分析

兩階段交叉設計資料方差分析

析因設計資料的方差分析

正交設計資料的方差分析

重復測量數據兩因素兩水平的方差分析

重復測量數據兩因素多水平的分析

相關分析

兩個變量

一個變量和其他所有變量

所有變量間兩兩相關性

相關矩陣

總結