# Install packages
if (!requireNamespace("ggpubr", quietly = TRUE)) {
install.packages("ggpubr")
}if (!requireNamespace("ggthemes", quietly = TRUE)) {
install.packages("ggthemes")
}
# Load packages
library(ggpubr)
library(ggthemes)
Boxplot
The box plot is a method of visualizing the distribution characteristics of a set of data by means of a quartile graph.
Setup
System Requirements: Cross-platform (Linux/MacOS/Windows)
Programming language: R
Dependent packages:
ggpubr
;ggthemes
Data Preparation
The loaded data is data set (data on treatment outcomes of different treatment regimens).
# Load data
<- read.table("files/Hiplot/015-boxplot-data.txt", header = T)
data
# convert data structure
<- unique(data[, 2])
groups <- combn(groups, 2, simplify = FALSE)
my_comparisons <- lapply(my_comparisons, as.character)
my_comparisons
# View data
head(data)
Value Group1 Group2
1 4.2 low treat1
2 11.5 low treat1
3 7.3 low treat1
4 5.8 low treat1
5 6.4 low treat1
6 10.0 low treat1
Visualization
# Boxplot
<- ggboxplot(data, x = "Group1", y = "Value", notch = F, facet.by = "Group2",
p add = "point", color = "Group1", xlab = "Group2", ylab = "Value",
palette = c("#e04d39","#5bbad6","#1e9f86"),
title = "Box Plot") +
stat_compare_means(comparisons = my_comparisons, label = "p.format",
method = "t.test") +
scale_y_continuous(expand = expansion(mult = c(0.1, 0.1))) +
theme_stata() +
theme(text = element_text(family = "Arial"),
plot.title = element_text(size = 12,hjust = 0.5),
axis.title = element_text(size = 12),
axis.text = element_text(size = 10),
axis.text.x = element_text(angle = 0, hjust = 0.5,vjust = 1),
legend.position = "right",
legend.direction = "vertical",
legend.title = element_text(size = 10),
legend.text = element_text(size = 10))
p

The abscissa represents several different sets of data, and the ordinate represents the quartile of each set of data respectively. The upper, middle and lower horizontal lines of the box represent the upper, median and lower quartile respectively; The values represented by the upper and lower line segments respectively exponential the maximum and minimum values of the data, and the points outside the box represent outliers. The above figure indicates the P value between two variables. It can be considered that in treatment plan 1, there is a significant difference in efficacy between the middle-dose group and the low-dose group, and so on.