# Install packages
if (!requireNamespace("FactoMineR", quietly = TRUE)) {
  install.packages("FactoMineR")
}
if (!requireNamespace("factoextra", quietly = TRUE)) {
  install.packages("factoextra")
}
# Load packages
library(FactoMineR)
library(factoextra)
PCA2
Hiplot website
This page is the tutorial for the source code version of the Hiplot PCA2 plugin. You can also use the Hiplot website to produce the same plot without writing code. For more information, see the Hiplot website.
Principal component analysis (PCA) is a data processing method built around "dimension reduction": it replaces multi-index data with a few comprehensive indicators (the principal components) while retaining the most essential characteristics of the data.
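As a minimal sketch of the dimension-reduction idea, the following example runs base R `prcomp()` on a small simulated matrix. This is not the FactoMineR workflow used later in this tutorial, and the data and variable names (`toy`, `v1` to `v4`) are invented for illustration only.

```r
# Toy data (hypothetical): 10 samples x 4 variables, where v2 and v3
# are noisy copies of v1, so most variance lies along one direction.
set.seed(1)
v1 <- rnorm(10)
toy <- cbind(v1 = v1,
             v2 = v1 + rnorm(10, sd = 0.1),
             v3 = -v1 + rnorm(10, sd = 0.1),
             v4 = rnorm(10))

# PCA on centered, unit-variance variables
fit <- prcomp(toy, center = TRUE, scale. = TRUE)

# Proportion of total variance captured by each principal component
var_explained <- fit$sdev^2 / sum(fit$sdev^2)
round(var_explained, 3)

# Keep only the first two components as the reduced representation
reduced <- fit$x[, 1:2]
dim(reduced)
```

Because three of the four simulated variables carry nearly the same signal, the first component should account for most of the variance, which is why two components are enough to summarize this toy data set.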
Setup
System Requirements: Cross-platform (Linux/MacOS/Windows)
Programming language: R
Dependent packages: FactoMineR; factoextra
Data Preparation
The loaded data consist of an expression data set (gene names and the corresponding gene expression values) and sample information (sample names and groups).
# Load data
data <- read.delim("files/Hiplot/135-pca2-data1.txt", header = TRUE)
sample_info <- read.delim("files/Hiplot/135-pca2-data2.txt", header = TRUE)
# Convert data structure
# Reorder the sample information so its rows follow the sample columns of data
row.names(sample_info) <- sample_info[, 1]
sample_info <- sample_info[colnames(data)[-1], ]
# Use gene names as row names and keep only the numeric expression values
rownames(data) <- data[, 1]
data <- as.matrix(data[, -1])
# Run PCA with samples as rows (the matrix is transposed)
pca_data <- PCA(t(data), scale.unit = TRUE, ncp = 5, graph = FALSE)
# View data
head(data)
              M1       M2        M3       M4       M5       M6       M7
GBP4    6.599344 5.226266  3.693288 3.938501 4.527193 9.308119 8.987865
BCAT1   5.760380 4.892783  5.448924 3.485413 3.855669 8.662081 8.793320
CMPK2   9.561905 4.549168  3.998655 5.614384 3.904793 9.790770 7.133188
STOX2   8.396409 8.717055  8.039064 7.643060 9.274649 4.417013 4.725270
PADI2   8.419766 8.268430  8.451181 9.200732 8.598207 4.590033 5.368268
SCARNA5 7.653074 5.780393 10.633550 5.913684 8.805605 5.890120 5.527945
              M8       M9      M10
GBP4    7.658312 8.666038 7.419708
BCAT1   8.765915 8.097206 8.262942
CMPK2   7.379591 7.938063 6.154118
STOX2   3.542217 4.305187 6.964710
PADI2   4.136667 4.910986 4.080363
SCARNA5 3.822596 4.041078 7.956589
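The reordering step in the data preparation code relies on indexing a data frame by row name. The sketch below illustrates the idea with small made-up tables (the names `expr` and `info`, and all values, are hypothetical), so that the rows of the sample information follow the column order of the expression data.

```r
# Hypothetical expression table: first column holds gene names
expr <- data.frame(Gene = c("g1", "g2"),
                   S2 = c(1.0, 2.0),
                   S1 = c(3.0, 4.0),
                   S3 = c(5.0, 6.0))

# Hypothetical sample information, listed in a different order
info <- data.frame(Sample = c("S1", "S2", "S3"),
                   Group = c("A", "B", "A"))

# Use sample names as row names, then index by the data's column order
row.names(info) <- info[, 1]
info <- info[colnames(expr)[-1], ]

# After reordering, row i of info describes column i + 1 of expr
identical(row.names(info), colnames(expr)[-1])
info$Group
```

Keeping the two tables in the same order matters because the grouping vector passed to the plot is matched to samples by position, not by name.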
Visualization
# PCA2
# Plot samples on the first two principal components, colored by group
p <- fviz_pca_ind(pca_data, geom.ind = "point", pointsize = 6, addEllipses = TRUE,
                  mean.point = FALSE, col.ind = sample_info[, "Group"]) +
  ggtitle("Principal Component Analysis") +
  scale_fill_manual(values = c("#00468BFF", "#ED0000FF")) +
  scale_color_manual(values = c("#00468BFF", "#ED0000FF")) +
  theme_bw() +
  theme(text = element_text(family = "Arial"),
        plot.title = element_text(size = 12, hjust = 0.5),
        axis.title = element_text(size = 12),
        axis.text = element_text(size = 10),
        axis.text.x = element_text(angle = 0, hjust = 0.5, vjust = 1),
        legend.position = "right",
        legend.direction = "vertical",
        legend.title = element_text(size = 10),
        legend.text = element_text(size = 10))

p

Different colors represent different sample groups, and each point is one sample placed according to its coordinates on the principal components. For example, M1 has a greater contribution to PC1, while M8 is more strongly negatively associated with PC1.
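To check statements like these numerically, one can look at each sample's coordinate on a component and its contribution to that component. The sketch below uses base R `prcomp()` on simulated data as a stand-in (an assumption; it is not the `pca_data` object built above). It computes the contribution of a sample to a component as its squared coordinate divided by the sum of squared coordinates on that component.

```r
# Simulated data (hypothetical): 6 samples (rows) x 5 variables
set.seed(42)
toy <- matrix(rnorm(30), nrow = 6,
              dimnames = list(paste0("S", 1:6), paste0("gene", 1:5)))

fit <- prcomp(toy, center = TRUE, scale. = TRUE)

# Coordinates of each sample on PC1; the sign shows the side of the axis
pc1 <- fit$x[, "PC1"]

# Percentage contribution of each sample to PC1
contrib_pc1 <- 100 * pc1^2 / sum(pc1^2)

# Samples ordered from largest to smallest contribution
round(sort(contrib_pc1, decreasing = TRUE), 1)
```

For a FactoMineR result such as `pca_data`, the comparable quantities are stored in `pca_data$ind$coord` and `pca_data$ind$contrib`.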