PCA2

Authors

[Editor] Hu Zheng;

[Contributors]

Note

Hiplot website

This page is the tutorial for the source-code version of the Hiplot PCA2 plugin. You can also use the Hiplot website to create the plot without writing any code. For more information, see the following link:

https://hiplot.cn/basic/pca2?lang=en

Principal component analysis (PCA) is a dimension-reduction method. It replaces many correlated variables with a small number of composite variables, the principal components, which are linear combinations of the original variables chosen to retain as much of the variation in the data as possible.
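As an illustration of what PCA computes, here is a minimal base-R sketch on simulated data (not the tutorial files): standardize each variable, take the singular value decomposition, then read off the component scores and the share of variance each component explains. It is only a sketch of the idea; FactoMineR's PCA() used below applies its own weighting conventions, and component signs are arbitrary in any implementation.

```r
# Minimal PCA from scratch (illustrative sketch, base R only)
set.seed(1)
# Simulated toy data: 10 samples (rows) x 6 variables (columns)
x <- matrix(rnorm(60), nrow = 10, ncol = 6)
x[, 2] <- x[, 1] + rnorm(10, sd = 0.1)  # make two variables strongly correlated

# 1. Standardize each variable to mean 0 and sd 1 (comparable to scale.unit = TRUE)
z <- scale(x, center = TRUE, scale = TRUE)

# 2. Singular value decomposition: z = U D V'
s <- svd(z)

# 3. Component scores (sample coordinates) are U D; columns of V are the loadings
scores <- s$u %*% diag(s$d)

# 4. Variance of each component (eigenvalues of the correlation matrix)
#    and the percentage of total variance it explains
comp_var <- s$d^2 / (nrow(z) - 1)
pct_var <- 100 * comp_var / sum(comp_var)
round(pct_var, 1)
```

Because the two correlated variables share most of their information, the first component should absorb a larger share of the variance than the others. The same values can be cross-checked against base R's prcomp(x, scale. = TRUE).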

Setup

System Requirements: Cross-platform (Linux/MacOS/Windows)
Programming language: R
Dependent packages: FactoMineR; factoextra

# Install packages
if (!requireNamespace("FactoMineR", quietly = TRUE)) {
  install.packages("FactoMineR")
}
if (!requireNamespace("factoextra", quietly = TRUE)) {
  install.packages("factoextra")
}

# Load packages
library(FactoMineR)
library(factoextra)

Data Preparation

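Two tables are loaded: an expression table (gene names in the first column, followed by one expression column per sample) and a sample-information table (sample names in the first column, plus a Group column giving each sample's group).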
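Because the plot colors each sample by looking up its group in the sample-information table, that table must list the samples in the same order as the expression columns. The sketch below uses small hypothetical tables (the names toy_expr, toy_info, Gene and Sample are illustrative assumptions) to show the reordering step and a guard against a common pitfall: indexing by a sample name that is missing from the table silently produces an NA row instead of an error.

```r
# Hypothetical toy inputs in the same layout as the tutorial files
toy_expr <- data.frame(
  Gene = c("G1", "G2", "G3"),      # first column: gene names
  M1 = c(1, 2, 3),                 # remaining columns: one per sample
  M2 = c(4, 5, 6),
  M3 = c(7, 8, 9)
)
toy_info <- data.frame(
  Sample = c("M3", "M1", "M2"),    # deliberately in a different order
  Group = c("B", "A", "A"),
  stringsAsFactors = FALSE
)

samples <- colnames(toy_expr)[-1]

# Guard: every sample column must have an entry in the sample table
stopifnot(all(samples %in% toy_info[, 1]))

# Index by sample name, then reorder rows to follow the expression columns
rownames(toy_info) <- toy_info[, 1]
toy_info <- toy_info[samples, ]
toy_info$Group
```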

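# Load data: expression table (genes x samples) and sample information
data <- read.delim("files/Hiplot/135-pca2-data1.txt", header = TRUE)
sample_info <- read.delim("files/Hiplot/135-pca2-data2.txt", header = TRUE)

# Convert data structure
# Index sample_info by sample name and reorder it to match the sample columns of data
row.names(sample_info) <- sample_info[, 1]
sample_info <- sample_info[colnames(data)[-1], ]
## PCA
# Use gene names as row names and keep only the numeric expression values
rownames(data) <- data[, 1]
data <- as.matrix(data[, -1])
# Transpose so samples are rows (individuals) and genes are columns (variables);
# standardize each gene and keep the first 5 components
pca_data <- PCA(t(data), scale.unit = TRUE, ncp = 5, graph = FALSE)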

# View data
head(data)

              M1       M2        M3       M4       M5       M6       M7
GBP4    6.599344 5.226266  3.693288 3.938501 4.527193 9.308119 8.987865
BCAT1   5.760380 4.892783  5.448924 3.485413 3.855669 8.662081 8.793320
CMPK2   9.561905 4.549168  3.998655 5.614384 3.904793 9.790770 7.133188
STOX2   8.396409 8.717055  8.039064 7.643060 9.274649 4.417013 4.725270
PADI2   8.419766 8.268430  8.451181 9.200732 8.598207 4.590033 5.368268
SCARNA5 7.653074 5.780393 10.633550 5.913684 8.805605 5.890120 5.527945
              M8       M9      M10
GBP4    7.658312 8.666038 7.419708
BCAT1   8.765915 8.097206 8.262942
CMPK2   7.379591 7.938063 6.154118
STOX2   3.542217 4.305187 6.964710
PADI2   4.136667 4.910986 4.080363
SCARNA5 3.822596 4.041078 7.956589
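Before plotting, it can help to check how much variance the first components capture, since fviz_pca_ind() labels its axes with these percentages. The sketch below runs the same PCA() call on a simulated stand-in matrix (random values, not the tutorial data); with the real data, replace res with pca_data.

```r
library(FactoMineR)
library(factoextra)

set.seed(42)
# Simulated stand-in for the expression matrix: 20 genes (rows) x 10 samples (columns)
expr <- matrix(rnorm(200, mean = 7), nrow = 20,
               dimnames = list(paste0("G", 1:20), paste0("M", 1:10)))

# Samples as rows and genes as columns, as in the tutorial
res <- PCA(t(expr), scale.unit = TRUE, ncp = 5, graph = FALSE)

# Eigenvalue, percentage of variance and cumulative percentage per component
head(res$eig)

# Scree plot of the percentage of variance explained by each component
fviz_eig(res, addlabels = TRUE)
```

With scale.unit = TRUE every gene has unit variance, so the eigenvalues sum to the number of genes (20 here). On purely random data no single component dominates; in the tutorial data, a large first-component share would be expected if the two groups differ strongly.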

Visualization

# PCA2: plot samples on the first two components, colored and enclosed by group
p <- fviz_pca_ind(pca_data, geom.ind = "point", pointsize = 6, addEllipses = TRUE,
                  mean.point = FALSE, col.ind = as.factor(sample_info[, "Group"])) +
  ggtitle("Principal Component Analysis") +
  scale_fill_manual(values = c("#00468BFF","#ED0000FF")) +
  scale_color_manual(values = c("#00468BFF","#ED0000FF")) +
  theme_bw() +
  theme(text = element_text(family = "Arial"),
        plot.title = element_text(size = 12,hjust = 0.5),
        axis.title = element_text(size = 12),
        axis.text = element_text(size = 10),
        axis.text.x = element_text(angle = 0, hjust = 0.5,vjust = 1),
        legend.position = "right",
        legend.direction = "vertical",
        legend.title = element_text(size = 10),
        legend.text = element_text(size = 10))

p

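Each point is a sample, and the colors (with the matching ellipses) mark the sample groups. A sample's position gives its coordinates on the first two principal components, so samples that lie close together have similar expression profiles along these components. For example, M1 contributes strongly to PC1, while M8 has a large negative coordinate on PC1.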
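Statements such as "M1 contributes strongly to PC1" can be checked numerically from the PCA result. The sketch below uses factoextra's get_pca_ind() and fviz_contrib() on simulated data (random values, not the tutorial data, so the numbers will not match the plot above); with the real data, pass pca_data instead of res.

```r
library(FactoMineR)
library(factoextra)

set.seed(7)
# Simulated stand-in: 20 genes (rows) x 10 samples (columns)
expr <- matrix(rnorm(200, mean = 7), nrow = 20,
               dimnames = list(paste0("G", 1:20), paste0("M", 1:10)))
res <- PCA(t(expr), scale.unit = TRUE, ncp = 5, graph = FALSE)

ind <- get_pca_ind(res)
round(ind$coord[, 1:2], 2)   # sample coordinates on PC1 and PC2
round(ind$contrib[, 1], 1)   # % contribution of each sample to PC1

# Bar plot of sample contributions to PC1, sorted from largest to smallest
fviz_contrib(res, choice = "ind", axes = 1)
```

A sample's contribution grows with the square of its coordinate, so samples far from the origin on PC1 (in either direction) contribute most. Coordinate signs are arbitrary and may flip between software versions.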