# Install packages
if (!requireNamespace("umap", quietly = TRUE)) {
  install.packages("umap")
}
if (!requireNamespace("ggpubr", quietly = TRUE)) {
  install.packages("ggpubr")
}
# Load packages
library(umap)
library(ggpubr)
UMAP
Hiplot website
This page is the tutorial for the source code version of the Hiplot UMAP plugin. You can also use the Hiplot website to create the plot without writing code. For more information, please see the following link:
UMAP (Uniform Manifold Approximation and Projection) is a nonlinear dimensionality reduction algorithm. It is suited to reducing high-dimensional data to two or three dimensions for visualization. In the low-dimensional space, UMAP models the similarity between points with a heavy-tailed curve related to the Student t-distribution. As a result, points that are highly similar in the original space are placed close together, and points with low similarity are placed farther apart.
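UMAP scores the similarity of two embedded points at distance d with the curve 1 / (1 + a * d^(2b)). This is the form given in the UMAP paper. In the `umap` package, `a` and `b` are configuration entries and are typically fitted from settings such as `min_dist` and `spread`. The sketch below uses a hypothetical helper, `umap_similarity()`, with a = b = 1 for illustration only. It shows three properties of the curve. It equals 1 at distance 0. It decays more slowly than a Gaussian kernel. With a = b = 1 it matches the Student t kernel with one degree of freedom, up to the constant pi.

```r
# Hypothetical helper: UMAP-style low-dimensional similarity for distance d.
# a = b = 1 is an illustrative choice, not the package's fitted values.
umap_similarity <- function(d, a = 1, b = 1) {
  1 / (1 + a * d^(2 * b))
}

d <- seq(0, 4, by = 0.5)
sim <- umap_similarity(d)

# Similarity is 1 at distance 0 and decreases as d grows
print(round(sim, 3))

# Heavier tail than a Gaussian kernel exp(-d^2): larger for every d > 0
gauss <- exp(-d^2)
print(all(sim[-1] > gauss[-1]))

# With a = b = 1 the curve is pi times the Student t density with 1 df
print(all.equal(sim, pi * dt(d, df = 1)))
```

Changing `a` and `b` reshapes the curve. This is roughly how settings such as `min_dist` influence how tightly clusters are packed.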
Setup
System Requirements: Cross-platform (Linux/MacOS/Windows)
Programming language: R
Dependent packages: umap; ggpubr
Data Preparation
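Two tables are loaded. The first is an expression table: its first column holds the gene (feature) name, and each remaining column holds one sample's expression values. The second is a sample information table, which gives each sample's name and group.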
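If the example files under `files/Hiplot/` are not available, you can build tables with the same layout from R's built-in `iris` data and then skip the two `read.delim()` lines. The preview of the data (features `Sepal.Length` to `Petal.Width`, samples `M1`, `M2`, ..., group column `Species`) suggests that the example files hold these measurements, but that is an assumption. The first column name `Gene` is hypothetical, and the sample names `M1` to `M150` only mirror the naming in the preview.

```r
# Hypothetical stand-in for the two example files, built from the iris data.
# data1: first column = feature name, then one numeric column per sample.
expr <- t(as.matrix(iris[, 1:4]))                  # 4 features x 150 samples
colnames(expr) <- paste0("M", seq_len(ncol(expr)))
data1 <- data.frame(Gene = rownames(expr), expr, check.names = FALSE)
rownames(data1) <- NULL                            # plain 1..n row names, like read.delim

# data2: sample name and its group (here the iris species)
data2 <- data.frame(
  Samples = colnames(expr),
  Species = as.character(iris$Species)
)

str(data1[, 1:4])
head(data2)
```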
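# Load data
data1 <- read.delim("files/Hiplot/176-umap-data1.txt", header = TRUE)
data2 <- read.delim("files/Hiplot/176-umap-data2.txt", header = TRUE)

# Convert data structure: use the first column as row names and
# keep the remaining columns as a numeric matrix (features x samples)
sample.info <- data2
rownames(data1) <- data1[, 1]
data1 <- as.matrix(data1[, -1])

## UMAP
# umap() treats each row as one observation, so transpose to samples x features
set.seed(123)
umap_info <- umap(t(data1))
colnames(umap_info$layout) <- c("UMAP_1", "UMAP_2")

# Handle data: one row per sample with its two UMAP coordinates
umap_data <- data.frame(
  sample = colnames(data1),
  umap_info$layout
)
# Look up each sample's group; keep groups in order of first appearance
colorBy <- sample.info[match(colnames(data1), sample.info[, 1]), "Species"]
colorBy <- factor(colorBy, levels = colorBy[!duplicated(colorBy)])
umap_data$colorBy <- colorBy
shapeBy <- NULL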
# View data
head(data1[,1:5])
M1 M2 M3 M4 M5
Sepal.Length 5.1 4.9 4.7 4.6 5.0
Sepal.Width 3.5 3.0 3.2 3.1 3.6
Petal.Length 1.4 1.4 1.3 1.5 1.4
Petal.Width 0.2 0.2 0.2 0.2 0.2
head(data2)
Samples Species
1 M1 setosa
2 M2 setosa
3 M3 setosa
4 M4 setosa
5 M5 setosa
6 M6 setosa
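The `umap()` call treats each row of its input as one observation. That is why the features x samples matrix is transposed with `t()` before the call. The sketch below runs the same kind of embedding directly on the `iris` measurements, which are already in samples x features orientation. It also shows how to change defaults such as `n_neighbors` or `min_dist` through a copy of `umap.defaults`. The values 10 and 0.2 are illustrative assumptions, not recommendations.

```r
library(umap)

# Samples in rows, features in columns: the orientation umap() expects
x <- as.matrix(iris[, 1:4])

# Start from the package defaults and override a few settings
cfg <- umap.defaults
cfg$n_neighbors  <- 10    # size of the local neighbourhood (illustrative)
cfg$min_dist     <- 0.2   # how tightly points may pack (illustrative)
cfg$random_state <- 123   # seed for a reproducible layout

res <- umap(x, config = cfg)
colnames(res$layout) <- c("UMAP_1", "UMAP_2")
head(res$layout)
```

Setting `random_state` in the configuration is an alternative to calling `set.seed()` before `umap()`.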
Visualization
# umap
p <- ggscatter(data = umap_data, x = "UMAP_1", y = "UMAP_2", size = 2,
               palette = "lancet", color = "colorBy") +
  labs(color = "group") +
  ggtitle("UMAP Plot") +
  theme_classic() +
  theme(text = element_text(family = "Arial"),
        plot.title = element_text(size = 12, hjust = 0.5),
        axis.title = element_text(size = 12),
        axis.text = element_text(size = 10),
        axis.text.x = element_text(angle = 0, hjust = 0.5, vjust = 1),
        legend.position = "right",
        legend.direction = "vertical",
        legend.title = element_text(size = 10),
        legend.text = element_text(size = 10))
p
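The plot legend lists the groups in the order of the factor levels of `colorBy`. Building the levels from `colorBy[!duplicated(colorBy)]` keeps the order in which the groups first appear in the data. Calling `factor()` with no `levels` argument sorts them alphabetically instead. `unique()` gives the same first-appearance order. The toy vector below is made up for illustration.

```r
# Made-up group labels, deliberately not in alphabetical order
groups <- c("virginica", "setosa", "virginica", "versicolor", "setosa")

# Default: levels sorted alphabetically
f_sorted <- factor(groups)

# As in the tutorial: levels in order of first appearance
f_first <- factor(groups, levels = groups[!duplicated(groups)])

print(levels(f_sorted))
print(levels(f_first))

# unique() yields the same first-appearance order
print(identical(unique(groups), groups[!duplicated(groups)]))
```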

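Different colors represent different groups (here, species). The plot is read the same way as a PCA (principal component analysis) plot: samples that lie close together are similar. The difference lies in how the layout is produced. PCA is a linear projection, whereas UMAP, like t-SNE, is nonlinear. In t-SNE, for example, dissimilar points that are placed a small distance apart produce a large gradient that pushes them apart.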
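Since the text compares the UMAP plot with a PCA plot, you can draw a PCA view of the same data for a side-by-side comparison. This is a minimal sketch using base R's `prcomp()` on the `iris` measurements. Scaling the features (`scale. = TRUE`) and the axis labels showing explained variance are assumptions, not part of the original tutorial.

```r
library(ggpubr)  # also attaches ggplot2 (labs, ggtitle)

# PCA is a linear projection; rows are samples, columns are features
pca <- prcomp(iris[, 1:4], scale. = TRUE)

pca_data <- data.frame(
  PC1 = pca$x[, 1],
  PC2 = pca$x[, 2],
  colorBy = iris$Species
)

# Share of total variance captured by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

p_pca <- ggscatter(data = pca_data, x = "PC1", y = "PC2", size = 2,
                   palette = "lancet", color = "colorBy") +
  labs(color = "group",
       x = sprintf("PC1 (%.1f%%)", 100 * var_explained[1]),
       y = sprintf("PC2 (%.1f%%)", 100 * var_explained[2])) +
  ggtitle("PCA Plot")
p_pca
```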