Oversample the reference data, then perform GSEA and topic modelling. To avoid class imbalance when training a classifier, we oversampled the training set: cells were randomly selected from the clusters of a given cell type to form new clusters, until every cell type had the same number of clusters.
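The oversampling idea can be illustrated in base R with the minimal sketch below. It is not the code used by oversample_ref(): the helper `oversample_clusters_sketch()` is hypothetical and works on a plain metadata data frame (one row per cell) rather than on a Seurat object. It also assumes that (i) when no target is given, every cell type is brought up to the largest per-type cluster count, (ii) each synthetic cluster has the median cluster size of its cell type, and (iii) cells are drawn without replacement within one synthetic cluster.

```r
# Hypothetical illustration only; not the implementation used by oversample_ref().
# `meta` is a data.frame with one row per cell.
oversample_clusters_sketch <- function(meta, n_target = NULL,
                                       group_by = "cellType",
                                       cluster_by = "seurat_clusters",
                                       seed = 1) {
  set.seed(seed)  # make the random draws reproducible
  meta[[cluster_by]] <- as.character(meta[[cluster_by]])

  # Number of distinct clusters currently assigned to each cell type
  n_per_type <- sapply(split(meta[[cluster_by]], meta[[group_by]], drop = TRUE),
                       function(x) length(unique(x)))

  # Assumption: default target is the largest per-type cluster count
  if (is.null(n_target)) n_target <- max(n_per_type)

  new_rows <- list()
  for (ct in names(n_per_type)) {
    deficit <- n_target - n_per_type[[ct]]
    if (deficit <= 0) next  # this cell type already has enough clusters

    cells_ct <- meta[meta[[group_by]] == ct, , drop = FALSE]
    # Assumption: a synthetic cluster takes the median cluster size of its type
    sizes <- as.integer(table(cells_ct[[cluster_by]]))
    size <- max(1L, as.integer(round(median(sizes))))

    for (i in seq_len(deficit)) {
      # Randomly draw cells of this type (without replacement) into a new cluster
      idx <- sample(nrow(cells_ct), size)
      synth <- cells_ct[idx, , drop = FALSE]
      synth[[cluster_by]] <- paste0(ct, "_synthetic_", i)
      new_rows[[length(new_rows) + 1]] <- synth
    }
  }

  out <- do.call(rbind, c(list(meta), new_rows))
  rownames(out) <- NULL
  out
}
```

For example, if cell type A is spread over three clusters and cell type B over one, the sketch adds two synthetic B clusters so that both types end up with three.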

oversample_ref(
  reference_SeuratObj,
  number_clusters = NULL,
  group_by = "cellType",
  cluster_by = "seurat_clusters",
  species = "Homo sapiens",
  by = "GO",
  k = NULL,
  method = "VEM"
)

Arguments

reference_SeuratObj

a Seurat object containing the reference data

number_clusters

target number of clusters that each cell type should have after oversampling

group_by

the column of reference_SeuratObj@meta.data that holds the cell-type labels

cluster_by

the column of reference_SeuratObj@meta.data that holds the cluster labels

species

species of the reference data

by

database used to perform GSEA; one of GO, KEGG, Reactome, MSigDb, WikiPathways, DO, NCG or DGN.

k

number of topics in the LDA topic model.

method

method used for fitting an LDA model; currently "VEM" and "Gibbs" are supported.

Value

a Seurat object containing the oversampled expression matrix and the topic-model results.

Examples

if (FALSE) {
reference_SeuratObj <- oversample_ref(
  reference_SeuratObj,
  number_clusters = 10,
  group_by = "cellType",
  cluster_by = "seurat_clusters",
  species = "Homo sapiens",
  by = "GO",
  k = NULL,
  method = "VEM"
)
}